Re: [HACKERS] Convert query plan to sql query

2014-11-04 Thread Antonin Houska
mariem  wrote:

> Hello,
> 
> I would like to transform the query plan (output of the planner,
> debug_print_plan) into an SQL query.

I don't think SQL can express the information the plan contains. For example,
join methods (hash, nest loop, merge).

-- 
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at




Re: [HACKERS] WAL format and API changes (9.5)

2014-11-04 Thread Amit Kapila
On Tue, Nov 4, 2014 at 10:03 PM, Heikki Linnakangas wrote:
>
> On 10/30/2014 06:02 PM, Andres Freund wrote:
>>
>> On 2014-10-29 10:24:20 +0200, Heikki Linnakangas wrote:
>>>
>>> On 10/06/2014 02:29 PM, Andres Freund wrote:
>>>>
>>>> On 2014-10-06 14:19:39 +0300, Heikki Linnakangas wrote:
>>>>>
>>>>> Barring objections, I'll commit this, and then continue benchmarking
>>>>> the second patch with the WAL format and API changes.
>>>>
>>>> I'd like to have a look at it beforehand.
>>>
>>> Ping? Here's a rebased patch. I'd like to proceed with this.
>>
>> Doing so.
>
> Here's a new version of this refactoring patch. It fixes all the concrete
> issues you pointed out.
>
>>>> I've not yet really looked, but on a quick readthrough
>>>> XLogInsertRecData() staying in xlog.c doesn't make me happy...
>>>
>>> Ok.. Can you elaborate?
>>
>> To me the split between xloginsert.c doing some of the record assembly,
>> and xlog.c doing the lower level part of the assembly is just wrong.
>
> I moved the checks for bootstrap mode and xl_len == 0 from the current
> XLogInsert to the new XLogInsert() function. I don't imagine that to be
> enough to address your feelings about the split between xlog.c and
> xloginsert.c, but it makes it a tiny bit clearer IMHO. I don't know what
> else to do about it, as it feels very sensible to me as it is now. So
> unless you object more loudly, I'm going to just go ahead with this and
> commit, probably tomorrow.
>

A few observations while reading the latest patch:

1.
+XLogRecPtr
+XLogInsert(RmgrId rmid, uint8 info, XLogRecData *rdata)
{
..
+ /* info's high bits are reserved for use by me */
+ if (info & XLR_INFO_MASK)
+ elog(PANIC, "invalid xlog info mask %02X", info);
..
}

Earlier, before this check, we used to check XLogInsertAllowed(), which
has now been moved to XLogInsertRecord(). Isn't it better to keep that
check at the beginning of XLogInsert()?
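
For illustration, a hedged sketch of the suggestion (the
XLogInsertAllowed() test and elog message are the ones HEAD uses in
XLogInsert; the rest of the body is elided):

XLogRecPtr
XLogInsert(RmgrId rmid, uint8 info, XLogRecData *rdata)
{
	/* suggested: reject WAL insertion during recovery right away */
	if (!XLogInsertAllowed())
		elog(ERROR, "cannot make new WAL entries during recovery");

	/* info's high bits are reserved for use by me */
	if (info & XLR_INFO_MASK)
		elog(PANIC, "invalid xlog info mask %02X", info);
..
}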

2.
XLogRecPtr
XLogInsertRecord(XLogRecData *rdata, XLogRecPtr fpw_lsn)
{
..
if (fpw_lsn != InvalidXLogRecPtr && fpw_lsn <= RedoRecPtr && doPageWrites)
{
/*
 * Oops, some buffer now needs to be backed up that the caller
 * didn't back up.  Start over.
 */
WALInsertLockRelease();
END_CRIT_SECTION();
return InvalidXLogRecPtr;
}
..
}

IIUC, there can be 4 states for doPageWrites w.r.t. when the record is
getting assembled (XLogRecordAssemble) and just before the actual insert
(XLogInsertRecord).

I think the behaviour for the state where doPageWrites is true during
XLogRecordAssemble and false during XLogInsertRecord (just before the
actual insert) differs from HEAD.

In the patch, it will not retry if doPageWrites is false when we are
about to insert, even though fpw_lsn <= RedoRecPtr, whereas HEAD would
detect this during assembly of the record and retry. Isn't this a
problem?
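
To make the concern concrete, here is a hedged sketch of the retry dance
as I understand it (backend-context pseudocode; the names follow the
snippets quoted above, and the patch's real XLogRecordAssemble signature
may differ):

static XLogRecPtr
XLogInsertWithRetry(RmgrId rmid, uint8 info, XLogRecData *rdata)
{
	XLogRecPtr	EndPos;

	for (;;)
	{
		XLogRecPtr	fpw_lsn;
		XLogRecData *rdt;

		/* assembly decides which blocks get full-page images, based on
		 * the doPageWrites value it sees *now* */
		rdt = XLogRecordAssemble(rmid, info, rdata, &fpw_lsn);

		/* returns InvalidXLogRecPtr if a needed backup block is missing */
		EndPos = XLogInsertRecord(rdt, fpw_lsn);
		if (EndPos != InvalidXLogRecPtr)
			break;				/* success */
		/* otherwise RedoRecPtr/doPageWrites moved; re-assemble */
	}
	return EndPos;
}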


3. There are a couple of places where *XLogInsert* is mentioned in
wal.sgml, and it seems to me that some of those need changes w.r.t. this
patch.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Proposal for better support of time-varying timezone abbreviations

2014-11-04 Thread Michael Meskes
On Wed, Oct 22, 2014 at 11:32:41AM -0400, Tom Lane wrote:
> I don't have a strong opinion about which of the above things to do ...
> what's your preference?

I think it's better for the future if we make a clean cut. Yes, the
option keeps compatibility a little bit higher, but that doesn't matter
much, as the codes won't be used by the server anyway. Besides, they may
eventually change, and then we would have to remember to update ecpg's
copy again.

Michael
-- 
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Michael at BorussiaFan dot De, Meskes at (Debian|Postgresql) dot Org
Jabber: michael.meskes at gmail dot com
VfL Borussia! Força Barça! Go SF 49ers! Use Debian GNU/Linux, PostgreSQL




Re: [HACKERS] tracking commit timestamps

2014-11-04 Thread Jim Nasby

On 11/3/14, 2:36 PM, Alvaro Herrera wrote:

Jim Nasby wrote:

On 11/1/14, 8:41 AM, Petr Jelinek wrote:

Well, this is not a BDR-specific thing. The idea is that with logical
replication, the commit timestamp is not enough for conflict handling;
you also need additional info in order to identify some types of
conflicts (local update vs. remote update, for example). So the extradata
field was meant as something that could be used to attach the additional
info to the xid.


Related to this... is there any way to deal with 2 transactions that commit in 
the same microsecond? It seems silly to try and handle that for every commit 
since it should be quite rare, but perhaps we could store the LSN as extradata 
if we detect a conflict?


Well, two things.  One, LSN is 8 bytes and extradata (at least in this
patch when I last saw it) is only 4 bytes.  But second, and more
important, detecting a conflict is done in node B *after* node A
has recorded the transaction's commit time; there is no way to know at
commit time that there is going to be a conflict caused by that
transaction in the future.  (If there was a way to tell, you could just
as well not commit the transaction in the first place.)


I'm worried about 2 commits in the same microsecond on the same system, not on 
2 different systems. Or, put another way, if we're going to expose this I think 
it should also provide a guaranteed unique commit ordering for a single 
cluster. Presumably, this shouldn't be that hard since we do know the exact 
order in which things committed.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] Let's drop two obsolete features which are bear-traps for novices

2014-11-04 Thread David Fetter
On Tue, Nov 04, 2014 at 08:30:21AM +, Laurenz Albe wrote:
> David Fetter wrote:
> > On Tue, Nov 04, 2014 at 07:51:06AM +0900, Tatsuo Ishii wrote:
> >> Just out of curiosity, why is Oracle's NUMBER (I assume you are
> >> talking about this) so fast?
> > 
> > I suspect that what happens is that NUMBER is stored as a native
> > type (int2, int4, int8, int16) that depends on its size and then
> > cast to the next upward thing as needed, taking any performance
> > hits at that point.  The documentation hints (38 decimal places)
> > at a 128-bit internal representation as the maximum.  I don't know
> > what happens when you get past what 128 bits can represent.
> 
> No, Oracle stores NUMBERs as a variable-length field (up to 22 bytes),
> where the first byte encodes the sign and the decimal point position,
> and the remaining bytes encode the digits, each byte representing two
> digits in base-100 notation (see Oracle Metalink note 1007641.6).
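
For illustration, a minimal, self-contained sketch of the base-100 digit
packing described above (two decimal digits per byte; Oracle's real
sign/exponent byte handling is deliberately not modeled):

#include <stdio.h>
#include <string.h>

/* pack a decimal digit string into base-100 bytes, two digits per byte */
static int
pack_base100(const char *digits, unsigned char *out)
{
	int			n = 0;
	size_t		len = strlen(digits);

	for (size_t i = 0; i < len; i += 2)
	{
		int			hi = digits[i] - '0';
		int			lo = (i + 1 < len) ? digits[i + 1] - '0' : 0;

		out[n++] = (unsigned char) (hi * 10 + lo);	/* one base-100 digit */
	}
	return n;
}

int
main(void)
{
	unsigned char buf[11];
	int			n = pack_base100("31415926", buf);

	for (int i = 0; i < n; i++)
		printf("%u ", buf[i]);	/* prints: 31 41 59 26 */
	putchar('\n');
	return 0;
}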

Thanks for clearing that up, and sorry for spreading misinformed
guesses.

> So it's not so different from PostgreSQL.
> No idea why their arithmetic should be faster.

I have an idea, but this time, I think it's right.  They have at least
one team of people whose job it is to make sure that it is fast.

Cheers,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: [HACKERS] Convert query plan to sql query

2014-11-04 Thread Ashutosh Bapat
Maybe you want to check how it's done in Postgres-XC. Postgres-XC works
on plans created by PostgreSQL and "reverse-engineers" queries for the
parts of the plans that are "shippable". The notion of "shippability"
may not be of interest to you, but the code to "reverse-engineer" most
of the plan nodes is there in Postgres-XC.

On Wed, Nov 5, 2014 at 8:47 AM, mariem  wrote:

> Hello,
>
> I would like to transform the query plan (output of the planner,
> debug_print_plan) into an SQL query.
> I know that there are pieces of the query plan that might be machine
> dependent (in var for example).
> So I wanted to have your suggestions or thoughts before I put efforts into
> it.
>
> Basically, if I have:
>  query1 -> parser -> rewriter -> planner
> the process would be :
>  query_plan -> planner -> parser -> query2
>
> query1 and query2 are not necessarily the same due to rewrite, stats..
>
> Thanks!
>
>
>
>
>



-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


Re: [HACKERS] WAL replay bugs

2014-11-04 Thread Michael Paquier
On Wed, Nov 5, 2014 at 6:29 AM, Peter Eisentraut  wrote:

> On 11/4/14 3:21 PM, Alvaro Herrera wrote:
> > FWIW I gave this a trial run and found I needed some tweaks to test.sh
> > and the Makefile in order to make it work on VPATH; mainly replace ./
> > with `dirname $0` in test.sh in a couple of places, and something
> > similar in the Makefile.  Also you have $PG_ROOT_DIR somewhere which
> > doesn't work.
>
> I also saw some bashisms in the script.
>
> Maybe the time for shell-based test scripts has passed?
>
Apart from pg_upgrade, are there other tests using bash?
-- 
Michael


Re: [HACKERS] WAL replay bugs

2014-11-04 Thread Michael Paquier
Thanks for the tests.

On Wed, Nov 5, 2014 at 5:21 AM, Alvaro Herrera wrote:

> Michael Paquier wrote:
> > On Mon, Jul 14, 2014 at 6:14 PM, Kyotaro HORIGUCHI wrote:
> > > Although I doubt the necessity of the flexibility, seeing the
> > > current testing framework, I don't have so strong an objection
> > > about that. Nevertheless, it would perhaps be appreciated if you
> > > put a notice on.. README or somewhere.
> > Hm, well... Fine, I added it in this updated series.
>
> FWIW I gave this a trial run and found I needed some tweaks to test.sh
> and the Makefile in order to make it work on VPATH; mainly replace ./
> with `dirname $0` in test.sh in a couple of places, and something
> similar in the Makefile.  Also you have $PG_ROOT_DIR somewhere which
> doesn't work.
>
Ah thanks, forgot that.


> Also you have the Makefile checking for -DBUFFER_CAPTURE exactly but for
> some reason I used -DBUFFER_CAPTURE=1 which wasn't well received by your
> $(filter) stuff.  Instead of checking CFLAGS it might make more sense to
> expose it as a read-only GUC and grep `postmaster -C buffer_capture` or
> similar.
>
Yes that's a good idea.

Now, do we really want this feature in core? It is somewhat a duplicate
of what is mentioned here:
http://www.postgresql.org/message-id/CAB7nPqQMq=4ejak317mxz4has0i+1rslbqu29zx18jwlb2j...@mail.gmail.com
Of course the two things do not have the same coverage, as the former is
for the buildfarm and development while the latter is dedicated to
production systems, though it could be used for development as well.

The patch sent there is a bit outdated, but a potential implementation
gets simpler with XLogReadBufferForRedo able to return flags about each
block's state during redo. I am still planning to come back to it this
cycle, though I have stopped for now, waiting for the WAL format patches
to finish shaping the APIs this feature would rely on.
Regards,
-- 
Michael


Re: [HACKERS] Add CREATE support to event triggers

2014-11-04 Thread Michael Paquier
On Fri, Oct 31, 2014 at 11:27 AM, Michael Paquier  wrote:

> So, what I think is missing is really a friendly interface to manipulate
> JsonbContainers directly, and I think that we are not far from it with
> something like this set, roughly:
> - Initialization of an empty container
> - Set of APIs to directly push a value to a container (boolean, array,
> null, string, numeric or other jsonb object)
> - Initialization of JsonbValue objects
>
Here are more thoughts along those lines, looking at the current state
of patch 4, which introduces the infrastructure of the whole feature.
This would make possible in-memory manipulation of jsonb containers
without relying on a third-party set of APIs like what this patch is
doing with ObjTree to deparse the DDL parse trees.
1) A set of constructor functions for JsonbValue: null, bool, string,
array, and jsonb object for nested values. Note that keys can be
represented as jsonb string objects.
2) Lookup functions for values in a JsonbContainer. Patch 4 is
introducing that with find_string_in_jsonbcontainer and
find_bool_in_jsonbcontainer. We may as well extend it to be able to look
up another jsonb object, for nested searches for example.
3) Functions to push a JsonbValue into a container, using a key and a
value. This is where most of the work would be necessary: for bool,
null, string, jsonb object and numeric.

This infrastructure would allow in-memory manipulation of jsonb
containers. Containers could then easily be converted back to strings
and used for value lookups with key strings.
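
To make that concrete, here is a hypothetical sketch of such
declarations - none of these functions exist; the names are invented
here, and only the JsonbValue/JsonbContainer/JsonbParseState types and
the two find_*_in_jsonbcontainer functions mentioned above are real:

#include "postgres.h"
#include "utils/jsonb.h"
#include "utils/numeric.h"

/* 1) constructors (hypothetical names) */
extern JsonbValue *jsonbv_null(void);
extern JsonbValue *jsonbv_bool(bool value);
extern JsonbValue *jsonbv_string(const char *value);
extern JsonbValue *jsonbv_numeric(Numeric value);

/* 2) lookups, extending the find_*_in_jsonbcontainer family */
extern JsonbValue *find_jsonb_in_jsonbcontainer(JsonbContainer *container,
												const char *key);

/* 3) push a key/value pair into a container under construction */
extern void jsonb_push_key_value(JsonbParseState **state,
								 const char *key, JsonbValue *value);
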
Regards,
--
Michael


[HACKERS] Convert query plan to sql query

2014-11-04 Thread mariem
Hello,

I would like to transform the query plan (output of the planner,
debug_print_plan) into an SQL query.
I know that there are pieces of the query plan that might be machine
dependent (in var for example).
So I wanted to have your suggestions or thoughts before I put efforts into
it.

Basically, if I have:
 query1 -> parser -> rewriter -> planner
the process would be :
 query_plan -> planner -> parser -> query2

query1 and query2 are not necessarily the same due to rewrite, stats..

Thanks!








Re: [HACKERS] Preferring MemSet or memset?

2014-11-04 Thread Tom Lane
Michael Paquier  writes:
> MemSet is an internal macro, defined in src/include/c.h, that is faster
> than the system memset for zeroing small word-aligned structures. Both
> are used here and there with no real preference.
> An example: in many code paths we have nulls and values arrays used to
> build tuples, which are sometimes initialized with memset and other
> times with MemSet. Wouldn't we gain a bit of performance by switching
> the initializations of nulls and values currently done with memset over
> to MemSet?

MemSet is *not* a win unless the target area is word-sized+word-aligned,
so it'd be a pretty unlikely thing for it to win on zeroing bool arrays.

In principle it could win for zeroing Datum arrays, but I don't think
I want to read code that is zeroing a Datum array with MemSet and the
adjacent bool array with memset; that's just weird, and it's unlikely
that we get enough win out of it to justify the notational inconsistency.

I keep expecting that we'll find that on modern platforms memset is a
winner across the board.  There is no reason that gcc couldn't optimize
memset calls into code as good as or better than MemSet, and I have the
impression that the compiler guys have actually put some effort into
optimizing it lately.  I'm not sure if anyone's recently redone the
tests that showed MemSet was worthwhile ...
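
For readers who have not looked at the macro: a rough, illustrative
version of the word-wise zeroing idea behind MemSet - not the actual
macro from src/include/c.h, which has extra guards and a size threshold:

#include <stddef.h>

/*
 * Zero a region one word at a time.  Only valid when ptr is word-aligned
 * and len is a multiple of sizeof(long) -- exactly the precondition under
 * which MemSet can beat a generic memset call.
 */
static void
zero_wordwise(void *ptr, size_t len)
{
	long	   *p = (long *) ptr;
	long	   *end = (long *) ((char *) ptr + len);

	while (p < end)
		*p++ = 0;
}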

regards, tom lane




[HACKERS] Preferring MemSet or memset?

2014-11-04 Thread Michael Paquier
Hi all,

MemSet is an internal macro, defined in src/include/c.h, that is faster
than the system memset for zeroing small word-aligned structures. Both
are used here and there with no real preference.
An example: in many code paths we have nulls and values arrays used to
build tuples, which are sometimes initialized with memset and other
times with MemSet. Wouldn't we gain a bit of performance by switching
the initializations of nulls and values currently done with memset over
to MemSet?
Opinions?
Regards,
-- 
Michael


Re: [HACKERS] tracking commit timestamps

2014-11-04 Thread Andres Freund
On 2014-11-05 08:57:07 +0900, Michael Paquier wrote:
> On Tue, Nov 4, 2014 at 10:01 PM, Alvaro Herrera wrote:
>
> > Michael Paquier wrote:
> >
> > > I'm still on a -1 for that. You mentioned that there is perhaps no
> > > reason to delay a decision on this matter, but IMO there is no reason
> > > to rush either into doing something we may regret. And I am not the
> > > only one on this thread expressing concern about this extra data
> > > thingy.
> > >
> > > If this extra data field is going to be used to identify which node a
> > > commit comes from, then it is another feature than what is written on
> > > the subject of this thread. In this case let's discuss it in the
> > > thread dedicated to replication identifiers, or come up with an extra
> > > patch once the feature for commit timestamps is done.
> >
> > Introducing the extra data field in a later patch would mean an on-disk
> > representation change, i.e. pg_upgrade trouble.
>
> Then why especially 4 bytes for the extra field? Why not 8 or 16?

It's sufficiently long that you can build infrastructure for storing
more transaction metadata on top. I could live with making it 8 bytes,
but I don't see a clear advantage.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Repeatable read and serializable transactions see data committed after tx start

2014-11-04 Thread Álvaro Hernández Tortosa


On 04/11/14 09:07, Craig Ringer wrote:

On 11/04/2014 07:31 AM, Álvaro Hernández Tortosa wrote:

Thank you for your comment, Tom. However, I think this behavior, as
seen from a user perspective, is not the expected one.

That may be the case, but I think it's the SQL-standard behaviour, so we
can't really mess with it.

The spec requires SET TRANSACTION ISOLATION, and you can't implement
that if you take a snapshot at BEGIN.


It's true that the standard mandates SET TRANSACTION rather than
setting the isolation level with the BEGIN statement, and in any case
you can raise/lower the isolation level with SET regardless of what the
session or the BEGIN command said. However, is it really a problem to
take a snapshot at BEGIN time -- only if the tx is started with BEGIN
... (REPEATABLE READ | SERIALIZABLE)? AFAIK, and I may be missing some
internal details here, the worst that can happen is that you take one
extra, unnecessary snapshot. I don't see that as a huge problem.


The standard (SQL-92) says that a transaction is initiated when a
transaction-initiating SQL statement is executed. To be fair, that
sounds to me more like a "SELECT" than a "BEGIN", but I may be wrong.



If it is still the intended behavior, I think it should be clearly
documented as such, and a recommendation similar to "issue a 'SELECT 1'
right after BEGIN to freeze the data before any query of your own"
should be added. Again, as I said in my email, the documentation clearly
says that the transaction "only sees data committed before the
transaction began". And this is clearly not the real behavior.

It's more of a difference in when the transaction "begins".

Arguably, "BEGIN" says "I intend to begin a new transaction with the
next query" rather than "immediately begin executing a new transaction".

This concept could be clearer in the docs.


If this is really how it should behave, I'd +1000 making it clearer
in the docs, and explicitly suggesting that the user perform a query
that discards the results (like a SELECT 1) early after BEGIN, if the
user wants the state frozen and time may elapse between BEGIN and the
real queries.





Sure, there are, that was the link I pointed out, but I found no
explicit mention of the issue I'm raising here.

I'm sure it's documented *somewhere*, in that I remember reading about
this detail in the docs, but I can't find _where_ in the docs.

It doesn't seem to be in:

http://www.postgresql.org/docs/current/static/transaction-iso.html

where I'd expect.


Yepp, there's no mention there.



In any case, we simply cannot take the snapshot at BEGIN time, because
it's permitted to:

SET TRANSACTION ISOLATION LEVEL READ COMMITTED;

in a DB that has default serializable isolation or has a SET SESSION
CHARACTERISTICS isolation mode of serializable. Note that SET
TRANSACTION is SQL-standard.


As I said, AFAIK it shouldn't matter a lot to take the snapshot at 
BEGIN. The worst that can happen is that you end up in read committed 
and you need to take more snapshots, one per query.




AFAIK deferring the snapshot that's consistent with other RDBMSes that
use snapshots, too.


I tried Oracle and SQL Server. SQL Server seems to behave like
PostgreSQL, but only because it locks the table when it is accessed in a
serializable transaction, so it definitely waits until the SELECT to
lock it. However, Oracle behaved as I expected: data is frozen at BEGIN
time. I haven't tested others.





The docs of that command allude to, but don't explicitly state, the
behaviour you mention.

http://www.postgresql.org/docs/current/static/sql-set-transaction.html




Should we then improve the docs to state this more clearly? Any
objections to doing that?


Regards,

Álvaro


--
Álvaro Hernández Tortosa


---
8Kdata





Re: [HACKERS] to_char_at_timezone()?

2014-11-04 Thread Marko Tiikkaja

On 11/5/14, 12:59 AM, Tom Lane wrote:

Marko Tiikkaja  writes:

So I got into thinking whether it would make sense to provide a new
function, say, to_char_at_timezone() to solve this problem.  For example:
...
Any thoughts?  The patch is quite trivial.


I'm not convinced that it's all that trivial.  Is the input timestamp or
timestamptz, and what's the semantics of that exactly (ie what timezone
rotation happens)?  One's first instinct is often wrong in this area.


In my example, the input is a "timestamptz", and the output is converted
to the target time zone the same way timestamptz_out() does it, except
based on the input time zone instead of the TimeZone GUC.


Not sure whether it would make sense to do this for "timestamp", or 
whether there's even a clear intuitive behaviour there.
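
A hedged sketch of that conversion step, based on the description above
rather than on the actual patch (pg_tzset() and timestamp2tm() are
existing backend functions; the function below is illustrative):

#include "postgres.h"
#include "pgtime.h"
#include "utils/timestamp.h"

/* break a timestamptz down using an explicitly named zone, not TimeZone */
static void
timestamptz_to_tm_at_zone(TimestampTz dt, const char *zone_name,
						  struct pg_tm *tm, fsec_t *fsec, int *tz_offset)
{
	const char *tzn;
	pg_tz	   *tz = pg_tzset(zone_name);	/* zone from the third argument */

	if (tz == NULL)
		ereport(ERROR,
				(errmsg("time zone \"%s\" not recognized", zone_name)));
	if (timestamp2tm(dt, tz_offset, tm, fsec, &tzn, tz) != 0)
		ereport(ERROR,
				(errmsg("timestamp out of range")));
	/* tm/fsec/tz_offset can then be fed to the to_char() machinery */
}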



.marko




Re: [HACKERS] to_char_at_timezone()?

2014-11-04 Thread Tom Lane
Marko Tiikkaja  writes:
> So I got into thinking whether it would make sense to provide a new 
> function, say, to_char_at_timezone() to solve this problem.  For example:
> ...
> Any thoughts?  The patch is quite trivial.

I'm not convinced that it's all that trivial.  Is the input timestamp or
timestamptz, and what's the semantics of that exactly (ie what timezone
rotation happens)?  One's first instinct is often wrong in this area.

regards, tom lane




Re: [HACKERS] tracking commit timestamps

2014-11-04 Thread Michael Paquier
On Tue, Nov 4, 2014 at 10:01 PM, Alvaro Herrera wrote:

> Michael Paquier wrote:
>
> > I'm still on a -1 for that. You mentioned that there is perhaps no
> > reason to delay a decision on this matter, but IMO there is no reason
> > to rush either into doing something we may regret. And I am not the
> > only one on this thread expressing concern about this extra data
> > thingy.
> >
> > If this extra data field is going to be used to identify which node a
> > commit comes from, then it is another feature than what is written on
> > the subject of this thread. In this case let's discuss it in the
> > thread dedicated to replication identifiers, or come up with an extra
> > patch once the feature for commit timestamps is done.
>
> Introducing the extra data field in a later patch would mean an on-disk
> representation change, i.e. pg_upgrade trouble.

Then why especially 4 bytes for the extra field? Why not 8 or 16?
-- 
Michael


[HACKERS] to_char_at_timezone()?

2014-11-04 Thread Marko Tiikkaja

Hi,

9.4 FINALLY added the UTC offset formatting pattern to to_char(). 
However, it falls a bit short in the sense that it's always the time 
zone offset according to the effective TimeZone value.  This has a few 
issues as far as I can tell:


  1) It's not truly controlled by the query which produces the 
timestamptz values in the case of e.g. functions

  2) Having to SET LOCAL before a query is quite ugly
  3) It supports only a single TimeZone value per query

So I got into thinking whether it would make sense to provide a new 
function, say, to_char_at_timezone() to solve this problem.  For example:


local:marko=#* select now();
              now
-------------------------------
 2014-11-05 00:43:47.954662+01
(1 row)

local:marko=#* select to_char_at_timezone(now(), 'YYYY-MM-DD HH24:MI:SSOF', 'Etc/Utc');
  to_char_at_timezone
------------------------
 2014-11-04 23:43:47+00
(1 row)

local:marko=#* select to_char_at_timezone(now(), 'YYYY-MM-DD HH24:MI:SSOF', 'America/Los_Angeles');
  to_char_at_timezone
------------------------
 2014-11-04 15:43:47-08
(1 row)


Any thoughts?  The patch is quite trivial.


.marko




Re: [HACKERS] [BUGS] ltree::text not immutable?

2014-11-04 Thread Tom Lane
I wrote:
> An alternative that just occurred to me is to put the no-volatile-
> I/O-functions check into CREATE TYPE, but make it just WARNING not
> ERROR.  That would be nearly as good as an ERROR in terms of nudging
> people who'd accidentally omitted a volatility marking from their
> custom types.  But we could leave chkpass as-is and wait to see if
> anyone complains "hey, why am I getting this warning?".  If we don't
> hear anyone complaining, maybe that means we can get away with changing
> the type's behavior in 9.6 or later.

Attached is a complete patch along these lines.  As I suggested earlier,
this just makes the relevant changes in ltree--1.0.sql and
pg_trgm--1.1.sql without bumping their extension version numbers,
since it doesn't seem important enough to justify a version bump.

I propose that we could back-patch the immutability-additions in ltree and
pg_trgm, since they won't hurt anything and might make life a little
easier for future adopters of those modules.  The WARNING additions should
only go into HEAD though.

regards, tom lane

diff --git a/contrib/ltree/ltree--1.0.sql b/contrib/ltree/ltree--1.0.sql
index 5a2f375..7d55fc6 100644
*** a/contrib/ltree/ltree--1.0.sql
--- b/contrib/ltree/ltree--1.0.sql
***
*** 6,17 
  CREATE FUNCTION ltree_in(cstring)
  RETURNS ltree
  AS 'MODULE_PATHNAME'
! LANGUAGE C STRICT;
  
  CREATE FUNCTION ltree_out(ltree)
  RETURNS cstring
  AS 'MODULE_PATHNAME'
! LANGUAGE C STRICT;
  
  CREATE TYPE ltree (
  	INTERNALLENGTH = -1,
--- 6,17 
  CREATE FUNCTION ltree_in(cstring)
  RETURNS ltree
  AS 'MODULE_PATHNAME'
! LANGUAGE C STRICT IMMUTABLE;
  
  CREATE FUNCTION ltree_out(ltree)
  RETURNS cstring
  AS 'MODULE_PATHNAME'
! LANGUAGE C STRICT IMMUTABLE;
  
  CREATE TYPE ltree (
  	INTERNALLENGTH = -1,
*** CREATE OPERATOR CLASS ltree_ops
*** 303,314 
  CREATE FUNCTION lquery_in(cstring)
  RETURNS lquery
  AS 'MODULE_PATHNAME'
! LANGUAGE C STRICT;
  
  CREATE FUNCTION lquery_out(lquery)
  RETURNS cstring
  AS 'MODULE_PATHNAME'
! LANGUAGE C STRICT;
  
  CREATE TYPE lquery (
  	INTERNALLENGTH = -1,
--- 303,314 
  CREATE FUNCTION lquery_in(cstring)
  RETURNS lquery
  AS 'MODULE_PATHNAME'
! LANGUAGE C STRICT IMMUTABLE;
  
  CREATE FUNCTION lquery_out(lquery)
  RETURNS cstring
  AS 'MODULE_PATHNAME'
! LANGUAGE C STRICT IMMUTABLE;
  
  CREATE TYPE lquery (
  	INTERNALLENGTH = -1,
*** CREATE OPERATOR ^? (
*** 414,425 
  CREATE FUNCTION ltxtq_in(cstring)
  RETURNS ltxtquery
  AS 'MODULE_PATHNAME'
! LANGUAGE C STRICT;
  
  CREATE FUNCTION ltxtq_out(ltxtquery)
  RETURNS cstring
  AS 'MODULE_PATHNAME'
! LANGUAGE C STRICT;
  
  CREATE TYPE ltxtquery (
  	INTERNALLENGTH = -1,
--- 414,425 
  CREATE FUNCTION ltxtq_in(cstring)
  RETURNS ltxtquery
  AS 'MODULE_PATHNAME'
! LANGUAGE C STRICT IMMUTABLE;
  
  CREATE FUNCTION ltxtq_out(ltxtquery)
  RETURNS cstring
  AS 'MODULE_PATHNAME'
! LANGUAGE C STRICT IMMUTABLE;
  
  CREATE TYPE ltxtquery (
  	INTERNALLENGTH = -1,
*** CREATE OPERATOR ^@ (
*** 481,492 
  CREATE FUNCTION ltree_gist_in(cstring)
  RETURNS ltree_gist
  AS 'MODULE_PATHNAME'
! LANGUAGE C STRICT;
  
  CREATE FUNCTION ltree_gist_out(ltree_gist)
  RETURNS cstring
  AS 'MODULE_PATHNAME'
! LANGUAGE C STRICT;
  
  CREATE TYPE ltree_gist (
  	internallength = -1,
--- 481,492 
  CREATE FUNCTION ltree_gist_in(cstring)
  RETURNS ltree_gist
  AS 'MODULE_PATHNAME'
! LANGUAGE C STRICT IMMUTABLE;
  
  CREATE FUNCTION ltree_gist_out(ltree_gist)
  RETURNS cstring
  AS 'MODULE_PATHNAME'
! LANGUAGE C STRICT IMMUTABLE;
  
  CREATE TYPE ltree_gist (
  	internallength = -1,
diff --git a/contrib/pg_trgm/pg_trgm--1.1.sql b/contrib/pg_trgm/pg_trgm--1.1.sql
index 1fff7af..34b37e4 100644
*** a/contrib/pg_trgm/pg_trgm--1.1.sql
--- b/contrib/pg_trgm/pg_trgm--1.1.sql
*** CREATE OPERATOR <-> (
*** 53,64 
  CREATE FUNCTION gtrgm_in(cstring)
  RETURNS gtrgm
  AS 'MODULE_PATHNAME'
! LANGUAGE C STRICT;
  
  CREATE FUNCTION gtrgm_out(gtrgm)
  RETURNS cstring
  AS 'MODULE_PATHNAME'
! LANGUAGE C STRICT;
  
  CREATE TYPE gtrgm (
  INTERNALLENGTH = -1,
--- 53,64 
  CREATE FUNCTION gtrgm_in(cstring)
  RETURNS gtrgm
  AS 'MODULE_PATHNAME'
! LANGUAGE C STRICT IMMUTABLE;
  
  CREATE FUNCTION gtrgm_out(gtrgm)
  RETURNS cstring
  AS 'MODULE_PATHNAME'
! LANGUAGE C STRICT IMMUTABLE;
  
  CREATE TYPE gtrgm (
  INTERNALLENGTH = -1,
diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c
index 55a6881..cbb65f8 100644
*** a/src/backend/commands/typecmds.c
--- b/src/backend/commands/typecmds.c
*** DefineType(List *names, List *parameters
*** 547,552 
--- 547,598 
  	   NameListToString(analyzeName));
  #endif
  
+ 	/*
+ 	 * Print warnings if any of the type's I/O functions are marked volatile.
+ 	 * There is a general assumption that I/O functions are stable or
+ 	 * immutable; this allows us for example to mark record_in/record_out
+ 	 

Re: [GENERAL] Re: [BUGS] [HACKERS] COPY TO returning empty result with parallel ALTER TABLE

2014-11-04 Thread Bernd Helmle



--On 4. November 2014 17:18:14 -0500 Tom Lane  wrote:


Yeah, and I think that it's entirely reasonable for rewriting ALTER TABLEs
to update the xmin of the rewritten tuples; after all, the output data
could be arbitrarily different from what the previous transactions put
into the table.  But that is not the question here.  If the COPY blocks
until the ALTER completes --- as it must --- why is its execution snapshot
not taken *after* the lock is acquired?


COPY waits in DoCopy(), but its snapshot gets acquired earlier, in
PortalRunUtility(). A SELECT already has its lock during the
transform/analyse phase, and its snapshot is taken much later. Looks
like we need something that analyses a utility statement so as to take
its lock before executing.


--
Thanks

Bernd




Re: [GENERAL] Re: [BUGS] [HACKERS] COPY TO returning empty result with parallel ALTER TABLE

2014-11-04 Thread Tom Lane
Andres Freund  writes:
> On 2014-11-04 13:51:23 -0500, Tom Lane wrote:
>> Not sure.  The OP's point is that in a SELECT, you do get unsurprising
>> results, because a SELECT will acquire its execution snapshot after it's
>> gotten AccessShareLock on the table.  Arguably COPY should behave likewise.
>> Or to be even more concrete, COPY (SELECT * FROM tab) TO ... probably
>> already acts like he wants, so why isn't plain COPY equivalent to that?

> Even a plain SELECT essentially acts that way if I recall correctly if
> you use REPEATABLE READ+ and force a snapshot to be acquired
> beforehand. It's imo not very surprising.

"It doesn't fail in a non-default isolation mode" is hardly much of an
argument for this being okay in READ COMMITTED.

> All ALTER TABLE rewrites just disregard visibility of existing
> tuples. Only CLUSTER/VACUUM FULL do the full hangups to keep all the
> necessary tuples + ctid chains around.

Yeah, and I think that it's entirely reasonable for rewriting ALTER TABLEs
to update the xmin of the rewritten tuples; after all, the output data
could be arbitrarily different from what the previous transactions put
into the table.  But that is not the question here.  If the COPY blocks
until the ALTER completes --- as it must --- why is its execution snapshot
not taken *after* the lock is acquired?

regards, tom lane




Re: [HACKERS] pg_receivelog completion command

2014-11-04 Thread Peter Eisentraut
On 11/2/14 8:26 AM, Magnus Hagander wrote:
> The idea is to have pg_receivexlog
> fire off an external command at the end of each segment - for example
> a command to gzip the file, or to archive it off into a Magic Cloud
> (TM) or something like that.

A simple facility to allow gzipping after the file is complete might be
OK, but the cloud use case is probably too abstract to be useful.  I'd
rather write my own consumer for that, or go back to archive_command,
which has the queuing logic built in already.





Re: [HACKERS] tracking commit timestamps

2014-11-04 Thread Petr Jelinek

On 04/11/14 22:20, Peter Eisentraut wrote:

On 11/3/14 5:17 PM, Petr Jelinek wrote:

Please don't name anything "committs".  That looks like a misspelling of
something.

There is nothing wrong with

pg_get_transaction_commit_timestamp()

If you want to reduce the length, lose the "get".



I am fine with that, I only wonder if your definition of "anything" only
concerns the SQL interfaces or also the internals.


I'd be fine with commit_ts for internals, but not committs.

One day, you'll need a function or data structure that works with
multiple of these, and then you'll really be in naming trouble. ;-)



Hmm we use CommitTs in interfaces that uses CamelCase naming so I guess 
commit_ts is indeed natural expansion of that.



--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] WAL replay bugs

2014-11-04 Thread Peter Eisentraut
On 11/4/14 3:21 PM, Alvaro Herrera wrote:
> FWIW I gave this a trial run and found I needed some tweaks to test.sh
> and the Makefile in order to make it work on VPATH; mainly replace ./
> with `dirname $0` in test.sh in a couple of places, and something
> similar in the Makefile.  Also you have $PG_ROOT_DIR somewhere which
> doesn't work.

I also saw some bashisms in the script.

Maybe the time for shell-based test scripts has passed?





Re: [HACKERS] tracking commit timestamps

2014-11-04 Thread Petr Jelinek

On 04/11/14 09:25, Michael Paquier wrote:
> On Tue, Nov 4, 2014 at 5:05 PM, Andres Freund wrote:
>> On 2014-11-02 19:27:25 +0100, Petr Jelinek wrote:
>>> Well, Michael has a point that the extradata is pretty much useless
>>> currently; perhaps it would help to add an interface to set extradata?
>>
>> Only accessible via C and useless aren't the same thing. But sure, add
>> it.
>
> I'm still on a -1 for that. You mentioned that there is perhaps no
> reason to delay a decision on this matter, but IMO there is no reason
> to rush either into doing something we may regret. And I am not the
> only one on this thread expressing concern about this extra data
> thingy.
>
> If this extra data field is going to be used to identify which node a
> commit comes from, then it is another feature than what is written on
> the subject of this thread. In this case let's discuss it in the
> thread dedicated to replication identifiers, or come up with an extra
> patch once the feature for commit timestamps is done.


The issue as I see it is that both of those features are closely
related, and one without the other has very limited use. What I learned
from working on UDR is that for conflict handling I was actually missing
the extradata more than the timestamp itself - in other words, I have an
extension for 9.4 where I have a use for this API already, so the
argument about dead code or forks or whatever does not really hold.

The other problem is that if we add extradata later, we will either
break upgrade-ability or have to write essentially the same code again
to store just the extradata instead of the timestamp; I don't really
like either of those options, to be honest.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] tracking commit timestamps

2014-11-04 Thread Peter Eisentraut
On 11/3/14 5:17 PM, Petr Jelinek wrote:
>> Please don't name anything "committs".  That looks like a misspelling of
>> something.
>>
>> There is nothing wrong with
>>
>> pg_get_transaction_commit_timestamp()
>>
>> If you want to reduce the length, lose the "get".
>>
> 
> I am fine with that, I only wonder if your definition of "anything" only
> concerns the SQL interfaces or also the internals.

I'd be fine with commit_ts for internals, but not committs.

One day, you'll need a function or data structure that works with
multiple of these, and then you'll really be in naming trouble. ;-)





Re: [HACKERS] tracking commit timestamps

2014-11-04 Thread Petr Jelinek

On 04/11/14 09:19, Michael Paquier wrote:

On Sat, Nov 1, 2014 at 10:00 PM, Michael Paquier wrote:

More comments:

Here are also more comments about the code that I found while focusing
on committs.c:
1) When the GUC is not enabled, and when the transaction ID provided is
not in a correct range, a dummy value based on InvalidTransactionId (?!)
is returned, like that:
+   /* Return empty if module not enabled */
+   if (!commit_ts_enabled)
+   {
+   if (ts)
+   *ts = InvalidTransactionId;
+   if (data)
+   *data = (CommitExtraData) 0;
+   return;
+   }
This leads to some incorrect results:
=# select pg_get_transaction_committime('1');
 pg_get_transaction_committime
-------------------------------
 2000-01-01 09:00:00+09
(1 row)


Oh, I had this on my TODO and somehow forgot about it, I am leaning 
towards throwing an error when calling any committs "get" function while 
the tracking is disabled.



Or for example:
=# SELECT txid_current();
 txid_current
--------------
         1006
(1 row)
=# select pg_get_transaction_committime('1006');
 pg_get_transaction_committime
-------------------------------
 2014-11-04 14:54:37.589381+09
(1 row)
=# select pg_get_transaction_committime('1007');
 pg_get_transaction_committime
-------------------------------
 2000-01-01 09:00:00+09
(1 row)
=# SELECT txid_current();
 txid_current
--------------
         1007
(1 row)
And at other times the error is not that helpful for the user:
=# select pg_get_transaction_committime('1');
ERROR:  XX000: could not access status of transaction 1
DETAIL:  Could not read from file "pg_committs/" at offset 114688:
Undefined error: 0.
LOCATION:  SlruReportIOError, slru.c:896
I think that you should simply return an error in
TransactionIdGetCommitTsData when the xid is not in the supported range.
That will be more transparent for the user.


Agreed.


5) Reading the code, TransactionTreeSetCommitTimestamp is always called
with do_xlog = false, meaning that no timestamps are actually
WAL-logged... Hence WriteSetTimestampXlogRec is just dead code with the
latest version of the patch. IMO, this should always be WAL-logged when
track_commit_timestamp is on.


As Andres explained, this is always WAL-logged by the current caller, so
we don't want it to be double-logged; that's why do_xlog = false. But
when an extension needs to call it, it will most likely need do_xlog =
true.



8) The redo and xlog routines of committs should be moved out into a
dedicated file, like committsxlog.c or similar. The other rmgrs do so,
so let's be consistent.



Most actually don't AFAICS.

--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] WAL replay bugs

2014-11-04 Thread Alvaro Herrera
Michael Paquier wrote:
> On Mon, Jul 14, 2014 at 6:14 PM, Kyotaro HORIGUCHI wrote:
> > Although I doubt the necessity of the flexibility, seeing the
> > current testing framework, I don't have so strong an objection
> > about that. Nevertheless, it would perhaps be appreciated if you
> > put a notice on.. README or somewhere.
> Hm, well... Fine, I added it in this updated series.

FWIW I gave this a trial run and found I needed some tweaks to test.sh
and the Makefile in order to make it work on VPATH; mainly replace ./
with `dirname $0` in test.sh in a couple of places, and something
similar in the Makefile.  Also you have $PG_ROOT_DIR somewhere which
doesn't work.

Also you have the Makefile checking for -DBUFFER_CAPTURE exactly but for
some reason I used -DBUFFER_CAPTURE=1 which wasn't well received by your
$(filter) stuff.  Instead of checking CFLAGS it might make more sense to
expose it as a read-only GUC and grep `postmaster -C buffer_capture` or
similar.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] tracking commit timestamps

2014-11-04 Thread Petr Jelinek

On 04/11/14 09:05, Andres Freund wrote:

On 2014-11-02 19:27:25 +0100, Petr Jelinek wrote:

Well, Michael has a point that the extradata is pretty much useless
currently; perhaps it would help to add an interface to set extradata?


Only accessible via C and useless aren't the same thing. But sure, add
it.



I actually meant a nicer C API - one that makes it possible to say "for
this transaction, use this extradata" (or "for all transactions from now
on done by this backend, use this extradata"), instead of the current
API where you have to overwrite what RecordCommit already wrote.
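
Something with roughly this hypothetical shape (neither function exists
anywhere; only TransactionId and CommitExtraData are names from the
patch):

/* per-transaction override (hypothetical) */
extern void CommitTsSetXactExtra(TransactionId xid, CommitExtraData extra);

/* default for all transactions this backend commits from now on
 * (hypothetical) */
extern void CommitTsSetBackendExtraDefault(CommitExtraData extra);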



--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Sequence Access Method WIP

2014-11-04 Thread Petr Jelinek

On 04/11/14 13:11, Heikki Linnakangas wrote:

On 10/13/2014 01:01 PM, Petr Jelinek wrote:

Hi,

I rewrote the patch with different API along the lines of what was
discussed.


Thanks, that's better.

It would be good to see an alternative seqam to implement this API, to
see how it really works. The "local" one is too dummy to expose any
possible issues.



Yeah, I don't know what that would be; it's hard for me to find
something that would serve purely demonstration purposes. I guess I
could port BDR sequences to this if it would help (once we have a bit
more solid agreement that the proposed API at least theoretically seems
OK, so that I don't have to rewrite it 10 times, if at all possible).



...
Only the alloc and reloptions methods are required (and implemented by
the local AM).

The caching, xlog writing, updating the page, etc is handled by backend,
the AM does not see the tuple at all. I decided to not pass even the
struct around and just pass the relevant options because I think if we
want to abstract the storage properly then the AM should not care about
what pg_sequence looks like at all, even if it means that the
sequence_alloc parameter list is a bit long.


Hmm. The division of labour between the seqam and commands/sequence.c
still feels a bit funny. sequence.c keeps track of how many values have
been WAL-logged, and thus usable immediately, but we still call
sequence_alloc even when using up those already WAL-logged values. If
you think of using this for something like a centralized sequence server
in a replication cluster, you certainly don't want to make a call to the
remote server for every value - you'll want to cache them.

With the "local" seqam, there are two levels of caching. Each backend
caches some values (per the CACHE  option in CREATE SEQUENCE). In
addition to that, the server WAL-logs 32 values at a time. If you have a
remote seqam, it would most likely add a third cache, but it would
interact in strange ways with the second cache.

Considering a non-local seqam, the locking is also a bit strange. The
server keeps the sequence page locked throughout nextval(). But if the
actual state of the sequence is maintained elsewhere, there's no need to
serialize the calls to the remote allocator, i.e. the sequence_alloc()
calls.

I'm not exactly sure what to do about that. One option is to completely
move the maintenance of the "current" value, i.e. sequence.last_value,
to the seqam. That makes sense from an abstraction point of view. For
example with a remote server managing the sequence, storing the
"current" value in the local catalog table makes no sense as it's always
going to be out-of-date. The local seqam would store it as part of the
am-private data. However, you would need to move the responsibility of
locking and WAL-logging to the seqam. Maybe that's OK, but we'll need to
provide an API that the seqam can call to do that. Perhaps just let the
seqam call heap_inplace_update on the sequence relation.



My idea of how this works is: sequence_next handles the allocation and
the level-2 caching (the WAL-logged cache) via amdata if it supports it,
or returns a single value if it doesn't - then the WAL will always just
record the one value and there will basically be no level-2 cache, since
it is sequence_next that controls how much will be WAL-logged; what the
backend asks for is just a "suggestion".

The third level of caching, as you say, should be solved by the
sequence_request_update and sequence_update callbacks - they enable the
sequence AM to handle this kind of caching asynchronously, without
blocking sequence_next unnecessarily.

That way it's possible to implement many different strategies, from no
cache at all to three levels of caching, with and without blocking of
calls to sequence_next.
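
To sketch what I mean (hedged - the real patch's struct layout and
signatures are not shown in this thread, so everything below is an
assumption for illustration):

#include "postgres.h"
#include "utils/rel.h"

typedef struct SeqAmRoutine
{
	/* required: allocate value(s); nrequested is only a suggestion, and
	 * the AM decides how many values actually get WAL-logged */
	int64		(*sequence_alloc) (Relation seqrel, int64 nrequested,
								   Datum *amdata);

	/* required: parse and validate AM-specific reloptions */
	bytea	   *(*sequence_reloptions) (Datum reloptions, bool validate);

	/* optional: asynchronous cache refill, so sequence_alloc need not
	 * block while a third-level cache is replenished */
	void		(*sequence_request_update) (Relation seqrel, Datum amdata);
	void		(*sequence_update) (Relation seqrel, Datum amdata);
} SeqAmRoutine;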



For the amdata handling (which is the AM's private data variable) the
API assumes that (Datum) 0 is NULL, this seems to work well for
reloptions so should work here also and it simplifies things a little
compared to passing pointers to pointers around and making sure
everything is allocated, etc.

Sadly, the fact that amdata is not fixed-size and can be NULL made the
page updates of the sequence relation quite a bit more complex than they
used to be.


It would be nice if the seqam could define exactly the columns it needs,
with any datatypes. There would be a set of common attributes:
sequence_name, start_value, cache_value, increment_by, max_value,
min_value, is_cycled. The local seqam would add "last_value", "log_cnt"
and "is_called" to that. A remote seqam that calls out to some other
server might store the remote server's hostname etc.

There could be a seqam function that returns a TupleDesc with the
required columns, for example.



Wouldn't that somewhat bloat the catalog if we had a new catalog table
for each sequence AM? It also does not really solve the "issue" of
amdata being dynamically sized. I think a dynamic field where the AM
stores whatever it wants is enough (ie the current sol

Re: [HACKERS] pg_basebackup fails with long tablespace paths

2014-11-04 Thread Peter Eisentraut
On 10/20/14 4:51 PM, Peter Eisentraut wrote:
> On 10/20/14 2:59 PM, Tom Lane wrote:
>> What do we want to do about this?  I think a minimum expectation would be
>> for pg_basebackup to notice and complain when it's trying to create an
>> unworkably long symlink entry, but it would be far better if we found a
>> way to cope instead.
> 
> Isn't it the backend that should error out before sending truncated
> files names?
> 
> src/port/tar.c:
> 
> /* Name 100 */
> sprintf(&h[0], "%.99s", filename);

Here are patches to address that.  First, it reports errors when
attempting to create a tar header that would truncate file or symlink
names.  Second, it works around the problem in the tests by creating a
symlink from the short-name tempdir that we had set up for the
Unix-socket directory case.

The first patch can be backpatched to 9.3.  The tar code before that is
different and would need manual adjustments.

If someone has a too-long tablespace path, I think they can work around
that after this patch by creating a shorter symlink and updating the
pg_tblspc symlinks to point there.
From 16d5eb9514c89391e6c2361212363a7ac2f16e0b Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Tue, 4 Nov 2014 15:19:18 -0500
Subject: [PATCH 1/2] Error when creating names too long for tar format

The tar format (at least the version we are using) does not support
file names or symlink targets longer than 99 bytes.  Until now, the tar
creation code would silently truncate any names that are too long.  (Its
original application was pg_dump, where this never happens.)  This
creates problems when running base backups over the replication
protocol.

The most important problem is when a tablespace path is longer than 99
bytes, which will result in a truncated tablespace path being backed up.
Less importantly, the basebackup protocol also promises to back up any
other files it happens to find in the data directory, which would also
lead to file name truncation if someone put a file with a long name in
there.

Now both of these cases result in an error during the backup.
---
 src/backend/replication/basebackup.c | 21 -
 src/include/pgtar.h  | 10 +-
 src/port/tar.c   | 10 +-
 3 files changed, 38 insertions(+), 3 deletions(-)

diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index fbcecbb..5835725 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -1234,11 +1234,30 @@ _tarWriteHeader(const char *filename, const char *linktarget,
 struct stat * statbuf)
 {
 	char		h[512];
+	enum tarError rc;
 
-	tarCreateHeader(h, filename, linktarget, statbuf->st_size,
+	rc = tarCreateHeader(h, filename, linktarget, statbuf->st_size,
 	statbuf->st_mode, statbuf->st_uid, statbuf->st_gid,
 	statbuf->st_mtime);
 
+	switch (rc)
+	{
+		case TAR_OK:
+			break;
+		case TAR_NAME_TOO_LONG:
+			ereport(ERROR,
+	(errmsg("file name too long for tar format: \"%s\"",
+			filename)));
+			break;
+		case TAR_SYMLINK_TOO_LONG:
+			ereport(ERROR,
+	(errmsg("symbolic link target too long for tar format: file name \"%s\", target \"%s\"",
+			filename, linktarget)));
+			break;
+		default:
+			elog(ERROR, "unrecognized tar error: %d", rc);
+	}
+
 	pq_putmessage('d', h, 512);
 }
 
diff --git a/src/include/pgtar.h b/src/include/pgtar.h
index 2dd6f6b..6b6c1e1 100644
--- a/src/include/pgtar.h
+++ b/src/include/pgtar.h
@@ -11,5 +11,13 @@
  *
  *-
  */
-extern void tarCreateHeader(char *h, const char *filename, const char *linktarget, size_t size, mode_t mode, uid_t uid, gid_t gid, time_t mtime);
+
+enum tarError
+{
+	TAR_OK = 0,
+	TAR_NAME_TOO_LONG,
+	TAR_SYMLINK_TOO_LONG
+};
+
+extern enum tarError tarCreateHeader(char *h, const char *filename, const char *linktarget, size_t size, mode_t mode, uid_t uid, gid_t gid, time_t mtime);
 extern int	tarChecksum(char *header);
diff --git a/src/port/tar.c b/src/port/tar.c
index 09fd6c1..74168e8 100644
--- a/src/port/tar.c
+++ b/src/port/tar.c
@@ -49,10 +49,16 @@ tarChecksum(char *header)
  * must always have space for 512 characters, which is a requirement by
  * the tar format.
  */
-void
+enum tarError
 tarCreateHeader(char *h, const char *filename, const char *linktarget,
 size_t size, mode_t mode, uid_t uid, gid_t gid, time_t mtime)
 {
+	if (strlen(filename) > 99)
+		return TAR_NAME_TOO_LONG;
+
+	if (linktarget && strlen(linktarget) > 99)
+		return TAR_SYMLINK_TOO_LONG;
+
 	/*
 	 * Note: most of the fields in a tar header are not supposed to be
 	 * null-terminated.  We use sprintf, which will write a null after the
@@ -141,4 +147,6 @@ tarCreateHeader(char *h, const char *filename, const char *linktarget,
 	 * 6 digits, a space, and a null, which is legal per POSIX.
 	 */
 	sprintf(&h[148], "%06o ", tarChecksum(h));
+
+	return TAR_OK;
 }
-- 
2.1.3

From 25f6daded3a4be8

Re: [JDBC] [HACKERS] Pipelining executions to postgresql server

2014-11-04 Thread Mikko Tiihonen
Kevin Wooten  wrote:
> > On Nov 4, 2014, at 12:55 AM, Craig Ringer  wrote:
> >
> > I have to say I like some aspects of how pgjdbc-ng is being done - in
> > particular use of Maven and being built around non-blocking I/O.
> >
> > OTOH, the use of Google Guava I find pretty inexplicable in a JDBC
> > driver, and since it hard-depends on JDK 7 I don't understand why Netty
> > was used instead of the async / non-blocking features of the core JVM.
> > That may simply be my inexperience with writing non-blocking socket I/O
> > code on Java though.
> >

Java 6 has been EOL since 2011 and Java 7 is EOL in less than 6 months.
I think that depending on Java 7, a version that soon should not even be
used in production (without paying for extra support), can hardly be too
hard a requirement.

> It confuses me as to why you consider using stable, well implemented,
> well tested and well cared for libraries as inexplicable.  Just because
> we are building a "driver" means we have to write every line of code
> ourselves?  No thanks.

Embedding parts of other projects into the code base at build time, with
renamed packages, is nowadays common practice in Java projects: Spring
does it, Elasticsearch embeds the whole of Netty and more, and even the
JRE embeds, for example, Xerces and ASM.
It might not be the optimal solution, but it is still definitely better
than writing everything from scratch or copy-pasting code from other
projects.

If pgjdbc-ng provides both a thin Maven version (with external versioned
dependencies) and a fat jar with the external dependencies repackaged
inside, users can choose either the old way (just drop a JDBC jar into
the project) or the new way, with their chosen build tool automatically
managing the dependencies.

-Mikko



Re: [HACKERS] WAL replay bugs

2014-11-04 Thread Alvaro Herrera
Michael Paquier wrote:
> On Mon, Jul 14, 2014 at 6:14 PM, Kyotaro HORIGUCHI
>  wrote:
> > Although I doubt the necessity of the flexibility, seeing the current
> > testing framework, I don't have a strong objection to it. Nevertheless,
> > it would be appreciated if you put a notice in the README or somewhere.
> Hm, well... Fine, I added it in this updated series.

Did this go anywhere?

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [BUGS] [HACKERS] COPY TO returning empty result with parallel ALTER TABLE

2014-11-04 Thread Andres Freund
On 2014-11-04 13:51:23 -0500, Tom Lane wrote:
> Bernd Helmle  writes:
> > --On 3. November 2014 18:15:04 +0100 Sven Wegener 
> >  wrote:
> >> I've checked git master and 9.x and all show the same behaviour. I came up
> >> with the patch below, which is against current git master. The patch
> >> modifies the COPY TO code to create a new snapshot, after acquiring the
> >> necessary locks on the source tables, so that it sees any modification
> >> committed by other backends.
> 
> > Well, I have the feeling that there's nothing wrong with it. The ALTER 
> > TABLE command has rewritten all tuples with its own XID, thus the current 
> > snapshot does not "see" these tuples anymore. I suppose that in 
> > SERIALIZABLE or REPEATABLE READ transaction isolation your proposal still 
> > doesn't return the tuples you'd like to see.
> 
> Not sure.  The OP's point is that in a SELECT, you do get unsurprising
> results, because a SELECT will acquire its execution snapshot after it's
> gotten AccessShareLock on the table.  Arguably COPY should behave likewise.
> Or to be even more concrete, COPY (SELECT * FROM tab) TO ... probably
> already acts like he wants, so why isn't plain COPY equivalent to that?

Even a plain SELECT essentially acts that way, if I recall correctly, if
you use REPEATABLE READ or stricter and force a snapshot to be acquired
beforehand. It's IMO not very surprising.

All ALTER TABLE rewrites just disregard the visibility of existing
tuples. Only CLUSTER/VACUUM FULL go to the full lengths of keeping all the
necessary tuples + ctid chains around.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] superuser() shortcuts

2014-11-04 Thread Adam Brightwell
Thanks for looking at this patch.


> I suggest moving the rest of the changes into separate patches.
>

Hmmm... perhaps the following?

* superuser-cleanup - contains above mentioned superuser shortcuts only.
* has_privilege-cleanup - contains has_*_privilege cleanup only.

Would that also require a separate commitfest entry?

> The ha*_something_privilege() changes are also not very consistent.
> We already have have_createrole_privilege(), which does include a
> superuser check, and you add has_replication_privilege() with a
> superuser check, but has_catupdate_privilege() and
> has_inherit_privilege() don't include a superuser check.  That's clearly
> a mess.
>

Good catch.  Though, according to the documentation, not even superuser is
allowed to bypass CATUPDATE.

http://www.postgresql.org/docs/9.4/static/catalog-pg-authid.html.

However, I can't think of a reason why "inherit" wouldn't need the
superuser check.  Obviously superuser is considered a member of every role,
but is there a reason that a superuser would not be allowed to bypass
this?  I only ask because it did not have a check previously, so I figure
there might have been a good reason for it?

> Btw., why rename have_createrole_privilege()?

Well, actually it wasn't necessarily a rename.  It was a removal of that
function altogether, as all it did was simply return the result of
"has_createrole_privilege".  That seemed rather redundant and unnecessary,
IMO.

> Also, your patch has spaces between tabs.  Check for whitespace errors
> with git.
>

Yikes.

-Adam

-- 
Adam Brightwell - adam.brightw...@crunchydatasolutions.com
Database Engineer - www.crunchydatasolutions.com


Re: [HACKERS] Let's drop two obsolete features which are bear-traps for novices

2014-11-04 Thread Josh Berkus
On 11/04/2014 07:33 AM, Tom Lane wrote:
> More generally, it seems like a grab bag of not terribly well designed
> features, and the features that do seem well designed seem like they
> ought to be more generic than just for int4 arrays.  So to me it feels
> like proof-of-concept experimentation rather than a production-grade
> thing that we could feel good about moving into core.

Yah, as a user of intarray, I think it belongs as an external extension.
Heck, I think it might get more attention as an external extension, and
if it were external then folks could fork it and create versions that
don't have operator conflicts.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] COPY TO returning empty result with parallel ALTER TABLE

2014-11-04 Thread Sven Wegener
On Tue, 4 Nov 2014, Tom Lane wrote:

> Bernd Helmle  writes:
> > --On 3. November 2014 18:15:04 +0100 Sven Wegener 
> >  wrote:
> >> I've checked git master and 9.x and all show the same behaviour. I came up
> >> with the patch below, which is against current git master. The patch
> >> modifies the COPY TO code to create a new snapshot, after acquiring the
> >> necessary locks on the source tables, so that it sees any modification
> >> committed by other backends.
> 
> > Well, I have the feeling that there's nothing wrong with it. The ALTER 
> > TABLE command has rewritten all tuples with its own XID, thus the current 
> > snapshot does not "see" these tuples anymore. I suppose that in 
> > SERIALIZABLE or REPEATABLE READ transaction isolation your proposal still 
> > doesn't return the tuples you'd like to see.
> 
> Not sure.  The OP's point is that in a SELECT, you do get unsurprising
> results, because a SELECT will acquire its execution snapshot after it's
> gotten AccessShareLock on the table.  Arguably COPY should behave likewise.
> Or to be even more concrete, COPY (SELECT * FROM tab) TO ... probably
> already acts like he wants, so why isn't plain COPY equivalent to that?

No, both plain COPY and COPY (SELECT ...) TO are broken.

Sven




Re: [HACKERS] COPY TO returning empty result with parallel ALTER TABLE

2014-11-04 Thread Sven Wegener
On Tue, 4 Nov 2014, Bernd Helmle wrote:

> --On 3. November 2014 18:15:04 +0100 Sven Wegener 
> wrote:
> 
> > I've checked git master and 9.x and all show the same behaviour. I came up
> > with the patch below, which is against current git master. The patch
> > modifies the COPY TO code to create a new snapshot, after acquiring the
> > necessary locks on the source tables, so that it sees any modification
> > committed by other backends.
> 
> Well, I have the feeling that there's nothing wrong with it. The ALTER TABLE
> command has rewritten all tuples with its own XID, thus the current snapshot
> does not "see" these tuples anymore. I suppose that in SERIALIZABLE or
> REPEATABLE READ transaction isolation your proposal still doesn't return the
> tuples you'd like to see.

No, but with REPEATABLE READ and SERIALIZABLE the plain SELECT case is 
currently broken too. The issue I'm fixing is that COPY (SELECT ...) TO 
and SELECT behave differently.

So, how should ALTER TABLE behave transaction-wise? PostgreSQL 
claims transactional DDL support. As mentioned above, a parallel SELECT 
under SERIALIZABLE returns an empty result, but it also sees the schema 
change; at least, the change shows up in the result tuple description. The 
change doesn't show up in pg_catalog, so that bit is handled correctly. 
The schema-change transaction can be rolled back the way it is now, so 
it's kind of transactional, but other transactions seeing the schema 
change in query results is broken.

The empty result might be fixed by just keeping the old XID of rewritten 
tuples, as the exclusive lock ALTER TABLE holds should be enough to make 
sure nobody is actively accessing the rewritten table. But I'm pondering 
whether there is a solution for the visible schema change that doesn't 
involve keeping the old data files around or just raising a serialization 
error.

Coming back to my earlier suggestion of setting the XID of rewritten 
tuples to FrozenXID: that's broken, as in the REPEATABLE 
READ/SERIALIZABLE case there might be tuples that should not be seen by 
older transactions, and moving the tuples to FrozenXID would make them 
visible.

Sven




Re: [HACKERS] COPY TO returning empty result with parallel ALTER TABLE

2014-11-04 Thread Andrew Dunstan


On 11/04/2014 01:51 PM, Tom Lane wrote:

Bernd Helmle  writes:

--On 3. November 2014 18:15:04 +0100 Sven Wegener
 wrote:

I've checked git master and 9.x and all show the same behaviour. I came up
with the patch below, which is against current git master. The patch
modifies the COPY TO code to create a new snapshot, after acquiring the
necessary locks on the source tables, so that it sees any modification
committed by other backends.

Well, I have the feeling that there's nothing wrong with it. The ALTER
TABLE command has rewritten all tuples with its own XID, thus the current
snapshot does not "see" these tuples anymore. I suppose that in
SERIALIZABLE or REPEATABLE READ transaction isolation your proposal still
doesn't return the tuples you'd like to see.

Not sure.  The OP's point is that in a SELECT, you do get unsurprising
results, because a SELECT will acquire its execution snapshot after it's
gotten AccessShareLock on the table.  Arguably COPY should behave likewise.
Or to be even more concrete, COPY (SELECT * FROM tab) TO ... probably
already acts like he wants, so why isn't plain COPY equivalent to that?


Yes, that seems like an outright bug.

cheers

andrew




Re: [HACKERS] COPY TO returning empty result with parallel ALTER TABLE

2014-11-04 Thread Tom Lane
Bernd Helmle  writes:
> --On 3. November 2014 18:15:04 +0100 Sven Wegener 
>  wrote:
>> I've checked git master and 9.x and all show the same behaviour. I came up
>> with the patch below, which is against current git master. The patch
>> modifies the COPY TO code to create a new snapshot, after acquiring the
>> necessary locks on the source tables, so that it sees any modification
>> committed by other backends.

> Well, I have the feeling that there's nothing wrong with it. The ALTER 
> TABLE command has rewritten all tuples with its own XID, thus the current 
> snapshot does not "see" these tuples anymore. I suppose that in 
> SERIALIZABLE or REPEATABLE READ transaction isolation your proposal still 
> doesn't return the tuples you'd like to see.

Not sure.  The OP's point is that in a SELECT, you do get unsurprising
results, because a SELECT will acquire its execution snapshot after it's
gotten AccessShareLock on the table.  Arguably COPY should behave likewise.
Or to be even more concrete, COPY (SELECT * FROM tab) TO ... probably
already acts like he wants, so why isn't plain COPY equivalent to that?

What concerns me more is whether there are other utility statements that
have comparable issues.  We should not fix just COPY if there are more
cases.

regards, tom lane
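
To make the suggested ordering concrete, a minimal sketch (not the actual
COPY code; heap_openrv, PushActiveSnapshot, GetTransactionSnapshot and
friends are real core APIs, while the surrounding context, including the
assumption that stmt is the CopyStmt, is simplified for illustration):

    Relation    rel;

    /*
     * Acquire the lock first; a concurrent ALTER TABLE rewrite has then
     * either committed (and the fresh snapshot below sees its result) or
     * is blocked behind us.  This matches what a plain SELECT gets.
     */
    rel = heap_openrv(stmt->relation, AccessShareLock);

    /* Take the execution snapshot only now, under the lock. */
    PushActiveSnapshot(GetTransactionSnapshot());

    /* ... scan the relation and emit the COPY data under this snapshot ... */

    PopActiveSnapshot();
    heap_close(rel, AccessShareLock);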




Re: [HACKERS] get_cast_func syscache utility function

2014-11-04 Thread Tom Lane
Andrew Dunstan  writes:
>> here's a patch for a utility function to look up the cast function for 
>> a from/to pair of types, as recently suggested by Alvaro. Although it 
>> only contains one use (in json.c), the upcoming jsonb generators would 
>> also use it twice. I'd like to get this committed fairly quickly so I 
>> can prepare an updated patch for the jsonb generators.

I'm not exactly convinced that this is an appropriate utility function,
because casts for which a COERCION_METHOD_FUNCTION entry exists in pg_cast
are by no means the whole universe of casts.  I'm concerned that creating
this function will encourage patch authors to blow off other
possibilities when they should not.  In the particular context at hand,
it seems like you might be better advised to think about how to refactor
json_categorize_type to be more helpful to your new use-case.

A concrete example of what I'm worried about is that json_categorize_type
already ignores the possibility of binary-compatible (WITHOUT FUNCTION)
casts to json.  Since it's eliminated the domain case earlier, that's
perhaps not too horridly broken; but it'd be very easy for new uses of
this get_cast_func() function to overlook such considerations.

In short, I'd rather see this addressed through functions with slightly
higher-level APIs that are capable of covering more cases.  In most cases
it'd be best if callers were using find_coercion_pathway() rather than
taking shortcuts.

regards, tom lane
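
For concreteness, a sketch of the kind of higher-level lookup meant here
(find_coercion_pathway() and the CoercionPathType values are real core
APIs; the wrapper function itself is illustrative, not proposed code):

#include "postgres.h"
#include "catalog/pg_type.h"
#include "parser/parse_coerce.h"

/*
 * Illustrative wrapper: unlike a bare pg_cast syscache probe,
 * find_coercion_pathway() also reports binary-compatible
 * (WITHOUT FUNCTION) casts.
 */
static bool
lookup_cast_to_json(Oid typoid, Oid *castfunc)
{
    CoercionPathType pathtype;

    pathtype = find_coercion_pathway(JSONOID, typoid,
                                     COERCION_EXPLICIT, castfunc);

    if (pathtype == COERCION_PATH_FUNC)
        return OidIsValid(*castfunc);
    if (pathtype == COERCION_PATH_RELABELTYPE)
        return true;            /* binary-compatible, no function needed */

    return false;               /* no usable cast */
}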




Re: [HACKERS] COPY TO returning empty result with parallel ALTER TABLE

2014-11-04 Thread Bernd Helmle



--On 3. November 2014 18:15:04 +0100 Sven Wegener 
 wrote:



I've checked git master and 9.x and all show the same behaviour. I came up
with the patch below, which is against current git master. The patch
modifies the COPY TO code to create a new snapshot, after acquiring the
necessary locks on the source tables, so that it sees any modification
committed by other backends.


Well, I have the feeling that there's nothing wrong with it. The ALTER 
TABLE command has rewritten all tuples with its own XID, thus the current 
snapshot does not "see" these tuples anymore. I suppose that in 
SERIALIZABLE or REPEATABLE READ transaction isolation your proposal still 
doesn't return the tuples you'd like to see.


--
Thanks

Bernd




Re: [HACKERS] get_cast_func syscache utility function

2014-11-04 Thread Andrew Dunstan


On 11/04/2014 12:57 PM, Andrew Dunstan wrote:


here's a patch for a utility function to look up the cast function for 
a from/to pair of types, as recently suggested by Alvaro. Although it 
only contains one use (in json.c), the upcoming jsonb generators would 
also use it twice. I'd like to get this committed fairly quickly so I 
can prepare an updated patch for the jsonb generators.





This time with patch.

cheers

andrew
diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index d2bf640..fe9cf83 100644
--- a/src/backend/utils/adt/json.c
+++ b/src/backend/utils/adt/json.c
@@ -1299,22 +1299,12 @@ json_categorize_type(Oid typoid,
 /* but let's look for a cast to json, if it's not built-in */
 if (typoid >= FirstNormalObjectId)
 {
-	HeapTuple	tuple;
+	Oid castfunc = get_cast_func(typoid, JSONOID);
 
-	tuple = SearchSysCache2(CASTSOURCETARGET,
-			ObjectIdGetDatum(typoid),
-			ObjectIdGetDatum(JSONOID));
-	if (HeapTupleIsValid(tuple))
+	if (OidIsValid(castfunc))
 	{
-		Form_pg_cast castForm = (Form_pg_cast) GETSTRUCT(tuple);
-
-		if (castForm->castmethod == COERCION_METHOD_FUNCTION)
-		{
-			*tcategory = JSONTYPE_CAST;
-			*outfuncoid = castForm->castfunc;
-		}
-
-		ReleaseSysCache(tuple);
+		*tcategory = JSONTYPE_CAST;
+		*outfuncoid = castfunc;
 	}
 }
 			}
diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c
index 552e498..0da08fe 100644
--- a/src/backend/utils/cache/lsyscache.c
+++ b/src/backend/utils/cache/lsyscache.c
@@ -21,6 +21,7 @@
 #include "bootstrap/bootstrap.h"
 #include "catalog/pg_amop.h"
 #include "catalog/pg_amproc.h"
+#include "catalog/pg_cast.h"
 #include "catalog/pg_collation.h"
 #include "catalog/pg_constraint.h"
 #include "catalog/pg_namespace.h"
@@ -2889,3 +2890,36 @@ get_range_subtype(Oid rangeOid)
 	else
 		return InvalidOid;
 }
+
+/*
+ *
+ * get_cast_func(fromtype, totype) -> funcoid
+ *
+ * Find a cast function, if any, from fromtype to totype.
+ *
+ * Returns the Oid of the function found, or InvalidOid otherwise.
+ */
+
+Oid
+get_cast_func(Oid from_type, Oid to_type)
+{
+	HeapTuple	tuple;
+	Oid castfunc = InvalidOid;
+
+	tuple = SearchSysCache2(CASTSOURCETARGET,
+			ObjectIdGetDatum(from_type),
+			ObjectIdGetDatum(to_type));
+	if (HeapTupleIsValid(tuple))
+	{
+		Form_pg_cast castForm = (Form_pg_cast) GETSTRUCT(tuple);
+
+		if (castForm->castmethod == COERCION_METHOD_FUNCTION)
+		{
+			castfunc = castForm->castfunc;
+		}
+
+		ReleaseSysCache(tuple);
+	}
+
+	return castfunc;
+}
diff --git a/src/include/utils/lsyscache.h b/src/include/utils/lsyscache.h
index 07d24d4..0291b96 100644
--- a/src/include/utils/lsyscache.h
+++ b/src/include/utils/lsyscache.h
@@ -151,6 +151,7 @@ extern void free_attstatsslot(Oid atttype,
   float4 *numbers, int nnumbers);
 extern char *get_namespace_name(Oid nspid);
 extern Oid	get_range_subtype(Oid rangeOid);
+extern Oid  get_cast_func(Oid from_type, Oid to_type);
 
 #define type_is_array(typid)  (get_element_type(typid) != InvalidOid)
 /* type_is_array_domain accepts both plain arrays and domains over arrays */



Re: [HACKERS] get_cast_func syscache utility function

2014-11-04 Thread Pavel Stehule
missing patch

Regards

Pavel

2014-11-04 18:57 GMT+01:00 Andrew Dunstan :

>
> here's a patch for a utility function to look up the cast function for a
> from/to pair of types, as recently suggested by Alvaro. Although it only
> contains one use (in json.c), the upcoming jsonb generators would also use
> it twice. I'd like to get this committed fairly quickly so I can prepare an
> updated patch for the jsonb generators.
>
> cheers
>
> andrew


[HACKERS] get_cast_func syscache utility function

2014-11-04 Thread Andrew Dunstan


here's a patch for a utility function to look up the cast function for a 
from/to pair of types, as recently suggested by Alvaro. Although it only 
contains one use (in json.c), the upcoming jsonb generators would also 
use it twice. I'd like to get this committed fairly quickly so I can 
prepare an updated patch for the jsonb generators.


cheers

andrew




Re: [HACKERS] superuser() shortcuts

2014-11-04 Thread Peter Eisentraut
On 10/27/14 11:40 AM, Stephen Frost wrote:
> * Alvaro Herrera (alvhe...@2ndquadrant.com) wrote:
>>> As I started looking at this, there are multiple other places where
>>> these types of error messages occur (opclasscmds.c, user.c,
>>> postinit.c, miscinit.c are just a few), not just around the changes in
>>> this patch.  If we change them in one place, wouldn't it be best to
>>> change them in the rest?  If that is the case, I'm afraid that might
>>> distract from the purpose of this patch.  Perhaps, if we want to
>>> change them, then that should be submitted as a separate patch?
>>
>> Yeah.  I'm just saying that maybe this patch should adopt whatever
>> wording we agree to, not that we need to change other places.  On the
>> other hand, since so many other places have adopted the different
>> wording, maybe there's a reason for it and if so, does anybody know what
>> it is.  But I have to say that it does look inconsistent to me.
> 
> Updated patch attached.  Comments welcome.

The changes in

src/backend/commands/alter.c
src/backend/commands/foreigncmds.c
src/backend/commands/tablecmds.c
src/backend/commands/typecmds.c

are OK, because the superuser() calls are redundant, and cleaning this
up is clearly in line with planned future work.

I suggest moving the rest of the changes into separate patches.


The error message wording changes might well be worth considering, but
there are tons more "must be $someone to do $something" error messages,
and changing two of them isn't making anything more or less consistent.


The ha*_something_privilege() changes are also not very consistent.

We already have have_createrole_privilege(), which does include a
superuser check, and you add has_replication_privilege() with a
superuser check, but has_catupdate_privilege() and
has_inherit_privilege() don't include a superuser check.  That's clearly
a mess.
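
For illustration, the consistent shape would presumably be (superuser_arg()
and has_rolreplication() are existing functions; this wrapper is a sketch,
not the patch's actual code):

/*
 * Illustrative pattern: every has_*_privilege() helper checks
 * superuser first, so callers never need a separate superuser() test.
 */
static bool
has_replication_privilege(Oid roleid)
{
    /* superusers bypass all permission checks */
    if (superuser_arg(roleid))
        return true;

    return has_rolreplication(roleid);
}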

Btw., why rename have_createrole_privilege()?


Also, your patch has spaces between tabs.  Check for whitespace errors
with git.





Re: [HACKERS] alter user set local_preload_libraries.

2014-11-04 Thread Tom Lane
Peter Eisentraut  writes:
> On 10/9/14 1:58 PM, Fujii Masao wrote:
>> Also I think that it's useful to allow ALTER ROLE/DATABASE SET to
>> set PGC_BACKEND and PGC_SU_BACKEND parameters. So, what
>> about applying the attached patch? This patch allows that and
>> changes the context of session_preload_libraries to PGC_SU_BACKEND.

> After looking through this again, I wonder whether there is any reason
> why ignore_system_indexes cannot be plain PGC_USERSET.  With this
> change, we'd allow setting it via ALTER ROLE, but the access to
> pg_db_role_setting happens before it.  So if there is anything unsafe
> about changing ignore_system_indexes, then this would be a problem, but
> I don't see anything.

There are some, um, "interesting" consequences of setting
ignore_system_indexes; AFAIK, none that put data integrity at risk, but it
can destroy performance in ways beyond the obvious ones.  See for example
the comments for get_mergejoin_opfamilies and get_ordering_op_properties.
I don't particularly want to answer user bug reports about such behaviors,
nor do I care to put any effort into making the behavior without system
indexes smarter than it is now.  (We should also consider the risk that
there might be as-yet-unrecognized dependencies on catalog scan order that
would amount to actual bugs.)

So I'm -1 on any change that might make it look like we were encouraging
people to use ignore_system_indexes except as a very last resort.

regards, tom lane




Re: [HACKERS] [BUGS] BUG #11867: Strange behaviour with composite types after resetting database tablespace

2014-11-04 Thread Tom Lane
I wrote:
> However: at no point in this sequence did we flush the original catalog
> blocks out of shared buffers.  Therefore, a read of the database's
> pg_class or pg_type at this point is likely to find the previous states of
> those pages (pre-composite-type-drop) as valid, matching entries, so that
> we skip reading the up-to-date page contents back from disk.

> A quick fix for this would be to issue DropDatabaseBuffers during
> movedb()

BTW, I wondered whether the exact same hazard didn't exist for ALTER TABLE
SET TABLESPACE.  At first glance it looks bad, because ATExecSetTableSpace
doesn't forcibly drop old shared buffers for the moved table either.
Closer examination says we're safe though:

* The buffers will be dropped during smgrdounlink at end of transaction,
so you could only be at risk if you moved a table, modified it, and moved
it back in a single transaction.

* A new relfilenode will be assigned during each movement.

* When assigning a relfilenode during the move-back, we'd be certain to
choose a relfilenode different from the original one, because the old
relation still exists on-disk at this point.

So it's safe as things stand; but this seems a bit, um, rickety.  Should
we insert a DropRelFileNodesAllBuffers call into ATExecSetTableSpace to
make it safer?  It's kind of annoying to have to scan the buffer pool
twice, but I'm afraid that sometime in the future somebody will try to get
clever about optimizing away the end-of-transaction buffer pool scan, and
break this case.  Or maybe I'm just worrying too much.

regards, tom lane




Re: [JDBC] [HACKERS] Pipelining executions to postgresql server

2014-11-04 Thread Kevin Wooten

> On Nov 4, 2014, at 12:55 AM, Craig Ringer  wrote:
> 
> On 11/04/2014 07:56 AM, Mikko Tiihonen wrote:
>> I also think the async I/O is the way to go. Luckily that has already been 
>> done
>> in the pgjdbc-ng  (https://github.com/impossibl/pgjdbc-ng), built on top
>> of netty java NIO library. It has quite good feature parity with the original
>> pgjdbc driver. 
> 
> Huh, it does seem to now. The big omission, unless I'm blind, is support
> for COPY. (It also lacks support for JDBC3 and JDBC4, requiring JDK 7
> and JDBC 4.1, but that's reasonable enough.)
> 

I focused on 4.1 compliance instead of legacy support. I am proud to say I 
believe pgjdbc-ng is 100% feature compliant for JDBC 4.1.  Legacy support is 
still not, and likely will never be, planned.

It does lack COPY support as I have never used it myself from the driver, only 
from tools outside the JVM.

> It now has LISTEN/NOTIFY by the looks, and of course it's built around
> binary protocol support.
> 
> I freely admit I haven't looked at it too closely. I'm a tad frustrated
> by the parallel development of a different driver when the "official"
> driver has so much in the way of low hanging fruit to improve.
> 

I’ll only nibble this bait…. I outlined my reasons when I started the project.  
They were all valid then and still are now.  My apologies for causing any 
frustration with this new path. It’s a lot cleaner, simpler and provides more 
JDBC features than the original driver because of it.  Although I must say, 
without the original driver and its exhausting battery of unit tests, building 
a new driver would seem impossible.

> I have to say I like some aspects of how pgjdbc-ng is being done - in
> particular use of Maven and being built around non-blocking I/O.
> 
> OTOH, the use of Google Guava I find pretty inexplicable in a JDBC
> driver, and since it hard-depends on JDK 7 I don't understand why Netty
> was used instead of the async / non-blocking features of the core JVM.
> That may simply be my inexperience with writing non-blocking socket I/O
> code on Java though.
> 

It confuses me as to why you consider using stable, well implemented, well 
tested and well cared for libraries as inexplicable.  Just because we are 
building a “driver” means we have to write every line of code ourselves?  No 
thanks.  You can imagine our differences on this philosophy are one of the 
reasons why I consider pgjdbc-ng’s parallel development to be a godsend rather 
than hacking on the original code.

Concerning Guava…  A great library with an amazing number of classes that make 
everyday Java easier.  The requirement for JDK 7 was chosen before Guava was 
considered not because of it.  Using it seemed obvious after that decision.  
Also, we have internalized the classes we use out of Guava to remove it as a 
dependency. This is more work to maintain on our part but makes it worth it 
when deploying a single JAR.

Concerning Netty…  All options were entertained at the beginning.  The original 
version actually used a basic NIO socket.  After I realized I had to basically 
write my own framework to work with this socket correctly, I searched for an 
alternative and found Netty.  The deciding factor was that implementing SSL 
on top of the NIO API was considered next to impossible to get right, 
according to all prevailing wisdom at the time, whereas with Netty, SSL 
support is basically a single-line change.

> I'm also concerned about how it'll handle new JDBC versions, as it seems
> to lack the JDBC interface abstraction of the core driver.
> 

My plan is to handle adding support for 4.2 and beyond by using a Maven based 
conditional preprocessor.  If that fails, or seems just too ugly, I’ll probably 
have to look at an abstract class based method like that of the original driver.

>> I do not think the JDBC batch interface even allow doing updates to multiple
>> tables when using prepared statements?
> 
> Ah, I missed this before.
> 
> That's correct. You get prepared statements _or_ multiple different
> statements.
> 
> That's a more understandable reason to concoct a new API, and explains
> why you're not just solving the issues with batches. Though I still
> think you're going to have to fix the buffer management code before you
> do anything like this.
> 
> -- 
> Craig Ringer   http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
> 
> 





Re: [HACKERS] alter user set local_preload_libraries.

2014-11-04 Thread Peter Eisentraut
On 10/9/14 1:58 PM, Fujii Masao wrote:
> Also I think that it's useful to allow ALTER ROLE/DATABASE SET to
> set PGC_BACKEND and PGC_SU_BACKEND parameters. So, what
> about applying the attached patch? This patch allows that and
> changes the context of session_preload_libraries to PGC_SU_BACKEND.

After looking through this again, I wonder whether there is any reason
why ignore_system_indexes cannot be plain PGC_USERSET.  With this
change, we'd allow setting it via ALTER ROLE, but the access to
pg_db_role_setting happens before it.  So if there is anything unsafe
about changing ignore_system_indexes, then this would be a problem, but
I don't see anything.





[HACKERS] Re: [BUGS] BUG #11867: Strange behaviour with composite types after resetting database tablespace

2014-11-04 Thread Marc Munro
On Mon, 2014-11-03 at 22:08 -0500, Tom Lane wrote:
> Fascinating.

:-)

> I believe what is happening is:

[ . . . ]

> > This is not mission-critical for me but I'd be grateful for suggestions for
> > work-arounds.
> 
> I don't see any workaround that's much easier than fixing the bug.
> But what's your use case that involves flipping databases from one
> tablespace to another and then back again within the life expectancy of
> unused shared-buffers pages?  It doesn't seem terribly surprising that
> nobody noticed this before ...

It's a completely unreal use case.  I am working on a diff tool, and
this was found as part of my test suite: build db x, drop db x,  build
db y, apply diff y->x, compare with original x, apply diff x->y, compare
with original y.

So this is a fairly minor inconvenience for me.

__
Marc




Re: [HACKERS] Let's drop two obsolete features which are bear-traps for novices

2014-11-04 Thread Tom Lane
Craig Ringer  writes:
> On 11/03/2014 03:41 AM, Tom Lane wrote:
>> Nothing that I recall at the moment, but there is certainly plenty of
>> stuff of dubious quality in there.  I'd argue that chkpass, intagg,
>> intarray, isn, spi, and xml2 are all in worse shape than the money type.

> What's wrong with intarray?

The single biggest practical problem with it is that it creates new
definitions of the <@ and @> operators, which cause confusion with the
core operators of those names.  For example, we've repeatedly seen bug
reports along the lines of "why does this query not use this index"
because of people trying to use the contrib @> operator with a core GIN
index or vice versa.  (In fairness, I think intarray might be older
than the core operators, but that doesn't make this less of a problem.)

Another thing that grates on me is the mostly-arbitrary restriction
to non-null-containing arrays.  That's an implementation artifact rather
than something semantically required by the operations.

More generally, it seems like a grab bag of not terribly well designed
features, and the features that do seem well designed seem like they
ought to be more generic than just for int4 arrays.  So to me it feels
like proof-of-concept experimentation rather than a production-grade
thing that we could feel good about moving into core.

regards, tom lane




Re: [HACKERS] GIN pageinspect functions

2014-11-04 Thread Amit Kapila
On Tue, Oct 7, 2014 at 10:33 PM, Heikki Linnakangas 
wrote:
>
> Some time ago, when debugging a GIN bug, I wrote these pageinspect
functions to inspect GIN indexes. They were very useful; we should add them.
>

I think these functions will be quite useful for debugging purposes,
and we already have similar functions for another index type (btree).

Few suggestions for patch:

1. Documentation seems to be missing, other API's exposed
via pageinspect are documented at:
http://www.postgresql.org/docs/devel/static/pageinspect.html

2.
+CREATE FUNCTION gin_metapage(IN page bytea,
+    OUT pending_head bigint,
+    OUT pending_tail bigint,
+    OUT tail_free_size int4,
+    OUT n_pending_pages bigint,
+    OUT n_pending_tuples bigint,
+    OUT n_total_pages bigint,
+    OUT n_entry_pages bigint,
+    OUT n_data_pages bigint,
+    OUT n_entries bigint,
+    OUT version int4)
+AS 'MODULE_PATHNAME', 'gin_metapage'
+LANGUAGE C STRICT;

a. Isn't it better to name the function gin_metap(..), similar to the
existing function bt_metap(..)?
b. Can this function have a similar signature to bt_metap(), i.e. take
the relname as input?

3. Can the gin_dataleafpage() API have a similar name and signature to
the bt_page_items() API exposed for btree?

4. Can we have a better name for gin_pageopaq (the other API names
in this module are self-explanatory)?


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] [REVIEW] Re: Fix xpath() to return namespace definitions

2014-11-04 Thread Peter Eisentraut
On 10/6/14 10:24 PM, Ali Akbar wrote:
> While reviewing the patch myself, I spotted some formatting problems in
> the code. Fixed in this v5 patch.
> 
> Also, this patch uses context patch format (in the first versions, because
> of the small modification, context patch format obfuscated the changes;
> after the reimplementation this isn't the case anymore).

I think the problem this patch is addressing is real, and your approach
is sound, but I'd ask you to go back to the xmlCopyNode() version, and
try to add a test case for why the second argument = 1 is necessary.  I
don't see any other problems.






Re: [HACKERS] [REVIEW] Re: Fix xpath() to return namespace definitions

2014-11-04 Thread Peter Eisentraut
On 7/8/14 6:03 AM, Ali Akbar wrote:
> If we put 1 as the parameter, the resulting Node will link to the
> existing children. In this case, the namespace problem for the children
> might still exist. If any of the children uses different
> namespace(s) than the parent, the namespace definition will not be
> copied correctly.

This analysis might very well be right, but I think we should prove it
with a test case.





Re: [HACKERS] [REVIEW] Re: Fix xpath() to return namespace definitions

2014-11-04 Thread Peter Eisentraut
On 7/11/14 4:36 AM, Ali Akbar wrote:
> But I found a bug in libxml2's code: because we call xmlCopyNode with
> 1 as the second parameter, internally xmlCopyNode will call
> xmlStaticCopyNode recursively via xmlStaticCopyNodeList. And
> xmlStaticCopyNodeList doesn't check whether the return value of
> xmlStaticCopyNode is NULL. So if the recursion ever returns NULL (maybe
> because of memory exhaustion), it will SEGFAULT.
> 
> Knowing this bug in libxml2's code, is this patch still acceptable?

This problem was fixed in libxml2 2.9.2 (see
https://bugzilla.gnome.org/show_bug.cgi?id=708681).
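
For callers that may still run against older libxml2, a defensive sketch
(xmlCopyNode() is the real libxml2 API; the wrapper and its error reporting
are illustrative):

#include "postgres.h"

#include <libxml/tree.h>

/*
 * Illustrative defensive copy for libxml2 < 2.9.2: xmlCopyNode() can
 * return NULL (e.g. on memory exhaustion), so check the result rather
 * than assuming the recursive copy succeeded.
 */
static xmlNodePtr
copy_node_checked(xmlNodePtr node)
{
    /* second argument 1 = recursively copy children and properties */
    xmlNodePtr  copy = xmlCopyNode(node, 1);

    if (copy == NULL)
        ereport(ERROR,
                (errcode(ERRCODE_OUT_OF_MEMORY),
                 errmsg("could not copy XML node")));

    return copy;
}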




[HACKERS] Re[2]: [HACKERS] Patch: add recovery_timeout option to control timeout of restore_command nonzero status code

2014-11-04 Thread Alexey Vasiliev



Tue, 4 Nov 2014 14:41:56 +0100 from Andres Freund :
> Hi,
> 
> On 2014-11-03 14:04:00 +0300, Alexey Vasiliev wrote:
> > * What the patch does in a short paragraph: This patch adds the option
> > recovery_timeout, which helps control the retry timeout after
> > restore_command returns a nonzero status code. Right now the default value
> > is 5 seconds. This is useful if I am using some external storage (like AWS
> > S3) to restore WAL logs, and no matter what, the slave database will lag
> > behind the master. The problem is that each request to AWS S3 costs money,
> > which is why N nodes each trying to get the next WAL log every 5 seconds
> > will cost more than, for example, every 30 seconds. Previously I did this
> > as: " if ! (/usr/local/bin/envdir /etc/wal-e.d/env /usr/local/bin/wal-e
> > wal-fetch "%f" "%p"); then sleep 60; fi ". But in that case database
> > restart/stop is slower.
> 
> Without saying that the feature is unnecessary, wouldn't this better be
> solved by using streaming rep most of the time?

But we don't need streaming rep. The master database doesn't need to know
about the slaves (and there's no need to add even this little overhead to the
master). The slaves read WAL logs from S3, and the same S3 WAL logs are used
as backups.

-- 
Alexey Vasiliev


Re: [HACKERS] Let's drop two obsolete features which are bear-traps for novices

2014-11-04 Thread Andrew Dunstan


On 11/03/2014 10:11 PM, Tom Lane wrote:

Josh Berkus  writes:

On 11/02/2014 11:41 AM, Tom Lane wrote:

Nothing that I recall at the moment, but there is certainly plenty of
stuff of dubious quality in there.  I'd argue that chkpass, intagg,
intarray, isn, spi, and xml2 are all in worse shape than the money type.

Why are we holding on to xml2 again?

IIRC, there's some xpath-related functionality in there that's not
yet available in core (and needs some redesign before it'd ever get
accepted into core, so there's not a real quick fix to be had).




Yes, xpath_table is badly broken, as Robert Haas documented and I 
expanded on a while back; see the earlier discussion in the archives. I 
don't have any time of my own to work on this any time soon. But I know 
that it has users, so just throwing it out would upset some people.


cheers

andrew





Re: [HACKERS] Let's drop two obsolete features which are bear-traps for novices

2014-11-04 Thread Merlin Moncure
On Tue, Nov 4, 2014 at 8:16 AM, Kevin Grittner  wrote:
> In any event, I'm against removing or re-deprecating the money
> type.  There are some problems with money; there are other problems
> with numeric.  If the docs are failing to make the trade-offs
> clear, we should fix the docs.  And we can always look at improving
> either or both types.  The fact that numeric is 5x to 20x slower
> than money in important applications, combined with the fact that
> it performs far worse than a similar type in another product,
> suggest that numeric could stand a serious optimization pass.

Money should stay.  It is the only fixed-point, integer-based numeric
type that comes with the database.  Should a non-locale-dependent
fixed-point type that offers the same performance come along, then you'd
have a good argument to deprecate it.

numeric is definitely a dog, which is the fundamental problem IMNSHO.  I'm
guessing optimization paths are going to revolve around utilizing
binary integer ops in cases where it's known to be safe to do so.

merlin




Re: [HACKERS] Let's drop two obsolete features which are bear-traps for novices

2014-11-04 Thread Andreas Karlsson

On 11/02/2014 06:41 PM, Tom Lane wrote:

I wrote:

Either way, requiring a dump/reload for upgrade is surely a better answer
for users of the type than just summarily screwing them.


BTW, after reflecting a bit more I'm less than convinced that this
datatype is completely useless.  Even if you prefer to store currency
values in numeric columns, casting to or from money provides a way to
accept or emit values in whatever monetary format the LC_MONETARY locale
setting specifies.  That seems like a useful feature, and it's one you
could not easily duplicate using to_char/to_number (not to mention that
those functions aren't without major shortcomings of their own).


I personally find that one of the money type's misfeatures, since the 
currency symbol does not vary just per language/locale but also with 
which currency you are working. For databases with multiple 
currencies, the output format of money is just confusing.


Andreas





Re: [HACKERS] Let's drop two obsolete features which are bear-traps for novices

2014-11-04 Thread Kevin Grittner
Feng Tian  wrote:

> The performance difference between Money and Numeric is *HUGE*.  For
> TPCH Q1, the performance difference is 5x for stock postgres, and
> ~20x for vitesse.
>
> Stock postgres, for my laptop, TPCH 1G, Q1, use money type ~ 9s,
> use Numeric (15, 2) is ~53s.
>
>> test=# do $$ begin perform sum('1.01'::numeric) from 
>> generate_series(1,1000); end; $$;
>
> This may not reflect the difference between the two data types.  One
> aggregate is not where most of the time is spent.  TPCH Q1 does
> much more computation.

That's why I gave the count(*) baseline.  If you subtract that from
the sum() timings for money and numeric, numeric takes 5x as long
as money in my quick-and-dirty benchmark. I have no problem
believing that some applications can see that level of performance
hit from using numeric instead of money.  It's a little surprising
to me to see more than a factor of five slowdown from using numeric
instead of money; it leaves me a little curious how that happens.
Perhaps that workload is more conducive to keeping those money
amounts in registers and doing register arithmetic.

In any event, I'm against removing or re-deprecating the money
type.  There are some problems with money; there are other problems
with numeric.  If the docs are failing to make the trade-offs
clear, we should fix the docs.  And we can always look at improving
either or both types.  The fact that numeric is 5x to 20x slower
than money in important applications, combined with the fact that
it performs far worse than a similar type in another product,
suggest that numeric could stand a serious optimization pass.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Let's drop two obsolete features which are bear-traps for novices

2014-11-04 Thread k...@rice.edu
On Tue, Nov 04, 2014 at 11:44:22AM +0900, Michael Paquier wrote:
> On Sun, Nov 2, 2014 at 2:30 AM, Tom Lane  wrote:
> 
> > In the case of hash indexes, because we still have to have the hash
> > opclasses in core, there's no way that it could be pushed out as an
> > extension module even if we otherwise had full support for AMs as
> > extensions.  So what I hear you proposing is "let's break this so
> > thoroughly that it *can't* be fixed".  I'm not on board with that.
> > I think the WARNING will do just fine to discourage novices who are
> > not familiar with the state of the hash AM.  In the meantime, we
> > could push forward with the idea of making hash indexes automatically
> > unlogged, so that recovering from a crash wouldn't be quite so messy/
> > dangerous.
> >
> There is as well another way: finally support WAL-logging for hash indexes.

+1

Ken




Re: [HACKERS] Patch: add recovery_timeout option to control timeout of restore_command nonzero status code

2014-11-04 Thread Andres Freund
Hi,

On 2014-11-03 14:04:00 +0300, Alexey Vasiliev wrote:
> * What the patch does in a short paragraph: This patch adds the option
> recovery_timeout, which helps control the retry timeout after
> restore_command returns a nonzero status code. Right now the default value
> is 5 seconds. This is useful if I am using some external storage (like AWS
> S3) to restore WAL logs, and no matter what, the slave database will lag
> behind the master. The problem is that each request to AWS S3 costs money,
> which is why N nodes each trying to get the next WAL log every 5 seconds
> will cost more than, for example, every 30 seconds. Previously I did this
> as: " if ! (/usr/local/bin/envdir /etc/wal-e.d/env /usr/local/bin/wal-e
> wal-fetch "%f" "%p"); then sleep 60; fi ". But in that case database
> restart/stop is slower.

Without saying that the feature is unnecessary, wouldn't this better be
solved by using streaming rep most of the time?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
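
For reference, the fixed pause being discussed lives in the standby retry
loop in xlog.c (WaitForWALToBecomeAvailable), which waits a fixed five
seconds between restore_command attempts.  A sketch of the shape a
configurable interval could take; restore_command_retry_interval is a
hypothetical GUC name, not the patch's actual spelling:

/*
 * Hypothetical GUC (in seconds) replacing the hard-coded 5-second
 * pause between failed restore_command attempts.
 */
int         restore_command_retry_interval = 5;

static void
wait_before_retrying_restore_command(void)
{
    pg_usleep((long) restore_command_retry_interval * 1000000L);
}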




Re: [HACKERS] What exactly is our CRC algorithm?

2014-11-04 Thread Andres Freund
On 2014-11-04 08:21:13 -0500, Robert Haas wrote:
> On Tue, Nov 4, 2014 at 4:47 AM, Heikki Linnakangas
>  wrote:
> > I hear none, so committed with some small fixes.
> 
> Are you going to get the slice-by-N stuff working next, to speed this up?

I don't plan to do anything serious with it, but I've hacked up the crc
code to use the hardware instruction. The results are quite good - crc
vanishes entirely from the profile for most workloads. It's still
visible for bulk copy, but that's pretty much it.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
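
Presumably this refers to the SSE4.2 CRC32 instruction. A self-contained
sketch (illustrative, not the actual hack): note the hardware opcode
computes CRC-32C (the Castagnoli polynomial), which differs from the
traditional CRC-32 PostgreSQL has used, so adopting it implies changing the
CRC definition.  Compile with -msse4.2; runtime CPU detection is omitted:

#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <nmmintrin.h>          /* SSE4.2 intrinsics */

/*
 * Hardware CRC-32C: eight bytes per instruction where possible,
 * byte-at-a-time for the tail.
 */
static uint32_t
crc32c_hw(uint32_t crc, const void *data, size_t len)
{
    const unsigned char *p = data;

    while (len >= 8)
    {
        uint64_t    chunk;

        memcpy(&chunk, p, 8);   /* avoid unaligned access */
        crc = (uint32_t) _mm_crc32_u64(crc, chunk);
        p += 8;
        len -= 8;
    }
    while (len > 0)
    {
        crc = _mm_crc32_u8(crc, *p++);
        len--;
    }

    return crc;
}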




Re: [HACKERS] tracking commit timestamps

2014-11-04 Thread Andres Freund
On 2014-11-04 10:01:00 -0300, Alvaro Herrera wrote:
> Michael Paquier wrote:
> 
> > I'm still on a -1 for that. You mentioned that there is perhaps no reason
> > to delay a decision on this matter, but IMO there is no reason to rush
> > either in doing something we may regret. And I am not the only one on this
> > thread expressing concern about this extra data thingy.
> > 
> > If this extra data field is going to be used to identify from which node a
> > commit comes from, then it is another feature than what is written on the
> > subject of this thread. In this case let's discuss it in the thread
> > dedicated to replication identifiers, or come up with an extra patch once
> > the feature for commit timestamps is done.
> 
> Introducing the extra data field in a later patch would mean an on-disk
> representation change, i.e. pg_upgrade trouble.

It's also simply not necessarily related to replication
identifiers. This is useful whether replication identifiers make it in
or not. It allows implementing something like replication identifiers
outside of core (albeit with a hefty overhead in OLTP workloads).

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] What exactly is our CRC algorithm?

2014-11-04 Thread Robert Haas
On Tue, Nov 4, 2014 at 4:47 AM, Heikki Linnakangas
 wrote:
> I hear none, so committed with some small fixes.

Are you going to get the slice-by-N stuff working next, to speed this up?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] TAP test breakage on MacOS X

2014-11-04 Thread Robert Haas
On Mon, Nov 3, 2014 at 4:44 PM, Peter Eisentraut  wrote:
> On 10/29/14 8:42 AM, Robert Haas wrote:
>> I'm sympathetic to that line of reasoning, but I really think that if
>> you want to keep this infrastructure, it needs to be made portable.
>
> Let me clarify that this was my intention.  I have looked at many test
> frameworks, many of which are much nicer than what we have, but the
> portability and dependency implications for this project would have been
> between shocking and outrageous.  I settled for what I felt was the
> absolute minimum: Perl + IPC::Run.  It was only later on that I learned
> that 1) subtests don't work in Perl 5.10, and 2) subtests are broken in
> Perl 5.12.  So we removed the use of subtests and now we are back at the
> baseline I started with.

Thanks.  At last check, which I think was approximately last week, the
tests were running and passing on my machine.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] tracking commit timestamps

2014-11-04 Thread Alvaro Herrera
Michael Paquier wrote:

> I'm still on a -1 for that. You mentioned that there is perhaps no reason
> to delay a decision on this matter, but IMO there is no reason to rush
> either in doing something we may regret. And I am not the only one on this
> thread expressing concern about this extra data thingy.
> 
> If this extra data field is going to be used to identify from which node a
> commit comes from, then it is another feature than what is written on the
> subject of this thread. In this case let's discuss it in the thread
> dedicated to replication identifiers, or come up with an extra patch once
> the feature for commit timestamps is done.

Introducing the extra data field in a later patch would mean an on-disk
representation change, i.e. pg_upgrade trouble.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Sequence Access Method WIP

2014-11-04 Thread Heikki Linnakangas

On 10/13/2014 01:01 PM, Petr Jelinek wrote:

Hi,

I rewrote the patch with a different API, along the lines of what was
discussed.


Thanks, that's better.

It would be good to see an alternative seqam implementing this API, to 
see how it really works. The "local" one is too trivial to expose any 
possible issues.



The API now consists of the following functions:
sequence_alloc - allocates a range of new values.
The function receives the sequence relation, the current value, the number
of requested values, the amdata, and relevant sequence options like min/max,
and returns the new amdata, the new current value, the number of values
allocated, and also whether it needs a WAL write (which should be returned
if amdata has changed, plus for any other reasons the AM might have to
force the WAL update).

sequence_setval - notification that setval is happening.
This function gets the sequence relation, the previous value and the new
value, plus the amdata, and returns amdata (I can imagine some complex
sequence AMs will want to throw an error that setval can't be done on them).

sequence_request_update/sequence_update - used for background processing.
Basically the AM can call sequence_request_update, and the backend will then
call the AM's sequence_update method with the current amdata and will
write the updated amdata to disk.

sequence_seqparams - function for the AM to process/validate the standard
sequence options like start position, min/max, increment by, etc.; it's
called in addition to the standard processing.

sequence_reloptions - this is the only thing that remained unchanged
from the previous patch; it's meant to pass custom options to the AM.

Only the alloc and reloptions methods are required (and implemented by
the local AM).

The caching, xlog writing, page updating, etc. are handled by the backend;
the AM does not see the tuple at all. I decided not to pass even the
struct around and to just pass the relevant options, because I think if we
want to abstract the storage properly then the AM should not care about
what pg_sequence looks like at all, even if it means that the
sequence_alloc parameter list is a bit long.
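
To make the shape of the API concrete, a hypothetical rendering assembled
from the description above (field names and exact signatures are guesses,
not the patch's actual header):

/*
 * Hypothetical seqam callback table, reconstructed from the prose
 * above; the real patch may declare this differently.  amdata is the
 * AM's opaque private state, with (Datum) 0 meaning NULL.
 */
typedef struct SeqAmRoutine
{
    /* required: allocate a range of new values */
    Datum       (*sequence_alloc) (Relation seqrel, Datum amdata,
                                   int64 current, int64 nrequested,
                                   int64 minv, int64 maxv, int64 incby,
                                   int64 *newcurrent, int64 *nallocated,
                                   bool *need_wal);

    /* optional: observe (or reject) setval() */
    Datum       (*sequence_setval) (Relation seqrel, Datum amdata,
                                    int64 oldval, int64 newval);

    /* optional: deferred update requested via sequence_request_update() */
    Datum       (*sequence_update) (Relation seqrel, Datum amdata);

    /* optional: validate the standard options (start, min/max, increment) */
    void        (*sequence_seqparams) (List *seqoptions);

    /* required: parse AM-specific reloptions */
    bytea      *(*sequence_reloptions) (Datum reloptions, bool validate);
} SeqAmRoutine;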


Hmm. The division of labour between the seqam and commands/sequence.c 
still feels a bit funny. sequence.c keeps track of how many values have 
been WAL-logged, and thus usable immediately, but we still call 
sequence_alloc even when using up those already WAL-logged values. If 
you think of using this for something like a centralized sequence server 
in a replication cluster, you certainly don't want to make a call to the 
remote server for every value - you'll want to cache them.


With the "local" seqam, there are two levels of caching. Each backend 
caches some values (per the CACHE  option in CREATE SEQUENCE). In 
addition to that, the server WAL-logs 32 values at a time. If you have a 
remote seqam, it would most likely add a third cache, but it would 
interact in strange ways with the second cache.


Considering a non-local seqam, the locking is also a bit strange. The 
server keeps the sequence page locked throughout nextval(). But if the 
actual state of the sequence is maintained elsewhere, there's no need to 
serialize the calls to the remote allocator, i.e. the sequence_alloc() 
calls.


I'm not exactly sure what to do about that. One option is to completely 
move the maintenance of the "current" value, i.e. sequence.last_value, 
to the seqam. That makes sense from an abstraction point of view. For 
example with a remote server managing the sequence, storing the 
"current" value in the local catalog table makes no sense as it's always 
going to be out-of-date. The local seqam would store it as part of the 
am-private data. However, you would need to move the responsibility of 
locking and WAL-logging to the seqam. Maybe that's OK, but we'll need to 
provide an API that the seqam can call to do that. Perhaps just let the 
seqam call heap_inplace_update on the sequence relation.



For the amdata handling (which is the AM's private data variable), the API
assumes that (Datum) 0 is NULL; this seems to work well for reloptions, so it
should work here also, and it simplifies things a little compared to passing
pointers to pointers around and making sure everything is allocated, etc.

Sadly, the fact that amdata is not fixed-size and can be NULL made the page
updates of the sequence relation rather more complex than they used to be.


It would be nice if the seqam could define exactly the columns it needs, 
with any datatypes. There would be a set of common attributes: 
sequence_name, start_value, cache_value, increment_by, max_value, 
min_value, is_cycled. The local seqam would add "last_value", "log_cnt" 
and "is_called" to that. A remote seqam that calls out to some other 
server might store the remote server's hostname etc.


There could be a seqam function that returns a TupleDesc with the 
required columns, for example.
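
A sketch of what that might look like for the "local" seqam. The function
name is made up, but CreateTemplateTupleDesc() and TupleDescInitEntry() are
the stock facilities for building a descriptor:

    static TupleDesc
    local_seqam_extra_columns(void)
    {
        TupleDesc   tupdesc = CreateTemplateTupleDesc(3, false);

        TupleDescInitEntry(tupdesc, (AttrNumber) 1, "last_value",
                           INT8OID, -1, 0);
        TupleDescInitEntry(tupdesc, (AttrNumber) 2, "log_cnt",
                           INT8OID, -1, 0);
        TupleDescInitEntry(tupdesc, (AttrNumber) 3, "is_called",
                           BOOLOID, -1, 0);

        return tupdesc;
    }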


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Patch: add recovery_timeout option to control timeout of restore_command nonzero status code

2014-11-04 Thread Alexey Vasiliev



Mon, 3 Nov 2014 22:55:02 -0200 from Fabrízio de Royes Mello 
:
> 
> 
> 
> On Mon, Nov 3, 2014 at 7:25 PM, Alexey Vasiliev < leopard...@inbox.ru > wrote:
> >
> >
> >
> >
> > Mon, 3 Nov 2014 19:18:51 -0200 from Fabrízio de Royes Mello < 
> > fabriziome...@gmail.com >:
> > >
> > > >
> > > > Should I change my patch and send it by email? And also, as I understand
> > > > it, I should change the message ID for
> > > > https://commitfest.postgresql.org/action/patch_view?id=1636 , shouldn't I?
> > > >
> > >
> > > You should just send another version of your patch by email and add a new 
> > > "comment" to commit-fest using the "Patch" comment type.
> >
> >
> > Added new patch.
> >
> 
> And you should add the patch to an open commit-fest, not to one that is in
> progress. The next open one is 2014-12 [1].
> 

Moved. Thanks.

> 
> --
> Fabrízio de Royes Mello
> Consultoria/Coaching PostgreSQL
> >> Timbira:  http://www.timbira.com.br
> >> Blog:  http://fabriziomello.github.io
> >> Linkedin:  http://br.linkedin.com/in/fabriziomello
> >> Twitter:  http://twitter.com/fabriziomello
> >> Github:  http://github.com/fabriziomello

-- 
Alexey Vasiliev

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] BRIN indexes - TRAP: BadArgument

2014-11-04 Thread Jeff Janes
On Mon, Nov 3, 2014 at 2:18 PM, Alvaro Herrera 
wrote:

>
> So here's v21.  I also attach a partial diff from v20, just in case
> anyone wants to give it a look.
>

This needs a bump to 1.3, or the extension won't install:

contrib/pageinspect/pageinspect.control


During crash recovery, I am getting a segfault:

#0  0x0067ee35 in LWLockRelease (lock=0x0) at lwlock.c:1161
#1  0x00664f4a in UnlockReleaseBuffer (buffer=0) at bufmgr.c:2888
#2  0x00465a88 in brin_xlog_revmap_extend (lsn=<optimized out>,
    record=<optimized out>) at brin_xlog.c:261
#3  brin_redo (lsn=<optimized out>, record=<optimized out>) at
    brin_xlog.c:284
#4  0x004ce505 in StartupXLOG () at xlog.c:6795

I failed to preserve the data directory; I'll try to reproduce this later
this week if needed.
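
Speculating from the trace: UnlockReleaseBuffer() appears to have been handed
buffer 0 (invalid), so LWLockRelease() chased a NULL lock pointer. For
comparison, a minimal sketch of the defensive idiom redo routines normally
use (the record field names here are assumptions):

    buffer = XLogReadBuffer(xlrec->node, xlrec->targetBlk, false);
    if (BufferIsValid(buffer))
    {
        /* ... replay the change on the page ... */
        UnlockReleaseBuffer(buffer);
    }
    /* an invalid buffer just means the block was truncated away later */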

Cheers,

Jeff


Re: [HACKERS] What exactly is our CRC algorithm?

2014-11-04 Thread Heikki Linnakangas

On 10/27/2014 06:02 PM, Heikki Linnakangas wrote:

I came up with the attached patches. They do three things:

1. Get rid of the 64-bit CRC code. It's not used for anything, and hasn't
been for years, so it doesn't seem worth spending any effort to fix it.

2. Switch to CRC-32C (Castagnoli) for WAL and other places that don't
need to remain compatible across major versions.

3. Use the same lookup table for hstore and ltree, as used for the
legacy "almost CRC-32" algorithm. The tables are identical, so might as
well.

Any objections?


I hear none, so committed with some small fixes.
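
For reference, CRC-32C differs from the old algorithm mainly in the generator
polynomial (Castagnoli; 0x82F63B78 in reflected form). A bit-at-a-time sketch
for illustration only -- the committed code uses a much faster table-driven
implementation:

    #include <stddef.h>
    #include <stdint.h>

    static uint32_t
    crc32c_bytewise(const void *data, size_t len)
    {
        const unsigned char *p = data;
        uint32_t    crc = 0xFFFFFFFF;

        while (len-- > 0)
        {
            crc ^= *p++;
            for (int i = 0; i < 8; i++)     /* one bit at a time */
                crc = (crc >> 1) ^ (0x82F63B78 & (0 - (crc & 1)));
        }
        return crc ^ 0xFFFFFFFF;
    }
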
- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] tracking commit timestamps

2014-11-04 Thread Andres Freund
On 2014-11-04 17:29:04 +0900, Michael Paquier wrote:
> On Tue, Nov 4, 2014 at 5:23 PM, Andres Freund 
> wrote:
> 
> > On 2014-11-04 17:19:18 +0900, Michael Paquier wrote:
> > > 5) Reading the code, TransactionTreeSetCommitTimestamp is always called
> > > with do_xlog = false, meaning that no timestamps are actually WAL'd... Hence
> > > WriteSetTimestampXlogRec is just dead code with the latest version of the
> > > patch. IMO, this should be always WAL-logged when track_commit_timestamp is
> > > on.
> >
> > It's callable via an 'extern' function. So, I'd not consider it dead. And
> > the WAL logging is provided by xact.c's own WAL logging - it always does
> > the corresponding committs calls.
> >
> The code path is unused.

No. It is not. It can be called by extensions?

> We'd better make the XLOG record mandatory if
> tracking is enabled, as this information is useful on standbys as well.

Did you read what I wrote? To quote "And the WAL logging is provided by
xact.c's own WAL logging - it always does the corresponding committs
calls.".

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] BRIN indexes - TRAP: BadArgument

2014-11-04 Thread Simon Riggs
On 3 November 2014 22:18, Alvaro Herrera  wrote:

> So here's v21.  I also attach a partial diff from v20, just in case
> anyone wants to give it a look.

Looks really good.

I'd like to reword this sentence in the README, since one of the main
use cases would be tables without btrees:
   It's unlikely that BRIN would be the only
+ indexes in a table, though, because primary keys can be btrees only, and so
+ we don't implement this optimization.

I don't see a regression test. A create/use/VACUUM cycle would do, just so
we know it hasn't regressed after commit.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Let's drop two obsolete features which are bear-traps for novices

2014-11-04 Thread Albe Laurenz
David Fetter wrote:
> On Tue, Nov 04, 2014 at 07:51:06AM +0900, Tatsuo Ishii wrote:
>> Just out of curiosity, why is Oracle's NUMBER (I assume you are
>> talking about this) so fast?
> 
> I suspect that what happens is that NUMBER is stored as a native type
> (int2, int4, int8, int16) that depends on its size and then cast to
> the next upward thing as needed, taking any performance hits at that
> point.  The documentation hints (38 decimal places) at a 128-bit
> internal representation as the maximum.  I don't know what happens
> when you get past what 128 bits can represent.

No, Oracle stores NUMBERs as a variable-length field (up to 22 bytes),
where the first byte encodes the sign and the decimal point position and the
remaining bytes encode the digits, each byte representing two digits
in base-100 notation (see Oracle Metalink note 1007641.6).
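
To illustrate just the base-100 packing (a toy sketch, not Oracle's actual
on-disk encoding, which also folds the sign and exponent into the first
byte):

    #include <stdint.h>

    /*
     * Pack an unsigned integer into base-100 bytes, least significant
     * first: 123456 becomes {56, 34, 12}, three bytes for six digits.
     */
    static int
    pack_base100(uint64_t n, uint8_t digits[11])
    {
        int     len = 0;

        do
        {
            digits[len++] = (uint8_t) (n % 100);    /* two digits per byte */
            n /= 100;
        } while (n > 0);

        return len;
    }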

So it's not so different from PostgreSQL.
No idea why their arithmetic should be faster.

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] tracking commit timestamps

2014-11-04 Thread Michael Paquier
On Tue, Nov 4, 2014 at 5:23 PM, Andres Freund 
wrote:

> On 2014-11-04 17:19:18 +0900, Michael Paquier wrote:
> > 5) Reading the code, TransactionTreeSetCommitTimestamp is always called
> > with do_xlog = false, meaning that no timestamps are actually WAL'd... Hence
> > WriteSetTimestampXlogRec is just dead code with the latest version of the
> > patch. IMO, this should be always WAL-logged when track_commit_timestamp is
> > on.
>
> It's callable via an 'extern' function. So, I'd not consider it dead. And
> the WAL logging is provided by xact.c's own WAL logging - it always does
> the corresponding committs calls.
>
The code path is unused. We'd better make the XLOG record mandatory if
tracking is enabled, as this information is useful on standbys as well.


> > 7) pg_xlogdump has no support for RM_COMMITTS_ID, something that would be
> > useful for developers.
>
> What do you mean by that? There's the corresponding rmgrdesc.c support I
> think?
>
Oops, sorry. I thought there was some big switch in pg_xlogdump when I wrote
this comment. Yeah, that's fine.
-- 
Michael


Re: [HACKERS] tracking commit timestamps

2014-11-04 Thread Michael Paquier
On Tue, Nov 4, 2014 at 5:05 PM, Andres Freund 
wrote:

> On 2014-11-02 19:27:25 +0100, Petr Jelinek wrote:
> > Well, Michael has a point that the extradata is pretty much useless
> > currently; perhaps it would help to add an interface to set extradata?
>
> Only accessible via C and useless aren't the same thing. But sure, add
> it.
>
I'm still at -1 on that. You mentioned that there is perhaps no reason to
delay a decision on this matter, but IMO there is no reason to rush into
doing something we may regret either. And I am not the only one on this
thread expressing concern about this extra data thingy.

If this extra data field is going to be used to identify which node a
commit comes from, then it is a different feature from what the subject of
this thread describes. In that case let's discuss it in the thread
dedicated to replication identifiers, or come up with an extra patch once
the feature for commit timestamps is done.
-- 
Michael


Re: [HACKERS] tracking commit timestamps

2014-11-04 Thread Andres Freund
On 2014-11-04 17:19:18 +0900, Michael Paquier wrote:
> 5) Reading the code, TransactionTreeSetCommitTimestamp is always called
> with do_xlog = false, meaning that no timestamps are actually WAL'd... Hence
> WriteSetTimestampXlogRec is just dead code with the latest version of the
> patch. IMO, this should be always WAL-logged when track_commit_timestamp is
> on.

It's callable via an 'extern' function. So, I'd not consider it dead. And
the WAL logging is provided by xact.c's own WAL logging - it always does
the corresponding committs calls.

> 6) Shouldn't any value update of track_commit_timestamp be tracked in
> XLogReportParameters? That's thinking about making the commit timestamp
> available on standbys as well.

Yes, it should.
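
Presumably along these lines, mirroring how wal_log_hints is carried in the
parameter-change record today (the new field is an assumption, not committed
code):

    typedef struct xl_parameter_change
    {
        int         MaxConnections;
        int         max_worker_processes;
        int         max_prepared_xacts;
        int         max_locks_per_xact;
        int         wal_level;
        bool        wal_log_hints;
        bool        track_commit_timestamp;     /* new */
    } xl_parameter_change;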

> 7) pg_xlogdump has no support for RM_COMMITTS_ID, something that would be
> useful for developers.

What do you mean by that? There's the corresponding rmgrdesc.c support I
think?

> 8) The redo and xlog routines of committs should be moved out into a
> dedicated file, like committsxlog.c or similar. The other rmgrs do so, so
> let's be consistent.

Seems pointless to me. The file isn't that big and the other SLRUs don't
do it.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] tracking commit timestamps

2014-11-04 Thread Michael Paquier
On Sat, Nov 1, 2014 at 10:00 PM, Michael Paquier 
wrote:

> More comments:
>

I have done a couple of tests on my laptop with pgbench, run as follows to
generate as many transaction commits as possible:
$ pgbench --no-vacuum -f ~/Desktop/commit.sql -T 60 -c 24 -j 24
$ cat ~/Desktop/commit.sql
SELECT txid_current()
Here is the average (TPS) over 5 runs:
- master: 49842.44
- GUC off, patched: 49688.71
- GUC on, patched: 49459.73
So the differences are small, essentially within the noise.

Here are also more comments about the code that I found while focusing on
committs.c:
1) When the GUC is not enabled, or when the transaction ID provided is not
in the valid range, a dummy value based on InvalidTransactionId (?!) is
returned, like this:
+   /* Return empty if module not enabled */
+   if (!commit_ts_enabled)
+   {
+   if (ts)
+   *ts = InvalidTransactionId;
+   if (data)
+   *data = (CommitExtraData) 0;
+   return;
+   }
This leads to some incorrect results:
=# select pg_get_transaction_committime('1');
 pg_get_transaction_committime
---
 2000-01-01 09:00:00+09
(1 row)
Or for example:
=# SELECT txid_current();
 txid_current
--
 1006
(1 row)
=# select pg_get_transaction_committime('1006');
 pg_get_transaction_committime
---
 2014-11-04 14:54:37.589381+09
(1 row)
=# select pg_get_transaction_committime('1007');
 pg_get_transaction_committime
---
 2000-01-01 09:00:00+09
(1 row)
=# SELECT txid_current();
 txid_current
--
 1007
(1 row)
And at other times the error is not that helpful for the user:
=# select pg_get_transaction_committime('1');
ERROR:  XX000: could not access status of transaction 1
DETAIL:  Could not read from file "pg_committs/" at offset 114688:
Undefined error: 0.
LOCATION:  SlruReportIOError, slru.c:896
I think that you should simply return an error in
TransactionIdGetCommitTsData when the xid is not in the supported range (see
the sketch after this list). That will be more transparent for the user.
2) You may also want to return a different error if the GUC
track_commit_timestamp is disabled.
3) This comment should be updated in committs.c; we are not talking about
CLOG here:
+/*
+ * Link to shared-memory data structures for CLOG control
+ */
4) Similarly, I think more comments should be put in here. It is OK to
truncate the commit timestamp stuff similarly to CLOG, to keep a consistent
status context available, but let's explain that.
 * multixacts; that will be done by the next checkpoint.
 */
TruncateCLOG(frozenXID);
+   TruncateCommitTs(frozenXID)
5) Reading the code, TransactionTreeSetCommitTimestamp is always called
with do_xlog = false, meaning that no timestamps are actually WAL'd... Hence
WriteSetTimestampXlogRec is just dead code with the latest version of the
patch. IMO, this should be always WAL-logged when track_commit_timestamp is
on.
6) Shouldn't any value update of track_commit_timestamp be tracked in
XLogReportParameters? That's thinking about making the commit timestamp
available on standbys as well.
7) pg_xlogdump has no support for RM_COMMITTS_ID, something that would be
useful for developers.
8) The redo and xlog routines of committs should be moved out into a
dedicated file, like committsxlog.c or similar. The other rmgrs do so, so
let's be consistent.
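
As for the sketch promised in point 1, this is roughly what I would expect
the out-of-range case to do instead (the exact bound variable is an
assumption):

    if (!TransactionIdIsValid(xid) ||
        TransactionIdPrecedes(xid, oldestCommitTsXid))
        ereport(ERROR,
                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                 errmsg("cannot retrieve commit timestamp of transaction %u",
                        xid)));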

Regards,
-- 
Michael


Re: [HACKERS] Repeatable read and serializable transactions see data committed after tx start

2014-11-04 Thread Craig Ringer
On 11/04/2014 07:31 AM, Álvaro Hernández Tortosa wrote:
> Thank you for your comment, Tom. However, I think this behavior, as
> seen from a user perspective, is not the expected one.

That may be the case, but I think it's the SQL-standard behaviour, so we
can't really mess with it.

The spec requires SET TRANSACTION ISOLATION, and you can't implement
that if you take a snapshot at BEGIN.

> If it is still the intended behavior, I think it should be clearly
> documented as such, and a recommendation similar to "issue a 'SELECT 1'
> right after BEGIN to freeze the data before any query of your own" or a
> similar comment should be added. Again, as I said in my email, the
> documentation clearly says that it "only sees data committed before the
> transaction began". And this is clearly not the real behavior.

It's more of a difference in when the transaction "begins".

Arguably, "BEGIN" says "I intend to begin a new transaction with the
next query" rather than "immediately begin executing a new transaction".

This concept could be clearer in the docs.

> Sure, there are, that was the link I pointed out, but I found no
> explicit mention of the issue I'm raising here.

I'm sure it's documented *somewhere*, in that I remember reading about
this detail in the docs, but I can't find _where_ in the docs.

It doesn't seem to be in:

http://www.postgresql.org/docs/current/static/transaction-iso.html

where I'd expect.

In any case, we simply cannot take the snapshot at BEGIN time, because
it's permitted to:

SET TRANSACTION ISOLATION LEVEL READ COMMITTED;

in a DB that has default serializable isolation or has a SET SESSION
CHARACTERISTICS isolation mode of serializable. Note that SET
TRANSACTION is SQL-standard.

AFAIK deferring the snapshot is also consistent with other RDBMSes that
use snapshots.


The docs of that command allude to, but don't explicitly state, the
behaviour you mention.

http://www.postgresql.org/docs/current/static/sql-set-transaction.html


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] tracking commit timestamps

2014-11-04 Thread Andres Freund
On 2014-11-02 19:27:25 +0100, Petr Jelinek wrote:
> Well, Michael has a point that the extradata is pretty much useless
> currently; perhaps it would help to add an interface to set extradata?

Only accessible via C and useless aren't the same thing. But sure, add
it.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers