Re: [HACKERS] Identifying a message in emit_log_hook.

2016-02-24 Thread Kyotaro HORIGUCHI
Hello,

At Wed, 17 Feb 2016 09:13:01 +, Simon Riggs  wrote 
in 
> On 17 February 2016 at 08:34, Kyotaro HORIGUCHI <
> horiguchi.kyot...@lab.ntt.co.jp> wrote:
> 
> >
> > > I'm guessing this would require making the pre-translated error text
> > > available to plugins as well as translated form.
> >
> > If I understand you correctly, edata->message_id in my patch is
> > just what offers the pre-translated error text to plugins.
> 
> 
> OK, now I understand the patch, I am happy to apply it.

Thank you very much. I have one concern about this patch.

I have added an id only for .message in the patch, but the same
approach could theoretically apply to all the other message types
eventually passed to gettext() in the two macros
EVALUATE_MESSAGE(_PLURAL), namely detail, detail_log, hint and
context. The modification could then be confined to those two macros,
but doing so would add four extra *_id members to ErrorData, and I
doubt those extra members would be useful.
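
For reference, here is a minimal sketch of how an emit_log_hook plugin
could use the new member to identify a message independently of
translation. It assumes the message_id member added by this patch
carries the untranslated format string; the hook body and the matched
string below are only illustrative, not part of the patch.

#include "postgres.h"
#include "fmgr.h"
#include "utils/elog.h"

PG_MODULE_MAGIC;

static emit_log_hook_type prev_emit_log_hook = NULL;

static void
my_emit_log_hook(ErrorData *edata)
{
	/* Match on the pre-translation text, not the localized message. */
	if (edata->message_id != NULL &&
		strcmp(edata->message_id, "could not open file \"%s\": %m") == 0)
	{
		/* React to this specific message, e.g. keep it out of the server log. */
		edata->output_to_server = false;
	}

	if (prev_emit_log_hook)
		prev_emit_log_hook(edata);
}

void
_PG_init(void)
{
	prev_emit_log_hook = emit_log_hook;
	emit_log_hook = my_emit_log_hook;
}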

If you agree with this, no further modification seems necessary. Is
there anything else to do?

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center






Re: [HACKERS] Prepared Statement support for Parallel query

2016-02-24 Thread Robert Haas
On Thu, Feb 25, 2016 at 8:53 AM, Amit Kapila  wrote:
>>   But if the user says
>> they want to PREPARE the query, they are probably not going to fetch
>> all rows.
>
> After PREPARE, the user will execute the statement using EXECUTE, and
> I don't see how the user can decide the number of rows to fetch in a way
> that influences the execution.  Can you please elaborate on your point
> and on what your expectation is?

Argh.  I'm getting confused between prepared statements and cursors.
So if the user does PREPARE followed by EXECUTE, then that is OK.  The
problem is only if they use DECLARE .. CURSOR FOR, which your patch
doesn't affect.

So, committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] POC: Cache data in GetSnapshotData()

2016-02-24 Thread Mithun Cy
On Sat, Jan 16, 2016 at 10:23 AM, Amit Kapila 
wrote:


> >On Fri, Jan 15, 2016 at 11:23 AM, Mithun Cy 
> wrote:
>
>> On Mon, Jan 4, 2016 at 2:28 PM, Andres Freund  wrote:
>>
>> >> I think at the very least the cache should be protected by a separate
>> >> lock, and that lock should be acquired with TryLock. I.e. the cache is
>> >> updated opportunistically. I'd go for an lwlock in the first iteration.
>>
>
> >I also think this observation of yours is right and currently that is
> >okay because we always first check TransactionIdIsCurrentTransactionId().
>
> >+ const uint32 snapshot_cached= 0;
>

I have fixed all of the issues reported by the regression tests. Also, now
when a backend caches the snapshot, it also stores its own xid and subxids so
that other backends can use them.

I also did some read-only perf tests.

Non-Default Settings.

scale_factor=300.

./postgres -c shared_buffers=16GB -N 200 -c min_wal_size=15GB -c
max_wal_size=20GB -c checkpoint_timeout=900 -c maintenance_work_mem=1GB -c
checkpoint_completion_target=0.9

./pgbench -c $clients -j $clients -T 300 -M prepared -S  postgres

Machine Detail:
cpu   : POWER8
cores: 24 (192 with HT).

Clients      Base             With cached snapshot
1            19653.914409     19926.884664
16           190764.519336    190040.299297
32           339327.881272    354467.445076
48           462632.02493     464767.917813
64           522642.515148    533271.556703
80           515262.813189    513353.962521

But I did not see any performance improvement. I will continue testing the same.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 5cb28cf..a441ca0 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -1561,6 +1561,7 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 			elog(ERROR, "cache lookup failed for relation %u", OIDOldHeap);
 		relform = (Form_pg_class) GETSTRUCT(reltup);
 
+		Assert(TransactionIdIsNormal(frozenXid));
 		relform->relfrozenxid = frozenXid;
 		relform->relminmxid = cutoffMulti;
 
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 97e8962..8db028f 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -372,11 +372,19 @@ ProcArrayRemove(PGPROC *proc, TransactionId latestXid)
 	(arrayP->numProcs - index - 1) * sizeof(int));
 			arrayP->pgprocnos[arrayP->numProcs - 1] = -1;		/* for debugging */
 			arrayP->numProcs--;
+
+			pg_atomic_write_u32(&ProcGlobal->snapshot_cached, 0);
+
+			ProcGlobal->cached_snapshot_valid = false;
 			LWLockRelease(ProcArrayLock);
 			return;
 		}
 	}
 
+	pg_atomic_write_u32(&ProcGlobal->snapshot_cached, 0);
+
+	ProcGlobal->cached_snapshot_valid = false;
+
 	/* Ooops */
 	LWLockRelease(ProcArrayLock);
 
@@ -420,6 +428,8 @@ ProcArrayEndTransaction(PGPROC *proc, TransactionId latestXid)
 		if (LWLockConditionalAcquire(ProcArrayLock, LW_EXCLUSIVE))
 		{
 			ProcArrayEndTransactionInternal(proc, pgxact, latestXid);
+			pg_atomic_write_u32(&ProcGlobal->snapshot_cached, 0);
+			ProcGlobal->cached_snapshot_valid = false;
 			LWLockRelease(ProcArrayLock);
 		}
 		else
@@ -568,6 +578,9 @@ ProcArrayGroupClearXid(PGPROC *proc, TransactionId latestXid)
 		nextidx = pg_atomic_read_u32(&proc->procArrayGroupNext);
 	}
 
+	pg_atomic_write_u32(&ProcGlobal->snapshot_cached, 0);
+	ProcGlobal->cached_snapshot_valid = false;
+
 	/* We're done with the lock now. */
 	LWLockRelease(ProcArrayLock);
 
@@ -1553,6 +1566,8 @@ GetSnapshotData(Snapshot snapshot)
 	 errmsg("out of memory")));
 	}
 
+	snapshot->takenDuringRecovery = RecoveryInProgress();
+
 	/*
 	 * It is sufficient to get shared lock on ProcArrayLock, even if we are
 	 * going to set MyPgXact->xmin.
@@ -1567,12 +1582,39 @@ GetSnapshotData(Snapshot snapshot)
 	/* initialize xmin calculation with xmax */
 	globalxmin = xmin = xmax;
 
-	snapshot->takenDuringRecovery = RecoveryInProgress();
+	if (!snapshot->takenDuringRecovery && ProcGlobal->cached_snapshot_valid)
+	{
+		Snapshot csnap = &ProcGlobal->cached_snapshot;
+		TransactionId *saved_xip;
+		TransactionId *saved_subxip;
 
-	if (!snapshot->takenDuringRecovery)
+		saved_xip = snapshot->xip;
+		saved_subxip = snapshot->subxip;
+
+		memcpy(snapshot, csnap, sizeof(SnapshotData));
+
+		snapshot->xip = saved_xip;
+		snapshot->subxip = saved_subxip;
+
+		memcpy(snapshot->xip, csnap->xip,
+			   sizeof(TransactionId) * csnap->xcnt);
+		memcpy(snapshot->subxip, csnap->subxip,
+			   sizeof(TransactionId) * csnap->subxcnt);
+		globalxmin = ProcGlobal->cached_snapshot_globalxmin;
+		xmin = csnap->xmin;
+
+		count = csnap->xcnt;
+		subcount = csnap->subxcnt;
+		suboverflowed = csnap->suboverflowed;
+
+		Assert(TransactionIdIsValid(globalxmin));
+		Assert(TransactionIdIsValid(xmin));
+	}
+	else if (!snapshot->takenDuringRecovery)
 	{
 		int		   *pgprocnos = arrayP->pgprocnos;
 		in

[HACKERS] Performance degradation in commit 6150a1b0

2016-02-24 Thread Amit Kapila
For the past few weeks, we have been seeing some performance degradation in
read-only benchmarks on high-end machines.  My colleague Mithun tried
reverting commit ac1d794, which was previously reported [1] to degrade
performance on HEAD on high-end machines, but the degradation remained.  We
then did some profiling to see what caused it and found that it is mainly
caused by a spinlock taken via pin/unpin buffer, so we tried reverting commit
6150a1b0, which recently changed the structures in that area, and it turns
out that with that commit reverted we see no degradation in performance.
The important point to note is that the performance degradation doesn't
occur every time, but if the tests are repeated two or three times, it
is easily visible.

m/c details
IBM POWER-8
24 cores,192 hardware threads
RAM - 492GB

Non-default postgresql.conf settings-
shared_buffers=16GB
max_connections=200
min_wal_size=15GB
max_wal_size=20GB
checkpoint_timeout=900
maintenance_work_mem=1GB
checkpoint_completion_target=0.9

scale_factor - 300

Performance at commit 43cd468cf01007f39312af05c4c92ceb6de8afd8 is 469002 at
the 64-client count, and at 6150a1b08a9fe7ead2b25240be46dddeae9d98e1 it
went down to 200807.  These performance numbers are the median of three
15-minute pgbench read-only tests.  Similar data is seen even when we revert
the patch on the latest commit.  We have yet to perform a detailed analysis
of why commit 6150a1b08a9fe7ead2b25240be46dddeae9d98e1 leads to the
degradation, but any ideas are welcome.


[1] -
http://www.postgresql.org/message-id/CAB-SwXZh44_2ybvS5Z67p_CDz=XFn4hNAD=cnmef+qqkxwf...@mail.gmail.com

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2016-02-24 Thread Robert Haas
On Thu, Feb 25, 2016 at 10:31 AM, Amit Kapila  wrote:
>> There's no requirement that every session have every tranche
>> registered.  I think we should consider displaying "extension" for any
>> tranche that's not built-in, or at least for tranches that are not
>> registered (rather than "unknown wait event").
>
> I think it is better to display "extension" for unregistered tranches,
> but do you see any case where we will have wait event information
> for an unregistered tranche?

Sure.  If backend A has the tranche registered - because it loaded
some .so which registered it in PG_init() - and is waiting on the
lock, and backend B does not have the tranche registered - because it
didn't do that - and backend B selects from pg_stat_activity, then
you'll see a wait in an unregistered tranche.

>> The changes to LockBufferForCleanup() don't look right to me.  Waiting
>> for a buffer pin might be a reasonable thing to define as a wait
>> event, but it shouldn't reported as if we were waiting on the LWLock
>> itself.
>
> makes sense, how about having a new wait class, something like
> WAIT_BUFFER and then have wait event type as Buffer and
> wait event as BufferPin.  At this moment, I think there will be
> only one event in this class, but it seems to me waiting on buffer
> has merit to be considered as a separate class.

I would just go with BufferPin/BufferPin for now.  I can't think what
else I'd want to group with BufferPins in the same class.

>> What happens if an error is thrown while we're in a wait?
>
> For LWLocks, only a FATAL error is possible, which will anyway lead
> to reinitialization of all backend state.  For lock.c, if an error is
> thrown, the state is reset in the PG_CATCH block.  In
> LockBufferForCleanup(), after we set the wait event and before we
> reset it, only a FATAL error is possible, and only if a system call
> fails.  We do have one error in enable_timeouts, which is called from
> ResolveRecoveryConflictWithBufferPin(), but that doesn't seem to
> be reachable.  The remaining question is what happens if tomorrow
> someone adds a new error after we set the wait state, so maybe
> it is better to clear the wait event in AbortTransaction()?

Yeah, I think so.  Probably everywhere that we do LWLockReleaseAll()
we should also clear the wait event.
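
Something like the following, as a minimal sketch only - the helper name
is made up, and pgstat_report_wait_end() is the name proposed earlier in
this thread rather than an existing function:

static void
cleanup_after_error(void)
{
	LWLockReleaseAll();			/* existing cleanup */
	pgstat_report_wait_end();	/* proposed: also clear the advertised wait event */
}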

>> Does this patch hurt performance?
>
> This patch adds additional code in the path where we are going
> to sleep/wait, but we have also changed some shared memory structures, so
> it is good to run performance tests.  I am planning to run read-only
> and read-write pgbench tests (when the data fits in shared buffers); is that
> sufficient, or do you expect anything more?

I'm open to ideas from others on tests that would be good to run, but
I don't have any great ideas myself right now.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [PATCH v5] GSSAPI encryption support

2016-02-24 Thread Michael Paquier
On Wed, Feb 24, 2016 at 7:12 PM, Robbie Harwood  wrote:
> David Steele  writes:
>
>> On 2/15/16 12:45 PM, Robbie Harwood wrote:
>>> David Steele  writes:
>>>
 1) It didn't apply cleanly to HEAD.  It did apply cleanly on a455878
 which I figured was recent enough for testing.  I didn't bisect to find
 the exact commit that broke it.
>>>
>>> It applied to head of master (57c932475504d63d8f8a68fc6925d7decabc378a)
>>> for me (`patch -p1 < v4-GSSAPI-encryption-support.patch`).  I rebased it
>>> anyway and cut a v5 anyway, just to be sure.  It's attached, and
>>> available on github as well:
>>> https://github.com/frozencemetery/postgres/commit/dc10e3519f0f6c67f79abd157dc8ff1a1c293f53
>>
>> It could have been my mistake.  I'll give it another try when you have a
>> new patch.
>
> Please do let me know how v5 goes.  If you run into trouble, in addition
> to the logs you helpfully provided before, I'd like a traffic dump (pcap
> preferable; I need tcp/udp port 88 for Kerberos and tcp port 5432 or
> whatever you're running postgres on) if possible.  Thanks!
>
 2) While I was able to apply the patch and get it compiled it seemed
 pretty flaky - I was only able to logon about 1 in 10 times on average.
  Here was my testing methodology:
>>>
>>> What I can't tell from looking at your methodology is whether both the
>>> client and server were running my patches or no.  There's no fallback
>>> here (I'd like to talk about how that should work, with example from
>>> v1-v3, if people have ideas).  This means that both the client and the
>>> server need to be running my patches for the moment.  Is this your
>>> setup?
>>
>> I was testing on a system with no version of PostgreSQL installed.  I
>> applied your patch to master and then ran both server and client from
>> that patched version.  Is there something I'm missing?
>
> Not that I can immediately see.  As long as the client and server are
> both patched, everything should work.  My process is the same as with
> previous versions of this patchset [0], and though I'm using FreeIPA
> there is no reason it shouldn't work with any other KDC (MIT, for
> instance[1]) provided the IPA calls are converted.

I used a custom krb5kdc set up manually; all my connection attempts
work on HEAD, but not with your patch applied (to both client and
server).

> I am curious, though - I haven't changed any of the authentication code
> in v4/v5 from what's in ~master, so how often can you log in using
> GSSAPI using master?

My guess is that something is not being correctly cleaned up when the
connection is closed. The first attempt worked for me, but not subsequent ones.
-- 
Michael




Re: [HACKERS] [PATCH v5] GSSAPI encryption support

2016-02-24 Thread Michael Paquier
On Tue, Feb 16, 2016 at 2:45 AM, Robbie Harwood  wrote:
> David Steele  writes:
>> On 2/10/16 4:06 PM, Robbie Harwood wrote:
>>> Hello friends,
>>>
>>> For your consideration, here is a new version of GSSAPI encryption
>>> support.  For those who prefer, it's also available on my github:
>>> https://github.com/frozencemetery/postgres/commit/c92275b6605d7929cda5551de47a4c60aab7179e
>>
>> It tried out this patch and ran into a few problems:
>>
>> 1) It didn't apply cleanly to HEAD.  It did apply cleanly on a455878
>> which I figured was recent enough for testing.  I didn't bisect to find
>> the exact commit that broke it.
>
> It applied to head of master (57c932475504d63d8f8a68fc6925d7decabc378a)
> for me (`patch -p1 < v4-GSSAPI-encryption-support.patch`).  I rebased it
> anyway and cut a v5 anyway, just to be sure.  It's attached, and
> available on github as well:
> https://github.com/frozencemetery/postgres/commit/dc10e3519f0f6c67f79abd157dc8ff1a1c293f53

v5 is applying fine for me. There were two diff hunks in routine.sgml
but nothing to worry about.

>> 2) While I was able to apply the patch and get it compiled it seemed
>> pretty flaky - I was only able to logon about 1 in 10 times on average.
>>  Here was my testing methodology:
>>
>> a) Build Postgres from a455878 (without your patch), install/configure
>> Kerberos and get everything working.  I was able the set the auth method
>> to gss in pg_hba.conf and logon successfully every time.
>>
>> b) On the same system rebuild Postgres from a455878 including your patch
>> and attempt authentication.
>>
>> The problems arose after step 2b.  Sometimes I would try to logon twenty
>> times without success and sometimes it only take five or six attempts.
>> I was never able to logon successfully twice in a row.
>>
>> When not successful the client always output this incomplete message
>> (without  terminating LF):
>>
>> psql: expected authentication request from server, but received
>>
>> From the logs I can see the server is reporting EOF from the client,
>> though the client does not core dump and prints the above message before
>> exiting.
>>
>> I have attached files that contain server logs at DEBUG5 and tcpdump
>> output for both the success and failure cases.
>>
>> Please let me know if there's any more information you would like me to
>> provide.
>
> What I can't tell from looking at your methodology is whether both the
> client and server were running my patches or no.  There's no fallback
> here (I'd like to talk about how that should work, with example from
> v1-v3, if people have ideas).  This means that both the client and the
> server need to be running my patches for the moment.  Is this your
> setup?

We need to be careful here: backward compatibility is critical for
both the client and the server. To put it in other words, an
unpatched client should still be able to connect to a patched server,
and vice versa. This is an area where it is important not to break any
third-party tool, whether it uses libpq or reimplements the frontend
protocol.

So I finally began to dive into your new patch... And after testing,
this is breaking the authentication protocol for GSS. I was able to
connect to a server once, but the second and subsequent attempts fail
with the following error:
psql: expected authentication request from server, but received ioltas

Also, something that is missing is the parametrization that was
discussed the last time this patch was on the mailing list. Where is
the ability for a client connecting to a server to control whether it
should downgrade to a non-encrypted connection if the server is not
able to support encryption? Being able to enforce on the server, via
pg_hba.conf, that a client must use encryption would be a good thing
as well. Some more infrastructure is needed here; I thought that there
was an agreement previously regarding that.

Also, to make the review a bit easier, it would be good to split
the patch into smaller pieces (thanks for waiting on my input here;
this became quite clear after looking at this code). Basically,
pg_GSS_error is moved to a new file, bringing pg_GSS_recvauth along
with it because the error routine is used by the new routines you are
introducing: be_gssapi_write, etc. The split that this patch makes
is a bit confusing; all the GSS-related stuff is put into one single
file:
- read/write routines
- authentication routine
- constants
- routines for error handling
Mixing the read/write routines with the authentication routine looks
wrong, because the read/write routines only need to create a
dependency on, for example, be-secure.c in the backend. In short, code
used before authentication and the secure read/write code used after
authentication get linked to each other, creating a dependency that
did not exist before.

For the sake of clarity I would suggest the following split:
- be-gss-common.c, where all the constants and the error handling
routine are located.
- Leave authrecv in auth.c.
- Move only the 

Re: [HACKERS] get current log file

2016-02-24 Thread Robert Haas
On Thu, Feb 25, 2016 at 1:15 AM, Euler Taveira  wrote:
> On 02-02-2016 10:22, Armor wrote:
>>  As we known, the name of current log file depends on the number of
>> seconds (for simple, later I will call it last_syslogger_file_time)
>> since Epoch when create new log file. So, for this feature, the key is
>> how syslogger process pass last_syslogger_file_time to backend processes.
>>
> I didn't like the name. Let's call it syslogger_file_name. It describes
> what the variable is (actual file name that syslogger is writing on).
>
>> To pass last_syslogger_file_time, we have 2 solutions: 1, add a
>> global variable to record last_syslogger_file_time which shared by
>> backends and syslogger, so backends can get last_syslogger_file_time
>> very easily; 2 syslogger process send last_syslogger_file_time to pgstat
>> process when last_syslogger_file_time changes, just as other auxiliary
>> processes send stat  message to pgstat process, and  pgstat process will
>> write  last_syslogger_file_time into stat file so that backend can
>> get last_syslogger_file_time via reading this stat file.
>>
> I prefer (1) because (i) logfile name is not statistics and (ii) stats
> collector could not respond in certain circumstances (and even discard
> some messages).

(1) seems like a bad idea, because IIUC, the syslogger process doesn't
currently touch shared memory.  And in fact, shared memory can be
reset after a backend exits abnormally, but the syslogger (alone among
all PostgreSQL processes other than the postmaster) lasts across
multiple such resets.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] proposal: PL/Pythonu - function ereport

2016-02-24 Thread Catalin Iacob
On Fri, Feb 19, 2016 at 9:41 PM, Pavel Stehule  wrote:
>  It looks like a good idea. The last version does not break compatibility,
> and I think it can work.
>
> I wrote code that works on both Python 2 and Python 3.

Hi,

I've attached a patch on top of yours with some documentation
improvements, feel free to fold it in your next version.

I think the concept is good. We're down to code-level changes. Most of
them are cosmetic (misleading comments and so on), but there are some
bugs in there as far as I can see.


In object_to_string you don't need to test for Py_None. PyObject_Str
will return NULL on failure and otherwise whatever str() returns for the
underlying object. No need to special-case None.

In object_to_string, when you're already in a (so != NULL) block, you
can use Py_DECREF instead of XDECREF.

object_to_string seems buggy to me: it returns the pointer returned by
PyString_AsString which points to the internal buffer of the Python
object but it also XDECREFs that object. You seem to be returning a
pointer to freed space.

get_string_attr seems to have the same issue as object_to_string.
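
To make the point concrete, here is a minimal sketch of the pattern I
mean - copy the C string out of the temporary object before dropping
the reference. It assumes Python 2's PyString_AsString and PostgreSQL's
pstrdup; the function name mirrors the one in your patch, but the body
here is only illustrative:

static char *
object_to_string(PyObject *obj)
{
	PyObject   *so = PyObject_Str(obj);
	char	   *result = NULL;

	if (so != NULL)
	{
		/* Copy out of the Python object's internal buffer before releasing it. */
		result = pstrdup(PyString_AsString(so));
		Py_DECREF(so);
	}

	return result;
}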

In PLy_get_error_data query and position will never be set for
plpy.Error since you don't offer them to Python and therefore don't
set them in PLy_output_kw. So I think you should remove the code
related to them, including the char **query, int *position parameters
for PLy_get_error_data. Removing position also allows removing
get_int_attr and the need to exercise this function in the tests.

You're using PLy_get_spi_sqlerrcode in PLy_get_error_data which makes
the spi in the name unsuitable. I would rename it to just
PLy_get_sqlerrcode. At least the comment at the top of
PLy_get_spi_sqlerrcode needs to change since it now also extracts info
from Error not just SPIError.

/* set value of string pointer on object field */ comments are weird
for a function that gets a value. But those functions need to change
anyway since they're buggy (see above).

The only change in plpy_plpymodule.h is the removal of an empty line
unrelated to this patch, probably from previous versions. You should
undo it to leave plpy_plpymodule.h untouched.

Why rename PLy_output to PLy_output_kw? It only makes the patch bigger
without being an improvement. Maybe you also have this from previous
versions.

In PLy_output_kw you don't need to check message for NULL, just as sv
wasn't checked before.

In PLy_output_kw you added a FreeErrorData(edata) which didn't exist
before. I'm not familiar with that API but I'm wondering if it's
needed or not, good to have it or not etc.

In PLy_output_kw you didn't update the "Note: If sv came from
PyString_AsString(), it points into storage..." comment which still
refers to sv which is now deleted.

In PLy_output_kw you removed the "return a legal object so the
interpreter will continue on its merry way" comment which might not be
the most valuable comment in the world but removing it is still
unrelated to this patch.

In PLy_output_kw you test for kw != NULL && kw != Py_None. The kw !=
NULL is definitely needed but I'm quite sure Python will never pass
Py_None into it so the != Py_None isn't needed. Can't find a reference
to prove this at the moment.


Some more tests could be added to exercise more parts of the patch:
- check that SPIError is enhanced with all the new things:
schema_name, table_name etc.
- in the plpy.error call use every keyword argument not just detail
and hint: it proves you save and restore every field correctly from
the Error fields since that's not exercised by the info call above
which does use every argument
- use a wrong keyword argument to see it gets rejected with your error message
- try some other types than str (like detail=42) for error as well
since error goes on another code path than info
- a test exercising the "invalid sqlstate" code


0001-Doc-improvements.patch
Description: binary/octet-stream



Re: [HACKERS] The plan for FDW-based sharding

2016-02-24 Thread Michael Paquier
On Wed, Feb 24, 2016 at 11:34 PM, Bruce Momjian  wrote:
> On Wed, Feb 24, 2016 at 12:17:28PM +0300, Alexander Korotkov wrote:
>> Hi, Bruce!
>>
>> The important point for me is to distinguish different kind of plans:
>> implementation plan and research plan.
>> If we're talking about implementation plan then it should be proven that
>> proposed approach works in this case. I.e research should be already done.
>> If we're talking about research plan then we should realize that result is
>> unpredictable. And we would probably need to dramatically change our way.
>
> Yes, good point.  I would say FDW-based sharding is certainly still a
> research approach, but an odd one because we are adding code even while
> in research mode.  I think that is possible because the FDW improvements
> have other uses beyond sharding.
>
> I think another aspect is that we already know that modifying the
> Postgres source code can produce a useful sharding solution --- XC, XL,
> Greenplum, and CitusDB all prove that, and pg_shard does it as a plugin.
> So, we know that with unlimited code changes, it is possible.  What we
> don't know is whether it is possible with acceptable code changes, and
> how much of the feature-set can be supported this way.
>
> We had a similar case with the Windows port, where SRA (my employer at
> the time) and Nusphere both had native Windows ports of Postgres, and
> they supplied source code to help with the port.  So, in that case also,
> we knew a native Windows port was possible, and we (or at least I) could
> see the code that was required to do it.  The big question was whether a
> native Windows port could be added in a community-acceptable way, and
> the community agreed we could try if we didn't make the code messier ---
> that was a success.
>
> For pg_upgrade, I had code from EDB (my employer at the time) that kind
> of worked, but needed lots of polish, and again, I could do it in
> contrib as long as I didn't mess up the backend code --- that worked
> well too.
>
> So, I guess I am saying, the FDW/sharding thing is a research project,
> but one that is implementing code because of existing proven solutions
> and because the improvements are benefiting other use-cases beyond
> sharding.
>
> Also, in the big picture, the existence of many Postgres forks, all
> doing sharding, indicates that there is demand for this capability, and
> if we can get some this capability into Postgres we will increase the
> number of people using native Postgres.  We might also be able to reduce
> the amount of duplicate work being done in all these forks and allow
> them to more easily focus on more advanced use-cases.
>
>> This two things would work with FDW:
>> 1) Pull data from data nodes to coordinator.
>> 2) Pushdown computations from coordinator to data nodes: joins, aggregates 
>> etc.
>> It's proven and clear. This is good.
>> Another point is that these FDW advances are useful by themselves. This is 
>> good
>> too.
>>
>> However, the FDW model assumes that communication happens only between the
>> coordinator and a data node. But a full-weight distributed optimizer can't be
>> built under this restriction, because it requires every node to communicate
>> with every other node when that makes a distributed query faster. And as I
>> understand it, the FDW approach currently has no research and no particular
>> plan for that.
>
> This is very true.  I imagine cross-node connections will certainly
> complicate the implementation and lead to significant code changes,
> which might be unacceptable.  I think we need to go with a
> non-cross-node implementation first, then if that is accepted, we can
> start to think what cross-node code changes would look like.  It
> certainly would require FDW knowledge to exist on every shard.  Some
> have suggested that FDWs wouldn't work well for cross-node connections
> or wouldn't scale and we shouldn't be using them --- I am not sure what
> to think of that.
>
>> As I understand from Robert Haas's talk 
>> (https://docs.google.com/viewer?a=v&pid=sites&
>> srcid=ZGVmYXVsdGRvbWFpbnxyb2JlcnRtaGFhc3xneDo1ZmFhYzBhNjNhNzVhMDM0)
>>
>> Before we consider repartitioning joins, we should probably get 
>> everything
>> previously discussed working first.
>> – Join Pushdown For Parallelism, FDWs
>> – PartialAggregate/FinalizeAggregate
>> – Aggregate Pushdown For Parallelism, FDWs
>> – Declarative Partitioning
>> – Parallel-Aware Append
>>
>>
>> So, as I understand it, we never really considered the possibility of data
>> redistribution using FDWs. Perhaps something has changed since then, but I
>> haven't heard about it.
>
> No, you didn't miss it.  :-(  We just haven't gotten to studying that
> yet.  One possible outcome is that built-in Postgres has non-cross-node
> sharding, and forks of Postgres have cross-node sharding, again assuming
> cross-node sharding requires an unacceptable amount of code change.  I
> don't think anyone knows the answer yet.
>
>> On Tue, Feb 23, 2016 at 7:43 PM, Bruce M

Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2016-02-24 Thread Amit Kapila
On Wed, Feb 24, 2016 at 7:14 PM, Robert Haas  wrote:

> On Mon, Feb 22, 2016 at 10:05 AM, Amit Kapila 
> wrote:
> >> I wouldn't bother tinkering with it at this point.  The value isn't
> >> going to be recorded on disk anywhere, so it will be easy to change
> >> the way it's computed in the future if we ever need to do that.
> >>
> >
> > Okay. Find the rebased patch attached with this mail.  I have moved
> > this patch to upcoming CF.
>
> I would call the functions pgstat_report_wait_start() and
> pgstat_report_wait_end() instead of pgstat_report_start_waiting() and
> pgstat_report_end_waiting().
>
> I think pgstat_get_wait_event_type should not return HWLock, a term
> that appears nowhere in our source tree at present.  How about just
> "Lock"?
>
> I think the wait event types should be documented - and the wait
> events too, perhaps.
>
> Maybe it's worth having separate wait event type names for lwlocks and
> lwlock tranches.  We could report LWLockNamed and LWLockTranche and
> document the difference: "LWLockNamed indicates that the backend is
> waiting for a specific, named LWLock.  The event name is the name of
> that lock.  LWLockTranche indicates that the backend is waiting for
> any one of a group of locks with similar function.  The event name
> identifies the general purpose of locks in that group."
>
>
Agreed with all the above points and will change the patch accordingly.


> There's no requirement that every session have every tranche
> registered.  I think we should consider displaying "extension" for any
> tranche that's not built-in, or at least for tranches that are not
> registered (rather than "unknown wait event").
>
>
I think it is better to display "extension" for unregistered tranches,
but do you see any case where we will have wait event information
for an unregistered tranche?

Another point to consider is that, if it is not possible to have a wait event
for an unregistered tranche, should we use an Assert or elog(ERROR)
instead of "unknown wait event"?



> +   if (lock->tranche == 0 && lockId < NUM_INDIVIDUAL_LWLOCKS)
>
> Isn't the second part automatically true at this point?
>


Yes, that point is automatically true and I think we should change
the same check in PRINT_LWDEBUG and LOG_LWDEBUG, although
as separate patches.



>
> The changes to LockBufferForCleanup() don't look right to me.  Waiting
> for a buffer pin might be a reasonable thing to define as a wait
> event, but it shouldn't reported as if we were waiting on the LWLock
> itself.
>

makes sense, how about having a new wait class, something like
WAIT_BUFFER and then have wait event type as Buffer and
wait event as BufferPin.  At this moment, I think there will be
only one event in this class, but it seems to me waiting on buffer
has merit to be considered as a separate class.


>
> What happens if an error is thrown while we're in a wait?
>
>
For LWLocks, only a FATAL error is possible, which will anyway lead
to reinitialization of all backend state.  For lock.c, if an error is
thrown, the state is reset in the PG_CATCH block.  In
LockBufferForCleanup(), after we set the wait event and before we
reset it, only a FATAL error is possible, and only if a system call
fails.  We do have one error in enable_timeouts, which is called from
ResolveRecoveryConflictWithBufferPin(), but that doesn't seem to
be reachable.  The remaining question is what happens if tomorrow
someone adds a new error after we set the wait state, so maybe
it is better to clear the wait event in AbortTransaction()?



> Does this patch hurt performance?
>
>
This patch adds additional code in the path where we are going
to sleep/wait, but we have also changed some shared memory structures, so
it is good to run performance tests.  I am planning to run read-only
and read-write pgbench tests (when the data fits in shared buffers); is that
sufficient, or do you expect anything more?


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Prepared Statement support for Parallel query

2016-02-24 Thread Amit Kapila
On Wed, Feb 24, 2016 at 7:27 PM, Robert Haas  wrote:

> On Wed, Feb 17, 2016 at 6:41 PM, Amit Kapila 
> wrote:
> > Commit d1b7c1ffe72e86932b5395f29e006c3f503bc53d has added
> > the support for passing bind parameters to parallel workers, however
> > prepared statement which uses bind parameters wasn't enabled
> > for parallel query as the 'Execute' message in FE-BE protocol
> > can pass the row_count which can make parallel plans unusable.
> > (parallel plans are only possible when query can run to completion)
> >
> > Later Commit bfc78d7196eb28cd4e3d6c24f7e607bacecf1129 has
> > ensure that if the row_count is non-zero then we won't enter
> > parallel mode which means that even if parallel plan is selected
> > by optimizer, it will run such a plan locally.
> >
> > With above support, it was just a matter of enabling parallel
> > mode for prepared statements which is done in attached patch
> > (prepared_stmt_parallel_query_v1.patch).
> >
> > I have tested that parallel plans are getting generated both
> > via Prepare/Execute statements and libpq prepared
> > statement execution.  Attached is a libpq program
> > (prepare_parallel_query.c) which I have used for testing prepared
> > statement support.  I have done the verification manually
> > (using auto_explain) to ensure that parallel plans gets generated
> > and executed via this libpq program.  This program expects some
> > data to be generated before-hand and the information of same is
> > added in file-header.
>
> Hmm.   I agree we should change exec_parse_message like this, but
> changing PrepareQuery seems wrong.  I mean, there's a very good chance
> that a parse message will be followed by an Execute message with a
> zero row count, so we'll get parallel execution.


Yes, and I think libpq doesn't even provide an interface to specify a row
count for prepared statements.
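
To illustrate, here is a minimal sketch along the lines of the libpq test
program mentioned above; the connection string, table "t" and column "a"
are just placeholders. PQprepare()/PQexecPrepared() take no row-count
argument, and the protocol-level Execute message is sent with a zero
(fetch-all) row limit:

#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
	PGconn	   *conn = PQconnectdb("dbname=postgres");
	PGresult   *res;
	const char *values[1] = {"100"};

	if (PQstatus(conn) != CONNECTION_OK)
	{
		fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
		return 1;
	}

	/* Prepare once... */
	res = PQprepare(conn, "stmt", "SELECT count(*) FROM t WHERE a > $1", 1, NULL);
	PQclear(res);

	/* ...then execute with a bound parameter; no row-count knob exists here. */
	res = PQexecPrepared(conn, "stmt", 1, values, NULL, NULL, 0);
	if (PQresultStatus(res) == PGRES_TUPLES_OK)
		printf("count = %s\n", PQgetvalue(res, 0, 0));

	PQclear(res);
	PQfinish(conn);
	return 0;
}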


>   But if the user says
> they want to PREPARE the query, they are probably not going to fetch
> all rows.
>
>
After PREPARE, the user will execute the statement using EXECUTE, and
I don't see how the user can decide the number of rows to fetch in a way
that influences the execution.  Can you please elaborate on your point
and on what your expectation is?


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS][PATCH] Integer overflow in timestamp[tz]_part() and date/time boundaries check

2016-02-24 Thread Vitaly Burovoy
On 2/24/16, Vitaly Burovoy  wrote:
> On 2/2/16, Jim Nasby  wrote:
>> On 2/2/16 6:39 PM, Tom Lane wrote:
>>> I'm inclined to think that a good solution would be to create an
>>> artificial restriction to not accept years beyond, say, 10 AD.
>>> That would leave us with a lot of daylight to not have to worry
>>> about corner-case overflows in timestamp arithmetic.  I'm not sure
>>> though where we'd need to enforce such a restriction; certainly in
>>> timestamp[tz]_in, but where else?
>>
>> Probably some of the casts (I'd think at least timestamp->timestamptz).
>> Maybe timestamp[tz]_recv. Most of the time*pl* functions. :/
>
> Please find attached a patch checks boundaries of date/timestamp[tz].
> There are more functions: converting to/from timestamptz, truncating,
> constructing from date and time etc.
>
> I left the upper boundary as described[1] in the documentation
> (294276-12-31 AD), lower - "as is" (4714-11-24 BC).
> It is easy to change the lower boundary to 4713-01-01BC (as described
> in the documentation) and it seems necessary because it allows to
> simplify IS_VALID_JULIAN and IS_VALID_JULIAN4STAMPS and avoid the next
> behavior:
>
> postgres=# select
> postgres-#  to_char(date_trunc('week', '4713-01-01 BC'::date),'day')
> postgres-# ,to_char(date_trunc('week', '4714-12-29 BC'::date),'day')
> postgres-# ,to_char(date_trunc('week', '4714-12-28 BC'::date),'day');
>   to_char  |  to_char  |  to_char
> -----------+-----------+-----------
>  monday    | monday    | thursday
> (1 row)
>
> since from 4714-12-28 BC and further into the past, detection of when a week
> starts is broken (because it is the boundary of ISO years -4713 and -4712).
> Is it worth breaking the undocumented range, or should we leave it as is?
>
> There is one more flaw: correctness checking starts from the date part,
> and if the default TZ is not UTC, dump/restore of timestamptz values
> which are close to the allowed boundaries can be broken (and such a
> result can't be restored because the date is not in the allowed range):
> postgres=# SET TIME ZONE 'GMT+1';
> SET
> postgres=# COPY (SELECT '4714-11-24 00:00:00.00+00
> BC'::timestamptz) TO STDOUT;
> 4714-11-23 23:00:00-01 BC
>
> Also I'm asking for a help because the query (in default TZ='GMT+1'):
> postgres=# SELECT '4714-11-24 00:00:00.00+00 BC'::timestamptz;
>
> in psql gives a result "4714-11-23 23:00:00-01 BC",
> but in a testing system gives "Sun Nov 23 23:00:00 4714 GMT BC"
> without TZ offset.
>
>
> I don't see what can be added to the documentation with the applied patch.
>
> More testings, finding bugs, uncovered functions, advice, comment
> improvements are very appreciated.
>
> [1]http://www.postgresql.org/docs/devel/static/datatype-datetime.html

I'm sorry, gmail didn't add an "In-Reply-To" header to the last email.
The previous thread is at the link [2].

[2] http://www.postgresql.org/message-id/flat/cakoswnked3joe++pes49gmdnfr2ieu9a2jfcez6ew-+1c5_...@mail.gmail.com

-- 
Best regards,
Vitaly Burovoy




Re: [HACKERS] Random note of encouragement

2016-02-24 Thread Thomas Munro
On Thu, Feb 25, 2016 at 1:43 PM, Bruce Momjian  wrote:
> On Thu, Feb 25, 2016 at 12:50:06PM +1300, Thomas Munro wrote:
>> On Thu, Feb 25, 2016 at 12:26 PM, Bruce Momjian  wrote:
>> > On Thu, Feb 25, 2016 at 10:06:34AM +1100, James Sewell wrote:
>> >> Now when I run the following SQL (multiple times to allow for getting
>> >> everything into shared buffers, which is 4GB on my machine):
>> >>
>> >>
>> >> select sum(count_n) from base group by view_time_day;
>> >>
>> >>
>> >> I get the following results:
>> >>
>> >>
>> >> PSQL 9.5 - ~21 seconds
>> >> PSQL 9.6 devel - ~8.5 seconds
>> >>
>> >>
>> >> I think that's pretty good!
>> >>
>> >> I know this is a devel release, things may change, blah blah. But still,
>> >> something has changed for the better here!
>> >
>> > Wow, that is cool.  Can anyone suggest which commit improved this?
>>
>> Since it sums numerics, maybe integer transition functions from commit
>> 959277a4f579da5243968c750069570a58e92b38 helped?
>
> Seems it was a wrong report, but anyway, this commit was in 9.5, while
> the user reported a speedup in 9.6.

Oops, right, and as David said it's also irrelevant.

FWIW I couldn't reproduce this comparing 9.5 with 9.6, but the numbers
reported just happen to match nearly exactly what I get comparing -O2
and -O0 builds here...

-- 
Thomas Munro
http://www.enterprisedb.com




Re: [HACKERS] Random note of encouragement

2016-02-24 Thread Bruce Momjian
On Thu, Feb 25, 2016 at 12:50:06PM +1300, Thomas Munro wrote:
> On Thu, Feb 25, 2016 at 12:26 PM, Bruce Momjian  wrote:
> > On Thu, Feb 25, 2016 at 10:06:34AM +1100, James Sewell wrote:
> >> Now when I run the following SQL (multiple times to allow for getting
> >> everything into shared buffers, which is 4GB on my machine):
> >>
> >>
> >> select sum(count_n) from base group by view_time_day;
> >>
> >>
> >> I get the following results:
> >>
> >>
> >> PSQL 9.5 - ~21 seconds
> >> PSQL 9.6 devel - ~8.5 seconds
> >>
> >>
> >> I think that's pretty good!
> >>
> >> I know this is a devel release, things may change, blah blah. But still,
> >> something has changed for the better here!
> >
> > Wow, that is cool.  Can anyone suggest which commit improved this?
> 
> Since it sums numerics, maybe integer transition functions from commit
> 959277a4f579da5243968c750069570a58e92b38 helped?

Seems it was a wrong report, but anyway, this commit was in 9.5, while
the user reported a speedup in 9.6.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription +




Re: [HACKERS] Random note of encouragement

2016-02-24 Thread James Sewell
Argh, seems like a false alarm for now.

I installed 9.5 from the RPM source (the other was one I had installed
previously) and the performance matched 9.6.

Sorry about that, I must have *something* screwed up on the other one.

Cheers,


James Sewell,
PostgreSQL Team Lead / Solutions Architect
__


Level 2, 50 Queen St, Melbourne VIC 3000

*P *(+61) 3 8370 8000  *W* www.lisasoft.com  *F *(+61) 3 8370 8099



Re: [HACKERS] Random note of encouragement

2016-02-24 Thread David Rowley
On 25 February 2016 at 12:50, Thomas Munro
 wrote:
> On Thu, Feb 25, 2016 at 12:26 PM, Bruce Momjian  wrote:
>> On Thu, Feb 25, 2016 at 10:06:34AM +1100, James Sewell wrote:
>>> I get the following results:
>>>
>>>
>>> PSQL 9.5 - ~21 seconds
>>> PSQL 9.6 devel - ~8.5 seconds
>>>
>>>
>>> I think that's pretty good!
>>>
>>> I know this is a devel release, things may change, blah blah. But still,
>>> something has changed for the better here!
>>
>> Wow, that is cool.  Can anyone suggest which commit improved this?
>
> Since it sums numerics, maybe integer transition functions from commit
> 959277a4f579da5243968c750069570a58e92b38 helped?

Those changes were only really to allow bigint aggregates to use int128
internally instead of NUMERIC; they didn't make any changes to any
NUMERIC aggregate functions.

It would be interesting to see EXPLAIN (ANALYZE, BUFFERS) output for both.
Perhaps 9.5 just read more buffers from disk than 9.6 did.


-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Random note of encouragement

2016-02-24 Thread James Sewell
I've actually just tested this on 9.3 - and I get roughly the same as
9.6devel.

Now going back to make sure my 9.5 environment is sane.

Hopefully this isn't me jumping the gun.

Cheers,


James Sewell,
PostgreSQL Team Lead / Solutions Architect
__


Level 2, 50 Queen St, Melbourne VIC 3000

*P *(+61) 3 8370 8000  *W* www.lisasoft.com  *F *(+61) 3 8370 8099


On Thu, Feb 25, 2016 at 10:50 AM, Thomas Munro <
thomas.mu...@enterprisedb.com> wrote:

> On Thu, Feb 25, 2016 at 12:26 PM, Bruce Momjian  wrote:
> > On Thu, Feb 25, 2016 at 10:06:34AM +1100, James Sewell wrote:
> >> Now when I run the following SQL (multiple times to allow for getting
> >> everything into shared buffers, which is 4GB on my machine):
> >>
> >>
> >> select sum(count_n) from base group by view_time_day;
> >>
> >>
> >> I get the following results:
> >>
> >>
> >> PSQL 9.5 - ~21 seconds
> >> PSQL 9.6 devel - ~8.5 seconds
> >>
> >>
> >> I think that's pretty good!
> >>
> >> I know this is a devel release, things may change, blah blah. But still,
> >> something has changed for the better here!
> >
> > Wow, that is cool.  Can anyone suggest which commit improved this?
>
> Since it sums numerics, maybe integer transition functions from commit
> 959277a4f579da5243968c750069570a58e92b38 helped?
>
> --
> Thomas Munro
> http://www.enterprisedb.com
>



Re: [HACKERS] Random note of encouragement

2016-02-24 Thread Thomas Munro
On Thu, Feb 25, 2016 at 12:26 PM, Bruce Momjian  wrote:
> On Thu, Feb 25, 2016 at 10:06:34AM +1100, James Sewell wrote:
>> Now when I run the following SQL (multiple times to allow for getting
>> everything into shared buffers, which is 4GB on my machine):
>>
>>
>> select sum(count_n) from base group by view_time_day;
>>
>>
>> I get the following results:
>>
>>
>> PSQL 9.5 - ~21 seconds
>> PSQL 9.6 devel - ~8.5 seconds
>>
>>
>> I think that's pretty good!
>>
>> I know this is a devel release, things may change, blah blah. But still,
>> something has changed for the better here!
>
> Wow, that is cool.  Can anyone suggest which commit improved this?

Since it sums numerics, maybe integer transition functions from commit
959277a4f579da5243968c750069570a58e92b38 helped?

-- 
Thomas Munro
http://www.enterprisedb.com




Re: [HACKERS] Random note of encouragement

2016-02-24 Thread Bruce Momjian
On Thu, Feb 25, 2016 at 10:06:34AM +1100, James Sewell wrote:
> Now when I run the following SQL (multiple times to allow for getting
> everything into shared buffers, which is 4GB on my machine):
> 
> 
> select sum(count_n) from base group by view_time_day;
> 
> 
> I get the following results:
> 
> 
> PSQL 9.5 - ~21 seconds
> PSQL 9.6 devel - ~8.5 seconds
> 
> 
> I think that's pretty good!
> 
> I know this is a devel release, things may change, blah blah. But still,
> something has changed for the better here!

Wow, that is cool.  Can anyone suggest which commit improved this?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription +




[HACKERS] Random note of encouragement

2016-02-24 Thread James Sewell
Hey All,

I've been doing some (futile) work trying to speed up aggregates with a
group by in PostgreSQL 9.5.

I installed PostgreSQL 9.6 on the same machine to see if I could get
anything running in parallel when using partitioning - which didn't work.

But - I did find this:

With the following setup:

CREATE TABLE base(
view_time TIMESTAMP WITHOUT time ZONE,
view_time_day TIMESTAMP WITHOUT time ZONE,
count_n numeric);


INSERT INTO base
SELECT view_time,
       date_trunc('day', view_time),
       COUNT::numeric
FROM
  (SELECT
      TIMESTAMP '2015-12-01' + random() * interval '30 days' AS view_time,
      trunc(random() * 99 + 1) AS COUNT
   FROM generate_series(1,3000)) a;

analyze base;

Now when I run the following SQL (multiple times to allow for getting
everything into shared buffers, which is 4GB on my machine):

select sum(count_n) from base group by view_time_day;


I get the following results:

PSQL 9.5 - *~21 seconds*
PSQL 9.6 devel - *~8.5 seconds*


I think that's pretty good!

I know this is a devel release, things may change, blah blah. But still,
something has changed for the better here!

I get the same plan on both nodes:

 HashAggregate  (cost=670590.56..670590.95 rows=31 width=13)
   Group Key: view_time_day
   ->  Seq Scan on base  (cost=0.00..520590.04 rows=3104 width=13)

Cheers,

James Sewell,
PostgreSQL Team Lead / Solutions Architect
__


Level 2, 50 Queen St, Melbourne VIC 3000

*P *(+61) 3 8370 8000  *W* www.lisasoft.com  *F *(+61) 3 8370 8099



Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2016-02-24 Thread Peter Eisentraut
Could you enhance the documentation about the difference between "wait
event type name" and "wait event name" (examples?)?  This is likely to
be quite confusing for users who are used to just the plain "waiting"
column.




Re: [HACKERS] plpgsql - DECLARE - cannot to use %TYPE or %ROWTYPE for composite types

2016-02-24 Thread Peter Eisentraut
On 1/18/16 4:21 PM, Robert Haas wrote:
> One idea that occurs to me is: If you can DECLARE BAR FOO%TYPE, but
> then you want to make BAR an array of that type rather than a scalar,
> why not write that as DECLARE BAR FOO%TYPE[]?  That seems quite
> natural to me.

Right, and it's arguably dubious that that doesn't already work.
Unfortunately, these % things are just random plpgsql parser hacks, not
real types.  Maybe this should be done in the main PostgreSQL parser
with parameter hooks, if we wanted this feature to be available outside
plpgsql as well.

> I think the part of this patch that makes %TYPE work for more kinds of
> types is probably a good idea, although I haven't carefully studied
> exactly what it does.

I agree that this should be more general.  For instance, this patch
would allow you to get the element type of an array-typed variable, but
there is no way to get the element type of just another type.  If we
could do something like

DECLARE
  var ELEMENT OF point;

(not necessary that syntax)

then

DECLARE
  var ELEMENT OF othervar%TYPE;

should just fall into place.





Re: [HACKERS] get current log file

2016-02-24 Thread Euler Taveira
On 02-02-2016 10:22, Armor wrote:
>  As we known, the name of current log file depends on the number of
> seconds (for simple, later I will call it last_syslogger_file_time)
> since Epoch when create new log file. So, for this feature, the key is
> how syslogger process pass last_syslogger_file_time to backend processes.
> 
I didn't like the name. Let's call it syslogger_file_name. It describes
what the variable is (the actual file name that the syslogger is writing to).

> To pass last_syslogger_file_time, we have 2 solutions: 1, add a
> global variable to record last_syslogger_file_time which shared by
> backends and syslogger, so backends can get last_syslogger_file_time
> very easily; 2 syslogger process send last_syslogger_file_time to pgstat
> process when last_syslogger_file_time changes, just as other auxiliary
> processes send stat  message to pgstat process, and  pgstat process will
> write  last_syslogger_file_time into stat file so that backend can
> get last_syslogger_file_time via reading this stat file.
> 
I prefer (1) because (i) logfile name is not statistics and (ii) stats
collector could not respond in certain circumstances (and even discard
some messages).


-- 
   Euler Taveira   Timbira - http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento




Re: [HACKERS] [PATH] Jsonb, insert a new value into an array at arbitrary position

2016-02-24 Thread Petr Jelinek


On 18/02/16 15:38, Dmitry Dolgov wrote:

Hi

As far as I can see, there is one basic update operation for jsonb that can't
be covered by `jsonb_set`: inserting a new value into an array at an
arbitrary position.
Using the `jsonb_set` function we can only append at the end or at the
beginning of an array, and it looks more like a hack:

```
=# select jsonb_set('{"a": [1, 2, 3]}', '{a, 100}', '4');
      jsonb_set
---------------------
 {"a": [1, 2, 3, 4]}
(1 row)


=# select jsonb_set('{"a": [1, 2, 3]}', '{a, -100}', '4');
      jsonb_set
---------------------
 {"a": [4, 1, 2, 3]}
(1 row)
```

I think the possibility to insert at an arbitrary position would be quite
useful, something like `json_array_insert` in MySQL:

```
mysql> set @j = '["a", {"b": [1, 2]}, [3, 4]]';
mysql> select json_array_insert(@j, '$[1].b[0]', 'x');
+------------------------------------------+
| json_array_insert(@j, '$[1].b[0]', 'x')  |
+------------------------------------------+
| ["a", {"b": ["x", 1, 2]}, [3, 4]]        |
+------------------------------------------+
```

It can look like `jsonb_insert` function in our case:

```
=# select jsonb_insert('{"a": [0,1,2]}', '{a, 1}', '"new_value"');
         jsonb_insert
-------------------------------
 {"a": [0, "new_value", 1, 2]}
(1 row)
```



I think it makes sense to have an interface like this; I'd strongly prefer 
the jsonb_array_insert naming, though.




I attached a possible implementation, which is basically quite small (all
logic-related modifications are only about 4 lines in the `setPath` function).
This implementation assumes a flag to separate "insert before"/"insert after"
operations, and an alias to `jsonb_set` in case we are working with a jsonb
object, not an array.
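
For illustration, usage with the proposed "insert after" flag might look like
this (the extra boolean argument and its position are assumptions about the
patch, shown here only as a sketch):

```
=# select jsonb_insert('{"a": [0,1,2]}', '{a, 1}', '"new_value"', true);
         jsonb_insert
-------------------------------
 {"a": [0, 1, "new_value", 2]}
(1 row)
```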



I don't think it's a good idea to fall back to set behavior when this is used
on an object; I think we should throw an error in that case.


Also this patch needs documentation.

--
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH v5] GSSAPI encryption support

2016-02-24 Thread Robbie Harwood
David Steele  writes:

> On 2/15/16 12:45 PM, Robbie Harwood wrote:
>> David Steele  writes:
>>
>>> 1) It didn't apply cleanly to HEAD.  It did apply cleanly on a455878
>>> which I figured was recent enough for testing.  I didn't bisect to find
>>> the exact commit that broke it.
>> 
>> It applied to head of master (57c932475504d63d8f8a68fc6925d7decabc378a)
>> for me (`patch -p1 < v4-GSSAPI-encryption-support.patch`).  I rebased it
>> anyway and cut a v5 anyway, just to be sure.  It's attached, and
>> available on github as well:
>> https://github.com/frozencemetery/postgres/commit/dc10e3519f0f6c67f79abd157dc8ff1a1c293f53
>
> It could have been my mistake.  I'll give it another try when you have a
> new patch.

Please do let me know how v5 goes.  If you run into trouble, in addition
to the logs you helpfully provided before, I'd like a traffic dump (pcap
preferable; I need tcp/udp port 88 for Kerberos and tcp port 5432 or
whatever you're running postgres on) if possible.  Thanks!

>>> 2) While I was able to apply the patch and get it compiled it seemed
>>> pretty flaky - I was only able to logon about 1 in 10 times on average.
>>>  Here was my testing methodology:
>> 
>> What I can't tell from looking at your methodology is whether both the
>> client and server were running my patches or no.  There's no fallback
>> here (I'd like to talk about how that should work, with example from
>> v1-v3, if people have ideas).  This means that both the client and the
>> server need to be running my patches for the moment.  Is this your
>> setup?
>
> I was testing on a system with no version of PostgreSQL installed.  I
> applied your patch to master and then ran both server and client from
> that patched version.  Is there something I'm missing?

Not that I can immediately see.  As long as the client and server are
both patched, everything should work.  My process is the same as with
previous versions of this patchset [0], and though I'm using FreeIPA
there is no reason it shouldn't work with any other KDC (MIT, for
instance[1]) provided the IPA calls are converted.

I am curious, though - I haven't changed any of the authentication code
in v4/v5 from what's in ~master, so how often can you log in using
GSSAPI using master?

[0]: https://mivehind.net/2015/06/11/kerberized-postgresql/
[1]: http://web.mit.edu/kerberos/krb5-devel/doc/admin/install_kdc.html


signature.asc
Description: PGP signature


Re: [HACKERS] get current log file

2016-02-24 Thread Armor
As we know, the name of the current log file depends on the number of seconds
since the Epoch at which the new log file was created (for simplicity, I will
call it last_syslogger_file_time below). So, for this feature, the key is how
the syslogger process passes last_syslogger_file_time to the backend processes.


To pass last_syslogger_file_time, we have 2 solutions: 1, add a global
variable to record last_syslogger_file_time, shared by backends and the
syslogger, so backends can get last_syslogger_file_time very easily; 2, the
syslogger process sends last_syslogger_file_time to the pgstat process whenever
last_syslogger_file_time changes, just as other auxiliary processes send stat
messages to the pgstat process, and the pgstat process writes
last_syslogger_file_time into the stats file so that backends can get
last_syslogger_file_time by reading that file.


   Of these 2 solutions, we prefer the latter, because we want to keep the
global variable space simpler.
   In addition, we need to add a new function named
pg_stat_get_log_file_name() which will return the current log file name
according to last_syslogger_file_time and the log file name format.
   If you have any questions, please let me know.
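
As a sketch of the intended usage (the function does not exist yet; the output
shown is only illustrative and simply follows the log_filename setting):

    SELECT pg_stat_get_log_file_name();
         pg_stat_get_log_file_name
    ----------------------------------
     postgresql-2016-02-24_000000.log
    (1 row)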
--
Jerry Yu


 




-- Original --
From:  "Alvaro Herrera";;
Date:  Tue, Feb 2, 2016 06:30 PM
To:  "Armor"; 
Cc:  "pgsql-hackers"; 
Subject:  Re: [HACKERS] get current log file



Armor wrote:
> Hello,
> 
> 
> I find there is a new feature about getting current log file name on the 
> TODO list (for detail please check 
> http://www.postgresql.org/message-id/pine.gso.4.64.0811101325260.9...@westnet.com).
>  On the other side, we finish a ticket to this requirement for our customer. 
> If the PG community still need this feature,  there will be a pleasure 
> for us to make contribution. 

Please propose a design and we'll discuss.  There's clearly need for
this feature.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Service

Re: [HACKERS] Allow to specify (auto-)vacuum cost limits relative to the database/cluster size?

2016-02-24 Thread Joe Conway
On 02/24/2016 08:54 AM, Alvaro Herrera wrote:
> Joe Conway wrote:
> 
>> In my experience it is almost always best to run autovacuum very often
>> and very aggressively. That generally means tuning scaling factor and
>> thresholds as well, such that there are never more than say 50-100k dead
>> rows. Then running vacuum with no delays or limits runs quite fast and is
>> generally not noticeable/impactful.
>>
>> However that strategy does not work well for vacuums which run long,
>> such as an anti-wraparound vacuum. So in my opinion we need to think
>> about this as at least two distinct cases requiring different solutions.
> 
> With the freeze map there is no need for anti-wraparound vacuums to be
> terribly costly, since they don't need to scan the whole table each
> time.  That patch probably changes things a lot in this area.

Yes, I had forgotten about that. It would be a huge help.


-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



signature.asc
Description: OpenPGP digital signature


Re: [HACKERS] plpgsql - DECLARE - cannot to use %TYPE or %ROWTYPE for composite types

2016-02-24 Thread Pavel Stehule
Hi

2016-02-24 10:48 GMT+01:00 Artur Zakirov :

On 21.02.2016 11:31, Pavel Stehule wrote:
>
>> Hi
>>
>> I am sending updated version - the changes are related to fix comments.
>>
>>
> Great.
>
> I am new in reviewing, I think Pavel took into account all comments. This
> patch is compiled and regression tests are passed. So I change its status
> to "Ready for Committer".
>

Thank you very much

Regards

Pavel


>
>
> --
> Artur Zakirov
> Postgres Professional: http://www.postgrespro.com
> Russian Postgres Company
>


Re: [HACKERS] Declarative partitioning

2016-02-24 Thread Corey Huinker
>
> Hm, I see.  How about multi-column keys?  Do we care enough about that use
> case?  I don't see a nice-enough-looking range literal for such keys.
> Consider for instance,
>
> With the partitioned table defined as:
>
> CREATE TABLE foo(c1 char(2), c2 char(2)) PARTITION BY RANGE (c1, c2);
>

Good question! I would assume that we'd use a syntax that presumes c1 and
c2 are a hypothetical composite type. But what does that look like?

To answer it, I tried this:

# create type duple as (a text, b text);
CREATE TYPE
# create type duplerange as range (subtype = duple);
CREATE TYPE
# select '(beebop,alula)'::duple;
 duple

 (beebop,alula)
(1 row)

# select '("hey ho","letsgo")'::duple;
   duple
---
 ("hey ho",letsgo)
(1 row)

analytics=# select duplerange('(beebop,alula)','("hey ho","letsgo")','(]');
duplerange
--
 ("(beebop,alula)","(""hey ho"",letsgo)"]
(1 row)

So I would assume that we'd use a syntax that presumed the columns were in
a composite range type.

Which means your creates would look like (following Robert Haas's implied
suggestion that we leave off the string literal quotes):

CREATE TABLE foo_ax1x PARTITION OF foo FOR VALUES (  , (b,2) );
CREATE TABLE foo_ax1x PARTITION OF foo FOR VALUES [ (b,2), (b,3) );
CREATE TABLE foo_ax1x PARTITION OF foo FOR VALUES [ (b,3), (b,4) );
CREATE TABLE foo_ax1x PARTITION OF foo FOR VALUES [ (b,4), (c,2) );
CREATE TABLE foo_ax1x PARTITION OF foo FOR VALUES [ (c,2), (c,3) );
CREATE TABLE foo_ax1x PARTITION OF foo FOR VALUES [ (c,3), (c,4) );

That's not terrible looking.
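
(And the ordering those bounds rely on is just ordinary row-wise comparison,
which is already lexicographic, e.g.:)

# SELECT ('b','2') < ('b','3') AS within_b,
         ('b','9') < ('c','2') AS across_b_c;
 within_b | across_b_c
----------+------------
 t        | t
(1 row)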

We would want to also think about what subset of many permutations of this
> syntax to accept range specs for new partitions.  Mostly to preserve the
> non-overlapping invariant and I think it would also be nice to prevent
> gaps.
>

Gaps *might* be intentional. I can certainly see where we'd want to set up
warnings for discontinuity, or perhaps some utility functions:
pg_partitions_ranges_are_continuous('master_table_name')
pg_partitions_are_adjacent('master_table_name','p1','p2')

But for the most part, range partitions evolve from splits when one
partition grows too big, so that won't be such a problem.


Consider that once we create:
>
> PARTITION FOR VALUES [current_date,);
>
> Now to create a new partition starting at later date, we have to have a
> "split partition" feature which would scan the above partition to
> distribute the existing data rows appropriately to the resulting two
> partitions. Right?
>

Correct. And depending on policy, that might be desirable and might not be.
If the table were for death records, we'd very much want to reject rows in
the future, if only to avoid upsetting the person.
If the table were of movie release dates, we'd *expect* that only dates
(,current_date] would be entered, but if someone chose to leak a release
date, we'd want to capture that and deal with it later.
So yeah, we're going to (eventually) need a SPLIT PARTITION that migrates
rows to a new partition.


> IOW, one shouldn't create an unbounded partition if more partitions in the
> unbounded direction are expected to be created.  It would be OK for
> unbounded partitions to be on the lower end most of the times.
>

On this I'll have to disagree. My own use case where I use my
range_partitioning extension starts off with a single partition () and all
new partitions are splits of that. The ranges evolve over time as
partitions grow and slow down. It's nice because we're not trying to
predict where growth will be, we split where growth is.


>
> > p.s. Sorry I haven't been able to kick the tires just yet. We have a very
> > good use case for this, it's just a matter of getting a machine and the
> > time to devote to it.
>
> I would appreciate it.  You could wait a little more for my next
> submission which will contain some revisions to the tuple routing code.
>
>
Ok, I'll wait a bit. In the meantime I can tell you a bit about the existing
production system I'm hoping to replicate with true partitioning. It looks
like this:

Big Master Table:
 Range partition by C collated text
   Date Range
   Date Range
   ...
 Range partition by C collated text
   Date Range
   Date Range
   ...
...

Currently this is accomplished through my range_partitioning module, and
then using pg_partman on those partitions. It works, but it's a lot of
moving parts.

The machine will be a 32 core AWS box. As per usual with AWS, it will have
ample memory and CPU, and be somewhat starved for I/O.

Question: I haven't dived into the code, but I was curious about your tuple
routing algorithm. Is there any way for the algorithm to begin its scan of
candidate partitions based on the destination of the last row inserted in this
statement? I ask because most use cases (that I am aware of) have data that
would naturally cluster in the same partition.


Re: [HACKERS] Proposal: Generic WAL logical messages

2016-02-24 Thread Petr Jelinek


Hello,

It seems that you forgot the regression tests for test_decoding. There is an
entry in test_decoding/Makefile, but the files sql/messages.sql and
expected/messages.out are missing, although they were included in the first
version of the patch.



Hi, yes, git add missing.

--
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services
From 55a771c4770b73b5bedac7cd91bd0a50b0d6da45 Mon Sep 17 00:00:00 2001
From: Petr Jelinek 
Date: Wed, 24 Feb 2016 17:02:36 +0100
Subject: [PATCH] Logical Decoding Messages

---
 contrib/test_decoding/Makefile  |   2 +-
 contrib/test_decoding/expected/messages.out |  56 ++
 contrib/test_decoding/sql/messages.sql  |  17 +++
 contrib/test_decoding/test_decoding.c   |  22 +++-
 doc/src/sgml/func.sgml  |  42 
 doc/src/sgml/logicaldecoding.sgml   |  22 
 src/backend/access/rmgrdesc/Makefile|   4 +-
 src/backend/access/rmgrdesc/logicalmsgdesc.c|  33 ++
 src/backend/access/transam/rmgr.c   |   1 +
 src/backend/replication/logical/Makefile|   2 +-
 src/backend/replication/logical/decode.c|  37 +++
 src/backend/replication/logical/logical.c   |  38 +++
 src/backend/replication/logical/logicalfuncs.c  |  27 +
 src/backend/replication/logical/message.c   | 132 
 src/backend/replication/logical/reorderbuffer.c |  73 +
 src/bin/pg_xlogdump/rmgrdesc.c  |   1 +
 src/include/access/rmgrlist.h   |   1 +
 src/include/catalog/pg_proc.h   |   4 +
 src/include/replication/logicalfuncs.h  |   2 +
 src/include/replication/message.h   |  42 
 src/include/replication/output_plugin.h |  13 +++
 src/include/replication/reorderbuffer.h |  21 
 22 files changed, 587 insertions(+), 5 deletions(-)
 create mode 100644 contrib/test_decoding/expected/messages.out
 create mode 100644 contrib/test_decoding/sql/messages.sql
 create mode 100644 src/backend/access/rmgrdesc/logicalmsgdesc.c
 create mode 100644 src/backend/replication/logical/message.c
 create mode 100644 src/include/replication/message.h

diff --git a/contrib/test_decoding/Makefile b/contrib/test_decoding/Makefile
index a362e69..8fdcfbc 100644
--- a/contrib/test_decoding/Makefile
+++ b/contrib/test_decoding/Makefile
@@ -38,7 +38,7 @@ submake-test_decoding:
 	$(MAKE) -C $(top_builddir)/contrib/test_decoding
 
 REGRESSCHECKS=ddl rewrite toast permissions decoding_in_xact decoding_into_rel \
-	binary prepared replorigin
+	binary prepared replorigin messages
 
 regresscheck: | submake-regress submake-test_decoding temp-install
 	$(MKDIR_P) regression_output
diff --git a/contrib/test_decoding/expected/messages.out b/contrib/test_decoding/expected/messages.out
new file mode 100644
index 000..d256851
--- /dev/null
+++ b/contrib/test_decoding/expected/messages.out
@@ -0,0 +1,56 @@
+-- predictability
+SET synchronous_commit = on;
+SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding');
+ ?column? 
+--
+ init
+(1 row)
+
+SELECT 'msg1' FROM pg_logical_send_message(true, 'test', 'msg1');
+ ?column? 
+--
+ msg1
+(1 row)
+
+SELECT 'msg2' FROM pg_logical_send_message(false, 'test', 'msg2');
+ ?column? 
+--
+ msg2
+(1 row)
+
+BEGIN;
+SELECT 'msg3' FROM pg_logical_send_message(true, 'test', 'msg3');
+ ?column? 
+--
+ msg3
+(1 row)
+
+SELECT 'msg4' FROM pg_logical_send_message(false, 'test', 'msg4');
+ ?column? 
+--
+ msg4
+(1 row)
+
+SELECT 'msg5' FROM pg_logical_send_message(true, 'test', 'msg5');
+ ?column? 
+--
+ msg5
+(1 row)
+
+COMMIT;
+SELECT regexp_replace(data, 'lsn: [^ ]+', '') FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'force-binary', '0', 'skip-empty-xacts', '1');
+   regexp_replace
+-
+ message:  transactional: 1 prefix: test, sz: 4 content:msg1
+ message:  transactional: 0 prefix: test, sz: 4 content:msg2
+ message:  transactional: 0 prefix: test, sz: 4 content:msg4
+ message:  transactional: 1 prefix: test, sz: 4 content:msg3
+ message:  transactional: 1 prefix: test, sz: 4 content:msg5
+(5 rows)
+
+SELECT 'init' FROM pg_drop_replication_slot('regression_slot');
+ ?column? 
+--
+ init
+(1 row)
+
diff --git a/contrib/test_decoding/sql/messages.sql b/contrib/test_decoding/sql/messages.sql
new file mode 100644
index 000..7f462b8
--- /dev/null
+++ b/contrib/test_decoding/sql/messages.sql
@@ -0,0 +1,17 @@
+-- predictability
+SET synchronous_commit = on;
+
+SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding');
+
+SELECT 'msg1' FROM pg_logical_send_message(true, 'test', 'msg1');
+SELECT 'msg2' FROM pg_logical_send_message(false, 'test', 'msg2');
+
+BEGIN;
+

Fw: Re: [HACKERS] get current log file

2016-02-24 Thread Armor
Sorry, I forgot to forward the mail to the mailing list.
Please add your comments.


--
Jerry Yu
https://github.com/scarbrofair


 




-- Original --
From:  "Armor";;
Date:  Tue, Feb 2, 2016 09:22 PM
To:  "Alvaro Herrera"; 

Subject:  Re:  [HACKERS] get current log file



 As we known, the name of current log file depends on the number of seconds 
(for simple, later I will call it last_syslogger_file_time) since Epoch when 
create new log file. So, for this feature, the key is how syslogger process 
pass last_syslogger_file_time to backend processes.


To pass last_syslogger_file_time, we have 2 solutions: 1, add a global 
variable to record last_syslogger_file_time which shared by backends and 
syslogger, so backends can get last_syslogger_file_time very easily; 2 
syslogger process send last_syslogger_file_time to pgstat process when 
last_syslogger_file_time changes, just as other auxiliary processes send stat  
message to pgstat process, and  pgstat process will write  
last_syslogger_file_time into stat file so that backend can get 
last_syslogger_file_time via reading this stat file.


   For these 2 solutions, we prefer to later, because we want to keep the 
global variables space much simpler.
   On the other side, we need to add a new function named 
pg_stat_get_log_file_name() which will return the current log file name  
according to last_syslogger_file_time and log file name format.
   If you have any question, please let me know.
--
Jerry Yu


 




-- Original --
From:  "Alvaro Herrera";;
Date:  Tue, Feb 2, 2016 06:30 PM
To:  "Armor"; 
Cc:  "pgsql-hackers"; 
Subject:  Re: [HACKERS] get current log file



Armor wrote:
> Hello,
> 
> 
> I find there is a new feature about getting current log file name on the 
> TODO list (for detail please check 
> http://www.postgresql.org/message-id/pine.gso.4.64.0811101325260.9...@westnet.com).
>  On the other side, we finish a ticket to this requirement for our customer. 
> If the PG community still need this feature,  there will be a pleasure 
> for us to make contribution. 

Please propose a design and we'll discuss.  There's clearly need for
this feature.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Service

Re: [HACKERS] Allow to specify (auto-)vacuum cost limits relative to the database/cluster size?

2016-02-24 Thread Alvaro Herrera
Joe Conway wrote:

> In my experience it is almost always best to run autovacuum very often
> and very aggressively. That generally means tuning scaling factor and
> thresholds as well, such that there are never more than say 50-100k dead
> rows. Then running vacuum with no delays or limits runs quite fast and is
> generally not noticeable/impactful.
> 
> However that strategy does not work well for vacuums which run long,
> such as an anti-wraparound vacuum. So in my opinion we need to think
> about this as at least two distinct cases requiring different solutions.

With the freeze map there is no need for anti-wraparound vacuums to be
terribly costly, since they don't need to scan the whole table each
time.  That patch probably changes things a lot in this area.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GIN data corruption bug(s) in 9.6devel

2016-02-24 Thread Teodor Sigaev

Thank you for remembering this problem, at least for me.


Well, turns out there's a quite significant difference, actually. The
index sizes I get (quite stable after multiple runs):

9.5 : 2428 MB
9.6 + alone cleanup : 730 MB
9.6 + pending lock : 488 MB
Interesting, I don't see why alone_cleanup and pending_lock differ so much.
I'd like to understand that; does somebody have a good theory? The one
questionable point in the pending_lock patch is a suspicious exception in
ProcSleep; this exception may cause problems in the future.




So that's quite a significant difference, I guess. The load duration for
each version look like this:

9.5 : 1415 seconds
9.6 + alone cleanup : 1310 seconds
9.6 + pending lock  : 1380 seconds

I'd say I'm happy with sacrificing ~5% of time in exchange for ~35%
reduction of index size.
I think the alone_cleanup patch is faster because a regular insert can break
off its cleanup process if autovacuum is waiting to begin work on the cleanup.
So, the inserting process can return earlier from the pending-list cleanup.


Attached is just a rebased version of the v2 alone_cleanup patch.
--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


gin_alone_cleanup-3.patch
Description: binary/octet-stream

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Allow to specify (auto-)vacuum cost limits relative to the database/cluster size?

2016-02-24 Thread Joe Conway
On 02/23/2016 10:23 PM, Robert Haas wrote:
> On Tue, Jan 12, 2016 at 6:12 PM, Andres Freund  wrote:
>> right now the defaults for autovacuum cost limiting are so low that they
>> regularly cause problems for our users. It's not exactly obvious that
>> any installation above a couple gigabytes definitely needs to change
>> autovacuum_vacuum_cost_delay &
>> autovacuum_vacuum_cost_limit/vacuum_cost_limit. Especially
>> anti-wraparound/full table vacuums basically take forever with the
>> default settings.
>>
>> On the other hand we don't want a database of a couple hundred megabytes
>> to be vacuumed as fast as possible and trash the poor tiny system. So we
>> can't just massively increase the limits by default; although I believe
>> some default adjustment would be appropriate anyway.
>>
>> I wonder if it makes sense to compute the delays / limits in relation to
>> either cluster or relation size. If you have a 10 TB table, you
>> obviously don't want to scan with a few megabytes a second, which the
>> default settings will do for you. With that in mind we could just go for
>> something like the autovacuum_*_scale_factor settings. But e.g. for
>> partitioned workloads with a hundreds of tables in the couple gigabyte
>> range that'd not work that well.
>>
>> Somehow computing the speed in relation to the cluster/database size is
>> probably possible, but I wonder how we can do so without constantly
>> re-computing something relatively expensive?
>>
>> Thoughts?
> 
> Thanks for bringing this up.  I fully agree we should try to do
> something about this.  This comes up quite regularly in EnterpriseDB
> support discussions, and I'm sure lots of other people have problems
> with it too.  It seems to me that what we really want to do is try to
> finish vacuuming the table before we again need to vacuum the table.
> For the sake of simplicity, just consider the anti-wraparound case for
> a second.  If it takes three days to vacuum the table and we consume
> 200 million XIDs in two days, we are pretty clearly not vacuuming fast
> enough.

In my experience it is almost always best to run autovacuum very often
and very aggressively. That generally means tuning scaling factor and
thresholds as well, such that there are never more than say 50-100k dead
rows. Then running vacuum with no delays or limits runs quite fast and is
generally not noticeable/impactful.
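
As a concrete sketch of that kind of tuning (the table name and numbers are
only illustrative starting points, not a recommendation for every workload):

    -- per-table settings; equivalent GUCs exist for a cluster-wide default
    ALTER TABLE busy_table SET (
        autovacuum_vacuum_scale_factor = 0.0,   -- trigger on row count alone
        autovacuum_vacuum_threshold    = 50000, -- ~50k dead rows
        autovacuum_vacuum_cost_delay   = 0      -- no throttling for this table
    );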

However that strategy does not work well for vacuums which run long,
such as an anti-wraparound vacuum. So in my opinion we need to think
about this as at least two distinct cases requiring different solutions.


> I think we should do something similar to what we do for checkpoints.
> We estimate when the table will next need vacuuming based on the rate
> of XID advancement and the rate at which dead tuples are being
> created.  We can also estimate what percentage of the relation we've
> vacuumed and derive some estimate of when we'll be done - perhaps
> assuming only one index pass, for the sake of simplicity.  If we're
> behind, we should vacuum faster to try to catch up.  We could even try
> to include some fudge factor in the calculation - e.g. if the time
> until the next vacuum is estimated to be 30 hours from the start of
> the current vacuum, we try to make the current vacuum finish in no
> more than 75% * 30 hours = 22.5 hours.

This seems reasonable for the anti-wraparound case.

> I think this is better than your proposal to scale it just based on
> the size of the relation because it may be fine for the vacuum to run
> slowly if we're creating very few dead tuples and consuming very few
> XIDs.  IME, there's one very specific scenario where the wheels come
> off, and that's when the table doesn't get fully vacuumed before it's
> due to be vacuumed again.  Of course, anything we did here wouldn't be
> perfect - it would all be based on estimates - but I bet we could make
> things a lot better.  There's an even more global version of this
> problem, which is that you could have a situation where, when any given
> table gets vacuumed, it runs quickly enough to finish before that table gets
> vacuumed again, but there are lots of large tables so overall we don't
> make enough progress.  It would be nice to fix that, too, but even
> something simple that ignored that more global problem would help a
> lot of people.

This brings up a third scenario I've seen, which is lots (think
thousands) of individual tables needing vacuum almost constantly. In
that case autovacuum_naptime and autovacuum_max_workers also need tuning
or you never get to all of them.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



signature.asc
Description: OpenPGP digital signature


Re: [HACKERS] PATCH: use foreign keys to improve join estimates v1

2016-02-24 Thread Tomas Vondra

Hi,

On 09/30/2015 03:12 AM, David Rowley wrote:
...

CREATE TABLE f (id1 INT, id2 INT, PRIMARY KEY (id1, id2));

CREATE TABLE d1 (id1 INT, id2 INT, FOREIGN KEY (id1, id2) REFERENCES
f(id1, id2));
CREATE TABLE d2 (id1 INT, id2 INT, FOREIGN KEY (id1, id2) REFERENCES
f(id1, id2));

INSERT INTO f SELECT i, i FROM generate_series(1,100) s(i);

INSERT INTO d1 SELECT i, i FROM generate_series(1,10) s(i);
INSERT INTO d2 SELECT i, i FROM generate_series(1,30) s(i);

now, both pair-wise joins (f JOIN d1) and (f JOIN d2) are estimated
perfectly accurately, but as soon as the query involves both of
them, this happens:

SELECT * FROM f JOIN d1 ON (f.id1 = d1.id1 AND f.id2 = d1.id2)
 JOIN d2 ON (f.id1 = d2.id1 AND f.id2 = d2.id2);

   QUERY PLAN
-
  Nested Loop  (cost=3334.43..12647.57 rows=3 width=24)
   (actual time=221.086..1767.206 rows=10 loops=1)
Join Filter: ((d1.id1 = f.id1) AND (d1.id2 = f.id2))
->  Hash Join  (cost=3334.00..12647.01 rows=1 width=16)
   (actual time=221.058..939.482 rows=10 loops=1)
  Hash Cond: ((d2.id1 = d1.id1) AND (d2.id2 = d1.id2))
  ->  Seq Scan on d2  (cost=0.00..4328.00 rows=30 width=8)
  (actual time=0.038..263.356 rows=30 loops=1)
  ->  Hash  (cost=1443.00..1443.00 rows=10 width=8)
(actual time=220.721..220.721 rows=10 loops=1)
Buckets: 131072  Batches: 2  Memory Usage: 2982kB
->  Seq Scan on d1  (cost=0.00..1443.00 rows=10 ...)
(actual time=0.033..101.547 rows=10 loops=1)
->  Index Only Scan using f_pkey on f  (cost=0.42..0.54 rows=1 ...)
 (actual time=0.004..0.004 rows=1 loops=10)
  Index Cond: ((id1 = d2.id1) AND (id2 = d2.id2))
  Heap Fetches: 10

Clearly, the inner join (d1 JOIN d2) is poorly estimated (1 vs.
10). I assume that's only because we find FK only on the second
join with f.

So it seems like s a clear improvement, both compared to master and
the previous versions of the patch.


I've been experimenting with this example. Of course, the reason why we
get the 1 row estimate on the join between d1 and d2 is that there's no
foreign key between those two relations.

The attached patch changes things so that the foreign key matching code
is better able to see foreign keys "hidden" behind eclasses. So it does
now in fact detect a foreign key on d2 referencing d1, by looking for
foreign keys whose Vars are in the same eclasses that the joinquals are
built from. This has improved the result:

postgres=# EXPLAIN ANALYZE SELECT * FROM f JOIN d1 ON (f.id1 = d1.id1
AND f.id2 = d1.id2) JOIN d2 ON (f.id1 = d2.id1 AND f.id2 = d2.id2);

  QUERY PLAN
-
  Hash Join  (cost=16655.94..26066.95 rows=3 width=24) (actual
time=267.322..468.383 rows=10 loops=1)
Hash Cond: ((d2.id1 = f.id1) AND (d2.id2 = f.id2))
->  Seq Scan on d2  (cost=0.00..4328.00 rows=30 width=8) (actual
time=0.019..31.396 rows=30 loops=1)
->  Hash  (cost=14666.94..14666.94 rows=10 width=16) (actual
time=266.263..266.263 rows=10 loops=1)
  Buckets: 131072  Batches: 2  Memory Usage: 3373kB
  ->  Merge Join  (cost=9748.32..14666.94 rows=10 width=16)
(actual time=104.494..224.908 rows=10 loops=1)
Merge Cond: ((f.id1 = d1.id1) AND (f.id2 = d1.id2))
->  Index Only Scan using f_pkey on f
  (cost=0.42..36214.93 rows=100 width=8) (actual time=0.045..35.758
rows=11 loops=1)
  Heap Fetches: 11
->  Sort  (cost=9747.82..9997.82 rows=10 width=8)
(actual time=104.440..122.401 rows=10 loops=1)
  Sort Key: d1.id1, d1.id2
  Sort Method: external sort  Disk: 2152kB
  ->  Seq Scan on d1  (cost=0.00..1443.00
rows=10 width=8) (actual time=0.019..9.443 rows=10 loops=1)

The problem is that the code I added is sometimes a bit too optimistic
at finding a suitable foreign key. When performing estimates for the
join between (f,d1) <-> (d2), since the code loops over each relation
making up the set of relations at either side of the join, we find a
foreign key on 'f' which references d2; this one actually exists. It
then goes on and also finds a foreign key for (d1) referencing (d2); of
course this one does not exist and is only found due to the eclasses.
The problem here is: which one do we use? If we multiply the selectivity
for each of these foreign keys then we'd end up with a 

Re: [HACKERS] The plan for FDW-based sharding

2016-02-24 Thread Bruce Momjian
On Wed, Feb 24, 2016 at 01:02:21PM -0300, Alvaro Herrera wrote:
> Bruce Momjian wrote:
> > On Wed, Feb 24, 2016 at 01:08:29AM +, Simon Riggs wrote:
> 
> > > It's never been our policy to try to include major projects in single code
> > > drops. Any move of XL/XC code into PostgreSQL core would need to be done 
> > > piece
> > > by piece across many releases. XL is definitely too big for the elephant 
> > > to eat
> > > in one mouthful.
> > 
> > Is there any plan to move the XL/XC code into Postgres?  If so, I have
> > not heard of it.  I thought everyone agreed it was too much code change,
> > which is why it is a separate code tree.  Is that incorrect?
> 
> Yes, I think that's incorrect.
> 
> What was said, as I understood it, is that Postgres-XL is too big to
> merge in a single commit -- just like merging BDR would have been.
> Indulge me while I make a parallel with BDR for a bit.
> 2ndQuadrant never pushed for merging BDR in a single commit; what was
> done was to split it, and propose individual pieces for commit.  Many of
> these pieces are now already committed (event triggers, background
> workers, logical decoding, replication slots, and many others).  The
> "BDR patch" is now much smaller, and it's quite possible that we will
> see it merged someday.  Will it be different from what it was when the
> BDR project started, all those years ago?  You bet.  Having the
> prototype BDR initially was what allowed the whole plan to make sense,
> because it showed that the pieces interacted in the right ways to make
> it work as a whole.

Yes, that is my understanding too.

> (I'm not saying 2ndQuadrant is so wise to do things this way.  I'm
> pretty sure you can see the same thing in parallel query development,
> for instance.)
> 
> In the same way, Postgres-XL is far too big to merge in a single commit.
> But that doesn't mean it will never be merged.  What is more likely to
> happen instead is that some pieces of it are going to be submitted
> separately for consideration.  It is a slow process, but progress is
> real and tangible.  We know this process will yield a useful outcome,

I was not aware there was any process to merge XC/XL into Postgres, at
least from the XC/XL side.  I know there is desire to take code from
XC/XL on the FDW-sharding side.

I think the most conservative merge approach is to try to enhance
existing Postgres features first (FDWs, partitioning, parallelism),
perhaps features that didn't exist at the time XC/XL were designed. If
they work, keep them and add the XC/XL-specific parts.  If the
enhance-features approach doesn't work, we then have to consider how
much additional code will be needed.  We have to evaluate this for the
FDW-based approach too, but it is likely to be smaller, which is its
attraction.

> because the architecture has already been proven by the existence of
> Postgres-XL itself.  It's the prototype that proves the overall design,
> even if the pieces change shape during the process.  (Really, it's way
> more than merely a prototype at this point because of how long it has
> matured.)

True, it is beyond a prototype.

> In contrast, we don't have a prototype for FDW-based sharding; as you
> admitted, there is no actual plan, other than "let's push FDWs in this
> direction and hope that sharding will emerge".  We don't really know
> what pieces we need or how will they interact with each other; we have a
> vague idea of a direction but there's no clear path forward.  As the
> saying goes, if you don't know where you're going, you will probably end
> up somewhere else.

I think I have covered that already.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The plan for FDW-based sharding

2016-02-24 Thread Alvaro Herrera
Bruce Momjian wrote:
> On Wed, Feb 24, 2016 at 01:08:29AM +, Simon Riggs wrote:

> > It's never been our policy to try to include major projects in single code
> > drops. Any move of XL/XC code into PostgreSQL core would need to be done 
> > piece
> > by piece across many releases. XL is definitely too big for the elephant to 
> > eat
> > in one mouthful.
> 
> Is there any plan to move the XL/XC code into Postgres?  If so, I have
> not heard of it.  I thought everyone agreed it was too much code change,
> which is why it is a separate code tree.  Is that incorrect?

Yes, I think that's incorrect.

What was said, as I understood it, is that Postgres-XL is too big to
merge in a single commit -- just like merging BDR would have been.
Indulge me while I make a parallel with BDR for a bit.
2ndQuadrant never pushed for merging BDR in a single commit; what was
done was to split it, and propose individual pieces for commit.  Many of
these pieces are now already committed (event triggers, background
workers, logical decoding, replication slots, and many others).  The
"BDR patch" is now much smaller, and it's quite possible that we will
see it merged someday.  Will it be different from what it was when the
BDR project started, all those years ago?  You bet.  Having the
prototype BDR initially was what allowed the whole plan to make sense,
because it showed that the pieces interacted in the right ways to make
it work as a whole.

(I'm not saying 2ndQuadrant is so wise to do things this way.  I'm
pretty sure you can see the same thing in parallel query development,
for instance.)

In the same way, Postgres-XL is far too big to merge in a single commit.
But that doesn't mean it will never be merged.  What is more likely to
happen instead is that some pieces of it are going to be submitted
separately for consideration.  It is a slow process, but progress is
real and tangible.  We know this process will yield a useful outcome,
because the architecture has already been proven by the existence of
Postgres-XL itself.  It's the prototype that proves the overall design,
even if the pieces change shape during the process.  (Really, it's way
more than merely a prototype at this point because of how long it has
matured.)

In contrast, we don't have a prototype for FDW-based sharding; as you
admitted, there is no actual plan, other than "let's push FDWs in this
direction and hope that sharding will emerge".  We don't really know
what pieces we need or how will they interact with each other; we have a
vague idea of a direction but there's no clear path forward.  As the
saying goes, if you don't know where you're going, you will probably end
up somewhere else.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The plan for FDW-based sharding

2016-02-24 Thread Bruce Momjian
On Wed, Feb 24, 2016 at 09:34:37AM -0500, Bruce Momjian wrote:
> > I have nothing against particular FDW advances. However, it's unclear for me
> > that FDW should be the only sharding approach.
> > It's unproven that FDW can do work that Postgres XC/XL does. With FDW we can
> > have some low-hanging fruits. That's good.
> > But it's unclear we can have high-hanging fruits (like data redistribution)
> > with FDW approach. And if we can it's unclear that it would be easier than 
> > with
> > other approaches.
> > Just let's don't call this community chosen plan for implementing sharding.
> > Until we have full picture we can't select one way and reject others.
> 
> I agree.  I think the FDW approach is the only existing approach for
> built-in sharding though.  The forks of Postgres doing sharding are,
> just that, forks and just Postgres community ecosystem projects.   (Yes,
> they are open source.)  If the forks were community-chosen plans we
> hopefully would not have 5+ of them.  If FDW works, it has the potential
> to be the community-chosen plan, at least for the workloads it supports,
> because it is built into community Postgres in a way the others cannot.
> 
> That doesn't mean the forks go away, but rather their value is in doing
> things the FDW approach can't, but there are a lot of "if's" in there.

Actually, this seems similar to how we handled replication.  For years
we had multiple external replication solutions.  When we implemented
streaming replication, we knew it would become the default for workloads
it supports.  The external solutions didn't go away, but their value was
in handling workloads that streaming replication didn't support.

I think the only difference is that we knew streaming replication would
have this effect before we implemented it, while with FDW-based
sharding, we don't know.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The plan for FDW-based sharding

2016-02-24 Thread Bruce Momjian
On Wed, Feb 24, 2016 at 12:22:20PM +0300, Konstantin Knizhnik wrote:
> Sorry, but based on this plan it is possible to make a conclusion
> that there are only two possible cluster solutions for Postgres:
> XC/XL and FDW-based.  From my point of view there are  much more
> possible alternatives.
> Our main idea with XTM (eXtensible Transaction Manager API) was to
> make it possible to develop cluster solutions for Postgres as
> extensions without patching code of Postgres core. And FDW is one of
> the mechanism which makes it possible to reach this goal.

Yes, this is a good example of code reuse.

> IMHO it will be hard to implement efficient execution of complex
> OLAP queries (including cross-node joins  and aggregation) within
> FDW paradigm. It will be necessary to build distributed query
> execution plan and coordinate it execution at cluster nodes. And
> definitely we need specialized optimizer for distributed queries.
> Right now solution of the problem are provided by XL and Greenplum,
> but both are forks of Posrgres with a lot of changes in Postgres
> core. The challenge is to provide the similar functionality, but at
> extension level (using custom nodes, pluggable transaction manager,
> ...).

Agreed.

> But, as you noticed,  complex OLAP is just one of the scenarios and
> this is not the only possible way of using clusters. In some cases
> FDW-based sharding can be quite efficient. Or pg_shard approach
> which also adds sharding at extension level and in some aspects is
> more flexible than FDW-based solution. Not all scenarios require
> global transaction manager. But if one need global consistency, then
> XTM API allows to provide ACID for both approaches (and not only for
> them).

Yep.

> We currently added to commitfest our XTM patch together with
> postgres_fdw patch integrating timestamp-based DTM implementation in
> postgres_fdw. It illustrates how global consistency can be reached
> for FDW-based sharding.
> If this XTM patch will be committed, then in 9.6 we will have wide
> flexibility to play with different distributed transaction managers.
> And it can be used for many cluster solutions.
> 
> IMHO it will be very useful to extend your classification of cluster
> use cases, more precisely  formulate demands in all cases,
> investigate  how them can be covered by existed cluster solutions
> for Postgres and which niches are still vacant. We are currently
> continue work on "multimaster" - some more convenient alternative to
> hot-standby replication. Looks like PostgreSQL is missing some
> product providing functionality similar to Oracle RAC or MySQL
> Gallera. It is yet another direction of cluster development for
> PostgreSQL.  Let's be more open and flexible.

Yes, I listed only the workloads I could think of.  It would be helpful
to list more workloads and start to decide what can be accomplished with
each approach.  I don't even know all the workloads supported by the
sharding forks of Postgres.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The plan for FDW-based sharding

2016-02-24 Thread Bruce Momjian
On Wed, Feb 24, 2016 at 12:35:15PM +0300, Oleg Bartunov wrote:
> I have nothing against particular FDW advances. However, it's unclear for
> me that FDW should be the only sharding approach.
> It's unproven that FDW can do work that Postgres XC/XL does. With FDW we
> can have some low-hanging fruits. That's good.
> But it's unclear we can have high-hanging fruits (like data 
> redistribution)
> with FDW approach. And if we can it's unclear that it would be easier than
> with other approaches.
> Just let's don't call this community chosen plan for implementing 
> sharding.
> Until we have full picture we can't select one way and reject others.
> 
> 
> I already several times pointed, that we need XTM to be able to continue
> development in different directions, since there is no clear winner.  
> Moreover,
> I think there is no fits-all  solution and while I agree we need one built-in
> in the core, other approaches should have ability to exists without patching.

Yep.  I think much of what we eventually add to core will be either
copied from an existing solution, which then doesn't need to be
maintained anymore, or used by existing solutions.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The plan for FDW-based sharding

2016-02-24 Thread Bruce Momjian
On Wed, Feb 24, 2016 at 12:17:28PM +0300, Alexander Korotkov wrote:
> Hi, Bruce!
> 
> The important point for me is to distinguish different kind of plans:
> implementation plan and research plan.
> If we're talking about implementation plan then it should be proven that
> proposed approach works in this case. I.e research should be already done.
> If we're talking about research plan then we should realize that result is
> unpredictable. And we would probably need to dramatically change our way.

Yes, good point.  I would say FDW-based sharding is certainly still a
research approach, but an odd one because we are adding code even while
in research mode.  I think that is possible because the FDW improvements
have other uses beyond sharding.

I think another aspect is that we already know that modifying the
Postgres source code can produce a useful sharding solution --- XC, XL,
Greenplum, and CitusDB all prove that, and pg_shard does it as a plugin.
So, we know that with unlimited code changes, it is possible.  What we
don't know is whether it is possible with acceptable code changes, and
how much of the feature-set can be supported this way.

We had a similar case with the Windows port, where SRA (my employer at
the time) and Nusphere both had native Windows ports of Postgres, and
they supplied source code to help with the port.  So, in that case also,
we knew a native Windows port was possible, and we (or at least I) could
see the code that was required to do it.  The big question was whether a
native Windows port could be added in a community-acceptable way, and
the community agreed we could try if we didn't make the code messier ---
that was a success.

For pg_upgrade, I had code from EDB (my employer at the time) that kind
of worked, but needed lots of polish, and again, I could do it in
contrib as long as I didn't mess up the backend code --- that worked
well too.

So, I guess I am saying, the FDW/sharding thing is a research project,
but one that is implementing code because of existing proven solutions
and because the improvements are benefiting other use-cases beyond
sharding.

Also, in the big picture, the existence of many Postgres forks, all
doing sharding, indicates that there is demand for this capability, and
if we can get some this capability into Postgres we will increase the
number of people using native Postgres.  We might also be able to reduce
the amount of duplicate work being done in all these forks and allow
them to more easily focus on more advanced use-cases.

> This two things would work with FDW:
> 1) Pull data from data nodes to coordinator.
> 2) Pushdown computations from coordinator to data nodes: joins, aggregates 
> etc.
> It's proven and clear. This is good.
> Another point is that these FDW advances are useful by themselves. This is 
> good
> too.
> 
> However, the model of FDW assumes that communication happens only between
> the coordinator and data nodes. But full-weight distributed optimization can't be
> done under this restriction, because it requires every node to communicate with
> every other node if that makes the distributed query faster. And as I understand
> it, the FDW approach currently has no research and no particular plan for that.

This is very true.  I imagine cross-node connections will certainly
complicate the implementation and lead to significant code changes,
which might be unacceptable.  I think we need to go with a
non-cross-node implementation first, then if that is accepted, we can
start to think what cross-node code changes would look like.  It
certainly would require FDW knowledge to exist on every shard.  Some
have suggested that FDWs wouldn't work well for cross-node connections
or wouldn't scale and we shouldn't be using them --- I am not sure what
to think of that.

> As I get from Robert Haas's talk 
> (https://docs.google.com/viewer?a=v&pid=sites&;
> srcid=ZGVmYXVsdGRvbWFpbnxyb2JlcnRtaGFhc3xneDo1ZmFhYzBhNjNhNzVhMDM0)
> 
> Before we consider repartitioning joins, we should probably get everything
> previously discussed working first.
> – Join Pushdown For Parallelism, FDWs
> – PartialAggregate/FinalizeAggregate
> – Aggregate Pushdown For Parallelism, FDWs
> – Declarative Partitioning
> – Parallel-Aware Append
> 
> 
> So, as I get we didn't ever think about possibility of data redistribution
> using FDW. Probably, something changed since that time. But I haven't heard
> about it.

No, you didn't miss it.  :-(  We just haven't gotten to studying that
yet.  One possible outcome is that built-in Postgres has non-cross-node
sharding, and forks of Postgres have cross-node sharding, again assuming
cross-node sharding requires an unacceptable amount of code change.  I
don't think anyone knows the answer yet.

> On Tue, Feb 23, 2016 at 7:43 PM, Bruce Momjian  wrote:
> 
> Second, as part of this staged implementation, there are several use
> cases that will be shardable at first, and then only later, more complex
> ones.  For example, here 

Re: [HACKERS] Prepared Statement support for Parallel query

2016-02-24 Thread Robert Haas
On Wed, Feb 17, 2016 at 6:41 PM, Amit Kapila  wrote:
> Commit d1b7c1ffe72e86932b5395f29e006c3f503bc53d has added
> the support for passing bind parameters to parallel workers, however
> prepared statement which uses bind parameters wasn't enabled
> for parallel query as the 'Execute' message in FE-BE protocol
> can pass the row_count which can make parallel plans unusable.
> (parallel plans are only possible when query can run to completion)
>
> Later Commit bfc78d7196eb28cd4e3d6c24f7e607bacecf1129 has
> ensure that if the row_count is non-zero then we won't enter
> parallel mode which means that even if parallel plan is selected
> by optimizer, it will run such a plan locally.
>
> With above support, it was just a matter of enabling parallel
> mode for prepared statements which is done in attached patch
> (prepared_stmt_parallel_query_v1.patch).
>
> I have tested that parallel plans are getting generated both
> via Prepare/Execute statements and libpq prepared
> statement execution.  Attached is a libpq program
> (prepare_parallel_query.c) which I have used for testing prepared
> statement support.  I have done the verification manually
> (using auto_explain) to ensure that parallel plans gets generated
> and executed via this libpq program.  This program expects some
> data to be generated before-hand and the information of same is
> added in file-header.

Hmm.   I agree we should change exec_parse_message like this, but
changing PrepareQuery seems wrong.  I mean, there's a very good chance
that a parse message will be followed by an Execute message with a
zero row count, so we'll get parallel execution.  But if the user says
they want to PREPARE the query, they are probably not going to fetch
all rows.
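
For reference, the case that does benefit is the plain flow below (illustrative
only; it assumes a pgbench-style table populated beforehand, and whether a
parallel plan is actually chosen still depends on the planner settings):

    PREPARE cnt AS SELECT count(*) FROM pgbench_accounts;
    EXECUTE cnt;  -- runs to completion, so a parallel plan stays usable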

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Writing new unit tests with PostgresNode

2016-02-24 Thread Michael Paquier
On Tue, Feb 23, 2016 at 10:39 PM, Craig Ringer  wrote:
> Just finished doing that. Thanks for taking a look at the first patch so
> quickly. I hope this one's better.
>
> FWIW I think you were right that using pod for the actual test wasn't the
> best choice, I should've just used comments. I do think it's important for
> the modules to have structured docs.



> I've removed the example suite in favour of adding a SYNOPSIS section to
> PostgresNode.pm and describing the rest in the README. It won't be necessary
> once your replication tests go in, they'll be a perfectly adequate example.

Software engineering is an adapt-and-survive-or-die universe, and I
usually choose the former way of thinking on this ML. In short, I
don't mind rebasing on top of this stuff as needed or putting effort
where need be if the overall result is better.

> I also cut out the changes to the backup method; I'll send a pull to add to
> your proposed replication patch instead.

Thanks.

For patch 1:
+Not all these tests get run by "make check". Check src/test/Makefile to see
+which tests get run automatically.
Perhaps adding that those are listed under SUBDIRS, ALWAYS_SUBDIRS
meaning that those paths are not part of the main suite?

+examples/
+  demo programs for libpq that double as regression tests via "make check"
s/demo/demonstration

Nitpick: adding a dot at the end of the sentences describing each folder?

+  extensions used only or mainly for test purposes, generally not useful
+  or suitable for installing in production databases. Some of these have
+  their own tests, some are used by tests elsewhere.
I wouldn't say not useful, just not suitable.

+  infrastructure for Perl-based Test::More tests. There are no actual tests
+  here, the code is used by other tests in src/bin/, contrib/ and in the
+  subdirectories of src/test.
Hm, those are called TAP tests (Test Anything Protocol) and not
Test::More tests, no?

+elsewhere in the test tree.
"test tree" or "code tree"?

Those two files are good additions btw.

For patch 2:
(Somebody more familiar than I would be better commenting on the
format that this patch proposes for PostgresNode.pm).

+  use PostgresNode;
+
+  my $node = get_new_node('mynode');
It would be good to add a comment like grab a new node and assign a
free port to it.

+It requires IPC::Run - install libperl-ipc-run (Debian/Ubuntu) or perl-IPC-Run
+(Fedora/RHEL).
Archlinux: perl-ipc-run. I would remove the package portion, honestly;
this is really platform dependent and there are as many package names
as there are platforms.

+around Test::More functions to run commands with an envronment set up to
s/envronment/environment

+You should generally prefer to use get_new_node() instead since it takes care
+of finding port numbers, registering instances for cleanup, etc.
Is "you" something used a lot in perl module docs? This is not really
Postgres-like.

+=item $node->data_dir()
All those items can be listed without parenthesis.

+If archiving is enabled, WAL files go here.
Location of WAL segments when WAL archiving is enabled.

+=cut
+
+sub info
+{
Do you have a use case in mind for PostgresNode::info? If this is just
a doc patch I would suggest adding that as a separate patch if that's
really needed.

+   $self->host eq $test_pghost
+ or die "set_replication_conf only works with the default host";
Er, why? On Windows 127.0.0.1 is used all the time. On Linux local is
used, but a node can still connect to another node via replication by
using the connection string of the node it is connecting to.

+Create a hot backup with pg_basebackup in $node->backup_dir,
+including the transaction logs. xlogs are fetched at the
+end of the backup, not streamed.
s/xlogs/WAL data.

+You'll have to configure a suitable max_wal_senders on the
+target server since it isn't done by default.
My patch does that :) We could remove it.

+src/test/ssl, or should be added to one of the suites for an existing utility
Nitpick: dot at the end of a sentence.

+You should add the new suite to src/test/Makefile . See the comments there.
Nitpick: space-dot.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2016-02-24 Thread Robert Haas
On Mon, Feb 22, 2016 at 10:05 AM, Amit Kapila  wrote:
>> I wouldn't bother tinkering with it at this point.  The value isn't
>> going to be recorded on disk anywhere, so it will be easy to change
>> the way it's computed in the future if we ever need to do that.
>>
>
> Okay. Find the rebased patch attached with this mail.  I have moved
> this patch to upcoming CF.

I would call the functions pgstat_report_wait_start() and
pgstat_report_wait_end() instead of pgstat_report_start_waiting() and
pgstat_report_end_waiting().
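
To illustrate the calling convention I have in mind, here is a minimal
sketch; the wait_event_info argument and the wait_and_report() wrapper are
just assumptions for illustration, since the patch defines the real
signature.

    /*
     * Sketch only: bracket the blocking operation with a start/end report
     * so the wait is visible while it is in progress.
     */
    static void
    wait_and_report(uint32 wait_event_info)
    {
        pgstat_report_wait_start(wait_event_info);
        /* ... block here, e.g. on a semaphore or a latch ... */
        pgstat_report_wait_end();
    }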

I think pgstat_get_wait_event_type should not return HWLock, a term
that appears nowhere in our source tree at present.  How about just
"Lock"?

I think the wait event types should be documented - and the wait
events too, perhaps.

Maybe it's worth having separate wait event type names for lwlocks and
lwlock tranches.  We could report LWLockNamed and LWLockTranche and
document the difference: "LWLockNamed indicates that the backend is
waiting for a specific, named LWLock.  The event name is the name of
that lock.  LWLockTranche indicates that the backend is waiting for
any one of a group of locks with similar function.  The event name
identifies the general purpose of locks in that group."

There's no requirement that every session have every tranche
registered.  I think we should consider displaying "extension" for any
tranche that's not built-in, or at least for tranches that are not
registered (rather than "unknown wait event").

+   if (lock->tranche == 0 && lockId < NUM_INDIVIDUAL_LWLOCKS)

Isn't the second part automatically true at this point?

The changes to LockBufferForCleanup() don't look right to me.  Waiting
for a buffer pin might be a reasonable thing to define as a wait
event, but it shouldn't reported as if we were waiting on the LWLock
itself.

What happens if an error is thrown while we're in a wait?

Does this patch hurt performance?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GIN data corruption bug(s) in 9.6devel

2016-02-24 Thread Tomas Vondra

On 02/24/2016 06:56 AM, Robert Haas wrote:

On Wed, Feb 24, 2016 at 9:17 AM, Tomas Vondra
 wrote:

...

Are we going to do anything about this? While the bug is present in 9.5 (and
possibly other versions), fixing it before 9.6 gets out seems important
because reproducing it there is rather trivial (while I've been unable to
reproduce it on 9.5).


I'm not going to promise to commit anything here, because GIN is not
usually my area, but could you provide a link to the email that
contains the patch you think should be committed?


Sure. There are actually two candidate patches in two separate threads, 
I'm not sure which one is better. Based on the testing both seem to fix 
the issue and the "pending lock" patch produces much smaller indexes (at 
least in my benchmark):


[1] http://www.postgresql.org/message-id/56041b26.2040...@sigaev.ru
[2] 
http://www.postgresql.org/message-id/CAMkU=1w7uu1gz8n0bxmykrlgth-uph+gphfhmneryzpcv7f...@mail.gmail.com


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Failover Slots

2016-02-24 Thread Craig Ringer
On 24 February 2016 at 03:53, Oleksii Kliukin  wrote:


>
> I found the following issue when shutting down a master with a connected
> replica that uses a physical failover slot:
>
> 2016-02-23 20:33:42.546 CET,,,54998,,56ccb3f3.d6d6,3,,2016-02-23 20:33:07
> CET,,0,DEBUG,0,"performing replication slot checkpoint",""
> 2016-02-23 20:33:42.594 CET,,,55002,,56ccb3f3.d6da,4,,2016-02-23 20:33:07
> CET,,0,DEBUG,0,"archived transaction log file
> ""00010003""",""
> 2016-02-23 20:33:42.601 CET,,,54998,,56ccb3f3.d6d6,4,,2016-02-23 20:33:07
> CET,,0,PANIC,XX000,"concurrent transaction log activity while database
> system is shutting down",""
> 2016-02-23 20:33:43.537 CET,,,54995,,56ccb3f3.d6d3,5,,2016-02-23 20:33:07
> CET,,0,LOG,0,"checkpointer process (PID 54998) was terminated by signal
> 6: Abort trap",""
> 2016-02-23 20:33:43.537 CET,,,54995,,56ccb3f3.d6d3,6,,2016-02-23 20:33:07
> CET,,0,LOG,0,"terminating any other active server processes",
>
>
Odd that I didn't see that in my testing. Thanks very much for this. I
concur with your explanation.

Basically, the issue is that CreateCheckPoint calls
> CheckpointReplicationSlots, which currently produces WAL, and this violates
> the assumption at line xlog.c:8492
>
> if (shutdown && checkPoint.redo != ProcLastRecPtr)
> ereport(PANIC,
> (errmsg("concurrent transaction log activity while database system is
> shutting down")));
>

Interesting problem.

It might be reasonably harmless to omit writing WAL for failover slots
during a shutdown checkpoint. We're using WAL to move data to the replicas
but we don't really need it for local redo and correctness on the master.
The trouble is that we do of course redo failover slot updates on the
master and we don't really want a slot to go backwards vs its on-disk state
before a crash. That's not too harmful in itself, but it could lead to us
losing a slot's catalog_xmin increase, so the slot would think catalog rows
that have actually been vacuumed away are still readable.

CheckpointReplicationSlots notes that:

 * This needn't actually be part of a checkpoint, but it's a convenient
 * location.

... and I suspect the answer there is simply to move the slot checkpoint to
occur prior to the WAL checkpoint rather than during it. I'll investigate.
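
Roughly what I'm thinking of, as an untested sketch only (the real
CreateCheckPoint() logic is considerably more involved, and this
simplifies the flag handling):

    /*
     * Sketch, not the actual patch: flush replication slot state before
     * the checkpoint's redo pointer is established, so any WAL written
     * for failover slots lands ahead of the redo point and the
     * "concurrent transaction log activity" check no longer trips at
     * shutdown.
     */
    void
    CreateCheckPoint(int flags)
    {
        /* Write out slot state (and any slot WAL) first ... */
        CheckpointReplicationSlots();

        /*
         * ... then proceed as today: quiesce WAL insertion, compute
         * checkPoint.redo, write the checkpoint record, and for a
         * shutdown checkpoint verify that nothing else wrote WAL:
         *
         *     if (shutdown && checkPoint.redo != ProcLastRecPtr)
         *         ereport(PANIC, ...);
         */
    }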


I really want to focus on the first patch, timeline following for logical
slots. That part is much less invasive and is useful stand-alone. I'll move
it to a separate CF entry and post it to a separate thread as I think it
needs consideration independently of failover slots.


(BTW, the slot docs promise that slots will replay a change exactly once,
but this is not correct and the client must keep track of replay position.
I'll post a patch to correct it separately).


> There are a couple of incorrect comments
>

Thanks, will amend.


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] plpgsql - DECLARE - cannot to use %TYPE or %ROWTYPE for composite types

2016-02-24 Thread Artur Zakirov

On 21.02.2016 11:31, Pavel Stehule wrote:

Hi

I am sending updated version - the changes are related to fix comments.



Great.

I am new to reviewing; I think Pavel took all the comments into account. 
The patch compiles and the regression tests pass. So I have changed its 
status to "Ready for Committer".





 By the way, these functions are misnamed after this patch. They are
 called "wordtype" and "cwordtype" originally because they accept
 "word%TYPE" and "compositeword%TYPE", but after the patch they not only
 accept TYPE at the right of the percent sign but also ELEMENTTYPE and
 ARRAYTYPE.  Not sure that this is something we want to be too strict
 about.


Understood - used the name ***reftype instead of ***type


I am not sure, but it seems that the new names are a little worse. I
think the original names are good too. They accept a word and return the
PLpgSQL_type structure.


The "TYPE" word in this name was related to syntax %TYPE. And because
new syntax allows more constructs, then the change name is correct. I am
think. But choosing names is hard work. The new name little bit more
strongly show relation to work with referenced types.



Agree.


--
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The plan for FDW-based sharding

2016-02-24 Thread Oleg Bartunov
On Wed, Feb 24, 2016 at 12:17 PM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> Hi, Bruce!
>
> The important point for me is to distinguish different kinds of plans:
> implementation plans and research plans.
> If we're talking about an implementation plan, then it should be proven that
> the proposed approach works in this case, i.e. the research should already be done.
> If we're talking about a research plan, then we should realize that the result is
> unpredictable, and we would probably need to dramatically change our way.
>
> These two things would work with FDW:
> 1) Pull data from data nodes to coordinator.
> 2) Pushdown computations from coordinator to data nodes: joins, aggregates
> etc.
> It's proven and clear. This is good.
> Another point is that these FDW advances are useful by themselves. This is
> good too.
>
> However, the model of FDW assumes that communication happens only between
> the coordinator and a data node. But a full-weight distributed optimizer can't be
> built under this restriction, because it requires every node to communicate
> with every other node if that makes the distributed query faster. And as I
> understand it, the FDW approach currently has no research and no particular plan for that.
>
> As I get from Robert Haas's talk (
> https://docs.google.com/viewer?a=v&pid=sites&srcid=ZGVmYXVsdGRvbWFpbnxyb2JlcnRtaGFhc3xneDo1ZmFhYzBhNjNhNzVhMDM0
> )
>
>> Before we consider repartitioning joins, we should probably get
>> everything previously discussed working first.
>> – Join Pushdown For Parallelism, FDWs
>> – PartialAggregate/FinalizeAggregate
>> – Aggregate Pushdown For Parallelism, FDWs
>> – Declarative Partitioning
>> – Parallel-Aware Append
>
>
> So, as I understand it, we haven't ever thought about the possibility of data
> redistribution using FDW. Probably something has changed since that time, but
> I haven't heard about it.
>
> On Tue, Feb 23, 2016 at 7:43 PM, Bruce Momjian  wrote:
>
>> Second, as part of this staged implementation, there are several use
>> cases that will be shardable at first, and then only later, more complex
>> ones.  For example, here are some use cases and the technology they
>> require:
>>
>> 1. Cross-node read-only queries on read-only shards using aggregate
>> queries, e.g. data warehouse:
>>
>> This is the simplest to implement as it doesn't require a global
>> transaction manager, global snapshot manager, and the number of rows
>> returned from the shards is minimal because of the aggregates.
>>
>> 2. Cross-node read-only queries on read-only shards using non-aggregate
>> queries:
>>
>> This will stress the coordinator to collect and process many returned
>> rows, and will show how well the FDW transfer mechanism scales.
>>
>
> FDW would work for queries which fit the pull-pushdown model. I see no plan
> to make other queries work.
>
>
>> 3. Cross-node read-only queries on read/write shards:
>>
>> This will require a global snapshot manager to make sure the shards
>> return consistent data.
>>
>> 4. Cross-node read-write queries:
>>
> This will require a global transaction manager and a global snapshot manager.
>>
>
> At this point, it is unclear why you don't refer to the work done in the direction
> of a distributed transaction manager (which is also a distributed snapshot
> manager in your terminology)
> http://www.postgresql.org/message-id/56bb7880.4020...@postgrespro.ru
>
>
>> In 9.6, we will have FDW join and sort pushdown
>> (http://thombrown.blogspot.com/2016/02/postgresql-96-part-1-horizontal-s
>> calability.html
>> ).
>> Unfortunately I don't think we will have aggregate
>> pushdown, so we can't test #1, but we might be able to test #2, even in
>> 9.5.  Also, we might have better partitioning syntax in 9.6.
>>
>> We need things like parallel partition access and replicated lookup
>> tables for more join pushdown.
>>
>> In a way, because these enhancements are useful independent of sharding,
>> we have not tested to see how well an FDW sharding setup will work and
>> for which workloads.
>>
>
> This is the point where I agree. I'm not objecting to any single FDW
> advance, because it's useful by itself.
>
> We know Postgres XC/XL works, and scales, but we also know they require
>> too many code changes to be merged into Postgres (at least based on
>> previous discussions).  The FDW sharding approach is to enhance the
>> existing features of Postgres to allow as much sharding as possible.
>>
>
> This comparison doesn't seem correct to me. Postgres XC/XL supports data
> redistribution between nodes. And I haven't heard a single idea about
> supporting this in FDW. You are comparing things that are not equal.
>
>
>> Once that is done, we can see what workloads it covers and
>> decide if we are willing to copy the volume of code necessary
>> to implement all supported Postgres XC or XL workloads.
>> (The Postgres XL license now matches the Postgres license,
>> http://www.postgres-xl.org/2015/07/license-change-and-9-5-merge/.
>> Postgres XC has

Re: [HACKERS] The plan for FDW-based sharding

2016-02-24 Thread Konstantin Knizhnik
Sorry, but based on this plan one could conclude that there are only two 
possible cluster solutions for Postgres: XC/XL and FDW-based.  From my 
point of view there are many more possible alternatives.
Our main idea with XTM (eXtensible Transaction Manager API) was to make 
it possible to develop cluster solutions for Postgres as extensions, 
without patching the Postgres core. And FDW is one of the mechanisms 
which make it possible to reach this goal.
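
To make the idea concrete, here is a purely illustrative sketch; the
struct and field names below are invented for this mail and are not the
actual XTM API.

    /*
     * Illustration of a pluggable transaction manager: core code calls
     * through a table of function pointers, and an extension can install
     * its own (for example, distributed) implementation at load time.
     */
    typedef struct TransactionManager
    {
        TransactionId (*GetNewTransactionId) (void);
        Snapshot      (*GetSnapshot) (void);
        XidStatus     (*GetTransactionStatus) (TransactionId xid);
    } TransactionManager;

    extern TransactionManager *TM;  /* points to the built-in
                                     * implementation by default */

    /* In an extension's _PG_init(): TM = &MyDistributedTM; */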


IMHO it will be hard to implement efficient execution of complex OLAP 
queries (including cross-node joins and aggregation) within the FDW 
paradigm. It will be necessary to build a distributed query execution 
plan and coordinate its execution across the cluster nodes. And we 
definitely need a specialized optimizer for distributed queries. Right 
now solutions to this problem are provided by XL and Greenplum, but both 
are forks of Postgres with a lot of changes in the Postgres core. The 
challenge is to provide similar functionality, but at the extension 
level (using custom nodes, a pluggable transaction manager, ...).


But, as you noticed, complex OLAP is just one of the scenarios and this 
is not the only possible way of using clusters. In some cases FDW-based 
sharding can be quite efficient. Or the pg_shard approach, which also 
adds sharding at the extension level and in some aspects is more 
flexible than the FDW-based solution. Not all scenarios require a global 
transaction manager. But if one needs global consistency, then the XTM 
API makes it possible to provide ACID for both approaches (and not only 
for them).


We have added to the commitfest our XTM patch together with a 
postgres_fdw patch integrating a timestamp-based DTM implementation in 
postgres_fdw. It illustrates how global consistency can be reached for 
FDW-based sharding.
If this XTM patch is committed, then in 9.6 we will have wide 
flexibility to play with different distributed transaction managers. And 
it can be used for many cluster solutions.


IMHO it will be very useful to extend your classification of cluster use 
cases, formulate the demands in each case more precisely, and investigate 
how they can be covered by existing cluster solutions for Postgres and 
which niches are still vacant. We are currently continuing work on 
"multimaster", a more convenient alternative to hot-standby replication. 
It looks like PostgreSQL is missing a product providing functionality 
similar to Oracle RAC or MySQL Galera. It is yet another direction of 
cluster development for PostgreSQL.  Let's be more open and flexible.



On 23.02.2016 19:43, Bruce Momjian wrote:

There was discussion at the FOSDEM/PGDay Developer Meeting
(https://wiki.postgresql.org/wiki/FOSDEM/PGDay_2016_Developer_Meeting)
about sharding so I wanted to outline where I think we are going with
sharding and FDWs.

First, let me point out that, unlike pg_upgrade and the Windows port,
which either worked or didn't work, sharding is going be implemented and
useful in stages.  It will take several years to complete, similar to
parallelism, streaming replication, and logical replication.

Second, as part of this staged implementation, there are several use
cases that will be shardable at first, and then only later, more complex
ones.  For example, here are some use cases and the technology they
require:

1. Cross-node read-only queries on read-only shards using aggregate
queries, e.g. data warehouse:

This is the simplest to implement as it doesn't require a global
transaction manager, global snapshot manager, and the number of rows
returned from the shards is minimal because of the aggregates.

2. Cross-node read-only queries on read-only shards using non-aggregate
queries:

This will stress the coordinator to collect and process many returned
rows, and will show how well the FDW transfer mechanism scales.

3. Cross-node read-only queries on read/write shards:

This will require a global snapshot manager to make sure the shards
return consistent data.

4. Cross-node read-write queries:

This will require a global transaction manager and a global snapshot manager.

In 9.6, we will have FDW join and sort pushdown
(http://thombrown.blogspot.com/2016/02/postgresql-96-part-1-horizontal-s
calability.html).  Unfortunately I don't think we will have aggregate
pushdown, so we can't test #1, but we might be able to test #2, even in
9.5.  Also, we might have better partitioning syntax in 9.6.

We need things like parallel partition access and replicated lookup
tables for more join pushdown.

In a way, because these enhancements are useful independent of sharding,
we have not tested to see how well an FDW sharding setup will work and
for which workloads.

We know Postgres XC/XL works, and scales, but we also know they require
too many code changes to be merged into Postgres (at least based on
previous discussions).  The FDW sharding approach is to enhance the
existing features of Postgres to allow as much sharding as possible.

Once that is done, we can see what workloa

Re: [HACKERS] The plan for FDW-based sharding

2016-02-24 Thread Alexander Korotkov
Hi, Bruce!

The important point for me is to distinguish different kinds of plans:
implementation plans and research plans.
If we're talking about an implementation plan, then it should be proven that
the proposed approach works in this case, i.e. the research should already be done.
If we're talking about a research plan, then we should realize that the result is
unpredictable, and we would probably need to dramatically change our way.

These two things would work with FDW:
1) Pull data from data nodes to coordinator.
2) Pushdown computations from coordinator to data nodes: joins, aggregates
etc.
It's proven and clear. This is good.
Another point is that these FDW advances are useful by themselves. This is
good too.

However, the model of FDW assumes that communication happens only between
the coordinator and a data node. But a full-weight distributed optimizer can't be
built under this restriction, because it requires every node to communicate
with every other node if that makes the distributed query faster. And as I
understand it, the FDW approach currently has no research and no particular plan for that.

As I get from Robert Haas's talk (
https://docs.google.com/viewer?a=v&pid=sites&srcid=ZGVmYXVsdGRvbWFpbnxyb2JlcnRtaGFhc3xneDo1ZmFhYzBhNjNhNzVhMDM0
)

> Before we consider repartitioning joins, we should probably get everything
> previously discussed working first.
> – Join Pushdown For Parallelism, FDWs
> – PartialAggregate/FinalizeAggregate
> – Aggregate Pushdown For Parallelism, FDWs
> – Declarative Partitioning
> – Parallel-Aware Append


So, as I understand it, we haven't ever thought about the possibility of data
redistribution using FDW. Probably something has changed since that time, but
I haven't heard about it.

On Tue, Feb 23, 2016 at 7:43 PM, Bruce Momjian  wrote:

> Second, as part of this staged implementation, there are several use
> cases that will be shardable at first, and then only later, more complex
> ones.  For example, here are some use cases and the technology they
> require:
>
> 1. Cross-node read-only queries on read-only shards using aggregate
> queries, e.g. data warehouse:
>
> This is the simplest to implement as it doesn't require a global
> transaction manager, global snapshot manager, and the number of rows
> returned from the shards is minimal because of the aggregates.
>
> 2. Cross-node read-only queries on read-only shards using non-aggregate
> queries:
>
> This will stress the coordinator to collect and process many returned
> rows, and will show how well the FDW transfer mechanism scales.
>

FDW would work for queries which fit the pull-pushdown model. I see no plan to
make other queries work.


> 3. Cross-node read-only queries on read/write shards:
>
> This will require a global snapshot manager to make sure the shards
> return consistent data.
>
> 4. Cross-node read-write queries:
>
> This will require a global transaction manager and a global snapshot manager.
>

At this point, it is unclear why you don't refer to the work done in the direction of
a distributed transaction manager (which is also a distributed snapshot manager
in your terminology)
http://www.postgresql.org/message-id/56bb7880.4020...@postgrespro.ru


> In 9.6, we will have FDW join and sort pushdown
> (http://thombrown.blogspot.com/2016/02/postgresql-96-part-1-horizontal-s
> calability.html
> ).
> Unfortunately I don't think we will have aggregate
> pushdown, so we can't test #1, but we might be able to test #2, even in
> 9.5.  Also, we might have better partitioning syntax in 9.6.
>
> We need things like parallel partition access and replicated lookup
> tables for more join pushdown.
>
> In a way, because these enhancements are useful independent of sharding,
> we have not tested to see how well an FDW sharding setup will work and
> for which workloads.
>

This is the point where I agree. I'm not objecting to any single FDW
advance, because it's useful by itself.

We know Postgres XC/XL works, and scales, but we also know they require
> too many code changes to be merged into Postgres (at least based on
> previous discussions).  The FDW sharding approach is to enhance the
> existing features of Postgres to allow as much sharding as possible.
>

This comparison doesn't seem correct to me. Postgres XC/XL supports data
redistribution between nodes. And I haven't heard a single idea about
supporting this in FDW. You are comparing things that are not equal.


> Once that is done, we can see what workloads it covers and
> decide if we are willing to copy the volume of code necessary
> to implement all supported Postgres XC or XL workloads.
> (The Postgres XL license now matches the Postgres license,
> http://www.postgres-xl.org/2015/07/license-change-and-9-5-merge/.
> Postgres XC has always used the Postgres license.)
>
> If we are not willing to add code for the missing Postgres XC/XL
> features, Postgres XC/XL will probably remain a separate fork of
> Postgres.  I don't think anyone knows the answer to this question, and 

Re: [HACKERS] Convert pltcl from strings to objects

2016-02-24 Thread Victor Wagner
On Tue, 23 Feb 2016 17:14:26 -0600
Jim Nasby  wrote:

> On 2/23/16 6:04 AM, Victor Wagner wrote:

> 
> > Please, send updated patch to the list in this thread, so it would
> > appear in the commitfest and I can mark your patch as "ready for
> > committer".  
> 
> Done!

Nice job. I've marked the patch as "Ready for Committer".



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-02-24 Thread Masahiko Sawada
On Wed, Feb 24, 2016 at 5:37 PM, Kyotaro HORIGUCHI
 wrote:
> Hello,
>
> Ok, I think we should concentrate on the parser part for now.
>
> At Tue, 23 Feb 2016 17:44:44 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI 
>  wrote in 
> <20160223.17.178687579.horiguchi.kyot...@lab.ntt.co.jp>
>> Hello,
>>
>> At Mon, 22 Feb 2016 22:52:29 +0900, Fujii Masao  
>> wrote in 
>> > Thanks for updating the patch!
>> >
>> > When I changed s_s_names to 'hoge*' and reloaded the configuration file,
>> > the server crashed unexpectedly with the following error message.
>> > This is obviously a bug.
>> >
>> > FATAL:  syntax error
>>
>> I had a glance on the lexer part in the new patch.  It'd be
>> better to design the lexer from the beginning according to the
>> required behavior.
>>
>> The documentation for the syntax is saying as the following,
>>
>> http://www.postgresql.org/docs/current/static/runtime-config-logging.html
>>
>> > application_name (string)
>> >
>> > The application_name can be any string of less than NAMEDATALEN
>> > characters (64 characters in a standard build).  Only
>> > printable ASCII characters may be used in the application_name
>> > value. Other characters will be replaced with question marks (?).
>>
>> And according to what the functions mentioned so far do, overall
>> an application_name is treated as follows, I suppose.
>>
>> - check_application_name() currently allows [\x20-\x7e], which
>>   differs from the definition of the SQL identifiers.
>>
>> - SplitIdentifierString() and syncrep code
>>
>>   - allows any byte except a double quote in double-quoted
>>representation. A double-quote just after a delimiter can open
>>quoted representation.
>>
>>   - Non-quoted name can contain any character including double
>> quotes except ',' and white spaces.
>>
>>   - The syncrep code does case-insensitive matching with the
>>application_name.
>>
>> So, to preserve or follow the current behavior except the last
>> one, the following pattern definitions would do. The
>> lexer/grammer for the new format of s_s_names could be simpler
>> than what it is.
>>
>> space [ \n\r\f\t\v] /* See the definition of isspace(3) */
>> whitespace{space}+
>> dquote\"
>> app_name_chars[\x21-\x2b\x2d-\x7e]   /* excluding ' ', ',' */
>> app_name_indq_chars [\x20\x21\x23-\x7e]  /* excluding '"'  */
>> app_name_dq_chars ({app_name_indq_chars}|{dquote}{dquote})
>> delimiter {whitespace}*,{whitespace}*
>> app_name  ({app_name_chars}+|{dquote}{app_name_dq_chars}+{dquote})
>> s_s_names {app_name}({delimiter}{app_name})*
>
>
> So I made a hasty independent parser for the syntax, including the
> group names, for the convenience of separate testing.  The parser
> takes input from stdin and prints the result structure.
>
> It can take the old s_s_names format and the new list format. We haven't
> discussed how to add group names, but I added it as ""
> just before the # of synchronous standbys of [] and {} lists.
>
> Is this usable for further discussions?

Thank you for your suggestion.

Another option is to add the group name with ":" immediately after the set
of standbys, as I said earlier.


s_s_names with a group name would be as follows:
s_s_names = '2[local, 2[london1, london2, london3]:london, (tokyo1,
tokyo2):tokyo]'

Thoughts?

Regards,

--
Masahiko Sawada


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-02-24 Thread Kyotaro HORIGUCHI
Hello,

Ok, I think we should concentrate on the parser part for now.

At Tue, 23 Feb 2016 17:44:44 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI 
 wrote in 
<20160223.17.178687579.horiguchi.kyot...@lab.ntt.co.jp>
> Hello,
> 
> At Mon, 22 Feb 2016 22:52:29 +0900, Fujii Masao  wrote 
> in 
> > Thanks for updating the patch!
> > 
> > When I changed s_s_names to 'hoge*' and reloaded the configuration file,
> > the server crashed unexpectedly with the following error message.
> > This is obviously a bug.
> > 
> > FATAL:  syntax error
> 
> I had a glance on the lexer part in the new patch.  It'd be
> better to design the lexer from the beginning according to the
> required behavior.
> 
> The documentation for the syntax is saying as the following,
> 
> http://www.postgresql.org/docs/current/static/runtime-config-logging.html
> 
> > application_name (string)
> > 
> > The application_name can be any string of less than NAMEDATALEN
> > characters (64 characters in a standard build).  Only
> > printable ASCII characters may be used in the application_name
> > value. Other characters will be replaced with question marks (?).
> 
> And according to what the functions mentioned so far do, overall
> an application_name is treated as follows, I suppose.
> 
> - check_application_name() currently allows [\x20-\x7e], which
>   differs from the definition of the SQL identifiers.
> 
> - SplitIdentifierString() and syncrep code
> 
>   - allows any byte except a double quote in double-quoted
>representation. A double-quote just after a delimiter can open
>quoted representation.
> 
>   - Non-quoted name can contain any character including double
> quotes except ',' and white spaces.
> 
>   - The syncrep code does case-insensitive matching with the
>application_name.
> 
> So, to preserve or follow the current behavior except the last
> one, the following pattern definitions would do. The
> lexer/grammer for the new format of s_s_names could be simpler
> than what it is.
> 
> space [ \n\r\f\t\v] /* See the definition of isspace(3) */
> whitespace{space}+
> dquote\"
> app_name_chars[\x21-\x2b\x2d-\x7e]   /* excluding ' ', ',' */
> app_name_indq_chars [\x20\x21\x23-\x7e]  /* excluding '"'  */
> app_name_dq_chars ({app_name_indq_chars}|{dquote}{dquote})
> delimiter {whitespace}*,{whitespace}*
> app_name  ({app_name_chars}+|{dquote}{app_name_dq_chars}+{dquote})
> s_s_names {app_name}({delimiter}{app_name})*


So I made a hasty independent parser for the syntax, including the
group names, for the convenience of separate testing.  The parser
takes input from stdin and prints the result structure.

It can take the old s_s_names format and the new list format. We haven't
discussed how to add group names, but I added it as ""
just before the # of synchronous standbys of [] and {} lists.

Is this usable for further discussions?

The sources can be compiled with the following command line.

$ bison -v test.y; flex -l test.l; gcc -g -DYYDEBUG=1 -DYYERROR_VERBOSE -o ltest test.tab.c

and it produces output like the following.

[horiguti@drain tmp]$ echo '123[1,3,3{a,b,e},4,*]' | ./ltest

TYPE: PRIO_LIST
GROUPNAME: 
NSYNC: 123
NEST: 2
CHILDREN {
  {
TYPE: HOSTNAME
HOSTNAME: 1
QUOTED: No
NEST: 1
  }
  {
TYPE: HOSTNAME
HOSTNAME: 3
QUOTED: No
NEST: 0
  }
  TYPE: QUORUM_LIST
  GROUPNAME: x
  NSYNC: 3
  NEST: 1
  CHILDREN {
{
  TYPE: HOSTNAME
  HOSTNAME: a
  QUOTED: No
  NEST: 0
}
{
  TYPE: HOSTNAME
  HOSTNAME: b
  QUOTED: No
  NEST: 0
}
{
  TYPE: HOSTNAME
  HOSTNAME: e
  QUOTED: No
  NEST: 0
}
  }
  {
TYPE: HOSTNAME
HOSTNAME: 4
QUOTED: No
NEST: 0
  }
  {
TYPE: HOSTNAME
HOSTNAME: *
QUOTED: No
NEST: 0
  }
}


regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
%{
#include <stdio.h>
#include <string.h>

%}

%option noyywrap

%x DQNAME
%x APPNAME

space   [ \t\n\r\f]
whitespace  {space}+

dquote\"
app_name_chars[\x21-\x2b\x2d-\x3b\x3d\x3f-\x5a\x5c\x5e-\x7a\x7c\x7e]
app_name_indq_chars [\x20\x21\x23-\x7e]
app_name{app_name_chars}+
app_name_dq ({app_name_indq_chars}|{dquote}{dquote})+
delimiter {whitespace}*,{whitespace}*
app_name_start {app_name_chars}
any_app \*|({dquote}\*{dquote})
xdstart {dquote}
xdstop  {dquote}
self[\[\]\{\}<>]
%%
{xdstart} { BEGIN(DQNAME); }
{xdstop} { BEGIN(INITIAL); }
{app_name_dq} {
  static char name[64];
  int i, j;

  for (i = j = 0 ; j < 63 && yytext[i] ; i++, j++)
  {
if (yytext[i] == '"')
{
if (yytext[i+1] == '"')
name[j] = '"';
else
fprintf(stderr, "illegal quote escape");
i++;
}
else
name[j] = yytext[i];

  }
  name[j] = 0;

  yylval.str = strdup(name);
  return QUOTED_NAME;
}
{app_name_start} { BEGIN(APPNAME); yyless(0);}
{app_name} {
   

Re: [HACKERS] Declarative partitioning

2016-02-24 Thread Amit Langote
On 2016/02/20 5:06, Corey Huinker wrote:
> On Thu, Feb 18, 2016 at 12:41 AM, Amit Langote wrote:
> 
>> START [ EXCL ] (startval) END [ INCL ] (endval)
>>
>> That is, in range type notation, '[startval, endval)' is the default
>> behavior. So for each partition, there is at least the following pieces of
>> metadata:
>>
> 
> This is really close, and if it is what we ended up with we would be able
> to use it.
> 
> I suggest that the range notation can be used even when no suitable range
> type exists.
> 
> I assume the code for parsing a range spec regardless of data type already
> exists, but in case it doesn't, take a range spec of unknown type:
> 
> [x,y)
> 
> x and y are either going to be raw strings or doublequoted strings with
> possible doublequote escapes, each of which would be coercible into the the
> type of the partition column.
> 
> In other words, if your string values were 'blah , blah ' and 'fabizzle',
> the [) range spec would be ["blah , blah ",fabizzle).

Hm, I see.  How about multi-column keys?  Do we care enough about that use
case?  I don't see a nice-enough-looking range literal for such keys.
Consider for instance,

With the partitioned table defined as:

CREATE TABLE foo(c1 char(2), c2 char(2)) PARTITION BY RANGE (c1, c2);

where we'd use LESS THAN as:

CREATE TABLE foo_ax1x PARTITION OF foo FOR VALUES LESS THAN ('b', '2');
CREATE TABLE foo_ax2x PARTITION OF foo FOR VALUES LESS THAN ('b', '3');
CREATE TABLE foo_ax3x PARTITION OF foo FOR VALUES LESS THAN ('b', '4');

CREATE TABLE foo_bx1x PARTITION OF foo FOR VALUES LESS THAN ('c', '2');
CREATE TABLE foo_bx2x PARTITION OF foo FOR VALUES LESS THAN ('c', '3');
CREATE TABLE foo_bx3x PARTITION OF foo FOR VALUES LESS THAN ('c', '4');

I guess it would be the following with the new range spec:

CREATE TABLE foo_ax1x PARTITION OF foo FOR VALUES ['b', 'b'], ['1', '2');
CREATE TABLE foo_ax2x PARTITION OF foo FOR VALUES ['b', 'b'], ['2', '3');
CREATE TABLE foo_ax3x PARTITION OF foo FOR VALUES ['b', 'b'], ['3', '4');

CREATE TABLE foo_bx1x PARTITION OF foo FOR VALUES ['c', 'c'], ['1', '2');
CREATE TABLE foo_bx2x PARTITION OF foo FOR VALUES ['c', 'c'], ['2', '3');
CREATE TABLE foo_bx3x PARTITION OF foo FOR VALUES ['c', 'c'], ['3', '4');

The first column does not unambiguously select a partition with such keys.
Now I can imagine that it is possible to "emulate" such a multi-column
range key with k columns by k levels of range-range-* partitioning:

CREATE TABLE foo(c1 char(2), c2 char(2)) PARTITION BY RANGE (c1);

CREATE TABLE foo_ax PARTITION OF foo
  FOR VALUES ['b', 'c') PARTITION BY RANGE (c2);

CREATE TABLE foo_ax1x PARTITION OF foo_ax FOR VALUES ['1', '2');
CREATE TABLE foo_ax2x PARTITION OF foo_ax FOR VALUES ['2', '3');
CREATE TABLE foo_ax3x PARTITION OF foo_ax FOR VALUES ['3', '4');

CREATE TABLE foo_bx PARTITION OF foo
  FOR VALUES ['b', 'c') PARTITION BY RANGE (c2);

CREATE TABLE foo_bx1x PARTITION OF foo_bx FOR VALUES ['1', '2');
CREATE TABLE foo_bx2x PARTITION OF foo_bx FOR VALUES ['2', '3');
CREATE TABLE foo_bx3x PARTITION OF foo_bx FOR VALUES ['3', '4');

But IIRC, I have been contradicted once before regarding whether the two ways
are exactly the same.

> Using regular range specs syntax also allows for the range to be unbounded
> in either or both directions, which is a possibility, especially in newer
> tables where the expected distribution of data is unknown.

We would also want to think about which subset of the many permutations of this
syntax to accept as range specs for new partitions, mostly to preserve the
non-overlapping invariant; I think it would also be nice to prevent gaps.

Consider that once we create:

PARTITION FOR VALUES [current_date,);

Now, to create a new partition starting at a later date, we would have to have a
"split partition" feature which would scan the above partition to
distribute the existing data rows appropriately between the resulting two
partitions. Right?

IOW, one shouldn't create an unbounded partition if more partitions in the
unbounded direction are expected to be created.  It would be OK for
unbounded partitions to be on the lower end most of the time.

> p.s. Sorry I haven't been able to kick the tires just yet. We have a very
> good use case for this, it's just a matter of getting a machine and the
> time to devote to it.

I would appreciate it.  You could wait a little more for my next
submission which will contain some revisions to the tuple routing code.

Thanks,
Amit




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers