Re: [HACKERS] Horizontal scalability/sharding

2015-09-06 Thread Ashutosh Bapat
On Sat, Sep 5, 2015 at 4:22 AM, Ozgun Erdogan  wrote:

> Hey Robert,
>
> Now the question is, where should the code that does all of this live?
>>  postgres_fdw?  Some new, sharding-specific FDW?  In core?  I don't
>> know for sure, but what I do know is that we could make a lot of
>> progress over where we are today by just improving postgres_fdw, and I
>> don't think those improvements are even all that difficult.  If we
>> decide we need to implement something new, it's going to be a huge
>> project that will take years to complete, with uncertain results.  I'd
>> rather have a postgres_fdw-based implementation that is imperfect and
>> can't handle some kinds of queries in 9.6 than a promise that by 9.9
>> we'll have something really great that handles MPP perfectly.
>>
>
> Distributed shuffles (Map/Reduce) are hard. When we looked at using FDWs
> for pg_shard, we thought that Map/Reduce would require a comprehensive
> revamp of the APIs.
>
> For Citus, a second part of the question concerns us as FDW writers. We
> implemented cstore_fdw, json_fdw, and mongo_fdw, and these wrappers don't
> benefit from even the simple join pushdown that doesn't require Map/Reduce.
>

I didn't get this. The join pushdown infrastructure (chiefly the set of
hooks provided in the join planning paths) is part of 9.5. Isn't that
sufficient to implement join push-down for the above FDWs? Or are FDW
writers facing problems while implementing those hooks? In either case that
should be reported on -hackers.


>
> The PostgreSQL wiki lists 85 foreign data wrappers, and only 18 of these
> have support for joins:
> https://wiki.postgresql.org/wiki/Foreign_data_wrappers
>
> Best,
> Ozgun
>



-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


Re: [HACKERS] checkpointer continuous flushing

2015-09-06 Thread Amit Kapila
On Mon, Sep 7, 2015 at 3:09 AM, Andres Freund  wrote:
>
> On 2015-09-06 16:05:01 +0200, Fabien COELHO wrote:
> > >Wouldn't it be just as easy to put this logic into the checkpointing code?
> >
> > Not sure it would simplify anything, because the checkpointer currently
> > knows about buffers but flushing is about files, which are hidden from
> > view.
>
> It'd not really simplify things, but it'd keep it local.
>

How about using the value of the GUC (checkpoint_flush_to_disk) and
AmCheckpointerProcess() to decide whether to do an async flush in FileWrite?
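
A rough sketch of what that gating could look like inside FileWrite()
(checkpoint_flush_to_disk would be the GUC from the patch under discussion;
the flush helper named here is hypothetical):

    /* Sketch only: flush asynchronously just for checkpointer writes. */
    if (checkpoint_flush_to_disk && AmCheckpointerProcess())
        FileAsyncFlush(file, offset, amount);   /* hypothetical helper */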


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] pg_rewind tap test unstable

2015-09-06 Thread Michael Paquier
On Mon, Sep 7, 2015 at 1:16 PM, Noah Misch  wrote:
> On Tue, Aug 04, 2015 at 02:21:16PM +0900, Michael Paquier wrote:
>> >> On Tue, Jul 28, 2015 at 5:57 PM, Christoph Berg  wrote:
>> >> > for something between 10% and 20% of the devel builds for 
>> >> > apt.postgresql.org
>> >> > (which happen every 6h if there's a git change, so it happens every few 
>> >> > days),
>> >> > I'm seeing this:
>
>> In test case 2, the failure happens to be that the standby did not
>> have time to replicate, before promotion, the database that had been
>> created on the master. One possible explanation for this failure is
>> that the standby has been promoted before all the WAL needed for the
>> tests has been replayed, hence we had better be sure that the
>> replay_location of the standby matches pg_current_xlog_location()
>> before promotion.
>
>> Perhaps the attached patch helps?
>
> Thanks.  In light of your diagnosis, I can reliably reproduce the failure by
> injecting a sleep into XLogSendPhysical().  Your patch fixes the problem, but
> it adds wal_receiver_status_interval (= 10s) stalls, doubling
> src/bin/pg_rewind/t/001_basic.pl runtime on a fast system.  (The standby
> applies the final WAL quickly, then sleeps for wal_receiver_status_interval
> before notifying the master.)

Indeed, thanks for double-checking.

> The standby will apply any written, unapplied
> WAL during promotion.  Therefore, I plan to commit the attached
> performance-neutral variant of your patch.

With the comment explaining the use of write_location, this looks fine to
me. Thanks again.
-- 
Michael




Re: [HACKERS] Reusing abbreviated keys during second pass of ordered [set] aggregates

2015-09-06 Thread Thomas Munro
On Sat, Jul 11, 2015 at 10:05 AM, Peter Geoghegan  wrote:

> Currently, there are certain aggregates that sort some tuples before
> making a second pass over their memtuples array (through the tuplesort
> "gettuple" interface), calling one or more attribute equality operator
> functions as they go. For example, this occurs during the execution of
> the following queries:
>
> -- Salary is a numeric column:
> SELECT count(DISTINCT salary) AS distinct_compensation_levels FROM
> employees;
> -- Get most common distinct salary in organization:
> SELECT mode() WITHIN GROUP (ORDER BY salary) AS
> most_common_compensation FROM employees;
>
> Since these examples involve numeric comparisons, the equality
> operator calls involved in that second pass are expensive. There is no
> equality fast-path for numeric equality as there is for text; I
> imagine that in practice most calls to texteq() are resolved based on
> raw datum size mismatch, which is dirt cheap (while the alternative, a
> memcmp(), is still very cheap).


Forgive me for going off on a tangent here:  I understand that the point of
your patch is to avoid calls to numeric_eq altogether in these scenarios
when cardinality is high, but the above sentence made me wonder how likely
it is that numeric values in typical workloads have the same size, weight
and sign and could therefore be handled in numeric_eq with memcmp() == 0
rather than numeric_cmp() == 0, and whether there would be any benefit.
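
For concreteness, a minimal sketch of the fast path in question, assuming
detoasted inputs and glossing over the short-varlena format
(numeric_full_cmp stands in for the existing comparison path and is not a
real function):

static bool
numeric_eq_sketch(Numeric a, Numeric b)
{
    /*
     * Identical bytes imply equal values, since size, sign, weight and
     * digits must all match.  The converse does not hold (equal values
     * can differ in display scale), so a memcmp() miss still has to fall
     * back to the full comparison.
     */
    if (VARSIZE(a) == VARSIZE(b) &&
        memcmp(a, b, VARSIZE(a)) == 0)
        return true;

    return numeric_full_cmp(a, b) == 0;    /* hypothetical slow path */
}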

Running various contrived aggregate queries on a large low cardinality
dataset in a small range (therefore frequently the same weight & size), I
managed to measure a small improvement of up to a few percent with the
attached patch.  I also wonder whether numeric_cmp could be profitably
implemented with memcmp on big endian systems and some unrolled loops on
little endian systems when size & weight match.

Of course there are a ton of other overheads involved with numeric.  I
wonder how crazy or difficult it would be to make it so that we could
optionally put a pass-by-value NUMERIC in a Datum, setting a low order tag
bit to say 'I'm an immediate value, not a pointer', and then packing 3
digits (= 12 significant figures) + sign + weight into the other 63 bits.
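
To make the packing concrete, a purely illustrative bit layout, assuming
64-bit Datums (the tag, the field widths and the helper names are all made
up, not a worked-out design):

#define NUMERIC_IMM_TAG     ((Datum) 1)   /* low bit: immediate, not pointer */

static inline bool
numeric_datum_is_immediate(Datum d)
{
    /* Real pointers are aligned, so their low bit is always zero. */
    return (d & NUMERIC_IMM_TAG) != 0;
}

static inline Datum
numeric_make_immediate(uint64 digits3, int weight, bool negative)
{
    /* bits 16..63: three base-10000 digits; 8..15: weight; bit 1: sign */
    uint64      v;

    v = (digits3 << 16) |
        (((uint64) (uint8) weight) << 8) |
        (((uint64) negative) << 1) |
        NUMERIC_IMM_TAG;
    return (Datum) v;
}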

-- 
Thomas Munro
http://www.enterprisedb.com


numeric-eq-memcmp.patch
Description: Binary data



Re: [HACKERS] pg_rewind tap test unstable

2015-09-06 Thread Noah Misch
On Tue, Aug 04, 2015 at 02:21:16PM +0900, Michael Paquier wrote:
> >> On Tue, Jul 28, 2015 at 5:57 PM, Christoph Berg  wrote:
> >> > for something between 10% and 20% of the devel builds for 
> >> > apt.postgresql.org
> >> > (which happen every 6h if there's a git change, so it happens every few 
> >> > days),
> >> > I'm seeing this:

> In test case 2, the failure happens to be that the standby did not
> have time to replicate, before promotion, the database that had been
> created on the master. One possible explanation for this failure is
> that the standby has been promoted before all the WAL needed for the
> tests has been replayed, hence we had better be sure that the
> replay_location of the standby matches pg_current_xlog_location()
> before promotion.

> Perhaps the attached patch helps?

Thanks.  In light of your diagnosis, I can reliably reproduce the failure by
injecting a sleep into XLogSendPhysical().  Your patch fixes the problem, but
it adds wal_receiver_status_interval (= 10s) stalls, doubling
src/bin/pg_rewind/t/001_basic.pl runtime on a fast system.  (The standby
applies the final WAL quickly, then sleeps for wal_receiver_status_interval
before notifying the master.)  The standby will apply any written, unapplied
WAL during promotion.  Therefore, I plan to commit the attached
performance-neutral variant of your patch.
diff --git a/src/bin/pg_rewind/RewindTest.pm b/src/bin/pg_rewind/RewindTest.pm
index 22e5cae..a4c1737 100644
--- a/src/bin/pg_rewind/RewindTest.pm
+++ b/src/bin/pg_rewind/RewindTest.pm
@@ -222,12 +222,8 @@ recovery_target_timeline='latest'
   '-l', "$log_path/standby.log",
   '-o', "-p $port_standby", 'start');
 
-   # Wait until the standby has caught up with the primary, by polling
-   # pg_stat_replication.
-   my $caughtup_query =
-"SELECT pg_current_xlog_location() = replay_location FROM pg_stat_replication 
WHERE application_name = 'rewind_standby';";
-   poll_query_until($caughtup_query, $connstr_master)
- or die "Timed out while waiting for standby to catch up";
+   # The standby may have WAL to apply before it matches the primary.  That
+   # is fine, because no test examines the standby before promotion.
 }
 
 sub promote_standby
@@ -235,6 +231,12 @@ sub promote_standby
	# Now run the test-specific parts to run after standby has been started
	# up standby
 
+   # Wait for the standby to receive and write all WAL.
+   my $wal_received_query =
+"SELECT pg_current_xlog_location() = write_location FROM pg_stat_replication 
WHERE application_name = 'rewind_standby';";
+   poll_query_until($wal_received_query, $connstr_master)
+ or die "Timed out while waiting for standby to receive and write WAL";
+
# Now promote slave and insert some new data on master, this will put
# the master out-of-sync with the standby. Wait until the standby is
# out of recovery mode, and is ready to accept read-write connections.



Re: [HACKERS] Waits monitoring

2015-09-06 Thread Amit Kapila
On Sun, Sep 6, 2015 at 5:58 PM, Andres Freund  wrote:
>
> On 2015-09-04 23:44:21 +0100, Simon Riggs wrote:
> > I see the need for both current wait information and for cumulative
> > historical detail.
> >
> > I'm willing to wait before reviewing this, but not for more than 1 more CF.
> >
> > Andres, please decide whether we should punt to next CF now, based upon
> > other developments. Thanks
>
> I think we can do some of the work concurrently - the whole lwlock
> infrastructure piece is rather independent and what currently most of
> the arguments are about. I agree that the actual interface will need to
> be coordinated.
>
> Ildus, could you please review Amit & Robert's patch?
>

Are you talking about the patch where I fixed a few issues in Robert's
patch [1], or the original patch [3] written by me?

IIUC, this is somewhat different from what Ildus is doing in his latest
patch [2].

Sorry, but I think there is some confusion about that patch's [1]
development.  Let me try to summarize what I think has happened and why I
feel there is some confusion.  Initially Robert proposed the idea of having
a column in pg_stat_activity for wait_event on -hackers, and then I wrote
an initial patch so that we could discuss it in a more meaningful way,
intending to extend that patch based on consensus and on what any other
patch, like Waits monitoring, would need from it.  In between, Ildus
proposed the Waits monitoring patch and also started hacking on the other
pg_stat_activity patch, and that was the starting point of the confusion.
The current situation, I think, is that there's one patch [1] from Robert
(with some fixes by me) for standardising the LWLock stuff, so that we can
build the pg_stat_activity patch on top of it, and then a patch [2] from
Ildus doing something similar, which I believe hasn't used Robert's patch.

I think the intention of having multiple people develop the same patch is
to get the work done faster, but here I think it is causing confusion, and
I feel that is one reason the patch is dragging as well.  Let me know your
thoughts on the best way to proceed.

[1] -
http://www.postgresql.org/message-id/caa4ek1kdec1tm5ya9gkv85vtn4qqsrxzkjru-tu70g_tl1x...@mail.gmail.com
[2] -
http://www.postgresql.org/message-id/3f71da37-a17b-4961-9908-016e6323e...@postgrespro.ru
[3] -
http://www.postgresql.org/message-id/caa4ek1kt2e6xhvigisr5o1ac9nfo0j2wtb8n0ggd1_jklde...@mail.gmail.com

P.S. - This mail is not meant to point out anything wrong with any
particular individual; it is about the development of a particular patch
becoming haphazard because of some confusion.  I am not sure this is the
right thread to write about it, but as a review of the patch was requested
here in the other related thread, I have mentioned my thoughts on the same.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Multi-column distinctness.

2015-09-06 Thread Kyotaro HORIGUCHI
Hello,

> > 5) syntax
> > -
> > The syntax might be one of the pain points if we eventually decide to
> commit the multivariate stats patch. I have no intention of blocking this
> patch for that reason, but if we can design the syntax to make it
> compatible with the multivariate patch, that'd be nice. But it seems to me
> the syntax is pretty much the same, no?
> >
> > I.e. it uses
> >
> > ADD STATISTICS (options) ON (columns)
> >
> > just like the multivariate patch, no? Well, it doesn't really check the
> stattypes in ATExecAddDropMvStatistics except for checking there's a single
> entry, but the syntax seems OK.
>
> > BTW mixing ADD and DROP in ATExecAddDropMvStatistics seems a bit
> confusing. Maybe two separate methods would be better?

No problem.

> BTW one more comment about the syntax - you ran into the same conflict
> between "ADD [COLUMN] column" and "ADD STATISTICS" like I did, but you
> solved it by making the COLUMN required while I made STATISTICS keyword.
> I'm not enthusiastic about the keyword thing,

I don't have a firm opinion on whether that takes priority over
duplication of syntax, but I followed the policy of reducing keywords.
Which do you think is more acceptable for this case: the additional
definitions caused by new keywords, or the syntax splitting caused by
avoiding them?

> but making the COLUMN
> required is certainly much worse as it breaks many existing scripts. The
> keyword only breaks cases that manipulate a "statistics" column.

Ouch! That was simply an accident, or rather my lack of carefulness. I
will come up with a fixed syntax later.

> If any of this is unacceptable, then we probably need to come up with a
> different syntax.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center





Re: [HACKERS] exposing pg_controldata and pg_config as functions

2015-09-06 Thread Peter Eisentraut
On 9/6/15 3:34 PM, Joe Conway wrote:
> On 09/02/2015 02:54 PM, Alvaro Herrera wrote:
>> Josh Berkus wrote:
>>> On 09/02/2015 02:34 PM, Alvaro Herrera wrote:
 I think trying to duplicate the exact strings isn't too nice an
 interface.
>>>
>>> Well, for pg_controldata, no, but what else would you do for pg_config?
>>
>> I was primarily looking at pg_controldata, so we agree there.
>>
>> As for pg_config, I'm confused about its usefulness -- which of these
>> lines are useful in the SQL interface?  Anyway, I don't see anything
>> better than a couple of text columns for this case.
> 
> There are production environments where even the superuser has no
> direct, local, command line access on production database servers

But then they also have no use for the information that pg_config prints
out.

> and the
> only interface for getting information from postgres is via a database
> connection. So to the extent pg_config and pg_controldata command line
> binaries are useful, so is the ability to get the same output via SQL.
> 
> Given that, my own feeling is that if we provide a SQL interface at all,
> it ought to be pretty much the exact same output as the command line
> programs produce.

That argument makes no sense to me.

Again, we need to think about what actual use there is for this
information.  Just because the information exists somewhere, but you
can't access it, doesn't mean we just need to copy it around.




Re: [HACKERS] Multi-column distinctness.

2015-09-06 Thread Kyotaro HORIGUCHI
Hello,

> FWIW Horiguchi-san is one of the few people who actually took time to
> review

I personally think such things should not be counted in judging this
issue :)

> the multivariate stats patch, and I don't quite see this patch
> as conflicting with the multivariate one.
> 
> It implements a small subset of the (much larger) multivariate stats
> patch, and reusing it within my patch should not be a big
> deal. Actually, this type of statistics was proposed by Horiguchi-san
> himself, and the multivariate patch does not implement it yet
> (although I intend to address that soon).

Yeah. I think we can treat Tomas's patch as a natural enhancement of this
patch, and it is not so bad as a vanguard(?).

> So no conflict here. Of course, we need to be a bit careful to make it
> compatible (especially the syntax part).

I agree. They don't conflict badly, but we might get stuck by a wrong
decision in the API, as the syntax might.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center





Re: [HACKERS] Multi-column distinctness.

2015-09-06 Thread Kyotaro HORIGUCHI
Hello,

Thank you for pointing that out. It is one crucial point of this patch.
Sorry for not mentioning it.

At Sun, 6 Sep 2015 09:24:48 +0100, Simon Riggs  wrote in 

> > Tomas Vondra is now working on heavily-equipped multivariate
> > statistics for OLAP usage. In contrast, this is a lightly
> > implemented solution which calculates only the ratio between the
> > rows estimated by the current method and the actual row number. I
> > think this doesn't conflict with his work except for the grammar part.
> >
> >
> 
> I think it very obviously does conflict, so I don't see this patch as
> appropriate.
> 
> If you think a cut version of Tomas' patch is appropriate, then the usual
> response is to give a review that says "Tomas, I think a cut down version
> is appropriate here, can we reduce the scope of this patch for now?". If
> you have done that and he refuses to listen, then a separate patch version
> is appropriate. Otherwise we should just reject this second patchset to
> avoid confusion and to avoid encouraging people to take this approach.

Generally you are absolutely right, and I would agree if this were 'a cut
version' of Tomas's patch. I might have the wrong concept of the size of a
piece of work.

I will discontinue this patch if Tomas and/or Simon, or many others, think
it inappropriate to be brought up now (or ever after), after reading the
following explanation.

==
I already asked Tomas to *add* this feature to his patch and got a reply
that it would come after the completion of the ongoing work.

Tomas's patch and mine are aiming at generally similar objectives, but as
discussed with Tomas I understood there are some crucial differences
between them.

He is considering more precise and widely-applicable estimation based on a
firm theoretical basis, and as described in the citation above, it is
aiming at OLAP usage and allows rather complex calculation. It will be
reduced through future discussions, but the priority of doing so is not so
high for now.

Although Tomas's patch is very complex and needs more labor to complete,
resolving the wrong predictions caused by multicolumn correlation
(especially in OLTP usage) is in demand. So I tried a patch that suits
that objective. It has only a rough distinctness coefficient and doesn't
have MV-MCV, MV-HISTOGRAM or strict functional dependencies, but it needs
quite small storage and less calculation.

The two are so different in concrete objective and characteristics, and
have no common functional piece except the grammar part, that I concluded
this patch doesn't break the foothold (standpoint?) of Tomas's patch and
we can continue to work on it as we did until now.

This is why I think this is not a cut-down version of Tomas's and doesn't
break or conflict with it. But if many don't think so, I should dismiss
this, of course.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center





Re: [HACKERS] Getting total and free disk space from paths in PGDATA

2015-09-06 Thread Tom Lane
Michael Paquier  writes:
> statvfs is part of the POSIX spec and is "normally" present on modern
> platforms (BSD, OSX, Linux and Solaris have it as far as I saw, still
> there may be some strange platform without it).

There are considerably less strange platforms that have per-user
disk quotas.  I wonder what statvfs does with those.

regards, tom lane




[HACKERS] Getting total and free disk space from paths in PGDATA

2015-09-06 Thread Michael Paquier
Hi all,

There is currently no in-core function to query the amount of total and
free space in a path of PGDATA, with something like this for example:
SELECT * FROM pg_get_diskspace_info('pg_xlog');
 total_space | free_space
-------------+------------
 4812 MB     | 3925 MB
(1 row)

This would definitely be useful for monitoring purposes, to keep an eye on
disk space bloat in PGDATA, pg_xlog, or even pg_log, which are usually
located on different partitions. Some of my customers have requested such
a thing a couple of times, and even if you can already do it with, for
example, plpython/os.statvfs, or with plperl by parsing the output of df,
I get the feeling that we should have something in-core not directly
relying on a PL language. genfile.c also has what is needed to restrict
the path used, by blocking absolute paths that are not part of
log_directory or PGDATA.

statvfs is part of the POSIX spec and is "normally" present on modern
platforms (BSD, OSX, Linux and Solaris have it as far as I saw, still
there may be some strange platform without it). Windows does not have
it, though we could use GetDiskFreeSpaceEx to retrieve this
information (and actually this is the reason why I am only proposing
to get the available/free space from a path):
https://msdn.microsoft.com/en-us/library/windows/desktop/aa364937%28v=vs.85%29.aspx
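
For what it's worth, the POSIX side could look like this minimal sketch
(statvfs() and its fields are standard; the function name and the absence
of path validation are illustrative only):

#include <stdint.h>
#include <sys/statvfs.h>

static int
get_disk_space(const char *path, uint64_t *total_bytes, uint64_t *free_bytes)
{
    struct statvfs st;

    if (statvfs(path, &st) != 0)
        return -1;              /* caller reports errno */

    /* f_frsize is the block size the f_blocks/f_bavail counts use */
    *total_bytes = (uint64_t) st.f_blocks * st.f_frsize;
    *free_bytes = (uint64_t) st.f_bavail * st.f_frsize;
    return 0;
}

A real in-core version would validate the path as described above and add
a Windows branch using GetDiskFreeSpaceEx.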

Another minor issue is that fsblkcnt_t is an unsigned long, and the return
values should be bigint for both the total and the free space, so we could
get incorrect results once we have values higher than 2^63 - 1, i.e. disk
sizes above 8 EiB (about 8.39e6 TiB if you prefer), but it's not like
we're at that scale yet :)

Thoughts?
-- 
Michael




Re: [HACKERS] Too many duplicated condition query return wrong value

2015-09-06 Thread Jeff Janes
On Thu, Sep 3, 2015 at 10:55 PM, Atsushi Yoshida 
wrote:

> >> Can you give an "explain (analyze, buffers)"  for each query?  Maybe
> you have a corrupted index, and one query uses the index and the other does
> not.
>
>
> >
> >  Index Scan using idx_attend_00 on attend  (cost=0.29..627.20 rows=172
> width=12) (actual time=5.158..10.179 rows=5 loops=1)
> >Index Cond: (sid = 325)
> >Filter: (lid = ANY ('{ABF0010,ABF0010,ABF0010,ABF0010,ABF0010 ...
> ABF0060,ABF0060,ABF0060,ABF0060}'::text[]))
> >Rows Removed by Filter: 414
>

...


> >
> >  Index Scan using index_attend_on_sid_and_lid on attend
> (cost=0.42..36.32 rows=3 width=12) (actual time=0.011..0.034 rows=6 loops=1)
> >Index Cond: ((sid = 325) AND (lid = ANY
> ('{ABF0010,ABF0020,ABF0030,ABF0040,ABF0050,ABF0060}'::text[])))
> >Buffers: shared hit=24
>
>

> Does this result mean idx_attend_00 is corrupted?
> How do I fix it?
> What countermeasures should I take?
>

Yes, almost certainly.  You can fix it by rebuilding the index ("REINDEX
INDEX idx_attend_00").  Whether this will completely fix the problem
depends on what caused it.  There could be a wider corruption issue of
which this is only a symptom.

If you would like to preserve the ability to investigate the root cause,
you should make a full file-level backup of the database *before* doing the
re-index.

What type of index is it?  (I'm now guessing btree, but maybe not)?  Is
there a defined time window during which you know the corruption occurred?
If so, do you still have the server logs from that time window?  The WAL
logs?

Do you know if the sometimes-missing tuple actually belongs in the table or
not?  It could be that the row was marked deleted from the table, scrubbed
from the index, and then inappropriately got "revived" like a zombie in the
table, so that the "corrupt" index is correct and it is the table that is
wrong.

And of course, if you are running with fsync=off or full_page_writes=off,
don't do that.

Cheers,

Jeff


Re: [HACKERS] checkpointer continuous flushing

2015-09-06 Thread Andres Freund
On 2015-09-06 16:05:01 +0200, Fabien COELHO wrote:
> >Wouldn't it be just as easy to put this logic into the checkpointing code?
> 
> Not sure it would simplify anything, because the checkpointer currently
> knows about buffers but flushing is about files, which are hidden from
> view.

It'd not really simplify things, but it'd keep it local.

> >* Wouldn't a binary heap over the tablespaces + progress be nicer?
> 
> I'm not sure where it would fit exactly.

Imagine a binaryheap.h style heap over a structure like (tablespaceid,
progress, progress_slice, nextbuf) where the comparator compares the progress.

> Anyway, I think it would complicate the code significantly (compared to the
> straightforward array)

I doubt it. I mean instead of your GetNext you'd just do:
next_tblspc = DatumGetPointer(binaryheap_first(heap));
if (next_tblspc == NULL)
    return 0;
next_tblspc->progress += next_tblspc->progress_slice;
binaryheap_replace_first(heap, PointerGetDatum(next_tblspc));

return next_tblspc->nextbuf++;


progress_slice is the total number of buffers divided by the number of
buffers in the tablespace, precomputed to avoid doing any sort of
expensive math in the more frequently executed path.
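
For illustration, the comparator for such a heap could be as simple as
this sketch (assuming a per-tablespace struct with the fields listed
above; binaryheap.h builds a max-heap, so a positive return value keeps
the least-advanced tablespace on top):

#include "postgres.h"
#include "lib/binaryheap.h"

typedef struct TsProgress
{
    Oid     tblspc_id;          /* tablespace */
    double  progress;           /* weighted count of buffers written */
    double  progress_slice;     /* increment per written buffer */
    int     nextbuf;            /* next buffer index for this tablespace */
} TsProgress;

static int
ts_progress_cmp(Datum a, Datum b, void *arg)
{
    TsProgress *ta = (TsProgress *) DatumGetPointer(a);
    TsProgress *tb = (TsProgress *) DatumGetPointer(b);

    if (ta->progress < tb->progress)
        return 1;
    if (ta->progress > tb->progress)
        return -1;
    return 0;
}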

> Moreover such a data structure would probably require some kind of pointer
> (probably 8 bytes added per node, maybe more), and the amount of memory is
> already a concern, at least to me, and moreover it has to reside in shared
> memory which does not simplify allocation of tree data structures.

I'm not seeing where you'd need an extra pointer? Maybe the
misunderstanding is that I'm proposing to do a heap over the
*tablespaces* not the actual buffers.

> >If you make the sorting criterion include the tablespace id you wouldn't
> >need the lookahead loop in NextBufferToWrite().
> 
> Yep, I thought of it. It would mean 4 more bytes per buffer, and bsearch to
> find the boundaries, so significantly less simple code.

What for would you need to bsearch?


> I think that the current approach is ok as the number of tablespace
> should be small.

Right that's often the case.

> >Isn't the current approach O(NBuffers^2) in the worst case?
> 
> ISTM that the overall lookahead complexity is Nbuffers * Ntablespace:
> buffers are scanned once for each tablespace.

Which in the worst case is NBuffers * 2...

> ISTM that using the tablespace in the sorting would reduce the complexity to
> ln(NBuffers) * Ntablespace for finding the boundaries, and then Ntablespace *
> (NBuffers/Ntablespace) = NBuffers for scanning, at the expense of more
> memory and code complexity.

Afaics finding the boundaries can be done as part of the enumeration of
tablespaces in BufferSync(). That code needs to be moved, but that's not
too bad. I don't see the code being that much more complicated?

Greetings,

Andres Freund




Re: [HACKERS] [PATCH] Refactoring of LWLock tranches

2015-09-06 Thread and...@anarazel.de
On 2015-09-06 22:57:04 +0300, Ildus Kurbangaliev wrote:
> Ok, I've kept only one tranche for individual LWLocks

But you've added the lock names as a statically sized array to all
tranches? Why not just a pointer to an array that's either NULL ('not
individualy named locks') or appropriately sized?

> > I don't really like the tranche model as in the patch right now. I'd
> > rather have it in a way that we have one tranche for all the individual
> > lwlocks, where the tranche points to an array of names alongside the
> > tranche's name. And then for the others we just supply the tranche name,
> > but leave the name array empty, whereas a name can be generated.
> 
> Maybe I don't understand something here, but why add an extra field to all
> tranches if we need only one array (especially the array for individual
> LWLocks)?
> if we need only one array (especially the array for individual LWLocks).

It's cheap to add an optional pointer field to an individual
tranche. It'll be far less cheap to extend the max number of
tranches. But it's also just ugly to use a tranche per lock - they're
intended to describe 'runs' of locks.

Greetings,

Andres Freund




Re: [HACKERS] [PATCH] Refactoring of LWLock tranches

2015-09-06 Thread Ildus Kurbangaliev
On Sep 6, 2015, at 2:36 PM, and...@anarazel.de wrote:
> On 2015-09-05 12:48:12 +0300, Ildus Kurbangaliev wrote:
>> Another parts require some discussion so I didn't touch them yet.
>
> At this point I don't see any point in further review until these are
> addressed.
>
>> The idea to create individual tranches for individual LWLocks came from
>> Heikki Linnakangas, and I also think that a tranche is a good place to
>> keep the LWLock name.
>
> I think it's rather ugly.
>
>> Base of these tranches points to MainLWLockArray
>
> And that's just plain wrong. The base of a tranche ought to point to the
> first lwlock in it.

Ok, I've kept only one tranche for individual LWLocks.

On Sep 2, 2015, at 1:43 AM, and...@anarazel.de wrote:
> I don't really like the tranche model as in the patch right now. I'd
> rather have it in a way that we have one tranche for all the individual
> lwlocks, where the tranche points to an array of names alongside the
> tranche's name. And then for the others we just supply the tranche name,
> but leave the name array empty, whereas a name can be generated.

Maybe I don't understand something here, but why add an extra field to
all tranches if we need only one array (especially the array for
individual LWLocks)?

Ildus Kurbangaliev
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

lwlocks_refactor_v2.patch
Description: Binary data


Re: [HACKERS] exposing pg_controldata and pg_config as functions

2015-09-06 Thread Joe Conway
On 09/02/2015 02:54 PM, Alvaro Herrera wrote:
> Josh Berkus wrote:
>> On 09/02/2015 02:34 PM, Alvaro Herrera wrote:
>>> I think trying to duplicate the exact strings isn't too nice an
>>> interface.
>>
>> Well, for pg_controldata, no, but what else would you do for pg_config?
> 
> I was primarily looking at pg_controldata, so we agree there.
> 
> As for pg_config, I'm confused about its usefulness -- which of these
> lines are useful in the SQL interface?  Anyway, I don't see anything
> better than a couple of text columns for this case.

There are production environments where even the superuser has no
direct, local, command line access on production database servers (short
of intentional hacking, which would be frowned upon at best), and the
only interface for getting information from postgres is via a database
connection. So to the extent pg_config and pg_controldata command line
binaries are useful, so is the ability to get the same output via SQL.

Given that, my own feeling is that if we provide a SQL interface at all,
it ought to be pretty much the exact same output as the command line
programs produce.

To the extent that we want specific pg_controldata output in non-text
form, we should identify which items those are and provide individual
functions for them.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



signature.asc
Description: OpenPGP digital signature


[HACKERS] Discussion summary: Proposal: Implement failover on libpq connect level.

2015-09-06 Thread Victor Wagner
First of all, I wish to thank all the participants of the discussion
for their useful suggestions and critics.

I see that people are really interested in this feature.

Now, the list of syntax and behavior changes against the original proposal:

1. There is a similar feature in JDBC, so the syntax of the URL-style
connect string should be compatible with JDBC (see the usage sketch after
this list):

postgresql://host:port,host:port/dbname...

2. In the param=value style connect string, there should be a way to
specify a different non-standard port for each host in the cluster.

Also, it seems that people came to the conclusion that allowing host:port
syntax as the value of the host parameter shouldn't break anything.

So, the port should be determined the following way:

 If there is :portnumber in the host string, this port number takes
precedence over anything else for this particular host. To avoid problems
with file system objects with strange names, a colon is interpreted as a
port number separator only if the value of the host parameter doesn't
begin with a slash. If it begins with a slash, it is handled exactly as it
is handled now.

 If the port is not explicitly specified in the host string, then use the
port= parameter (for all hosts which are not accompanied by their own port
number); if there is no port parameter in the connect string, use the same
chain of fallbacks, up to the compiled-in standard value, as is in use now.

3. Simultaneous connection to all the hosts in the cluster shouldn't be
the default. By default, hosts should be tried in the order they are
specified in the connect string. To minimize syntax changes there should
be a single parameter 'hostorder' which specifies the order of hosts, with
the possible choices sequential (default), parallel and random.

4. There should be a way to allow cluster software or the system
administrator some time to perform failover procedures and promote one of
the standbys to master. So, there should be an additional parameter,
failover_timeout, which specifies the number of seconds connection
attempts should be retried if no usable master server is found.

5. The parameter readonly (which allows distinguishing between failover
and load balancing of read-only clients) should be implemented as in the
original proposal.
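
Putting items 1-4 together, client code using the proposed syntax could
look like this (hypothetical: the feature is only proposed here, and the
host names, database and parameter spellings are illustrative):

#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
    /*
     * Proposed JDBC-compatible URL: try db1 (default port), then db2
     * (port 5433), retrying for up to 30 seconds until a usable master
     * is found.
     */
    PGconn *conn = PQconnectdb(
        "postgresql://db1.example.com,db2.example.com:5433/mydb"
        "?hostorder=sequential&failover_timeout=30");

    if (PQstatus(conn) != CONNECTION_OK)
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));

    PQfinish(conn);
    return 0;
}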



-- 
   Victor Wagner 




Re: [HACKERS] PL/pgSQL, RAISE and error context

2015-09-06 Thread Tom Lane
Joe Conway  writes:
>> On 09/05/2015 09:05 AM, Tom Lane wrote:
>>> Pushed.  I'm not too sure about the expected outputs for python other
>>> than 2.6, nor for sepgsql, but hopefully the buildfarm will provide
>>> feedback.

> One-liner required for sepgsql -- attached committed and pushed.

Thanks for checking!

regards, tom lane




Re: [HACKERS] pg_hba_lookup function to get all matching pg_hba.conf entries

2015-09-06 Thread Pavel Stehule
Hi



> postgres=# select pg_hba_lookup('postgres','all');
>  pg_hba_lookup
> ---
>  (84,local,"[""all""]","[""all""]",,,trust,{})
>  (86,host,"[""all""]","[""all""]",127.0.0.1,,trust,{})
>  (88,host,"[""all""]","[""all""]",::1,,trust,{})
>
> Here I attached a proof of concept patch for the same.
>
> Any suggestions/comments on this proposed approach?
>
>
If I understand your proposal correctly, the major benefit is that it is
impossible to enter pg_hba keywords - so you don't need to analyse whether
a parameter is a keyword or not? That makes sense, although it can be hard
to form a picture of the pg_hba.conf contents from these partial views.

There could be another way - theoretically we could have a function
pg_hba_debug with an API similar to pg_hba_conf. The result would be the
content of pg_hba.conf with information about the result of every rule.
Regards

Pavel


Re: [HACKERS] checkpointer continuous flushing

2015-09-06 Thread Fabien COELHO


Hello Petr,


> function parameters are always in the same line as the function name

ISTM that I did that, or maybe I did not understand what I've done wrong.

> I see one instance of this issue
> +static int
> +NextBufferToWrite(
> +   TableSpaceCheckpointStatus *spcStatus, int nb_spaces,
> +   int *pspace, int num_to_write, int num_written)

Ok, I was looking for function calls.

> should IMHO be formatted as
> +static int
> +bufcmp(const void * pa, const void * pb)
> +{

Indeed.

> And I think we generally put the struct typedefs at the top of the C file
> and don't mix them with function definitions (I am talking about the
> TableSpaceCheckpointStatus and TableSpaceCountEntry).


Ok, moved up.

Thanks for the hints!  Two-part v12 attached fixes these.

--
Fabien.

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index e3dc23b..96c9a2f 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2454,6 +2454,28 @@ include_dir 'conf.d'
   
  
 
+ 
+  checkpoint_sort (bool)
+  
+   checkpoint_sort configuration parameter
+  
+  
+  
+   
+Whether to sort buffers before writting them out to disk on checkpoint.
+For a HDD storage, this setting allows to group together
+neighboring pages written to disk, thus improving performance by
+reducing random write activity.
+This sorting should have limited performance effects on SSD backends
+as such storages have good random write performance, but it may
+help with wear-leveling so be worth keeping anyway.
+The default is on.
+This parameter can only be set in the postgresql.conf
+file or on the server command line.
+   
+  
+ 
+
  
   checkpoint_warning (integer)
   
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index e3941c9..f538698 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -546,6 +546,18 @@
   
 
   
+   When hard-disk drives (HDD) are used for terminal data storage
+allows to sort pages
+   so that neighboring pages on disk will be flushed together by
+   chekpoints, reducing the random write load and improving performance.
+   If solid-state drives (SSD) are used, sorting pages induces no benefit
+   as their random write I/O performance is good: this feature could then
+   be disabled by setting checkpoint_sort to off.
+   It is possible that sorting may help with SSD wear leveling, so it may
+   be kept on that account.
+  
+
+  
The number of WAL segment files in pg_xlog directory depends on
min_wal_size, max_wal_size and
the amount of WAL generated in previous checkpoint cycles. When old log
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 127bc58..74412a6 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7999,11 +7999,13 @@ LogCheckpointEnd(bool restartpoint)
 sync_secs,
 total_secs,
 longest_secs,
+sort_secs,
 average_secs;
 	int			write_usecs,
 sync_usecs,
 total_usecs,
 longest_usecs,
+sort_usecs,
 average_usecs;
 	uint64		average_sync_time;
 
@@ -8034,6 +8036,10 @@ LogCheckpointEnd(bool restartpoint)
 		CheckpointStats.ckpt_end_t,
 		&total_secs, &total_usecs);
 
+	TimestampDifference(CheckpointStats.ckpt_sort_t,
+		CheckpointStats.ckpt_sort_end_t,
+		&sort_secs, &sort_usecs);
+
 	/*
 	 * Timing values returned from CheckpointStats are in microseconds.
 	 * Convert to the second plus microsecond form that TimestampDifference
@@ -8052,8 +8058,8 @@ LogCheckpointEnd(bool restartpoint)
 
 	elog(LOG, "%s complete: wrote %d buffers (%.1f%%); "
 		 "%d transaction log file(s) added, %d removed, %d recycled; "
-		 "write=%ld.%03d s, sync=%ld.%03d s, total=%ld.%03d s; "
-		 "sync files=%d, longest=%ld.%03d s, average=%ld.%03d s; "
+		 "sort=%ld.%03d s, write=%ld.%03d s, sync=%ld.%03d s, total=%ld.%03d s;"
+		 " sync files=%d, longest=%ld.%03d s, average=%ld.%03d s; "
 		 "distance=%d kB, estimate=%d kB",
 		 restartpoint ? "restartpoint" : "checkpoint",
 		 CheckpointStats.ckpt_bufs_written,
@@ -8061,6 +8067,7 @@ LogCheckpointEnd(bool restartpoint)
 		 CheckpointStats.ckpt_segs_added,
 		 CheckpointStats.ckpt_segs_removed,
 		 CheckpointStats.ckpt_segs_recycled,
+		 sort_secs, sort_usecs / 1000,
 		 write_secs, write_usecs / 1000,
 		 sync_secs, sync_usecs / 1000,
 		 total_secs, total_usecs / 1000,
diff --git a/src/backend/storage/buffer/buf_init.c b/src/backend/storage/buffer/buf_init.c
index 3ae2848..3bd5eab 100644
--- a/src/backend/storage/buffer/buf_init.c
+++ b/src/backend/storage/buffer/buf_init.c
@@ -65,7 +65,8 @@ void
 InitBufferPool(void)
 {
 	bool		foundBufs,
-foundDescs;
+foundDescs,
+foundCpid;
 
 	/* Align descriptors to a cacheline boundary. */
 	BufferDescriptors = (BufferDescPadded *) CACHELINEALIGN(
@@ -77,10 +78,14 @@ InitBufferPool(void)
 		ShmemInitStruct("Buffer Blocks",
 		N

Re: [HACKERS] PL/pgSQL, RAISE and error context

2015-09-06 Thread Joe Conway
On 09/05/2015 09:14 AM, Joe Conway wrote:
> On 09/05/2015 09:05 AM, Tom Lane wrote:
>> I wrote:
>>> If there are not major objections, I'll work on cleaning up and
>>> committing the patch.
>>
>> Pushed.  I'm not too sure about the expected outputs for python other
>> than 2.6, nor for sepgsql, but hopefully the buildfarm will provide
>> feedback.
> 
> We don't have the buildfarm actually checking sepgsql yet, but I'll
> check it out manually today or tomorrow.


One-liner required for sepgsql -- attached committed and pushed.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development
diff --git a/contrib/sepgsql/expected/label.out b/contrib/sepgsql/expected/label.out
index c84aef7..7af5189 100644
--- a/contrib/sepgsql/expected/label.out
+++ b/contrib/sepgsql/expected/label.out
@@ -156,6 +156,7 @@ LOG:  SELinux: allowed { execute } scontext=unconfined_u:unconfined_r:sepgsql_re
 LOG:  SELinux: allowed { entrypoint } scontext=unconfined_u:unconfined_r:sepgsql_regtest_user_t:s0 tcontext=system_u:object_r:sepgsql_trusted_proc_exec_t:s0 tclass=db_procedure name="function f3()"
 LOG:  SELinux: allowed { transition } scontext=unconfined_u:unconfined_r:sepgsql_regtest_user_t:s0 tcontext=unconfined_u:unconfined_r:sepgsql_trusted_proc_t:s0 tclass=process
 ERROR:  an exception from f3()
+CONTEXT:  PL/pgSQL function f3() line 2 at RAISE
 SELECT f4();			-- failed on domain transition
 LOG:  SELinux: allowed { execute } scontext=unconfined_u:unconfined_r:sepgsql_regtest_user_t:s0 tcontext=system_u:object_r:sepgsql_nosuch_trusted_proc_exec_t:s0 tclass=db_procedure name="public.f4()"
 LOG:  SELinux: allowed { entrypoint } scontext=unconfined_u:unconfined_r:sepgsql_regtest_user_t:s0 tcontext=system_u:object_r:sepgsql_nosuch_trusted_proc_exec_t:s0 tclass=db_procedure name="function f4()"


signature.asc
Description: OpenPGP digital signature


Re: [HACKERS] checkpointer continuous flushing

2015-09-06 Thread Petr Jelinek

On 2015-09-06 19:05, Fabien COELHO wrote:


> Here is a rebased two-part v11.
>
>> function parameters are always in the same line as the function name
>
> ISTM that I did that, or maybe I did not understand what I've done wrong.



I see one instance of this issue
+static int
+NextBufferToWrite(
+   TableSpaceCheckpointStatus *spcStatus, int nb_spaces,
+   int *pspace, int num_to_write, int num_written)

Also
+static int bufcmp(const void * pa, const void * pb)
+{

should IMHO be formatted as
+static int
+bufcmp(const void * pa, const void * pb)
+{


And I think we generally put the struct typedefs at the top of the C 
file and don't mix them with function definitions (I am talking about 
the TableSpaceCheckpointStatus and TableSpaceCountEntry).


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] pg_hba_lookup function to get all matching pg_hba.conf entries

2015-09-06 Thread Pavel Stehule
Hi


> Any suggestions/comments on this proposed approach?
>

This is interesting functionality - the same was requested by an important
Czech customer.

I'll do a review.

Regards

Pavel


Re: [HACKERS] checkpointer continuous flushing

2015-09-06 Thread Fabien COELHO


Here is a rebased two-part v11.


> * We don't do one-line ifs;

I've found one instance.

> function parameters are always in the same line as the function name


ISTM that I did that, or maybe I did not understand what I've done wrong.

--
Fabien.

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index e3dc23b..96c9a2f 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2454,6 +2454,28 @@ include_dir 'conf.d'
   
  
 
+ 
+  checkpoint_sort (bool)
+  
+   checkpoint_sort configuration parameter
+  
+  
+  
+   
+Whether to sort buffers before writting them out to disk on checkpoint.
+For a HDD storage, this setting allows to group together
+neighboring pages written to disk, thus improving performance by
+reducing random write activity.
+This sorting should have limited performance effects on SSD backends
+as such storages have good random write performance, but it may
+help with wear-leveling so be worth keeping anyway.
+The default is on.
+This parameter can only be set in the postgresql.conf
+file or on the server command line.
+   
+  
+ 
+
  
   checkpoint_warning (integer)
   
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index e3941c9..f538698 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -546,6 +546,18 @@
   
 
   
+   When hard-disk drives (HDD) are used for terminal data storage
+allows to sort pages
+   so that neighboring pages on disk will be flushed together by
+   chekpoints, reducing the random write load and improving performance.
+   If solid-state drives (SSD) are used, sorting pages induces no benefit
+   as their random write I/O performance is good: this feature could then
+   be disabled by setting checkpoint_sort to off.
+   It is possible that sorting may help with SSD wear leveling, so it may
+   be kept on that account.
+  
+
+  
The number of WAL segment files in pg_xlog directory depends on
min_wal_size, max_wal_size and
the amount of WAL generated in previous checkpoint cycles. When old log
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 127bc58..74412a6 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7999,11 +7999,13 @@ LogCheckpointEnd(bool restartpoint)
 sync_secs,
 total_secs,
 longest_secs,
+sort_secs,
 average_secs;
 	int			write_usecs,
 sync_usecs,
 total_usecs,
 longest_usecs,
+sort_usecs,
 average_usecs;
 	uint64		average_sync_time;
 
@@ -8034,6 +8036,10 @@ LogCheckpointEnd(bool restartpoint)
 		CheckpointStats.ckpt_end_t,
 		&total_secs, &total_usecs);
 
+	TimestampDifference(CheckpointStats.ckpt_sort_t,
+		CheckpointStats.ckpt_sort_end_t,
+		&sort_secs, &sort_usecs);
+
 	/*
 	 * Timing values returned from CheckpointStats are in microseconds.
 	 * Convert to the second plus microsecond form that TimestampDifference
@@ -8052,8 +8058,8 @@ LogCheckpointEnd(bool restartpoint)
 
 	elog(LOG, "%s complete: wrote %d buffers (%.1f%%); "
 		 "%d transaction log file(s) added, %d removed, %d recycled; "
-		 "write=%ld.%03d s, sync=%ld.%03d s, total=%ld.%03d s; "
-		 "sync files=%d, longest=%ld.%03d s, average=%ld.%03d s; "
+		 "sort=%ld.%03d s, write=%ld.%03d s, sync=%ld.%03d s, total=%ld.%03d s;"
+		 " sync files=%d, longest=%ld.%03d s, average=%ld.%03d s; "
 		 "distance=%d kB, estimate=%d kB",
 		 restartpoint ? "restartpoint" : "checkpoint",
 		 CheckpointStats.ckpt_bufs_written,
@@ -8061,6 +8067,7 @@ LogCheckpointEnd(bool restartpoint)
 		 CheckpointStats.ckpt_segs_added,
 		 CheckpointStats.ckpt_segs_removed,
 		 CheckpointStats.ckpt_segs_recycled,
+		 sort_secs, sort_usecs / 1000,
 		 write_secs, write_usecs / 1000,
 		 sync_secs, sync_usecs / 1000,
 		 total_secs, total_usecs / 1000,
diff --git a/src/backend/storage/buffer/buf_init.c b/src/backend/storage/buffer/buf_init.c
index 3ae2848..3bd5eab 100644
--- a/src/backend/storage/buffer/buf_init.c
+++ b/src/backend/storage/buffer/buf_init.c
@@ -65,7 +65,8 @@ void
 InitBufferPool(void)
 {
 	bool		foundBufs,
-foundDescs;
+foundDescs,
+foundCpid;
 
 	/* Align descriptors to a cacheline boundary. */
 	BufferDescriptors = (BufferDescPadded *) CACHELINEALIGN(
@@ -77,10 +78,14 @@ InitBufferPool(void)
 		ShmemInitStruct("Buffer Blocks",
 		NBuffers * (Size) BLCKSZ, &foundBufs);
 
-	if (foundDescs || foundBufs)
+	CheckpointBufferIds = (CheckpointSortItem *)
+		ShmemInitStruct("Checkpoint BufferIds",
+		NBuffers * sizeof(CheckpointSortItem), &foundCpid);
+
+	if (foundDescs || foundBufs || foundCpid)
 	{
-		/* both should be present or neither */
-		Assert(foundDescs && foundBufs);
+		/* all should be present or neither */
+		Assert(foundDescs && foundBufs && foundCpid);
 		/* note: this path is only taken in EXEC_BACKEND case */
 	}
 	else
@@ -144,5

Re: [HACKERS] checkpointer continuous flushing

2015-09-06 Thread Fabien COELHO


Hello Andres,


> Here's a bunch of comments on this (hopefully the latest?)

Who knows?! :-)

> version of the patch:
>
> * I'm not sure I like the FileWrite & FlushBuffer API changes. Do you
>   foresee other callsites needing similar logic?

I foresee that the bgwriter should also do something more sensible than
generating random I/Os over HDDs, and this is also true for workers... But
this is for another time, maybe.

> Wouldn't it be just as easy to put this logic into the checkpointing
> code?

Not sure it would simplify anything, because the checkpointer currently
knows about buffers but flushing is about files, which are hidden from
view.

Doing it with this API change means that the code does not have to compute
twice in which file a buffer is: the buffer/file boundary has to be broken
somewhere anyway so that flushing can be done when needed, and the
solution I took seems the simplest way to do it, without making the
checkpointer too file-conscious.

> * We don't do one-line ifs;

Ok, I'll return them.

> function parameters are always in the same line as the function name

Ok, I'll try to improve.

> * Wouldn't a binary heap over the tablespaces + progress be nicer?

I'm not sure where it would fit exactly.

Anyway, I think it would complicate the code significantly (compared to
the straightforward array), so I would not do anything like that without a
strong incentive, such as an actual failing case.

Moreover such a data structure would probably require some kind of pointer
(probably 8 bytes added per node, maybe more), and the amount of memory is
already a concern, at least to me; moreover it has to reside in shared
memory, which does not simplify allocation of tree data structures.

> If you make the sorting criterion include the tablespace id you wouldn't
> need the lookahead loop in NextBufferToWrite().

Yep, I thought of it. It would mean 4 more bytes per buffer, and a bsearch
to find the boundaries, so significantly less simple code. I think that
the current approach is ok as the number of tablespaces should be small.

It may be improved upon later if there is a motivation to do so.

> Isn't the current approach O(NBuffers^2) in the worst case?

ISTM that the overall lookahead complexity is NBuffers * Ntablespace:
buffers are scanned once for each tablespace. I assume that the number of
tablespaces is kept low, and having simpler code which uses less memory
seems a good idea.

ISTM that using the tablespace in the sorting would reduce the complexity
to ln(NBuffers) * Ntablespace for finding the boundaries, and then
Ntablespace * (NBuffers/Ntablespace) = NBuffers for scanning, at the
expense of more memory and code complexity.

So this is a voluntary design decision.

--
Fabien.




Re: [HACKERS] Allow a per-tablespace effective_io_concurrency setting

2015-09-06 Thread Julien Rouhaud
Hi,

Please find attached a v2 of the patch. See below for changes.


On 02/09/2015 15:53, Andres Freund wrote:
> 
> Hi,
> 
> On 2015-07-18 12:17:39 +0200, Julien Rouhaud wrote:
>> I didn't know that the thread must exists on -hackers to be able to add
>> a commitfest entry, so I transfer the thread here.
> 
> Please, in the future, also update the title of the thread to something
> fitting.
> 
>> @@ -539,6 +541,9 @@ ExecInitBitmapHeapScan(BitmapHeapScan *node, EState 
>> *estate, int eflags)
>>  {
>>  BitmapHeapScanState *scanstate;
>>  RelationcurrentRelation;
>> +#ifdef USE_PREFETCH
>> +int new_io_concurrency;
>> +#endif
>>  
>>  /* check for unsupported flags */
>>  Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)));
>> @@ -598,6 +603,25 @@ ExecInitBitmapHeapScan(BitmapHeapScan *node, EState 
>> *estate, int eflags)
>>   */
>>  currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, 
>> eflags);
>>  
>> +#ifdef USE_PREFETCH
>> +/* check if the effective_io_concurrency has been overloaded for the
>> + * tablespace storing the relation and compute the 
>> target_prefetch_pages,
>> + * or just get the current target_prefetch_pages
>> + */
>> +new_io_concurrency = get_tablespace_io_concurrency(
>> +currentRelation->rd_rel->reltablespace);
>> +
>> +
>> +scanstate->target_prefetch_pages = target_prefetch_pages;
>> +
>> +if (new_io_concurrency != effective_io_concurrency)
>> +{
>> +double prefetch_pages;
>> +   if (compute_io_concurrency(new_io_concurrency, &prefetch_pages))
>> +scanstate->target_prefetch_pages = rint(prefetch_pages);
>> +}
>> +#endif
> 
> Maybe it's just me - but imo there should be as few USE_PREFETCH
> dependent places in the code as possible. It'll just be 0 when not
> supported, that's fine? Especially changing the size of externally
> visible structs depending on a configure detected ifdef seems wrong to
> me.
> 

I removed these ifdefs, and the more problematic one in the struct.

>> +bool
>> +compute_io_concurrency(int io_concurrency, double *target_prefetch_pages)
>> +{
>> +double  new_prefetch_pages = 0.0;
>> +int i;
>> +
>> +/* make sure the io_concurrency value is correct, it may have been 
>> forced
>> + * with a pg_tablespace UPDATE
>> + */
> 
> Nitpick: Wrong comment style (/* stands on its own line).
> 
>> +if (io_concurrency > MAX_IO_CONCURRENCY)
>> +io_concurrency = MAX_IO_CONCURRENCY;
>> +
>> +/*--
>> + * The user-visible GUC parameter is the number of drives (spindles),
>> + * which we need to translate to a number-of-pages-to-prefetch target.
>> + * The target value is stashed in *extra and then assigned to the actual
>> + * variable by assign_effective_io_concurrency.
>> + *
>> + * The expected number of prefetch pages needed to keep N drives busy 
>> is:
>> + *
>> + * drives |   I/O requests
>> + * ---+
>> + *  1 |   1
>> + *  2 |   2/1 + 2/2 = 3
>> + *  3 |   3/1 + 3/2 + 3/3 = 5 1/2
>> + *  4 |   4/1 + 4/2 + 4/3 + 4/4 = 8 1/3
>> + *  n |   n * H(n)
> 
> I know you just moved this code. But: I don't buy this formula. Like at
> all. Doesn't queuing and reordering entirely invalidate the logic here?
> 

Changing the formula, or changing the GUC to a number of pages to
prefetch, is still under discussion, so no change here.
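
For reference, the translation described in the quoted comment amounts to
the following (a minimal standalone sketch of the harmonic-number
computation, not the actual assign-hook code):

static double
prefetch_pages_for_drives(int n)
{
    double  h = 0.0;
    int     i;

    /* H(n) = 1/1 + 1/2 + ... + 1/n */
    for (i = 1; i <= n; i++)
        h += 1.0 / i;

    /* e.g. n = 4 gives 4 * (1 + 1/2 + 1/3 + 1/4) = 8 1/3, as in the table */
    return n * h;
}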

> Perhaps more relevantly: Imo nodeBitmapHeapscan.c is the wrong place for
> this. bufmgr.c maybe?
> 

Moved to bufmgr.c


> You also didn't touch
> /*
>  * How many buffers PrefetchBuffer callers should try to stay ahead of their
>  * ReadBuffer calls by.  This is maintained by the assign hook for
>  * effective_io_concurrency.  Zero means "never prefetch".
>  */
> int   target_prefetch_pages = 0;
> which surely doesn't make sense anymore after these changes.
> 
> But do we even need that variable now?

I slightly updated the comment.  If the table doesn't belong to a
tablespace with an overridden effective_io_concurrency, keeping this
pre-computed target_prefetch_pages can save a few cycles on each
execution, so I think it's better to keep it.

> 
>> diff --git a/src/include/utils/guc.h b/src/include/utils/guc.h
>> index dc167f9..57008fc 100644
>> --- a/src/include/utils/guc.h
>> +++ b/src/include/utils/guc.h
>> @@ -26,6 +26,9 @@
>>  #define MAX_KILOBYTES   (INT_MAX / 1024)
>>  #endif
>>  
>> +/* upper limit for effective_io_concurrency */
>> +#define MAX_IO_CONCURRENCY 1000
>> +
>>  /*
>>   * Automatic configuration file name for ALTER SYSTEM.
>>   * This file will be used to store values of configuration parameters
>> @@ -256,6 +259,8 @@ extern int   temp_file_limit;
>>  
>>  extern int  num_temp_buffers;
>>  
>> +extern int  effective_io_concurrency;
>> +
> 
target_prefetch_pages

Re: [HACKERS] Separating Buffer LWlocks

2015-09-06 Thread Andres Freund
Hi,

On 2015-09-06 14:10:24 +0100, Simon Riggs wrote:
> It separates the Buffer LWLocks from the main LW locks, allowing them to
> have different padding.
> 
> Tests showed noticeable/significant performance gain due to reduced false
> sharing on main LWlocks, though without wasting memory on the buffer
> LWlocks.

Hm. I found that the buffer content lwlocks can actually also be a
significant source of contention - I'm not sure reducing padding for
those is going to be particularly nice. I think we should rather move
the *content* lock inline into the buffer descriptor. The io lock
doesn't matter and can be as small as possible.

Additionally I think we should increase the lwlock padding to 64 bytes
(i.e. by far the most common cacheline size). In the past I've seen
that to be rather beneficial.
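
To illustrate the padding point (just a sketch with made-up field
contents, not the actual LWLock layout):

#define CACHELINE_SIZE 64       /* assumed: the most common size */

typedef struct LWLock
{
    volatile char   lockstate;  /* placeholder for the real state */
    /* ... wait list, tranche id, etc. ... */
} LWLock;

/*
 * Padding each lock to a full cacheline means two adjacent locks never
 * share one, so a writer bouncing one lock's line doesn't invalidate
 * its neighbour's - that's the false sharing being avoided.
 */
typedef union LWLockPadded
{
    LWLock  lock;
    char    pad[CACHELINE_SIZE];
} LWLockPadded;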

Greetings,

Andres Freund




Re: [HACKERS] replication slot restart_lsn initialization

2015-09-06 Thread Michael Paquier
On Sun, Sep 6, 2015 at 8:32 PM, Andres Freund  wrote:

> [fixes]
>
> Committed, thanks for the patch.
>

Evidently I missed a couple of things when hacking on this stuff. Thanks for the
extra cleanup and the commit.
-- 
Michael


[HACKERS] Separating Buffer LWlocks

2015-09-06 Thread Simon Riggs
Looks like the right time to post this patch.

It separates the Buffer LWLocks from the main LW locks, allowing them to
have different padding.

Tests showed noticeable/significant performance gain due to reduced false
sharing on main LWlocks, though without wasting memory on the buffer
LWlocks.

-- 
Simon Riggs    http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


separate_buffer_lwlocks.v2.patch
Description: Binary data



Re: [HACKERS] Waits monitoring

2015-09-06 Thread Andres Freund
On 2015-09-04 23:44:21 +0100, Simon Riggs wrote:
> I see the need for both current wait information and for cumulative
> historical detail.
> 
> I'm willing to wait before reviewing this, but not for more than 1 more CF.
> 
> Andres, please decide whether we should punt to next CF now, based upon
> other developments. Thanks

I think we can do some of the work concurrently - the whole lwlock
infrastructure piece is rather independent and what currently most of
the arguments are about. I agree that the actual interface will need to
be coordinated.

Ildus, could you please review Amit & Robert's patch?

Greetings,

Andres Freund




Re: [HACKERS] checkpointer continuous flushing

2015-09-06 Thread Andres Freund
Hi,

Here's a bunch of comments on this (hopefully the latest?) version of
the patch:

* I'm not sure I like the FileWrite & FlushBuffer API changes. Do you
  foresee other callsites needing similar logic? Wouldn't it be just as
  easy to put this logic into the checkpointing code?

* We don't do one-line ifs; function parameters are always on the same
  line as the function name

* Wouldn't a binary heap over the tablespaces + progress be nicer? If
  you make the sorting criterion include the tablespace id you wouldn't
  need the lookahead loop in NextBufferToWrite().  Isn't the current
  approach O(NBuffers^2) in the worst case?
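
For concreteness, roughly this shape (a sketch, field names invented,
and it assumes num_to_write > 0 for every entry):

typedef unsigned int Oid;       /* as in postgres_ext.h */

typedef struct TablespaceProgress
{
    Oid     tsoid;          /* tablespace this entry tracks */
    int     num_written;    /* buffers flushed so far */
    int     num_to_write;   /* total buffers to flush */
} TablespaceProgress;

/*
 * Min-heap ordering: the least-advanced tablespace sorts first, so
 * the checkpointer always takes the next buffer from the most lagging
 * tablespace, interleaving writes without any lookahead loop.
 */
static int
tablespace_progress_cmp(const TablespaceProgress *a,
                        const TablespaceProgress *b)
{
    double  pa = (double) a->num_written / a->num_to_write;
    double  pb = (double) b->num_written / b->num_to_write;

    if (pa != pb)
        return (pa < pb) ? -1 : 1;
    /* tie-break on OID to keep the order stable */
    return (a->tsoid < b->tsoid) ? -1 : (a->tsoid > b->tsoid);
}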

Greetings,

Andres Freund




Re: [HACKERS] [PATCH] Refactoring of LWLock tranches

2015-09-06 Thread and...@anarazel.de
Hi,

On 2015-09-05 12:48:12 +0300, Ildus Kurbangaliev wrote:
> Another parts require a some discussion so I didn't touch them yet.

At this point I don't see any point in further review until these are
addressed.

> The idea of creating individual tranches for individual LWLocks came
> from Heikki Linnakangas, and I also think that a tranche is a good
> place to keep an LWLock's name.

I think it's rather ugly.

> The base of these tranches points to MainLWLockArray

And that's just plain wrong. The base of a tranche ought to point to the
first lwlock in it.
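
I.e. a module owning a tranche of locks would register it roughly like
this (a simplified sketch against the 9.5-era tranche API; names
invented):

#include "storage/lwlock.h"

typedef struct MySharedState
{
    LWLock      locks[16];
} MySharedState;

static LWLockTranche my_tranche;

static void
register_my_tranche(MySharedState *state, int tranche_id)
{
    my_tranche.name = "my_module";
    my_tranche.array_base = &state->locks[0];   /* first lock of THIS tranche */
    my_tranche.array_stride = sizeof(LWLock);
    LWLockRegisterTranche(tranche_id, &my_tranche);
}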

Greetings,

Andres Freund




Re: [HACKERS] Summary of plans to avoid the annoyance of Freezing

2015-09-06 Thread Andres Freund
On 2015-08-10 07:03:02 +0100, Simon Riggs wrote:
> I was previously a proponent of (2) as a practical way forwards, but my
> proposal here today is that we don't do anything further on 2) yet, and
> seek to make progress on 5) instead.
> 
> If 5) fails to bring a workable solution by the Jan 2016 CF then we commit
> 2) instead.
> 
> If Heikki wishes to work on (5), that's good. Otherwise, I think its
> something I can understand and deliver by 1 Jan, though likely for 1 Nov CF.

I highly doubt that we can get either variant into 9.6 if we only start
to seriously review them by then. Heikki's lsn ranges patch essentially
was a variant of 5) and it ended up being a rather complicated patch. I
don't think using an explicit epoch is going to be that much simpler.

So I think we need to decide now.

My vote is that we should try to get freeze maps into 9.6 - that seems
more realistic given that we have a patch right now. Yes, it might end
up being superfluous churn, but it's rather localized. I think we've
put off significant incremental improvements with the promise of more
radical stuff too often.

Greetings,

Andres Freund




Re: [HACKERS] Add support for RADIUS passwords longer than 16 octets

2015-09-06 Thread Magnus Hagander
On Tue, Aug 18, 2015 at 11:36 PM, Marko Tiikkaja  wrote:

> On 2015-08-15 17:55, I wrote:
>
>> The attached patch adds support for RADIUS passwords longer than 16
>> octets.
>>
>
> Improved the coding and comments a bit, new version attached.



Looks good to me. Applied, thanks!


As a note - psql truncates passwords at 100 characters when it reads them
from the prompt (in a connstr it's fine). Not sure we care enough to
change that, though - I doubt it's a very common use case for psql...
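
(If I remember correctly, the limit simply comes from the maxlen psql
passes to simple_prompt() when prompting, roughly:

    password = simple_prompt("Password: ", 100, false);

so raising it would be a one-line change.)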

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] replication slot restart_lsn initialization

2015-09-06 Thread Andres Freund
On 2015-08-15 21:16:11 +0900, Michael Paquier wrote:
> Well, this has taken less time than I thought:
> =# CREATE_REPLICATION_SLOT toto PHYSICAL;
>  slot_name | consistent_point | snapshot_name | output_plugin
> ---+--+---+---
>  toto  | 0/0  | null  | null
> (1 row)
> =# CREATE_REPLICATION_SLOT toto2 PHYSICAL RESERVE_WAL;
>  slot_name | consistent_point | snapshot_name | output_plugin
> ---+--+---+---
>  toto2 | 0/0  | null  | null
> (1 row)

Independently I wonder if it's worth splitting the returned columns for
physical/logical slots.

> +  
> +   RESERVE_WAL
> +   
> +
> + Specify that the LSN for this physical replication
> + slot is reserved immediately; otherwise the LSN is
> + reserved on first connection from a streaming replication client.
> +
> +   

I replaced LSN with WAL here - it's not LSNs that are reserved.

> +opt_reserve_wal:
> + K_RESERVE_WAL
> + { $$ = TRUE }
> + | /* EMPTY */
> + { $$ = FALSE }
> +

I felt free to add semicolons here ;)
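
Ending up roughly as (from memory, not a diff):

opt_reserve_wal:
			K_RESERVE_WAL			{ $$ = TRUE; }
			| /* EMPTY */			{ $$ = FALSE; }
			;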


Committed, thanks for the patch.




Re: [HACKERS] [PATCH] SQL function to report log message

2015-09-06 Thread dinesh kumar
On Sun, Sep 6, 2015 at 4:46 PM, Pavel Stehule 
wrote:

>
>
> 2015-09-06 13:12 GMT+02:00 dinesh kumar :
>
>>
>> On Sun, Sep 6, 2015 at 4:00 PM, dinesh kumar 
>> wrote:
>>
>>> On Sun, Sep 6, 2015 at 3:39 PM, Pavel Stehule 
>>> wrote:
>>>
 Hi

 attached patch with fixed broken error message

 Regards

 Pavel

>>>
>>> Hi Pavel,
>>>
>>> Thanks much for taking care of it. Patch looks great.
>>>
>>>
>>> Hi Pavel,
>>
>> Could you let me know what status value I need to change in the
>> commitfest UI?
>>
>
> if you have no objections, the status can be "ready for committer"
>
>
I do not have objections. Let's take this to the committers for more input.

Thanks again.



> Regards
>
> Pavel
>
>
>>
>>
>>> --
>>>
>>> Regards,
>>> Dinesh
>>> manojadinesh.blogspot.com
>>>
>>
>>
>>
>> --
>>
>> Regards,
>> Dinesh
>> manojadinesh.blogspot.com
>>
>
>


-- 

Regards,
Dinesh
manojadinesh.blogspot.com


Re: [HACKERS] [PATCH] SQL function to report log message

2015-09-06 Thread Pavel Stehule
2015-09-06 13:12 GMT+02:00 dinesh kumar :

>
> On Sun, Sep 6, 2015 at 4:00 PM, dinesh kumar 
> wrote:
>
>> On Sun, Sep 6, 2015 at 3:39 PM, Pavel Stehule 
>> wrote:
>>
>>> Hi
>>>
>>> attached patch with fixed broken error message
>>>
>>> Regards
>>>
>>> Pavel
>>>
>>
>> Hi Pavel,
>>
>> Thanks much for taking care of it. Patch looks great.
>>
>>
>> Hi Pavel,
>
> Could you let me know what status value I need to change in the
> commitfest UI?
>

if you have no objections, the status can be "ready for committer"

Regards

Pavel


>
>
>> --
>>
>> Regards,
>> Dinesh
>> manojadinesh.blogspot.com
>>
>
>
>
> --
>
> Regards,
> Dinesh
> manojadinesh.blogspot.com
>


Re: [HACKERS] [PATCH] SQL function to report log message

2015-09-06 Thread dinesh kumar
On Sun, Sep 6, 2015 at 4:00 PM, dinesh kumar 
wrote:

> On Sun, Sep 6, 2015 at 3:39 PM, Pavel Stehule 
> wrote:
>
>> Hi
>>
>> attached patch with fixed broken error message
>>
>> Regards
>>
>> Pavel
>>
>
> Hi Pavel,
>
> Thanks much for taking care of it. Patch looks great.
>
>
> Hi Pavel,

Could you let me know what status value I need to change in the
commitfest UI?


> --
>
> Regards,
> Dinesh
> manojadinesh.blogspot.com
>



-- 

Regards,
Dinesh
manojadinesh.blogspot.com


Re: [HACKERS] [PATCH] SQL function to report log message

2015-09-06 Thread dinesh kumar
On Sun, Sep 6, 2015 at 3:39 PM, Pavel Stehule 
wrote:

> Hi
>
> attached patch with fixed broken error message
>
> Regards
>
> Pavel
>

Hi Pavel,

Thanks much for taking care of it. Patch looks great.


-- 

Regards,
Dinesh
manojadinesh.blogspot.com


Re: [HACKERS] [PATCH] SQL function to report log message

2015-09-06 Thread Pavel Stehule
Hi

attached patch with fixed broken error message

Regards

Pavel
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
new file mode 100644
index b3b78d2..b7a2cc2
*** a/doc/src/sgml/func.sgml
--- b/doc/src/sgml/func.sgml
*** postgres=# SELECT * FROM pg_xlogfile_nam
*** 17925,17930 
--- 17925,17939 
  Return information about a file.
 

+   
+
+ pg_report_log(loglevel text, message anyelement
+ [, ishidestmt boolean ] [, detail text]
+ [, hint text] [, sqlstate text])
+
+void
+
+ Report a message or raise an error.
+
+   
   
  
 
*** SELECT (pg_stat_file('filename')).modifi
*** 17993,17998 
--- 18002,18033 
  
 
  
+
+ pg_report_log
+
+
+ pg_report_log is useful for writing custom messages
+ or raising an exception. This function doesn't support the PANIC and
+ FATAL log levels, due to their special internal DB usage, which may
+ cause database instability. Using ishidestmt, whose default value
+ is true, the function can write or omit the current SQL statement
+ in the log destination. Also, DETAIL and HINT log messages can be
+ attached by providing detail, hint as function
+ arguments, which are NULL by default. The parameter sqlstate
+ allows setting the SQLSTATE of the raised exception. The default value
+ of this parameter is 'P0001', and it applies to the ERROR level only.
+ 
+ Typical usages include:
+ 
+ postgres=# SELECT pg_report_log('NOTICE', 'Custom Message', true);
+ NOTICE:  Custom Message
+  pg_report_log 
+ ---
+  
+ (1 row)
+ 
+
+ 

  

diff --git a/src/backend/catalog/system_views.sql 
b/src/backend/catalog/system_views.sql
new file mode 100644
index ccc030f..7e551f2
*** a/src/backend/catalog/system_views.sql
--- b/src/backend/catalog/system_views.sql
*** RETURNS jsonb
*** 940,942 
--- 940,961 
  LANGUAGE INTERNAL
  STRICT IMMUTABLE
  AS 'jsonb_set';
+ 
+ CREATE OR REPLACE FUNCTION pg_report_log(loglevel text, message text,
+  ishidestmt boolean DEFAULT true, detail text DEFAULT NULL,
+  hint text DEFAULT NULL, sqlstate text DEFAULT 'P0001')
+ RETURNS VOID
+ LANGUAGE INTERNAL
+ VOLATILE
+ AS 'pg_report_log';
+ 
+ CREATE OR REPLACE FUNCTION pg_report_log(loglevel text, message anyelement,
+  ishidestmt boolean DEFAULT true, detail text DEFAULT NULL,
+  hint text DEFAULT NULL, sqlstate text DEFAULT 'P0001')
+ RETURNS VOID
+ VOLATILE
+ AS
+ $$
+ SELECT pg_report_log($1::pg_catalog.text, $2::pg_catalog.text, $3, $4, $5, $6)
+ $$
+ LANGUAGE SQL;
diff --git a/src/backend/utils/adt/misc.c b/src/backend/utils/adt/misc.c
new file mode 100644
index c0495d9..b1275bd
*** a/src/backend/utils/adt/misc.c
--- b/src/backend/utils/adt/misc.c
*** current_query(PG_FUNCTION_ARGS)
*** 75,80 
--- 75,212 
PG_RETURN_NULL();
  }
  
+ 
+ /*
+  * Parsing error levels
+  */
+ typedef struct
+ {
+   char *err_level;
+   int  ival;
+ } error_levels;
+ 
+ /*
+  * Translate a text-based elog level to its integer value.
+  *
+  * Returns true when it finds a known elog level, otherwise
+  * returns false.
+  */
+ static bool
+ parse_error_level(const char* err_level, int *ival)
+ {
+   error_levels err_levels[]={
+   {"DEBUG5", DEBUG5},
+   {"DEBUG4", DEBUG4},
+   {"DEBUG3", DEBUG3},
+   {"DEBUG2", DEBUG2},
+   {"DEBUG1", DEBUG1},
+   {"LOG", LOG},
+   {"INFO", INFO}, 
+   {"NOTICE", NOTICE},
+   {"WARNING", WARNING},
+   {"ERROR", ERROR},
+   /*
+* Adding PGERROR to elevels if WIN32
+*/
+   #ifdef WIN32
+   {"PGERROR", PGERROR},
+   #endif
+   {NULL, 0}
+   };
+ 
+   error_levels *current;
+ 
+   for (current = err_levels; current->err_level != NULL; current++)
+   {
+   if (pg_strcasecmp(current->err_level, err_level) == 0)
+   {
+   *ival = current->ival;
+ 
+   return true;
+   }
+   }
+ 
+   return false;
+ }
+ 
+ /*
+  * pg_report_log()
+  *
+  * Print custom log messages to the log file.
+  */
+ Datum
+ pg_report_log(PG_FUNCTION_ARGS)
+ {
+   int  elog_level;
+   char *elog_level_str;
+   int  sqlstate = 0;
+   char *sqlstate_str;
+   bool ishidestmt = false;
+   char *err_message = NULL;
+   char *err_detail = NULL;
+   char *err_hint = NULL;
+ 
+   /* log level */
+   if (PG_ARGISNULL(0))
+   ereport(ERROR,
+   (errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED),
+   

Re: [HACKERS] Multi-column distinctness.

2015-09-06 Thread Tomas Vondra
On Sep 6, 2015 10:31, "Tomas Vondra"  wrote:
>
> 5) syntax
> -
> The syntax might be one of the pain points if we eventually decide to
commit the multivariate stats patch. I have no intention of blocking this
patch for that reason, but if we can design the syntax to make it
compatible with the multivariate patch, that'd be nice. But it seems to me
the syntax is pretty much the same, no?
>
> I.e. it uses
>
> ADD STATISTICS (options) ON (columns)
>
> just like the multivariate patch, no? Well, it doesn't really check the
stattypes in ATExecAddDropMvStatistics except for checking there's a single
entry, but the syntax seems OK.
>
> BTW mixing ADD and DROP in ATExecAddDropMvStatistics seems a bit
confusing. Maybe two separate methods would be better?
>

BTW one more comment about the syntax - you ran into the same conflict
between "ADD [COLUMN] column" and "ADD STATISTICS" that I did, but you
solved it by making the COLUMN keyword required, while I made STATISTICS
a keyword.

I'm not enthusiastic about the keyword approach, but making COLUMN
required is certainly much worse, as it breaks many existing scripts. The
keyword only breaks cases that manipulate a "statistics" column.

If any of this is unacceptable, then we probably need to come up with a
different syntax.

Regards

-- 
Tomas Vondra   http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] [PATCH] SQL function to report log message

2015-09-06 Thread Pavel Stehule
Hi

I am sending a slightly modified version.

1. sqlstate should be text, not boolean - a boolean is pretty useless
2. fixed formatting and code style

Questions:

I dislike using an empty message when the message parameter is NULL. Should
we show some placeholder text, or disallow NULL altogether?
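
One option, if we keep allowing NULL, is to substitute a placeholder in
the SQL wrapper rather than logging an empty line - a sketch only, not
in the attached patch (placeholder text invented), changing the wrapper
body to:

SELECT pg_report_log($1::pg_catalog.text,
                     coalesce($2::pg_catalog.text, '(message is NULL)'),
                     $3, $4, $5, $6)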

Regards

Pavel
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
new file mode 100644
index b3b78d2..c0b6a72
*** a/doc/src/sgml/func.sgml
--- b/doc/src/sgml/func.sgml
*** postgres=# SELECT * FROM pg_xlogfile_nam
*** 17925,17930 
--- 17925,17939 
  Return information about a file.
 

+   
+
+ pg_report_log(loglevel text, message anyelement
+ [, ishidestmt boolean ] [, detail text]
+ [, hint text] [, sqlstate text])
+
+void
+
+ Report a message or raise an error.
+
+   
   
  
 
*** SELECT (pg_stat_file('filename')).modifi
*** 17993,17998 
--- 18002,18026 
  
 
  
+
+ pg_report_log
+
+
+ pg_report_log is useful for writing custom messages
+ into the current log destination; it returns void.
+ This function doesn't support the PANIC and FATAL log levels, due to
+ their special internal DB usage, which may cause database instability.
+ Using ishidestmt, whose default value is true, the function can write
+ or omit the current SQL statement in the log destination. Also, DETAIL
+ and HINT log messages can be attached by providing detail, hint as
+ function arguments, which are NULL by default. Using iserrstate, whose
+ default value is true, enables the function to raise the SQLSTATE as
+ ERRCODE_RAISE_EXCEPTION for the ERROR level only.
+ 
+ Typical usages include:
+ 
+ postgres=# SELECT pg_report_log('NOTICE', 'Custom Message', true);
+ NOTICE:  Custom Message
+  pg_report_log 
+ ---
+  
+ (1 row)
+ 
+
+ 

  

diff --git a/src/backend/catalog/system_views.sql 
b/src/backend/catalog/system_views.sql
new file mode 100644
index ccc030f..1755335
*** a/src/backend/catalog/system_views.sql
--- b/src/backend/catalog/system_views.sql
*** RETURNS jsonb
*** 940,942 
--- 940,961 
  LANGUAGE INTERNAL
  STRICT IMMUTABLE
  AS 'jsonb_set';
+ 
+ CREATE OR REPLACE FUNCTION pg_report_log(loglevel text, message text,
+  ishidestmt boolean DEFAULT true, detail text DEFAULT NULL,
+  hint text DEFAULT NULL, sqlstate text DEFAULT NULL)
+ RETURNS VOID
+ LANGUAGE INTERNAL
+ VOLATILE
+ AS 'pg_report_log';
+ 
+ CREATE OR REPLACE FUNCTION pg_report_log(loglevel text, message anyelement,
+  ishidestmt boolean DEFAULT true, detail text DEFAULT NULL,
+  hint text DEFAULT NULL, sqlstate text DEFAULT NULL)
+ RETURNS VOID
+ VOLATILE
+ AS
+ $$
+ SELECT pg_report_log($1::pg_catalog.text, $2::pg_catalog.text, $3, $4, $5, $6)
+ $$
+ LANGUAGE SQL;
diff --git a/src/backend/utils/adt/misc.c b/src/backend/utils/adt/misc.c
new file mode 100644
index c0495d9..fd65aae
*** a/src/backend/utils/adt/misc.c
--- b/src/backend/utils/adt/misc.c
*** current_query(PG_FUNCTION_ARGS)
*** 75,80 
--- 75,210 
PG_RETURN_NULL();
  }
  
+ 
+ /*
+  * Parsing error levels
+  */
+ typedef struct
+ {
+   char *err_level;
+   int  ival;
+ } error_levels;
+ 
+ /*
+  * Translate a text-based elog level to its integer value.
+  *
+  * Returns true when it finds a known elog level, otherwise
+  * returns false.
+  */
+ static bool
+ parse_error_level(const char* err_level, int *ival)
+ {
+   error_levels err_levels[]={
+   {"DEBUG5", DEBUG5},
+   {"DEBUG4", DEBUG4},
+   {"DEBUG3", DEBUG3},
+   {"DEBUG2", DEBUG2},
+   {"DEBUG1", DEBUG1},
+   {"LOG", LOG},
+   {"INFO", INFO}, 
+   {"NOTICE", NOTICE},
+   {"WARNING", WARNING},
+   {"ERROR", ERROR},
+   /*
+* Adding PGERROR to elevels if WIN32
+*/
+   #ifdef WIN32
+   {"PGERROR", PGERROR},
+   #endif
+   {NULL, 0}
+   };
+ 
+   error_levels *current;
+ 
+   for (current = err_levels; current->err_level != NULL; current++)
+   {
+   if (pg_strcasecmp(current->err_level, err_level) == 0)
+   {
+   *ival = current->ival;
+ 
+   return true;
+   }
+   }
+ 
+   return false;
+ }
+ 
+ /*
+  * pg_report_log()
+  *
+  * Print custom log messages to the log file.
+  */
+ Datum
+ pg_report_log(PG_FUNCTION_ARGS)
+ {
+   int  elog_level;
+   char *elog_level_str;
+   int  sqlstate;
+   char *sqlstate_str;
+   bool ishidestmt = false;
+   char *err_message = NULL;
+   char *err_

Re: [HACKERS] Using quicksort for every external sort run

2015-09-06 Thread Peter Geoghegan
On Sun, Sep 6, 2015 at 1:51 AM, Marc Mamin  wrote:
> Have you considered performance for cases where multiple CREATE INDEX
> commands are running in parallel?
> One of our typical use cases is large daily tables (50-300 million rows)
> with up to 6 index creations that start simultaneously.
> Our servers have 40-60 GB RAM and ca. 12 CPUs, and we set
> maintenance_work_mem to 1-2 GB for this.
> If the CREATE INDEXes themselves start using parallelism, I guess that we
> might need to review our workflow...

Not particularly. I imagine that that case would be helped a lot here
(probably more than a simpler case involving only one CREATE INDEX),
because each core would require fewer main memory accesses overall.
Maybe you can test it and let us know how it goes.

-- 
Peter Geoghegan




Re: [HACKERS] Using quicksort for every external sort run

2015-09-06 Thread Marc Mamin
>I will sketch a simple implementation of parallel sorting based on the
>patch series that may be workable, and requires relatively little
>implementation effort compared to other ideas that were raised at
>various times:

Hello,

I have only a very superficial understanding of your work, so
apologies if this is off topic or was already discussed...

Have you considered performance for cases where multiple CREATE INDEX
commands are running in parallel?
One of our typical use cases is large daily tables (50-300 million rows)
with up to 6 index creations that start simultaneously.
Our servers have 40-60 GB RAM and ca. 12 CPUs, and we set
maintenance_work_mem to 1-2 GB for this.
If the CREATE INDEXes themselves start using parallelism, I guess that we
might need to review our workflow...

best regards,

Marc Mamin



Re: [HACKERS] Multi-column distinctness.

2015-09-06 Thread Tomas Vondra

Hello,

On 09/06/2015 10:24 AM, Simon Riggs wrote:

On 28 August 2015 at 09:33, Kyotaro HORIGUCHI wrote:

Tomas Vondra is now working on heavily-equipped multivariate
statistics for OLAP usage. In contrast, this is a lightly
implemented solution which calculates only the ratio between the
rows estimated by the current method and the actual row number. I think
this doesn't conflict with his work except for the grammar part.


I think it very obviously does conflict, so I don't see this patch as
appropriate.

If you think a cut version of Tomas' patch is appropriate, then the
usual response is to give a review that says "Tomas, I think a cut down
version is appropriate here, can we reduce the scope of this patch for
now?". If you have done that and he refuses to listen, then a separate
patch version is appropriate. Otherwise we should just reject this
second patchset to avoid confusion and to avoid encouraging people to
take this approach.


FWIW Horiguchi-san is one of the few people who actually took time to 
review the multivariate stats patch, and I don't quite see this patch as 
conflicting with the multivariate one.


It implements a small subset of the (much larger) multivariate stats 
patch, and reusing it within my patch should not be a big deal. 
Actually, this type of statistics was proposed by Horiguchi-san himself, 
and the multivariate patch does not implement it yet (although I intend 
to address that soon).


So no conflict here. Of course, we need to be a bit careful to make it 
compatible (especially the syntax part).


kind regards

--
Tomas Vondra   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Outdated documentation for PageHeaderData

2015-09-06 Thread Julien Rouhaud
On 06/09/2015 01:39, Michael Paquier wrote:
> 
> On Sun, Sep 6, 2015 at 5:11 AM, Julien Rouhaud wrote:
> 
> I just noticed that part of storage.sgml was not updated when 9.3
> introduced checksums (and removed pd_tli from PageHeaderData).
> 
> 
> Yep, see that as well:
> http://www.postgresql.org/message-id/cab7npqsv1kr6fqwf2rovter5cxaw8oth+dkzjwp4meji5do...@mail.gmail.com
> It just fell off the committers' radar, I guess.

Oh, I didn't see that thread. Sounds like I should subscribe to -docs.

-- 
Julien Rouhaud
http://dalibo.com - http://dalibo.org




Re: [HACKERS] Multi-column distinctness.

2015-09-06 Thread Tomas Vondra

Hello Horiguchi-san,

On 08/28/2015 10:33 AM, Kyotaro HORIGUCHI wrote:

Hello, this patch enables the planner to be conscious of inter-column
correlation.

Sometimes two or more columns in a table have some correlation
which brings underestimates, leading to a wrong join method and
slow execution.

Tomas Vondra is now working on heavily-equipped multivariate
statistics for OLAP usage. In contrast, this is a lightly
implemented solution which calculates only the ratio between the
rows estimated by the current method and the actual row number. I think
this doesn't conflict with his work except for the grammar part.


I'd like to sign up as a reviewer of this patch, if you're OK with that. 
I don't see the patch in any of the commitfests, though :-(


Now, a few comments based on reading the patches (will try testing them 
in the next few days, hopefully).


1) catalog design
-
I see you've invented a catalog with a slightly different structure than 
I use in the multivariate stats patch, storing the coefficient for each 
combination of columns in a separate row. The idea in the multivariate 
patch is to generate all possible combinations of columns and store the 
coefficients packed in a single row, which requires implementing a more 
complicated data structure. But your patch does not do that (it only
computes the coefficient for the one specified combination).


This is another potential incompatibility with the multivariate patch, 
although it's rather in the background and should not be difficult to 
rework if needed.


2) find_mv_coeffeicient
-
I don't quite see why this is the right thing

/* Prefer smaller one */
if (mvc->mvccoefficient > 0 && mvc->mvccoefficient < mv_coef)
mv_coef = mvc->mvccoefficient;

i.e. why it's correct to choose the record with the lower coefficient
and not the one with the most columns (the largest subset). Maybe there's
some rule that those are in fact the same thing (seems like adding a 
column can only lower the coefficient), but it's not quite obvious and 
would deserve a comment.


3) single statistics limitation
---
What annoys me a bit is that this patch only applies a single statistics 
- it won't try "combining" multiple smaller ones (even if they don't 
overlap, which shouldn't be that difficult).


4) no sub-combinations
--
Another slightly annoying thing is that the patch computes statistics 
only for the specified combination of columns, and no subsets. So if you 
have enabled stats on [a,b,c] you won't be able to use that for 
conditions on [a,b], for example. Sure, the user can create the stats 
manually, but IMO the less burden we place on the user the better. It's 
enough that we force users to create the stats at all, we shouldn't 
really force them to create all possible combinations.


5) syntax
-
The syntax might be one of the pain points if we eventually decide to 
commit the multivariate stats patch. I have no intention of blocking
this patch for that reason, but if we can design the syntax to make
it compatible with the multivariate patch, that'd be nice. But it seems
to me the syntax is pretty much the same, no?


I.e. it uses

ADD STATISTICS (options) ON (columns)

just like the multivariate patch, no? Well, it doesn't really check the 
stattypes in ATExecAddDropMvStatistics except for checking there's a 
single entry, but the syntax seems OK.
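
For example, I'd expect something like this to eventually work with
both patches (the option name and the DROP form are just my guesses,
for illustration):

ALTER TABLE t1 ADD STATISTICS (ndistinct) ON (a, b);
ALTER TABLE t1 DROP STATISTICS ON (a, b);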


BTW mixing ADD and DROP in ATExecAddDropMvStatistics seems a bit 
confusing. Maybe two separate methods would be better?


6) GROUP BY
---
One thing I'd really love to see is improvement of GROUP BY clauses. 
That's one of the main sources of pain in analytical queries, IMHO, 
resulting in either OOM issues in Hash Aggregate, or choice of sort 
paths where hash would be more efficient.
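
A trivial reproducer of the underlying misestimate (stock behavior,
nothing patch-specific) - with perfectly correlated columns the
per-column ndistinct values get multiplied:

CREATE TABLE corr AS
    SELECT (i % 1000) AS a, (i % 1000) AS b
    FROM generate_series(1, 1000000) AS i;
ANALYZE corr;
-- actual: 1000 groups, but the estimate may approach 1000 * 1000,
-- steering the planner to sort where hash aggregate would be fine
EXPLAIN SELECT a, b, count(*) FROM corr GROUP BY a, b;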


regards
Tomas

--
Tomas Vondra   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Multi-column distinctness.

2015-09-06 Thread Simon Riggs
On 5 September 2015 at 20:46, Bruce Momjian  wrote:

> On Fri, Aug 28, 2015 at 05:33:34PM +0900, Kyotaro HORIGUCHI wrote:
> > Hello, this patch enables the planner to be conscious of inter-column
> > correlation.
> >
> > Sometimes two or more columns in a table have some correlation
> > which brings underestimates, leading to a wrong join method and
> > slow execution.
> >
> > Tomas Vondra is now working on heavily-equipped multivariate
> > statistics for OLAP usage. In contrast, this is a lightly
> > implemented solution which calculates only the ratio between the
> > rows estimated by the current method and the actual row number. I think
> > this doesn't conflict with his work except for the grammar part.
>
> I am very glad multi-variate statistics are being addressed.  I think
> this is our most common cause of optimizer failures.
>

Glad to see you agree.

Why have you said this on *this* patch, when Tomas' patch has been very
obviously addressing this problem since January, with very little traction?

Please add your backing to Tomas' patch.

-- 
Simon Riggs    http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Multi-column distinctness.

2015-09-06 Thread Simon Riggs
On 28 August 2015 at 09:33, Kyotaro HORIGUCHI <
horiguchi.kyot...@lab.ntt.co.jp> wrote:


> Tomas Vondra is now working on heavily-equipped multivariate
> statistics for OLAP usage. In contrast, this is a lightly
> implemented solution which calculates only the ratio between the
> rows estimated by the current method and the actual row number. I think
> this doesn't conflict with his work except for the grammar part.
>

I think it very obviously does conflict, so I don't see this patch as
appropriate.

If you think a cut version of Tomas' patch is appropriate, then the usual
response is to give a review that says "Tomas, I think a cut down version
is appropriate here, can we reduce the scope of this patch for now?". If
you have done that and he refuses to listen, then a separate patch version
is appropriate. Otherwise we should just reject this second patchset to
avoid confusion and to avoid encouraging people to take this approach.

-- 
Simon Riggs    http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] src/test/ssl broken on HEAD

2015-09-06 Thread Noah Misch
On Thu, Sep 03, 2015 at 01:15:47AM +0200, Andres Freund wrote:
> On 2015-09-02 17:03:46 -0400, Robert Haas wrote:
> > On Wed, Sep 2, 2015 at 4:50 PM, Andrew Dunstan  wrote:
> > > Tell me what's needed and I'll look at creating a buildfarm test module 
> > > for
> > > it.

> > It's not run by the global "check" or "installcheck" targets, because
> > the temporary installation it creates accepts TCP connections from any
> > user on the same host, which is insecure.

The buildfarm client may as well ignore that security hazard, because today's
buildfarm client starts likewise-exposed postmasters directly.

> We could just implement SSL over unix sockets. Obviously the
> connection-encryption aspect isn't actually useful, but e.g. client
> certs still make sense.  Besides, it allows us to avoid concerns like the
> above...

Agreed.

