Re: [HACKERS] Re: Using indexes for ORDER BY and PARTITION BY clause in windowing functions

2013-10-24 Thread Kyotaro HORIGUCHI
Hello, 

> Agree that windowing function will return all the rows compared to max and
> group by returning only max rows per group. But even while arriving at the
> aggregate/sorting windowing function seems to spend more effort than group
> by/order by.

(I'll apologise in advance for possible misreading..)

Most of the difference in time comes from sorting. Over 90% of the
total execution time was spent in the sort (49ms -> 2733ms) for the
query using the windowing function. If this sort could be
eliminated, the execution time would be less than 300 ms -
comparable to the group-by query.

| Subquery Scan on __unnamed_subquery_0 
|  (actual time=2606.075..2953.937 rows=558 loops=1)
|   Filter: (__unnamed_subquery_0.rn = 1)
|   ->  WindowAgg  (actual time=2606.063..2928.061 rows=122880 loops=1)
| ->  Sort (actual time=2606.020..2733.677 rows=122880 loops=1)
|   Sort Key: student_score.course, student_score.score
|   ->  Seq Scan on student_score  
|   (actual time=0.009..49.026 rows=122880 loops=1)

As you can see in the plan above, the sort key is (course, score). If
your point is overall performance rather than reusing a kind of
'hash', there is a good chance of eliminating this sort if you can
add an additional index, say,

=# create index idx_co_sc on student_score using btree (course, score);

With this index, you will get a different plan like this,

> uniontest=# explain analyze select student_name from (select student_name, 
> dense_rank() over(partition by course order by score) rn, score from 
> student_score) rnn where rn=2;
>   QUERY PLAN
> ---
>  Subquery Scan on rnn  (actual time=0.088..319.403 rows=135 loops=1)
>Filter: (rnn.rn = 2)
>Rows Removed by Filter: 122746
>->  WindowAgg  (actual time=0.037..296.851 rows=122881 loops=1)
>  ->  Index Scan using idx_co_sc on student_score 
>(actual time=0.027..111.333 rows=122881 loops=1)
>  Total runtime: 319.483 ms

Does this satisfy your needs?
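As a side note, when only the top row per group is wanted (the rn = 1 case), a DISTINCT ON query can often use the same (course, score) index without a WindowAgg step at all. A sketch, assuming the student_score table discussed above:

```sql
-- Hypothetical alternative for the rn = 1 case: DISTINCT ON keeps the
-- first row per course in ORDER BY order, so with the (course, score)
-- index the planner can avoid an explicit sort and the window pass.
SELECT DISTINCT ON (course) student_name, course, score
FROM student_score
ORDER BY course, score;
```

It is not a drop-in replacement for dense_rank() (it cannot express rn = 2, for instance), but for the top-1-per-group pattern it is usually the cheaper formulation.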

===
> Another thing, (I may be stupid and naive here) does PostgreSQL
> re-use the hash which has already been created for the sort? In
> this case the inner query must have created a hash for the windowing
> aggregate. Can't we use that same one while applying the
> filter "rn=1" ?

Generally speaking, hashes cannot yield ordered output by their
nature, I believe.

The windowing function executor node always receives tuples
sequentially in the window-defined order (as you see in the explained
plan above), then processes them in a semi tuple-by-tuple manner to
perform per-frame aggregation, and finally outputs the same number of
tuples as it received. Furthermore, dense_rank() doesn't even need
per-frame aggregation. So the planner currently has no opportunity to
use any kind of hash table to calculate/execute windowing
functions. On another point, automatically preserving internal data
within a query beyond the end of that query brings in a 'when to
discard it?' problem.


regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch for fail-back without fresh backup

2013-10-24 Thread Michael Paquier
On Fri, Oct 25, 2013 at 5:57 AM, Magnus Hagander  wrote:
> In fact I've been considering suggesting we might want to retire the
> difference between archive and hot_standby as wal_level, because the
> difference is usually so small. And the advantage of hot_standby is in
> almost every case worth it. Even in the archive recovery mode, being
> able to do pause_at_recovery_target is extremely useful. And as you
> say in (c) above, many users don't realize that until it's too late.
+1 on removing archive from wal_level. Having both archive and
hot_standby for wal_level is confusing, and if I recall correctly,
archive has been kept as a possible setting only to protect people
from bugs that the newly-introduced hot_standby level could cause due
to the few WAL records it adds. But it has been a couple of releases
without any such bugs, no?
-- 
Michael




Re: [HACKERS] CLUSTER FREEZE

2013-10-24 Thread Andres Freund
On 2013-10-24 17:17:22 -0700, Josh Berkus wrote:
> On 10/24/2013 04:55 PM, Robert Haas wrote:
> > On Thu, Oct 24, 2013 at 1:09 PM, Josh Berkus  wrote:
> >> On 10/23/2013 09:58 PM, Amit Kapila wrote:
> >>> I wonder why anyone would like to freeze during CLUSTER command when
> >>> they already have separate way (VACUUM FREEZE) to achieve it, do you
> >>> know or can think of any case where user wants to do it along with
> >>> Cluster command?
> >>
> >> "If I'm rewriting the table anyway, let's freeze it".
> >>
> >> Otherwise, you have to write the same pages twice, if both CLUSTER and
> >> FREEZE are required.
> > 
> > I wonder if we should go so far as to make this the default behavior,
> > instead of just making it an option.
> 
> +1 from me.  Can you think of a reason you *wouldn't* want to freeze?

It makes content from the future appear when you start using the
relation in a query/session with an older snapshot. Currently CLUSTER is
safe against that.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Regress tests to improve the function coverage of schemacmds and user and tablespace files

2013-10-24 Thread Michael Paquier
Hi,


On Thu, Oct 24, 2013 at 5:59 PM, Haribabu kommi wrote:
> Here I have added some more regression tests to improve the missing function
> coverage of schemacmds.c, user.c and tablespace.c.
> The added tests are mainly RENAME TO and OWNER TO support.
Could you add those patches to the next commitfest so that they don't
get lost in the flow?
Here is a URL to it:
https://commitfest.postgresql.org/action/commitfest_view?id=20
Note that you will need a community account to register your patches.

Thanks,
-- 
Michael




Re: [HACKERS] CLUSTER FREEZE

2013-10-24 Thread Amit Kapila
On Fri, Oct 25, 2013 at 4:29 AM, Thomas Munro  wrote:
> On 24 October 2013 05:58, Amit Kapila  wrote:
>>
>> On Thu, Oct 24, 2013 at 4:58 AM, Thomas Munro  wrote:
>> > Hi
>> > I noticed that CLUSTER doesn't have a FREEZE option.  Here is a patch to
>> > add
>> > that, for consistency with VACUUM.  Is it useful?
>>
>> I wonder why anyone would like to freeze during CLUSTER command when
>> they already have separate way (VACUUM FREEZE) to achieve it, do you
>> know or can think of any case where user wants to do it along with
>> Cluster command?
>
>
> As others have said, the goal is to freeze and cluster in a single step.
> You can already do that if you know how things work under the covers with:
>
> SET vacuum_freeze_min_age = 0;
> CLUSTER my_table;

   True, but in that case why don't we just mention the above in the
documentation of the CLUSTER command, so that a user who wants to
freeze along with CLUSTER can use that method?
Some of the reasons I could think of for adding FREEZE as an option are:
a. it's more explicit and easier for the user to use.
b. if by chance the underlying mechanism changes (which is unlikely),
the above way of doing CLUSTER might no longer result in a freeze.
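For what it's worth, the GUC-based workaround can also be confined to a single transaction with SET LOCAL, so the session default is untouched afterwards. A sketch, assuming a table named my_table:

```sql
-- Hypothetical sketch: SET LOCAL is reverted automatically at
-- COMMIT/ROLLBACK, so only this CLUSTER freezes aggressively.
-- (CLUSTER on a named table is allowed inside a transaction block.)
BEGIN;
SET LOCAL vacuum_freeze_min_age = 0;
CLUSTER my_table;
COMMIT;
```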

> This patch lets you say CLUSTER FREEZE instead.  It mirrors VACUUM, which
> can freeze tuples based on the GUC and tuple age or the FREEZE keyword.
>
>>
>> Anyway code side, I think you need to set both freeze_min_age as well
>> as freeze_table_age, see VACUUM command in gram.y
>
>
> Ok, I attach a new version that is more like VACUUM in gram.y.  (Although
> I'm not sure why it isn't a single boolean flag).
The reason for separate flags is that they are used to decide different things:
freeze_min_age - used to decide the cutoff_xid, based on which
FrozenTransactionId will be placed on tuples.
freeze_table_age - used to decide whether to scan all pages of a
relation in VACUUM; in the CLUSTER command it is ignored, as CLUSTER
needs to scan all pages anyway.
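To see the effect of the cutoff_xid in practice, one can inspect the table's relfrozenxid before and after the operation. A sketch, assuming a table named my_table:

```sql
-- Sketch: age(relfrozenxid) drops to (near) zero right after the
-- table has been rewritten with all tuples frozen.
SELECT relname, relfrozenxid, age(relfrozenxid)
FROM pg_class
WHERE relname = 'my_table';
```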

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] CLUSTER FREEZE

2013-10-24 Thread Amit Kapila
On Thu, Oct 24, 2013 at 10:39 PM, Josh Berkus  wrote:
> On 10/23/2013 09:58 PM, Amit Kapila wrote:
>> I wonder why anyone would like to freeze during CLUSTER command when
>> they already have separate way (VACUUM FREEZE) to achieve it, do you
>> know or can think of any case where user wants to do it along with
>> Cluster command?
>
> "If I'm rewriting the table anyway, let's freeze it".

   So do you think that other places where we rewrite a table under
an exclusive lock should have such a mechanism or option, and that
even where it is absent it is a kind of Todo item, so that tomorrow
someone could write a patch to improve the situation?

> Otherwise, you have to write the same pages twice, if both CLUSTER and
> FREEZE are required.

Yes, this is completely right and I understand this point, but the
question I had in mind before writing my last mail was whether every
place having this concept deserves an option like FREEZE.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Compression of full-page-writes

2013-10-24 Thread Amit Kapila
On Thu, Oct 24, 2013 at 8:37 PM, Robert Haas  wrote:
> On Mon, Oct 21, 2013 at 11:52 PM, Fujii Masao  wrote:
>> So, our consensus is to introduce the hooks for FPW compression so that
>> users can freely select their own best compression algorithm?
>> Also, probably we need to implement at least one compression contrib module
>> using that hook, maybe it's based on pglz or snappy.
>
> I don't favor making this pluggable. I think we should pick snappy or
> lz4 (or something else), put it in the tree, and use it.

The reason the discussion went towards making it pluggable (or at
least what made me think that way) was the following:
a. What does somebody need to do to get snappy or lz4 into the tree?
Is it only performance/compression data for some scenarios, or some
legal work as well? If it is only performance/compression, what would
the scenarios be (is pgbench sufficient?).
b. There can be cases where one or the other algorithm is better, or
where not compressing at all is better. For example, in another patch
where we were trying to achieve WAL reduction in the Update operation

(http://www.postgresql.org/message-id/8977cb36860c5843884e0a18d8747b036b9a4...@szxeml558-mbs.china.huawei.com),
Heikki came up with a test (where the data is not very compressible)
in which LZ was better than the native compression method used in
that patch, Snappy was better than LZ, and not compressing at all
could be considered preferable, because all the algorithms reduced
TPS in that case.

Now I think it is certainly better if we could choose one of the
algorithms (snappy or lz4), test it for the most common compression
and performance scenarios, and call it done, but I think we should
still at least give the user an option to turn compression off
altogether.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] [COMMITTERS] pgsql: Move rmtree() from libpgport to libpgcommon

2013-10-24 Thread Tom Lane
Peter Eisentraut  writes:
> Move rmtree() from libpgport to libpgcommon

This patch leaves dirmod.c entirely empty on non-Windows platforms.
At least on my OS X Lion laptop, that results in some bleating:

/usr/bin/ranlib: file: libpgport.a(dirmod.o) has no symbols
/usr/bin/ranlib: file: libpgport_srv.a(dirmod_srv.o) has no symbols

Can we do something about that?  Perhaps not build this file on
non-Windows?

regards, tom lane




Re: [HACKERS] CLUSTER FREEZE

2013-10-24 Thread Tom Lane
Robert Haas  writes:
> I wonder if we should go so far as to make this the default behavior,
> instead of just making it an option.

In that case you'd have to invent a NOFREEZE keyword, no?  Ick.

In any case, it's very far from obvious to me that CLUSTER ought
to throw away information by default, which is what you're proposing.

regards, tom lane




Re: [HACKERS] Patch for fail-back without fresh backup

2013-10-24 Thread Sawada Masahiko
On Fri, Oct 25, 2013 at 5:57 AM, Magnus Hagander  wrote:
> On Thu, Oct 24, 2013 at 10:51 PM, Josh Berkus  wrote:
>> On 10/24/2013 01:14 PM, Heikki Linnakangas wrote:
>>
>> I think it would be worth estimating what this actually looks like in
>> terms of log write quantity.  My inclication is to say that if it
>> increases log writes less than 10%, we don't need to provide an option
>> to turn it off.
>>
>> The reasons I don't want to provide a disabling GUC are:
>> a) more GUCs
>> b) confusing users
>> c) causing users to disable rewind *until they need it*, at which point
>> it's too late to enable it.
>>
>> So if there's any way we can avoid having a GUC for this, I'm for it.
>> And if we do have a GUC, failback should be enabled by default.
>
> +1 on the principle.
>
> In fact I've been considering suggesting we might want to retire the
> difference between archive and hot_standby as wal_level, because the
> difference is usually so small. And the advantage of hot_standby is in
> almost every case worth it. Even in the archive recovery mode, being
> able to do pause_at_recovery_target is extremely useful. And as you
> say in (c) above, many users don't realize that until it's too late.
>

+1.

Many users would not realize it until it's too late if we provide
this as an additional GUC.
And I agree with writing a WAL record that includes the block number
of the changed block; I don't think writing that record leads to a
huge overhead increase.
Are those WAL records replicated to the standby server synchronously
(when synchronous replication is configured, of course)?
I am concerned that this could add performance overhead to operations
such as SELECT or autovacuum, especially when the two servers are far
apart.


Regards,

---
Sawada Masahiko




Re: [HACKERS] Reasons not to like asprintf

2013-10-24 Thread Tom Lane
Robert Haas  writes:
> On Thu, Oct 24, 2013 at 2:18 PM, Peter Eisentraut  wrote:
>> While this is attractive, the same logic would suggest that we rename
>> pg_malloc() to palloc(), and that sounds wrong.  The frontend and
>> backend functions do have different freeing semantics.

> I'd almost be inclined to go the other way and suggest that we try to
> use the pg_ prefix more, at least for things to be shared between
> front and back end code.

Meh.  I think that mainly promotes carpal tunnel syndrome.  The place
for a pg_ prefix is in functions we intend to expose to the "outside
world", such as functions exposed by libpq.  But these are not that.

regards, tom lane




Re: [HACKERS] CLUSTER FREEZE

2013-10-24 Thread Josh Berkus
On 10/24/2013 04:55 PM, Robert Haas wrote:
> On Thu, Oct 24, 2013 at 1:09 PM, Josh Berkus  wrote:
>> On 10/23/2013 09:58 PM, Amit Kapila wrote:
>>> I wonder why anyone would like to freeze during CLUSTER command when
>>> they already have separate way (VACUUM FREEZE) to achieve it, do you
>>> know or can think of any case where user wants to do it along with
>>> Cluster command?
>>
>> "If I'm rewriting the table anyway, let's freeze it".
>>
>> Otherwise, you have to write the same pages twice, if both CLUSTER and
>> FREEZE are required.
> 
> I wonder if we should go so far as to make this the default behavior,
> instead of just making it an option.

+1 from me.  Can you think of a reason you *wouldn't* want to freeze?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] CLUSTER FREEZE

2013-10-24 Thread Robert Haas
On Thu, Oct 24, 2013 at 1:09 PM, Josh Berkus  wrote:
> On 10/23/2013 09:58 PM, Amit Kapila wrote:
>> I wonder why anyone would like to freeze during CLUSTER command when
>> they already have separate way (VACUUM FREEZE) to achieve it, do you
>> know or can think of any case where user wants to do it along with
>> Cluster command?
>
> "If I'm rewriting the table anyway, let's freeze it".
>
> Otherwise, you have to write the same pages twice, if both CLUSTER and
> FREEZE are required.

I wonder if we should go so far as to make this the default behavior,
instead of just making it an option.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] CLUSTER FREEZE

2013-10-24 Thread Thomas Munro
On 24 October 2013 05:58, Amit Kapila  wrote:

> On Thu, Oct 24, 2013 at 4:58 AM, Thomas Munro  wrote:
> > Hi
> > I noticed that CLUSTER doesn't have a FREEZE option.  Here is a patch to
> add
> > that, for consistency with VACUUM.  Is it useful?
>
> I wonder why anyone would like to freeze during CLUSTER command when
> they already have separate way (VACUUM FREEZE) to achieve it, do you
> know or can think of any case where user wants to do it along with
> Cluster command?
>

As others have said, the goal is to freeze and cluster in a single step.
 You can already do that if you know how things work under the covers with:

SET vacuum_freeze_min_age = 0;
CLUSTER my_table;

This patch lets you say CLUSTER FREEZE instead.  It mirrors VACUUM, which
can freeze tuples based on the GUC and tuple age or the FREEZE keyword.


> Anyway code side, I think you need to set both freeze_min_age as well
> as freeze_table_age, see VACUUM command in gram.y
>

Ok, I attach a new version that is more like VACUUM in gram.y.  (Although
I'm not sure why it isn't a single boolean flag).

Thanks,
Thomas Munro


cluster-freeze-v2.patch
Description: Binary data



Re: [HACKERS] Location for external scripts for Extensions?

2013-10-24 Thread Josh Berkus
On 10/24/2013 02:36 PM, Peter Eisentraut wrote:
> On 10/22/13, 2:27 PM, Josh Berkus wrote:
>> pg_partman has several external (python) scripts which help the
>> extension, located in /extras/ in its source.  The problem currently is
>> that if you install pg_partman via pgxn or package, you don't get those
>> scripts, because there's no "install" location for them.
> 
> Use the SCRIPTS variable in pgxs, and they will get installed.
> 

Oh yeah?  Cool.  Is there an existing extension with an example of this?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Location for external scripts for Extensions?

2013-10-24 Thread Peter Eisentraut
On 10/22/13, 2:27 PM, Josh Berkus wrote:
> pg_partman has several external (python) scripts which help the
> extension, located in /extras/ in its source.  The problem currently is
> that if you install pg_partman via pgxn or package, you don't get those
> scripts, because there's no "install" location for them.

Use the SCRIPTS variable in pgxs, and they will get installed.
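For illustration, a minimal pgxs Makefile using SCRIPTS might look like the following; the extension name and script paths here are hypothetical, not taken from pg_partman itself:

```makefile
# Sketch: files listed in SCRIPTS are installed into the PostgreSQL
# bindir by "make install" (there is also SCRIPTS_built for files
# that must be built first).
EXTENSION = my_extension
DATA = my_extension--1.0.sql
SCRIPTS = extras/helper_script.py

PG_CONFIG = pg_config
PGXS := $(shell $(PG_CONFIG) --pgxs)
include $(PGXS)
```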





Re: [HACKERS] Patch for fail-back without fresh backup

2013-10-24 Thread Magnus Hagander
On Thu, Oct 24, 2013 at 10:51 PM, Josh Berkus  wrote:
> On 10/24/2013 01:14 PM, Heikki Linnakangas wrote:
>> One extra WAL record whenever a hint bit is set on a page, for the first
>> time after a checkpoint. In other words, a WAL record needs to be
>> written in the same circumstances as with page checksums, but the WAL
>> records are much smaller as they don't need to contain a full page
>> image, just the block number of the changed block.
>>
>> Or maybe we'll write the full page image after all, like with page
>> checksums, just without calculating the checksums. It might be tricky to
>> skip the full-page image, because then a subsequent change of the page
>> (which isn't just a hint-bit update) needs to somehow know it needs to
>> take a full page image even though a WAL record for it was already written.
>
> I think it would be worth estimating what this actually looks like in
> terms of log write quantity.  My inclication is to say that if it
> increases log writes less than 10%, we don't need to provide an option
> to turn it off.
>
> The reasons I don't want to provide a disabling GUC are:
> a) more GUCs
> b) confusing users
> c) causing users to disable rewind *until they need it*, at which point
> it's too late to enable it.
>
> So if there's any way we can avoid having a GUC for this, I'm for it.
> And if we do have a GUC, failback should be enabled by default.

+1 on the principle.

In fact I've been considering suggesting we might want to retire the
difference between archive and hot_standby as wal_level, because the
difference is usually so small. And the advantage of hot_standby is in
almost every case worth it. Even in the archive recovery mode, being
able to do pause_at_recovery_target is extremely useful. And as you
say in (c) above, many users don't realize that until it's too late.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/




Re: [HACKERS] Reasons not to like asprintf

2013-10-24 Thread Robert Haas
On Thu, Oct 24, 2013 at 2:18 PM, Peter Eisentraut  wrote:
> On 10/22/13, 3:40 PM, Tom Lane wrote:
>> In order to avoid having to clutter stuff like that with #ifdef FRONTENDs,
>> I'm now thinking we should use exactly the same names for the frontend and
>> backend versions, ie psprintf() and pvsprintf().  The main reason for
>> considering a pg_ prefix for the frontend versions was to avoid cluttering
>> application namespace; but it's already the case that we don't expect
>> libpgcommon to be namespace clean.
>
> While this is attractive, the same logic would suggest that we rename
> pg_malloc() to palloc(), and that sounds wrong.  The frontend and
> backend functions do have different freeing semantics.

I'd almost be inclined to go the other way and suggest that we try to
use the pg_ prefix more, at least for things to be shared between
front and back end code.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Patch for fail-back without fresh backup

2013-10-24 Thread Josh Berkus
On 10/24/2013 01:14 PM, Heikki Linnakangas wrote:
> One extra WAL record whenever a hint bit is set on a page, for the first
> time after a checkpoint. In other words, a WAL record needs to be
> written in the same circumstances as with page checksums, but the WAL
> records are much smaller as they don't need to contain a full page
> image, just the block number of the changed block.
> 
> Or maybe we'll write the full page image after all, like with page
> checksums, just without calculating the checksums. It might be tricky to
> skip the full-page image, because then a subsequent change of the page
> (which isn't just a hint-bit update) needs to somehow know it needs to
> take a full page image even though a WAL record for it was already written.

I think it would be worth estimating what this actually looks like in
terms of log write quantity.  My inclication is to say that if it
increases log writes less than 10%, we don't need to provide an option
to turn it off.

The reasons I don't want to provide a disabling GUC are:
a) more GUCs
b) confusing users
c) causing users to disable rewind *until they need it*, at which point
it's too late to enable it.

So if there's any way we can avoid having a GUC for this, I'm for it.
And if we do have a GUC, failback should be enabled by default.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Patch for fail-back without fresh backup

2013-10-24 Thread Heikki Linnakangas

On 24.10.2013 23:07, Josh Berkus wrote:
> On 10/24/2013 11:12 AM, Heikki Linnakangas wrote:
>> On 24.10.2013 20:39, Josh Berkus wrote:
>>> On 10/24/2013 04:15 AM, Pavan Deolasee wrote:
>>>> If we do what you are suggesting, it seems like a single line patch
>>>> to me.
>>>> In XLogSaveBufferForHint(), we probably need to look at this
>>>> additional GUC
>>>> to decide whether or not to backup the block.
>>>
>>> Wait, what?  Why are we having an additional GUC?
>>>
>>> I'm opposed to the idea of having a GUC to enable failback.  When would
>>> anyone using replication ever want to disable that?
>>
>> For example, if you're not replicating for high availability purposes,
>> but to keep a reporting standby up-to-date.
>
> What kind of overhead are we talking about here?  You probably said, but
> I've had a mail client meltdown and lost a lot of my -hackers emails.

One extra WAL record whenever a hint bit is set on a page, for the first
time after a checkpoint. In other words, a WAL record needs to be
written in the same circumstances as with page checksums, but the WAL
records are much smaller as they don't need to contain a full page
image, just the block number of the changed block.

Or maybe we'll write the full page image after all, like with page
checksums, just without calculating the checksums. It might be tricky to
skip the full-page image, because then a subsequent change of the page
(which isn't just a hint-bit update) needs to somehow know it needs to
take a full page image even though a WAL record for it was already written.


- Heikki




Re: [HACKERS] Patch for fail-back without fresh backup

2013-10-24 Thread Josh Berkus
On 10/24/2013 11:12 AM, Heikki Linnakangas wrote:
> On 24.10.2013 20:39, Josh Berkus wrote:
>> On 10/24/2013 04:15 AM, Pavan Deolasee wrote:
>>> If we do what you are suggesting, it seems like a single line patch
>>> to me.
>>> In XLogSaveBufferForHint(), we probably need to look at this
>>> additional GUC
>>> to decide whether or not to backup the block.
>>
>> Wait, what?  Why are we having an additional GUC?
>>
>> I'm opposed to the idea of having a GUC to enable failback.  When would
>> anyone using replication ever want to disable that?
> 
> For example, if you're not replicating for high availability purposes,
> but to keep a reporting standby up-to-date.

What kind of overhead are we talking about here?  You probably said, but
I've had a mail client meltdown and lost a lot of my -hackers emails.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] missing locking in at least INSERT INTO view WITH CHECK

2013-10-24 Thread Dean Rasheed
On 24 October 2013 18:28, Andres Freund  wrote:
> On 2013-10-23 21:20:58 +0100, Dean Rasheed wrote:
>> On 23 October 2013 21:08, Andres Freund  wrote:
>> > On 2013-10-23 20:51:27 +0100, Dean Rasheed wrote:
>> >> Hmm, my first thought is that rewriteTargetView() should be calling
>> >> AcquireRewriteLocks() on viewquery, before doing too much with it.
>> >> There may be sub-queries in viewquery's quals (and also now in its
>> >> targetlist) and I don't think the relations referred to by those
>> >> sub-queries are getting locked.
>> >
>> > Well, that wouldn't follow the currently documented rule ontop
>> > of QueryRewrite:
>> >  * NOTE: the parsetree must either have come straight from the parser,
>> >  * or have been scanned by AcquireRewriteLocks to acquire suitable locks.
>> >
>> > It might still be the right thing to do, but it seems suspicious that
>> > the rules need to be tweaked like that.
>> >
>>
>> Well it matches what already happens in other places in the rewriter
>> --- see rewriteRuleAction() and ApplyRetrieveRule(). It's precisely
>> because the rule action's query hasn't come from the parser that it
>> needs to be processed in this way.
>
> I really don't know that are of code that well, fortunately I never had
> to wade around it much so far...
>
> Reading your explanation and a bit of the code your plan sound sane. Are
> you going to propose a patch?
>

I thought about making rewriteTargetView() call AcquireRewriteLocks(),
but on closer inspection I think that is probably over the top. 90% of
the code in AcquireRewriteLocks() is to process the query's RTEs, but
rewriteTargetView() is already locking the single RTE in viewquery
anyway. Also AcquireRewriteLocks() would acquire the wrong kind of
lock on that RTE, since viewquery is a SELECT, but we want an
exclusive lock because the RTE is about to become the outer query's
result relation. AcquireRewriteLocks() also processes CTEs, but we
know that we won't have any of those in rewriteTargetView(). So I
think the only thing missing from rewriteTargetView() is to lock any
relations from any sublink subqueries in viewquery using
acquireLocksOnSubLinks() -- patch attached.

Regards,
Dean
diff --git a/src/backend/rewrite/rewriteHandler.c b/src/backend/rewrite/rewriteHandler.c
new file mode 100644
index c52a374..e6a9e7b
*** a/src/backend/rewrite/rewriteHandler.c
--- b/src/backend/rewrite/rewriteHandler.c
*** rewriteTargetView(Query *parsetree, Rela
*** 2589,2594 
--- 2589,2604 
  	heap_close(base_rel, NoLock);
  
  	/*
+ 	 * If the view query contains any sublink subqueries, we should also
+ 	 * acquire locks on any relations they refer to. We know that there won't
+ 	 * be any subqueries in the range table or CTEs, so we can skip those, as
+ 	 * in AcquireRewriteLocks.
+ 	 */
+ 	if (viewquery->hasSubLinks)
+ 		query_tree_walker(viewquery, acquireLocksOnSubLinks, NULL,
+ 		  QTW_IGNORE_RC_SUBQUERIES);
+ 
+ 	/*
  	 * Create a new target RTE describing the base relation, and add it to the
  	 * outer query's rangetable.  (What's happening in the next few steps is
  	 * very much like what the planner would do to "pull up" the view into the

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.4 regression

2013-10-24 Thread Jon Nelson
On Thu, Oct 24, 2013 at 5:41 AM, Thom Brown  wrote:
> On 5 September 2013 22:24, Bruce Momjian  wrote:
>> On Mon, Aug 19, 2013 at 09:27:57PM -0400, Stephen Frost wrote:
>>> * Andres Freund (and...@2ndquadrant.com) wrote:
>>> > I vote for adapting the patch to additionally zero out the file via
>>> > write(). In your tests that seemed to perform at least as good as the
>>> > old method... It also has the advantage that we can use it a little bit
>>> > more as a testbed for possibly using it for heap extensions one day.
>>> > We're pretty early in the cycle, so I am not worried about this too 
>>> > much...
>>>
>>> I dunno, I'm pretty disappointed that this doesn't actually improve
>>> things.  Just following this casually, it looks like it might be some
>>> kind of locking issue in the kernel that's causing it to be slower; or
>>> at least some code path that isn't exercised terribly much and therefore
>>> hasn't been given the love that it should.
>>>
>>> Definitely interested in what Ts'o says, but if we can't figure out why
>>> it's slower *without* writing out the zeros, I'd say we punt on this
>>> until Linux and the other OS folks improve the situation.
>>
>> FYI, the patch has been reverted.
>
> Is there an updated patch available for this?  And did anyone hear from Ts'o?

After the patch was reverted, it was not re-submitted. I have tried 3
or 4 times to get more info out of Ts'o, without luck.

-- 
Jon




Re: [HACKERS] Deprecations in authentication

2013-10-24 Thread Peter Eisentraut
On 10/24/13, 2:37 PM, Magnus Hagander wrote:
> They're not causing compiler warnings when you just build with gssapi,
> correct? Only if you enable the native krb5?

Well, actually I was just about to reply that gssapi is also deprecated.
They want you to use some framework instead.

That's something we'll have to look into at some point, if we want to
support gssapi on this platform in the future.

The issue about removing krb5 is valid independent of this, I think.





Re: [HACKERS] Deprecations in authentication

2013-10-24 Thread Magnus Hagander
On Thu, Oct 24, 2013 at 8:35 PM, Peter Eisentraut  wrote:
> On 10/18/12, 7:20 AM, Magnus Hagander wrote:
>> 1. krb5 authentication. We've had gssapi since 8.3 (which means in all
>> supported versions). krb5 has been deprecated, also since 8.3. Time to
>> remove it?
>
> OS X Mavericks has now marked just about everything in krb5.h as
> deprecated, leading to compiler warnings.  Which reminded me of this
> thread.  Maybe it's time.

Yeah, it's still sitting on my TODO to get done for 9.4. I guess
that's another reason...

They're not causing compiler warnings when you just build with gssapi,
correct? Only if you enable the native krb5?

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/




Re: [HACKERS] Deprecations in authentication

2013-10-24 Thread Peter Eisentraut
On 10/18/12, 7:20 AM, Magnus Hagander wrote:
> 1. krb5 authentication. We've had gssapi since 8.3 (which means in all
> supported versions). krb5 has been deprecated, also since 8.3. Time to
> remove it?

OS X Mavericks has now marked just about everything in krb5.h as
deprecated, leading to compiler warnings.  Which reminded me of this
thread.  Maybe it's time.





Re: [HACKERS] Reasons not to like asprintf

2013-10-24 Thread Tom Lane
Peter Eisentraut  writes:
> On 10/22/13, 3:40 PM, Tom Lane wrote:
>> In order to avoid having to clutter stuff like that with #ifdef FRONTENDs,
>> I'm now thinking we should use exactly the same names for the frontend and
>> backend versions, ie psprintf() and pvsprintf().  The main reason for
>> considering a pg_ prefix for the frontend versions was to avoid cluttering
>> application namespace; but it's already the case that we don't expect
>> libpgcommon to be namespace clean.

> While this is attractive, the same logic would suggest that we rename
> pg_malloc() to palloc(), and that sounds wrong.  The frontend and
> backend functions do have different freeing semantics.

We already crossed that bridge, though, by defining "palloc" in frontend
environments to mean pg_malloc.  I'm doubtful that insisting on different
names is going to result in anything except #ifdef clutter.

regards, tom lane




Re: [HACKERS] Reasons not to like asprintf

2013-10-24 Thread Peter Eisentraut
On 10/22/13, 3:40 PM, Tom Lane wrote:
> In order to avoid having to clutter stuff like that with #ifdef FRONTENDs,
> I'm now thinking we should use exactly the same names for the frontend and
> backend versions, ie psprintf() and pvsprintf().  The main reason for
> considering a pg_ prefix for the frontend versions was to avoid cluttering
> application namespace; but it's already the case that we don't expect
> libpgcommon to be namespace clean.

While this is attractive, the same logic would suggest that we rename
pg_malloc() to palloc(), and that sounds wrong.  The frontend and
backend functions do have different freeing semantics.





Re: [HACKERS] Patch for fail-back without fresh backup

2013-10-24 Thread Heikki Linnakangas

On 24.10.2013 20:39, Josh Berkus wrote:

On 10/24/2013 04:15 AM, Pavan Deolasee wrote:

If we do what you are suggesting, it seems like a single line patch to me.
In XLogSaveBufferForHint(), we probably need to look at this additional GUC
to decide whether or not to back up the block.


Wait, what?  Why are we having an additional GUC?

I'm opposed to the idea of having a GUC to enable failback.  When would
anyone using replication ever want to disable that?


For example, if you're not replicating for high availability purposes, 
but to keep a reporting standby up-to-date.


- Heikki




Re: [HACKERS] Patch for fail-back without fresh backup

2013-10-24 Thread Josh Berkus
On 10/24/2013 04:15 AM, Pavan Deolasee wrote:
> If we do what you are suggesting, it seems like a single line patch to me.
> In XLogSaveBufferForHint(), we probably need to look at this additional GUC
> to decide whether or not to back up the block.

Wait, what?  Why are we having an additional GUC?

I'm opposed to the idea of having a GUC to enable failback.  When would
anyone using replication ever want to disable that?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] missing locking in at least INSERT INTO view WITH CHECK

2013-10-24 Thread Andres Freund
On 2013-10-23 21:20:58 +0100, Dean Rasheed wrote:
> On 23 October 2013 21:08, Andres Freund  wrote:
> > On 2013-10-23 20:51:27 +0100, Dean Rasheed wrote:
> >> Hmm, my first thought is that rewriteTargetView() should be calling
> >> AcquireRewriteLocks() on viewquery, before doing too much with it.
> >> There may be sub-queries in viewquery's quals (and also now in its
> >> targetlist) and I don't think the relations referred to by those
> >> sub-queries are getting locked.
> >
> > Well, that wouldn't follow the currently documented rule on top
> > of QueryRewrite:
> >  * NOTE: the parsetree must either have come straight from the parser,
> >  * or have been scanned by AcquireRewriteLocks to acquire suitable locks.
> >
> > It might still be the right thing to do, but it seems suspicious that
> > the rules need to be tweaked like that.
> >
> 
> Well it matches what already happens in other places in the rewriter
> --- see rewriteRuleAction() and ApplyRetrieveRule(). It's precisely
> because the rule action's query hasn't come from the parser that it
> needs to be processed in this way.

I really don't know that area of code that well; fortunately, I never
had to wade around in it much so far...

Reading your explanation and a bit of the code, your plan sounds sane.
Are you going to propose a patch?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Sigh, my old HPUX box is totally broken by DSM patch

2013-10-24 Thread Robert Haas
On Thu, Oct 24, 2013 at 1:13 PM, Tom Lane  wrote:
> Stephen Frost  writes:
>> * Robert Haas (robertmh...@gmail.com) wrote:
>>> On Wed, Oct 23, 2013 at 11:35 AM, Stephen Frost  wrote:
 Would this make sense as a configure-time check, rather than initdb, to
 try blocking SIGSYS and checking for an ENOSYS from shm_open()?  Seems
 preferable to do that in a configure check rather than initdb.
>
>>> I don't see why.  It's a run-time behavior; the build system may not
>>> be where the binaries will ultimately run.
>
>> I suppose, just need to be more cautious when blocking signals in initdb
>> than in a configure-time check, of course.
>
> Indeed, telling initdb to ignore SIGSYS makes it do what we want on
> this box:
>
> $ git diff
> diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
> index 
> 3983b23731330b78a66a74d14faaf76f7aff85c2..05252df869d128ac2cf3b1c48c6259d6d95b0ffc
>  100644
> *** a/src/bin/initdb/initdb.c
> --- b/src/bin/initdb/initdb.c
> *** setup_signals(void)
> *** 3197,3202 
> --- 3197,3207 
>   #ifdef SIGPIPE
> pqsignal(SIGPIPE, SIG_IGN);
>   #endif
> +
> +   /* Prevent SIGSYS so we can probe for kernel calls that might not work */
> + #ifdef SIGSYS
> +   pqsignal(SIGSYS, SIG_IGN);
> + #endif
>   }
>
>
> $ initdb
> ...
> selecting dynamic shared memory implementation ... sysv
> ...
>
> The above patch ignores SIGSYS throughout initdb.  We could narrow the
> possible side-effects by only disabling SIGSYS around the shm_open call,
> but I'm not sure there's any value in that.  It seems likely to me that
> the same kind of problem might pop up elsewhere in future, as we try
> to make use of other modern kernel facilities.  In fact, I can foresee
> wanting to run the whole backend this way --- though I'm not proposing
> doing so today.
>
> A bit of googling turned up the following paragraph of rationale in the
> POSIX spec (Open Group Base Specifications 2013 edition):
>
>   There is very little that a Conforming POSIX.1 Application can do by
>   catching, ignoring, or masking any of the signals SIGILL, SIGTRAP,
>   SIGIOT, SIGEMT, SIGBUS, SIGSEGV, SIGSYS, or SIGFPE. They will generally
>   be generated by the system only in cases of programming errors. While it
>   may be desirable for some robust code (for example, a library routine)
>   to be able to detect and recover from programming errors in other code,
>   these signals are not nearly sufficient for that purpose. One portable
>   use that does exist for these signals is that a command interpreter can
>   recognize them as the cause of termination of a process (with wait())
>   and print an appropriate message.
>
> So in other words, the reason for delivering SIGSYS rather than returning
> ENOSYS by default is to make it apparent from the process exit code that
> you made an invalid kernel call, should your code be sloppy enough that
> it fails to notice and report kernel call failures.  This argument doesn't
> seem to me to hold a lot of water for Postgres' purposes.
>
> Comments?

Your proposed change to initdb seems fine to me.

If we change initdb but not the backend, then somebody could later
manually change postgresql.conf to set
dynamic_shared_memory_type=posix.  When they try to restart the
postmaster, it'll die with SIGSYS rather than exiting with a
relatively clean error message.  However, at the moment, it seems like
the only people who are likely to encounter that situation are those
who install PostgreSQL 9.4 on very old HP-UX boxen and then change the
configuration settings chosen by initdb, and there shouldn't be many
such people.  Therefore I tend to think that changing initdb is
sufficient for now; we can take the risk of changing the backend's
handling of SIGSYS if and when it becomes clear that there's enough
benefit to doing so to justify the risk.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Sigh, my old HPUX box is totally broken by DSM patch

2013-10-24 Thread Andres Freund
On 2013-10-24 13:13:23 -0400, Tom Lane wrote:
> Stephen Frost  writes:
> The above patch ignores SIGSYS throughout initdb.  We could narrow the
> possible side-effects by only disabling SIGSYS around the shm_open call,
> but I'm not sure there's any value in that.  It seems likely to me that
> the same kind of problem might pop up elsewhere in future, as we try
> to make use of other modern kernel facilities.  In fact, I can foresee
> wanting to run the whole backend this way --- though I'm not proposing
> doing so today.

Why not? I don't see the advantage of looking for effects/problems of
such a change twice.

I'd also much rather see a wrongly configured postgres fail to start
with a legible error message instead of it being killed by a signal.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Compression of full-page-writes

2013-10-24 Thread k...@rice.edu
On Thu, Oct 24, 2013 at 12:22:59PM -0400, Robert Haas wrote:
> On Thu, Oct 24, 2013 at 11:40 AM, k...@rice.edu  wrote:
> > On Thu, Oct 24, 2013 at 11:07:38AM -0400, Robert Haas wrote:
> >> On Mon, Oct 21, 2013 at 11:52 PM, Fujii Masao  
> >> wrote:
> >> > So, our consensus is to introduce the hooks for FPW compression so that
> >> > users can freely select their own best compression algorithm?
> >> > Also, probably we need to implement at least one compression contrib 
> >> > module
> >> > using that hook, maybe it's based on pglz or snappy.
> >>
> >> I don't favor making this pluggable. I think we should pick snappy or
> >> lz4 (or something else), put it in the tree, and use it.
> >>
> > Hi,
> >
> > My vote would be for lz4, since it has faster single-threaded compression
> > and decompression speeds, with its decompression speed being almost 2X
> > snappy's. Both are BSD-licensed, so that is not an issue. The base code
> > for lz4 is C, while snappy's is C++. There is also an HC (high-compression)
> > variant of lz4 that pushes its compression rate to about the same as
> > zlib (-1) while using the same decompressor, which can deliver data even
> > faster due to the better compression. Some more real-world tests would be
> > useful, which is really where being pluggable would help.
> 
> Well, it's probably a good idea for us to test, during the development
> cycle, which algorithm works better for WAL compression, and then use
> that one.  Once we make that decision, I don't see that there are many
> circumstances in which a user would care to override it.  Now if we
> find that there ARE reasons for users to prefer different algorithms
> in different situations, that would be a good reason to make it
> configurable (or even pluggable).  But if we find that no such reasons
> exist, then we're better off avoiding burdening users with the need to
> configure a setting that has only one sensible value.
> 
> It seems fairly clear from previous discussions on this mailing list
> that snappy and lz4 are the top contenders for the position of
> "compression algorithm favored by PostgreSQL".  I am wondering,
> though, whether it wouldn't be better to add support for both - say we
> added both to libpgcommon, and perhaps we could consider moving pglz
> there as well.  That would allow easy access to all of those
> algorithms from both front-end and backend-code.  If we can make the
> APIs parallel, it should be very simple to modify any code we add now to
> use a different algorithm than the one initially chosen if in the
> future we add algorithms to or remove algorithms from the list, or if
> one algorithm is shown to outperform another in some particular
> context.  I think we'll do well to isolate the question of adding
> support for these algorithms from the current patch or any other
> particular patch that may be on the table, and FWIW, I think having
> two leading contenders and adding support for both may have a variety
> of advantages over crowning a single victor.
> 
+++1

Ken




Re: [HACKERS] [PATCH] Use MAP_HUGETLB where supported (v3)

2013-10-24 Thread Robert Haas
On Thu, Oct 24, 2013 at 1:00 PM, Andres Freund  wrote:
> On 2013-10-24 16:06:19 +0300, Heikki Linnakangas wrote:
>> On 24.10.2013 09:03, Abhijit Menon-Sen wrote:
>> >One open question is what to do about rounding up the size. It should
>> >not be necessary, but for the fairly recent bug described at the link
>> >in the comment (https://bugzilla.kernel.org/show_bug.cgi?id=56881). I
>> >tried it without the rounding-up, and it fails on Ubuntu's 3.5.0-28
>> >kernel (mmap returns EINVAL).
>>
>> Let's get rid of the rounding. It's clearly a kernel bug, and it shouldn't
>> be our business to add workarounds for any kernel bug out there. And the
>> worst that will happen if you're running a buggy kernel version is that you
>> fall back to not using huge pages (assuming huge_tlb_pages=try).
>
> But it's a range of relatively popular kernels that will stay around
> for a good while. So I am hesitant to just not do anything about it. The
> directory scanning code isn't that bad imo.
>
> Either way:
> I think we should log when we tried to use hugepages but fell back to
> plain mmap; currently it's hard to see whether they are used.

Logging it might be a good idea, but suppose the system has been running
for 6 months and you don't have the startup logs.  It would be good to
have an easy way to discover later what happened back then.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Sigh, my old HPUX box is totally broken by DSM patch

2013-10-24 Thread Tom Lane
Stephen Frost  writes:
> * Robert Haas (robertmh...@gmail.com) wrote:
>> On Wed, Oct 23, 2013 at 11:35 AM, Stephen Frost  wrote:
>>> Would this make sense as a configure-time check, rather than initdb, to
>>> try blocking SIGSYS and checking for an ENOSYS from shm_open()?  Seems
>>> preferable to do that in a configure check rather than initdb.

>> I don't see why.  It's a run-time behavior; the build system may not
>> be where the binaries will ultimately run.

> I suppose, just need to be more cautious when blocking signals in initdb
> than in a configure-time check, of course.

Indeed, telling initdb to ignore SIGSYS makes it do what we want on
this box:

$ git diff
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 
3983b23731330b78a66a74d14faaf76f7aff85c2..05252df869d128ac2cf3b1c48c6259d6d95b0ffc
 100644
*** a/src/bin/initdb/initdb.c
--- b/src/bin/initdb/initdb.c
*** setup_signals(void)
*** 3197,3202 
--- 3197,3207 
  #ifdef SIGPIPE
pqsignal(SIGPIPE, SIG_IGN);
  #endif
+ 
+   /* Prevent SIGSYS so we can probe for kernel calls that might not work */
+ #ifdef SIGSYS
+   pqsignal(SIGSYS, SIG_IGN);
+ #endif
  }
  
  
$ initdb
...
selecting dynamic shared memory implementation ... sysv
...

The above patch ignores SIGSYS throughout initdb.  We could narrow the
possible side-effects by only disabling SIGSYS around the shm_open call,
but I'm not sure there's any value in that.  It seems likely to me that
the same kind of problem might pop up elsewhere in future, as we try
to make use of other modern kernel facilities.  In fact, I can foresee
wanting to run the whole backend this way --- though I'm not proposing
doing so today.

A bit of googling turned up the following paragraph of rationale in the
POSIX spec (Open Group Base Specifications 2013 edition):

  There is very little that a Conforming POSIX.1 Application can do by
  catching, ignoring, or masking any of the signals SIGILL, SIGTRAP,
  SIGIOT, SIGEMT, SIGBUS, SIGSEGV, SIGSYS, or SIGFPE. They will generally
  be generated by the system only in cases of programming errors. While it
  may be desirable for some robust code (for example, a library routine)
  to be able to detect and recover from programming errors in other code,
  these signals are not nearly sufficient for that purpose. One portable
  use that does exist for these signals is that a command interpreter can
  recognize them as the cause of termination of a process (with wait())
  and print an appropriate message.

So in other words, the reason for delivering SIGSYS rather than returning
ENOSYS by default is to make it apparent from the process exit code that
you made an invalid kernel call, should your code be sloppy enough that
it fails to notice and report kernel call failures.  This argument doesn't
seem to me to hold a lot of water for Postgres' purposes.

Comments?

regards, tom lane




Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2013-10-24 Thread Josh Berkus
On 10/23/2013 08:52 PM, Alvaro Herrera wrote:
> Peter Geoghegan escribió:
> 
>> I am interested in making it store richer statistics,
>> provided we're very careful about the costs. Every time those counters
>> are incremented, a spinlock is held.
> 
> Hmm, now if we had portable atomic addition, so that we could spare the
> spinlock ...

Oh, sure, just take the *easy* way out.  ;-)


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] CLUSTER FREEZE

2013-10-24 Thread Josh Berkus
On 10/23/2013 09:58 PM, Amit Kapila wrote:
> I wonder why anyone would want to freeze during the CLUSTER command when
> they already have a separate way (VACUUM FREEZE) to achieve it. Do you
> know, or can you think of, any case where a user wants to do it along
> with the CLUSTER command?

"If I'm rewriting the table anyway, let's freeze it".

Otherwise, you have to write the same pages twice, if both CLUSTER and
FREEZE are required.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] [PATCH] Use MAP_HUGETLB where supported (v3)

2013-10-24 Thread Andres Freund
On 2013-10-24 16:06:19 +0300, Heikki Linnakangas wrote:
> On 24.10.2013 09:03, Abhijit Menon-Sen wrote:
> >One open question is what to do about rounding up the size. It should
> >not be necessary, but for the fairly recent bug described at the link
> >in the comment (https://bugzilla.kernel.org/show_bug.cgi?id=56881). I
> >tried it without the rounding-up, and it fails on Ubuntu's 3.5.0-28
> >kernel (mmap returns EINVAL).
> 
> Let's get rid of the rounding. It's clearly a kernel bug, and it shouldn't
> be our business to add workarounds for any kernel bug out there. And the
> worst that will happen if you're running a buggy kernel version is that you
> fall back to not using huge pages (assuming huge_tlb_pages=try).

But it's a range of relatively popular kernels that will stay around
for a good while. So I am hesitant to just not do anything about it. The
directory scanning code isn't that bad imo.

Either way:
I think we should log when we tried to use hugepages but fell back to
plain mmap; currently it's hard to see whether they are used.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Compression of full-page-writes

2013-10-24 Thread Robert Haas
On Thu, Oct 24, 2013 at 11:40 AM, k...@rice.edu  wrote:
> On Thu, Oct 24, 2013 at 11:07:38AM -0400, Robert Haas wrote:
>> On Mon, Oct 21, 2013 at 11:52 PM, Fujii Masao  wrote:
>> > So, our consensus is to introduce the hooks for FPW compression so that
>> > users can freely select their own best compression algorithm?
>> > Also, probably we need to implement at least one compression contrib module
>> > using that hook, maybe it's based on pglz or snappy.
>>
>> I don't favor making this pluggable. I think we should pick snappy or
>> lz4 (or something else), put it in the tree, and use it.
>>
> Hi,
>
> My vote would be for lz4, since it has faster single-threaded compression
> and decompression speeds, with its decompression speed being almost 2X
> snappy's. Both are BSD-licensed, so that is not an issue. The base code
> for lz4 is C, while snappy's is C++. There is also an HC (high-compression)
> variant of lz4 that pushes its compression rate to about the same as
> zlib (-1) while using the same decompressor, which can deliver data even
> faster due to the better compression. Some more real-world tests would be
> useful, which is really where being pluggable would help.

Well, it's probably a good idea for us to test, during the development
cycle, which algorithm works better for WAL compression, and then use
that one.  Once we make that decision, I don't see that there are many
circumstances in which a user would care to override it.  Now if we
find that there ARE reasons for users to prefer different algorithms
in different situations, that would be a good reason to make it
configurable (or even pluggable).  But if we find that no such reasons
exist, then we're better off avoiding burdening users with the need to
configure a setting that has only one sensible value.

It seems fairly clear from previous discussions on this mailing list
that snappy and lz4 are the top contenders for the position of
"compression algorithm favored by PostgreSQL".  I am wondering,
though, whether it wouldn't be better to add support for both - say we
added both to libpgcommon, and perhaps we could consider moving pglz
there as well.  That would allow easy access to all of those
algorithms from both front-end and backend-code.  If we can make the
APIs parallel, it should be very simple to modify any code we add now to
use a different algorithm than the one initially chosen if in the
future we add algorithms to or remove algorithms from the list, or if
one algorithm is shown to outperform another in some particular
context.  I think we'll do well to isolate the question of adding
support for these algorithms from the current patch or any other
particular patch that may be on the table, and FWIW, I think having
two leading contenders and adding support for both may have a variety
of advantages over crowning a single victor.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Compression of full-page-writes

2013-10-24 Thread k...@rice.edu
On Thu, Oct 24, 2013 at 11:07:38AM -0400, Robert Haas wrote:
> On Mon, Oct 21, 2013 at 11:52 PM, Fujii Masao  wrote:
> > So, our consensus is to introduce the hooks for FPW compression so that
> > users can freely select their own best compression algorithm?
> > Also, probably we need to implement at least one compression contrib module
> > using that hook, maybe it's based on pglz or snappy.
> 
> I don't favor making this pluggable. I think we should pick snappy or
> lz4 (or something else), put it in the tree, and use it.
> 
Hi,

My vote would be for lz4, since it has faster single-threaded compression
and decompression speeds, with its decompression speed being almost 2X
snappy's. Both are BSD-licensed, so that is not an issue. The base code
for lz4 is C, while snappy's is C++. There is also an HC (high-compression)
variant of lz4 that pushes its compression rate to about the same as
zlib (-1) while using the same decompressor, which can deliver data even
faster due to the better compression. Some more real-world tests would be
useful, which is really where being pluggable would help.

Regards,
Ken




Re: [HACKERS] Compression of full-page-writes

2013-10-24 Thread Tom Lane
Robert Haas  writes:
> On Mon, Oct 21, 2013 at 11:52 PM, Fujii Masao  wrote:
>> So, our consensus is to introduce the hooks for FPW compression so that
>> users can freely select their own best compression algorithm?
>> Also, probably we need to implement at least one compression contrib module
>> using that hook, maybe it's based on pglz or snappy.

> I don't favor making this pluggable. I think we should pick snappy or
> lz4 (or something else), put it in the tree, and use it.

I agree.  Hooks in this area are going to be a constant source of
headaches, vastly outweighing any possible benefit.

regards, tom lane




Re: [HACKERS] Re: [BUGS] BUG #7873: pg_restore --clean tries to drop tables that don't exist

2013-10-24 Thread Pavel Stehule
2013/10/24 Robert Haas 

> On Tue, Oct 22, 2013 at 9:37 PM, Josh Kupershmidt 
> wrote:
> > On Mon, Oct 14, 2013 at 5:38 PM, Josh Kupershmidt 
> wrote:
> >> Also, Pavel, this patch is still listed as 'Needs Review' in the CF
> >> app, but I haven't seen a response to the concerns in my last message.
> >
> > It looks like this patch has been imported into the 2013-11 CF [1] and
> > marked "Needs Review". I looked at the version of the patch pointed to
> > in that CF entry back in July, and noted [2] several problems that
> > still seemed to be present in the patch, for which I never saw a
> > followup from Pavel.  IMO this patch should have gotten marked
> > "Returned with Feedback" pending a response from Pavel.
>
> That sounds reasonable to me.  Perhaps you should so mark it.
>

+1

Regards

Pavel


>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>


Re: [HACKERS] Re: [BUGS] BUG #7873: pg_restore --clean tries to drop tables that don't exist

2013-10-24 Thread Robert Haas
On Tue, Oct 22, 2013 at 9:37 PM, Josh Kupershmidt  wrote:
> On Mon, Oct 14, 2013 at 5:38 PM, Josh Kupershmidt  wrote:
>> Also, Pavel, this patch is still listed as 'Needs Review' in the CF
>> app, but I haven't seen a response to the concerns in my last message.
>
> It looks like this patch has been imported into the 2013-11 CF [1] and
> marked "Needs Review". I looked at the version of the patch pointed to
> in that CF entry back in July, and noted [2] several problems that
> still seemed to be present in the patch, for which I never saw a
> followup from Pavel.  IMO this patch should have gotten marked
> "Returned with Feedback" pending a response from Pavel.

That sounds reasonable to me.  Perhaps you should so mark it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [PATCH] Use MAP_HUGETLB where supported (v3)

2013-10-24 Thread Robert Haas
On Thu, Oct 24, 2013 at 9:06 AM, Heikki Linnakangas
 wrote:
> * the documentation should perhaps mention that the setting only has an
> effect if POSIX shared memory is used. That's the default on Linux, but we
> will try to fall back to SystemV shared memory if it fails.

This is true for dynamic shared memory, but not for the main shared
memory segment.  The main shared memory segment is always the
combination of a small, fixed-size System V shared memory chunk and an
anonymous shared memory region created by mmap(NULL, ..., MAP_SHARED).
POSIX shared memory is not used.

(Exceptions: Anonymous shared memory isn't used on Windows, which has
its own mechanism, or when compiling with EXEC_BACKEND, when the whole
chunk is allocated as System V shared memory.)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Compression of full-page-writes

2013-10-24 Thread Robert Haas
On Mon, Oct 21, 2013 at 11:52 PM, Fujii Masao  wrote:
> So, our consensus is to introduce the hooks for FPW compression so that
> users can freely select their own best compression algorithm?
> Also, probably we need to implement at least one compression contrib module
> using that hook, maybe it's based on pglz or snappy.

I don't favor making this pluggable. I think we should pick snappy or
lz4 (or something else), put it in the tree, and use it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] logical changeset generation v6.2

2013-10-24 Thread Robert Haas
On Tue, Oct 22, 2013 at 2:13 PM, Andres Freund  wrote:
> On 2013-10-22 13:57:53 -0400, Robert Haas wrote:
>> On Tue, Oct 22, 2013 at 1:08 PM, Andres Freund  
>> wrote:
>> >> That strikes me as a flaw in the implementation rather than the idea.
>> >> You're presupposing a patch where the necessary information is
>> >> available in WAL yet you don't make use of it at the proper time.
>> >
>> > The problem is that the mapping would be somewhere *ahead* from the
>> > transaction/WAL we're currently decoding. We'd need to read ahead till
>> > we find the correct one.
>>
>> Yes, I think that's what you need to do.
>
> My problem with that is that rewrite can be gigabytes into the future.
>
> When reading forward we could either just continue reading data into the
> reorderbuffer, but delay replaying all future commits till we found the
> currently needed remap. That might have quite the additional
> storage/memory cost, but runtime complexity should be the same as normal
> decoding.
> Or we could individually read ahead for every transaction. But doing so
> for every transaction will get rather expensive (roughly O(amount_of_wal^2)).

[ Sorry it's taken me a bit of time to get back to this; other tasks
intervened, and I also just needed some time to let it settle in my
brain. ]

If you read ahead looking for a set of ctid translations from
relfilenode A to relfilenode B, and along the way you happen to
encounter a set of translations from relfilenode C to relfilenode D,
you could stash that set of translations away somewhere, so that if
the next transaction you process needs that set of mappings, it's
already computed.  With that approach, you'd never have to pre-read
the same set of WAL files more than once.
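In pseudocode, that stash-as-you-scan idea might look like the following (the record shapes and names here are invented for illustration; the real WAL reader works on decoded xlog records):

```python
def read_ahead(wal_records, wanted):
    """Scan WAL once, caching every rewrite map encountered along the way.

    'wanted' is the (old, new) relfilenode pair the current transaction
    needs; any other maps seen en route are stashed so that a later
    transaction needing them never forces a second pre-read of the same
    WAL range.
    """
    cache = {}
    for rec in wal_records:
        if rec["type"] == "rewrite_map":
            cache[(rec["old"], rec["new"])] = rec["map"]
            if (rec["old"], rec["new"]) == wanted:
                break  # found the map we came for; stop reading ahead
    return cache

records = [
    {"type": "rewrite_map", "old": "C", "new": "D", "map": {(0, 1): (0, 7)}},
    {"type": "rewrite_map", "old": "A", "new": "B", "map": {(0, 1): (2, 3)}},
]
cache = read_ahead(records, ("A", "B"))
# The C->D map was stashed in passing, so a later lookup needs no re-scan.
```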

But, as I think about it more, that's not very different from your
idea of stashing the translations someplace other than WAL in the
first place.  I mean, if the read-ahead thread generates a series of
files in pg_somethingorother that contain those maps, you could have
just written the maps to that directory in the first place.  So on
further review I think we could adopt that approach.

However, I'm leery about the idea of using a relation fork for this.
I'm not sure whether that's what you had in mind, but it gives me the
willies.  First, it adds distributed overhead to the system, as
previously discussed; and second, I think the accounting may be kind
of tricky, especially in the face of multiple rewrites.  I'd be more
inclined to find a separate place to store the mappings.  Note that,
AFAICS, there's no real need for the mapping file to be
block-structured, and I believe they'll be written first (with no
readers) and subsequently only read (with no further writes) and
eventually deleted.

One possible objection to this is that it would preclude decoding on a
standby, which seems like a likely enough thing to want to do.  So
maybe it's best to WAL-log the changes to the mapping file so that the
standby can reconstruct it if needed.

> I think that'd be pretty similar to just disallowing VACUUM
> FREEZE/CLUSTER on catalog relations since effectively it'd be to
> expensive to use.

This seems unduly pessimistic to me; unless the catalogs are really
darn big, this is a mostly theoretical problem.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2013-10-24 Thread Peter Geoghegan
On Thu, Oct 24, 2013 at 6:54 AM, Andrew Dunstan  wrote:
> I'll be quite happy if we can get around the query text length limit. I have
> greatly increased the buffer size at quite a few clients, in one case where
> they run some pretty large auto-generated queries and have memory to burn,
> up to 40k.

I've already hacked up a prototype patch. I've heard enough complaints
about that to want to fix it once and for all.


-- 
Peter Geoghegan




Re: [HACKERS] Patch for fail-back without fresh backup

2013-10-24 Thread Alvaro Herrera
Pavan Deolasee wrote:

> Yeah, I had brought up similar idea up thread. Right now wal_level is
> nicely ordered. But with this additional logic, I am not sure if we would
> need multiple new levels and also break that ordering (I don't know if its
> important). For example, one may want to set up streaming replication
> with/without this feature or hot standby with/without the feature. I don't
> have a good idea about how to capture them in wal_level. May be something
> like: minimal, archive, archive_with_this_new_feature, hot_standby and
> hot_standby_with_this_new_feature.

That's confusing.  A separate GUC sounds better.

-- 
Álvaro Herrera    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] proposal: lob conversion functionality

2013-10-24 Thread Heikki Linnakangas

On 22.10.2013 13:55, Pavel Stehule wrote:

2013/10/21 Noah Misch

If you're prepared to change the function names and add the subset-oriented
functions, I would appreciate that.


here is patch


lobj.sgml still refer to the old names.

- Heikki




Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2013-10-24 Thread Andrew Dunstan


On 10/23/2013 07:51 PM, Peter Geoghegan wrote:

On Wed, Oct 23, 2013 at 4:46 PM, Josh Berkus  wrote:

So you're suggesting that instead of storing the aggregates as we
currently do, we store a buffer of the last N queries (in normal form)
and their stats?  And then aggregate when the user asks for it?

No, I'm not. I'm suggesting storing the query texts externally, in a
file. They usually use 1024 bytes of shared memory per entry,
regardless of how long the query text is. This would allow
pg_stat_statements to store arbitrarily large query texts, while also
giving us breathing room if we have ambitions around expanding what
pg_stat_statements can (optionally) track.

Having said that, I am still pretty sensitive to bloating pg_stat_statements.
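The externalized-texts scheme described above can be sketched as fixed-size in-memory entries that hold (offset, length) pointers into an append-only file. The class and file names below are invented for illustration:

```python
import os
import tempfile

class QueryTextStore:
    """Fixed-size in-memory entries; arbitrarily long texts live in a file."""
    def __init__(self, path):
        self.path = path
        self.entries = {}  # queryid -> (offset, length), not a 1 kB text slot
    def add(self, queryid, text):
        data = text.encode("utf-8")
        with open(self.path, "ab") as f:
            offset = f.tell()       # append-only: new text goes at the end
            f.write(data)
        self.entries[queryid] = (offset, len(data))
    def text(self, queryid):
        offset, length = self.entries[queryid]
        with open(self.path, "rb") as f:
            f.seek(offset)
            return f.read(length).decode("utf-8")

path = os.path.join(tempfile.mkdtemp(), "query_texts")
store = QueryTextStore(path)
store.add(1, "SELECT * FROM t WHERE a = $1")
store.add(2, "SELECT " + ", ".join(f"col{i}" for i in range(100)) + " FROM big")
print(store.text(1))  # texts of any length, constant per-entry memory
```

The per-entry memory cost stays constant no matter how long the query text is, which is the point of moving the texts out of shared memory.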




Me too. I think min, max and stddev will have a fairly small impact, and 
give considerable bang for the buck. Not so sure about the other 
suggestions. And of course, memory impact is only half the story - CPU 
cycles spent is the other part.


I'll be quite happy if we can get around the query text length limit. I 
have greatly increased the buffer size at quite a few clients, in one 
case where they run some pretty large auto-generated queries and have 
memory to burn, up to 40k.


cheers

andrew




Re: [HACKERS] RULE regression test fragility?

2013-10-24 Thread Tom Lane
Andres Freund  writes:
> FWIW, I've repeatedly now thought that it'd make maintaining/updating
> patches easier if we switched that query into unaligned tuple only (\a
> \t) mode. That would remove the frequent conflicts on the row count and
> widespread changes due to changed alignment.
> Alternatively we could just wrap the query in \copy ... CSV.

Hm ... yeah, it would be a good thing if changes in one view didn't so
frequently have ripple effects to the whole output.  Not sure which
format is best for that though.

regards, tom lane




Re: [HACKERS] CLUSTER FREEZE

2013-10-24 Thread k...@rice.edu
On Thu, Oct 24, 2013 at 10:28:43AM +0530, Amit Kapila wrote:
> On Thu, Oct 24, 2013 at 4:58 AM, Thomas Munro  wrote:
> > Hi
> > I noticed that CLUSTER doesn't have a FREEZE option.  Here is a patch to add
> > that, for consistency with VACUUM.  Is it useful?
> 
> I wonder why anyone would want to freeze during the CLUSTER command when
> they already have a separate way (VACUUM FREEZE) to achieve it. Do you
> know, or can you think of, any case where a user would want to do it
> along with the CLUSTER command?
> 
> Anyway, on the code side, I think you need to set both freeze_min_age as well
> as freeze_table_age, see VACUUM command in gram.y
> 
> CLUSTER opt_freeze opt_verbose qualified_name cluster_index_specification
> 
>   {
>   ClusterStmt *n = makeNode(ClusterStmt);
> - n->relation = $3;
> - n->indexname = $4;
> - n->verbose = $2;
> + n->relation = $4;
> + n->freeze_min_age = $2 ? 0 : -1;
> + n->indexname = $5;
> + n->verbose = $3;
> ..
> 
> With Regards,
> Amit Kapila.
> EnterpriseDB: http://www.enterprisedb.com
> 

Hi Amit,

If the FREEZE is part of the CLUSTER, you would only read/write the table
once. With a follow-up VACUUM FREEZE, you would re-read/write a second time.
I, for one, would appreciate being able to perform both in the same run. (+1)

Regards,
Ken




Re: [HACKERS] [PATCH] Use MAP_HUGETLB where supported (v3)

2013-10-24 Thread Heikki Linnakangas

On 24.10.2013 09:03, Abhijit Menon-Sen wrote:

This is a slightly reworked version of the patch submitted by Richard
Poole last month, which was based on Christian Kruse's earlier patch.


Thanks.


With huge_tlb_pages=off, this is the best result I got:

 tps = 8680.771068 (including connections establishing)
 tps = 8721.504838 (excluding connections establishing)

With huge_tlb_pages=on, this is the best result I got:

 tps = 9932.245203 (including connections establishing)
 tps = 9983.190304 (excluding connections establishing)

(Even the worst result I got in the latter case was a smidgen faster
than the best with huge_tlb_pages=off: 8796.344078 vs. 8721.504838.)


That's really impressive.


One open question is what to do about rounding up the size. It should
not be necessary, but for the fairly recent bug described at the link
in the comment (https://bugzilla.kernel.org/show_bug.cgi?id=56881). I
tried it without the rounding-up, and it fails on Ubuntu's 3.5.0-28
kernel (mmap returns EINVAL).
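For reference, the round-up being discussed is the usual align-to-a-multiple idiom. The 2 MB figure below is only the common x86-64 default; real code would query the kernel (e.g. /proc/meminfo) rather than hard-code it:

```python
HUGE_PAGE_SIZE = 2 * 1024 * 1024  # assumed; query the kernel in real code

def round_up(size, align):
    """Round size up to the next multiple of align (align a power of two)."""
    return (size + align - 1) & ~(align - 1)

# A shared memory request just over a page multiple becomes an exact
# multiple of the huge page size:
request = 140 * 1024 * 1024 + 4096
print(round_up(request, HUGE_PAGE_SIZE) // HUGE_PAGE_SIZE)  # prints 71
```

On a kernel with the bug described above, passing an unrounded size to mmap(..., MAP_HUGETLB) fails with EINVAL, which is why the workaround existed at all.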


Let's get rid of the rounding. It's clearly a kernel bug, and it 
shouldn't be our business to add workarounds for any kernel bug out 
there. And the worst that will happen if you're running a buggy kernel 
version is that you fall back to not using huge pages (assuming 
huge_tlb_pages=try).


Other comments:

* guc.c doesn't actually need sys/mman.h for anything. Getting rid of 
the #include also lets you remove the configure test.


* the documentation should perhaps mention that the setting only has an 
effect if POSIX shared memory is used. That's the default on Linux, but 
we will try to fall back to SystemV shared memory if it fails.


- Heikki




Re: [HACKERS] Patch for fail-back without fresh backup

2013-10-24 Thread Pavan Deolasee
On Thu, Oct 24, 2013 at 5:45 PM, Heikki Linnakangas  wrote:

>
>>
>  Will just have to figure out what we want the user interface to be like;
> should it be a separate guc, or somehow cram it into wal_level?
>
>
Yeah, I had brought up similar idea up thread. Right now wal_level is
nicely ordered. But with this additional logic, I am not sure if we would
need multiple new levels and also break that ordering (I don't know if it's
important). For example, one may want to set up streaming replication
with/without this feature or hot standby with/without the feature. I don't
have a good idea about how to capture them in wal_level. May be something
like: minimal, archive, archive_with_this_new_feature, hot_standby and
hot_standby_with_this_new_feature.

Thanks,
Pavan

-- 
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee


Re: [HACKERS] Patch for fail-back without fresh backup

2013-10-24 Thread Heikki Linnakangas

On 24.10.2013 14:15, Pavan Deolasee wrote:

On Thu, Oct 24, 2013 at 4:22 PM, Heikki Linnakangas
wrote:



To fix that, pg_rewind could always start the rewinding process from the
last checkpoint before the point that the histories diverge, instead of the
exact point of divergence.


Is that something required even if someone plans to use pg_rewind for a
cluster with checksums enabled? I mean, since only the first update after a
checkpoint is WAL-logged, pg_rewind will break if another update happens
after the standby forks.


Yes. It's broken as it is, even when checksums are enabled - good catch. 
I'll go change it to read all the WAL in the target starting from the 
last checkpoint before the point of divergence.



Or would the recovery logic apply the first WAL record without
looking at the page lsn? (Sorry, maybe I should read the code instead of
asking you)


WAL recovery does apply all the full-page images without looking at the 
page LSN, but that doesn't help in this case. pg_rewind copies over the 
blocks from the source server (= promoted standby) that were changed in 
the target server (= old master), after the standby's history diverged 
from it. In other words, it reverts the blocks that were changed in the 
old master, by copying them over from the promoted standby. After that, 
WAL recovery is performed, using the WAL from the promoted standby, to 
apply all the changes from the promoted standby that were not present in 
the old master. But it never replays any WAL from the old master. It 
reads it through, to construct the list of blocks that were modified, 
but it doesn't apply them.



If we do what you are suggesting, it seems like a single line patch to me.
In XLogSaveBufferForHint(), we probably need to look at this additional GUC
to decide whether or not to backup the block.


Yeah, it's trivial to add such a guc. Will just have to figure out what 
we want the user interface to be like; should it be a separate guc, or 
somehow cram it into wal_level?


- Heikki




Re: [HACKERS] Patch for fail-back without fresh backup

2013-10-24 Thread Pavan Deolasee
On Thu, Oct 24, 2013 at 4:45 PM, Pavan Deolasee wrote:

> Or would the recovery logic apply the first WAL record without looking at
> the page lsn? (Sorry, maybe I should read the code instead of asking you)
>
>
Never mind. I realized it has to. That's the whole purpose of backing it up
in the first place.

Thanks,
Pavan

-- 
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee


Re: [HACKERS] Patch for fail-back without fresh backup

2013-10-24 Thread Pavan Deolasee
On Thu, Oct 24, 2013 at 4:22 PM, Heikki Linnakangas  wrote:

> .
>>
>
> To fix that, pg_rewind could always start the rewinding process from the
> last checkpoint before the point that the histories diverge, instead of the
> exact point of divergence.


Is that something required even if someone plans to use pg_rewind for a
cluster with checksums enabled? I mean, since only the first update after a
checkpoint is WAL-logged, pg_rewind will break if another update happens
after the standby forks. Or would the recovery logic apply the first WAL
record without looking at the page lsn? (Sorry, maybe I should read the
code instead of asking you)

If we do what you are suggesting, it seems like a single line patch to me.
In XLogSaveBufferForHint(), we probably need to look at this additional GUC
to decide whether or not to backup the block.

That would make the rewinding more expensive as it needs to read through a
> lot more WAL, but I think it would still be OK.


Yeah, probably you are right. Though the amount of additional work could be
significantly higher and some testing might be warranted.

Thanks,
Pavan

-- 
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee


Re: [HACKERS] Ident context leak during reloading of conf files when no ident information is present in the file

2013-10-24 Thread Heikki Linnakangas

On 24.10.2013 13:20, Haribabu kommi wrote:

There is an ident context leak that occurs during reload of the configuration
files when no ident configuration items are present in the configuration file.

In load_ident(), a new context is allocated to store the new_parsed_lines, and
the old context is deleted when parsed_ident_lines are present. If no
parsed_ident_lines are present, the context is not deleted.

A patch with the fix is attached. Please review and let me know your
suggestions.


Thanks, committed.

- Heikki




Re: [HACKERS] Patch for fail-back without fresh backup

2013-10-24 Thread Heikki Linnakangas

On 24.10.2013 13:02, Pavan Deolasee wrote:

Another difference AFAICS is that checksum feature needs the block to be
backed up only after the first time a hint bit is updated after checkpoint.
But for something like pg_rewind to work, we will need to WAL log every
hint bit update on a page. So we would want to keep it as short as possible.


To fix that, pg_rewind could always start the rewinding process from the 
last checkpoint before the point that the histories diverge, instead of 
the exact point of divergence. That would make the rewinding more 
expensive as it needs to read through a lot more WAL, but I think it 
would still be OK.


- Heikki




Re: [HACKERS] 9.4 regression

2013-10-24 Thread Thom Brown
On 5 September 2013 22:24, Bruce Momjian  wrote:
> On Mon, Aug 19, 2013 at 09:27:57PM -0400, Stephen Frost wrote:
>> * Andres Freund (and...@2ndquadrant.com) wrote:
>> > I vote for adapting the patch to additionally zero out the file via
>> > write(). In your tests that seemed to perform at least as good as the
>> > old method... It also has the advantage that we can use it a littlebit
>> > more as a testbed for possibly using it for heap extensions one day.
>> > We're pretty early in the cycle, so I am not worried about this too much...
>>
>> I dunno, I'm pretty disappointed that this doesn't actually improve
>> things.  Just following this casually, it looks like it might be some
>> kind of locking issue in the kernel that's causing it to be slower; or
>> at least some code path that isn't exercise terribly much and therefore
>> hasn't been given the love that it should.
>>
>> Definitely interested in what Ts'o says, but if we can't figure out why
>> it's slower *without* writing out the zeros, I'd say we punt on this
>> until Linux and the other OS folks improve the situation.
>
> FYI, the patch has been reverted.

Is there an updated patch available for this?  And did anyone hear from Ts'o?

-- 
Thom




[HACKERS] UNION ALL on partitioned tables won't use indices.

2013-10-24 Thread Kyotaro HORIGUCHI
Hello. Sorry that it's been a while.

1. Observed symptom

As you know, UNION ALL accompanied by ORDER BY uses indexes if
available.

> uniontest=# EXPLAIN SELECT * FROM c11 UNION ALL SELECT * FROM c12 ORDER BY a;
>QUERY PLAN
> ---
>  Merge Append  (cost=0.59..10214.10 rows=20 width=16)
>Sort Key: c11.a
>->  Index Scan using ... on c11  (cost=0.29..3857.04 rows=10 width=16)
>->  Index Scan using ... on c12  (cost=0.29..3857.04 rows=10 width=16)

So do simple queries on partitioned (inheritance) tables.

> uniontest=# EXPLAIN SELECT * FROM p1 ORDER BY a;
> QUERY PLAN
> --
>  Merge Append  (cost=0.73..11392.19 rows=21 width=16)
>Sort Key: p1.a
>->  Index Scan using ... on p1  (cost=0.12..8.14 rows=1 width=44)
>->  Index Scan using ... on c11  (cost=0.29..3857.04 rows=10 width=16)
>->  Index Scan using ... on c12  (cost=0.29..3857.04 rows=10 width=16)

Nevertheless, UNION ALL on partitioned tables doesn't. This is
quite unfavorable behavior, especially with LIMIT.

>uniontest=# EXPLAIN ANALYZE SELECT * FROM p1
>UNION ALL SELECT * FROM p2 ORDER BY a LIMIT 10;
>  QUERY PLAN   
>
> Limit   (actual time=182.732..182.735 rows=10 loops=1)
>   ->  Sort  (actual time=182.729..182.730 rows=10 loops=1)
> Sort Key: p1.a
> Sort Method: top-N heapsort  Memory: 25kB
> ->  Append  (actual time=0.029..108.109 rows=40 loops=1)
>   ->  Seq Scan on p1  (actual time=0.001..0.001 rows=0 loops=1)
>   ->  Seq Scan on c11  (actual time=0.027..19.074 rows=10 loops=1)
>   ->  Seq Scan on c12  (actual time=0.014..16.653 rows=10 loops=1)
>   ->  Seq Scan on p2  (actual time=0.000..0.000 rows=0 loops=1)
>   ->  Seq Scan on c21  (actual time=0.012..16.677 rows=10 loops=1)
>   ->  Seq Scan on c22  (actual time=0.012..16.794 rows=10 loops=1)
> Total runtime: 182.857 ms


2. The cause

In grouping_planner, flatten_simple_union_all creates
appendrelinfos for each subquery, then expand_inherited_tables
further expands the parent tables in each subquery. Finally,
this sequence leaves the following structure, where rte[2] and [3]
are subqueries abandoned along the way while pulling up rte[4] and [5].

rte[1] Subquery "SELECT*1", inh = 1
   +- appinfo[0] - rte[4] Relation "p1", inh = 1
   |   +- appinfo[2] - rte[6]  Relation "p1", inh = 0
   |   +- appinfo[3] - rte[7]  Relation "c11", inh = 0
   |   +- appinfo[4] - rte[8]  Relation "c12", inh = 0
   +- appinfo[1] - rte[5] Relation "p2", inh = 1
   +- appinfo[5] - rte[9]  Relation "p1", inh = 0
   +- appinfo[6] - rte[10] Relation "c11", inh = 0
   +- appinfo[7] - rte[11] Relation "c12", inh = 0

On the other hand, the EC members finally have exprs only for varno =
1, 4, and 5; in other words, they lack members for the most
descendant RTEs. This is because add_child_rel_equivalences()
always inhibits add_eq_member from registering the child_rel's
relid on the EC. Consequently these grandchild relations do not find
index pathkeys for themselves.


3. Measures

I could think of three approaches to the problem.

One is to simply modify this spot to pass the child flag to
add_eq_member according to whether the child_rel is inh or not. This
seems to work fine, although it leaves two levels of
MergeAppends (PATCH-1). So an additional patch is attached to
collapse these MergeAppends (PATCH-2). This gives the same plan
as PATCH-3.

> uniontest=# explain analyze select * from p1 union all
> select * from p2 order by a limit 10;
>QUERY PLAN   
> 
>  Limit  (actual time=0.208..0.224 rows=10 loops=1)
>->  Merge Append  (actual time=0.205..0.218 rows=10 loops=1)
>  Sort Key: p1.a
>  ->  Merge Append  (actual time=0.110..0.120 rows=10 loops=1)
>Sort Key: p1.a
>->  Index Scan .. on p1  (actual time=0.006..0.006 rows=0 loops=1)
>->  Index Scan .. on c11  (actual time=0.054..0.060 rows=10 loops=1)
>->  Index Scan .. on c12  (actual time=0.046..0.046 rows=1 loops=1)
>  ->  Merge Append  (actual time=0.093..0.093 rows=1 loops=1)
>Sort Key: p2.a
>->  Index Scan .. on p2  (actual time=0.002..0.002 rows=0 loops=1)
>->  Index Scan .. on c21  (actual time=0.047..0.047 rows=1 loops=1)
>->  Index Scan .. on c22  (actual time=0.043..0.043 rows=1 loops=1)
>  Total runtime: 0.448 ms


The second is to collaps

[HACKERS] Ident context leak during reloading of conf files when no ident information is present in the file

2013-10-24 Thread Haribabu kommi
There is an ident context leak that occurs during reload of the configuration
files when no ident configuration items are present in the configuration file.

In load_ident(), a new context is allocated to store the new_parsed_lines, and
the old context is deleted when parsed_ident_lines are present. If no
parsed_ident_lines are present, the context is not deleted.

A patch with the fix is attached. Please review and let me know your
suggestions.

Regards,
Hari babu.


ident_leak_v1.patch
Description: ident_leak_v1.patch



Re: [HACKERS] Patch for fail-back without fresh backup

2013-10-24 Thread Pavan Deolasee
On Mon, Oct 21, 2013 at 7:10 PM, Sawada Masahiko wrote:

>
>
> I agree with you.
> If writing FPWs does not cause a large performance degradation, one idea
> is to write FPWs with the same timing as when checksums are enabled.
> E.g., if we support a new wal_level, the system would write an FPW when a
> simple SELECT updates hint bits, but the checksum function stays disabled.
> Thoughts?


I wonder if it's too much for this purpose. In fact, we just need a way to
know that a block could have been written on the master which the standby
never saw. So even WAL-logging just the block id should be good enough for
pg_rewind to be able to detect and later copy that block from the new
master. Having said that, I don't know if there is a general advantage to
WAL-logging the exact hint bit update operation for other reasons.

Another difference AFAICS is that checksum feature needs the block to be
backed up only after the first time a hint bit is updated after checkpoint.
But for something like pg_rewind to work, we will need to WAL log every
hint bit update on a page. So we would want to keep it as short as possible.
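The minimal record suggested above reduces, on the pg_rewind side, to collecting the set of block identifiers touched on the old master after the histories diverged. A sketch, with an invented (lsn, relfilenode, blockno) record layout:

```python
def blocks_to_copy(old_master_wal, divergence_lsn):
    """Collect (relfilenode, blockno) pairs modified after divergence.

    These are the blocks pg_rewind must copy back from the new master;
    the content of each change is irrelevant, only its identity.
    """
    changed = set()
    for lsn, relfilenode, blockno in old_master_wal:
        if lsn >= divergence_lsn:
            changed.add((relfilenode, blockno))
    return changed

wal = [
    (100, 16384, 0),   # before divergence: the standby saw this
    (205, 16384, 7),   # after divergence: hint-bit-only update, no FPW today
    (210, 16385, 3),
    (215, 16384, 7),   # repeated block id collapses into the set
]
print(sorted(blocks_to_copy(wal, 200)))  # [(16384, 7), (16385, 3)]
```

Because only block identity matters, a record carrying nothing but the block id would be far cheaper than a full-page image for every hint bit update.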

Thanks,
Pavan

-- 
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee


[HACKERS] Regress tests to improve the function coverage of schemacmds and user and tablespace files

2013-10-24 Thread Haribabu kommi

Here I have added some more regression tests to improve the function
coverage of schemacmds.c, user.c, and tablespace.c.

The added tests mainly cover RENAME TO and OWNER TO support.

The patches are attached.
Please check and provide your suggestions.

Regards,
Hari babu.


schema_user_v1.patch
Description: schema_user_v1.patch


tablespace_v1.patch
Description: tablespace_v1.patch



Re: [HACKERS] RULE regression test fragility?

2013-10-24 Thread Andres Freund
On 2013-10-23 20:50:30 -0400, Tom Lane wrote:
> Mike Blackwell  writes:
> > While reviewing the Network Stats Traffic patch I discovered the current
> > regression test for rules depends on the system view definitions not
> > changing:
> 
> Yes, this is standard.  We just update the expected output anytime
> somebody changes a system view.

FWIW, I've repeatedly now thought that it'd make maintaining/updating
patches easier if we switched that query into unaligned tuple only (\a
\t) mode. That would remove the frequent conflicts on the row count and
widespread changes due to changed alignment.
Alternatively we could just wrap the query in \copy ... CSV.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] CLUSTER FREEZE

2013-10-24 Thread Andres Freund
Hi,

On 2013-10-24 00:28:44 +0100, Thomas Munro wrote:
> I noticed that CLUSTER doesn't have a FREEZE option.  Here is a patch to
> add that, for consistency with VACUUM.  Is it useful?

I think you'd need to prevent that form from working on system catalogs
- xmin has a meaning there sometimes (e.g. indcheckxmin) at least.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] CLUSTER FREEZE

2013-10-24 Thread Thom Brown
On 24 October 2013 05:58, Amit Kapila  wrote:
> On Thu, Oct 24, 2013 at 4:58 AM, Thomas Munro  wrote:
>> Hi
>> I noticed that CLUSTER doesn't have a FREEZE option.  Here is a patch to add
>> that, for consistency with VACUUM.  Is it useful?
>
> I wonder why anyone would want to freeze during the CLUSTER command when
> they already have a separate way (VACUUM FREEZE) to achieve it. Do you
> know, or can you think of, any case where a user would want to do it
> along with the CLUSTER command?

Well I can't speak for Thomas, but if you're doing such a heavy-duty
operation on a table that involves an exclusive lock, you may as well
freeze it.  And in the case of CLUSTER, I imagine that in most
instances you'd be confident the table isn't going to undergo much
change (or at least not quickly).  CLUSTER FREEZE would mean not
having to run a separate VACUUM FREEZE after, and on a 10GB table,
that's a big deal as it means not having to rescan the whole table
again to freeze every row.

Note that REFRESH MATERIALIZED VIEW freezes rows too, as it's already
writing all the data from scratch again, and holds an exclusive lock.
(This isn't the case with the CONCURRENTLY option, obviously.)

-- 
Thom

