Re: [HACKERS] WITHIN GROUP patch

2013-10-10 Thread Pavel Stehule
2013/10/11 Andrew Gierth 

> > "Pavel" == Pavel Stehule  writes:
>
>  >> I found so following error message is not too friendly (mainly because
>  >> this functionality will be new)
>  >>
>  >> postgres=# select dense_rank(3,3,2) within group (order by num desc,
> odd)
>  >> from test4;
>  >> ERROR:  Incorrect number of arguments for hypothetical set function
>  >> LINE 1: select dense_rank(3,3,2) within group (order by num desc, od...
>  >> ^
>  >>
>  >> Probably some hint should be there?
>
> Hm, yeah.
>
> Anyone have ideas for better wording for the original error message,
> or a DETAIL or HINT addition? The key point here is that the number of
> _hypothetical_ arguments and the number of ordered columns must match,
> but we don't currently exclude the possibility that a user-defined
> hypothetical set function might have additional non-hypothetical args.
>
> The first alternative that springs to mind is:
>
> ERROR: Incorrect number of arguments for hypothetical set function
> DETAIL: Number of hypothetical arguments (3) must equal number of ordered
> columns (2)

+1

Pavel

> --
> Andrew (irc:RhodiumToad)
>


Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2013-10-10 Thread Josh Berkus
Robert,

>> The counter-proposal to "auto-tuning" is just to raise the default for
>> work_mem to 4MB or 8MB.  Given that Bruce's current formula sets it at
>> 6MB for a server with 8GB RAM, I don't really see the benefit of going
>> to a whole lot of code and formulas in order to end up at a figure only
>> incrementally different from a new static default.
> 
> Agreed.  But what do you think the value SHOULD be on such a system?

That's the problem: It Depends.

One thing in particular which is an issue with calculating against
max_connections is that users who don't need 100 connections seldom
*reduce* max_connections.  So that developer laptop which only needs 3
connections is still going to have a max_connections of 100, just like
the DW server where m_c should probably be 30.

> I guess the point I'm making here is that raising the default value is
> not mutually exclusive with auto-tuning.  We could quadruple the
> current defaults for work_mem and maintenance_work_mem and be better
> off right now, today.  Then, we could improve things further in the
> future if and when we agree on an approach to auto-tuning.  And people
> who don't use the auto-tuning will still have a better default.

Seems fine to me.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: space reserved for WAL record does not match what was written: panic on windows

2013-10-10 Thread Noah Misch
On Thu, Oct 10, 2013 at 03:23:30PM +0200, Andres Freund wrote:
> On 2013-10-10 08:59:47 -0400, Robert Haas wrote:
> > On Tue, Oct 8, 2013 at 6:24 PM, Andres Freund  
> > wrote:
> > > Do you have a better alternative? Making the computation unconditionally
> > > 64bit will have a runtime overhead and adding a StaticAssert in the
> > > existing macro doesn't work because we use it in array sizes where gcc
> > > balks.
> > > We could try using inline functions, but that's not going to be pretty
> > > either.
> > >
> > > I don't really see that many further usecases that will align 64bit
> > > values on 32bit platforms, so I think we're ok for now.
> > 
> > I'd be inclined to make the computation unconditionally 64-bit.  I
> > doubt the speed penalty is enough to worry about, and I think we're
> > going to have more and more cases where optimizing for 32-bit
> > platforms is just not the right decision.
> 
> MAXALIGN is used in several of PG's hottest functions in many
> scenarios. att_align_nominal is used in slot_deform_tuple,
> heap_deform_tuple, nocachegetattr, etc. So I don't think that's viable
> yet. At least not with much more benefit than this...

Agreed.  Besides performance, aligning a wider-than-pointer value is an
unusual need; authors should think thrice before doing that.  I might have
even defined the MAXALIGN64 macro in xlog.c rather than a core header.

Incidentally, why does MAXALIGN64 use unsigned math while MAXALIGN uses signed
math?
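
(For readers following along, a paraphrase of the two macro families in
question -- modeled on src/include/c.h of that era, not a verbatim quote --
showing the signed intptr_t vs. unsigned uint64 arithmetic:)

/* MAXALIGN: round LEN up to the next MAXIMUM_ALIGNOF boundary using
 * signed pointer-sized arithmetic. */
#define TYPEALIGN(ALIGNVAL, LEN) \
	(((intptr_t) (LEN) + ((ALIGNVAL) - 1)) & ~((intptr_t) ((ALIGNVAL) - 1)))
#define MAXALIGN(LEN)		TYPEALIGN(MAXIMUM_ALIGNOF, (LEN))

/* MAXALIGN64: the same rounding, but in unsigned 64-bit arithmetic, so it
 * is safe for values wider than a pointer on 32-bit platforms. */
#define TYPEALIGN64(ALIGNVAL, LEN) \
	(((uint64) (LEN) + ((ALIGNVAL) - 1)) & ~((uint64) ((ALIGNVAL) - 1)))
#define MAXALIGN64(LEN)		TYPEALIGN64(MAXIMUM_ALIGNOF, (LEN))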

-- 
Noah Misch
EnterpriseDB http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PoC] pgstattuple2: block sampling to reduce physical read

2013-10-10 Thread Mark Kirkwood

On 11/10/13 17:33, Jaime Casanova wrote:
> On Thu, Oct 10, 2013 at 5:32 PM, Mark Kirkwood wrote:
>> Quietly replying to myself - looking at the code the sampler does 3000
>> random page reads...
>
> FWIW, something that bothers me is that there are 3000 random page
> reads... I mean, why 3000? How do you get that number as an absolute for
> good accuracy in every relation? Why not a percentage, maybe an
> argument to the function?


Right,

Looking at http://en.wikipedia.org/wiki/Sample_size_determination maybe 
it is not such a bad setting - tho 400 or 1000 seem to be good magic 
numbers too (if we are gonna punt on single number that is).
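
(For reference, the simple-random-sample bound from that page, assuming a
95% confidence level, i.e. z = 1.96, and worst-case proportion p = 0.5:

    n = z^2 p (1 - p) / e^2

so a 5% margin of error e gives n ~= 384 (the "400"), e ~= 3.1% gives
n ~= 1000, and e ~= 1.8% gives n ~= 3000 -- which is why a fixed sample
size can be defended independently of relation size.)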


Perhaps it should reuse (some of) the code from acquire_sample_rows in 
src/commands/analyze.c (we can't use exactly the same logic, as we need 
to keep block data together in this case).
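
(For reference, a self-contained sketch of the one-pass reservoir technique
that family of code builds on -- Algorithm R is the simplest form;
analyze.c actually uses Vitter's refinement plus block-level sampling,
which is the part that would need adapting here:)

#include <stdio.h>
#include <stdlib.h>

#define SAMPLE_SIZE 10

int
main(void)
{
	int			reservoir[SAMPLE_SIZE];
	int			i;

	for (i = 0; i < 1000; i++)	/* pretend stream of 1000 rows */
	{
		if (i < SAMPLE_SIZE)
			reservoir[i] = i;	/* fill the reservoir first */
		else
		{
			long		j = random() % (i + 1);	/* uniform in [0, i] */

			/* keep row i with probability SAMPLE_SIZE/(i+1) */
			if (j < SAMPLE_SIZE)
				reservoir[j] = i;
		}
	}

	for (i = 0; i < SAMPLE_SIZE; i++)
		printf("%d ", reservoir[i]);
	printf("\n");
	return 0;
}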


Cheers

Mark




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PoC] pgstattuple2: block sampling to reduce physical read

2013-10-10 Thread Mark Kirkwood
On 11/10/13 17:08, Satoshi Nagayasu wrote:
> (2013/10/11 7:32), Mark Kirkwood wrote:
>> On 11/10/13 11:09, Mark Kirkwood wrote:
>>> On 16/09/13 16:20, Satoshi Nagayasu wrote:
 (2013/09/15 11:07), Peter Eisentraut wrote:
> On Sat, 2013-09-14 at 16:18 +0900, Satoshi Nagayasu wrote:
>> I'm looking forward to seeing more feedback on this approach,
>> in terms of design and performance improvement.
>> So, I have submitted this for the next CF.
> Your patch fails to build:
>
> pgstattuple.c: In function ‘pgstat_heap_sample’:
> pgstattuple.c:737:13: error: ‘SnapshotNow’ undeclared (first use in
> this function)
> pgstattuple.c:737:13: note: each undeclared identifier is reported
> only once for each function it appears in
 Thanks for checking. Fixed to eliminate SnapshotNow.

>>> This seems like a cool idea! I took a quick look, and initially
>>> replicated the sort of improvement you saw:
>>>
>>>
>>> bench=# explain analyze select * from pgstattuple('pgbench_accounts');
>>> QUERY PLAN
>>> ---------------------------------------------------------------------------
>>> Function Scan on pgstattuple (cost=0.00..0.01 rows=1 width=72) (actual
>>> time=786.368..786.369 rows=1 loops=1)
>>> Total runtime: 786.384 ms
>>> (2 rows)
>>>
>>> bench=# explain analyze select * from pgstattuple2('pgbench_accounts');
>>> NOTICE: pgstattuple2: SE tuple_count 0.00, tuple_len 0.00,
>>> dead_tuple_count 0.00, dead_tuple_len 0.00, free_space 0.00
>>> QUERY PLAN
>>> ---------------------------------------------------------------------------
>>> Function Scan on pgstattuple2 (cost=0.00..0.01 rows=1 width=72) (actual
>>> time=12.004..12.005 rows=1 loops=1)
>>> Total runtime: 12.019 ms
>>> (2 rows)
>>>
>>>
>>>
>>> I wondered what sort of difference eliminating caching would make:
>>>
>>> $ sudo sysctl -w vm.drop_caches=3
>>>
>>> Repeating the above queries:
>>>
>>>
>>> bench=# explain analyze select * from pgstattuple('pgbench_accounts');
>>> QUERY PLAN
>>> ---------------------------------------------------------------------------
>>> Function Scan on pgstattuple (cost=0.00..0.01 rows=1 width=72) (actual
>>> time=9503.774..9503.776 rows=1 loops=1)
>>> Total runtime: 9504.523 ms
>>> (2 rows)
>>>
>>> bench=# explain analyze select * from pgstattuple2('pgbench_accounts');
>>> NOTICE: pgstattuple2: SE tuple_count 0.00, tuple_len 0.00,
>>> dead_tuple_count 0.00, dead_tuple_len 0.00, free_space 0.00
>>> QUERY PLAN
>>> ---------------------------------------------------------------------------
>>> Function Scan on pgstattuple2 (cost=0.00..0.01 rows=1 width=72) (actual
>>> time=12330.630..12330.631 rows=1 loops=1)
>>> Total runtime: 12331.353 ms
>>> (2 rows)
>>>
>>>
>>> So the sampling code seems *slower* when the cache is completely cold -
>>> is that expected? (I have not looked at how the code works yet - I'll
>>> dive in later if I get a chance)!
> Thanks for testing that. It would be very helpful to improve the
> performance.
>
>> Quietly replying to myself - looking at the code the sampler does 3000
>> random page reads... I guess this is slower than 163935 (number of pages
>> in pgbench_accounts) sequential page reads thanks to os readahead on my
>> type of disk (WD Velociraptor). Tweaking the number of random reads (i.e
>> the sample size) down helps - but obviously that can impact estimation
>> accuracy.
>>
>> Thinking about this a bit more, I guess the elapsed runtime is not the
>> *only* thing to consider - the sampling code will cause way less
>> disruption to the os page cache (3000 pages vs possibly lots more than
>> 3000 for reading an entire relation).
>>
>> Thoughts?
> I think it could be improved by sorting sample block numbers
> *before* physical block reads in order to eliminate random access
> on the disk.
>
> pseudo code:
> --
> for (i=0 ; i < SAMPLE_SIZE ; i++)
> {
> sample_block[i] = random();
> }
>
> qsort(sample_block);
>
> for (i=0 ; i < SAMPLE_SIZE ; i++)
> {
> buf = ReadBuffer(rel, sample_block[i]);
>
> do_some_stats_stuff(buf);
> }
> --
>
> I guess it would be helpful for reducing the random access problem.
>
> Any comments?

Ah yes - that's a good idea (rough patch to your patch attached)!

bench=# explain analyze select * from pgstattuple2('pgbench_accounts');
NOTICE: pgstattuple2: SE tuple_count 0.00, tuple_len 0.00,
dead_tuple_count 0.00, dead_tuple_len 0.00, free_space 0.00
QUERY PLAN
---------------------------------------------------------------------------
Function Scan on pgstattuple2 (cost=0.00..0.01 rows=1 width=72) (actual
time=9968.318..9968.319 rows=1 loops=1)
Total runtime: 9968.443 ms
(2 rows)

It would appear that we are now not any worse than a complete sampling...

Cheers

Mark

*** pgstattuple.c.orig	2013-10-11 17:46:00.592666316 +1300
--- pgstattuple.c	2013-10-11 17:46:08.616450284 +1300
***
*** 100,105 
--- 100,123 
     uint64 *tuple_count, uin

Re: [HACKERS] Patch: FORCE_NULL option for copy COPY in CSV mode

2013-10-10 Thread Amit Kapila
On Thu, Oct 10, 2013 at 6:45 PM, Andrew Dunstan  wrote:
>
> On 10/09/2013 11:47 PM, Amit Kapila wrote:
>>
>>
>> One of the advantage, I could see using "NULL For .." syntax is
>> that already we have one syntax with which user can specify what
>> strings can be replaced with NULL, now just to handle quoted empty
>> string why to add different syntax.
>>
>> "FORCE_NULL" has advantage that it can be used for some columns rather
>> than all columns, but I think for that existing syntax can also be
>> modified to support it.
>>
>>
>>
>
>
> I think it's badly designed on its face. We don't need and shouldn't
> provide a facility for different NULL markers. A general facility for that
> would be an ugly and quite pointless addition to code complexity. What we
> need is simply a way of altering one specific behaviour, namely the
> treatment of quoted NULL markers. We should not do that by allowing munging
> the NULL marker per column,

   I was thinking it to be similar in some sense to the "insert into tbl"
statement. For example:
   Create table tbl (c1 int, c2 int, c3 int, c4 int)
   insert into tbl (c2) values(1);

   Here, after the table name, the user can specify the column names for
which he wants to provide specific values.

> but by a syntactical mechanism that directly
> addresses the change in behaviour. If you don't like "FORCE NULL" then let's
> pick something else, but not this ugly and unnecessary "NULL FOR" gadget.

   If you don't like the idea of one generic syntax for NULLs in the COPY
command, then "FORCE_QUOTED_NULL" or "QUOTED_NULL" would make more
sense compared to FORCE_NULL.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PoC] pgstattuple2: block sampling to reduce physical read

2013-10-10 Thread Jaime Casanova
On Thu, Oct 10, 2013 at 5:32 PM, Mark Kirkwood
 wrote:
>
> Quietly replying to myself - looking at the code the sampler does 3000
> random page reads...

FWIW, something that bothers me is that there are 3000 random page
reads... I mean, why 3000? How do you get that number as an absolute for
good accuracy in every relation? Why not a percentage, maybe an
argument to the function?

Also, the name pgstattuple2 doesn't convince me... maybe you can keep the
name pgstattuple() and overload the function with a second argument (the
percentage of the sample).

-- 
Jaime Casanova www.2ndQuadrant.com
Professional PostgreSQL: Soporte 24x7 y capacitación
Phone: +593 4 5107566 Cell: +593 987171157


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch for reserved connections for replication users

2013-10-10 Thread Amit Kapila
On Thu, Oct 10, 2013 at 10:06 PM, Gibheer  wrote:
> On Thu, 10 Oct 2013 09:55:24 +0530
> Amit Kapila  wrote:
>
>> On Thu, Oct 10, 2013 at 3:17 AM, Gibheer 
>> wrote:
>> > On Mon, 7 Oct 2013 11:39:55 +0530
>> > Amit Kapila  wrote:
>> >> Robert Haas wrote:
>> >> On Mon, Aug 5, 2013 at 2:04 AM, Andres Freund
>> >>  wrote:
>> >> >>> Hmm.  It seems like this patch is making MaxConnections no
>> >> >>> longer mean the maximum number of connections, but rather the
>> >> >>> maximum number of non-replication connections.  I don't think
>> >> >>> I support that definitional change, and I'm kinda surprised if
>> >> >>> this is sufficient to implement it anyway (e.g. see
>> >> >>> InitProcGlobal()).
>> >> >
>> >> >> I don't think the implementation is correct, but why don't you
>> >> >> like the definitional change? The set of things you can do from
>> >> >> replication connections are completely different from a normal
>> >> >> connection. So using separate "pools" for them seems to make
>> >> >> sense. That they end up allocating similar internal data seems
>> >> >> to be an implementation detail to me.
>> >>
>> >> > Because replication connections are still "connections".  If I
>> >> > tell the system I want to allow 100 connections to the server,
>> >> > it should allow 100 connections, not 110 or 95 or any other
>> >> > number.
>> >>
>> >> I think that to reserve connections for replication, a mechanism
>> >> similar to superuser_reserved_connections should be used rather than
>> >> the autovacuum workers or background workers approach.
>> >> This won't change the definition of MaxConnections. Another thing
>> >> is that rather than introducing a new parameter for reserved
>> >> replication connections, it is better to use max_wal_senders as it
>> >> can serve the purpose.
>> >>
>> >> Review for replication_reserved_connections-v2.patch, considering
>> >> we are going to use mechanism similar to
>> >> superuser_reserved_connections and won't allow definition of
>> >> MaxConnections to change.
>> >>
>> >> 1. /* the extra unit accounts for the autovacuum launcher */
>> >>   MaxBackends = MaxConnections + autovacuum_max_workers + 1 +
>> >> - + max_worker_processes;
>> >> + + max_worker_processes + max_wal_senders;
>> >>
>> >> Above changes are not required.
>> >>
>> >> 2.
>> >> + if ((!am_superuser && !am_walsender) &&
>> >>   ReservedBackends > 0 &&
>> >>   !HaveNFreeProcs(ReservedBackends))
>> >>
>> >> Change the check as you have in patch-1 for
>> >> ReserveReplicationConnections
>> >>
>> >> if (!am_superuser &&
>> >> (max_wal_senders > 0 || ReservedBackends > 0) &&
>> >> !HaveNFreeProcs(max_wal_senders + ReservedBackends))
>> >> ereport(FATAL,
>> >> (errcode(ERRCODE_TOO_MANY_CONNECTIONS),
>> >> errmsg("remaining connection slots are reserved for replication and
>> >> superuser connections")));
>> >>
>> >> 3. In guc.c, change description of max_wal_senders similar to
>> >> superuser_reserved_connections at below place:
>> >>{"max_wal_senders", PGC_POSTMASTER, REPLICATION_SENDING,
>> >> gettext_noop("Sets the maximum number of simultaneously running WAL
>> >> sender processes."),
>> >>
>> >> 4. With this approach, there is no need to change
>> >> InitProcGlobal(), as it is used to keep track bgworkerFreeProcs
>> >> and autovacFreeProcs, for others it use freeProcs.
>> >>
>> >> 5. Below description in config.sgml needs to be changed for
>> >> superuser_reserved_connections to include the effect of
>> >> max_wal_enders in reserved connections.
>> >> Whenever the number of active concurrent connections is at least
>> >> max_connections minus superuser_reserved_connections, new
>> >> connections will be accepted only for superusers, and no new
>> >> replication connections will be accepted.
>> >>
>> >> 6. Also similar description should be added to max_wal_senders
>> >> configuration parameter.
>> >>
>> >> 7. Below comment needs to be updated, as it assumes only superuser
>> >> reserved connections as part of MaxConnections limit.
>> >>/*
>> >>  * ReservedBackends is the number of backends reserved for
>> >> superuser use.
>> >>  * This number is taken out of the pool size given by MaxBackends
>> >> so
>> >>  * number of backend slots available to non-superusers is
>> >>  * (MaxBackends - ReservedBackends).  Note what this really means
>> >> is
>> >>  * "if there are <= ReservedBackends connections available, only
>> >> superusers
>> >>  * can make new connections" --- pre-existing superuser connections
>> >> don't
>> >>  * count against the limit.
>> >>  */
>> >> int ReservedBackends;
>> >>
>> >> 8. Also we can add comment on top of definition for max_wal_senders
>> >> similar to ReservedBackends.
>> >>
>> >>
>> >> With Regards,
>> >> Amit Kapila.
>> >> EnterpriseDB: http://www.enterprisedb.com
>> >>
>> >
>> > Hi,
>> >
>> > I took the time and reworked the patch with the feedback till now.
>> > Thank you very much Amit!
>>
>> Is there any specific reason why you moved this patch to the next
>> CommitFest?
>>
>> With Regards,
>> Ami

Re: [HACKERS] [PoC] pgstattuple2: block sampling to reduce physical read

2013-10-10 Thread Satoshi Nagayasu
(2013/10/11 7:32), Mark Kirkwood wrote:
> On 11/10/13 11:09, Mark Kirkwood wrote:
>> On 16/09/13 16:20, Satoshi Nagayasu wrote:
>>> (2013/09/15 11:07), Peter Eisentraut wrote:
 On Sat, 2013-09-14 at 16:18 +0900, Satoshi Nagayasu wrote:
> I'm looking forward to seeing more feedback on this approach,
> in terms of design and performance improvement.
> So, I have submitted this for the next CF.
 Your patch fails to build:

 pgstattuple.c: In function ‘pgstat_heap_sample’:
 pgstattuple.c:737:13: error: ‘SnapshotNow’ undeclared (first use in
 this function)
 pgstattuple.c:737:13: note: each undeclared identifier is reported
 only once for each function it appears in
>>> Thanks for checking. Fixed to eliminate SnapshotNow.
>>>
>> This seems like a cool idea! I took a quick look, and initially
>> replicated the sort of improvement you saw:
>>
>>
>> bench=# explain analyze select * from pgstattuple('pgbench_accounts');
>> QUERY PLAN
>> ---------------------------------------------------------------------------
>> Function Scan on pgstattuple (cost=0.00..0.01 rows=1 width=72) (actual
>> time=786.368..786.369 rows=1 loops=1)
>> Total runtime: 786.384 ms
>> (2 rows)
>>
>> bench=# explain analyze select * from pgstattuple2('pgbench_accounts');
>> NOTICE: pgstattuple2: SE tuple_count 0.00, tuple_len 0.00,
>> dead_tuple_count 0.00, dead_tuple_len 0.00, free_space 0.00
>> QUERY PLAN
>> ---------------------------------------------------------------------------
>> Function Scan on pgstattuple2 (cost=0.00..0.01 rows=1 width=72) (actual
>> time=12.004..12.005 rows=1 loops=1)
>> Total runtime: 12.019 ms
>> (2 rows)
>>
>>
>>
>> I wondered what sort of difference eliminating caching would make:
>>
>> $ sudo sysctl -w vm.drop_caches=3
>>
>> Repeating the above queries:
>>
>>
>> bench=# explain analyze select * from pgstattuple('pgbench_accounts');
>> QUERY PLAN
>> ---------------------------------------------------------------------------
>> Function Scan on pgstattuple (cost=0.00..0.01 rows=1 width=72) (actual
>> time=9503.774..9503.776 rows=1 loops=1)
>> Total runtime: 9504.523 ms
>> (2 rows)
>>
>> bench=# explain analyze select * from pgstattuple2('pgbench_accounts');
>> NOTICE: pgstattuple2: SE tuple_count 0.00, tuple_len 0.00,
>> dead_tuple_count 0.00, dead_tuple_len 0.00, free_space 0.00
>> QUERY PLAN
>> ---------------------------------------------------------------------------
>> Function Scan on pgstattuple2 (cost=0.00..0.01 rows=1 width=72) (actual
>> time=12330.630..12330.631 rows=1 loops=1)
>> Total runtime: 12331.353 ms
>> (2 rows)
>>
>>
>> So the sampling code seems *slower* when the cache is completely cold -
>> is that expected? (I have not looked at how the code works yet - I'll
>> dive in later if I get a chance)!

Thanks for testing that. It would be very helpful to improve the
performance.

> Quietly replying to myself - looking at the code the sampler does 3000
> random page reads... I guess this is slower than 163935 (number of pages
> in pgbench_accounts) sequential page reads thanks to os readahead on my
> type of disk (WD Velociraptor). Tweaking the number of random reads (i.e
> the sample size) down helps - but obviously that can impact estimation
> accuracy.
> 
> Thinking about this a bit more, I guess the elapsed runtime is not the
> *only* thing to consider - the sampling code will cause way less
> disruption to the os page cache (3000 pages vs possibly lots more than
> 3000 for reading an entire relation).
> 
> Thoughts?

I think it could be improved by sorting sample block numbers
*before* physical block reads in order to eliminate random access
on the disk.

pseudo code:
--
for (i=0 ; i < SAMPLE_SIZE ; i++)
{
sample_block[i] = random();
}

qsort(sample_block);

for (i=0 ; i < SAMPLE_SIZE ; i++)
{
buf = ReadBuffer(rel, sample_block[i]);

do_some_stats_stuff(buf);
}
--

I guess it would be helpful for reducing the random access problem.

Any comments?
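
(To make that concrete, a self-contained sketch of the sort-before-read
loop; SAMPLE_SIZE and NBLOCKS are illustrative, and printf() stands in for
ReadBuffer() since this runs outside the backend:)

#include <stdio.h>
#include <stdlib.h>

#define NBLOCKS		163935		/* e.g. pgbench_accounts, from upthread */
#define SAMPLE_SIZE	3000

static int
blkno_cmp(const void *a, const void *b)
{
	int			x = *(const int *) a;
	int			y = *(const int *) b;

	return (x > y) - (x < y);
}

int
main(void)
{
	int			sample_block[SAMPLE_SIZE];
	int			i;

	for (i = 0; i < SAMPLE_SIZE; i++)
		sample_block[i] = random() % NBLOCKS;

	/* sort so the reads hit the file in ascending block order, letting
	 * OS readahead help instead of doing pure random I/O */
	qsort(sample_block, SAMPLE_SIZE, sizeof(int), blkno_cmp);

	for (i = 0; i < SAMPLE_SIZE; i++)
		printf("read block %d\n", sample_block[i]);	/* ReadBuffer() here */

	return 0;
}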
-- 
Satoshi Nagayasu
Uptime Technologies, LLC. http://www.uptime.jp


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Compression of full-page-writes

2013-10-10 Thread Amit Kapila
On Fri, Oct 11, 2013 at 5:05 AM, Andres Freund  wrote:
> Hi,
> On 2013-10-11 03:44:01 +0900, Fujii Masao wrote:
>> I'm afraid that the patch has only limited effects in WAL reduction and
>> performance improvement unless the database contains highly-compressible
>> data like large blank characters column. It really depends on the contents
>> of the database. So, obviously FPW compression should not be the default.
>> Maybe we can treat it as just tuning knob.
>
>
> Have you tried using lz4 (or snappy) instead of pglz? There's a patch
> adding it to pg in
> http://archives.postgresql.org/message-id/20130621000900.GA12425%40alap2.anarazel.de
>
> If this really is only a benefit in scenarios with lots of such data, I
> have to say I have my doubts about the benefits of the patch.

I think it will be difficult to prove, for any compression algorithm,
that it compresses well in most scenarios.
In many cases, if the data is non-compressible, the WAL will not be
reduced and tps can even come down, because the compression algorithm
still has to try to compress the data and burns some cpu for that,
which in turn reduces tps.

As this patch gives the user a knob to turn compression on/off, users
can decide if they want such a benefit.
Some users will say that they have no idea how or what kind of data
will be in their databases, and such users should not use this option;
but on the other side, some users know that their data follows a
similar pattern, so they can get benefit out of such optimisations.
For example, in the telecom industry I have seen HLR databases with a
lot of data as CDRs (call data records), where the individual records
differ but follow the same pattern.

That being said, I think both this patch and my patch "WAL reduction
for Update" (https://commitfest.postgresql.org/action/patch_view?id=1209)
are using the same technique for WAL compression and can lead to similar
consequences in different ways.
So I suggest having a unified method to enable WAL compression for both
patches.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Compression of full-page-writes

2013-10-10 Thread Fujii Masao
On Fri, Oct 11, 2013 at 8:35 AM, Andres Freund  wrote:
> Hi,
> On 2013-10-11 03:44:01 +0900, Fujii Masao wrote:
>> I'm afraid that the patch has only limited effects in WAL reduction and
>> performance improvement unless the database contains highly-compressible
>> data like large blank characters column. It really depends on the contents
>> of the database. So, obviously FPW compression should not be the default.
>> Maybe we can treat it as just tuning knob.
>
>
> Have you tried using lz4 (or snappy) instead of pglz? There's a patch
> adding it to pg in
> http://archives.postgresql.org/message-id/20130621000900.GA12425%40alap2.anarazel.de

Yeah, it's worth checking them! Will do that.

> If this really is only a benefit in scenarios with lots of such data, I
> have to say I have my doubts about the benefits of the patch.

Yep, maybe the patch needs to be redesigned. Currently in the patch
compression is performed per FPW, i.e., the size of data to compress
is just 8KB. If we can increase the size of data to compress, we might
be able to improve the compression ratio. For example, by storing
all outstanding WAL data temporarily in a local buffer, compressing it,
and then storing the compressed result in the WAL buffers.
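
(For illustration, a self-contained sketch of the per-FPW "compress, and
fall back to the raw page when there is no gain" step discussed in this
thread; zlib's compress() merely stands in for pglz/lz4/snappy, and the
messages are illustrative.  Build with: cc fpw.c -lz)

#include <stdio.h>
#include <string.h>
#include <zlib.h>

#define BLCKSZ 8192				/* size of one full-page image */

int
main(void)
{
	unsigned char page[BLCKSZ];					/* pretend full-page write */
	unsigned char dest[compressBound(BLCKSZ)];	/* worst-case output size */
	uLongf		destlen = sizeof(dest);

	memset(page, 'x', sizeof(page));			/* highly compressible data */

	if (compress(dest, &destlen, page, BLCKSZ) == Z_OK && destlen < BLCKSZ)
		printf("store compressed: %lu bytes, mark block as compressed\n",
			   (unsigned long) destlen);
	else
		printf("no gain: store raw %d bytes, mark block as uncompressed\n",
			   BLCKSZ);
	return 0;
}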

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Compression of full-page-writes

2013-10-10 Thread Fujii Masao
On Fri, Oct 11, 2013 at 3:44 AM, Fujii Masao  wrote:
> On Fri, Oct 11, 2013 at 1:20 AM, Dimitri Fontaine
>  wrote:
>> Hi,
>>
>> I did a partial review of this patch, wherein I focused on the patch and
>> the code itself, as I saw other contributors already did some testing on
>> it, so that we know it applies cleanly and work to some good extend.
>
> Thanks a lot!
>
>> In full_page_writes_str() why are you returning "unrecognized" rather
>> than doing an ELOG(ERROR, …) for this unexpected situation?
>
> It's because the similar functions 'wal_level_str' and 'dbState' also return
> 'unrecognized' in the unexpected situation. I just implemented
> full_page_writes_str()
> in the same manner.
>
> If we do an elog(ERROR) in that case, pg_xlogdump would fail to dump
> the 'broken' (i.e., unrecognized fpw is set) WAL file. I think that some
> users want to use pg_xlogdump to investigate the broken WAL file, so
> doing an elog(ERROR) seems not good to me.
>
>> The code switches to compression (or trying to) when the following
>> condition is met:
>>
>> +   if (fpw <= FULL_PAGE_WRITES_COMPRESS)
>> +   {
>> +   rdt->data = CompressBackupBlock(page, BLCKSZ - 
>> bkpb->hole_length, &(rdt->len));
>>
>> We have
>>
>> + typedef enum FullPageWritesLevel
>> + {
>> +   FULL_PAGE_WRITES_OFF = 0,
>> +   FULL_PAGE_WRITES_COMPRESS,
>> +   FULL_PAGE_WRITES_ON
>> + } FullPageWritesLevel;
>>
>> + #define FullPageWritesIsNeeded(fpw)   (fpw >= FULL_PAGE_WRITES_COMPRESS)
>>
>> I don't much like using the <= test against an ENUM and I'm not sure I
>> understand the intention you have here. It somehow looks like a typo and
>> disagrees with the macro.
>
> I thought that FPW should be compressed only when full_page_writes is
> set to 'compress' or 'off'. That is, 'off' implies compression. When it's
> set
> to 'off', FPW is basically not generated, so there is no need to call
> CompressBackupBlock() in that case. But only during online base backup,
> FPW is forcibly generated even when it's set to 'off'. So I used the check
> "fpw <= FULL_PAGE_WRITES_COMPRESS" there.
>
>> What about using the FullPageWritesIsNeeded
>> macro, and maybe rewriting the macro as
>>
>> #define FullPageWritesIsNeeded(fpw) \
>>(fpw == FULL_PAGE_WRITES_COMPRESS || fpw == FULL_PAGE_WRITES_ON)
>
> I'm OK to change the macro so that the <= test is not used.
>
>> Also, having "on" imply "compress" is a little funny to me. Maybe we
>> should just finish our testing and be happy to always compress the full
>> page writes. What would the downside be exactly (on buzy IO system
>> writing less data even if needing more CPU will be the right trade-off).
>
> "on" doesn't imply "compress". When full_page_writes is set to "on",
> FPW is not compressed at all.
>
>> I like that you're checking the savings of the compressed data with
>> respect to the uncompressed data and cancel the compression if there's
>> no gain. I wonder if your test accounts for enough padding and headers
>> though given the results we saw in other tests made in this thread.
>
> I'm afraid that the patch has only limited effects in WAL reduction and
> performance improvement unless the database contains highly-compressible
> data like large blank characters column. It really depends on the contents
> of the database. So, obviously FPW compression should not be the default.
> Maybe we can treat it as just tuning knob.
>
>> Why do we have both the static function full_page_writes_str() and the
>> macro FullPageWritesStr, with two different implementations issuing
>> either "true" and "false" or "on" and "off"?
>
> First I was thinking to use "on" and "off" because they are often used
> as the setting value of boolean GUC. But unfortunately the existing
> pg_xlogdump uses "true" and "false" to show the value of full_page_writes
> in WAL. To avoid breaking backward compatibility, I implemented
> the "true/false" version of the function. I'm really not sure how many
> people want such compatibility in pg_xlogdump, though.
>
>> !   unsignedhole_offset:15, /* number of bytes before "hole" */
>> !   flags:2,/* state of a backup 
>> block, see below */
>> !   hole_length:15; /* number of bytes in "hole" 
>> */
>>
>> I don't understand that. I wanted to use that patch as a leverage to
>> smoothly discover the internals of our WAL system but won't have the
>> time to do that here.
>
> We need the flag indicating whether each FPW is compressed or not.
> If no such a flag exists in WAL, the standby cannot determine whether
> it should decompress each FPW or not, and then cannot replay
> the WAL containing FPW properly. That is, I just used a 'space' in
> the header of FPW to have such a flag.
>
>> That said, I don't even know that C syntax.
>
> The struct 'ItemIdData' uses the same C syntax.
>
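
(Side note on that syntax, since it tripped up the review: the quoted
declaration is a bit-field list carving one unsigned word into 15 + 2 + 15
bits, the same pattern as ItemIdData in src/include/storage/itemid.h.  An
illustrative equivalent, with a made-up struct name:)

typedef struct BkpBlockBits
{
	unsigned	hole_offset:15,	/* number of bytes before "hole" */
				flags:2,		/* backup block state, e.g. compressed? */
				hole_length:15;	/* number of bytes in "hole" */
} BkpBlockBits;					/* 15 + 2 + 15 bits pack into one 32-bit word */
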
>> + #define BKPBLOCK_UNCOMPRESSED 0   /* uncompressed */
>> + #define BKP

Re: [HACKERS] Bugfix and new feature for PGXS

2013-10-10 Thread Andrew Dunstan


On 10/10/2013 09:35 PM, Peter Eisentraut wrote:
> On Tue, 2013-10-08 at 10:04 -0400, Andrew Dunstan wrote:
>> On 10/07/2013 08:47 PM, Peter Eisentraut wrote:
>>> I suspect this line
>>>
>>> submake-libpq: $(libdir)/libpq.so ;
>>>
>>> will cause problems on platforms with a different extension (e.g. OS X).
>>
>> suggested fix is below.
>
> Hmm, this would duplicate information about shared library naming in a
> place outside of Makefile.shlib.  That doesn't look right.

Please suggest an alternative.

cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bugfix and new feature for PGXS

2013-10-10 Thread Andrew Dunstan


On 10/10/2013 09:37 PM, Peter Eisentraut wrote:
> On Mon, 2013-10-07 at 22:00 -0400, Andrew Dunstan wrote:
>> The code has been sitting in HEAD for several months, and I
>> committed on the back branches because it was wanted.
>
> New features normally go through a full development cycle including
> extensive beta testing, especially when they contain changes to external
> interfaces.  The fact that the patch author wanted his feature released
> immediately (of course) doesn't warrant a free pass in this case, IMO.

Perhaps you should have stated your objection when this was being
discussed, and saved me some time.

I frankly think we can be a bit more tolerant about build system
features than we would be about the actual software it builds.


cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WITHIN GROUP patch

2013-10-10 Thread Andrew Gierth
> "Pavel" == Pavel Stehule  writes:

 >> I found so following error message is not too friendly (mainly because
 >> this functionality will be new)
 >> 
 >> postgres=# select dense_rank(3,3,2) within group (order by num desc, odd)
 >> from test4;
 >> ERROR:  Incorrect number of arguments for hypothetical set function
 >> LINE 1: select dense_rank(3,3,2) within group (order by num desc, od...
 >> ^
 >> 
 >> Probably some hint should be there?

Hm, yeah.

Anyone have ideas for better wording for the original error message,
or a DETAIL or HINT addition? The key point here is that the number of
_hypothetical_ arguments and the number of ordered columns must match,
but we don't currently exclude the possibility that a user-defined
hypothetical set function might have additional non-hypothetical args.

The first alternative that springs to mind is:

ERROR: Incorrect number of arguments for hypothetical set function
DETAIL: Number of hypothetical arguments (3) must equal number of ordered 
columns (2)

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Replace duplicate_oids with Perl implementation

2013-10-10 Thread Robert Haas
On Thu, Oct 10, 2013 at 8:23 PM, Peter Eisentraut  wrote:
> Replace duplicate_oids with Perl implementation
>
> It is more portable, more robust, and more readable.

smilodon seems unhappy with this:

cd ../../../src/include/catalog && ./duplicate_oids
sh: ./duplicate_oids: not found

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ENABLE/DISABLE CONSTRAINT NAME

2013-10-10 Thread wangshuo

On 2013-10-10 02:10, Robert Haas wrote:
> I agree with these concerns, as well as those raised by Tom Lane and
> Fabien COELHO, and I think they indicate that we shouldn't accept this
> patch.  So I'm marking this as Rejected.

On 2013-10-11 06:48, Jim Nasby wrote:
> I see a use case for disabling FKs and CHECKS but not PKs or UNIQUE
> constraints: FKs and CHECKS don't depend on additional state
> information (namely an index), so it's easy to just disable them
> temporarily and then re-enable them. The same isn't true about a PK or
> UNIQUE constraint.
>
> Of course we could decide to do something more complex to handle
> disabling PK/UNIQUE... though at that point it'd be better to just
> allow temporarily disabling any index. But I think there's an argument
> to be made for that being beyond the scope of disabling "simple"
> constraints... it's a pretty high bar to set that we won't accept a
> patch that disables simple constraints but not those involving indexes.

Thanks for your reply.
I found my patch's weakness. I think the DISABLE/ENABLE patch is
necessary.

I will prepare a new patch covering all the constraints and submit it.
Thanks again.

 Yours,
 Wang Shuo
 HighGo Software Co.,Ltd.
 October 11, 2013


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bugfix and new feature for PGXS

2013-10-10 Thread Peter Eisentraut
On Mon, 2013-10-07 at 22:00 -0400, Andrew Dunstan wrote:
> The code has been sitting in HEAD for several months, and I 
> committed on the back branches because it was wanted. 

New features normally go through a full development cycle including
extensive beta testing, especially when they contain changes to external
interfaces.  The fact that the patch author wanted his feature released
immediately (of course) doesn't warrant a free pass in this case, IMO.




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bugfix and new feature for PGXS

2013-10-10 Thread Peter Eisentraut
On Tue, 2013-10-08 at 10:04 -0400, Andrew Dunstan wrote:
> On 10/07/2013 08:47 PM, Peter Eisentraut wrote:
> >
> > I suspect this line
> >
> > submake-libpq: $(libdir)/libpq.so ;
> >
> > will cause problems on platforms with a different extension (e.g. OS X).
> 
> 
> suggested fix is below.

Hmm, this would duplicate information about shared library naming in a
place outside of Makefile.shlib.  That doesn't look right.


> diff --git a/src/Makefile.global.in b/src/Makefile.global.in
> index bb732bb..b562378 100644
> --- a/src/Makefile.global.in
> +++ b/src/Makefile.global.in
> @@ -422,7 +422,11 @@ ifndef PGXS
>   submake-libpq:
>  $(MAKE) -C $(libpq_builddir) all
>   else
> -submake-libpq: $(libdir)/libpq.so ;
> +ifneq ($(PORTNAME),cygwin)
> +submake-libpq: $(libdir)/libpq$(DLSUFFIX) ;
> +else
> +submake-libpq: $(libdir)/cygpq$(DLSUFFIX) ;
> +endif
>   endif
> 
>   ifndef PGXS
> 
> 
> 





-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GIN improvements part 1: additional information

2013-10-10 Thread Peter Eisentraut
You linked to this email from the commitfest entry, but there is no
patch here.  You probably meant a different email.  Check please.


On Tue, 2013-10-08 at 21:48 +0300, Heikki Linnakangas wrote:
> On 04.10.2013 14:13, Alexander Korotkov wrote:
> > On Fri, Oct 4, 2013 at 12:28 PM, Heikki Linnakangas wrote:
> >
> >> In the attached patch, I in fact already did that for data leaf pages, but
> >> didn't change the format of non-leaf pages yet. If we want to support
> >> pg_upgrade, we might want to refrain from changing the non-leaf format.
> >
> > In GinDataLeafPageGetPostingList* you use sizeof(ItemPointerData) without
> > MAXALIGN. Is it an error, or do you deliberately use 2 extra bytes on leaf page?
> 
> I didn't even think of it. Now that I do think of it, I don't see a 
> reason to use MAXALIGN there. PostingItems only require 2-byte 
> alignment. It's a bit fragile and underdocumented though. It probably 
> would be good to have a struct to represent that layout. Something like:
> 
> struct
> {
>ItemPointerData rightBound;
>PostingItem postingItems[1]; /* variable length array */
> } PostingItemPageContent;
> 
> And use that struct in the macros.
> 
> Then again, we do currently use MAXALIGN there, so if we want to avoid 
> changing the on-disk format, we have to keep it...
> 
> - Heikki
> 
> 





-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2013-10-10 Thread Robert Haas
On Thu, Oct 10, 2013 at 6:27 PM, Josh Berkus  wrote:
>> More generally, Josh has made repeated comments that various proposed
>> value/formulas for work_mem are too low, but obviously the people who
>> suggested them didn't think so.  So I'm a bit concerned that we don't
>> all agree on what the end goal of this activity looks like.
>
> The counter-proposal to "auto-tuning" is just to raise the default for
> work_mem to 4MB or 8MB.  Given that Bruce's current formula sets it at
> 6MB for a server with 8GB RAM, I don't really see the benefit of going
> to a whole lot of code and formulas in order to end up at a figure only
> incrementally different from a new static default.

Agreed.  But what do you think the value SHOULD be on such a system?

I suggest that it's pretty reasonable to assume that even a
developer's personal machine will likely have 8GB or so by the time
PostgreSQL comes out, so tuning work_mem on that basis is not
unreasonable.  Plus, even if it has less, a developer probably won't
have 100 connections.

I guess the point I'm making here is that raising the default value is
not mutually exclusive with auto-tuning.  We could quadruple the
current defaults for work_mem and maintenance_work_mem and be better
off right now, today.  Then, we could improve things further in the
future if and when we agree on an approach to auto-tuning.  And people
who don't use the auto-tuning will still have a better default.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2013-10-10 Thread Robert Haas
On Thu, Oct 10, 2013 at 6:36 PM, Bruce Momjian  wrote:
> Patch attached.

ISTM that we have broad consensus that doing this at initdb time is
more desirable than doing it in the server on the fly.  Not everyone
agrees with that (you don't, for instance) but there were many, many
votes in favor of that option.

Judging by the commit I just pushed to do initdb-time selection of the
dynamic shared memory implementation to use, this is probably not hard
to code.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CommitFest progress

2013-10-10 Thread Robert Haas
On Wed, Oct 9, 2013 at 2:03 PM, Robert Haas  wrote:
> I therefore propose that we start by marking all of the patches that
> are currently Waiting on Author as Returned with Feedback.  Most of
> them have been that way for a long time.

Hearing no objections, I went through and did this, but skipped some
that had recent activity, and instead marked one as ready for
committer on the basis that the reviewer thought it was in good shape
except for needing test case revision.  Together with today's rash of
commits, this means that we've now disposed of half the patches in the
CommitFest.  IOW, we have a lot of work left to do.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Revive line type

2013-10-10 Thread Peter Eisentraut
On Thu, 2013-10-10 at 10:51 -0400, Robert Haas wrote:
> On Wed, Oct 9, 2013 at 10:43 PM, Peter Eisentraut  wrote:
> > Revive line type
> 
> Kevin just pointed out to me that there are a bunch of buildfarm
> failures.  I'm looking at the ones that are attributable to shared
> memory, but there seem to be some problems with this patch as well.
> Check out brolga, for example.

OK, I fixed the obvious issue with the geometry test.  The line test is
now failing because of "negative zero", which could be addressed with an
alternative expected file like in the case of geometry, and also because
of a rounding issue.  I'm not sure yet whether the latter is just
another platform difference, an unfortunate example case, or an actual
bug.




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] dynamic shared memory: wherein I am punished for good intentions

2013-10-10 Thread Robert Haas
On Thu, Oct 10, 2013 at 7:59 PM, Josh Berkus  wrote:
>>> (5) Default to POSIX, and allow for SysV as a compile-time option for
>>> platforms with poor POSIX memory support.
>>
>> OK, I did #5.  Let's see how that works.
>
> Andrew pointed out upthread that, since platforms are unlikely to change
> what they support dynamically, we could do this at initdb time instead.

Oh, sorry, that's what I did.  I missed the fact that your #5 and his
#5 were different.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] dynamic shared memory: wherein I am punished for good intentions

2013-10-10 Thread Josh Berkus

>> (5) Default to POSIX, and allow for SysV as a compile-time option for
>> platforms with poor POSIX memory support.
> 
> OK, I did #5.  Let's see how that works.

Andrew pointed out upthread that, since platforms are unlikely to change
what they support dynamically, we could do this at initdb time instead.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] dynamic shared memory: wherein I am punished for good intentions

2013-10-10 Thread Robert Haas
On Thu, Oct 10, 2013 at 5:14 PM, Josh Berkus  wrote:
>>> Doesn't #2 negate all advantages of this effort?  Bringing sysv
>>> management back on the table seems like a giant step backwards -- or
>>> am I missing something?
>>
>> Not unless there's no difference between "the default" and "the only option".
>
> Well, per our earlier discussion about "the first 15 minutes", there
> actually isn't a difference.

Harsh.  :-)

> I can only see two reasonable alternatives:
>
> This one:
>> (3) Add a new setting that auto-probes for a type of shared memory
>> that works.  Try POSIX first, then System V.  Maybe even fall back to
>> mmap'd files if neither of those works.
>
> Or:
>
> (5) Default to POSIX, and allow for SysV as a compile-time option for
> platforms with poor POSIX memory support.

OK, I did #5.  Let's see how that works.
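
(For the archives, a minimal standalone sketch of such a probe; the segment
name is made up, and on a system where no tmpfs backs /dev/shm, shm_open()
fails -- e.g. with ENOSYS, "Function not implemented", as seen on smew.
Build with: cc probe.c -lrt on older glibc.)

#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>

int
main(void)
{
	int			fd = shm_open("/pg_dsm_probe", O_CREAT | O_EXCL | O_RDWR, 0600);

	if (fd >= 0)
	{
		shm_unlink("/pg_dsm_probe");
		printf("POSIX shm usable: pick dynamic_shared_memory_type = posix\n");
	}
	else
		printf("POSIX shm unusable (%s): fall back to sysv\n",
			   strerror(errno));
	return 0;
}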

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Compression of full-page-writes

2013-10-10 Thread Andres Freund
Hi,
On 2013-10-11 03:44:01 +0900, Fujii Masao wrote:
> I'm afraid that the patch has only limited effects in WAL reduction and
> performance improvement unless the database contains highly-compressible
> data like large blank characters column. It really depends on the contents
> of the database. So, obviously FPW compression should not be the default.
> Maybe we can treat it as just tuning knob.


Have you tried using lz4 (or snappy) instead of pglz? There's a patch
adding it to pg in
http://archives.postgresql.org/message-id/20130621000900.GA12425%40alap2.anarazel.de
 

If this really is only a benefit in scenarios with lots of such data, I
have to say I have my doubts about the benefits of the patch.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2013-10-10 Thread Jim Nasby

On 10/10/13 9:44 AM, MauMau wrote:
> From: "Robert Haas"
>> On Thu, Oct 10, 2013 at 1:23 AM, Magnus Hagander wrote:
>>> I think it would be even simpler, and more reliable, to start with the
>>> parameter to initdb - I like that. But instead of having it set a new
>>> variable based on that and then autotune off that, just have *initdb*
>>> do these calculations you're suggesting, and write new defaults to the
>>> files (preferably with a comment).
>>>
>>> That way if the user *later* comes in and say changes shared_buffers,
>>> we don't dynamically resize the work_mem into a value that might cause
>>> his machine to die from swapping which would definitely violate the
>>> principle of least surprise..
>>
>> +1 for all of that.  I completely agree.
>
> I vote for this idea completely, too.  It's nice to be able to specify
> usable RAM with something like "initdb --system-memory 8GB", because it
> provides flexibility for memory allocation --- use the whole machine for
> one PostgreSQL instance, or run multiple instances on one machine with
> 50% of RAM for instance A and 25% of RAM for instances B and C, etc.
> But what is the default value of --system-memory?  I would like it to
> be the whole RAM.
>
> I hope something like pgtune will be incorporated into the core,
> absorbing the ideas in:
>
> - pgtune
> - https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server
> - the book "PostgreSQL 9.0 High Performance" by Greg Smith
>
> Then initdb calls the tool.  Of course, DBAs can use the tool later.
> Like pgtune, the tool would be nice if it and initdb can accept
> "--system-type" or "--workload" with arguments {OLTP | DW | mixed}.

+1 on all of the above. If putting one-time magic in initdb works, maybe
then we can look at runtime or even completely dynamic magic.

FWIW, I would be careful about allowing the tool to go completely crazy if 
--system-memory is set really high, including for things like work_mem. 
Frequently if you've got a lot of memory you're going to want a serious chunk 
of it used by the filesystem/kernel cache, and not to just vanish into a random 
sort (esp since last I knew there were diminishing returns on sort work_mem...)

Of course, I'm in a world of 512G servers with 8GB shared buffers so...
--
Jim C. Nasby, Data Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] dynamic shared memory: wherein I am punished for good intentions

2013-10-10 Thread Andres Freund
On 2013-10-10 12:13:20 -0400, Robert Haas wrote:
> and on smew (Debian GNU/Linux 6.0), it
> fails with "Function not implemented", which according to a forum
> post[1] I found probably indicates that /dev/shm doesn't mount a tmpfs
> on that box.

It would be nice to get confirmation of what the reason for that is -
afaik Debian has mounted /dev/shm for ages. The most likely issue I can
see is an incorrectly set up chroot.
If it's just that, I wouldn't be too concerned about it: that's a
scenario that won't happen to many users and is relatively easily fixed.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical changeset generation v6.2

2013-10-10 Thread Andres Freund
Hi Robert,

On 2013-10-09 14:49:46 -0400, Robert Haas wrote:
> I spent some time looking at the sample plugin (patch 9/12).  Here are
> some review comments:
> 
> - I think that the decoding plugin interface should work more like the
> foreign data wrapper interface.  Instead of using pg_dlsym to look up
> fixed names, I think there should be a struct of function pointers
> that gets filled in and registered somehow.

You mean something like CREATE OUTPUT PLUGIN registering a function with
an INTERNAL return value returning a filled struct? I thought about
that, but it seemed more complex. Happy to change it though if it's
preferred.

> - pg_decode_init() only warns when it encounters an unknown option.
> An error seems more appropriate.

Fine with me. I think I just made it a warning because I wanted to
experiment with options.

> - Still wondering how we'll use this from a bgworker.

Simplified code to consume data:

LogicalDecodingReAcquireSlot(NameStr(*name));
ctx = CreateLogicalDecodingContext(MyLogicalDecodingSlot,
                                   false /* not initial call */,
                                   MyLogicalDecodingSlot->confirmed_flush,
                                   options,
                                   logical_read_local_xlog_page,
                                   LogicalOutputPrepareWrite,
                                   LogicalOutputWrite);
...
while (true)
{
    XLogRecord *record;
    char       *errm = NULL;

    record = XLogReadRecord(ctx->reader, startptr, &errm);
    ...
    DecodeRecordIntoReorderBuffer(ctx, &buf);
}

/* at the end, or better at every commit or such */
LogicalConfirmReceivedLocation(/* whatever you consumed */);

LogicalDecodingReleaseSlot();


> - The output format doesn't look very machine-parseable.   I really
> think we ought to provide something that is.  Maybe a CSV-like format,
> or maybe something else, but I don't see why someone who wants to do
> change logging should be forced to write and install C code.  If
> something like Bucardo can run on an unmodified system and extract
> change-sets this way without needing a .so file, that's going to be a
> huge win for usability.

We can change the current format but I really see little to no chance of
agreeing on a replication format that's serviceable to several solutions
short term. Once we've gained some experience - maybe even this cycle -
that might be different.

> More generally on this patch set, if I'm going to be committing any of
> this, I'd prefer to start with what is currently patches 3 and 4, once
> we reach agreement on those.

Sounds like a reasonable start.

> Are we hoping to get any of this committed for this CF?  If so, let's
> make a plan to get that done; time is short.  If not, let's update the
> CF app accordingly.

I'd really like to do so. I am travelling atm, but I will be back
tomorrow evening and will push an updated patch this weekend. The issue
I know of in the latest patches at
http://www.postgresql.org/message-id/20131007133232.ga15...@awork2.anarazel.de
is the renaming issue from
http://www.postgresql.org/message-id/20131008194758.gb3718...@alap2.anarazel.de

Do you know of anything else in the patches you're referring to?

Thanks,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2013-10-10 Thread Bruce Momjian
On Thu, Oct 10, 2013 at 03:27:17PM -0700, Josh Berkus wrote:
> 
> > More generally, Josh has made repeated comments that various proposed
> > value/formulas for work_mem are too low, but obviously the people who
> > suggested them didn't think so.  So I'm a bit concerned that we don't
> > all agree on what the end goal of this activity looks like.
> 
> The counter-proposal to "auto-tuning" is just to raise the default for
> work_mem to 4MB or 8MB.  Given that Bruce's current formula sets it at
> 6MB for a server with 8GB RAM, I don't really see the benefit of going
> to a whole lot of code and formulas in order to end up at a figure only
> incrementally different from a new static default.

Well, the plan was to auto-tune shared_buffers and
effective_cache_size too.  We could fall back to our existing code where
effective_cache_size auto-tunes based on shared_buffers, and just raise
work_mem's default, tell people to set shared_buffers properly, and call
it a day.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ENABLE/DISABLE CONSTRAINT NAME

2013-10-10 Thread Jim Nasby

On 10/9/13 1:10 PM, Robert Haas wrote:

On Tue, Sep 24, 2013 at 10:40 PM, Peter Eisentraut  wrote:

On Tue, 2013-09-24 at 11:58 +0200, Bernd Helmle wrote:

Hmm not sure i understand this argument either: this patch doesn't
allow disabling a primary key. It only supports FKs and CHECK
constraints explicitly.


Well, as soon as the patch for cataloging not-null constraints as check
constraints is available, it will be possible to create views that
depend functionally on check constraints.  Then you'll have the same
problem there.

It's also not clear why this patch only supports foreign keys and check
constraints.  Maybe that's what was convenient to implement, but it's
not a principled solution to the general issue that constraints can be
involved in dependencies.


I agree with these concerns, as well as those raised by Tom Lane and
Fabien COELHO, and I think they indicate that we shouldn't accept this
patch.  So I'm marking this as Rejected.


I see a use case for disabling FKs and CHECKS but not PKs or UNIQUE 
constraints: FKs and CHECKS don't depend on additional state information 
(namely an index), so it's easy to just disable them temporarily and then 
re-enable them. The same isn't true about a PK or UNIQUE constraint.

Of course we could decide to do something more complex to handle disabling PK/UNIQUE... 
though at that point it'd be better to just allow temporarily disabling any index. But I 
think there's an argument to be made for that being beyond the scope of disabling 
"simple" constraints... it's a pretty high bar to set that we won't accept a 
patch that disables simple constraints but not those involving indexes.
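
(FWIW, the workaround people use today for the FK case looks roughly
like this -- a sketch only, and it assumes a superuser, since the FK is
enforced by internal triggers:)

    ALTER TABLE child DISABLE TRIGGER ALL;
    -- ... bulk load into child ...
    ALTER TABLE child ENABLE TRIGGER ALL;
    -- Unlike a real DISABLE/ENABLE CONSTRAINT would, nothing re-validates
    -- the rows loaded in between, so violating rows can silently remain.
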
--
Jim C. Nasby, Data Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2013-10-10 Thread Bruce Momjian
On Thu, Oct 10, 2013 at 03:40:17PM -0700, Josh Berkus wrote:
> 
> >> I don't follow that.  Why would using a connection pooler change the
> >> multiples of work_mem that a connection would use?
> > 
> > I assume that a connection pooler would keep processes running longer,
> > so even if they were not all using work_mem, they would have that memory
> > mapped into the process, and perhaps swapped out.
> 
> Yes, and then this is when it *really* matters what OS you're running,
> and what release.  FreeBSD and Solaris++ don't overallocate RAM, so
> those long-running connections pin a lot of RAM eventually.  And for
> Linux, it's a question of how aggressive the OOM killer is, which kinda
> depends on distro/version/sysadmin settings.
> 
> When I configure pgbouncer for Illumos users, I specifically have it
> rotate out old connections once an hour for this reason.

Just as a point of education, this is a good example of why you want to
allocate swap even if you expect your workload to fit in memory.
Pushing unused memory to swap is a good use of swap.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2013-10-10 Thread Bruce Momjian
On Thu, Oct 10, 2013 at 02:44:12PM -0400, Peter Eisentraut wrote:
> On 10/10/13 11:31 AM, Bruce Momjian wrote:
> > Let me walk through the idea of adding an available_mem setting, that
> > Josh suggested, and which I think addresses Robert's concern about
> > larger shared_buffers and Windows servers.
> 
> I think this is a promising idea.  available_mem could even be set
> automatically by packages.  And power users could just set available_mem
> = -1 to turn off all the magic.

Yes, I was thinking about that.  Imagine we have an initdb parameter
for available memory --- packagers could do something like:

initdb -M $(awk '{print $2 * 1024; exit}' /proc/meminfo)

to pass in the available memory of the server, or to use 90% of RAM,
use:

initdb -M $(awk '{printf "%.0f\n", $2 * 1024 * 0.9; exit}' /proc/meminfo)

This allows us to externalize all the OS-specific information and allow
the packagers to supply it.  The packagers could even ask the user if
they wish to control the percentage.

FYI, I hope people are OK with me replying a lot in this thread --- I do
think this is going to take a lot of discussion, but I think the
end-result will be worth it.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2013-10-10 Thread Josh Berkus

>> I don't follow that.  Why would using a connection pooler change the
>> multiples of work_mem that a connection would use?
> 
> I assume that a connection pooler would keep processes running longer,
> so even if they were not all using work_mem, they would have that memory
> mapped into the process, and perhaps swapped out.

Yes, and then this is when it *really* matters what OS you're running,
and what release.  FreeBSD and Solaris++ don't overallocate RAM, so
those long-running connections pin a lot of RAM eventually.  And for
Linux, it's a question of how aggressive the OOM killer is, which kinda
depends on distro/version/sysadmin settings.

When I configure pgbouncer for Illumos users, I specifically have it
rotate out old connections once an hour for this reason.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2013-10-10 Thread Bruce Momjian
On Thu, Oct 10, 2013 at 11:18:28AM -0700, Josh Berkus wrote:
> Bruce,
> 
> >> That's way low, and frankly it's not worth bothering with this if all
> >> we're going to get is an incremental increase.  In that case, let's just
> >> set the default to 4MB like Robert suggested.
> > 
> > Uh, well, 100 backends at 6MB gives us 600MB, and if each backend uses
> > 3x work_mem, that gives us 1.8GB for total work_mem.  This was based on
> > Andrew's concerns about possible over-commit of work_mem.  I can of
> > course adjust that.
> 
> That's worst-case-scenario planning -- the 3X work-mem per backend was:
> a) Solaris and
> b) data warehousing
> 
> In a normal OLTP application each backend averages something like 0.25 *
> work_mem, since many queries use no work_mem at all.
> 
> It also doesn't address my point that, if we are worst-case-scenario
> default-setting, we're going to end up with defaults which aren't
> materially different from the current defaults.  In which case, why even
> bother with this whole exercise?

OK, here is an updated patch that is less conservative.  FYI, this
thread has gone on for 80 messages, and I assume it will take many more
until we are done:

test=> SHOW shared_buffers;
 shared_buffers
----------------
 128MB
(1 row)

test=> SHOW work_mem;
 work_mem
----------
 2621kB
(1 row)

test=> SHOW maintenance_work_mem;
 maintenance_work_mem
----------------------
 10922kB
(1 row)


---------------------------------------------------------------

test=> SHOW shared_buffers;
 shared_buffers
----------------
 2GB
(1 row)

test=> SHOW work_mem;
 work_mem
----------
 41943kB
(1 row)

test=> SHOW maintenance_work_mem;
 maintenance_work_mem
----------------------
 174762kB
(1 row)


---------------------------------------------------------------

test=> SHOW shared_buffers;
 shared_buffers
----------------
 8GB
(1 row)

test=> SHOW work_mem;
 work_mem
----------
 167772kB
(1 row)

test=> SHOW maintenance_work_mem;
 maintenance_work_mem
----------------------
 699050kB
(1 row)
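
Those numbers follow directly from the proposed formulas; a quick sanity
check (assuming the defaults max_connections = 100 and
autovacuum_max_workers = 3):

    -- work_mem = 2 * shared_buffers / max_connections
    -- maintenance_work_mem = shared_buffers / 4 / autovacuum_max_workers
    -- e.g. for shared_buffers = 2GB (2097152kB):
    SELECT 2 * 2097152 / 100 AS work_mem_kb,             -- 41943
           2097152 / 4 / 3   AS maintenance_work_mem_kb; -- 174762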

Patch attached.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
new file mode 100644
index e8e8e6f..2f00c74
*** a/doc/src/sgml/config.sgml
--- b/doc/src/sgml/config.sgml
*** include 'filename'
*** 1121,1127 
 
  Specifies the amount of memory to be used by internal sort operations
  and hash tables before writing to temporary disk files. The value
! defaults to one megabyte (1MB).
  Note that for a complex query, several sort or hash operations might be
  running in parallel; each operation will be allowed to use as much memory
  as this value specifies before it starts to write data into temporary
--- 1121,1128 
 
  Specifies the amount of memory to be used by internal sort operations
  and hash tables before writing to temporary disk files. The value
! defaults to 2 * shared_buffers /
! max_connections.
  Note that for a complex query, several sort or hash operations might be
  running in parallel; each operation will be allowed to use as much memory
  as this value specifies before it starts to write data into temporary
*** include 'filename'
*** 1147,1153 
  Specifies the maximum amount of memory to be used by maintenance
  operations, such as VACUUM, CREATE
  INDEX, and ALTER TABLE ADD FOREIGN KEY.  It defaults
! to 16 megabytes (16MB).  Since only one of these
  operations can be executed at a time by a database session, and
  an installation normally doesn't have many of them running
  concurrently, it's safe to set this value significantly larger
--- 1148,1155 
  Specifies the maximum amount of memory to be used by maintenance
  operations, such as VACUUM, CREATE
  INDEX, and ALTER TABLE ADD FOREIGN KEY.  It defaults
! to shared_buffers / 4 /
! autovacuum_max_workers.  Since only one of these
  operations can be executed at a time by a database session, and
  an installation normally doesn't have many of them running
  concurrently, it's safe to set this value significantly larger
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
new file mode 100644
index 33efb3c..68af1dc
*** a/sr

Re: [HACKERS] [PoC] pgstattuple2: block sampling to reduce physical read

2013-10-10 Thread Mark Kirkwood
On 11/10/13 11:09, Mark Kirkwood wrote:
> On 16/09/13 16:20, Satoshi Nagayasu wrote:
>> (2013/09/15 11:07), Peter Eisentraut wrote:
>>> On Sat, 2013-09-14 at 16:18 +0900, Satoshi Nagayasu wrote:
 I'm looking forward to seeing more feedback on this approach,
 in terms of design and performance improvement.
 So, I have submitted this for the next CF.
>>> Your patch fails to build:
>>>
>>> pgstattuple.c: In function ‘pgstat_heap_sample’:
>>> pgstattuple.c:737:13: error: ‘SnapshotNow’ undeclared (first use in
>>> this function)
>>> pgstattuple.c:737:13: note: each undeclared identifier is reported
>>> only once for each function it appears in
>> Thanks for checking. Fixed to eliminate SnapshotNow.
>>
> This seems like a cool idea! I took a quick look, and initially
> replicated the sort of improvement you saw:
>
>
> bench=# explain analyze select * from pgstattuple('pgbench_accounts');
>                                 QUERY PLAN
> ------------------------------------------------------------------------
>  Function Scan on pgstattuple  (cost=0.00..0.01 rows=1 width=72) (actual
>  time=786.368..786.369 rows=1 loops=1)
>  Total runtime: 786.384 ms
> (2 rows)
>
> bench=# explain analyze select * from pgstattuple2('pgbench_accounts');
> NOTICE: pgstattuple2: SE tuple_count 0.00, tuple_len 0.00,
> dead_tuple_count 0.00, dead_tuple_len 0.00, free_space 0.00
>                                 QUERY PLAN
> ------------------------------------------------------------------------
>  Function Scan on pgstattuple2  (cost=0.00..0.01 rows=1 width=72) (actual
>  time=12.004..12.005 rows=1 loops=1)
>  Total runtime: 12.019 ms
> (2 rows)
>
>
>
> I wondered what sort of difference eliminating caching would make:
>
> $ sudo sysctl -w vm.drop_caches=3
>
> Repeating the above queries:
>
>
> bench=# explain analyze select * from pgstattuple('pgbench_accounts');
>                                 QUERY PLAN
> ------------------------------------------------------------------------
>  Function Scan on pgstattuple  (cost=0.00..0.01 rows=1 width=72) (actual
>  time=9503.774..9503.776 rows=1 loops=1)
>  Total runtime: 9504.523 ms
> (2 rows)
>
> bench=# explain analyze select * from pgstattuple2('pgbench_accounts');
> NOTICE: pgstattuple2: SE tuple_count 0.00, tuple_len 0.00,
> dead_tuple_count 0.00, dead_tuple_len 0.00, free_space 0.00
>                                 QUERY PLAN
> ------------------------------------------------------------------------
>  Function Scan on pgstattuple2  (cost=0.00..0.01 rows=1 width=72) (actual
>  time=12330.630..12330.631 rows=1 loops=1)
>  Total runtime: 12331.353 ms
> (2 rows)
>
>
> So the sampling code seems *slower* when the cache is completely cold -
> is that expected? (I have not looked at how the code works yet - I'll
> dive in later if I get a chance)!
>

Quietly replying to myself - looking at the code the sampler does 3000
random page reads... I guess this is slower than 163935 (number of pages
in pgbench_accounts) sequential page reads thanks to os readahead on my
type of disk (WD Velociraptor). Tweaking the number of random reads (i.e.
the sample size) down helps - but obviously that can impact estimation
accuracy.
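
Back-of-envelope, those timings line up (rough numbers only, assuming
~4 ms per random 8kB read and ~135 MB/s sequential throughput for this
class of disk):

    SELECT 3000 * 0.004 AS sampled_random_reads_sec,                  -- ~12 s
           163935 * 8192 / (135 * 1024.0 * 1024) AS full_seqscan_sec; -- ~9.5 s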

Thinking about this a bit more, I guess the elapsed runtime is not the
*only* thing to consider - the sampling code will cause way less
disruption to the os page cache (3000 pages vs possibly lots more than
3000 for reading an entire relation).

Thoughts?

Cheers

Mark



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2013-10-10 Thread Bruce Momjian
On Thu, Oct 10, 2013 at 11:14:27AM -0700, Jeff Janes wrote:
> > The assumption that each connection won't use lots of work_mem is also
> > false, I think, especially in these days of connection poolers.
>
> I don't follow that.  Why would using a connection pooler change the multiples
> of work_mem that a connection would use?

I assume that a connection pooler would keep processes running longer,
so even if they were not all using work_mem, they would have that memory
mapped into the process, and perhaps swapped out.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2013-10-10 Thread Josh Berkus

> More generally, Josh has made repeated comments that various proposed
> value/formulas for work_mem are too low, but obviously the people who
> suggested them didn't think so.  So I'm a bit concerned that we don't
> all agree on what the end goal of this activity looks like.

The counter-proposal to "auto-tuning" is just to raise the default for
work_mem to 4MB or 8MB.  Given that Bruce's current formula sets it at
6MB for a server with 8GB RAM, I don't really see the benefit of going
to a whole lot of code and formulas in order to end up at a figure only
incrementally different from a new static default.

The core issue here is that there aren't good "generic" values for these
settings for all users -- that's why we have the settings in the first
place.   Following a formula isn't going to change that.

If we're serious about autotuning, then we should look at:

a) admission control for non-shared resources (e.g. work_mem)

b) auto-feedback tuning loops (ala Heikki's checkpoint_segments and the
bgwriter).

We could certainly create an auto-feedback tuning loop for work_mem.
Just watch the database, record the amount of data spilled to disk for
work (pg_stat_sorts), and record the total RAM pinned by backends.
*Then* apply a formula and maybe bump up work_mem a bit depending on
what comes out of it.  And keep monitoring and keep readjusting.
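
(A crude version of that monitoring exists already, FWIW -- there's no
pg_stat_sorts yet, but temp file traffic per database is a reasonable
proxy:)

    SELECT datname, temp_files, pg_size_pretty(temp_bytes) AS spilled
      FROM pg_stat_database
     ORDER BY temp_bytes DESC;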

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ECPG FETCH readahead

2013-10-10 Thread Alvaro Herrera
Boszormenyi Zoltan wrote:
> On 2013-09-10 03:04, Peter Eisentraut wrote:
> >You need to update the dblink regression tests.
> 
> Done.

Dude, this is a humongous patch.  I was shocked by it initially, but on
further reading, I observed that it's huge mostly because it also makes
mechanical changes to test output.  I think it'd be better to split
the part that's responsible for the changed lines in test output
mentioning "ecpg_process_output".  That should be a reasonably small
patch which changes ecpg_execute slightly and adds the new function, is
followed by the enormous resulting mechanical changes in test output.
It should be possible to commit that relatively quickly.  Then there's
the rest of the patch, which adds a huge pile of new code.

I think there are some very minor changes to backend code as well --
would it make sense to post that as a separate piece?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PoC] pgstattuple2: block sampling to reduce physical read

2013-10-10 Thread Mark Kirkwood
On 16/09/13 16:20, Satoshi Nagayasu wrote:
> (2013/09/15 11:07), Peter Eisentraut wrote:
>> On Sat, 2013-09-14 at 16:18 +0900, Satoshi Nagayasu wrote:
>>> I'm looking forward to seeing more feedback on this approach,
>>> in terms of design and performance improvement.
>>> So, I have submitted this for the next CF.
>>
>> Your patch fails to build:
>>
>> pgstattuple.c: In function ‘pgstat_heap_sample’:
>> pgstattuple.c:737:13: error: ‘SnapshotNow’ undeclared (first use in
>> this function)
>> pgstattuple.c:737:13: note: each undeclared identifier is reported
>> only once for each function it appears in
>
> Thanks for checking. Fixed to eliminate SnapshotNow.
>

This seems like a cool idea! I took a quick look, and initially
replicated the sort of improvement you saw:


bench=# explain analyze select * from pgstattuple('pgbench_accounts');
                                QUERY PLAN
------------------------------------------------------------------------
 Function Scan on pgstattuple  (cost=0.00..0.01 rows=1 width=72) (actual
 time=786.368..786.369 rows=1 loops=1)
 Total runtime: 786.384 ms
(2 rows)

bench=# explain analyze select * from pgstattuple2('pgbench_accounts');
NOTICE: pgstattuple2: SE tuple_count 0.00, tuple_len 0.00,
dead_tuple_count 0.00, dead_tuple_len 0.00, free_space 0.00
                                QUERY PLAN
------------------------------------------------------------------------
 Function Scan on pgstattuple2  (cost=0.00..0.01 rows=1 width=72) (actual
 time=12.004..12.005 rows=1 loops=1)
 Total runtime: 12.019 ms
(2 rows)



I wondered what sort of difference eliminating caching would make:

$ sudo sysctl -w vm.drop_caches=3

Repeating the above queries:


bench=# explain analyze select * from pgstattuple('pgbench_accounts');
                                QUERY PLAN
------------------------------------------------------------------------
 Function Scan on pgstattuple  (cost=0.00..0.01 rows=1 width=72) (actual
 time=9503.774..9503.776 rows=1 loops=1)
 Total runtime: 9504.523 ms
(2 rows)

bench=# explain analyze select * from pgstattuple2('pgbench_accounts');
NOTICE: pgstattuple2: SE tuple_count 0.00, tuple_len 0.00,
dead_tuple_count 0.00, dead_tuple_len 0.00, free_space 0.00
                                QUERY PLAN
------------------------------------------------------------------------
 Function Scan on pgstattuple2  (cost=0.00..0.01 rows=1 width=72) (actual
 time=12330.630..12330.631 rows=1 loops=1)
 Total runtime: 12331.353 ms
(2 rows)


So the sampling code seems *slower* when the cache is completely cold -
is that expected? (I have not looked at how the code works yet - I'll
dive in later if I get a chance)!

Regards

Mark



Re: [HACKERS] Re: Request for Patch Feedback: Lag & Lead Window Functions Can Ignore Nulls

2013-10-10 Thread Alvaro Herrera
We have this block:

+   {
+   /*
+* This is the window we want - but we have to tweak the
+* definition slightly (e.g. to support the IGNORE NULLS frame
+* option) as we're not using the default (i.e. parent) frame
+* options.
+*
+* We'll create a 'child' (using refname to inherit everything
+* from the parent) that just overrides the frame options
+* (assuming it doesn't already exist):
+*/
+   WindowDef  *clone = makeNode(WindowDef);

... then it goes to populate the clone.  When this is done, we use the
clone to walk the list of existing WindowDefs, and if there's a match we
free this one and use that one.  Wouldn't it be better to walk the
existing array first looking for a match, and only create a clone if
none is found?  This would avoid the memory leak problems; I originally
pointed out that you're leaking the bits created by this makeNode() call
above, but now that I look at it again, I think you're also leaking the
bits created by the two copyObject() calls to create the clone.  It
appears to me that it's simpler to not allocate any memory in the first
place, unless necessary.


Also, in parsenodes.h, you had the [MANDATORY] and such tags.  Three
things about that: 1) it looks a lot uglier than the original, so how
about the modified version below?  and 2) what does "MANDATORY value of
NULL" mean?  Maybe you mean "MANDATORY value or NULL" instead? 3)
Exactly what case does the "in this case" phrase refer to?  I think the
comment should be more explicit.  Also, I think this should be its own
paragraph instead of being mixed with the "For entries in a" paragraph.

/*
 * WindowDef - raw representation of WINDOW and OVER clauses
 *
 * For entries in a WINDOW list, "name" is the window name being defined.
 * For OVER clauses, we use "name" for the "OVER window" syntax, or "refname"
 * for the "OVER (window)" syntax, which is subtly different --- the latter
 * implies overriding the window frame clause.
 * 
 * In this case, the per-field indicators determine what the semantics
 * are:
 *  [V]irtual
 *  If NULL, then the parent's (refname) value is used.
 *  [M]andatory
 *  Never inherited from the parent, so must be specified; may be NULL.
 *  [S]uper
 *  Always inherited from parent, any local version ignored.
 */
typedef struct WindowDef
{
NodeTag type;
char   *name;   /* [M] window's own name */
char   *refname;/* [M] referenced window name, if any */
List   *partitionClause;/* [V] PARTITION BY expression list */
List   *orderClause;/* [M] ORDER BY (list of SortBy) */
int frameOptions;   /* [M] frame_clause options, see below */
Node   *startOffset;/* [M] expression for starting bound, if any */
Node   *endOffset;  /* [M] expression for ending bound, if any */
int location;   /* parse location, or -1 if none/unknown */
} WindowDef;

In gram.y there are some spurious whitespaces at end-of-line.  You
should be able to see them with git diff --check.  (I don't think we
support running pgindent on .y files, which would have otherwise cleaned
this up.)

A style issue.  You have this:

+   /*
+* We can process a constant offset much more efficiently; initially
+* we'll scan through the first  non-null rows, and store that
+* index. On subsequent rows we'll decide whether to push that index
+* forwards to the next non-null value, or just return it again.
+*/
+   leadlag_const_context *context = WinGetPartitionLocalMemory(
+   winobj,
+ sizeof(leadlag_const_context));
+   int count_forward = 0;

I think it'd be better to put the declarations above the comment, and
assignment to "context" below the comment.  This way, the indentation of
the assignment is not so odd.  So it'd look like

+   leadlag_const_context *context;
+   int count_forward = 0;
+
+   /*
+* We can process a constant offset much more efficiently; initially
+* we'll scan through the first  non-null rows, and store that
+* index. On subsequent rows we'll decide whether to push that index
+* forwards to the next non-null value, or just return it again.
+*/
+   context = WinGetPartitionLocalMemory(winobj,
+sizeof(leadlag_const_context));


And a final style comment.  You have a block like this:

if (ignore_nulls && !const_offset)
{
long block;
}
else if (ignore_nulls /* && const_offset */)
{
another long block;
}
else
{
more stuff;
}

I t

Re: [HACKERS] dynamic shared memory: wherein I am punished for good intentions

2013-10-10 Thread Josh Berkus
Robert,

>> Doesn't #2 negate all advantages of this effort?  Bringing sysv
>> management back on the table seems like a giant step backwards -- or
>> am I missing something?
> 
> Not unless there's no difference between "the default" and "the only option".

Well, per our earlier discussion about "the first 15 minutes", there
actually isn't a difference.

We got rid of SHMMAX for the majority of our users for 9.3.  We should
NOT revert that just so we can support older platforms -- especially
since, if anything, Kernel.org is going to cripple SysV support even
further in the future.  The platforms where it does work represent the
vast majority of our users, and are only on the increase.

I can only see two reasonable alternatives:

This one:
> (3) Add a new setting that auto-probes for a type of shared memory
> that works.  Try POSIX first, then System V.  Maybe even fall back to
> mmap'd files if neither of those works.

Or:

(5) Default to POSIX, and allow for SysV as a compile-time option for
platforms with poor POSIX memory support.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2013-10-10 Thread Daniel Farina
On Thu, Oct 10, 2013 at 1:30 PM, Alvaro Herrera
 wrote:
>> > Just noticed that you changed the timer to struct Instrumentation.  Not
>> > really sure about that change.  Since you seem to be using only the
>> > start time and counter, wouldn't it be better to store only those?
>> > Particularly unsure about passing INSTRUMENT_ALL to InstrAlloc().
>>
>> Yeah, I was unsure about that too.
>>
>> The motivation was that I need one more piece of information in
>> pgss_store (the absolute start time).  I was going to widen the
>> argument list, but it was looking pretty long, so instead I was
>> thinking it'd be more concise to push the entire, typically extant
>> Instrumentation struct pointer down.
>
> Would it work to define your own struct to pass around?

Absolutely, I was just hoping to spare the code another abstraction if
another was a precise superset.

Looks like that isn't going to happen, though, so a pgss-oriented
struct is likely what it will have to be.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] dynamic shared memory: wherein I am punished for good intentions

2013-10-10 Thread Robert Haas
On Thu, Oct 10, 2013 at 4:00 PM, Merlin Moncure  wrote:
>> (2) Default to using System V shared memory.  If people want POSIX
>> shared memory, let them change the default.
>
> Doesn't #2 negate all advantages of this effort?  Bringing sysv
> management back on the table seems like a giant step backwards -- or
> am I missing something?

Not unless there's no difference between "the default" and "the only option".

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Pattern matching operators a index

2013-10-10 Thread soroosh sardari
Hi

I'm developing a new character string type, like varchar. I wrote
operators for btree and so forth.
I wonder how pattern matching operators can use a btree index, because a
btree operator class only knows about the <, <=, =, >=, and > operators, and
operators for pattern matching, such as LIKE, are not known to the btree
access method.

Now my question is:
Does Postgres use a btree index for pattern matching queries on varchar or
other character string types?

If so, how do I implement that for my new type?
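
(For the built-in types I can see it happening -- e.g., with a
hypothetical table t, an index built with the text_pattern_ops operator
class lets LIKE use btree:)

    CREATE TABLE t (s text);
    CREATE INDEX t_s_pattern_idx ON t (s text_pattern_ops);
    -- the left-anchored pattern becomes an index range scan,
    -- roughly: s >= 'abc' AND s < 'abd'
    EXPLAIN SELECT * FROM t WHERE s LIKE 'abc%';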

Regards,
Soroosh


Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2013-10-10 Thread pilum . 70

The V7 patch applied cleanly and I found no issues in my first tests.

The change from a session_start column to a function seems very reasonable to me.

Concerning usability, I would like to suggest a minor change
that massively increases the usefulness of the patch for beginners,
who often use this view as a first approach to optimizing index structure.


The history of this tool contains a first version without normalization.
That wasn't useful enough except for prepared queries.
The current version has normalized queries, so calls get
summarized to give a glimpse of bad queries.
But the drawback of this approach is that it's impossible to use
EXPLAIN ANALYZE without further substitutions.

The identification patch provides the possibility of summarizing calls
by query_id, so the normalized query string itself no longer
needs to be exposed in the view for everyone.


I suggest adding a parameter to recover the possibility of
displaying real queries. The following very minor change
(based on V7) exposes the first real query that received a given
query_id if normalization of the exposed string is deactivated
(the current behaviour remains the default).
This new option is not always the best starting point for discovering
index shortfalls, but it is a huge gain for beginners because it serves
the needs in more than 90% of the normal use cases.

What do you think?

Arne

Date:   Mon Oct 7 17:54:08 2013 +

Switch to disable normalized query strings

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index e50dfba..6cc9244 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -234,7 +234,7 @@ static int	pgss_max;	/* max # statements to track */
 static int	pgss_track;	/* tracking level */
 static bool	pgss_track_utility;	/* whether to track utility commands */
 static bool	pgss_save;	/* whether to save stats across shutdown */
-
+static bool	pgss_normalize;	/* whether to normalize the query representation shown in the view (otherwise show the first query executed with this query_id) */
 
 #define pgss_enabled() \
 	(pgss_track == PGSS_TRACK_ALL || \
@@ -356,6 +356,17 @@ _PG_init(void)
 							 NULL,
 							 NULL);
 
+	DefineCustomBoolVariable("pg_stat_statements.normalize",
+			 "Selects whether the view column contains the query strings in a normalized form.",
+							 NULL,
+							 &pgss_normalize,
+							 true,
+							 PGC_SUSET,
+							 0,
+							 NULL,
+							 NULL,
+							 NULL);
+
 	EmitWarningsOnPlaceholders("pg_stat_statements");
 
 	/*
@@ -1084,9 +1095,9 @@ pgss_store(const char *query, uint32 queryId,
 
 	query_len = strlen(query);
 
-	if (jstate)
+	if (jstate && pgss_normalize)
 	{
-		/* Normalize the string if enabled */
+		/* Normalize the string if jstate is not NULL and normalized query strings are enabled */
 		norm_query = generate_normalized_query(jstate, query,
 											   &query_len,
 											   key.encoding);
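
(Usage would then look like this -- with the new GUC from the patch
above, set as superuser since it is PGC_SUSET:)

    SET pg_stat_statements.normalize = off;
    -- new entries now keep the first literal query text seen for each
    -- query_id, ready to paste into EXPLAIN ANALYZE:
    SELECT query, calls, total_time
      FROM pg_stat_statements ORDER BY total_time DESC LIMIT 5;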






--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] dynamic shared memory: wherein I am punished for good intentions

2013-10-10 Thread David Fetter
On Thu, Oct 10, 2013 at 12:13:20PM -0400, Robert Haas wrote:
> Since, as has been previously discussed in this forum on multiple
> occasions [citation needed], the default System V shared memory limits
> are absurdly low on many systems, the dynamic shared memory patch
> defaults to POSIX shared memory, which has often been touted as a
> superior alternative [citation needed].  Unfortunately, the buildfarm
> isn't entirely happy with this decision.  On buildfarm member anole
> (HP-UX B.11.31), allocation of dynamic shared memory fails with a
> "Permission denied" error, and on smew (Debian GNU/Linux 6.0), it
> fails with "Function not implemented", which according to a forum
> post[1] I found probably indicates that /dev/shm doesn't mount a tmpfs
> on that box.
> 
> What shall we do about this?  I see a few options.
> 
> (1) Define the issue as "not our problem".  IOW, as of now, if you
> want to use PostgreSQL, you've got to either make POSIX shared memory
> work on your machine, or change the GUC that selects the type of
> dynamic shared memory used.

+1 for this.

Cheers,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] dynamic shared memory: wherein I am punished for good intentions

2013-10-10 Thread Andrew Dunstan


On 10/10/2013 02:45 PM, Robert Haas wrote:

On Thu, Oct 10, 2013 at 2:36 PM, Peter Geoghegan  wrote:

On Thu, Oct 10, 2013 at 9:13 AM, Robert Haas  wrote:

(2) Default to using System V shared memory.  If people want POSIX
shared memory, let them change the default.
After some consideration, I think my vote is for option #2.

Wouldn't that become the call of packagers?

Packagers can certainly override whatever we do, but we still need to
make the buildfarm green again.





I really dislike throwing things over the wall to packagers like this, 
anyway. Quite apart from anything else, not everyone uses pre-built 
packages, and we should make things as easy as possible for those who 
don't, too.



cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2013-10-10 Thread Alvaro Herrera
Daniel Farina wrote:

> Given that, perhaps a way to fix this is something like this patch-fragment:
> 
> """
>  {
>   PGSS_TUP_V1_0 = 1,
>   PGSS_TUP_V1_1,
> - PGSS_TUP_LATEST
> + PGSS_TUP_V1_2
>  } pgssTupVersion;
> 
> +#define PGSS_TUP_LATEST PGSS_TUP_V1_2
> """

This sounds good.  I have seen other places that have the LATEST
definition as part of the enum, assigning the previous value to it.  I'm
not really sure which of these is harder to miss when updating the code.
I'm happy with either.

Make sure to use the PGSS_TUP_V1_2 in the two places mentioned, in any
case.

> > Just noticed that you changed the timer to struct Instrumentation.  Not
> > really sure about that change.  Since you seem to be using only the
> > start time and counter, wouldn't it be better to store only those?
> > Particularly unsure about passing INSTRUMENT_ALL to InstrAlloc().
> 
> Yeah, I was unsure about that too.
> 
> The motivation was that I need one more piece of information in
> pgss_store (the absolute start time).  I was going to widen the
> argument list, but it was looking pretty long, so instead I was
> thinking it'd be more concise to push the entire, typically extant
> Instrumentation struct pointer down.

Would it work to define your own struct to pass around?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2013-10-10 Thread Daniel Farina
On Thu, Oct 10, 2013 at 10:49 AM, Alvaro Herrera
 wrote:
> Daniel Farina wrote:
>> On Thu, Oct 10, 2013 at 7:40 AM, Fujii Masao  wrote:
>
>> > In my test, I found that pg_stat_statements.total_time always indicates a 
>> > zero.
>> > I guess that the patch might handle pg_stat_statements.total_time wrongly.
>> >
>> > +values[i++] = DatumGetTimestamp(
>> > +instr_get_timestamptz(pgss->session_start));
>> > +values[i++] = DatumGetTimestamp(
>> > +instr_get_timestamptz(entry->introduced));
>> >
>> > These should be executed only when detected_version >= PGSS_TUP_LATEST?
>>
>> Yes. Oversight.
>
> Hmm, shouldn't this be conditional on a new PGSS_TUP_V1_2?  I mean, if
> later somebody patches pgss to have a PGSS_TUP_V2 or V1_3 and that
> becomes latest, somebody running the current definition with the updated
> .so will no longer get these values.  Or is the intention that
> PGSS_TUP_LATEST will never be updated again, and future versions will
> get PGSS_TUP_REALLY_LATEST and PGSS_TUP_REALLY_LATEST_HONEST and so on?
> I mean, surely we can always come up with new symbols to use, but it
> seems hard to follow.
>
> There's one other use of PGSS_TUP_LATEST here which I think should also
> be changed to >= SOME_SPECIFIC_VERSION:
>
> +   if (detected_version >= PGSS_TUP_LATEST)
> +   {
> +   uint64 qid = pgss->private_stat_session_key;
> +
> +   qid ^= (uint64) entry->query_id;
> +   qid ^= ((uint64) entry->query_id) << 32;
> +
> +   values[i++] = Int64GetDatumFast(qid);
> +   }

I made some confusing mistakes here in using the newer tuple
versioning.  Let me try to explain what the motivation was:

I was adding new fields to pg_stat_statements and was afraid that it'd
be hard to get a very clear view that all the set fields are in
alignment and there were no accidental overruns, with the problem
getting more convoluted as more versions are added.

The idea of PGSS_TUP_LATEST is useful to catch common programmer error
by testing some invariants, and it'd be nice not to have to thrash the
invariant checking code every release, which would probably defeat the
point of such "oops" prevention code.  But, the fact that I went on to
rampantly do questionable things with PGSS_TUP_LATEST is a bad sign.

By example, here are the two uses that have served me very well:

/* Perform version detection */
if (tupdesc->natts == PG_STAT_STATEMENTS_COLS_V1_0)
detected_version = PGSS_TUP_V1_0;
else if (tupdesc->natts == PG_STAT_STATEMENTS_COLS_V1_1)
detected_version = PGSS_TUP_V1_1;
else if (tupdesc->natts == PG_STAT_STATEMENTS_COLS)
detected_version = PGSS_TUP_LATEST;
else
{
/*
 * Couldn't identify the tuple format.  Raise error.
 *
 * This is an exceptional case that may only happen in bizarre
 * situations, since it is thought that every released version
 * of pg_stat_statements has a matching schema.
 */
ereport(ERROR,
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
 errmsg("pg_stat_statements schema is not supported "
"by its installed binary")));
}

And

#ifdef USE_ASSERT_CHECKING
/* Check that every column appears to be filled */
switch (detected_version)
{
case PGSS_TUP_V1_0:
Assert(i == PG_STAT_STATEMENTS_COLS_V1_0);
break;
case PGSS_TUP_V1_1:
Assert(i == PG_STAT_STATEMENTS_COLS_V1_1);
break;
case PGSS_TUP_LATEST:
Assert(i == PG_STAT_STATEMENTS_COLS);
break;
default:
Assert(false);
}
#endif

Given that, perhaps a way to fix this is something like this patch-fragment:

"""
 {
  PGSS_TUP_V1_0 = 1,
  PGSS_TUP_V1_1,
- PGSS_TUP_LATEST
+ PGSS_TUP_V1_2
 } pgssTupVersion;

+#define PGSS_TUP_LATEST PGSS_TUP_V1_2
"""

This way when a programmer is making new tuple versions, they are much
more likely to see the immediate need to teach those two sites about
the new tuple size.  But, the fact that one does not get the
invariants updated in a completely compulsory way may push the value
of this checking under water.

I'd be sad to see it go, it has saved me a lot of effort when
returning to the code after a while.

What do you think?

> The instr_time thingy being used for these times maps to
> QueryPerformanceCounter() on Windows, and I'm not sure how useful this
> is for long-term time tracking; see
> http://stackoverflow.com/questions/5296990/queryperformancecounter-and-overflows#5297163
> for instance.  I think it'd be better to use TimestampTz and
> GetCurrentTimestamp() for this.

Ah. I was going to do that, but thought it'd be nice to merely push
down the already-extant Instr struct in most cases, as to get the
'start' t

Re: [HACKERS] dynamic shared memory: wherein I am punished for good intentions

2013-10-10 Thread Merlin Moncure
On Thu, Oct 10, 2013 at 11:13 AM, Robert Haas  wrote:
> Since, as has been previously discussed in this forum on multiple
> occasions [citation needed], the default System V shared memory limits
> are absurdly low on many systems, the dynamic shared memory patch
> defaults to POSIX shared memory, which has often been touted as a
> superior alternative [citation needed].  Unfortunately, the buildfarm
> isn't entirely happy with this decision.  On buildfarm member anole
> (HP-UX B.11.31), allocation of dynamic shared memory fails with a
> "Permission denied" error, and on smew (Debian GNU/Linux 6.0), it
> fails with "Function not implemented", which according to a forum
> post[1] I found probably indicates that /dev/shm doesn't mount a tmpfs
> on that box.
>
> What shall we do about this?  I see a few options.
>
> (1) Define the issue as "not our problem".  IOW, as of now, if you
> want to use PostgreSQL, you've got to either make POSIX shared memory
> work on your machine, or change the GUC that selects the type of
> dynamic shared memory used.
>
> (2) Default to using System V shared memory.  If people want POSIX
> shared memory, let them change the default.

Doesn't #2 negate all advantages of this effort?  Bringing sysv
management back on the table seems like a giant step backwards -- or
am I missing something?

http://www.postgresql.org/docs/9.3/interactive/kernel-resources.html#SYSVIPC

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2013-10-10 Thread Robert Haas
On Thu, Oct 10, 2013 at 3:41 PM, Christopher Browne  wrote:
> On Thu, Oct 10, 2013 at 12:28 PM, Bruce Momjian  wrote:
>> How do we handle the Python dependency, or is this all to be done in
>> some other language?  I certainly am not ready to take on that job.
>
> I should think it possible to reimplement it in C.  It was considerably
> useful to start by implementing in Python, as that evades various sorts
> of efforts needed in C (e.g. - memory allocation, picking a hash table
> implementation), and allows someone to hack on it without needing to
> run through a recompile every time something is touched.

Also, the last time I saw that tool, it output recommendations for
work_mem that I would never, ever recommend to anyone on a production
server - they were VERY high.

More generally, Josh has made repeated comments that various proposed
value/formulas for work_mem are too low, but obviously the people who
suggested them didn't think so.  So I'm a bit concerned that we don't
all agree on what the end goal of this activity looks like.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2013-10-10 Thread Daniel Farina
On Thu, Oct 10, 2013 at 10:12 AM, Fujii Masao  wrote:
> On Fri, Oct 11, 2013 at 12:19 AM, Daniel Farina  wrote:
>> Probably.
>>
>> The idea is that without those fields it's, to wit, impossible to
>> explain non-monotonic movement in metrics of those queries for precise
>> use in tools that insist on monotonicity of the fields, which are all
>> cumulative to date I think.
>>
>> pg_stat_statements_reset() or crashing loses the session, so one
>> expects "ncalls" may decrease.  In addition, a query that is bouncing
>> in and out of the hash table will have its statistics be lost, so its
>> statistics will also decrease.  This can cause un-resolvable artifact
>> in analysis tools.
>>
>> The two fields allow for precise understanding of when the statistics
>> for a given query_id are continuous since the last time a program
>> inspected it.
>
> Thanks for elaborating them! Since 'introduced' is reset even when
> the statistics are reset, maybe we can do without 'session_start' for
> that purpose?

There is a small loss of precision.

The original reason was so that one could know, given two samples
of pg_stat_statements, whether the query_id is going to remain stable for
a given query.

For example:

If the session changes on account of a reset, then it is known that
all query_ids one's external program is tracking are no longer going
to be updated, and the program can take account for the fact that the
same query text is going to have a new query_id.

Given the new idea of mixing in the point release number:

If the code is changed to instead mix in the full version of
Postgres, as you suggested recently, this can probably be removed.
The caveat there is that the client is going to have to do something a
bit weird like ask for the point release and perhaps even compile
options of Postgres to know when the query_id is going to have a
different value for a given query text.  But, maybe this is an
acceptable compromise to remove one field.

>>> +/*
>>> + * The role calling this function is unable to see
>>> + * sensitive aspects of this tuple.
>>> + *
>>> + * Nullify everything except the "insufficient privilege"
>>> + * message for this entry
>>> + */
>>> +memset(nulls, 1, sizeof nulls);
>>> +
>>> +nulls[i]  = 0;
>>> +values[i] = CStringGetTextDatum("");
>>>
>>> Why do we need to hide *all* the fields in pg_stat_statements, when
>>> it's accessed by improper user? This is a big change of pg_stat_statements
>>> behavior, and it might break the compatibility.
>>
>> It seems like an information leak that grows more major if query_id is
>> exposed and, at any point, one can determine the query_id for a query
>> text.
>
> So hiding only query and query_id is enough?

Yeah, I think so.  The other fields feel a bit weird to leave hanging
around as well, so I thought I'd just "fix" it in one shot, but doing
the minimum or only applying this idea to new fields is safer.  It's
shorter to hit all the fields, though, which is why I was tempted to
do that.

Perhaps not a good economy for potential subtle breaks in the next version.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2013-10-10 Thread Christopher Browne
On Thu, Oct 10, 2013 at 12:28 PM, Bruce Momjian  wrote:
> How do we handle the Python dependency, or is this all to be done in
> some other language?  I certainly am not ready to take on that job.

I should think it possible to reimplement it in C.  It was considerably
useful to start by implementing in Python, as that evades various sorts
of efforts needed in C (e.g. - memory allocation, picking a hash table
implementation), and allows someone to hack on it without needing to
run through a recompile every time something is touched.
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] dynamic shared memory: wherein I am punished for good intentions

2013-10-10 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
> On Thu, Oct 10, 2013 at 2:36 PM, Peter Geoghegan  wrote:
> > On Thu, Oct 10, 2013 at 9:13 AM, Robert Haas  wrote:
> >> (2) Default to using System V shared memory.  If people want POSIX
> >> shared memory, let them change the default.
> >
> >> After some consideration, I think my vote is for option #2.
> >
> > Wouldn't that become the call of packagers?
> 
> Packagers can certainly override whatever we do, but we still need to
> make the buildfarm green again.

While I agree that making the buildfarm green is valuable, I really
wonder about a system where /dev/shm is busted.

Personally, I like Andrew's suggestion to test and set accordingly, with
the default being POSIX if it's available and a fall-back to SysV (maybe
with a warning..).  Going back to the situation where our default set-up
limits us to the ridiculously small SysV value would *really* suck; even
if we don't have any users today, we're certainly going to have some
soon and I don't think they'll be happy with a 24MB (or whatever) limit.

Thanks,

Stephen




Re: [HACKERS] dynamic shared memory: wherein I am punished for good intentions

2013-10-10 Thread Andrew Dunstan


On 10/10/2013 02:35 PM, Robert Haas wrote:

On Thu, Oct 10, 2013 at 2:21 PM, Andrew Dunstan  wrote:

Other votes?  Other ideas?

5) test and set it in initdb.

Are you advocating for that option, or just calling out that it's
possible?  I'd say that's closely related to option #3, except at
initdb time rather than run-time - and it might be preferable to #3
for some of the same reasons discussed on the thread about tuning
work_mem, namely, that having it change from one postmaster lifetime
to the next might lead to user astonishment.




Mainly just to throw it into the mix. But like you, I think it's probably
a better option than #3, for the reason you give. It also has the
advantage of keeping any probing code out of the backend.


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2013-10-10 Thread Magnus Hagander
On Thu, Oct 10, 2013 at 8:46 PM, Robert Haas  wrote:
> On Thu, Oct 10, 2013 at 2:45 PM, Josh Berkus  wrote:
>> On 10/10/2013 11:41 AM, Robert Haas wrote:
>>> tunedb --available-memory=32GB
>>>
>>> ...and it will print out a set of proposed configuration settings.  If
>>> we want a mode that rewrites the configuration file, we could have:
>>>
>>> tunedb --available-memory=32GB --rewrite-config-file=$PATH
>>>
>>> ...but that might be overkill, at least for version 1.
>>
>> Given that we are talking currently about ALTER SYSTEM SET *and*
>> configuration directories, we should not be rewriting any existing
>> config file.  We should be adding an auto-generated one, or using ALTER
>> SYSTEM SET.
>>
>> In fact, why don't we just do this though ALTER SYSTEM SET?  add a
>> plpgsql function called pg_tune().
>
> That's another way to do it, for sure.  It does require the ability to
> log in to the database.  I imagine that could be less convenient in
> some scripting environments.

I think that would also make it much harder to automate for packagers.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [v9.4] row level security

2013-10-10 Thread Marc Munro
On Wed, 2013-09-04 at 14:35 +, Robert Haas wrote:
> 
> On Fri, Aug 30, 2013 at 3:43 PM, Tom Lane  wrote:
> > I think it's entirely sensible to question whether we should reject
> (not
> > "hold up") RLS if it has major covert-channel problems.
> 
> We've already had this argument before, about the security_barrier
[ . . . ]

Sorry for following up on this so late, I have just been trying to catch
up with the mailing lists.

I am the developer of Veil, which this thread mentioned a number of
times.  I wanted to state/confirm a number of things:

Veil is not up to date wrt Postgres versions.  I didn't release a new
version for 9.2, and when no-one complained I figured no-one other than
me was using it.  I'll happily update it if anyone wants it.

Veil makes no attempt to avoid covert channels.  It can't.

Veil is a low-level toolset designed for optimising queries about
privileges.  It allows you to build RLS with reasonable performance, but
it is not in itself a solution for RLS.

I wish the Postgres RLS project well and look forward to its release in
Postgres 9.4.  

__
Marc






Re: [HACKERS] strange behavior of pg_trgm's similarity function

2013-10-10 Thread Fujii Masao
On Thu, Oct 10, 2013 at 11:00 PM, Merlin Moncure  wrote:
> On Thu, Oct 10, 2013 at 7:12 AM, Heikki Linnakangas
>  wrote:
>> On 10.10.2013 15:03, Fujii Masao wrote:
>>>
>>> Hi,
>>>
>>> The behavior of pg_trgm's similarity function seems strange. Is this
>>> intentional?
>>>
>>> I was thinking that the following three calls of the similarity function
>>> return
>>> the same number because the second argument is just the three characters
>>> contained in the first argument in every call.
>>>
>>> =# SELECT similarity('12345', '123');
>>> =# SELECT similarity('12345', '234');
>>> =# SELECT similarity('12345', '345');
>>>
>>> But that's not true. Each returns a different number.
>>>
>>> =# SELECT similarity('12345', '123');
>>>  similarity
>>> ------------
>>>    0.428571
>>> (1 row)
>>>
>>> =# SELECT similarity('12345', '234');
>>>  similarity
>>> ------------
>>>    0.111111
>>> (1 row)
>>>
>>> =# SELECT similarity('12345', '345');
>>>  similarity
>>> ------------
>>>        0.25
>>> (1 row)
>>>
>>> This happens because, for example, similarity('12345', '123') returns
>>> the similarity number of '**12345*' and '**123*' (* means the blank
>>> character),
>>> NOT '12345' and '123'. IOW, two and one blank characters are added at
>>> the head and tail of each argument, respectively. I wonder why
>>> pg_trgm's similarity function works this way. Should we change this
>>> so that no blank characters are added to the arguments?
>>
>>
>> Well, you could also argue that "11" and "22" are quite similar,
>> even though pg_trgm's similarity will not think so. It comes down to the
>> definition of similarity, and how well that definition matches your
>> intuition.
>>
>> FWIW, it feels right to me that a match in the beginning of a word is worth
>> more than one in the middle of a string. -1 on changing that.

Okay, understood.

> I'm not so sure that the assumption that leading trigrams should
> effectively weight > 3x is a good one to build into the library.
> However, the behavior is clearly documented and can't be changed.   I
> think you'd need to improvise an alternate set of "trigram ops" if you
> wanted to rig an alternate matching behavior.

Yeah, this makes sense.
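
(For anyone following along, show_trgm() makes the padding visible and
the arithmetic easy to check -- output shown approximately:)

    =# SELECT show_trgm('123');
    -- four trigrams: {"  1"," 12",123,"23 "}
    -- so similarity('12345', '123') = shared/union = 3/7 = 0.428571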

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2013-10-10 Thread Peter Eisentraut
On 10/10/13 11:45 AM, Bruce Momjian wrote:
> I think the big win for a tool would be to query the user about how they
> are going to be using Postgres, and that can then spit out values the
> user can add to postgresql.conf, or to a config file that is included at
> the end of postgresql.conf.

I think such a tool would actually make the initial experience worse for
many people.  Quick, how are you going to use your PostgreSQL server:

- OLTP
- web
- mixed

Uh, all of the above?  This sort of thing can quickly turn the "first 15
minutes" into the first 2 hours, as users are forced to analyze the
different settings, have second thoughts, wonder about how to change
them back, etc.  The fewer decisions people have to make initially, the
better.  The initdb phase already has too many required decisions that
can cause regret later (e.g., locale, encoding, checksums).



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2013-10-10 Thread Peter Geoghegan
On Wed, Oct 9, 2013 at 10:21 PM, Magnus Hagander  wrote:
>> Well, the Postgres defaults won't really change, because the default
>> vacuum_work_mem will be -1, which will have vacuum defer to
>> maintenance_work_mem. Under this scheme, vacuum only *prefers* to get
>> bound working memory size from vacuum_work_mem. If you don't like
>> vacuum_work_mem, you can just ignore it.

> While unrelated to the main topic of this thread, I think this is very
> important as well. I often have to advise people to remember to cap
> their maintenance_work_mem because of autovacuum, and to remember to
> re-tune maintenance_work_mem when they change the number of autovacuum
> workers.

I'll code that up at some point, then.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] dynamic shared memory: wherein I am punished for good intentions

2013-10-10 Thread Robert Haas
On Thu, Oct 10, 2013 at 2:36 PM, Peter Geoghegan  wrote:
> On Thu, Oct 10, 2013 at 9:13 AM, Robert Haas  wrote:
>> (2) Default to using System V shared memory.  If people want POSIX
>> shared memory, let them change the default.
>
>> After some consideration, I think my vote is for option #2.
>
> Wouldn't that become the call of packagers?

Packagers can certainly override whatever we do, but we still need to
make the buildfarm green again.

> Wasn't there already some
> reason why it was advantageous for FreeBSD to continue to use System V
> shared memory?

Yes, but this code doesn't affect the main shared memory segment, so I
think that's sort of a separate point.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2013-10-10 Thread Josh Berkus
On 10/10/2013 11:41 AM, Robert Haas wrote:
> tunedb --available-memory=32GB
> 
> ...and it will print out a set of proposed configuration settings.  If
> we want a mode that rewrites the configuration file, we could have:
> 
> tunedb --available-memory=32GB --rewrite-config-file=$PATH
> 
> ...but that might be overkill, at least for version 1.

Given that we are talking currently about ALTER SYSTEM SET *and*
configuration directories, we should not be rewriting any existing
config file.  We should be adding an auto-generated one, or using ALTER
SYSTEM SET.

In fact, why don't we just do this through ALTER SYSTEM SET?  Add a
plpgsql function called pg_tune().

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2013-10-10 Thread Magnus Hagander
On Thu, Oct 10, 2013 at 8:41 PM, Robert Haas  wrote:
> On Thu, Oct 10, 2013 at 12:28 PM, Bruce Momjian  wrote:
>> On Thu, Oct 10, 2013 at 12:00:54PM -0400, Stephen Frost wrote:
>>> * Bruce Momjian (br...@momjian.us) wrote:
>>> > Well, I like the idea of initdb calling the tool, though the tool then
>>> > would need to be in C probably as we can't require python for initdb.
>>> > The tool would not address Robert's issue of someone increasing
>>> > shared_buffers on their own.
>>>
>>> I'm really not impressed with this argument.  Either the user is going
>>> to go and modify the config file, in which case I would hope that they'd
>>> at least glance around at what they should change, or they're going to
>>> move off PG because it's not performing well enough for them- which is
>>> really what I'm trying to avoid happening during the first 15m.
>>
>> Well, they aren't going around and looking at other parameters now or we
>> would not feel a need to auto-tune many of our defaults.
>>
>> How do we handle the Python dependency, or is this all to be done in
>> some other language?  I certainly am not ready to take on that job.
>
> I don't see why it can't be done in C.  The server is written in C,
> and so is initdb.  So no matter where we do this, it's gonna be in C.
> Where does Python enter into it?
>
> What I might propose is that we add a new binary, tunedb, maybe
> compiled out of the src/bin/initdb directory.  So you can say:
>
> initdb --available-memory=32GB
>
> ...and it will initialize the cluster with appropriate settings.  Or
> you can say:
>
> tunedb --available-memory=32GB
>
> ...and it will print out a set of proposed configuration settings.  If
> we want a mode that rewrites the configuration file, we could have:
>
> tunedb --available-memory=32GB --rewrite-config-file=$PATH
>
> ...but that might be overkill, at least for version 1.

I like this. And I agree that the edit-in-place might be overkill. But
then, if/when we get the ability to programmatically modify the config
files, that's probably not a very complicated thing to add once the
rest is done.


>> One nice thing about a tool is that you can see your auto-tuned defaults
>> right away, while doing this in the backend, you have to start the
>> server to see the defaults.  I am not even sure how I could allow users
>> to preview their defaults for different available_mem settings.
>
> Yep, agreed.  And agreed that not being able to preview settings is a problem.

I'd even say it would be a *big* problem.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/




Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2013-10-10 Thread Fujii Masao
On Fri, Oct 11, 2013 at 2:49 AM, Alvaro Herrera
 wrote:
> Daniel Farina wrote:
>> On Thu, Oct 10, 2013 at 7:40 AM, Fujii Masao  wrote:
>
>> > In my test, I found that pg_stat_statements.total_time always indicates a 
>> > zero.
>> > I guess that the patch might handle pg_stat_statements.total_time wrongly.
>> >
>> > +values[i++] = DatumGetTimestamp(
>> > +instr_get_timestamptz(pgss->session_start));
>> > +values[i++] = DatumGetTimestamp(
>> > +instr_get_timestamptz(entry->introduced));
>> >
>> > These should be executed only when detected_version >= PGSS_TUP_LATEST?
>>
>> Yes. Oversight.
>
> Hmm, shouldn't this be conditional on a new PGSS_TUP_V1_2?

I was just thinking the same thing. Agreed.
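Concretely, something like this (just a sketch; PGSS_TUP_V1_2 is the
symbol Alvaro proposed, and I'm assuming the two fields end up as plain
TimestampTz rather than the patch's instr_time):

if (detected_version >= PGSS_TUP_V1_2)
{
    values[i++] = TimestampTzGetDatum(pgss->session_start);
    values[i++] = TimestampTzGetDatum(entry->introduced);
}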

Regards,

-- 
Fujii Masao




Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2013-10-10 Thread Robert Haas
On Thu, Oct 10, 2013 at 2:45 PM, Josh Berkus  wrote:
> On 10/10/2013 11:41 AM, Robert Haas wrote:
>> tunedb --available-memory=32GB
>>
>> ...and it will print out a set of proposed configuration settings.  If
>> we want a mode that rewrites the configuration file, we could have:
>>
>> tunedb --available-memory=32GB --rewrite-config-file=$PATH
>>
>> ...but that might be overkill, at least for version 1.
>
> Given that we are talking currently about ALTER SYSTEM SET *and*
> configuration directories, we should not be rewriting any existing
> config file.  We should be adding an auto-generated one, or using ALTER
> SYSTEM SET.
>
> In fact, why don't we just do this through ALTER SYSTEM SET?  Add a
> plpgsql function called pg_tune().

That's another way to do it, for sure.  It does require the ability to
log in to the database.  I imagine that could be less convenient in
some scripting environments.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2013-10-10 Thread Peter Eisentraut
On 10/10/13 11:31 AM, Bruce Momjian wrote:
> Let me walk through the idea of adding an available_mem setting, that
> Josh suggested, and which I think addresses Robert's concern about
> larger shared_buffers and Windows servers.

I think this is a promising idea.  available_mem could even be set
automatically by packages.  And power users could just set available_mem
= -1 to turn off all the magic.




Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2013-10-10 Thread Peter Geoghegan
On Thu, Oct 10, 2013 at 11:43 AM, Robert Haas  wrote:
> On Thu, Oct 10, 2013 at 1:37 PM, Josh Berkus  wrote:
>> So, the question is: can we reasonably determine, at initdb time, how
>> much RAM the system has?
>
> As long as you are willing to write platform-dependent code, yes.

That's why trying to give the responsibility to a packager is compelling.

-- 
Peter Geoghegan




Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2013-10-10 Thread Peter Geoghegan
On Thu, Oct 10, 2013 at 11:41 AM, Robert Haas  wrote:
> I don't see why it can't be done in C.  The server is written in C,
> and so is initdb.  So no matter where we do this, it's gonna be in C.
> Where does Python enter into it?

I mentioned that pgtune was written in Python, but as you say that's
wholly incidental. An equivalent C program would only be slightly more
verbose.


-- 
Peter Geoghegan




Re: [HACKERS] Compression of full-page-writes

2013-10-10 Thread Fujii Masao
On Fri, Oct 11, 2013 at 1:20 AM, Dimitri Fontaine
 wrote:
> Hi,
>
> I did a partial review of this patch, wherein I focused on the patch and
> the code itself, as I saw other contributors already did some testing on
> it, so that we know it applies cleanly and work to some good extend.

Thanks a lot!

> In full_page_writes_str() why are you returning "unrecognized" rather
> than doing an ELOG(ERROR, …) for this unexpected situation?

It's because the similar functions 'wal_level_str' and 'dbState' also
return 'unrecognized' in unexpected situations. I just implemented
full_page_writes_str() in the same manner.

If we do an elog(ERROR) in that case, pg_xlogdump would fail to dump a
'broken' WAL file (i.e., one with an unrecognized fpw value). I think
that some users want to use pg_xlogdump to investigate broken WAL files,
so doing an elog(ERROR) seems not good to me.
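
To be concrete, the style in question is roughly this (a sketch built on
the enum quoted below, not the patch's exact code):

static const char *
full_page_writes_str(FullPageWritesLevel fpw)
{
    switch (fpw)
    {
        case FULL_PAGE_WRITES_ON:
            return "on";
        case FULL_PAGE_WRITES_COMPRESS:
            return "compress";
        case FULL_PAGE_WRITES_OFF:
            return "off";
    }
    return "unrecognized";      /* report broken WAL, don't error out */
}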

> The code switches to compression (or trying to) when the following
> condition is met:
>
> +   if (fpw <= FULL_PAGE_WRITES_COMPRESS)
> +   {
> +   rdt->data = CompressBackupBlock(page, BLCKSZ - 
> bkpb->hole_length, &(rdt->len));
>
> We have
>
> + typedef enum FullPageWritesLevel
> + {
> +   FULL_PAGE_WRITES_OFF = 0,
> +   FULL_PAGE_WRITES_COMPRESS,
> +   FULL_PAGE_WRITES_ON
> + } FullPageWritesLevel;
>
> + #define FullPageWritesIsNeeded(fpw)   (fpw >= FULL_PAGE_WRITES_COMPRESS)
>
> I don't much like using the <= test against an ENUM and I'm not sure I
> understand the intention you have here. It somehow looks like a typo and
> disagrees with the macro.

I thought that FPW should be compressed only when full_page_writes is
set to 'compress' or 'off'. That is, 'off' implies compression. When
it's set to 'off', FPW is basically not generated, so there is no need
to call CompressBackupBlock() in that case. But during online base
backup, FPW is forcibly generated even when it's set to 'off'. So I
used the check "fpw <= FULL_PAGE_WRITES_COMPRESS" there.

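Spelled out without the <= shortcut, the intent is:

/* equivalent to "fpw <= FULL_PAGE_WRITES_COMPRESS" given the enum order */
if (fpw == FULL_PAGE_WRITES_OFF || fpw == FULL_PAGE_WRITES_COMPRESS)
    rdt->data = CompressBackupBlock(page, BLCKSZ - bkpb->hole_length,
                                    &(rdt->len));
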
> What about using the FullPageWritesIsNeeded
> macro, and maybe rewriting the macro as
>
> #define FullPageWritesIsNeeded(fpw) \
>(fpw == FULL_PAGE_WRITES_COMPRESS || fpw == FULL_PAGE_WRITES_ON)

I'm OK to change the macro so that the <= test is not used.

> Also, having "on" imply "compress" is a little funny to me. Maybe we
> should just finish our testing and be happy to always compress the full
> page writes. What would the downside be exactly (on a busy IO system,
> writing less data even if needing more CPU will be the right trade-off)?

"on" doesn't imply "compress". When full_page_writes is set to "on",
FPW is not compressed at all.

> I like that you're checking the savings of the compressed data with
> respect to the uncompressed data and cancel the compression if there's
> no gain. I wonder if your test accounts for enough padding and headers
> though given the results we saw in other tests made in this thread.

I'm afraid that the patch has only a limited effect on WAL reduction and
performance improvement unless the database contains highly-compressible
data like columns of large blank characters. It really depends on the
contents of the database. So, obviously, FPW compression should not be
the default. Maybe we can treat it as just a tuning knob.

> Why do we have both the static function full_page_writes_str() and the
> macro FullPageWritesStr, with two different implementations issuing
> either "true" and "false" or "on" and "off"?

At first I was thinking of using "on" and "off" because they are often
used as setting values for boolean GUCs. But unfortunately the existing
pg_xlogdump uses "true" and "false" to show the value of full_page_writes
in WAL. To avoid breaking backward compatibility, I implemented the
"true/false" version of the function. I'm really not sure how many people
want such compatibility in pg_xlogdump, though.

> !   unsigned    hole_offset:15, /* number of bytes before "hole" */
> !               flags:2,        /* state of a backup block, see below */
> !               hole_length:15; /* number of bytes in "hole" */
>
> I don't understand that. I wanted to use that patch as leverage to
> smoothly discover the internals of our WAL system but won't have the
> time to do that here.

We need a flag indicating whether each FPW is compressed or not.
If no such flag exists in WAL, the standby cannot determine whether
it should decompress each FPW, and then cannot replay the WAL
containing FPW properly. That is, I just used spare bits in the
header of FPW to hold such a flag.

> That said, I don't even know that C syntax.

The struct 'ItemIdData' uses the same C syntax.
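
For reference, a standalone example of that bit-field syntax (purely
illustrative, not the patch's actual struct):

typedef struct ExampleHeader
{
    unsigned    hole_offset:15, /* bytes before the "hole" */
                flags:2,        /* backup-block state */
                hole_length:15; /* bytes in the "hole" */
} ExampleHeader;                /* 15 + 2 + 15 = 32 bits total */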

> + #define BKPBLOCK_UNCOMPRESSED 0   /* uncompressed */
> + #define BKPBLOCK_COMPRESSED   1   /* comperssed */
>
> There's a typo in the comment above.

Yep.

>>   [time required to replay WAL generated during running pgbench]
>>   61s (on)  120

Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2013-10-10 Thread Robert Haas
On Thu, Oct 10, 2013 at 1:37 PM, Josh Berkus  wrote:
> So, the question is: can we reasonably determine, at initdb time, how
> much RAM the system has?

As long as you are willing to write platform-dependent code, yes.
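
On Linux- and Solaris-style systems, for example, the probe can be a
couple of sysconf() calls, as in this sketch; Windows and some BSDs need
different calls (GlobalMemoryStatusEx, sysctl):

#include <unistd.h>

/* Physical memory in bytes, or -1 if it cannot be determined. */
static long long
probe_physical_memory(void)
{
    long pages = sysconf(_SC_PHYS_PAGES);
    long pagesz = sysconf(_SC_PAGESIZE);

    if (pages < 0 || pagesz < 0)
        return -1;
    return (long long) pages * pagesz;
}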

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2013-10-10 Thread Robert Haas
On Thu, Oct 10, 2013 at 12:28 PM, Bruce Momjian  wrote:
> On Thu, Oct 10, 2013 at 12:00:54PM -0400, Stephen Frost wrote:
>> * Bruce Momjian (br...@momjian.us) wrote:
>> > Well, I like the idea of initdb calling the tool, though the tool then
>> > would need to be in C probably as we can't require python for initdb.
>> > The tool would not address Robert's issue of someone increasing
>> > shared_buffers on their own.
>>
>> I'm really not impressed with this argument.  Either the user is going
>> to go and modify the config file, in which case I would hope that they'd
>> at least glance around at what they should change, or they're going to
>> move off PG because it's not performing well enough for them- which is
>> really what I'm trying to avoid happening during the first 15m.
>
> Well, they aren't going around and looking at other parameters now or we
> would not feel a need to auto-tune many of our defaults.
>
> How do we handle the Python dependency, or is this all to be done in
> some other language?  I certainly am not ready to take on that job.

I don't see why it can't be done in C.  The server is written in C,
and so is initdb.  So no matter where we do this, it's gonna be in C.
Where does Python enter into it?

What I might propose is that we add a new binary, tunedb, maybe
compiled out of the src/bin/initdb directory.  So you can say:

initdb --available-memory=32GB

...and it will initialize the cluster with appropriate settings.  Or
you can say:

tunedb --available-memory=32GB

...and it will print out a set of proposed configuration settings.  If
we want a mode that rewrites the configuration file, we could have:

tunedb --available-memory=32GB --rewrite-config-file=$PATH

...but that might be overkill, at least for version 1.
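
For concreteness, the arithmetic inside tunedb might look roughly like
this (the ratios are placeholders for whatever formula we settle on, not
a proposal):

#include <stdio.h>

static void
propose_settings(long long available_mem, int max_connections)
{
    /* e.g. 25% of available memory for shared_buffers ... */
    long long shared_buffers = available_mem / 4;
    /* ... and the rest split across backends, 3x-work_mem worst case */
    long long work_mem = (available_mem - shared_buffers) /
        ((long long) max_connections * 3);

    printf("shared_buffers = %lldMB\n", shared_buffers / (1024 * 1024));
    printf("work_mem = %lldkB\n", work_mem / 1024);
}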

> One nice thing about a tool is that you can see your auto-tuned defaults
> right away, while doing this in the backend, you have to start the
> server to see the defaults.  I am not even sure how I could allow users
> to preview their defaults for different available_mem settings.

Yep, agreed.  And agreed that not being able to preview settings is a problem.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] dynamic shared memory: wherein I am punished for good intentions

2013-10-10 Thread Peter Geoghegan
On Thu, Oct 10, 2013 at 9:13 AM, Robert Haas  wrote:
> (2) Default to using System V shared memory.  If people want POSIX
> shared memory, let them change the default.

> After some consideration, I think my vote is for option #2.

Wouldn't that become the call of packagers? Wasn't there already some
reason why it was advantageous for FreeBSD to continue to use System V
shared memory?

-- 
Peter Geoghegan




Re: [HACKERS] dynamic shared memory: wherein I am punished for good intentions

2013-10-10 Thread Robert Haas
On Thu, Oct 10, 2013 at 2:21 PM, Andrew Dunstan  wrote:
>> Other votes?  Other ideas?
>
> 5) test and set it in initdb.

Are you advocating for that option, or just calling out that it's
possible?  I'd say that's closely related to option #3, except at
initdb time rather than run-time - and it might be preferable to #3
for some of the same reasons discussed on the thread about tuning
work_mem, namely, that having it change from one postmaster lifetime
to the next might lead to user astonishment.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2013-10-10 Thread Josh Berkus

> It also doesn't address my point that, if we are worst-case-scenario
> default-setting, we're going to end up with defaults which aren't
> materially different from the current defaults.  In which case, why even
> bother with this whole exercise?

Oh, and let me reiterate: the way to optimize work_mem is through an
admission control mechanism.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] dynamic shared memory: wherein I am punished for good intentions

2013-10-10 Thread Andrew Dunstan


On 10/10/2013 12:13 PM, Robert Haas wrote:

Since, as has been previously discussed in this forum on multiple
occasions [citation needed], the default System V shared memory limits
are absurdly low on many systems, the dynamic shared memory patch
defaults to POSIX shared memory, which has often been touted as a
superior alternative [citation needed].  Unfortunately, the buildfarm
isn't entirely happy with this decision.  On buildfarm member anole
(HP-UX B.11.31), allocation of dynamic shared memory fails with a
"Permission denied" error, and on smew (Debian GNU/Linux 6.0), it
fails with "Function not implemented", which according to a forum
post[1] I found probably indicates that /dev/shm doesn't mount a tmpfs
on that box.

What shall we do about this?  I see a few options.

(1) Define the issue as "not our problem".  IOW, as of now, if you
want to use PostgreSQL, you've got to either make POSIX shared memory
work on your machine, or change the GUC that selects the type of
dynamic shared memory used.

(2) Default to using System V shared memory.  If people want POSIX
shared memory, let them change the default.

(3) Add a new setting that auto-probes for a type of shared memory
that works.  Try POSIX first, then System V.  Maybe even fall back to
mmap'd files if neither of those works.

(4) Remove the option to use POSIX shared memory.  System V FTW!

After some consideration, I think my vote is for option #2.  Option #1
seems too user-hostile, especially for a facility that has no in-core
users yet, though I can imagine we might want to go that way
eventually, especially if we at some point try to dump System V shared
memory altogether, as has been proposed.  Option #4 seems sad; we know
that System V shared memory limits are a problem for plenty of people,
so it'd be a shame not to have a way to use the POSIX facilities if
they're available.  Option #3 is fine as far as it goes, but it adds a
significant amount of complexity I'd rather not deal with.

Other votes?  Other ideas?



5) test and set it in initdb.

cheers

andrew




Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2013-10-10 Thread Josh Berkus
Bruce,

>> That's way low, and frankly it's not worth bothering with this if all
>> we're going to get is an incremental increase.  In that case, let's just
>> set the default to 4MB like Robert suggested.
> 
> Uh, well, 100 backends at 6MB gives us 600MB, and if each backend uses
> 3x work_mem, that gives us 1.8GB for total work_mem.  This was based on
> Andrew's concerns about possible over-commit of work_mem.  I can of
> course adjust that.

That's worst-case-scenario planning -- the 3X work-mem per backend was:
a) Solaris and
b) data warehousing

In a normal OLTP application each backend averages something like 0.25 *
work_mem, since many queries use no work_mem at all.

It also doesn't address my point that, if we are worst-case-scenario
default-setting, we're going to end up with defaults which aren't
materially different from the current defaults.  In which case, why even
bother with this whole exercise?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] dynamic shared memory: wherein I am punished for good intentions

2013-10-10 Thread Claudio Freire
On Thu, Oct 10, 2013 at 1:13 PM, Robert Haas  wrote:
> (1) Define the issue as "not our problem".  IOW, as of now, if you
> want to use PostgreSQL, you've got to either make POSIX shared memory
> work on your machine, or change the GUC that selects the type of
> dynamic shared memory used.
>
> (2) Default to using System V shared memory.  If people want POSIX
> shared memory, let them change the default.
>
> (3) Add a new setting that auto-probes for a type of shared memory
> that works.  Try POSIX first, then System V.  Maybe even fall back to
> mmap'd files if neither of those works.
>
> (4) Remove the option to use POSIX shared memory.  System V FTW!
>
> After some consideration, I think my vote is for option #2.  Option #1
> seems too user-hostile, especially for a facility that has no in-core
> users yet, though I can imagine we might want to go that way
> eventually, especially if we at some point try to dump System V shared
> memory altogether, as has been proposed.  Option #4 seems sad; we know
> that System V shared memory limits are a problem for plenty of people,
> so it'd be a shame not to have a way to use the POSIX facilities if
> they're available.  Option #3 is fine as far as it goes, but it adds a
> significant amount of complexity I'd rather not deal with.
>
> Other votes?  Other ideas?


I believe option 2 is not only good for now, but also a necessary
previous step in the way to option 3, which I believe should be the
goal.

So, ship a version with option 2, then implement 3, and make it the
default when enough people (using option 2) have successfully tested
pg's implementation of POSIX shared memory.
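
For illustration, a rough sketch of what the option-3 probe could look
like (the result symbols are hypothetical, and real error handling is
omitted):

#include <fcntl.h>
#include <sys/ipc.h>
#include <sys/mman.h>
#include <sys/shm.h>
#include <unistd.h>

enum { PROBE_POSIX, PROBE_SYSV, PROBE_MMAP };

static int
probe_dsm_implementation(void)
{
    int fd;
    int shmid;

    /* Try POSIX shared memory first. */
    fd = shm_open("/pg_dsm_probe", O_CREAT | O_RDWR, 0600);
    if (fd >= 0)
    {
        close(fd);
        shm_unlink("/pg_dsm_probe");
        return PROBE_POSIX;
    }

    /* Fall back to System V. */
    shmid = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0600);
    if (shmid >= 0)
    {
        shmctl(shmid, IPC_RMID, NULL);
        return PROBE_SYSV;
    }

    return PROBE_MMAP;          /* last resort: mmap'd files */
}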




Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2013-10-10 Thread Jeff Janes
On Wed, Oct 9, 2013 at 8:06 AM, Andrew Dunstan  wrote:

>
> On 10/09/2013 10:45 AM, Bruce Momjian wrote:
>
>> On Wed, Oct  9, 2013 at 04:40:38PM +0200, Pavel Stehule wrote:
>>
>>>  Effectively, if every session uses one full work_mem, you end up
>>> with
>>>  total work_mem usage equal to shared_buffers.
>>>
>>>  We can try a different algorithm to scale up work_mem, but it seems
>>> wise
>>>  to auto-scale it up to some extent based on shared_buffers.
>>>
>>>
>>> In my experience a optimal value of work_mem depends on data and load,
>>> so I
>>> prefer a work_mem as independent parameter.
>>>
>> But it still is an independent parameter.  I am just changing the default.
>>
>>
> The danger with work_mem especially is that setting it too high can lead
> to crashing postgres or your system at some stage down the track, so
> autotuning it is kinda dangerous, much more dangerous than autotuning
> shared buffers.
>


Is this common to see?  I ask because in my experience, having 100
connections all decide to do large sorts simultaneously is going to make
the server fall over, regardless of whether it tries to do them in memory
(OOM) or whether it does them with tape sorts (stuck spin locks, usually).


>
> The assumption that each connection won't use lots of work_mem is also
> false, I think, especially in these days of connection poolers.
>

I don't follow that.  Why would using a connection pooler change the
multiples of work_mem that a connection would use?

Cheers,

Jeff


[HACKERS] Re: dynamic shared memory: wherein I am punished for good intentions

2013-10-10 Thread David Johnston
Robert Haas wrote
> Unfortunately, the buildfarm
> isn't entirely happy with this decision.  On buildfarm member anole
> (HP-UX B.11.31), allocation of dynamic shared memory fails with a
> "Permission denied" error, and on smew (Debian GNU/Linux 6.0), it
> fails with "Function not implemented", which according to a forum
> post[1] I found probably indicates that /dev/shm doesn't mount a tmpfs
> on that box.
> 
> What shall we do about this?  I see a few options.

Is this something that rightly falls into being a distro/package specific
setting?  If so then the first goal should be to ensure the maximum number
of successful basic installation scenarios - namely, someone installing
PostgreSQL and connecting to the running postgres database without
encountering an error.

As a default I would presume the current System V behavior is sufficient to
accomplish this goal.  If package maintainers can then guarantee that
changing the default will improve the user experience they should be
supported and encouraged to do so but if they are at all unsure they should
leave the default in place.  

As long as a new user is able to get a running database on their machine,
then if/when they run up against the low defaults of System V memory they
will at least be able to focus on that single problem, as opposed to having
a failed initial install and being unsure exactly what they may have done
wrong.

Thus option # 2 seems sufficient.  I do think that having some kind of
shared-memory-manager utility could have value but I'd rather see that be a
standalone utility as opposed to something magical done inside the bowels of
the database.  While probably harder to code and learn, such a utility
would provide a much better user experience if implemented well.

David J.







Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2013-10-10 Thread Stephen Frost
Bruce,

* Bruce Momjian (br...@momjian.us) wrote:
> On Thu, Oct 10, 2013 at 12:00:54PM -0400, Stephen Frost wrote:
> > I'm really not impressed with this argument.  Either the user is going
> > to go and modify the config file, in which case I would hope that they'd
> > at least glance around at what they should change, or they're going to
> > move off PG because it's not performing well enough for them- which is
> > really what I'm trying to avoid happening during the first 15m.
> 
> Well, they aren't going around and looking at other parameters now or we
> would not feel a need to auto-tune many of our defaults.

I think you're confusing things here.  There's a huge difference between
"didn't configure anything and got our defaults" and "went and changed
only one thing in postgresql.conf".  For one thing, we have a ton of the
former.  Perhaps there are some of the latter as well, but I would argue
it's a pretty small group.

> How do we handle the Python dependency, or is this all to be done in
> some other language?  I certainly am not ready to take on that job.

I agree that we can't add a Python (or really, perl) dependency, but I
don't think there's anything terribly complicated in what pgtune is
doing that couldn't be pretty easily done in C..

> One nice thing about a tool is that you can see your auto-tuned defaults
> right away, while doing this in the backend, you have to start the
> server to see the defaults.  I am not even sure how I could allow users
> to preview their defaults for different available_mem settings.

Agreed.

> > Actually, it *is* good, as Magnus pointed out.  Changing a completely
> > unrelated parameter shouldn't make all of your plans suddenly change.
> > This is mollified, but only a bit, if you have a GUC that's explicitly
> > "this changes other GUCs", but I'd much rather have a tool that can do a
> > better job to begin with and which helps the user understand what
> > parameters are available to change and why there's more than one.
> 
> Well, the big question is how many users are going to use the tool, as
> we are not setting this up for experts, but for novices.

The goal would be to have the distros and/or initdb use it for the
initial configuration..  Perhaps by using debconf or similar to ask the
user, perhaps by just running it and letting it do whatever it wants.

> I think one big risk is someone changing shared_buffers and not having
> an accurate available_memory.  That might lead to some very inaccurate
> defaults.  Also, what happens if available_memory is not supplied at
> all?  Do we auto-tune just from shared_buffers, or not autotune at all,
> and then what are the defaults?  We could certainly throw an error if
> shared_buffers > available_memory.  We might just ship with
> available_memory defaulting to 256MB and auto-tune everything from
> there.

These questions are less of an issue if we simply don't have this
"available_memory" GUC (which strikes me as just adding more confusion
for users anyway, not less).  If no '--available-memory' (or whatever)
option is passed to initdb then we should have it assume some default,
yes, but my view on what that is depends on what the specific results
are.  It sounds like --avail-memory=256MB would end up setting things to
about what we have now for defaults, which is alright for
shared_buffers, imv, but not for a default work_mem (1MB is *really*
small...).

Thanks,

Stephen




Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2013-10-10 Thread Alvaro Herrera
Daniel Farina wrote:
> On Thu, Oct 10, 2013 at 7:40 AM, Fujii Masao  wrote:

> > In my test, I found that pg_stat_statements.total_time always indicates a 
> > zero.
> > I guess that the patch might handle pg_stat_statements.total_time wrongly.
> >
> > +values[i++] = DatumGetTimestamp(
> > +instr_get_timestamptz(pgss->session_start));
> > +values[i++] = DatumGetTimestamp(
> > +instr_get_timestamptz(entry->introduced));
> >
> > These should be executed only when detected_version >= PGSS_TUP_LATEST?
> 
> Yes. Oversight.

Hmm, shouldn't this be conditional on a new PGSS_TUP_V1_2?  I mean, if
later somebody patches pgss to have a PGSS_TUP_V2 or V1_3 and that
becomes latest, somebody running the current definition with the updated
.so will no longer get these values.  Or is the intention that
PGSS_TUP_LATEST will never be updated again, and future versions will
get PGSS_TUP_REALLY_LATEST and PGSS_TUP_REALLY_LATEST_HONEST and so on?
I mean, surely we can always come up with new symbols to use, but it
seems hard to follow.

There's one other use of PGSS_TUP_LATEST here which I think should also
be changed to >= SOME_SPECIFIC_VERSION:

+   if (detected_version >= PGSS_TUP_LATEST)
+   {
+   uint64 qid = pgss->private_stat_session_key;
+ 
+   qid ^= (uint64) entry->query_id;
+   qid ^= ((uint64) entry->query_id) << 32;
+ 
+   values[i++] = Int64GetDatumFast(qid);
+   }


This paragraph reads a bit strange to me:

+  A statistics session is the time period when statistics are gathered by 
statistics collector 
+  without being reset. So a statistics session continues across normal 
shutdowns, 
+  but whenever statistics are reset, like during a crash or upgrade, a new 
time period 
+  of statistics collection commences i.e. a new statistics session. 
+  The query_id value generation is linked to statistics session to emphasize 
the fact 
+  that whenever statistics are reset,the query_id for the same queries will 
also change.

"time period when"?  Shouldn't that be "time period during which"?
Also, doesn't a new "statistics session" start when a stats reset is
invoked by the user?  The bit after "commences" appears correct (to me,
not a native speaker by any means) but also seems a bit strange.

I just noticed that the table describing the output indicates that
session_start and introduced are of type text, but the SQL defines
timestamptz.


The instr_time thingy being used for these times maps to
QueryPerformanceCounter() on Windows, and I'm not sure how useful this
is for long-term time tracking; see
http://stackoverflow.com/questions/5296990/queryperformancecounter-and-overflows#5297163
for instance.  I think it'd be better to use TimestampTz and
GetCurrentTimestamp() for this.
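
A minimal sketch of what I mean (the struct here is a stand-in for the
patch's entry, not its real definition):

#include "postgres.h"
#include "utils/timestamp.h"

typedef struct pgssEntrySketch
{
    TimestampTz introduced;     /* wall-clock time; safe for long-term use */
} pgssEntrySketch;

static void
entry_mark_introduced(pgssEntrySketch *entry)
{
    entry->introduced = GetCurrentTimestamp();
}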

Just noticed that you changed the timer to struct Instrumentation.  Not
really sure about that change.  Since you seem to be using only the
start time and counter, wouldn't it be better to store only those?
Particularly unsure about passing INSTRUMENT_ALL to InstrAlloc().

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2013-10-10 Thread Josh Berkus
All,

We can't reasonably require user input at initdb time, because most
users don't run initdb by hand -- their installer does it for them.  So
any "tuning" which initdb does needs to be fully automated.

So, the question is: can we reasonably determine, at initdb time, how
much RAM the system has?

I also think this is where the much-debated ALTER SYSTEM SET suddenly
becomes valuable.  With it, it's reasonable to run a "tune-up" tool on
the client side.  I do think it's reasonable to tell a user:

"Just installed PostgreSQL?  Run this command to tune your system:"

Mind you, the tuneup tool I'm working on makes use of Python, the
configuration directory, and Jinja2, so it's not even relevant to the
preceding.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Compression of full-page-writes

2013-10-10 Thread Fujii Masao
On Wed, Oct 9, 2013 at 1:35 PM, Haribabu kommi
 wrote:
> On 08 October 2013 18:42 KONDO Mitsumasa wrote:
>>(2013/10/08 20:13), Haribabu kommi wrote:
>>> I will test with sync_commit=on mode and provide the test results.
>>OK. Thanks!
>
> Pgbench test results with synchronous_commit mode as on.

Thanks!

>                      Thread-1                    Threads-2
>                      Head code    FPW compress   Head code    FPW compress
> Pgbench-org 5min     138(0.24GB)  131(0.04GB)    160(0.28GB)  163(0.05GB)
> Pgbench-1000 5min    140(0.29GB)  128(0.03GB)    160(0.33GB)  162(0.02GB)
> Pgbench-org 15min    141(0.59GB)  136(0.12GB)    160(0.65GB)  162(0.14GB)
> Pgbench-1000 15min   138(0.81GB)  134(0.11GB)    159(0.92GB)  162(0.18GB)
>
> Pgbench-org - original pgbench
> Pgbench-1000 - changed pgbench with a record size of 1000.

This means that you changed the data type of pgbench_accounts.filler
to char(1000)?

Regards,

-- 
Fujii Masao




Re: [HACKERS] Compression of full-page-writes

2013-10-10 Thread Fujii Masao
On Tue, Oct 8, 2013 at 10:07 PM, KONDO Mitsumasa
 wrote:
> Hi,
>
> I tested dbt-2 benchmark in single instance and synchronous replication.

Thanks!

> Unfortunately, my benchmark results did not show many differences...
>
>
> * Test server
>    Server: HP Proliant DL360 G7
>    CPU:    Xeon E5640 2.66GHz (1P/4C)
>    Memory: 18GB(PC3-10600R-9)
>    Disk:   146GB(15k)*4 RAID1+0
>    RAID controller: P410i/256MB
>
> * Result
> ** Single instance**
> | NOTPM | 90%tile | Average | S.Deviation
> +---+-+-+-
> no-patched  | 3322.93   | 20.469071   | 5.882   | 10.478
> patched | 3315.42   | 19.086105   | 5.669   | 9.108
>
>
> ** Synchronous Replication **
> | NOTPM | 90%tile | Average | S.Deviation
> +---+-+-+-
> no-patched  | 3275.55   | 21.332866   | 6.072   | 9.882
> patched | 3318.82   | 18.141807   | 5.757   | 9.829
>
> ** Detail of result
> http://pgstatsinfo.projects.pgfoundry.org/DBT-2_Fujii_patch/
>
>
> I set full_page_writes = compress with Fujii's patch in DBT-2. But it
> does not seem to have much effect in eliminating WAL files.

Could you let me know how much WAL was generated
during each benchmark?

I think that this benchmark result clearly means that the patch
has only a limited effect on the reduction of WAL volume and
on performance improvement unless the database contains
highly-compressible data like pgbench_accounts.filler. But if
we can use other compression algorithm, maybe we can reduce
WAL volume very much. I'm not sure what algorithm is good
for WAL compression, though.

It might be better to introduce a hook for compression of FPW
so that users can freely plug in their own compression module, rather
than just using pglz_compress(). Thoughts?
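
A rough sketch of what such a hook could look like (every name here is
made up for illustration; nothing is from the patch):

#include "postgres.h"

typedef int32 (*fpw_compress_hook_type) (const char *src, int32 srclen,
                                         char *dst, int32 dstcap);

fpw_compress_hook_type fpw_compress_hook = NULL;

static int32
compress_backup_block(const char *src, int32 srclen,
                      char *dst, int32 dstcap)
{
    /* A loaded module may have installed its own compressor. */
    if (fpw_compress_hook != NULL)
        return fpw_compress_hook(src, srclen, dst, dstcap);

    return -1;                  /* caller falls back to pglz_compress() */
}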

Regards,

-- 
Fujii Masao




Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2013-10-10 Thread Bruce Momjian
On Thu, Oct 10, 2013 at 10:20:02AM -0700, Josh Berkus wrote:
> On 10/09/2013 02:15 PM, Bruce Momjian wrote:
> > and for shared_buffers of 2GB:
> > 
> > test=> show shared_buffers;
> >  shared_buffers
> > 
> >  2GB
> > (1 row)
> > 
> > test=> SHOW work_mem;
> >  work_mem
> > --
> >  6010kB
> > (1 row)
> 
> Huh?  Only 6MB work_mem for 8GB RAM?  How'd you get that?

> That's way low, and frankly it's not worth bothering with this if all
> we're going to get is an incremental increase.  In that case, let's just
> set the default to 4MB like Robert suggested.

Uh, well, 100 backends at 6MB gives us 600MB, and if each backend uses
3x work_mem, that gives us 1.8GB for total work_mem.  This was based on
Andrew's concerns about possible over-commit of work_mem.  I can of
course adjust that.

For comparison, shared_buffers of 8GB auto-tunes work_mem to 21MB.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2013-10-10 Thread Josh Berkus
On 10/09/2013 02:15 PM, Bruce Momjian wrote:
> and for shared_buffers of 2GB:
> 
>   test=> show shared_buffers;
>shared_buffers
>   
>2GB
>   (1 row)
>   
>   test=> SHOW work_mem;
>work_mem
>   --
>6010kB
>   (1 row)

Huh?  Only 6MB work_mem for 8GB RAM?  How'd you get that?

That's way low, and frankly it's not worth bothering with this if all
we're going to get is an incremental increase.  In that case, let's just
set the default to 4MB like Robert suggested.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2013-10-10 Thread Josh Berkus

> Because 'maintenance' operations were rarer, we figured we could use
> more memory in those cases.

Once we brought Autovacuum into core, though, we should have changed that.

However, I agree with Magnus that the simple course is to have an
autovacuum_worker_memory setting which overrides maint_work_mem if set.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2013-10-10 Thread Bruce Momjian
On Thu, Oct 10, 2013 at 12:59:39PM -0400, Andrew Dunstan wrote:
> 
> On 10/10/2013 12:45 PM, Bruce Momjian wrote:
> >On Thu, Oct 10, 2013 at 12:39:04PM -0400, Andrew Dunstan wrote:
> >>On 10/10/2013 12:28 PM, Bruce Momjian wrote:
> >>>How do we handle the Python dependency, or is this all to be done in
> >>>some other language?  I certainly am not ready to take on that job.
> >>
> >>Without considering any wider question here, let me just note this:
> >>
> >>Anything that can be done in this area in Python should be doable in
> >>Perl fairly simply. I don't think we should be adding any Python
> >>dependencies. For good or ill Perl has been used for pretty much all
> >>our complex scripting (pgindent, MSVC build system etc.)
> >Yes, but this is a run-time requirement, not build-time, and we have not
> >used Perl in that regard.
> >
> 
> 
> Nor Python. If we want to avoid added dependencies, we would need to use C.

Yeah.  :-(  My crazy idea would be, because setting
available_mem without a restart would not be supported, to allow the
backend to output suggestions, e.g.:

test=> SET available_mem = '24GB';
Auto-tuning values:
shared_buffers = 6GB
work_mem = 10MB
...
ERROR:  parameter "available_mem" cannot be changed without restarting 
the server

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2013-10-10 Thread Fujii Masao
On Fri, Oct 11, 2013 at 12:19 AM, Daniel Farina  wrote:
> Probably.
>
> The idea is that without those fields it's impossible to explain
> non-monotonic movement in the metrics of those queries, for precise
> use in tools that insist on monotonicity of the fields, which are all
> cumulative to date, I think.
>
> pg_stat_statements_reset() or crashing loses the session, so one
> expects "ncalls" may decrease.  In addition, a query that is bouncing
> in and out of the hash table will have its statistics lost, so its
> statistics will also decrease.  This can cause unresolvable artifacts
> in analysis tools.
>
> The two fields allow for precise understanding of when the statistics
> for a given query_id are continuous since the last time a program
> inspected it.

Thanks for elaborating on them! Since 'introduced' is reset even when
the statistics are reset, maybe we can do without 'session_start' for
that purpose?

>> +/*
>> + * The role calling this function is unable to see
>> + * sensitive aspects of this tuple.
>> + *
>> + * Nullify everything except the "insufficient privilege"
>> + * message for this entry
>> + */
>> +memset(nulls, 1, sizeof nulls);
>> +
>> +nulls[i]  = 0;
>> +values[i] = CStringGetTextDatum("");
>>
>> Why do we need to hide *all* the fields in pg_stat_statements, when
>> it's accessed by improper user? This is a big change of pg_stat_statements
>> behavior, and it might break the compatibility.
>
> It seems like an information leak that becomes more serious if query_id is
> exposed and, at any point, one can determine the query_id for a query
> text.

So hiding only query and query_id is enough?

Regards,

-- 
Fujii Masao




Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2013-10-10 Thread Andrew Dunstan


On 10/10/2013 12:45 PM, Bruce Momjian wrote:

On Thu, Oct 10, 2013 at 12:39:04PM -0400, Andrew Dunstan wrote:

On 10/10/2013 12:28 PM, Bruce Momjian wrote:

How do we handle the Python dependency, or is this all to be done in
some other language?  I certainly am not ready to take on that job.


Without considering any wider question here, let me just note this:

Anything that can be done in this area in Python should be doable in
Perl fairly simply. I don't think we should be adding any Python
dependencies. For good or ill Perl has been used for pretty much all
our complex scripting (pgindent, MSVC build system etc.)

Yes, but this is a run-time requirement, not build-time, and we have not
used Perl in that regard.




Nor Python. If we want to avoid added dependencies, we would need to use C.

cheers

andrew




Re: [HACKERS] Re: [BUGS] BUG #7873: pg_restore --clean tries to drop tables that don't exist

2013-10-10 Thread Andrew Dunstan


On 09/19/2013 06:12 PM, Pavel Stehule wrote:



2013/9/16 Satoshi Nagayasu :

I'm looking at this patch, and I have a question here.

Should "DROP TRIGGER IF EXISTS" ignore errors for a non-existing trigger
and a non-existing table? Or only for a non-existing trigger?


My opinion is that both variants should be ignored; it should be fully
fault tolerant in this use case.

This thread seems to have gone cold, but I'm inclined to agree with 
Pavel. If the table doesn't exist, neither does the trigger, and the 
whole point of the 'IF EXISTS' variants is to provide the ability to 
issue DROP commands that don't fail if their target is missing.


cheers

andrew




Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2013-10-10 Thread Bruce Momjian
On Thu, Oct 10, 2013 at 12:39:04PM -0400, Andrew Dunstan wrote:
> 
> On 10/10/2013 12:28 PM, Bruce Momjian wrote:
> >
> >How do we handle the Python dependency, or is this all to be done in
> >some other language?  I certainly am not ready to take on that job.
> 
> 
> Without considering any wider question here, let me just note this:
> 
> Anything that can be done in this area in Python should be doable in
> Perl fairly simply. I don't think we should be adding any Python
> dependencies. For good or ill Perl has been used for pretty much all
> our complex scripting (pgindent, MSVC build system etc.)

Yes, but this is a run-time requirement, not build-time, and we have not
used Perl in that regard.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2013-10-10 Thread Andrew Dunstan


On 10/10/2013 12:28 PM, Bruce Momjian wrote:


How do we handle the Python dependency, or is this all to be done in
some other language?  I certainly am not ready to take on that job.



Without considering any wider question here, let me just note this:

Anything that can be done in this area in Python should be doable in 
Perl fairly simply. I don't think we should be adding any Python 
dependencies. For good or ill Perl has been used for pretty much all our 
complex scripting (pgindent, MSVC build system etc.)



cheers

andrew




Re: [HACKERS] [COMMITTERS] pgsql: Add record_image_ops opclass for matview concurrent refresh.

2013-10-10 Thread Kevin Grittner
Kevin Grittner  wrote:

> Add record_image_ops opclass for matview concurrent refresh.

The buildfarm pointed out that I had not handled pass-by-value data
types correctly.  Fixed based on advice from Robert.  We'll see
whether that clears up the part of the buildfarm breakage
attributed to this patch.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

diff --git a/src/backend/utils/adt/rowtypes.c b/src/backend/utils/adt/rowtypes.c
index 0bcbb5d..7925ce2 100644
--- a/src/backend/utils/adt/rowtypes.c
+++ b/src/backend/utils/adt/rowtypes.c
@@ -1464,9 +1464,8 @@ record_image_cmp(PG_FUNCTION_ARGS)
 			}
 			else if (tupdesc1->attrs[i1]->attbyval)
 			{
-				cmpresult = memcmp(&(values1[i1]),
-								   &(values2[i2]),
-								   tupdesc1->attrs[i1]->attlen);
+				if (values1[i1] != values2[i2])
+					cmpresult = (values1[i1] < values2[i2]) ? -1 : 1;
 			}
 			else
 			{
@@ -1695,9 +1694,7 @@ record_image_eq(PG_FUNCTION_ARGS)
 			}
 			else if (tupdesc1->attrs[i1]->attbyval)
 			{
-				result = (memcmp(&(values1[i1]),
-								 &(values2[i2]),
-								 tupdesc1->attrs[i1]->attlen) == 0);
+				result = (values1[i1] == values2[i2]);
 			}
 			else
 			{
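
For illustration, the failure mode of the memcmp() approach on a
little-endian machine, as a standalone sketch (not project code):

#include <stdint.h>
#include <stdio.h>
#include <string.h>

int
main(void)
{
    uint64_t a = 1;             /* stand-ins for two by-value Datums */
    uint64_t b = 256;

    /* Little-endian bytes: a = 01 00 ..., b = 00 01 ..., so memcmp
     * reports a > b even though numerically a < b. */
    printf("memcmp sign: %d\n", memcmp(&a, &b, sizeof a) > 0 ? 1 : -1);
    printf("direct:      %d\n", (a < b) ? -1 : (a > b) ? 1 : 0);
    return 0;
}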



Re: [HACKERS] Patch for reserved connections for replication users

2013-10-10 Thread Gibheer
On Thu, 10 Oct 2013 09:55:24 +0530
Amit Kapila  wrote:

> On Thu, Oct 10, 2013 at 3:17 AM, Gibheer 
> wrote:
> > On Mon, 7 Oct 2013 11:39:55 +0530
> > Amit Kapila  wrote:
> >> Robert Haas wrote:
> >> On Mon, Aug 5, 2013 at 2:04 AM, Andres Freund
> >>  wrote:
> >> >>> Hmm.  It seems like this match is making MaxConnections no
> >> >>> longer mean the maximum number of connections, but rather the
> >> >>> maximum number of non-replication connections.  I don't think
> >> >>> I support that definitional change, and I'm kinda surprised if
> >> >>> this is sufficient to implement it anyway (e.g. see
> >> >>> InitProcGlobal()).
> >> >
> >> >> I don't think the implementation is correct, but why don't you
> >> >> like the definitional change? The set of things you can do from
> >> >> replication connections are completely different from a normal
> >> >> connection. So using separate "pools" for them seems to make
> >> >> sense. That they end up allocating similar internal data seems
> >> >> to be an implementation detail to me.
> >>
> >> > Because replication connections are still "connections".  If I
> >> > tell the system I want to allow 100 connections to the server,
> >> > it should allow 100 connections, not 110 or 95 or any other
> >> > number.
> >>
> >> I think that to reserve connections for replication, a mechanism
> >> similar to superuser_reserved_connections should be used rather than
> >> one based on autovacuum workers or background workers.
> >> This won't change the definition of MaxConnections. Another thing
> >> is that rather than introducing new parameter for replication
> >> reserved connections, it is better to use max_wal_senders as it
> >> can serve the purpose.
> >>
> >> Review for replication_reserved_connections-v2.patch, considering
> >> we are going to use mechanism similar to
> >> superuser_reserved_connections and won't allow definition of
> >> MaxConnections to change.
> >>
> >> 1. /* the extra unit accounts for the autovacuum launcher */
> >>   MaxBackends = MaxConnections + autovacuum_max_workers + 1 +
> >> - + max_worker_processes;
> >> + + max_worker_processes + max_wal_senders;
> >>
> >> Above changes are not required.
> >>
> >> 2.
> >> + if ((!am_superuser && !am_walsender) &&
> >>   ReservedBackends > 0 &&
> >>   !HaveNFreeProcs(ReservedBackends))
> >>
> >> Change the check as you have in patch-1 for
> >> ReserveReplicationConnections
> >>
> >> if (!am_superuser &&
> >> (max_wal_senders > 0 || ReservedBackends > 0) &&
> >> !HaveNFreeProcs(max_wal_senders + ReservedBackends))
> >> ereport(FATAL,
> >> (errcode(ERRCODE_TOO_MANY_CONNECTIONS),
> >> errmsg("remaining connection slots are reserved for replication and
> >> superuser connections")));
> >>
> >> 3. In guc.c, change description of max_wal_senders similar to
> >> superuser_reserved_connections at below place:
> >>{"max_wal_senders", PGC_POSTMASTER, REPLICATION_SENDING,
> >> gettext_noop("Sets the maximum number of simultaneously running WAL
> >> sender processes."),
> >>
> >> 4. With this approach, there is no need to change
> >> InitProcGlobal(), as it is used to keep track bgworkerFreeProcs
> >> and autovacFreeProcs, for others it use freeProcs.
> >>
> >> 5. Below description in config.sgml needs to be changed for
> >> superuser_reserved_connections to include the effect of
> >> max_wal_enders in reserved connections.
> >> Whenever the number of active concurrent connections is at least
> >> max_connections minus superuser_reserved_connections, new
> >> connections will be accepted only for superusers, and no new
> >> replication connections will be accepted.
> >>
> >> 6. Also similar description should be added to max_wal_senders
> >> configuration parameter.
> >>
> >> 7. Below comment needs to be updated, as it assumes only superuser
> >> reserved connections as part of MaxConnections limit.
> >>/*
> >>  * ReservedBackends is the number of backends reserved for
> >> superuser use.
> >>  * This number is taken out of the pool size given by MaxBackends
> >> so
> >>  * number of backend slots available to non-superusers is
> >>  * (MaxBackends - ReservedBackends).  Note what this really means
> >> is
> >>  * "if there are <= ReservedBackends connections available, only
> >> superusers
> >>  * can make new connections" --- pre-existing superuser connections
> >> don't
> >>  * count against the limit.
> >>  */
> >> int ReservedBackends;
> >>
> >> 8. Also we can add comment on top of definition for max_wal_senders
> >> similar to ReservedBackends.
> >>
> >>
> >> With Regards,
> >> Amit Kapila.
> >> EnterpriseDB: http://www.enterprisedb.com
> >>
> >
> > Hi,
> >
> > I took the time and reworked the patch with the feedback till now.
> > Thank you very much Amit!
> 
> Is there any specific reason why you moved this patch to the next
> CommitFest?
> 
> With Regards,
> Amit Kapila.
> EnterpriseDB: http://www.enterprisedb.com
> 

Mike asked me about the status of the patch a couple days back and I
didn't think I would be able to work on the pa

Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2013-10-10 Thread Bruce Momjian
On Thu, Oct 10, 2013 at 12:00:54PM -0400, Stephen Frost wrote:
> * Bruce Momjian (br...@momjian.us) wrote:
> > Well, I like the idea of initdb calling the tool, though the tool then
> > would need to be in C probably as we can't require python for initdb. 
> > The tool would not address Robert's issue of someone increasing
> > shared_buffers on their own.
> 
> I'm really not impressed with this argument.  Either the user is going
> to go and modify the config file, in which case I would hope that they'd
> at least glance around at what they should change, or they're going to
> move off PG because it's not performing well enough for them- which is
> really what I'm trying to avoid happening during the first 15m.

Well, they aren't going around and looking at other parameters now or we
would not feel a need to auto-tune many of our defaults.

How do we handle the Python dependency, or is this all to be done in
some other language?  I certainly am not ready to take on that job.

One nice thing about a tool is that you can see your auto-tuned defaults
right away, while doing this in the backend, you have to start the
server to see the defaults.  I am not even sure how I could allow users
to preview their defaults for different available_mem settings.

> > In fact, the other constants would still
> > be hard-coded in from initdb, which isn't good.
> 
> Actually, it *is* good, as Magnus pointed out.  Changing a completely
> unrelated parameter shouldn't make all of your plans suddenly change.
> This is mollified, but only a bit, if you have a GUC that's explicitly
> "this changes other GUCs", but I'd much rather have a tool that can do a
> better job to begin with and which helps the user understand what
> parameters are available to change and why there's more than one.

Well, the big question is how many users are going to use the tool, as
we are not setting this up for experts, but for novices.

I think one big risk is someone changing shared_buffers and not having
an accurate available_memory.  That might lead to some very inaccurate
defaults.  Also, what happens if available_memory is not supplied at
all?  Do we auto-tune just from shared_buffers, or not autotune at all,
and then what are the defaults?  We could certainly throw an error if
shared_buffers > available_memory.  We might just ship with
available_memory defaulting to 256MB and auto-tune everything from
there.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] Add json_typeof() and json_is_*() functions.

2013-10-10 Thread Andrew Dunstan


On 08/06/2013 08:42 AM, Robert Haas wrote:

On Fri, Aug 2, 2013 at 8:22 AM, Andrew Tipton  wrote:

But without json_is_scalar(), the choice is one of these two forms:
   json_typeof() NOT IN ('object', 'array')
   json_typeof() IN ('string', 'number', 'boolean', 'null')

The first of those is what seemed to make sense to me.  The user can
always define their own convenience function if they so desire.  I
don't think we need to bloat the default contents of pg_proc for that.




I agree. I have committed a version with just the one function.

cheers

andrew



