Re: [HACKERS] BGWriter latch, power saving

2012-01-04 Thread Simon Riggs
On Wed, Jan 4, 2012 at 7:24 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:

> Setting a latch that's already set is fast,
> but surely it's even faster to not even try.

Agreed. I think we should SetLatch() at the first point a backend
writes a dirty buffer because the bgwriter has been inactive.

Continually waking the bgwriter makes it harder for it to return to sleep.
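
For concreteness, a minimal sketch of that idea (illustrative only, not the
committed code: BM_DIRTY and the buffer-header conventions come from the
existing buffer manager, bgwriterLatch from the patch later in this thread,
and the function name is made up). Caller is assumed to hold the buffer
header spinlock:

#include "postgres.h"
#include "storage/buf_internals.h"
#include "storage/latch.h"
#include "storage/proc.h"

/*
 * Sketch: wake the bgwriter only on a buffer's clean->dirty transition,
 * so the common path never even touches an already-set latch.
 */
static void
MarkDirtyAndWakeBgwriter(volatile BufferDesc *bufHdr)
{
    if (!(bufHdr->flags & BM_DIRTY))
    {
        bufHdr->flags |= BM_DIRTY;
        if (ProcGlobal->bgwriterLatch)
            SetLatch(ProcGlobal->bgwriterLatch);
    }
}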

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] 16-bit page checksums for 9.2

2012-01-04 Thread Simon Riggs
On Tue, Jan 3, 2012 at 11:00 PM, Jim Nasby j...@nasby.net wrote:
> On Jan 3, 2012, at 4:21 PM, Kevin Grittner wrote:
>> (2)  I'm not sure about doing this in three parts, to skip the
>> checksum itself and the hole in the middle of the page.  Is this
>> because the hole might not have predictable data?  Why would that
>> matter, as long as it is read back the same?
>
> IMO not checksumming the free space would be a really bad idea. It's entirely
> possible to have your hardware crapping on your free space, and I'd still
> want to know that that was happening. Now, it would be interesting if the
> free space could be checksummed separately, since there's no reason to refuse
> to read the page if only the free space is screwed up... But given the
> choice, I'd rather get an error when the free space is corrupted and be
> forced to look into things rather than blissfully ignore corrupted free space
> only to be hit later with real data loss.

I see that argument. We don't have space for 2 checksums.

We can either

(1) report all errors on a page, including errors that don't change
PostgreSQL data. This involves checksumming long strings of zeroes,
which the checksum algorithm can't tell apart from long strings of
ones.

(2) report only errors that changed PostgreSQL data.

We already do (1) for WAL CRCs so doing the same thing for page
checksums makes sense and is much faster.

If enough people think we should do (2) then it's a simple change to the patch.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] 16-bit page checksums for 9.2

2012-01-04 Thread Simon Riggs
On Tue, Jan 3, 2012 at 10:21 PM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:

> I'm happy with how this looks, except (as noted in a code comment)
> that there seems to be room for optimization of the calculation
> itself.  Details below:

...

> (3)  Rather than having PageSetVerificationInfo() use memcpy,
> followed by a pass through the copied data to calculate the checksum,
> might it be better to have a "copy and calculate" version of the
> function (as in VMware's original patch) to save an extra pass over
> the page image?
>
> Other than these performance tweaks around the calculation phase, I
> didn't spot any problems.  I beat up on it a bit on a couple
> machines without hitting any bugs or seeing any regression test
> failures.

My focus was on getting something working first, then tuning. If we're
agreed that we have everything apart from the tuning then we can
proceed with tests to see which works better.

The "copy and calculate" approach might get in the way of hardware
prefetch since, in my understanding, the memory fetch time exceeds the
calculation time. As discussed elsewhere, using that code or not would
not stop that work being credited.

David, please can you rework the VMware calc patch to produce an
additional 16-bit checksum mechanism in a way compatible with the
16bit patch, so we can test the two versions of the calculation? We
can make the GUC an enum so that the page checksum is selectable (for
testing).

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] PL/Perl Does not Like vstrings

2012-01-04 Thread Andrew Dunstan



On 01/04/2012 12:47 AM, David E. Wheeler wrote:

> Is this perhaps by design?
>
> Oy, this doesn’t look good:
>
> $ do LANGUAGE plperl $$ elog(NOTICE, $^V) $$;
> ERROR:  server conn crashed?
> ERROR:  server conn crashed?
> The connection to the server was lost. Attempting reset: Succeeded.
> (pgxn@localhost:5900) 06:44:42 [pgxn]
> $



Try

   elog(NOTICE, "$^V")

cheers

andrew



Re: [HACKERS] 16-bit page checksums for 9.2

2012-01-04 Thread Andres Freund
On Tuesday, January 03, 2012 11:21:42 PM Kevin Grittner wrote:
> (1)  I like the choice of Fletcher-16.  It should be very good at
> detecting problems while being a lot less expensive than an official
> CRC calculation.
I wonder if CRC32c wouldn't be a good alternative given more and more CPUs
(it's in SSE 4.2) support calculating it in silicon.

Andres



Re: [HACKERS] 16-bit page checksums for 9.2

2012-01-04 Thread Simon Riggs
On Wed, Jan 4, 2012 at 9:20 AM, Andres Freund and...@anarazel.de wrote:
> On Tuesday, January 03, 2012 11:21:42 PM Kevin Grittner wrote:
>> (1)  I like the choice of Fletcher-16.  It should be very good at
>> detecting problems while being a lot less expensive than an official
>> CRC calculation.
> I wonder if CRC32c wouldn't be a good alternative given more and more CPUs
> (it's in SSE 4.2) support calculating it in silicon.

We're trying to get something that fits in 16 bits for this release.
I'm guessing CRC32c doesn't?


-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?

2012-01-04 Thread Simon Riggs
On Tue, Jan 3, 2012 at 11:28 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> Jim Nasby j...@nasby.net writes:
>> On Jan 3, 2012, at 12:11 PM, Simon Riggs wrote:
>>> This could well be related to the fact that DropRelFileNodeBuffers()
>>> does a scan of shared_buffers, which is an O(N) approach no matter the
>>> size of the index.
>
>> Couldn't we just leave the buffers alone? Once an index is dropped and
>> that's pushed out through the catalog then nothing should be trying to
>> access them and they'll eventually just get aged out.
>
> No, we can't, because if they're still dirty then the bgwriter would
> first try to write them to the no-longer-existing storage file.  It's
> important that we kill the buffers immediately during relation drop.
>
> I'm still thinking that it might be sufficient to mark the buffers
> invalid and let the clock sweep find them, thereby eliminating the need
> for a freelist.

My patch puts things on the freelist only when it is free to do so.
Not having a freelist at all is probably a simpler way of avoiding the
lock contention, so I'll happily back that suggestion instead. Patch
attached, previous patch revoked.

> Simon is after a different solution involving getting
> rid of the clock sweep...

err, No, he isn't. Not sure where that came from since I'm advocating
only minor changes there to curb worst case behaviour.

But let's discuss that on the main freelist thread.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/src/backend/storage/buffer/buf_init.c b/src/backend/storage/buffer/buf_init.c
index 94cefba..9332a74 100644
--- a/src/backend/storage/buffer/buf_init.c
+++ b/src/backend/storage/buffer/buf_init.c
@@ -115,7 +115,7 @@ InitBufferPool(void)
 			 * Initially link all the buffers together as unused. Subsequent
 			 * management of this list is done by freelist.c.
 			 */
-			buf->freeNext = i + 1;
+			StrategyInitFreelistBuffer(buf);
 
 			buf->io_in_progress_lock = LWLockAssign();
 			buf->content_lock = LWLockAssign();
diff --git a/src/backend/storage/buffer/freelist.c b/src/backend/storage/buffer/freelist.c
index 3e62448..6b49cae 100644
--- a/src/backend/storage/buffer/freelist.c
+++ b/src/backend/storage/buffer/freelist.c
@@ -27,6 +27,7 @@ typedef struct
 	/* Clock sweep hand: index of next buffer to consider grabbing */
 	int			nextVictimBuffer;
 
+#ifdef USE_BUFMGR_FREELIST
 	int			firstFreeBuffer;	/* Head of list of unused buffers */
 	int			lastFreeBuffer; /* Tail of list of unused buffers */
 
@@ -34,7 +35,7 @@ typedef struct
 	 * NOTE: lastFreeBuffer is undefined when firstFreeBuffer is -1 (that is,
 	 * when the list is empty)
 	 */
-
+#endif
 	/*
 	 * Statistics.	These counters should be wide enough that they can't
 	 * overflow during a single bgwriter cycle.
@@ -134,6 +135,7 @@ StrategyGetBuffer(BufferAccessStrategy strategy, bool *lock_held)
 	 */
 	StrategyControl->numBufferAllocs++;
 
+#ifdef USE_BUFMGR_FREELIST
 	/*
 	 * Try to get a buffer from the freelist.  Note that the freeNext fields
 	 * are considered to be protected by the BufFreelistLock not the
@@ -165,8 +167,9 @@ StrategyGetBuffer(BufferAccessStrategy strategy, bool *lock_held)
 		}
 		UnlockBufHdr(buf);
 	}
+#endif
 
-	/* Nothing on the freelist, so run the clock sweep algorithm */
+	/* Run the clock sweep algorithm */
 	trycounter = NBuffers;
 	for (;;)
 	{
@@ -182,20 +185,25 @@ StrategyGetBuffer(BufferAccessStrategy strategy, bool *lock_held)
 		 * If the buffer is pinned or has a nonzero usage_count, we cannot use
 		 * it; decrement the usage_count (unless pinned) and keep scanning.
 		 */
-		LockBufHdr(buf);
 		if (buf->refcount == 0)
 		{
-			if (buf->usage_count > 0)
+			if (buf->usage_count > StrategyControl->completePasses)
 			{
 				buf->usage_count--;
 				trycounter = NBuffers;
 			}
 			else
 			{
-				/* Found a usable buffer */
-				if (strategy != NULL)
-					AddBufferToRing(strategy, buf);
-				return buf;
+				LockBufHdr(buf);
+				if (buf->refcount == 0)
+				{
+					UnlockBufHdr(buf);
+					/* Found a usable buffer */
+					if (strategy != NULL)
+						AddBufferToRing(strategy, buf);
+					return buf;
+				}
+				UnlockBufHdr(buf);
 			}
 		}
 		else if (--trycounter == 0)
@@ -207,10 +215,8 @@ StrategyGetBuffer(BufferAccessStrategy strategy, bool *lock_held)
 			 * probably better to fail than to risk getting stuck in an
 			 * infinite loop.
 			 */
-			UnlockBufHdr(buf);
 			elog(ERROR, "no unpinned buffers available");
 		}
-		UnlockBufHdr(buf);
 	}
 
 	/* not reached */
@@ -223,6 +229,7 @@ StrategyGetBuffer(BufferAccessStrategy strategy, bool *lock_held)
 void
 StrategyFreeBuffer(volatile BufferDesc *buf)
 {
+#ifdef USE_BUFMGR_FREELIST
 	LWLockAcquire(BufFreelistLock, LW_EXCLUSIVE);
 
 	/*
@@ -238,6 +245,24 @@ StrategyFreeBuffer(volatile BufferDesc *buf)
 	}
 
 	LWLockRelease(BufFreelistLock);
+#endif
+}
+
+/*
+ * StrategyInitFreelist: put a buffer on the freelist during InitBufferPool
+ */
+void

Re: [HACKERS] 16-bit page checksums for 9.2

2012-01-04 Thread Ants Aasma
On Wed, Jan 4, 2012 at 3:49 AM, Robert Haas robertmh...@gmail.com wrote:
> On Fri, Dec 30, 2011 at 11:58 AM, Jeff Janes jeff.ja...@gmail.com wrote:
>> On 12/29/11, Ants Aasma ants.aa...@eesti.ee wrote:
>>> Unless I'm missing something, double-writes are needed for all writes,
>>> not only the first page after a checkpoint. Consider this sequence of
>>> events:
>>>
>>> 1. Checkpoint
>>> 2. Double-write of page A (DW buffer write, sync, heap write)
>>> 3. Sync of heap, releasing DW buffer for new writes.
>>>  ... some time goes by
>>> 4. Regular write of page A
>>> 5. OS writes one part of page A
>>> 6. Crash!
>>>
>>> Now recovery comes along, page A is broken in the heap with no
>>> double-write buffer backup nor anything to recover it by in the WAL.
>
>> Isn't 3 the very definition of a checkpoint, meaning that 4 is not
>> really a regular write as it is the first one after a checkpoint?
>
> I think you nailed it.

No, I should have explicitly stated that no checkpoint happens in
between. I think the confusion here is because I assumed Kevin
described a fixed size d-w buffer in this message:

On Thu, Dec 29, 2011 at 6:44 PM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
> ...  The file is fsync'd (like I said,
> hopefully to BBU cache), then each page in the double-write buffer is
> written to the normal page location, and that is fsync'd.  Once that
> is done, the database writes have no risk of being torn, and the
> double-write buffer is marked as empty.  ...

If the double-write buffer survives until the next checkpoint,
double-writing only the first write should work just fine. The
advantage over current full-page writes is that the write is not into
the WAL stream and is done (hopefully) by the bgwriter/checkpointer in
the background.
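
For concreteness, the ordering Kevin describes might look roughly like this
(a POSIX-C sketch with hypothetical names and a fixed-offset, single-page
double-write file; error handling is reduced to abort() for brevity):

#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

#define PAGE_SZ 8192

static void
double_write_page(int dw_fd, int heap_fd, off_t heap_off, const char *page)
{
    /* 1. Write the page image into the double-write buffer file. */
    if (pwrite(dw_fd, page, PAGE_SZ, 0) != PAGE_SZ)
        abort();
    /* 2. fsync it -- hopefully absorbed by a BBU write-back cache. */
    if (fsync(dw_fd) != 0)
        abort();
    /* 3. Only now write the page to its normal location... */
    if (pwrite(heap_fd, page, PAGE_SZ, heap_off) != PAGE_SZ)
        abort();
    /* 4. ...and fsync that too, after which the DW slot can be marked
     *    empty; a torn heap page is recoverable from the DW copy at any
     *    point in between. */
    if (fsync(heap_fd) != 0)
        abort();
}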

--
Ants Aasma



Re: [HACKERS] 16-bit page checksums for 9.2

2012-01-04 Thread Nicolas Barbier
2012/1/4 Simon Riggs si...@2ndquadrant.com:

> On Wed, Jan 4, 2012 at 9:20 AM, Andres Freund and...@anarazel.de wrote:
>
>> I wonder if CRC32c wouldn't be a good alternative given more and more CPUs
>> (it's in SSE 4.2) support calculating it in silicon.
>
> We're trying to get something that fits in 16 bits for this release.
> I'm guessing CRC32c doesn't?

What happens to the problem-detecting performance of a 16 bit part of
a CRC32c vs. a real 16 bit checksum? If it is still as good, it might
make sense to use the former, assuming that there is a way to easily
trigger the silicon support and enough CPUs support it.
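
For what it's worth, taking the low 16 bits of a hardware CRC32C could look
like the sketch below (assuming SSE 4.2 and building with -msse4.2; whether
such a 16-bit slice keeps CRC32C's error-detection guarantees is exactly the
open question above):

#include <stdint.h>
#include <stddef.h>
#include <string.h>
#include <nmmintrin.h>		/* SSE 4.2 CRC32 intrinsics */

static uint16_t
crc32c_low16(const char *buf, size_t len)
{
    uint64_t    crc = 0xFFFFFFFF;
    size_t      i = 0;

    /* 8 bytes per instruction on 64-bit; memcpy avoids unaligned loads */
    for (; i + 8 <= len; i += 8)
    {
        uint64_t    chunk;

        memcpy(&chunk, buf + i, 8);
        crc = _mm_crc32_u64(crc, chunk);
    }
    for (; i < len; i++)
        crc = _mm_crc32_u8((uint32_t) crc, (uint8_t) buf[i]);
    return (uint16_t) (crc ^ 0xFFFFFFFF);	/* keep only the low 16 bits */
}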

Nicolas

-- 
A. Because it breaks the logical sequence of discussion.
Q. Why is top posting bad?



Re: [HACKERS] pgstat wait timeout

2012-01-04 Thread pratikchirania
I have installed RAMdisk and pointed the parameter:

#stats_temp_directory = 'B:\pg_stat_tmp'
I also tried #stats_temp_directory = 'B:/pg_stat_tmp'

But, still there is no file created in the RAM disk.
The previous stat file is touched even after the change is made. (I have
restarted the service after effecting the change)

--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/pgstat-wait-timeout-tp5078125p5119436.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.



Re: [HACKERS] pgstat wait timeout

2012-01-04 Thread Tomas Vondra
On 4 January 2012, 13:17, pratikchirania wrote:
> I have installed RAMdisk and pointed the parameter:
>
> #stats_temp_directory = 'B:\pg_stat_tmp'
> I also tried #stats_temp_directory = 'B:/pg_stat_tmp'
>
> But, still there is no file created in the RAM disk.
> The previous stat file is touched even after the change is made. (I have
> restarted the service after effecting the change)

You have to remove the '#' at the beginning; as written, the line is commented out.

Tomas




Re: [HACKERS] 16-bit page checksums for 9.2

2012-01-04 Thread Kevin Grittner
Simon Riggs  wrote:
 
> We can either
>
> (1) report all errors on a page, including errors that don't change
> PostgreSQL data. This involves checksumming long strings of zeroes,
> which the checksum algorithm can't tell apart from long strings of
> ones.
 
What do you mean?  Each byte which goes into the checksum, and the
position of that byte, influences the outcome once you've got a
non-zero value in sum1.  The number of leading NIL bytes would not
affect the outcome unless you seed the calculation with something
non-zero, but including the page header in the calculation seems to
cover that OK.
 
> (2) report only errors that changed PostgreSQL data.
>
> We already do (1) for WAL CRCs so doing the same thing for page
> checksums makes sense and is much faster.
>
> If enough people think we should do (2) then it's a simple change to
> the patch.
 
To me, (1) makes more sense, but it seems to me you're currently
doing (2) because you check in three parts, skipping the free space
in the middle of the page.
 
-Kevin



Re: [HACKERS] 16-bit page checksums for 9.2

2012-01-04 Thread Alvaro Herrera

Excerpts from Kevin Grittner's message of Wed Jan 04 04:12:43 -0300 2012:
> Kevin Grittner  wrote:
>
>> if we define sum1 and sum2 as uint I don't see how we can get an
>> overflow with 8k bytes
>
> I feel the need to amend that opinion.
>
> While sum1 only needs to hold a maximum of (BLCKSZ * 255), which
> would be adequate for a BLCKSZ up to 16 MB, sum2 needs to hold up to
> a maximum of ((BLCKSZ * (BLCKSZ + 1) / 2) * 255) so a 32-bit unsigned
> isn't even good for an 8 KB block size.  A uint64 for sum2 can handle
> BLCKSZ up to 256 MB.  So we could define sum1 as uint (which is
> presumably either uint32 or uint64) and sum2 as uint64 and for a
> BLCKSZ up to 16 MB we still don't need to find the modulo 255
> numbers until we're done adding things up.
>
> Does anyone think we need to support anything bigger than that?
> Making them both uint64 would support a BLCKSZ up to 256 MB without
> needing to do the division inside the loop, but it might be slightly
> slower on 32-bit builds.

We don't support BLCKSZ higher than 32k anyway ... saith pg_config.h:

/* Size of a disk block --- this also limits the size of a tuple. You can set
   it bigger if you need bigger tuples (although TOAST should reduce the need
   to have large tuples, since fields can be spread across multiple tuples).
   BLCKSZ must be a power of 2. The maximum possible value of BLCKSZ is
   currently 2^15 (32768). This is determined by the 15-bit widths of the
   lp_off and lp_len fields in ItemIdData (see include/storage/itemid.h).
   Changing BLCKSZ requires an initdb. */
#define BLCKSZ 8192

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] 16-bit page checksums for 9.2

2012-01-04 Thread Kevin Grittner
Robert Haas  wrote:
> Jeff Janes  wrote:
>
>> But it doesn't seem safe to me to replace a page from the DW buffer
>> and then apply WAL to that replaced page which preceded the age of
>> the page in the buffer.
>
> That's what LSNs are for.
 
Agreed.
 
> If we write the page to the checkpoint buffer just once per
> checkpoint, recovery can restore the double-written versions of the
> pages and then begin WAL replay, which will restore all the
> subsequent changes made to the page. Recovery may also need to do
> additional double-writes if it encounters pages for which we
> wrote WAL but never flushed the buffer, because a crash during
> recovery can also create torn pages.
 
That's a good point.  I think WAL application does need to use
double-write.  As usual, it doesn't affect *when* a page must be
written, but *how*.
 
> When we reach a restartpoint, we fsync everything down to disk and
> then nuke the double-write buffer.
 
I think we add to the double-write buffer as we write pages from the
buffer to disk.  I don't think it makes sense to do potentially
repeated writes of the same page with different contents to the
double-write buffer as we go; nor is it a good idea to leave the page
unsynced and let the double-write buffer grow for a long time.
 
> Similarly, in normal running, we can nuke the double-write buffer
> at checkpoint time, once the fsyncs are complete.
 
Well, we should nuke it for re-use as soon as all pages in the buffer
are written and fsynced.  I'm not at all sure that the best
performance is hit by waiting for checkpoint for that versus doing it
at page eviction time.
 
The whole reason that double-write techniques don't double the write
time is that it is relatively small and the multiple writes to the
same disk sectors get absorbed by the BBU write-back without actually
hitting the disk all the time.  Letting the double-write buffer grow
to a large size seems likely to me to be a performance killer.  The
whole double-write, including fsyncs to buffer and the actual page
location should just be considered part of the page write process, I
think.
 
-Kevin



Re: [HACKERS] 16-bit page checksums for 9.2

2012-01-04 Thread Stephen Frost
Simon, all,

* Simon Riggs (si...@2ndquadrant.com) wrote:
> (1) report all errors on a page, including errors that don't change
> PostgreSQL data. This involves checksumming long strings of zeroes,
> which the checksum algorithm can't tell apart from long strings of
> ones.

Do we actually know when/where it's supposed to be all zeros, and hence
could we check for that explicitly?  If we know what it's supposed to
be, in order to be consistent with other information, I could certainly
see value in actually checking that.

I don't think that's valuable enough to go breaking abstraction layers
or bending over backwards to do it though.  If we don't have the
knowledge, at the right level, that the data should all be zeros then
including those pieces in the CRC certainly makes sense to me.

Just my 2c.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] 16-bit page checksums for 9.2

2012-01-04 Thread Kevin Grittner
Simon Riggs  wrote:
 
> My focus was on getting something working first, then tuning. If
> we're agreed that we have everything apart from the tuning then we
> can proceed with tests to see which works better.
 
Sure.  I just think you are there already except for what I got into.
 
FWIW, moving the modulus application out of the loop is a very
trivial change and has no effect on the results; it's strictly a
performance issue.
 
-Kevin



Re: [HACKERS] 16-bit page checksums for 9.2

2012-01-04 Thread Kevin Grittner
Alvaro Herrera  wrote:
 
> We don't support BLCKSZ higher than 32k anyway
 
Thanks for pointing that out.  Then I think we should declare sum1 to
be uint and sum2 to be uint64.  We can take the % 255 out from
where it sits in the v2 patch, and just add something like this after
the sums are generated:
 
sum1 %= 255;
sum2 %= 255;
 
Or, of course, we can just do it on the line where we combine the two
sums for the final result, if that's not too hard to read.
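
Putting this subthread together, the loop under discussion would look
something like the following (a sketch, not the v2 patch itself): sum1 is
bounded by BLCKSZ * 255 so a uint32 suffices up to a 16 MB BLCKSZ, sum2 is
bounded by (BLCKSZ * (BLCKSZ + 1) / 2) * 255 so it needs a uint64, and the
modulo then runs once, after the loop:

#include <stdint.h>

static uint16_t
fletcher16_block(const uint8_t *data, uint32_t len)
{
    uint32_t    sum1 = 0;
    uint64_t    sum2 = 0;
    uint32_t    i;

    for (i = 0; i < len; i++)
    {
        sum1 += data[i];		/* running byte sum */
        sum2 += sum1;			/* position-weighted sum */
    }
    /* deferred reduction -- safe given the bounds above */
    sum1 %= 255;
    sum2 %= 255;
    return (uint16_t) ((sum2 << 8) | sum1);
}

As Kevin notes upthread, once sum1 is non-zero every later byte position
affects sum2, but leading zero bytes drop out unless the sums are seeded
with something non-zero.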
 
-Kevin



Re: [HACKERS] review: CHECK FUNCTION statement

2012-01-04 Thread Albe Laurenz
Pavel Stehule wrote:
> here is a new version of the CHECK FUNCTION patch
>
> I changed the implementation of the interface:
>
> * checked functions return a table instead of raising exceptions - this is
> necessary for describing more issues inside one function - and it
> allows the use of better structured data than ExceptionData

[...]

> * the result of CHECK FUNCTION is a simple table (like EXPLAIN - based on
> Tom's proposition)

I don't have the time for a complete review, but I tried the patch
and found:

It is in context diff and applies to current master (there is fuzz 1
in one hunk). It contains documentation and regression tests.
Compiles without warnings and passes regression tests.

The two or three CHECK FUNCTIONs I ran worked ok.

The documentation (that I wrote) will need to get updated: currently
it states in two places that the checker function should throw a
warning if it encounters a problem.

Yours,
Laurenz Albe



Re: [HACKERS] Standalone synchronous master

2012-01-04 Thread Aidan Van Dyk
On Tue, Jan 3, 2012 at 9:22 PM, Robert Haas robertmh...@gmail.com wrote:

> It seems to me that if you are happy with #2, you don't really need to
> enable sync rep in the first place.
>
> At any rate, even without multiple component failures, this
> configuration makes it pretty easy to lose durability (which is the
> only point of having sync rep in the first place).  Suppose the NIC
> card on the master is the failing component.  If it happens to drop
> the TCP connection to the clients just before it drops the connection
> to the standby, the standby will have all the transactions, and you
> can fail over just fine.  If it happens to drop the TCP connection to
> the standby just before it drops the connection to the clients, the
> standby will not have all the transactions, and failover will lose some
> transactions - and presumably you enabled this feature in the first
> place precisely to prevent that sort of occurrence.
>
> I do think that it might be useful to have this if there's a
> configurable timeout involved - that way, people could say, well, I'm
> OK with maybe losing transactions if the standby has been gone for X
> seconds.  But if the only possible behavior is equivalent to a
> zero-second timeout I don't think it's too useful.  It's basically
> just going to lead people to believe that their data is more secure
> than it really is, which IMHO is not helpful.

So, I'm a big fan of syncrep guaranteeing its guarantees.  To me,
that's the whole point.  Having it fall out of sync rep at any point
*automatically* seems to be exactly counter to the point of sync rep.

That said, I'm also a big fan of monitoring everything as well as I could...

I'd love a hook script that was run if sync-rep state ever changed
(heck, I'd even like it if it just chose a new sync standby).

Even better, is there a way we could start injecting notify events
into the cluster on these types of changes?  Especially now that
notify events can take payloads, it means I don't have to keep
constantly polling the database to see if it thinks it's connected,
etc.

a.


-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [HACKERS] BGWriter latch, power saving

2012-01-04 Thread Peter Geoghegan
On 4 January 2012 07:24, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
> I think SetBufferCommitInfoNeedsSave() needs the same treatment as
> MarkBufferDirty(). And it would probably be good to only set the latch if
> the buffer wasn't dirty already. Setting a latch that's already set is fast,
> but surely it's even faster to not even try.

That seems reasonable. Revised patch is attached.

> Yeah, I'd like to see a micro-benchmark of a worst-case scenario. I'm a bit
> worried about the impact on systems with a lot of CPUs. If you have a lot of
> CPUs writing to the same cache line that contains the latch's flag, that
> might get expensive.

Also reasonable, but I don't think that I'll get around to it until
after the final commitfest deadline.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services
diff --git a/src/backend/bootstrap/bootstrap.c b/src/backend/bootstrap/bootstrap.c
new file mode 100644
index e3ae92d..0fcaf01
*** a/src/backend/bootstrap/bootstrap.c
--- b/src/backend/bootstrap/bootstrap.c
*** AuxiliaryProcessMain(int argc, char *arg
*** 416,421 
--- 416,422 
  
  		case BgWriterProcess:
  			/* don't set signals, bgwriter has its own agenda */
+ 			ProcGlobal->bgwriterLatch = &MyProc->procLatch; /* Expose */
  			BackgroundWriterMain();
  			proc_exit(1);		/* should never return */
  
diff --git a/src/backend/port/unix_latch.c b/src/backend/port/unix_latch.c
new file mode 100644
index fc1a579..a5248ba
*** a/src/backend/port/unix_latch.c
--- b/src/backend/port/unix_latch.c
***
*** 50,55 
--- 50,56 
  #include "miscadmin.h"
  #include "postmaster/postmaster.h"
  #include "storage/latch.h"
+ #include "storage/proc.h"
  #include "storage/shmem.h"
  
  /* Are we currently in WaitLatch? The signal handler would like to know. */
diff --git a/src/backend/postmaster/bgwriter.c b/src/backend/postmaster/bgwriter.c
new file mode 100644
index 1f8d2d6..59e5180
*** a/src/backend/postmaster/bgwriter.c
--- b/src/backend/postmaster/bgwriter.c
***
*** 51,56 
--- 51,58 
  #include "storage/ipc.h"
  #include "storage/lwlock.h"
  #include "storage/pmsignal.h"
+ #include "storage/proc.h"
+ #include "storage/procsignal.h"
  #include "storage/shmem.h"
  #include "storage/smgr.h"
  #include "storage/spin.h"
*** static volatile sig_atomic_t shutdown_re
*** 73,78 
--- 75,82 
  /*
   * Private state
   */
+ #define BGWRITER_HIBERNATE_MS			1
+ 
  static bool am_bg_writer = false;
  
  /* Prototypes for private functions */
*** BackgroundWriterMain(void)
*** 123,129 
  	pqsignal(SIGQUIT, bg_quickdie);		/* hard crash time */
  	pqsignal(SIGALRM, SIG_IGN);
  	pqsignal(SIGPIPE, SIG_IGN);
! 	pqsignal(SIGUSR1, SIG_IGN);			/* reserve for ProcSignal */
  	pqsignal(SIGUSR2, SIG_IGN);
  
  	/*
--- 127,133 
  	pqsignal(SIGQUIT, bg_quickdie);		/* hard crash time */
  	pqsignal(SIGALRM, SIG_IGN);
  	pqsignal(SIGPIPE, SIG_IGN);
! 	pqsignal(SIGUSR1, procsignal_sigusr1_handler);
  	pqsignal(SIGUSR2, SIG_IGN);
  
  	/*
*** BackgroundWriterMain(void)
*** 238,278 
  	for (;;)
  	{
  		/*
! 		 * Emergency bailout if postmaster has died.  This is to avoid the
! 		 * necessity for manual cleanup of all postmaster children.
  		 */
! 		if (!PostmasterIsAlive())
! 			exit(1);
! 
! 		if (got_SIGHUP)
  		{
! 			got_SIGHUP = false;
! 			ProcessConfigFile(PGC_SIGHUP);
! 			/* update global shmem state for sync rep */
  		}
! 		if (shutdown_requested)
  		{
  			/*
! 			 * From here on, elog(ERROR) should end with exit(1), not send
! 			 * control back to the sigsetjmp block above
  			 */
! 			ExitOnAnyError = true;
! 			/* Normal exit from the bgwriter is here */
! 			proc_exit(0);		/* done */
! 		}
  
! 		/*
! 		 * Do one cycle of dirty-buffer writing.
! 		 */
! 		BgBufferSync();
  
! 		/* Nap for the configured time. */
! 		BgWriterNap();
  	}
  }
  
  /*
   * BgWriterNap -- Nap for the configured time or until a signal is received.
   */
  static void
  BgWriterNap(void)
--- 242,337 
  	for (;;)
  	{
  		/*
! 		 * Do one cycle of dirty-buffer writing, potentially hibernating if
! 		 * there have been no buffers to write.
  		 */
! 		if (!BgBufferSync())
  		{
! 			/* Clock sweep was not lapped - nap for the configured time. */
! 			BgWriterNap();
  		}
! 		else
  		{
+ 			int res = 0;
  			/*
! 			 * Initiate hibernation by arming the process latch. This usage
! 			 * differs from the standard pattern for latches outlined in
! 			 * latch.h, where the latch is generally reset immediately after
! 			 * WaitLatch returns, in the next iteration of an infinite loop.
! 			 *
! 			 * It should only be possible to *really* set the latch from client
! 			 * backends while the bgwriter is idle, and not during productive
! 			 * cycles where buffers are written, or shortly thereafter. It's
! 			 * important that the SetLatch() call within the buffer manager
! 			 * usually inexpensively returns 

Re: [HACKERS] 16-bit page checksums for 9.2

2012-01-04 Thread Simon Riggs
On Wed, Jan 4, 2012 at 1:31 PM, Stephen Frost sfr...@snowman.net wrote:
> Simon, all,
>
> * Simon Riggs (si...@2ndquadrant.com) wrote:
>> (1) report all errors on a page, including errors that don't change
>> PostgreSQL data. This involves checksumming long strings of zeroes,
>> which the checksum algorithm can't tell apart from long strings of
>> ones.
>
> Do we actually know when/where it's supposed to be all zeros, and hence
> could we check for that explicitly?  If we know what it's supposed to
> be, in order to be consistent with other information, I could certainly
> see value in actually checking that.

Yes, we can. Excellent suggestion, will implement.

That means we can keep the CRC calc fast as well as check the whole of
the page inbound.
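
Presumably that check would look something like the following (a sketch only,
under the assumption that the hole between pd_lower and pd_upper should read
back as all zeros; PageHeader, pd_lower and pd_upper come from the existing
page layout, the function itself is hypothetical):

#include "postgres.h"
#include "storage/bufpage.h"

static bool
PageHoleIsZeroed(Page page)
{
    PageHeader  phdr = (PageHeader) page;
    const char *p = (const char *) page;
    int         i;

    /* the checksum skips this range, so verify it explicitly on read */
    for (i = phdr->pd_lower; i < phdr->pd_upper; i++)
    {
        if (p[i] != 0)
            return false;		/* corruption in the free space */
    }
    return true;
}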

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



[HACKERS] Regarding Checkpoint Redo Record

2012-01-04 Thread Amit Kapila
Why does PostgreSQL need to write a WAL record for a checkpoint when it
maintains the same information in the pg_control file?

This may be required in case we need information about more than one
checkpoint, as pg_control can hold information about only the most recent
checkpoint. But I could not think of a case where information about more
than one checkpoint would be required.

Could anybody let me know the cases where it is required?

 



Re: [HACKERS] Regarding Checkpoint Redo Record

2012-01-04 Thread Heikki Linnakangas

On 04.01.2012 08:42, Amit Kapila wrote:

> Why does PostgreSQL need to write a WAL record for a checkpoint when it
> maintains the same information in the pg_control file?


I guess it wouldn't be strictly necessary...


> This may be required in case we need information about more than one
> checkpoint, as pg_control can hold information about only the most recent
> checkpoint. But I could not think of a case where information about more
> than one checkpoint would be required.


There is a pointer in the control file to the previous checkpoint, too. 
It's not normally needed, but we fall back to that if the latest 
checkpoint cannot be read for some reason, like disk failure. If you 
have a disk failure and cannot read the latest checkpoint, chances are 
that you have a corrupt database anyway, but at least we try to recover 
as much as we can using the previous checkpoint.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Regarding Checkpoint Redo Record

2012-01-04 Thread Simon Riggs
On Wed, Jan 4, 2012 at 3:56 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
> On 04.01.2012 08:42, Amit Kapila wrote:
>> Why does PostgreSQL need to write a WAL record for a checkpoint when it
>> maintains the same information in the pg_control file?
>
> I guess it wouldn't be strictly necessary...

Apart from replicated standbys, which need that info for running restartpoints.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] PL/Perl Does not Like vstrings

2012-01-04 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
> On 01/04/2012 12:47 AM, David E. Wheeler wrote:
>> Oy, this doesn’t look good:
>> $ do LANGUAGE plperl $$ elog(NOTICE, $^V) $$;
>> The connection to the server was lost. Attempting reset: Succeeded.
>
> Try
> elog(NOTICE, "$^V")

Isn't this a Perl bug?  It seems to be crashing in SvPVutf8, which
means that either Perl passed something that's not an SV to a function
declared to accept SVs, or that SvPVutf8 fails on some SVs.  Either
way, Perl is failing to satisfy the POLA if you ask me.

regards, tom lane



Re: [HACKERS] PL/Perl Does not Like vstrings

2012-01-04 Thread David E. Wheeler
On Jan 4, 2012, at 12:44 AM, Andrew Dunstan wrote:

> Try
>
>   elog(NOTICE, "$^V")

Yeah, I used

elog(NOTICE, version->new($^V))

Which was fine. But still, it shouldn’t segfault.

David




Re: [HACKERS] PL/Perl Does not Like vstrings

2012-01-04 Thread David E. Wheeler
On Jan 4, 2012, at 8:15 AM, Tom Lane wrote:

> Isn't this a Perl bug?  It seems to be crashing in SvPVutf8, which
> means that either Perl passed something that's not an SV to a function
> declared to accept SVs, or that SvPVutf8 fails on some SVs.  Either
> way, Perl is failing to satisfy the POLA if you ask me.

Possibly, though I know nothing of Perl’s internals. Someone able to come up 
with a simple test case?

Thanks,

David




Re: [HACKERS] PL/Perl Does not Like vstrings

2012-01-04 Thread Andrew Dunstan



On 01/04/2012 11:15 AM, Tom Lane wrote:

> Andrew Dunstan and...@dunslane.net writes:
>> On 01/04/2012 12:47 AM, David E. Wheeler wrote:
>>> Oy, this doesn’t look good:
>>> $ do LANGUAGE plperl $$ elog(NOTICE, $^V) $$;
>>> The connection to the server was lost. Attempting reset: Succeeded.
>>
>> Try
>> elog(NOTICE, "$^V")
>
> Isn't this a Perl bug?  It seems to be crashing in SvPVutf8, which
> means that either Perl passed something that's not an SV to a function
> declared to accept SVs, or that SvPVutf8 fails on some SVs.  Either
> way, Perl is failing to satisfy the POLA if you ask me.




Um, not sure.

The docs (perldoc perlvar) seem to suggest $^V isn't an SV (i.e. a 
scalar) but some other sort of animal:



$^V The revision, version, and subversion of the Perl interpreter,
represented as a version object.

This variable first appeared in perl 5.6.0; earlier versions of
perl will see an undefined value. Before perl 5.10.0 $^V was
represented as a v-string.

$^V can be used to determine whether the Perl interpreter
executing a script is in the right range of versions. For
example:

warn "Hashes not randomized!\n" if !$^V or $^V lt v5.8.1

To convert $^V into its string representation use sprintf()'s
"%vd" conversion:

printf "version is v%vd\n", $^V; # Perl's version




But Util.xs::util_elog() expects an SV and doesn't check whether or not 
it actually has one. I've found a few other ways of crashing this call 
(e.g. by passing a typeglob), so maybe we need to test that we actually 
have an SV. I think SvOK() is what we'd use for that - perl gurus please 
confirm.


cheers

andrew



Re: [HACKERS] PL/Perl Does not Like vstrings

2012-01-04 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
> The docs (perldoc perlvar) seem to suggest $^V isn't an SV (i.e. a
> scalar) but some other sort of animal:

Yeah, it's a version object, but I'd have thought that SvPV and friends
would automatically stringify such an object.  Otherwise, practically
any kind of perl extension could be crashed by passing it one, no?

> But Util.xs::util_elog() expects an SV and doesn't check whether or not
> it actually has one. I've found a few other ways of crashing this call
> (e.g. by passing a typeglob), so maybe we need to test that we actually
> have an SV. I think SvOK() is what we'd use for that - perl gurus please
> confirm.

I looked at that last night but it appeared that SvOK would be perfectly
happy.  (Didn't actually try it, though, I was just eyeballing the flags
in gdb.)

regards, tom lane



Re: [HACKERS] Regarding Checkpoint Redo Record

2012-01-04 Thread Robert Haas
On Wed, Jan 4, 2012 at 11:02 AM, Simon Riggs si...@2ndquadrant.com wrote:
> On Wed, Jan 4, 2012 at 3:56 PM, Heikki Linnakangas
> heikki.linnakan...@enterprisedb.com wrote:
>> On 04.01.2012 08:42, Amit Kapila wrote:
>>> Why does PostgreSQL need to write a WAL record for a checkpoint when it
>>> maintains the same information in the pg_control file?
>>
>> I guess it wouldn't be strictly necessary...
>
> Apart from replicated standbys, which need that info for running
> restartpoints.

Yeah.

But, the OP makes me wonder: why can a standby only perform a
restartpoint where the master performed a checkpoint?  It seems like a
standby ought to be able to create a restartpoint anywhere, just by
writing everything, flushing it to disk, and updating pg_control.  I
assume there's some reason that doesn't work, I just don't know what
it is...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Standalone synchronous master

2012-01-04 Thread Robert Haas
On Wed, Jan 4, 2012 at 9:28 AM, Aidan Van Dyk ai...@highrise.ca wrote:
> I'd love a hook script that was run if sync-rep state ever changed
> (heck, I'd even like it if it just chose a new sync standby).

That seems useful.  I don't think the current code quite knows its own
state; we seem to have each walsender recompute who the boss is, and
if you query pg_stat_replication that redoes the same calculation.  I
can't shake the feeling that there's a better way... which would also
facilitate this.

> Even better, is there a way we could start injecting notify events
> into the cluster on these types of changes?  Especially now that
> notify events can take payloads, it means I don't have to keep
> constantly polling the database to see if it thinks it's connected,
> etc.

I like this idea, too.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



[HACKERS] pg_restore direct to database is broken for --insert dumps

2012-01-04 Thread Tom Lane
In http://archives.postgresql.org/pgsql-admin/2012-01/msg8.php
it's pointed out that recent versions of pg_restore fall over on
archives made with -Fc --inserts (or --column-inserts), but only when
restoring direct to database; if you ask for text output it's perfectly
fine.  Investigation shows that the problem is that individual INSERT
commands are being broken apart at arbitrary buffer boundaries --- you
don't see any problem in text output, but when the bufferloads are
submitted as separate PQexec calls, of course bad things happen.

I believe this worked okay up until my patch here:
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=6545a901a
which removed the mini SQL lexer in pg_backup_db.c.  I had supposed that
that had no useful function except to separate COPY data from
not-COPY-data, but in reality it had another function of ensuring that
INSERT commands split across zlib bufferload boundaries would get
reassembled before they are submitted to PQexec.

Not entirely sure what to do about this.  We could consider reverting
the aforesaid patch and trying to find another way of fixing that code's
failure to cope with standard-conforming strings, but I'm not sure that
there's a good way to know whether standard_conforming_strings is active
here, and anyway that code was ugly as sin and I'd rather not resurrect
it.  But on the other hand, there are no clear line boundaries in the
compressed data, and we can't introduce any without (a) worsening
compression and (b) breaking compatibility with existing dump files.

Anybody have any bright ideas?  I'm fresh out at the moment.

regards, tom lane



Re: [HACKERS] 16-bit page checksums for 9.2

2012-01-04 Thread Robert Haas
On Wed, Jan 4, 2012 at 8:31 AM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
>> When we reach a restartpoint, we fsync everything down to disk and
>> then nuke the double-write buffer.
>
> I think we add to the double-write buffer as we write pages from the
> buffer to disk.  I don't think it makes sense to do potentially
> repeated writes of the same page with different contents to the
> double-write buffer as we go; nor is it a good idea to leave the page
> unsynced and let the double-write buffer grow for a long time.

You may be right.  Currently, though, we only fsync() at
end-of-checkpoint.  So we'd have to think about what to fsync, and how
often, to keep the double-write buffer to a manageable size.  I can't
help thinking that any extra fsyncs are pretty expensive, though,
especially if you have to fsync() every file that's been
double-written before clearing the buffer.  Possibly we could have 2^N
separate buffers based on an N-bit hash of the relfilenode and segment
number, so that we could just fsync 1/(2^N)-th of the open files at a
time.  But even that sounds expensive: writing back lots of dirty data
isn't cheap.  One of the systems I've been doing performance testing
on can sometimes take 15 seconds to write a shutdown checkpoint, and
I'm sure that other people have similar (and worse) problems.
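
The 2^N idea might be as simple as the following sketch (all names
hypothetical; the multiplier is just a standard multiplicative-hash
constant):

#include <stdint.h>

#define DW_PARTITION_BITS	3			/* N */
#define DW_NPARTITIONS		(1 << DW_PARTITION_BITS)

/* Pick a double-write buffer from relfilenode and segment number, so one
 * fsync round only has to cover 1/(2^N)-th of the double-written files. */
static int
dw_partition(uint32_t relfilenode, uint32_t segno)
{
    uint32_t    h = (relfilenode * 2654435761u) ^ segno;

    return (int) (h & (DW_NPARTITIONS - 1));
}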

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [RFC] grants vs. inherited tables

2012-01-04 Thread Robert Haas
On Fri, Dec 30, 2011 at 4:25 AM, Marko Kreen mark...@gmail.com wrote:
>> I have the (hopefully wrong) impression that you're missing the fact
>> that it already exists, at least in 9.0.
>
> You are right, I missed it.  For quite obvious reason:
>
>  $ grep -ri aclexplode doc/
>  $
>
> Is there a good reason why it's undocumented?  Internal/unstable API?
> I better avoid it then.  But I would like to have this or similar
> function as part of public API.

I don't see any real reason why we couldn't document this one.  It
returns OIDs, but that's the name of the game if you're doing anything
non-trivial with PostgreSQL system catalogs.  Off-hand I'm not quite
sure which section of the documentation would be appropriate, though.
It looks like the functions we provide are mostly documented in
chapter 9, Functions and Operators.  Section 9.23 on System
Information Functions seems like it's probably the closest fit...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] 16-bit page checksums for 9.2

2012-01-04 Thread Kevin Grittner
Robert Haas robertmh...@gmail.com wrote:
 
> we only fsync() at end-of-checkpoint.  So we'd have to think about
> what to fsync, and how often, to keep the double-write buffer to a
> manageable size.
 
I think this is the big tuning challenge with this technology.
 
> I can't help thinking that any extra fsyncs are pretty expensive,
> though, especially if you have to fsync() every file that's been
> double-written before clearing the buffer. Possibly we could have
> 2^N separate buffers based on an N-bit hash of the relfilenode and
> segment number, so that we could just fsync 1/(2^N)-th of the open
> files at a time.
 
I'm not sure I'm following -- we would just be fsyncing those files
we actually wrote pages into, right?  Not all segments for the table
involved?
 
> But even that sounds expensive: writing back lots of dirty data
> isn't cheap.  One of the systems I've been doing performance
> testing on can sometimes take 15 seconds to write a shutdown
> checkpoint,
 
Consider the relation-file fsyncs for double-write as a form of
checkpoint spreading, and maybe it won't seem so bad.  It should
make that shutdown checkpoint less painful.  Now, I have been
thinking that on a write-heavy system you had better have a BBU
write-back cache, but that's my recommendation, anyway.
 
> and I'm sure that other people have similar (and worse) problems.
 
Well, I have no doubt that this feature should be optional.  Those
who prefer can continue to do full-page writes to the WAL, instead. 
Or take the "running with scissors" approach.
 
-Kevin



Re: [HACKERS] Setting -Werror in CFLAGS

2012-01-04 Thread Robert Haas
On Tue, Jan 3, 2012 at 9:23 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> Robert Haas robertmh...@gmail.com writes:
>> On Tue, Jan 3, 2012 at 7:39 PM, Peter Geoghegan pe...@2ndquadrant.com
>> wrote:
>>> Yes, I know that these only appeared in GCC 4.6+ and as such are a
>>> relatively recent phenomenon, but there has been some effort to
>>> eliminate them, and if I could get a non-hacked -Werror build I'd feel
>>> happy enough about excluding them as already outlined.
>
>> I just do this:
>> echo COPT=-Werror >> src/Makefile.custom
>> ...which seems to work reasonably well.
>
> I see no point in -Werror whatsoever.  If you aren't examining the make
> output for warnings, you're not following proper development practice
> IMO.

I find -Werror to be a convenient way to examine the output for
warnings.  Otherwise they scroll off the screen.  Yeah, I could save
the output to a file and grep it afterwards, but that seems less
convenient.  I'm clearly not the only one doing it this way, since
src/backend/parser/gram.o manually sticks in -Wno-error...

> gcc is not the only tool we use in the build process, so if you
> are relying on -Werror to call attention to everything you should be
> worrying about, you lost already.

Hmm, I guess.  I've never had a problem with anything else that I can
remember, though.

> I'm also less than thrilled with the idea that whatever the gcc boys
> decide to make a warning tomorrow will automatically become a MUST FIX
> NOW for us.

I'm not thrilled about that either.  Especially since they seem to be
adding more and more warnings that are harder and harder to work
around for issues that are less and less important.  Unimportant
warnings that are easily avoidable are not so bad, but...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Setting -Werror in CFLAGS

2012-01-04 Thread Peter Geoghegan
On 4 January 2012 18:44, Robert Haas robertmh...@gmail.com wrote:
> On Tue, Jan 3, 2012 at 9:23 PM, Tom Lane t...@sss.pgh.pa.us wrote:
>> I'm also less than thrilled with the idea that whatever the gcc boys
>> decide to make a warning tomorrow will automatically become a MUST FIX
>> NOW for us.
>
> I'm not thrilled about that either.  Especially since they seem to be
> adding more and more warnings that are harder and harder to work
> around for issues that are less and less important.  Unimportant
> warnings that are easily avoidable are not so bad, but...

I'd have a certain amount of sympathy for that view. It took building
with Clang to notice that we incorrectly used one enum rvalue to
assign to a variable of another enum type, which I thought was a
little bit surprising; I'd have expected GCC to catch that one, since
it is pretty likely to be valid.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services



Re: [HACKERS] Page Checksums + Double Writes

2012-01-04 Thread Jim Nasby

On Dec 23, 2011, at 2:23 PM, Kevin Grittner wrote:

> Jeff Janes jeff.ja...@gmail.com wrote:
>
>> Could we get some major OLTP users to post their CLOG for
>> analysis?  I wouldn't think there would be many
>> security/proprietary issues with CLOG data.
>
> FWIW, I got the raw numbers to do my quick check using this Ruby
> script (put together for me by Peter Brant).  If it is of any use to
> anyone else, feel free to use it and/or post any enhanced versions
> of it.

Here's output from our largest OLTP system... not sure exactly how to interpret 
it, so I'm just providing the raw data. This spans almost exactly 1 month.

I have a number of other systems I can profile if anyone's interested.

063A 379
063B 143
063C 94
063D 94
063E 326
063F 113
0640 122
0641 270
0642 81
0643 390
0644 183
0645 76
0646 61
0647 50
0648 275
0649 288
064A 126
064B 53
064C 59
064D 125
064E 357
064F 92
0650 54
0651 83
0652 267
0653 328
0654 118
0655 75
0656 104
0657 280
0658 414
0659 105
065A 74
065B 153
065C 303
065D 63
065E 216
065F 169
0660 113
0661 405
0662 85
0663 52
0664 44
0665 78
0666 412
0667 116
0668 48
0669 61
066A 66
066B 364
066C 104
066D 48
066E 68
066F 104
0670 465
0671 158
0672 64
0673 62
0674 115
0675 452
0676 296
0677 65
0678 80
0679 177
067A 316
067B 86
067C 87
067D 270
067E 84
067F 295
0680 299
0681 88
0682 35
0683 67
0684 66
0685 456
0686 146
0687 52
0688 33
0689 73
068A 147
068B 345
068C 107
068D 67
068E 50
068F 97
0690 473
0691 156
0692 47
0693 57
0694 97
0695 550
0696 224
0697 51
0698 80
0699 280
069A 115
069B 426
069C 241
069D 395
069E 98
069F 130
06A0 523
06A1 296
06A2 92
06A3 97
06A4 122
06A5 524
06A6 256
06A7 118
06A8 111
06A9 157
06AA 553
06AB 166
06AC 106
06AD 103
06AE 200
06AF 621
06B0 288
06B1 95
06B2 107
06B3 227
06B4 92
06B5 447
06B6 210
06B7 364
06B8 119
06B9 113
06BA 384
06BB 319
06BC 45
06BD 68
06BE 2
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net





Re: [HACKERS] Setting -Werror in CFLAGS

2012-01-04 Thread Heikki Linnakangas

On 04.01.2012 20:44, Robert Haas wrote:

> On Tue, Jan 3, 2012 at 9:23 PM, Tom Lane t...@sss.pgh.pa.us wrote:
>> Robert Haas robertmh...@gmail.com writes:
>>> On Tue, Jan 3, 2012 at 7:39 PM, Peter Geoghegan pe...@2ndquadrant.com wrote:
>>>> Yes, I know that these only appeared in GCC 4.6+ and as such are a
>>>> relatively recent phenomenon, but there has been some effort to
>>>> eliminate them, and if I could get a non-hacked -Werror build I'd feel
>>>> happy enough about excluding them as already outlined.
>>
>>> I just do this:
>>> echo COPT=-Werror >> src/Makefile.custom
>>> ...which seems to work reasonably well.
>>
>> I see no point in -Werror whatsoever.  If you aren't examining the make
>> output for warnings, you're not following proper development practice
>> IMO.
>
> I find -Werror to be a convenient way to examine the output for
> warnings.  Otherwise they scroll off the screen.  Yeah, I could save
> the output to a file and grep it afterwards, but that seems less
> convenient.  I'm clearly not the only one doing it this way, since
> src/backend/parser/gram.o manually sticks in -Wno-error...


I use make -s.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Setting -Werror in CFLAGS

2012-01-04 Thread Andrew Dunstan



On 01/04/2012 02:35 PM, Heikki Linnakangas wrote:

> On 04.01.2012 20:44, Robert Haas wrote:
>> On Tue, Jan 3, 2012 at 9:23 PM, Tom Lane t...@sss.pgh.pa.us wrote:
>>> Robert Haas robertmh...@gmail.com writes:
>>>> On Tue, Jan 3, 2012 at 7:39 PM, Peter Geoghegan pe...@2ndquadrant.com wrote:
>>>>> Yes, I know that these only appeared in GCC 4.6+ and as such are a
>>>>> relatively recent phenomenon, but there has been some effort to
>>>>> eliminate them, and if I could get a non-hacked -Werror build I'd feel
>>>>> happy enough about excluding them as already outlined.
>>>
>>>> I just do this:
>>>> echo COPT=-Werror >> src/Makefile.custom
>>>> ...which seems to work reasonably well.
>>>
>>> I see no point in -Werror whatsoever.  If you aren't examining the make
>>> output for warnings, you're not following proper development practice
>>> IMO.
>>
>> I find -Werror to be a convenient way to examine the output for
>> warnings.  Otherwise they scroll off the screen.  Yeah, I could save
>> the output to a file and grep it afterwards, but that seems less
>> convenient.  I'm clearly not the only one doing it this way, since
>> src/backend/parser/gram.o manually sticks in -Wno-error...
>
> I use make -s.


Yeah, that's a good thing to do.

We are by far the most vigilant project I am aware of about fixing 
warnings. That's a Good Thing (tm). Build most FOSS software and you 
see huge numbers of warnings fly by. It can get quite distressing.


We turn the errors off for gram.o precisely because we can't control it, 
since the included source file scan.c is generated by flex.


cheers

andrew






--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Page Checksums + Double Writes

2012-01-04 Thread Kevin Grittner
Jim Nasby j...@nasby.net wrote:
 
 Here's output from our largest OLTP system... not sure exactly how
 to interpret it, so I'm just providing the raw data. This spans
 almost exactly 1 month.
 
Those numbers wind up meaning that 18% of the 256-byte blocks (at two
status bits per transaction, 1024 transactions each) were all commits.
Yikes.  That pretty much shoots down Robert's idea of summarized CLOG
data, I think.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 16-bit page checksums for 9.2

2012-01-04 Thread Robert Haas
On Wed, Jan 4, 2012 at 1:32 PM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 Robert Haas robertmh...@gmail.com wrote:
 we only fsync() at end-of-checkpoint.  So we'd have to think about
 what to fsync, and how often, to keep the double-write buffer to a
 manageable size.

 I think this is the big tuning challenge with this technology.

One of them, anyway.  I think it may also be tricky to make sure that
a backend that needs to write a dirty buffer doesn't end up having to
wait for a double-write to be fsync'd.

 I can't help thinking that any extra fsyncs are pretty expensive,
 though, especially if you have to fsync() every file that's been
 double-written before clearing the buffer. Possibly we could have
 2^N separate buffers based on an N-bit hash of the relfilenode and
 segment number, so that we could just fsync 1/(2^N)-th of the open
 files at a time.

 I'm not sure I'm following -- we would just be fsyncing those files
 we actually wrote pages into, right?  Not all segments for the table
 involved?

Yes.
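
(A toy reading of the 2^N-buffers idea quoted above -- a sketch only;
the hash is invented for illustration and nothing here is from a patch:)

#include <stdint.h>

#define DW_BUFFER_BITS 3    /* N: 2^3 = 8 separate double-write buffers */

/* Route a page to a double-write buffer by relfilenode and segment
 * number; fsyncing one bucket then touches only the files whose pages
 * hashed to it.  The multiplier is just Knuth's hash constant. */
static int
dw_bucket(uint32_t relfilenode, uint32_t segno)
{
    uint32_t h = (relfilenode * 2654435761u) ^ segno;

    return (int) (h & ((1u << DW_BUFFER_BITS) - 1));
}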

 But even that sounds expensive: writing back lots of dirty data
 isn't cheap.  One of the systems I've been doing performance
 testing on can sometimes take 15 seconds to write a shutdown
 checkpoint,

 Consider the relation-file fsyncs for double-write as a form of
 checkpoint spreading, and maybe it won't seem so bad.  It should
 make that shutdown checkpoint less painful.  Now, I have been
 thinking that on a write-heavy system you had better have a BBU
 write-back cache, but that's my recommendation, anyway.

I think this point has possibly been beaten to death, but at the risk
of belaboring the point I'll bring it up again: the frequency with
which we fsync() is basically a trade-off between latency and
throughput.  If you fsync a lot, then each one will be small, so you
shouldn't experience much latency, but throughput might suck.  If you
don't fsync very much, then you maximize the chances for
write-combining (because inserting an fsync between two writes to the
same block forces that block to be physically written twice rather
than just once) thus improving throughput, but when you do get around
to calling fsync() there may be a lot of data to write all at once,
and you may get a gigantic latency spike.

As far as I can tell, one fsync per checkpoint is the theoretical
minimum, and that's what we do now.  So our current system is
optimized for throughput.  The decision to put full-page images into
WAL rather than a separate buffer is essentially turning the dial in
the same direction, because, in effect, the double-write fsync
piggybacks on the WAL fsync which we must do anyway.  So both the
decision to use a double-write buffer AT ALL and the decision to fsync
more frequently to keep that buffer to a manageable size are going to
result in turning that dial in the opposite direction.  It seems to me
inevitable that, even with the best possible implementation,
throughput will get worse.  With a good implementation (though not with
a bad one), latency should improve.

Now, this is not necessarily a reason to reject the idea.  I believe
that several people have proposed that our current implementation is
*overly* optimized for throughput *at the expense of* latency, and
that we might want to provide some options that, in one way or
another, fsync more frequently, so that checkpoint spikes aren't as
bad.  But when it comes time to benchmark, we might need to think
somewhat carefully about what we're testing...
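
(To make that dial concrete, here is a small standalone toy -- not from
any patch in this thread; the file name and FSYNC_EVERY are invented.
Lowering FSYNC_EVERY gives small, predictable flushes; raising it lets
repeated writes to the same hot block combine before the one flush we
cannot avoid.)

#include <sys/types.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define NWRITES      1024
#define BLOCK_SIZE   8192
#define FSYNC_EVERY  128     /* 1 favors latency, NWRITES favors throughput */

int
main(void)
{
    char  block[BLOCK_SIZE];
    int   fd = open("dwtest.dat", O_CREAT | O_WRONLY, 0600);
    int   i;

    if (fd < 0)
    {
        perror("open");
        return 1;
    }
    memset(block, 'x', sizeof(block));

    for (i = 0; i < NWRITES; i++)
    {
        /* keep rewriting a small set of hot blocks, checkpoint-style */
        off_t off = (off_t) (i % 16) * BLOCK_SIZE;

        if (pwrite(fd, block, BLOCK_SIZE, off) != BLOCK_SIZE)
        {
            perror("pwrite");
            return 1;
        }
        if ((i + 1) % FSYNC_EVERY == 0)
            fsync(fd);
    }
    fsync(fd);               /* the flush we cannot avoid */
    close(fd);
    return 0;
}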

Another thought here is that double-writes may not be the best
solution, and are almost certainly not the easiest-to-implement
solution.  We could instead do something like this: when an unlogged
change is made to a buffer (e.g. a hint bit is set), we set a flag on
the buffer header.  When we evict such a buffer, we emit a WAL record
that just overwrites the whole buffer with a new FPI.  There are some
pretty obvious usage patterns where this is likely to be painful (e.g.
load a big table without setting hint bits, and then seq-scan it).
But there are also many use cases where the working set fits inside
shared buffers and data pages don't get written very often, apart from
checkpoint time, and those cases might work just fine.  Also, the
cases that are problems for this implementation are likely to also be
problems for a double-write based implementation, for exactly the same
reasons: if you discover at buffer eviction time that you need to
fsync something (whether it's WAL or DW), it's going to hurt.
Checksums aren't free even when using double-writes: if you don't have
checksums, pages that have only hint bit-changes don't need to be
double-written.  If double writes aren't going to give us anything
for free, maybe that's not the right place to be focusing our
efforts...
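
(A minimal toy sketch of that evict-time-FPI idea; names like
BM_HINT_DIRTY and log_full_page_image() are invented for illustration,
and this is not how bufmgr.c is actually structured:)

#include <stdint.h>

#define PAGE_SIZE     8192
#define BM_DIRTY      0x01   /* page has WAL-logged changes */
#define BM_HINT_DIRTY 0x02   /* page has only unlogged (hint bit) changes */

typedef struct ToyBuffer
{
    uint32_t flags;
    char     page[PAGE_SIZE];
} ToyBuffer;

/* Invented stand-in for emitting a full-page image to WAL. */
static void
log_full_page_image(const char *page)
{
    (void) page;             /* real code would XLogInsert() an FPI here */
}

/* Called wherever a hint bit is set without writing WAL. */
static void
mark_hint_dirty(ToyBuffer *buf)
{
    buf->flags |= BM_HINT_DIRTY;
}

/* At eviction: if only hints changed, WAL-log a fresh FPI first, so a
 * torn write can never leave a page whose checksum fails. */
static void
evict_buffer(ToyBuffer *buf)
{
    if (buf->flags & BM_HINT_DIRTY)
    {
        log_full_page_image(buf->page);
        buf->flags &= ~BM_HINT_DIRTY;
    }
    /* ... write the page out and clear BM_DIRTY ... */
}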

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:

Re: [HACKERS] PL/Perl Does not Like vstrings

2012-01-04 Thread Andrew Dunstan



On 01/04/2012 12:56 PM, Tom Lane wrote:

Andrew Dunstan and...@dunslane.net writes:

The docs (perldoc perlvar) seem to suggest $^V isn't an SV (i.e. a
scalar) but some other sort of animal:

Yeah, it's a version object, but I'd have thought that SvPV and friends
would automatically stringify such an object.  Otherwise, practically
any kind of perl extension could be crashed by passing it one, no?


But Util.xs::util_elog() expects an SV and doesn't check whether or not
it actually has one. I've found a few other ways of crashing this call
(e.g. by passing a typeglob), so maybe we need to test that we actually
have an SV. I think SvOK() is what we'd use for that - perl gurus please
confirm.

I looked at that last night but it appeared that SvOK would be perfectly
happy.  (Didn't actually try it, though, I was just eyeballing the flags
in gdb.)





I tested it and you're right, it doesn't help. I don't see what else we 
can do about it. There doesn't appear to be any test for an SV in the API.


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Avoid FK validations for no-rewrite ALTER TABLE ALTER TYPE

2012-01-04 Thread Noah Misch
Git master can already avoid rewriting the table for column type changes like
varchar(10) -> varchar(20).  However, if the column in question is on either
side of a FK relationship, we always revalidate the foreign key.  Concretely,
I wanted these no-rewrite type changes to also assume FK validity:
- Any typmod-only change
- text -> varchar
- domain -> base type

To achieve that, this patch skips the revalidation when two conditions hold:

(a) Old and new pg_constraint.conpfeqop match exactly.  This is actually
stronger than needed; we could loosen things by way of operator families.
However, no core type would benefit, and my head exploded when I tried to
define the more-generous test correctly.

(b) The functions, if any, implementing a cast from the foreign type to the
primary opcintype are the same.  For this purpose, we can consider a binary
coercion equivalent to an exact type match.  When the opcintype is
polymorphic, require that the old and new foreign types match exactly.  (Since
ri_triggers.c does use the executor, the stronger check for polymorphic types
is no mere future-proofing.  However, no core type exercises its necessity.)

These follow from the rules used to decide when to rebuild an index.  I
further justify them in source comments.

To implement this, I have ATPostAlterTypeParse() stash the content of the old
pg_constraint.conpfeqop in the Constraint node.  ATAddForeignKeyConstraint()
notices that and evaluates the above rules.  If they both pass, it omits the
validation step just as though skip_validation had been given.

Thanks,
nm
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 3b52415..0cde503 100644
*** a/src/backend/commands/tablecmds.c
--- b/src/backend/commands/tablecmds.c
***
*** 276,281  static Oid transformFkeyCheckAttrs(Relation pkrel,
--- 276,282 
int numattrs, int16 *attnums,
Oid *opclasses);
  static void checkFkeyPermissions(Relation rel, int16 *attnums, int natts);
+ static CoercionPathType findFkeyCast(Oid targetTypeId, Oid sourceTypeId, Oid *funcid);
  static void validateCheckConstraint(Relation rel, HeapTuple constrtup);
  static void validateForeignKeyConstraint(char *conname,
 Relation rel, Relation 
pkrel,
***
*** 357,362  static void ATPostAlterTypeCleanup(List **wqueue, 
AlteredTableInfo *tab, LOCKMOD
--- 358,364 
  static void ATPostAlterTypeParse(Oid oldId, char *cmd,
 List **wqueue, LOCKMODE lockmode, bool 
rewrite);
  static void TryReuseIndex(Oid oldId, IndexStmt *stmt);
+ static void TryReuseForeignKey(Oid oldId, Constraint *con);
  static void change_owner_fix_column_acls(Oid relationOid,
 Oid oldOwnerId, Oid 
newOwnerId);
  static void change_owner_recurse_to_sequences(Oid relationOid,
***
*** 5696,5701  ATAddForeignKeyConstraint(AlteredTableInfo *tab, Relation 
rel,
--- 5698,5705 
numpks;
Oid indexOid;
Oid constrOid;
+   bool        old_check_ok;
+   ListCell   *old_pfeqop_item = list_head(fkconstraint->old_conpfeqop);
  
/*
 * Grab an exclusive lock on the pk table, so that someone doesn't 
delete
***
*** 5812,5817  ATAddForeignKeyConstraint(AlteredTableInfo *tab, Relation 
rel,
--- 5816,5828 
(errcode(ERRCODE_INVALID_FOREIGN_KEY),
 errmsg("number of referencing and referenced columns for foreign key disagree")));
  
+   /*
+* On the strength of a previous constraint, we might avoid scanning
+* tables to validate this one.  See below.
+*/
+   old_check_ok = (fkconstraint->old_conpfeqop != NIL);
+   Assert(!old_check_ok || numfks == list_length(fkconstraint->old_conpfeqop));
+ 
for (i = 0; i < numpks; i++)
{
Oid pktype = pktypoid[i];
***
*** 5826,5831  ATAddForeignKeyConstraint(AlteredTableInfo *tab, Relation 
rel,
--- 5837,5843 
Oid ppeqop;
Oid ffeqop;
int16   eqstrategy;
+   Oid pfeqop_right;
  
/* We need several fields out of the pg_opclass entry */
cla_ht = SearchSysCache1(CLAOID, 
ObjectIdGetDatum(opclasses[i]));
***
*** 5868,5877  ATAddForeignKeyConstraint(AlteredTableInfo *tab, Relation 
rel,
pfeqop = get_opfamily_member(opfamily, opcintype, fktyped,
 
eqstrategy);
if (OidIsValid(pfeqop))
ffeqop = 

Re: [HACKERS] PL/Perl Does not Like vstrings

2012-01-04 Thread Alex Hunsaker
On Wed, Jan 4, 2012 at 13:13, Andrew Dunstan and...@dunslane.net wrote:


 On 01/04/2012 12:56 PM, Tom Lane wrote:

 I looked at that last night but it appeared that SvOK would be perfectly
 happy.  (Didn't actually try it, though, I was just eyeballing the flags
 in gdb.)


 I tested it and you're right, it doesn't help. I don't see what else we can
 do about it. There doesn't appear to be any test for an SV in the API.

I think about the best we can do is something along the lines of:

sv2cstr()
{
...
    if (Perl_vverify(sv))
        return utf_u2e(SvPV(sv));
...
}

I don't think the utf_u2e is strictly needed (other than that it
strdups), as I don't think versions can have utf8 chars, or at least
that $^V will not have utf8 chars (and even if it did, it would only
cause problems if they had codepoints between 128 and 255).

We would still have issues with typeglobs...

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Page Checksums + Double Writes

2012-01-04 Thread Robert Haas
On Wed, Jan 4, 2012 at 3:02 PM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 Jim Nasby j...@nasby.net wrote:
 Here's output from our largest OLTP system... not sure exactly how
 to interpret it, so I'm just providing the raw data. This spans
 almost exactly 1 month.

 Those numbers wind up meaning that 18% of the 256-byte blocks (1024
 transactions each) were all commits.  Yikes.  That pretty much
 shoots down Robert's idea of summarized CLOG data, I think.

I'm not *totally* certain of that... another way to look at it is that
I have to be able to show a win even if only 18% of the probes into
the summarized data are successful, which doesn't seem totally out of
the question given how cheap I think lookups could be.  But I'll admit
it's not real encouraging.

I think the first thing we need to look at is increasing the number of
CLOG buffers.  Even if hypothetical summarized CLOG data had a 60% hit
rate rather than 18%, 8 CLOG buffers is probably still not going to be
enough for a 32-core system, let alone anything larger.  I am aware of
two concerns here:

1. Unconditionally adding more CLOG buffers will increase PostgreSQL's
minimum memory footprint, which is bad for people suffering under
default shared memory limits or running a database on a device with
less memory than a low-end cell phone.

2. The CLOG code isn't designed to manage a large number of buffers,
so adding more might cause a performance regression on small systems.

On Nate Boley's 32-core system, running pgbench at scale factor 100,
the optimal number of buffers seems to be around 32.  I'd like to get
some test results from smaller systems - any chance you (or anyone)
have, say, an 8-core box you could test on?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] controlling the location of server-side SSL files

2012-01-04 Thread Robert Haas
On Tue, Jan 3, 2012 at 9:38 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Tue, Jan 3, 2012 at 6:25 PM, Peter Eisentraut pete...@gmx.net wrote:
 [ reasons ]

 I agree with these reasons.  We don't get charged $0.50 per GUC, so
 there's no particular reason to contort things to have fewer of them.

 Well, there definitely is a distributed cost to each additional GUC.
 Peter's given what are probably adequate reasons to add several of them
 here, but that doesn't mean we should not ask the question whether each
 new GUC is really necessary.

No argument.  I'm merely saying that I think the rationale for these
GUCs is solid enough to justify their existence.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] BUG #6379: SQL Function Causes Back-end Crash

2012-01-04 Thread Paul Ramsey
Further notes, from Andrew (RhodiumToad) on IRC about the cause of this crasher:

[12:03pm] RhodiumToad: what happens is this
[12:04pm] RhodiumToad: postquel_start knows this statement doesn't
return the result, so it supplies None_Receiver as the dest-receiver
for the query
[12:04pm] RhodiumToad: however, it knows it's a plannedStmt, so it
fires up the full executor to run it
[12:05pm] RhodiumToad: and the executor allocates a new destreceiver
in its own memory context, replaces es->qd->dest with it,
[12:05pm] RhodiumToad: (the new destreceiver is the one that writes
tuples to the created table)
[12:06pm] RhodiumToad: then at executorEnd (called from postquel_end),
executor shutdown closes the new rel, _and then frees the executor's
memory context, including the destreceiver it created
[12:07pm] RhodiumToad: postquel_end doesn't know that its setting of
->dest was clobbered, so it goes to try and destroy it again, and gets
garbage (if assertions are on)
[12:07pm] RhodiumToad: if assertions weren't on, then the rDestroy
call is harmless
[12:07pm] RhodiumToad: well, mostly harmless
[12:07pm] RhodiumToad: sneaky one, that
[12:09pm] RhodiumToad: you can confirm it by tracing through that
second call to postquel_end and confirming that it's the call to
ExecutorEnd that stomps the content of qd->dest
[12:12pm] pramsey: confirmed, the pass through ExecutorEnd has
clobbered the value so there's garbage when it arrives at line 638
[12:14pm] RhodiumToad: if you trace through ExecutorEnd itself, it
should be the FreeExecutorState that does it
[12:15pm] RhodiumToad: wonder how far back this bug goes
[12:16pm] RhodiumToad: actually not very far
[12:16pm] RhodiumToad: older versions just figured that qd->dest was
always None_Receiver and therefore did not need an rDestroy call
[12:17pm] RhodiumToad: (which is a no-op for None_Receiver)
[12:17pm] pramsey: kills my 8.4
[12:17pm] RhodiumToad: so this is broken in 8.4+
[12:17pm] pramsey: ah
[12:18pm] RhodiumToad: 8.4 introduced the lazy-eval of selects in sql functions
[12:19pm] RhodiumToad: prior to that they were always run immediately
to completion
[12:19pm] RhodiumToad: that requires juggling the destreceiver a bit,
hence the bug
[12:20pm] RhodiumToad: btw, the first statement of the function
shouldn't be needed
[12:21pm] RhodiumToad: just  ... as $f$ create table foo as select 1
as x; $f$;  should be enough to break it
[12:31pm] RhodiumToad: there's no trivial fix


On Wed, Jan 4, 2012 at 11:32 AM, Paul Ramsey pram...@cleverelephant.ca wrote:
 One extra detail, my PostgreSQL is compiled with --enable-cassert.
 This is required to set off the killer function.

 On Wed, Jan 04, 2012 at 07:17:17PM +0000, pram...@cleverelephant.ca wrote:
 The following bug has been logged on the website:

 Bug reference:      6379
 Logged by:          Paul Ramsey
 Email address:      pram...@cleverelephant.ca
 PostgreSQL version: 9.1.2
 Operating system:   OSX 10.6.8
 Description:

 CREATE OR REPLACE FUNCTION kill_backend()
 RETURNS VOID
 AS $$
   DROP TABLE if EXISTS foo;
   CREATE TABLE foo AS SELECT * FROM pg_class LIMIT 1;
 $$ LANGUAGE 'sql';

 SELECT kill_backend();


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Avoid FK validations for no-rewrite ALTER TABLE ALTER TYPE

2012-01-04 Thread Noah Misch
I neglected to commit after revising the text of a few comments; use this
version instead.  Thanks.
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 3b52415..9eba8e8 100644
*** a/src/backend/commands/tablecmds.c
--- b/src/backend/commands/tablecmds.c
***
*** 276,281  static Oid transformFkeyCheckAttrs(Relation pkrel,
--- 276,282 
int numattrs, int16 *attnums,
Oid *opclasses);
  static void checkFkeyPermissions(Relation rel, int16 *attnums, int natts);
+ static CoercionPathType findFkeyCast(Oid targetTypeId, Oid sourceTypeId, Oid *funcid);
  static void validateCheckConstraint(Relation rel, HeapTuple constrtup);
  static void validateForeignKeyConstraint(char *conname,
 Relation rel, Relation 
pkrel,
***
*** 357,362  static void ATPostAlterTypeCleanup(List **wqueue, 
AlteredTableInfo *tab, LOCKMOD
--- 358,364 
  static void ATPostAlterTypeParse(Oid oldId, char *cmd,
 List **wqueue, LOCKMODE lockmode, bool 
rewrite);
  static void TryReuseIndex(Oid oldId, IndexStmt *stmt);
+ static void TryReuseForeignKey(Oid oldId, Constraint *con);
  static void change_owner_fix_column_acls(Oid relationOid,
 Oid oldOwnerId, Oid 
newOwnerId);
  static void change_owner_recurse_to_sequences(Oid relationOid,
***
*** 5696,5701  ATAddForeignKeyConstraint(AlteredTableInfo *tab, Relation 
rel,
--- 5698,5705 
numpks;
Oid indexOid;
Oid constrOid;
+   bool        old_check_ok;
+   ListCell   *old_pfeqop_item = list_head(fkconstraint->old_conpfeqop);
  
/*
 * Grab an exclusive lock on the pk table, so that someone doesn't 
delete
***
*** 5812,5817  ATAddForeignKeyConstraint(AlteredTableInfo *tab, Relation 
rel,
--- 5816,5828 
(errcode(ERRCODE_INVALID_FOREIGN_KEY),
 errmsg("number of referencing and referenced columns for foreign key disagree")));
  
+   /*
+* On the strength of a previous constraint, we might avoid scanning
+* tables to validate this one.  See below.
+*/
+   old_check_ok = (fkconstraint->old_conpfeqop != NIL);
+   Assert(!old_check_ok || numfks == list_length(fkconstraint->old_conpfeqop));
+ 
for (i = 0; i < numpks; i++)
{
Oid pktype = pktypoid[i];
***
*** 5826,5831  ATAddForeignKeyConstraint(AlteredTableInfo *tab, Relation 
rel,
--- 5837,5843 
Oid ppeqop;
Oid ffeqop;
int16   eqstrategy;
+   Oid pfeqop_right;
  
/* We need several fields out of the pg_opclass entry */
cla_ht = SearchSysCache1(CLAOID, 
ObjectIdGetDatum(opclasses[i]));
***
*** 5868,5877  ATAddForeignKeyConstraint(AlteredTableInfo *tab, Relation 
rel,
pfeqop = get_opfamily_member(opfamily, opcintype, fktyped,
 
eqstrategy);
if (OidIsValid(pfeqop))
ffeqop = get_opfamily_member(opfamily, fktyped, fktyped,

 eqstrategy);
else
!   ffeqop = InvalidOid;/* keep compiler quiet */
  
if (!(OidIsValid(pfeqop) && OidIsValid(ffeqop)))
{
--- 5880,5896 
pfeqop = get_opfamily_member(opfamily, opcintype, fktyped,
 
eqstrategy);
if (OidIsValid(pfeqop))
+   {
+   pfeqop_right = fktyped;
ffeqop = get_opfamily_member(opfamily, fktyped, fktyped,

 eqstrategy);
+   }
else
!   {
!   /* keep compiler quiet */
!   pfeqop_right = InvalidOid;
!   ffeqop = InvalidOid;
!   }
  
if (!(OidIsValid(pfeqop) && OidIsValid(ffeqop)))
{
***
*** 5893,5899  ATAddForeignKeyConstraint(AlteredTableInfo *tab, Relation 
rel,
--- 5912,5921 
target_typeids[1] = opcintype;
if (can_coerce_type(2, input_typeids, target_typeids,

COERCION_IMPLICIT))
+   {

Re: [HACKERS] 16-bit page checksums for 9.2

2012-01-04 Thread Kevin Grittner
Robert Haas robertmh...@gmail.com wrote:
 
 I think it may also be tricky to make sure that a backend that
 needs to write a dirty buffer doesn't end up having to wait for a
 double-write to be fsync'd.
 
This and other parts of your post seem to ignore the BBU write-back
cache.  Multiple fsyncs of a single page can still be collapsed at
that level to a single actual disk write.  In fact, I rather doubt
this technology will look very good on machines without write-back
caching.  I'm not as sure as you are that this is a net loss in
throughput, either.  When the fsync storm clogs the RAID controller,
even reads stall, so something which more evenly pushes writes to
disk might avoid these non-productive pauses.  I think that could
improve throughput enough to balance or exceed the other effects. 
Maybe.  I agree we need to be careful to craft a good set of
benchmarks here.
 
 Checksums aren't free even when using double-writes: if you don't
 have checksums, pages that have only hint bit-changes don't need
 to be double-written.
 
Agreed.  Checksums aren't expected to be free under any
circumstances.  I'm expecting DW to be slightly faster than FPW in
general, with or without in-page checksums.
 
 If double writes aren't going to give us anything for free,
 maybe that's not the right place to be focusing our
 efforts...
 
I'm not sure why it's not enough that they improve performance over
the alternative.  Making some other feature with obvious overhead
free seems an odd requirement to hang on this.  (Maybe I'm
misunderstanding you on that point?)
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PL/Perl Does not Like vstrings

2012-01-04 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 On 01/04/2012 12:56 PM, Tom Lane wrote:
 I looked at that last night but it appeared that SvOK would be perfectly
 happy.  (Didn't actually try it, though, I was just eyeballing the flags
 in gdb.)

 I tested it and you're right, it doesn't help. I don't see what else we 
 can do about it. There doesn't appear to be any test for an SV in the API.

I think what's being passed *is* an SV --- at least, the contents look
reasonable in gdb --- but for some reason SvPVutf8 isn't coping with
this particular kind of SV.  Googling suggests that SvPVutf8 used to
fail on READONLY SVs, of which this is one if I'm reading the flag bits
correctly; but that was supposedly fixed years ago.  I believe we've hit
some other undocumented limitation of that function, which the Perl guys
may or may not acknowledge as a bug once we've tracked it down better.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Page Checksums + Double Writes

2012-01-04 Thread Kevin Grittner
Robert Haas robertmh...@gmail.com wrote:
 
 2. The CLOG code isn't designed to manage a large number of
 buffers, so adding more might cause a performance regression on
 small systems.
 
 On Nate Boley's 32-core system, running pgbench at scale factor
 100, the optimal number of buffers seems to be around 32.  I'd
 like to get some test results from smaller systems - any chance
 you (or anyone) have, say, an 8-core box you could test on?
 
Hmm.  I can think of a lot of 4-core servers I could test on.  (We
have a few poised to go into production where it would be relatively
easy to do benchmarking without distorting factors right now.) 
After that we jump to 16 cores, unless I'm forgetting something. 
These are currently all in production, but some of them are
redundant machines which could be pulled for a few hours here and
there for benchmarks.  If either of those seem worthwhile, please
spec the useful tests so I can capture the right information.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Regarding Checkpoint Redo Record

2012-01-04 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 But, the OP makes me wonder: why can a standby only perform a
 restartpoint where the master performed a checkpoint?  It seems like a
 standby ought to be able to create a restartpoint anywhere, just by
 writing everything, flushing it to disk, and updating pg_control.

Perhaps, but then crash restarts would have to accept start pointers
that point at any random place in the WAL.  I like the additional error
 checking of verifying that there's a checkpoint record there.  Also
I think the full-page-write mechanism would no longer protect against
torn pages during replay if you did that.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] BUG #6379: SQL Function Causes Back-end Crash

2012-01-04 Thread Tom Lane
Paul Ramsey pram...@cleverelephant.ca writes:
 Further notes, from Andrew (RhodiumToad) on IRC about the cause of this 
 crasher:
 [12:31pm] RhodiumToad: there's no trivial fix

IMO the main bug here is that functions.c isn't expecting qd->dest to be
overwritten, so we could work around it by keeping a separate private
copy of the dest pointer.  However, it would also be fair to ask whether
there's not a cleaner solution.  Perhaps the intoRel stuff should be
saving/restoring the original destreceiver instead of just blindly
overwriting it.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 16-bit page checksums for 9.2

2012-01-04 Thread Robert Haas
On Wed, Jan 4, 2012 at 3:51 PM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 If double writes aren't going to give us anything for free,
 maybe that's not the right place to be focusing our
 efforts...

 I'm not sure why it's not enough that they improve performance over
 the alternative.  Making some other feature with obvious overhead
 free seems an odd requirement to hang on this.  (Maybe I'm
 misunderstanding you on that point?)

Well, this thread is nominally about checksums, but here we are
talking about double writes, so I thought we were connecting those
features in some way?

Certainly, life is easier if we can develop them completely separately
- but checksums really ought to come with some sort of solution to the
problem of a torn-page with hint bit changes, IMO, and I thought
that's why we were thinking so hard about DW just now.

Maybe I'm confused.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Regarding Checkpoint Redo Record

2012-01-04 Thread Robert Haas
On Wed, Jan 4, 2012 at 4:06 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 But, the OP makes me wonder: why can a standby only perform a
 restartpoint where the master performed a checkpoint?  It seems like a
 standby ought to be able to create a restartpoint anywhere, just by
 writing everything, flushing it to disk, and updating pg_control.

 Perhaps, but then crash restarts would have to accept start pointers
 that point at any random place in the WAL.  I like the additional error
 checking of verifying that there's a checkpoint record there.

I could go either way on that one, but...

 Also
 I think the full-page-write mechanism would no longer protect against
 torn pages during replay if you did that.

...this is a very good point.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Page Checksums + Double Writes

2012-01-04 Thread Jim Nasby
On Jan 4, 2012, at 2:02 PM, Kevin Grittner wrote:
 Jim Nasby j...@nasby.net wrote:
 Here's output from our largest OLTP system... not sure exactly how
 to interpret it, so I'm just providing the raw data. This spans
 almost exactly 1 month.
 
 Those numbers wind up meaning that 18% of the 256-byte blocks (1024
 transactions each) were all commits.  Yikes.  That pretty much
 shoots down Robert's idea of summarized CLOG data, I think.

Here's another data point. This is for a londiste slave of what I posted 
earlier. Note that this slave has no users on it.
054A 654
054B 835
054C 973
054D 1020
054E 1012
054F 1022
0550 284


And these clog files are from Sep 15-30... I believe that's the period when we 
were building this slave, but I'm not 100% certain.

04F0 194
04F1 253
04F2 585
04F3 243
04F4 176
04F5 164
04F6 358
04F7 505
04F8 168
04F9 180
04FA 369
04FB 318
04FC 236
04FD 437
04FE 242
04FF 625
0500 222
0501 139
0502 174
0503 91
0504 546
0505 220
0506 187
0507 151
0508 199
0509 491
050A 232
050B 170
050C 191
050D 414
050E 557
050F 231
0510 173
0511 159
0512 436
0513 789
0514 354
0515 157
0516 187
0517 333
0518 599
0519 483
051A 300
051B 512
051C 713
051D 422
051E 291
051F 596
0520 785
0521 825
0522 484
0523 238
0524 151
0525 190
0526 256
0527 403
0528 551
0529 757
052A 837
052B 418
052C 256
052D 161
052E 254
052F 423
0530 469
0531 757
0532 627
0533 325
0534 224
0535 295
0536 290
0537 352
0538 561
0539 565
053A 833
053B 756
053C 485
053D 276
053E 241
053F 270
0540 334
0541 306
0542 700
0543 821
0544 402
0545 199
0546 226
0547 250
0548 354
0549 587


This is for a slave of that database that does have user activity:

054A 654
054B 835
054C 420
054D 432
054E 852
054F 666
0550 302
0551 243
0552 600
0553 295
0554 617
0555 504
0556 232
0557 304
0558 580
0559 156

--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Page Checksums + Double Writes

2012-01-04 Thread Robert Haas
On Wed, Jan 4, 2012 at 4:02 PM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 Robert Haas robertmh...@gmail.com wrote:

 2. The CLOG code isn't designed to manage a large number of
 buffers, so adding more might cause a performance regression on
 small systems.

 On Nate Boley's 32-core system, running pgbench at scale factor
 100, the optimal number of buffers seems to be around 32.  I'd
 like to get some test results from smaller systems - any chance
 you (or anyone) have, say, an 8-core box you could test on?

 Hmm.  I can think of a lot of 4-core servers I could test on.  (We
 have a few poised to go into production where it would be relatively
 easy to do benchmarking without distorting factors right now.)
 After that we jump to 16 cores, unless I'm forgetting something.
 These are currently all in production, but some of them are
 redundant machines which could be pulled for a few hours here and
 there for benchmarks.  If either of those seem worthwhile, please
 spec the useful tests so I can capture the right information.

Yes, both of those seem useful.  To compile, I do this:

./configure --prefix=$HOME/install/$BRANCHNAME --enable-depend
--enable-debug ${EXTRA_OPTIONS}
make
make -C contrib/pgbench
make check
make install
make -C contrib/pgbench install

In this case, the relevant builds would probably be (1) master, (2)
master with NUM_CLOG_BUFFERS = 16, (3) master with NUM_CLOG_BUFFERS =
32, and (4) master with NUM_CLOG_BUFFERS = 48.  (You could also try
intermediate numbers if it seems warranted.)
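
(For anyone reproducing this: assuming the constant is still the
compile-time define it was at the time, each non-master build is a
one-line edit, e.g.)

/* src/include/access/clog.h -- the stock value is 8 */
#define NUM_CLOG_BUFFERS    32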

Basic test setup:

rm -rf $PGDATA
~/install/master/bin/initdb
cat > $PGDATA/postgresql.conf <<EOM
shared_buffers = 8GB
maintenance_work_mem = 1GB
synchronous_commit = off
checkpoint_segments = 300
checkpoint_timeout = 15min
checkpoint_completion_target = 0.9
wal_writer_delay = 20ms
EOM

I'm attaching a driver script you can modify to taste.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


runtestw
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PL/Perl Does not Like vstrings

2012-01-04 Thread Andrew Dunstan



On 01/04/2012 03:56 PM, Tom Lane wrote:

Andrew Dunstan and...@dunslane.net writes:

On 01/04/2012 12:56 PM, Tom Lane wrote:

I looked at that last night but it appeared that SvOK would be perfectly
happy.  (Didn't actually try it, though, I was just eyeballing the flags
in gdb.)

I tested it and you're right, it doesn't help. I don't see what else we
can do about it. There doesn't appear to be any test for an SV in the API.

I think what's being passed *is* an SV --- at least, the contents look
reasonable in gdb --- but for some reason SvPVutf8 isn't coping with
this particular kind of SV.  Googling suggests that SvPVutf8 used to
fail on READONLY SVs, of which this is one if I'm reading the flag bits
correctly; but that was supposedly fixed years ago.  I believe we've hit
some other undocumented limitation of that function, which the Perl guys
may or may not acknowledge as a bug once we've tracked it down better.





Well, the crash is apparently solved by the following, which your 
investigation suggested to me:



diff --git a/src/pl/plperl/Util.xs b/src/pl/plperl/Util.xs
index 7d0102b..0785e2e 100644
--- a/src/pl/plperl/Util.xs
+++ b/src/pl/plperl/Util.xs
@@ -41,7 +41,7 @@ do_util_elog(int level, SV *msg)

PG_TRY();
{
-   cmsg = sv2cstr(msg);
+   cmsg = sv2cstr(newSVsv(msg));
    elog(level, "%s", cmsg);
pfree(cmsg);
}


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PL/Perl Does not Like vstrings

2012-01-04 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 On 01/04/2012 03:56 PM, Tom Lane wrote:
 I think what's being passed *is* an SV --- at least, the contents look
 reasonable in gdb --- but for some reason SvPVutf8 isn't coping with
 this particular kind of SV.  Googling suggests that SvPVutf8 used to
 fail on READONLY SVs, of which this is one if I'm reading the flag bits
 correctly; but that was supposedly fixed years ago.  I believe we've hit
 some other undocumented limitation of that function, which the Perl guys
 may or may not acknowledge as a bug once we've tracked it down better.

 Well, the crash is apparently solved by the following, which your 
 investigation suggested to me:
 -   cmsg = sv2cstr(msg);
 +   cmsg = sv2cstr(newSVsv(msg));

That's kinda grotty ... and leaky ...

I installed perl-debuginfo and soon found that SvPVutf8 leads to here:

(gdb) s
9066        Perl_croak(aTHX_ "Can't coerce readonly %s to string in %s",
(gdb) bt
#0  Perl_sv_pvn_force_flags (my_perl=0x17f3170, sv=0x18b6c50, 
lp=0x7fffb0c8e2f8, flags=<optimized out>) at sv.c:9066
#1  0x0038c30c7003 in Perl_sv_utf8_upgrade_flags_grow (my_perl=0x17f3170, 
sv=0x18b6c50, flags=2, extra=0) at sv.c:3228
#2  0x0038c30c7778 in Perl_sv_2pvutf8 (my_perl=0x17f3170, sv=0x18b6c50, 
lp=0x7fffb0c8e370) at sv.c:3079
#3  0x7f4308447614 in sv2cstr (sv=0x18b6c50) at plperl_helpers.h:54
#4  0x7f430844771f in do_util_elog (level=18, msg=0x18b6c50) at Util.xs:44
#5  0x7f4308447bdc in XS__elog (my_perl=0x17f3170, cv=0x181e008)
at Util.xs:105
#6  0x0038c30b548f in Perl_pp_entersub (my_perl=0x17f3170) at pp_hot.c:3046
#7  0x0038c30ac796 in Perl_runops_standard (my_perl=0x17f3170) at run.c:41
#8  0x0038c30480ae in Perl_call_sv (my_perl=0x17f3170, sv=0x19843e0, 
flags=10) at perl.c:2647
#9  0x7f4308440f3e in plperl_call_perl_func (desc=0x7fffb0c8e8b0, 
fcinfo=0x7fffb0c8fe10) at plperl.c:2018
#10 0x7f430843fa99 in plperl_inline_handler (fcinfo=0x7fffb0c902a0)
at plperl.c:1751

which leads to a few conclusions:

1. SvPVutf8 fails on readonly SVs, despite the fact that no such
limitation is documented and that this was supposedly fixed in 2004, cf
http://www.nntp.perl.org/group/perl.perl5.porters/2004/03/msg89505.html
We ought to hold somebody's feet to the fire about that.  I don't really
expect any response beyond documenting the limitation in perlapi, but
at least they ought to do that.

2. A slightly cleaner fix for this should be to duplicate the SV and
then release the copy around the SvPVutf8 call, only if it's readonly
(a sketch follows below).  Fixing it in do_util_elog is entirely the
wrong thing.

3. Perl_croak inside a PG_TRY block fails quite nastily.  I think we
might be well advised to move the sv2cstr(msg) call outside the PG_TRY,
but I'm wondering whether there is not a more general structural problem
in plperl concerning nesting of PG and Perl error recovery.
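
(To spell out point 2, here is a minimal, untested sketch, assuming
only the general shape of sv2cstr() in plperl_helpers.h; the strdup()
stands in for the palloc'd copy the real helper makes:)

#include <string.h>
#include "EXTERN.h"
#include "perl.h"

/* Sketch only: stringify a duplicate when the SV is readonly, since
 * SvPVutf8() croaks on readonly SVs (see point 1 above). */
static char *
sv2cstr_sketch(SV *sv)
{
    char   *val;
    char   *res;
    STRLEN  len;
    int     copied = 0;

    if (SvREADONLY(sv))
    {
        sv = newSVsv(sv);       /* temporary writable copy */
        copied = 1;
    }

    val = SvPVutf8(sv, len);
    res = strdup(val);          /* copy out before dropping the duplicate */

    if (copied)
        SvREFCNT_dec(sv);       /* release the temporary copy */

    return res;
}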

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_restore direct to database is broken for --insert dumps

2012-01-04 Thread Andrew Dunstan



On 01/04/2012 01:13 PM, Tom Lane wrote:

In http://archives.postgresql.org/pgsql-admin/2012-01/msg8.php
it's pointed out that recent versions of pg_restore fall over on
archives made with -Fc --inserts (or --column-inserts), but only when
restoring direct to database; if you ask for text output it's perfectly
fine.  Investigation shows that the problem is that individual INSERT
commands are being broken apart at arbitrary buffer boundaries --- you
don't see any problem in text output, but when the bufferloads are
submitted as separate PQexec calls, of course bad things happen.

I believe this worked okay up until my patch here:
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=6545a901a
which removed the mini SQL lexer in pg_backup_db.c.  I had supposed that
that had no useful function except to separate COPY data from
not-COPY-data, but in reality it had another function of ensuring that
INSERT commands split across zlib bufferload boundaries would get
reassembled before they are submitted to PQexec.

Not entirely sure what to do about this.  We could consider reverting
the aforesaid patch and trying to find another way of fixing that code's
failure to cope with standard-conforming strings, but I'm not sure that
there's a good way to know whether standard_conforming_strings is active
here, and anyway that code was ugly as sin and I'd rather not resurrect
it.  But on the other hand, there are no clear line boundaries in the
compressed data, and we can't introduce any without (a) worsening
compression and (b) breaking compatibility with existing dump files.

Anybody have any bright ideas?  I'm fresh out at the moment.





I pondered this while out on my daily constitutional. The first thing 
that occurred to me is that it would possibly have been better if we'd 
made pg_dump not use a quoting mechanism whose behaviour is dependent on 
a setting (e.g. E'' or dollar quoting). But I guess that's water under 
the bridge.


Could we detect an appropriate line ending in ahwrite() after it's been 
decompressed and buffer partial lines accordingly?


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PL/Perl Does not Like vstrings

2012-01-04 Thread Andrew Dunstan



On 01/04/2012 05:05 PM, Tom Lane wrote:



Well, the crash is apparently solved by the following, which your
investigation suggested to me:
-   cmsg = sv2cstr(msg);
+   cmsg = sv2cstr(newSVsv(msg));

That's kinda grotty ... and leaky ...



Of course it is. It wasn't meant as a solution but as validation of your 
suspicions about the nature of the problem. (The leakiness could be 
solved, though.)


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CLOG contention

2012-01-04 Thread Jim Nasby
On Dec 20, 2011, at 11:29 PM, Tom Lane wrote:
 Robert Haas robertmh...@gmail.com writes:
 So, what do we do about this?  The obvious answer is increase
 NUM_CLOG_BUFFERS, and I'm not sure that's a bad idea.
 
 As you say, that's likely to hurt people running in small shared
 memory.  I too have thought about merging the SLRU areas into the main
 shared buffer arena, and likewise have concluded that it is likely to
 be way more painful than it's worth.  What I think might be an
 appropriate compromise is something similar to what we did for
 autotuning wal_buffers: use a fixed percentage of shared_buffers, with
 some minimum and maximum limits to ensure sanity.  But picking the
 appropriate percentage would take a bit of research.

ISTM that this is based more on the number of CPUs than on total memory, no? 
Likewise, things like the number of shared buffer partitions would be highly 
dependent on the number of CPUs.

So perhaps we should either probe the number of CPUs on a box, or have a GUC to 
tell us how many there are...
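
(For what it's worth, probing the CPU count is only a few lines on
POSIX-ish systems -- a sketch, assuming _SC_NPROCESSORS_ONLN is
available, which is a common extension rather than strict POSIX:)

#include <stdio.h>
#include <unistd.h>

int
main(void)
{
    long ncpus = sysconf(_SC_NPROCESSORS_ONLN);  /* CPUs currently online */

    printf("online CPUs: %ld\n", ncpus);
    return ncpus > 0 ? 0 : 1;
}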
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Autonomous subtransactions

2012-01-04 Thread Jim Nasby
On Dec 19, 2011, at 12:31 PM, Simon Riggs wrote:
 On Sun, Dec 18, 2011 at 4:22 PM, Jim Nasby j...@nasby.net wrote:
 On Dec 18, 2011, at 2:28 AM, Gianni Ciolli wrote:
 I have written some notes about autonomous subtransactions, which have
 already been touched (at least) in two separate threads; please find
 them at
 
  http://wiki.postgresql.org/wiki/Autonomous_subtransactions
 
 The document seems to mix the terms subtransaction and autonomous 
 transaction. That's going to generate a ton of confusion, because both terms 
 already have meaning associated with them:
 
 - Autonomous transaction means you can execute something outside of your 
 current transaction and it is in no way affected by the current transaction 
 (doesn't matter if T0 commits or not).
 - Subtransactions are an alternative to savepoints. They allow you to break 
 a large transaction into smaller chunks, but if T0 doesn't commit then none 
 of the subtransactions do either.
 
 OK, perhaps we should just stick to the term Autonomous Transaction.
 That term is in common use, even if the usage is otherwise exactly the
 same as a subtransaction i.e. main transaction stops until the
 subtransaction is complete.

Except AFAIR Oracle uses the term to indicate something that is happening 
*outside* of your current transaction, which is definitely not what the 
proposal is talking about. I'm not wed to subtransaction (though I think it's 
a perfectly good name for this), but I definitely think calling this an 
autonomous transaction would be bad.
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PL/Perl Does not Like vstrings

2012-01-04 Thread David E. Wheeler
On Jan 4, 2012, at 2:48 PM, Andrew Dunstan wrote:

 That's kinda grotty ... and leaky ...
 
 
 Of course it is. It wasn't meant as a solution but as validation of your 
 suspicions about the nature of the problem. (The leakiness could be solved, 
 though.)

From #p5p on irc.perl.org:

[10:58pm]dg:interesting, so SvPV() handles string overloading, but SvPVutf8() 
doesn't, yet: Like C<SvPV>, but converts sv to utf8 first if necessary.
[10:58pm]dg:oh, only for readonly objects
[10:58pm]dg:probably why no-one has noticed, as version is probably the only 
readonly thing with string overloading
[11:08pm]TonyC:it doesn't need string overloading
[11:09pm]TonyC:https://gist.github.com/1562734
[11:12pm]TonyC:theory: using sv_mortalcopy() instead of newSVsv() should 
prevent the leak in that workaround, assuming there's no FREETMPS between the 
call and use of the return value

Useful?

David


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_restore direct to database is broken for --insert dumps

2012-01-04 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 On 01/04/2012 01:13 PM, Tom Lane wrote:
 Not entirely sure what to do about this.  We could consider reverting
 the aforesaid patch and trying to find another way of fixing that code's
 failure to cope with standard-conforming strings, but I'm not sure that
 there's a good way to know whether standard_conforming_strings is active
 here, and anyway that code was ugly as sin and I'd rather not resurrect
 it.  But on the other hand, there are no clear line boundaries in the
 compressed data, and we can't introduce any without (a) worsening
 compression and (b) breaking compatibility with existing dump files.
 
 Anybody have any bright ideas?  I'm fresh out at the moment.

 I pondered this while out on my daily constitutional. The first thing 
 that occurred to me is that it would possibly have been better if we'd 
 made pg_dump not use a quoting mechanism whose behaviour is dependent on 
 a setting (e.g. E'' or dollar quoting). But I guess that's water under 
 the bridge.

 Could we detect an appropriate line ending in ahwrite() after it's been 
 decompressed and buffer partial lines accordingly?

Not easily: there could be newlines embedded in data strings or SQL
identifiers.  I'm not seeing any way around this except to restore the
minimal lexing capability.  One thing we could probably do is to
restrict it to be used only when reading table data, and continue to
assume that object creation commands can be emitted as-is.  That would
at least get us out of needing to parse dollar-quoted literals, which
aren't used in the INSERT commands.  But we'd have to deal with
standard-conforming strings some way.  The idea I had about that, since
we have an open database connection at hand, is to check the connected
backend's standard_conforming_strings state via PQparameterStatus.  If
it doesn't have the right setting then things are going to fail anyway.
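
(A minimal standalone sketch of that check; error handling trimmed,
and connection parameters come from the environment:)

#include <stdio.h>
#include <string.h>
#include <libpq-fe.h>

int
main(void)
{
    PGconn     *conn = PQconnectdb("");
    const char *scs;

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "%s", PQerrorMessage(conn));
        return 1;
    }

    /* NULL means a pre-8.2 server that never reports this GUC */
    scs = PQparameterStatus(conn, "standard_conforming_strings");
    printf("standard_conforming_strings: %s\n", scs ? scs : "(not reported)");

    PQfinish(conn);
    return 0;
}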

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] BUG #6379: SQL Function Causes Back-end Crash

2012-01-04 Thread Tom Lane
I wrote:
 Perhaps the intoRel stuff should be
 saving/restoring the original destreceiver instead of just blindly
 overwriting it.

I concluded that was the best fix, and have committed it.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add SPI results constants available for PL/*

2012-01-04 Thread Samuel PHAN
I agree with Pavel also. Putting these constants in the pg_catalog isn't
the cleanest solution.

Though one can make one's own little lib in Python, Perl, whatever, to store
these constants, it would be better if, during compilation, these C
constants were exported in some way for PL/*.

I can't really speak for procedural languages other than PL/Python, but
for this one, typically, it would be nice to have them in the *plpy* module.

result = plpy.execute(sql_query)
if result.status() == plpy.SPI_OK_SELECT:
    # something...

Since the PG developers are the ones who decide these constant values, it's
quite logical that the equivalent constants for each PL should be made
available by the PG developers, rather than each geek reinventing the wheel
on their own.

Well, it's not a crucial feature of course, but a nice to have, that was my
point ;-)

Have a nice day, guys,

Samuel


On Wed, Jan 4, 2012 at 3:11 AM, Andrew Dunstan and...@dunslane.net wrote:



 On 01/03/2012 08:40 PM, Robert Haas wrote:

On Tue, Jan 3, 2012 at 9:33 AM, Pavel Stehule pavel.steh...@gmail.com wrote:

 I'd suppose it interesting to add a table to pg_catalog containing this
 data.

 - it is useless overhead

 I tend to agree.

  I am expecting so definition some constants in Perl, Python is simple

 Presumably one could instead write a script to transform the list of
 constants into a .pm file that could be loaded into the background, or
 whatever PL/python's equivalent of that concept is.  Not sure if
 there's a better way to do it.


 Yeah, I'm with you and Pavel. Here's my quick perl one-liner to produce a
 set of SPI_* constants for pl/perl. I'm looking at the best way to include
 this in the bootstrap code.

   perl -ne 'BEGIN { print "use constant\n{\n"; } END { print "};\n"; }
   print "\t$1 => $2,\n" if /#define (SPI_\S+)\s+\(?(-?\d+)\)?/;' \
   src/include/executor/spi.h


 cheers

 andrew




Re: [HACKERS] pg_restore direct to database is broken for --insert dumps

2012-01-04 Thread Andrew Dunstan



On 01/04/2012 06:20 PM, Tom Lane wrote:

But we'd have to deal with
standard-conforming strings some way.  The idea I had about that, since
we have an open database connection at hand, is to check the connected
backend's standard_conforming_strings state via PQparameterStatus.  If
it doesn't have the right setting then things are going to fail anyway.



Do we care what it is? Or can we just issue a SET to make it what it 
needs to be?


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Autonomous subtransactions

2012-01-04 Thread Gianni Ciolli
On Wed, Jan 04, 2012 at 04:58:08PM -0600, Jim Nasby wrote:
 Except AFAIR Oracle uses the term to indicate something that is
 happening *outside* of your current transaction, which is definitely
 not what the proposal is talking about.

That feature is commonly translated in PostgreSQL to a dblink-based
solution, which itself is not distant from the current proposal, at
least in terms of inside/outside (the biggest difference I can see is
on sharing temporary tables).

But I am not sure I understand your remark; it would be clearer to me
if you could provide an example explaining the difference.

Dr. Gianni Ciolli - 2ndQuadrant Italia
PostgreSQL Training, Services and Support
gianni.cio...@2ndquadrant.it | www.2ndquadrant.it

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_restore direct to database is broken for --insert dumps

2012-01-04 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 On 01/04/2012 06:20 PM, Tom Lane wrote:
 But we'd have to deal with
 standard-conforming strings some way.  The idea I had about that, since
 we have an open database connection at hand, is to check the connected
 backend's standard_conforming_strings state via PQparameterStatus.  If
 it doesn't have the right setting then things are going to fail anyway.

 Do we care what it is?

Well, yeah, we care.

 Or can we just issue a SET to make it what it needs to be?

We already did, but I don't think the code in pg_backup_db has
convenient access to the value.  I might be wrong about that.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PL/Perl Does not Like vstrings

2012-01-04 Thread Andrew Dunstan



On 01/04/2012 06:15 PM, David E. Wheeler wrote:
[11:12pm]TonyC:theory: using sv_mortalcopy() instead of newSVsv() 
should prevent the leak in that workaround, assuming there's no 
FREETMPS between the call and use of the return value


That's the solution to leakiness I had in mind.

Tom said:


2. A slightly cleaner fix for this should be to duplicate the SV and
then release the copy around the SvPVutf8 call, only if it's readonly.
Fixing it in do_util_elog is entirely the wrong thing.


How do we tell if it's readonly?

cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PL/Perl Does not Like vstrings

2012-01-04 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 Tom said:
 2. A slightly cleaner fix for this should be to duplicate the SV and
 then release the copy around the SvPVutf8 call, only if it's readonly.
 Fixing it in do_util_elog is entirely the wrong thing.

 How do we tell if it's readonly?

SvREADONLY(sv) macro.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgstat wait timeout

2012-01-04 Thread pratikchirania
Thanks, I missed that.

After doing these changes, following is the observation:

1. The size of the pgstat file is 86KB. It was last modified when I moved the
file location to the RAM disk.
2. The issue persists. I am seeing continuous logs:

2012-01-05 00:00:06 JST WARNING:  pgstat wait timeout
2012-01-05 00:00:14 JST WARNING:  pgstat wait timeout
2012-01-05 00:00:26 JST WARNING:  pgstat wait timeout
.
.
.
2012-01-05 15:36:25 JST WARNING:  pgstat wait timeout
2012-01-05 15:36:37 JST WARNING:  pgstat wait timeout
2012-01-05 15:36:45 JST WARNING:  pgstat wait timeout


--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/pgstat-wait-timeout-tp5078125p5121894.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers