Re: [HACKERS] hstore ==> and deprecate =>

2010-06-10 Thread Peter Eisentraut
On tis, 2010-06-08 at 16:17 -0400, Robert Haas wrote:
> > Perhaps
> > ->
> 
> That's already in use to mean something else.

Btw., the SQL standard also defines -> for something else, so if you
wanted to be really visionary, you could deprecate that one as an
operator at the same time.




Re: [HACKERS] LLVM / clang

2010-06-10 Thread Peter Eisentraut
On tor, 2010-06-10 at 09:52 -0400, Tom Lane wrote:
> Peter Eisentraut  writes:
> > Quick testing shows that clang doesn't get through the configure stage
> > on this Debian system -- it looks like some amount of better integration
> > with glibc might be needed.  Building with llvm-gcc works fine, but I
> > understand that using llvm-gcc with native code generation isn't all
> > that different from using gcc itself, so that's not a surprising result.
> > The only issue is that the float8 regression test fails, so it is
> > apparently not *exactly* the same.
> 
> There's a buildfarm animal using llvm-gcc, and it passes just fine ...
> so the float8 failure sounds to me like another integration problem.

The diff in this case is

*** src/test/regress/expected/float8.out
--- src/test/regress/results/float8.out
***************
*** 384,390 ****
  SELECT '' AS bad, f.f1 * '1e200' from FLOAT8_TBL f;
  ERROR:  value out of range: overflow
  SELECT '' AS bad, f.f1 ^ '1e200' from FLOAT8_TBL f;
! ERROR:  value out of range: overflow
  SELECT 0 ^ 0 + 0 ^ 1 + 0 ^ 0.0 + 0 ^ 0.5;
   ?column? 
  ----------
--- 384,398 ----
  SELECT '' AS bad, f.f1 * '1e200' from FLOAT8_TBL f;
  ERROR:  value out of range: overflow
  SELECT '' AS bad, f.f1 ^ '1e200' from FLOAT8_TBL f;
!  bad | ?column? 
! -----+----------
!      |        0
!      |      NaN
!      |      NaN
!      |      NaN
!      |      NaN
! (5 rows)
! 
  SELECT 0 ^ 0 + 0 ^ 1 + 0 ^ 0.0 + 0 ^ 0.5;
   ?column? 
  ----------

which means that this combo signals an overflow in pow() by returning
NaN and not setting errno.
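
For reference, overflow detection here hinges on what the pow() call
reports back through errno and its return value; a simplified standalone
sketch of that pattern (illustrative only, not the backend's float.c code):

#include <errno.h>
#include <math.h>
#include <stdio.h>

int
main(void)
{
    double      result;

    errno = 0;
    result = pow(1e200, 1e200);

    /*
     * A conforming platform reports the overflow by setting errno to
     * ERANGE and/or returning +/-HUGE_VAL.  A combination that instead
     * returns NaN with errno left at 0 slips past a check like this one,
     * which is what the regression diff above shows.
     */
    if (errno == ERANGE || isinf(result))
        printf("overflow detected\n");
    else
        printf("no overflow detected: result = %g, errno = %d\n",
               result, errno);
    return 0;
}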

Curiously enough, the problem goes away when you insert elog()
statements after the pow() call.  Could be a code generation/pipelining
issue.

Btw., this is

llvm-gcc (GCC) 4.2.1 (Based on Apple Inc. build 5649) (LLVM build)

which sounds somewhat old.





Re: [HACKERS] LLVM / clang

2010-06-10 Thread Takahiro Itagaki

Peter Eisentraut  wrote:

> Some new warnings, however:
> 
> xlog.c:7759:22: warning: self-comparison always results in a constant
> value
> max_locks_per_xact != max_locks_per_xact)
>^
> 
> Looks like a bug.

Ah, it should be compared with the field of the same name in ControlFile.

Index: src/backend/access/transam/xlog.c
===================================================================
--- src/backend/access/transam/xlog.c   (HEAD)
+++ src/backend/access/transam/xlog.c   (fixed)
@@ -7756,7 +7756,7 @@
if (wal_level != ControlFile->wal_level ||
MaxConnections != ControlFile->MaxConnections ||
max_prepared_xacts != ControlFile->max_prepared_xacts ||
-   max_locks_per_xact != max_locks_per_xact)
+   max_locks_per_xact != ControlFile->max_locks_per_xact)
{
/*
 * The change in number of backend slots doesn't need to be



Regards,
---
Takahiro Itagaki
NTT Open Source Software Center





Re: [HACKERS] LLVM / clang

2010-06-10 Thread Tom Lane
Peter Eisentraut  writes:
> [ assorted LLVM warnings ]

> dt_common.c:818:75: warning: more data arguments than '%' conversions
> [-Wformat-extra-args]
> sprintf(str + strlen(str), (min != 0) ?
> "%+03d:%02d" : "%+03d", hour, min);
> ~~~^

> [and a few more like that]

> These are instances where a format string is an expression that results
> in a variable number of format arguments.  Not sure if that is actually
> legal in C.

I believe it's legal, but I'd be in favor of making a project policy
against it, simply because you aren't going to get any static checking
from gcc about whether the arguments match the format string.  There
isn't any good excuse not to code the above like

if (min != 0)
sprintf(str + strlen(str), "%+03d:%02d", hour, min);
else
sprintf(str + strlen(str), "%+03d", hour);

which would produce warnings if you managed to mess up the format match.

> print.c:778:22: warning: field width should have type 'int', but
> argument has type 'unsigned int' [-Wformat]
> fprintf(fout, "%-*s%s\n", (width_total -
> width) / 2, "",

> Not sure about that.

That one, on the other hand, is pretty silly ...
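
For reference, the usual way to silence that one is to cast the computed
width to int, since a '*' field width must have type int; a tiny
standalone illustration (not the actual psql code):

#include <stdio.h>

int
main(void)
{
    unsigned int width_total = 20;
    unsigned int width = 8;

    /*
     * The '*' width argument must be an int, so cast the unsigned
     * arithmetic result explicitly to keep -Wformat quiet.
     */
    printf("%-*s%s\n", (int) ((width_total - width) / 2), "", "centered");
    return 0;
}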

regards, tom lane



Re: [HACKERS] warning message in standby

2010-06-10 Thread Fujii Masao
On Fri, Jun 11, 2010 at 1:01 AM, Heikki Linnakangas
 wrote:
> We're talking about a corrupt record (incorrect CRC, incorrect backlink
> etc.), not errors within redo functions. During crash recovery, a corrupt
> record means you've reached end of WAL. In standby mode, when streaming WAL
> from master, that shouldn't happen, and it's not clear what to do if it
> does. PANIC is not a good idea, at least if the server uses hot standby,
> because that only makes the situation worse from availability point of view.
> So we log the error as a WARNING, and keep retrying. It's unlikely that the
> problem will just go away, but we keep retrying anyway in the hope that it
> does. However, it seems that we're too aggressive with the retries.

Right. The attached patch calms down the retries: if we find an invalid
record while streaming WAL from the master, we sleep for 5 seconds (does
that need to be reduced?) before retrying to replay the record at the
same location where the invalid one was found. Comments?
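
To make the pacing concrete, the idea is roughly the toy loop below; this
is only a standalone illustration with invented names, the real change is
in the attached patch:

#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

static int  attempts = 0;

/* stand-in for reading the next record; pretend it succeeds eventually */
static bool
try_read_record(void)
{
    return ++attempts >= 4;
}

int
main(void)
{
    while (!try_read_record())
    {
        fprintf(stderr, "WARNING: invalid record, retrying in 5 seconds\n");
        sleep(5);               /* pause instead of busy-looping */
    }
    printf("record read after %d attempts\n", attempts);
    return 0;
}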

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


calm_down_retries_v1.patch
Description: Binary data



Re: [HACKERS] LLVM / clang

2010-06-10 Thread Peter Eisentraut
On tor, 2010-06-10 at 11:55 +0300, Peter Eisentraut wrote:
> Quick testing shows that clang doesn't get through the configure stage
> on this Debian system -- it looks like some amount of better
> integration with glibc might be needed.

Some details on this ...

configure has two problems.  The first is a "present but cannot be
compiled" warning about wctype.h.  This is described here:
.  It looks like glibc 2.11
or some later version will fix this.  (eglibc 2.11 doesn't have the fix
yet.)  But this doesn't cause a problem during the compile.

The second problem is that the prototype check for accept() fails.  This
is because glibc defines the second argument to be a "transparent
union", apparently to make it look like a lot of things at once.  clang
apparently doesn't understand that.  One could address this by checking
for the typedef that glibc uses explicitly in the configure check, but
that would appear to defeat the point of the *transparent* union.  A
workaround is to remove -D_GNU_SOURCE from src/template/linux.

Predictably, this will make PL/Perl fail to build.

Also, it will make src/backend/libpq/auth.c fail to build, because
struct ucred is only defined when _GNU_SOURCE is used.  This would
actually fail to work on GCC as well, so I think we should add an
explicit configure check for struct ucred.
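
The dependency is easy to see with a tiny standalone probe like the one
below (an illustration, not the proposed configure test); drop the
_GNU_SOURCE define and it no longer compiles against glibc:

/* struct ucred is only exposed by glibc's <sys/socket.h> under _GNU_SOURCE */
#define _GNU_SOURCE
#include <sys/socket.h>
#include <stdio.h>

int
main(void)
{
    printf("sizeof(struct ucred) = %zu\n", sizeof(struct ucred));
    return 0;
}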

The rest of the build goes through and the regression tests pass.

Some new warnings, however:

xlog.c:7759:22: warning: self-comparison always results in a constant
value
max_locks_per_xact != max_locks_per_xact)
   ^

Looks like a bug.

postmaster.c:3386:18: warning: more data arguments than '%' conversions
[-Wformat-extra-args]
 remote_host, remote_port);
  ^

dt_common.c:818:75: warning: more data arguments than '%' conversions
[-Wformat-extra-args]
sprintf(str + strlen(str), (min != 0) ?
"%+03d:%02d" : "%+03d", hour, min);

~~~^

[and a few more like that]

These are instances where a format string is an expression that results
in a variable number of format arguments.  Not sure if that is actually
legal in C.

print.c:778:22: warning: field width should have type 'int', but
argument has type 'unsigned int' [-Wformat]
fprintf(fout, "%-*s%s\n", (width_total -
width) / 2, "",
 ^
~

[and a few more like that]

Not sure about that.

Also there are boatloads of warnings in the regex stuff about unused
things, that we probably don't have to worry about.





Re: [HACKERS] Bug / shortcoming in has_*_privilege

2010-06-10 Thread Tom Lane
Robert Haas  writes:
> On Thu, Jun 10, 2010 at 5:54 PM, Jim Nasby  wrote:
>> So there's no way to see if a particular privilege has been granted to 
>> public. ISTM 'public' should be accepted, since you can't use it as a role 
>> name anyway...

> It's a bit sticky - you could make that work for
> has_table_privilege(name, oid, text) or has_table_privilege(name,
> text, text), but what would you do about the versions whose first
> argument is an oid?

Nothing.  The only reason to use those forms is in a join against
pg_authid, and the "public" group doesn't have an entry there.

regards, tom lane



Re: [HACKERS] ps display "waiting for max_standby_delay"

2010-06-10 Thread Fujii Masao
On Fri, Jun 11, 2010 at 11:20 AM, Takahiro Itagaki
 wrote:
>
> Bruce Momjian  wrote:
>
>> > how about showing actual waiting time instead?
>> >              " waiting for max_standby_delay (%d ms)",
>> >              MaxStandbyDelay)
>>
>> Sounds interesting, but how often would the ps status display be
>> updated?  I hope not too often.
>
> We can change the interval of updates to 500ms or so if we do it,
> but on reflection, the ps display is not the best place for this information.
>
> I'd like to change the additional message "waiting for max_standby_delay"
> to just "waiting", because we don't use "waiting for statement_timeout"
> for normal queries.

+1

I don't think that it's useful to display the value of max_standby_delay.

> If we need additional information about conflicts in recovery,
> we could provide it with SQL views instead of the ps display in 9.1.

+1

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



[HACKERS] vacuum_defer_cleanup_age

2010-06-10 Thread Fujii Masao
Hi,

vacuum_defer_cleanup_age is categorized as a "Statement Behavior"
parameter in the documentation, but as a "Hot Standby" parameter in
postgresql.conf. Why do we categorize it differently?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] failover vs. read only queries

2010-06-10 Thread Fujii Masao
On Fri, Jun 11, 2010 at 1:48 AM, Josh Berkus  wrote:
> On 06/09/2010 07:36 PM, Mark Kirkwood wrote:
>>
>> On 10/06/10 14:07, Tatsuo Ishii wrote:
>>>
>>> One of the top 3 questions I get
>>> when we propose our HA solution to them is, "how long will it take to
>>> do failover when the master DB crashes?"
>>>
>>
>> Same here +1
>
> In that case, wouldn't they set max_standby_delay to 0?  In which case the
> failover problem goes away, no?

Yes, but I guess they'd also like to run read-only queries on the standby.
Setting max_standby_delay to 0 would often prevent that, because conflicts
with the replay of VACUUM or HOT records would happen frequently.
vacuum_defer_cleanup_age would be helpful for that case, but it seems
hard to tune.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] ps display "waiting for max_standby_delay"

2010-06-10 Thread Takahiro Itagaki

Bruce Momjian  wrote:

> > how about showing actual waiting time instead? 
> >  " waiting for max_standby_delay (%d ms)",
> >  MaxStandbyDelay)
> 
> Sounds interesting, but how often would the ps status display be
> updated?  I hope not too often.

We can change the interval of updates to 500ms or so if we do it,
but on reflection, the ps display is not the best place for this information.

I'd like to change the additional message "waiting for max_standby_delay"
to just "waiting", because we don't use "waiting for statement_timeout"
for normal queries.

If we need additional information about conflicts in recovery,
we could provide it with SQL views instead of the ps display in 9.1.

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center





Re: [HACKERS] Idea for getting rid of VACUUM FREEZE on cold pages

2010-06-10 Thread Tom Lane
Jan Wieck  writes:
> Depends. Specifically on transaction profiles and how long the blocks
> linger around before being written. If you can set the all-visible bit
> by the time the page is written the first time, what bit, the is-frozen
> one included, couldn't be set at that time too?

All-visible and is-frozen would be the same bit ...

> And even if some cases still required another page write because those
> frozen bits cannot be set on the first write, this seems to be a win-win. We
> would get rid of FrozenXid completely and shift to a bit, so we can
> effectively have a minimum freeze age of zero while keeping the xids forever.

Right.  I don't see any downside, other than eating another status bit
per tuple, which we can afford.

regards, tom lane



Re: [HACKERS] ps display "waiting for max_standby_delay"

2010-06-10 Thread Bruce Momjian
Takahiro Itagaki wrote:
> Hi,
> 
> We have code to change the ps display for the recovery process during hot standby.
> The current code always shows max_standby_delay in the message, but how
> about showing actual waiting time instead? Since DBAs can always get the
> parameter value from the postgresql.conf they wrote, the value itself
> is not so useful. Actual waiting time might be more useful to determine
> which value to set for max_standby_delay, no?
> 
> [backend/storage/ipc/standby.c]
> snprintf(new_status + len, 50,
>  " waiting for max_standby_delay (%d ms)",
>  MaxStandbyDelay);  ==> GetCurrentTimestamp() - waitStart
> set_ps_display(new_status, false);
> 
> I think an SQL-based activity view will be more useful than the ps display,
> but that's an item for 9.1.

Sounds interesting, but how often would the ps status display be
updated?  I hope not too often.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + None of us is going to be here forever. +



Re: [HACKERS] Exposing the Xact commit order to the user

2010-06-10 Thread Marko Kreen
On 6/4/10, Robert Haas  wrote:
> On Fri, Jun 4, 2010 at 10:44 AM, Greg Stark  wrote:
>  > A function which takes a starting xid and a number of transactions to
>  > return seems very tied to one particular application. I could easily
>  > see other systems such as a multi-master system instead only wanting
>  > to compare two transactions to find out which committed first. Or
>  > non-replication applications where you have an LSN and want to know
>  > whether a given transaction had committed by that time.
>  >
>  > So one possible interface would be to do something like
>  > xids_committed_between(lsn_start, lsn_end) -- and yes, possibly with
>  > an optional argument to limit the number or records returned.
>
>
> I'm imagining that the backend data storage for this would be a file
>  containing, essentially, a struct for each commit repeated over and
>  over again, packed tightly.  It's easy to index into such a file using
>  a sequence number (give me the 1000'th commit) but searching by LSN
>  would require (a) storing the LSNs and (b) binary search.  Maybe it's
>  worth adding that complexity, but I'm not sure that it is.  Keeping
>  the size of this file small is important for ensuring that it has
>  minimal performance impact (which is also why I'm not sold on trying
>  to include the tuple counters that Jan proposed - I think we can solve
>  the problem he's worried about there more cleanly in other ways).

AIUI, you index the file by offset.
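
(For illustration only, with invented field names: the fixed-size record
and offset arithmetic being discussed would look something like this.)

#include <stdint.h>
#include <stdio.h>

/* one fixed-size entry per commit, so entry N lives at N * sizeof(entry) */
typedef struct CommitRecord
{
    uint32_t    xid;            /* committed transaction id */
    uint64_t    commit_lsn;     /* WAL position of the commit record */
    int64_t     commit_time;    /* commit timestamp, microseconds */
} CommitRecord;

int
main(void)
{
    long        seqno = 1000;
    long        offset = seqno * (long) sizeof(CommitRecord);

    printf("commit %ld starts at byte offset %ld in the file\n", seqno, offset);
    return 0;
}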

>  >>  I think
>  >> we should be very careful about assuming that we understand
>  >> replication and its needs better than someone who has spent many years
>  >> developing one of the major PostgreSQL replication solutions.
>  >
>  > Well the flip side of that is that we want an interface that's useful
>  > for more than just one replication system. This is something basic
>  > enough that I think it will be useful for more than just replication
>  > if we design it generally enough. It should be useful for
>  > backup/restore processes and monitoring as well as various forms of
>  > replication including master-slave trigger based systems but also
>  > including PITR-based replication, log-parsing systems, multi-master
>  > trigger based systems, 2PC-based systems, etc.
>
>
> Making it general enough to serve multiple needs is good, but we've
>  got to make sure that the extra complexity is buying us something.
>  Jan seems pretty confident that this could be used by Londiste also,
>  though it would be nice to have some confirmation from the Londiste
>  developer(s) on that.  I think it may also have applications for
>  distributed transactions and multi-master replication, but I am not
>  too sure it helps much for PITR-based replication or log-parsing
>  systems.  We want to design something that is good, but trying to
>  solve too many problems may end up solving none of them well.

The potential for a single shared queue implementation, with
the additional potential for merging async replication
implementations, sounds attractive.  (Merging ~ having a
single one that satisfies a broad range of needs.)

Unless the functionality accepted into core is limited
to replication only and/or performs worse than the current
snapshot-based grouping.  Then it is uninteresting, of course.

Jan's proposal of storing small structs in segmented files
sounds like it could work.  I can't say anything more because
I can't picture it as well as Jan can.  I would need to play with
a working implementation to say more...

-- 
marko



[HACKERS] Bug / shortcoming in has_*_privilege

2010-06-10 Thread Jim Nasby
test...@workbook=# select has_table_privilege( 'public', 'test', 'SELECT' );
ERROR:  role "public" does not exist
test...@workbook=# 

So there's no way to see if a particular privilege has been granted to public. 
ISTM 'public' should be accepted, since you can't use it as a role name 
anyway...

test...@workbook=# create role public;
ERROR:  role name "public" is reserved
test...@workbook=# create role "public";
ERROR:  role name "public" is reserved
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net





Re: [HACKERS] Bug / shortcoming in has_*_privilege

2010-06-10 Thread Robert Haas
On Thu, Jun 10, 2010 at 5:54 PM, Jim Nasby  wrote:
> test...@workbook=# select has_table_privilege( 'public', 'test', 'SELECT' );
> ERROR:  role "public" does not exist
> test...@workbook=#
>
> So there's no way to see if a particular privilege has been granted to 
> public. ISTM 'public' should be accepted, since you can't use it as a role 
> name anyway...
>
> test...@workbook=# create role public;
> ERROR:  role name "public" is reserved
> test...@workbook=# create role "public";
> ERROR:  role name "public" is reserved

It's a bit sticky - you could make that work for
has_table_privilege(name, oid, text) or has_table_privilege(name,
text, text), but what would you do about the versions whose first
argument is an oid?  It would seem a bit awkward to have the behavior
be asymmetrical, although I guess we could...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] fix use of posix_fadvise in xlog.c

2010-06-10 Thread Greg Smith

Tom Lane wrote:
> Heikki Linnakangas  writes:
>> In a steady-state situation new WAL files are not created very often
>> because we recycle old ones, so it probably doesn't make much difference.
>
> Yeah.  We really don't worry too much about the performance of the
> new-WAL-file-creation code path because of this.

The only situation where the WAL zeroing path turns ugly is if you 
launch a bunch of activity against a fresh server that doesn't have any 
segments to recycle yet.  The last time we talked about improving that, 
the best idea I thought came out was to be better about preallocating 
segments than the code already is, rather than trying to speed up how 
the kernel deals with the situation.  See the links for "Be more 
aggressive about creating WAL files" at http://wiki.postgresql.org/wiki/Todo


I'm also not very optimistic about adding more posix_fadvise calls 
really helping just because the implementations of those are so 
unpredictable across operating systems.  I'm sure that Mark could figure 
out the right magic to speed up this specific case on Linux, but have my 
doubts that work would translate very well to many other operating 
systems.  Whereas a more generic preallocation improvement would help 
everywhere.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us




Re: [HACKERS] Error with GIT Repository

2010-06-10 Thread Alvaro Herrera
Excerpts from Andrew Dunstan's message of jue jun 10 11:26:59 -0400 2010:

> Why are you cloning over http? Here is the best way to clone, which 
> seems to be working:
> 
> [and...@sophia ]$ git clone --mirror
> git://git.postgresql.org/git/postgresql.git
> Initialized empty Git repository in /home/andrew/postgresql.git/

In case you're a git-ignorant like me and are wondering why the above
does not produce a usable checkout, the complete recipe is here:

http://archives.postgresql.org/message-id/20090602162347.gf23...@yugib.highrise.ca
(in short, you need a git clone --reference)

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Command to prune archive at restartpoints

2010-06-10 Thread Simon Riggs
On Thu, 2010-06-10 at 22:49 +0300, Heikki Linnakangas wrote:
> On 10/06/10 22:24, Dimitri Fontaine wrote:
> > Heikki Linnakangas  writes:
> >> Maybe we could add a new pg_cleanuparchive binary, but we'll need some
> >> discussion...
> >
> > Would this binary ever be used manually, not invoked by PostgreSQL? As
> > it depends on the %r option to be given and to be right, I don't think
> > so.
> 
> Hmm, actually it would be pretty handy. To make use of a base backup, 
> you need all the WAL files following the one where pg_start_backup() was 
> called. We create a .backup file in the archive to indicate that 
> location, like:
> 
> 0001002F.0020.backup
> 
> So to clean up all WAL files older than those needed by that base 
> backup, you would simply copy-paste that location and call 
> pg_cleanuparchive:
> 
> pg_cleanuparchive /walarchive/ 0001002F

OK, sounds like we're on the same thought train.

Here's the code.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Development, 24x7 Support, Training and Services


pg_archivecleanup.tar
Description: Unix tar archive



Re: [HACKERS] Command to prune archive at restartpoints

2010-06-10 Thread Dimitri Fontaine
Heikki Linnakangas  writes:
> Maybe we could add a new pg_cleanuparchive binary, but we'll need some
> discussion...

Would this binary ever be used manually, not invoked by PostgreSQL? As
it depends on the %r option to be given and to be right, I don't think
so.

Therefore my take on this problem is to provide internal commands here,
which maybe wouldn't need to be explicitly passed any arguments. If
they're internal, they can certainly access the information they need?

As a user, I'd find it much better to trust PostgreSQL to propose
sane defaults. As a developer, you will certainly find it easier to
maintain, document and distribute.

While we're at it, the other internal command we need is pg_archive_bypass
for archive_command, so that Windows users have an equivalent of
/usr/bin/true too.

Regards,
-- 
dim



Re: [HACKERS] Idea for getting rid of VACUUM FREEZE on cold pages

2010-06-10 Thread Greg Stark
On Thu, Jun 3, 2010 at 11:41 AM, Greg Stark  wrote:
> I think to make it work you need to store a whole 64-bit reference
> transaction id consisting of both a cycle counter and a transaction
> id. The invariant for the page is that every xid on the page can be
> compared to that reference transaction id using normal transactionid
> semantics. Actually I think the easiest way to do that is to set it to
> the oldest xid on the page. The first thing to do before comparing any
> transaction id on the page with a real transaction id would be to
> figure out whether the reference xid is comparable to the live xid,
> which if it's the oldest xid on the page implies they'll all be
> comparable.
>
> The way to maintain that invariant would be that any xid insertion on
> the page must advance the reference xid if it's not comparable to the
> newly inserted xid. It has to be advanced to the oldest xid that's
> still comparable with the newly inserted xid. Any xids on the page
> that are older than the new refernce xid have to be frozen or removed.
> I'm not sure how to do that without keeping clog forever though.

So the more I think about this the more I think it's unavoidable that
we would need to retain clog forever.

I think the goal here is to be able to load data into the database and
then never write the data ever again. Even if you visit the page years
later after the transaction ids have wrapped around several times. In
that case there's no avoiding that you'll need to know whether that
transaction committed or aborted.

Now we could make a bet that most transactions commit and therefore we
could keep a list of aborted transactions only which we might be able
to keep "forever" in very little space if very few transactions abort.
Presumably we would only use this form once the transaction was about
to be truncated out of clog. I'm not too happy with the assumption
that there aren't many aborts though. Someone could come along with a
use case where they have lots of aborts and run into strange
limitations and performance characteristics.

Alternatively we could do something like keeping a list of tables
touched by any transaction. Then vacuum could look for any
non-committed transactions old enough to be in danger of aging out of
clog and ensure those tables are frozen. But any tables which have
never been touched by any such old transaction could be left alone.
When we read in the page, we'll be able to recognize the old
transactions as committed if they're beyond the end of the clog
horizon.

I don't really like that idea either because it leaves performance
really quite unpredictable. I could have a large table that goes
unvacuumed for a long time -- then when I come along with some tiny
query where I hit C-c and cause an abort I suddenly set a trap which
causes a huge vacuum freeze to fire off.


-- 
greg



Re: [HACKERS] Command to prune archive at restartpoints

2010-06-10 Thread Heikki Linnakangas

On 10/06/10 22:24, Dimitri Fontaine wrote:
> Heikki Linnakangas  writes:
>> Maybe we could add a new pg_cleanuparchive binary, but we'll need some
>> discussion...
>
> Would this binary ever be used manually, not invoked by PostgreSQL? As
> it depends on the %r option to be given and to be right, I don't think
> so.


Hmm, actually it would be pretty handy. To make use of a base backup, 
you need all the WAL files following the one where pg_start_backup() was 
called. We create a .backup file in the archive to indicate that 
location, like:


0001002F.0020.backup

So to clean up all WAL files older than those needed by that base 
backup, you would simply copy-paste that location and call 
pg_cleanuparchive:


pg_cleanuparchive /walarchive/ 0001002F

Of course, if there's a perl one-liner to do that, we can just put that 
in the docs and don't really need pg_cleanuparchive at all.
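
For what it's worth, the core of such a tool is small either way; here is
a rough standalone C sketch of the comparison, patterned on pg_standby's
removal loop (a hypothetical illustration, not the proposed binary):

#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define XLOG_DATA_FNAME_LEN 24      /* length of a WAL segment file name */

int
main(int argc, char **argv)
{
    const char *archivedir;
    const char *cutoff;
    DIR        *dir;
    struct dirent *de;
    char        path[4096];

    if (argc != 3 || strlen(argv[2]) < XLOG_DATA_FNAME_LEN)
    {
        fprintf(stderr, "usage: %s archivedir cutoffsegmentname\n", argv[0]);
        return 1;
    }
    archivedir = argv[1];
    cutoff = argv[2];

    dir = opendir(archivedir);
    if (dir == NULL)
    {
        perror(archivedir);
        return 1;
    }

    while ((de = readdir(dir)) != NULL)
    {
        /*
         * Remove every WAL segment whose name sorts before the cutoff,
         * skipping the 8-character timeline prefix on both sides, which is
         * the same comparison pg_standby makes.
         */
        if (strlen(de->d_name) == XLOG_DATA_FNAME_LEN &&
            strspn(de->d_name, "0123456789ABCDEF") == XLOG_DATA_FNAME_LEN &&
            strcmp(de->d_name + 8, cutoff + 8) < 0)
        {
            snprintf(path, sizeof(path), "%s/%s", archivedir, de->d_name);
            printf("removing %s\n", path);
            unlink(path);
        }
    }
    closedir(dir);
    return 0;
}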



> Therefore my take on this problem is to provide internal commands here,
> which maybe wouldn't need to be explicitly passed any arguments. If
> they're internal, they can certainly access the information they need?


You want more flexibility in more advanced cases. Like if you have 
multiple standbys sharing the archive, you only want to remove old WAL 
files after they're not needed by *any* of the standbys anymore. Doing 
the cleanup directly in the archive_cleanup_command would cause the old 
WAL files to be removed prematurely, but you could put a shell script 
there to store the location to a file, and call pg_cleanuparchive with 
the max() of the locations reported by all standby servers.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Idea for getting rid of VACUUM FREEZE on cold pages

2010-06-10 Thread Jan Wieck

Seems I underestimated the importance of forensic breadcrumbs.


On 6/9/2010 12:09 PM, Tom Lane wrote:
> I do like the idea of using a status bit rather than FrozenXid to mark a
> frozen tuple, because that eliminates the conflict between wanting to
> freeze aggressively for performance reasons and wanting to preserve Xids
> for forensic reasons.  But it doesn't seem to do much for Josh's
> original problem.

Depends. Specifically on transaction profiles and how long the blocks
linger around before being written. If you can set the all-visible bit
by the time the page is written the first time, what bit, the is-frozen
one included, couldn't be set at that time too?


Maybe some analysis on the typical behavior of such system is in order. 
Especially the case Josh was mentioning seems to be a typical single 
insert logging style application, with little else going on on that 
particular database. I can't reveal specifics about that particular 
case, but think of something like taking frequent sensor readings, that 
need to be kept for years for forensics in case there is a product 
recall some day.


And even if some cases still required another page write because those
frozen bits cannot be set on the first write, this seems to be a win-win. We
would get rid of FrozenXid completely and shift to a bit, so we can
effectively have a minimum freeze age of zero while keeping the xids forever.



Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin



Re: [HACKERS] Error with GIT Repository

2010-06-10 Thread Magnus Hagander
On Thu, Jun 10, 2010 at 18:20, Stephen Frost  wrote:
> * Andrew Dunstan (and...@dunslane.net) wrote:
>> I don't see why not. Buildfarm members are going to have to reset their
>> repos when we finally cut over in a few months. Luckily, this is a
>> fairly painless operation - blow away the repo and change the config
>> file and the script will resync as if nothing had happened.
>
> Should we stop bothering to offer http://git.postgresql.org then..?  Or

No, we should not.

Especially if someone has a clue how to fix it. Last time I fixed
it by running a repack, but that didn't work this time. I have no clue
why it's asking for a file that doesn't exist.


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] warning message in standby

2010-06-10 Thread Robert Haas
On Thu, Jun 10, 2010 at 12:49 PM, Greg Stark  wrote:
> On Thu, Jun 10, 2010 at 5:13 PM, Robert Haas  wrote:
>> At this point you should have a working HS/SR setup.  Now:
>>
>> 8. shut the slave down
>> 9. move recovery.conf out of the way
>> 10. restart the slave - it will do recovery and enter normal running
>> 11. make some database changes
>> 12. stop the slave
>> 13. put recovery.conf back
>> 14. restart the slave
>> 15. make a bunch of changes on the master
>>
>> When the slave then tries to replay, you then get something like:
>>
>> WARNING:  invalid record length at 0/4005330
>> WARNING:  invalid record length at 0/4005330
>> WARNING:  invalid record length at 0/4005330
>>
>
> Woah, why does this procedure lead to this situation? I would hope
> there's nothing a user could do which would cause it short of invoking
> dd to corrupt the WAL files.
>
> At precisely which step of the procedure did the user do something
> wrong?

13.

> Is there any reason we can't detect that they've done it and
> throw a specific error message saying the configuration is invalid?

I'm not sure how we'd go about doing that, but I agree it would be nice.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] warning message in standby

2010-06-10 Thread Greg Stark
On Thu, Jun 10, 2010 at 5:13 PM, Robert Haas  wrote:
> At this point you should have a working HS/SR setup.  Now:
>
> 8. shut the slave down
> 9. move recovery.conf out of the way
> 10. restart the slave - it will do recovery and enter normal running
> 11. make some database changes
> 12. stop the slave
> 13. put recovery.conf back
> 14. restart the slave
> 15. make a bunch of changes on the master
>
> When the slave then tries to replay, you then get something like:
>
> WARNING:  invalid record length at 0/4005330
> WARNING:  invalid record length at 0/4005330
> WARNING:  invalid record length at 0/4005330
>

Woah, why does this procedure lead to this situation? I would hope
there's nothing a user could do which would cause it short of invoking
dd to corrupt the WAL files.

At precisely which step of the procedure did the user do something
wrong? Is there any reason we can't detect that they've done it and
throw a specific error message saying the configuration is invalid?

-- 
greg



Re: [HACKERS] failover vs. read only queries

2010-06-10 Thread Josh Berkus

On 06/09/2010 07:36 PM, Mark Kirkwood wrote:
> On 10/06/10 14:07, Tatsuo Ishii wrote:
>> One of the top 3 questions I get
>> when we propose our HA solution to them is, "how long will it take to
>> do failover when the master DB crashes?"
>
> Same here +1

In that case, wouldn't they set max_standby_delay to 0?  In which case
the failover problem goes away, no?


--
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com



Re: [HACKERS] 'create or replace function' no longer allows parameters

2010-06-10 Thread Josh Berkus



> It's this patch:
> http://git.postgresql.org/gitweb?p=postgresql.git;a=commitdiff;h=e7eb1113f8a95e9927fdbe9cc6fb0ac101612be2#patch7
>
> It should probably be mentioned in the incompatibilities section of the
> 9.0 release notes.

Addition will be included in my release notes patch, coming today.

--
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com



Re: [HACKERS] Error with GIT Repository

2010-06-10 Thread Stephen Frost
* Andrew Dunstan (and...@dunslane.net) wrote:
> I don't see why not. Buildfarm members are going to have to reset their  
> repos when we finally cut over in a few months. Luckily, this is a  
> fairly painless operation - blow away the repo and change the config  
> file and the script will resync as if nothing had happened.

Should we stop bothering to offer http://git.postgresql.org then..?  Or
do we expect it to get fixed and work correctly once we cut over and
rebuild?  Also, perhaps we could list the git-hub option on the wiki
(http://wiki.postgresql.org/wiki/Other_Git_Repositories)?

(and, yea, it's the same me)

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Error with GIT Repository

2010-06-10 Thread Andrew Dunstan



Stephen Frost wrote:
> * Andrew Dunstan (and...@dunslane.net) wrote:
>> Luxenberg, Scott I. wrote:
>>> I have been trying to create/run a build farm as part of a project I am
>>> working on.
>>
>> That seems an odd thing to do since we have one ...
>
> To clarify, he's setting up a build farm *member*. :)

Aha. Amazing the difference one little word can make ...

> As a side-note, it works just fine from git-hub's http mirror and that's
> what we've been playing with, but I don't know if we want to recommend
> that for build-farm members..

I don't see why not. Buildfarm members are going to have to reset their
repos when we finally cut over in a few months. Luckily, this is a
fairly painless operation - blow away the repo and change the config
file and the script will resync as if nothing had happened.


cheers

andrew



Re: [HACKERS] warning message in standby

2010-06-10 Thread Robert Haas
On Thu, Jun 10, 2010 at 12:01 PM, Heikki Linnakangas
 wrote:
> We're talking about a corrupt record (incorrect CRC, incorrect backlink
> etc.), not errors within redo functions. During crash recovery, a corrupt
> record means you've reached end of WAL. In standby mode, when streaming WAL
> from master, that shouldn't happen, and it's not clear what to do if it
> does. PANIC is not a good idea, at least if the server uses hot standby,
> because that only makes the situation worse from availability point of view.
> So we log the error as a WARNING, and keep retrying. It's unlikely that the
> problem will just go away, but we keep retrying anyway in the hope that it
> does. However, it seems that we're too aggressive with the retries.

You can reproduce this problem by doing the following.

1. initdb
2. edit postgresql.conf, set wal_level=hot_standby, max_wal_senders=1;
edit pg_hba.conf, trust local replication connections
3. pg_ctl start
4. make some changes to the database
5. take a hot backup to another directory (call it pgslave)
6. create pgslave/recovery.conf with standby_mode='on',
primary_conninfo=whatever, edit pgslave/postgresql.conf change the
port number, set hot_standby=on
7. pg_ctl start -D pgslave

At this point you should have a working HS/SR setup.  Now:

8. shut the slave down
9. move recovery.conf out of the way
10. restart the slave - it will do recovery and enter normal running
11. make some database changes
12. stop the slave
13. put recovery.conf back
14. restart the slave
15. make a bunch of changes on the master

When the slave then tries to replay, you then get something like:

WARNING:  invalid record length at 0/4005330
WARNING:  invalid record length at 0/4005330
WARNING:  invalid record length at 0/4005330

...ad infinitum.

Obviously there are other ways this could occur - the WAL could really
be corrupted, for example - but the current handling is not too
graceful.  I'm actually thinking it might be better to trigger a
shutdown if this happens.  Probably something has gone haywire and
manual intervention is required.  Retrying when there's no hope of
success isn't really that helpful.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] warning message in standby

2010-06-10 Thread Heikki Linnakangas

On 10/06/10 17:38, Tom Lane wrote:
> Robert Haas  writes:
>> On Mon, Jun 7, 2010 at 9:21 AM, Fujii Masao  wrote:
>>> When an error is found in the WAL streamed from the master, a warning
>>> message is repeated without interval forever in the standby. This
>>> consumes CPU load very much, and would interfere with read-only queries.
>>> To fix this problem, we should add a sleep into emode_for_corrupt_record()
>>> or somewhere? Or we should stop walreceiver and retry to read WAL from
>>> pg_xlog or the archive?
>>
>> I ran into this problem at one point, too, but was in the middle of
>> trying to investigate a different bug and didn't have time to track
>> down what was causing it.
>>
>> I think the basic question here is - if there's an error in the WAL,
>> how do we expect to EVER recover?  Even if we can read from the
>> archive or pg_xlog, presumably it's the same WAL - why should we be
>> any more successful the second time?
>
> What "warning message" are we talking about?  All the error cases I can
> think of in WAL-application are ERROR, or likely even PANIC.

We're talking about a corrupt record (incorrect CRC, incorrect backlink
etc.), not errors within redo functions. During crash recovery, a
corrupt record means you've reached end of WAL. In standby mode, when
streaming WAL from master, that shouldn't happen, and it's not clear
what to do if it does. PANIC is not a good idea, at least if the server
uses hot standby, because that only makes the situation worse from
availability point of view. So we log the error as a WARNING, and keep
retrying. It's unlikely that the problem will just go away, but we keep
retrying anyway in the hope that it does. However, it seems that we're
too aggressive with the retries.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] How about closing some Open Items?

2010-06-10 Thread Greg Stark
On Wed, Jun 9, 2010 at 8:01 PM, Alvaro Herrera
 wrote:
> Thanks for clearing the list.  There are only 5 remaining items, which
> is kinda exciting, though Tom's assertion that HS is still bug-ridden is
> a bit off-putting.

It's a big piece of subtle code and it's quite possible it contains
bugs. But people know that and as a result it's received a lot of
testing and careful thought already. The chances it has bugs of its
own are probably lower than for other major projects in the past. On
the other hand it's recovery-related and it shakes me that we have no
regression tests for recovery let alone standby databases.

What's more scary are either of two cases:

1) There are use cases of varying degrees of obscurity which haven't
been explicitly covered where the behaviour is not what people would
expect. We've already fixed a few such cases such as shutdown
semantics and setting up a standby based on a backup of initdb
results before starting up the database. This is the kind of thing we
need lots of users testing their real workloads with and doing test
failovers and so on.

2) There are unrelated areas of the database which have collateral
damage that nobody expected and thought to test for. Hopefully we have
enough regression tests to detect this kind of thing but again as
there are no regression tests for recovery we could have bugs in other
systems that don't turn up until you use those systems on a standby
database or after running the system as a standby database and then
bringing it up.




-- 
greg



Re: [HACKERS] fix use of posix_fadvise in xlog.c

2010-06-10 Thread Tom Lane
Heikki Linnakangas  writes:
> In a steady-state situation new WAL files are not created very often 
> because we recycle old ones, so it probably doesn't make much difference.

Yeah.  We really don't worry too much about the performance of the
new-WAL-file-creation code path because of this.

regards, tom lane



Re: [HACKERS] fix use of posix_fadvise in xlog.c

2010-06-10 Thread Heikki Linnakangas

On 10/06/10 18:17, Mark Wong wrote:
> On Jun 9, 2010, at 11:25 PM, Heikki Linnakangas  wrote:
>> I don't think POSIX_FADV_DONTNEED does what you think it does. It
>> tells the kernel that "you don't need to keep these pages in the cache
>> anymore, I won't be accessing them anymore". If you call it when you
>> open the file, before reading/writing, there is nothing in the cache
>> and the call will do nothing.
>
> Oops, my bad. I think I was confused by the short description in the man
> page. I didn't read the longer description. :( Then would it be worth
> making this call after the file is zeroed out?

Not sure. If you're churning through WAL files at a reasonable speed,
the zeroed-out file will soon be written to again. OTOH, we always write
whole pages, so maybe the OS is smart enough to not read the page back
to memory just to overwrite it.

In a steady-state situation new WAL files are not created very often
because we recycle old ones, so it probably doesn't make much difference.
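
To make the semantics concrete, here is a small standalone demonstration
(not a proposed patch; it assumes the Linux behaviour that
POSIX_FADV_DONTNEED only affects pages that have already been written back):

#define _XOPEN_SOURCE 600           /* for posix_fadvise */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int
main(void)
{
    char        buf[8192];
    int         fd;
    int         rc;

    fd = open("fadvise_demo.tmp", O_CREAT | O_WRONLY | O_TRUNC, 0600);
    if (fd < 0)
    {
        perror("open");
        return 1;
    }

    memset(buf, 0, sizeof(buf));
    if (write(fd, buf, sizeof(buf)) != (ssize_t) sizeof(buf))
    {
        perror("write");
        return 1;
    }

    /*
     * Flush the dirty pages first, then tell the kernel we won't need them
     * in the cache.  Issuing the advice before anything was written would
     * have had nothing to act on.
     */
    if (fdatasync(fd) != 0)
        perror("fdatasync");

    rc = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
    if (rc != 0)
        fprintf(stderr, "posix_fadvise: %s\n", strerror(rc));

    close(fd);
    return 0;
}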


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Command to prune archive at restartpoints

2010-06-10 Thread Andrew Dunstan



Heikki Linnakangas wrote:
> On 10/06/10 17:38, Andrew Dunstan wrote:
>> I think my logic needs a tiny piece of adjustment, to ignore the
>> timeline part of the file name.
>
> I'm not sure you should ignore it. Presumably anything in an older
> timeline is indeed not required anymore and can be removed, and
> anything in a newer timeline... how did it get there? Seems safer not
> to remove it.

Well, I was just following the logic in pg_standby.c:

    /*
     * We ignore the timeline part of the XLOG segment identifiers in
     * deciding whether a segment is still needed.  This ensures that we
     * won't prematurely remove a segment from a parent timeline. We
     * could probably be a little more proactive about removing segments
     * of non-parent timelines, but that would be a whole lot more
     * complicated.
     *
     * We use the alphanumeric sorting property of the filenames to decide
     * which ones are earlier than the exclusiveCleanupFileName file. Note
     * that this means files are not removed in the order they were
     * originally written, in case this worries you.
     */
    if (strlen(xlde->d_name) == XLOG_DATA_FNAME_LEN &&
        strspn(xlde->d_name, "0123456789ABCDEF") == XLOG_DATA_FNAME_LEN &&
        strcmp(xlde->d_name + 8, exclusiveCleanupFileName + 8) < 0)



cheers

andrew




Re: [HACKERS] Error with GIT Repository

2010-06-10 Thread Andrew Dunstan



Luxenberg, Scott I. wrote:
> Greetings all,
>
> I have been trying to create/run a build farm as part of a project I am
> working on.

That seems an odd thing to do since we have one ...

> However, I have noticed the primary git repository,
> git.postgresql.org/git, does not seem to be working. Namely, whenever I
> try to clone the directory, I receive this error:
>
> Error: Unable to find 5e4933c31d3cd2750ee1793efe6eca43055fb273e under
> http://git.postgresql.org/git/postgresql.git
> Cannot obtain needed blob 5e4933c31d3cd2750ee1793efe6eca4305fb273e while
> processing commit c5609c66ce2ee4fdb180be95721252b47f90499
> Error: fetch failed.
>
> I thought it would be prudent to notify the list so someone could
> possibly check into this.

Why are you cloning over http? Here is the best way to clone, which
seems to be working:

    [and...@sophia ]$ git clone --mirror git://git.postgresql.org/git/postgresql.git
    Initialized empty Git repository in /home/andrew/postgresql.git/
    remote: Counting objects: 376865, done.
    remote: Compressing objects: 100% (87569/87569), done.
    remote: Total 376865 (delta 310187), reused 352950 (delta 287485)
    Receiving objects: 100% (376865/376865), 178.73 MiB | 251 KiB/s, done.
    Resolving deltas: 100% (310187/310187), done.
    [and...@sophia ]$

cheers

andrew



Re: [HACKERS] Command to prune archive at restartpoints

2010-06-10 Thread Heikki Linnakangas

On 10/06/10 17:38, Andrew Dunstan wrote:
> I think my logic needs a tiny piece of adjustment, to ignore the
> timeline part of the file name.

I'm not sure you should ignore it. Presumably anything in an older
timeline is indeed not required anymore and can be removed, and anything
in a newer timeline... how did it get there? Seems safer not to remove it.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] fix use of posix_fadvise in xlog.c

2010-06-10 Thread Mark Wong



On Jun 9, 2010, at 11:25 PM, Heikki Linnakangas  wrote:
> On 10/06/10 06:47, Mark Wong wrote:
>> I wanted to propose a fix to xlog.c regarding the use of
>> posix_fadvise() for 9.1 (unless someone feels it's ok for 9.0).
>> Currently posix_fadvise() is used right before a log file is closed so
>> it's effectively not doing anything, when posix_fadvise is to be
>> called.  This patch moves the posix_fadvise() call into 3 other
>> locations within XLogFileInit() where a file handle is returned.  The
>> first case is where an existing open file handle is returned.  The
>> next case is when a file is to be zeroed out.  The third case is
>> returning a file handle, which may be the file that was just zeroed
>> out.
>
> I don't think POSIX_FADV_DONTNEED does what you think it does. It
> tells the kernel that "you don't need to keep these pages in the
> cache anymore, I won't be accessing them anymore". If you call it
> when you open the file, before reading/writing, there is nothing in
> the cache and the call will do nothing.

Oops, my bad.  I think I was confused by the short description in the
man page.  I didn't read the longer description. :( Then would it be
worth making this call after the file is zeroed out?


Regards,
Mark



[HACKERS] Error with GIT Repository

2010-06-10 Thread Luxenberg, Scott I.
Greetings all,

I have been trying to create/run a build farm as part of a project I am
working on. However, I have noticed the primary git repository,
git.postgresql.org/git, does not seem to be working. Namely, whenever I
try to clone the directory, I receive this error:

Error: Unable to find 5e4933c31d3cd2750ee1793efe6eca43055fb273e under
http://git.postgresql.org/git/postgresql.git
Cannot obtain needed blob 5e4933c31d3cd2750ee1793efe6eca4305fb273e while
processing commit c5609c66ce2ee4fdb180be95721252b47f90499
Error: fetch failed.

I thought it would be prudent to notify the list so someone could
possibly check into this.

Thanks!

Scott Luxenberg



Re: [HACKERS] Command to prune archive at restartpoints

2010-06-10 Thread Andrew Dunstan



Robert Haas wrote:
>> It won't kill us to change that sentence. "pg_standby is only used now
>> within the cleanup command" etc
>>
>> pg_standby already contains the exact logic we need here. Having two
>> sets of code for the same thing isn't how we do things.

Well, we could factor out that part of the code so it could be used in
two binaries. But ...

>>> Maybe we could add a new pg_cleanuparchive binary, but we'll need some
>>> discussion...
>>
>> Which will go nowhere, as we both already know.
>
> I have a feeling that I may be poking my nose into an incipient
> shouting match, but FWIW I agree with Heikki that it would be
> preferable to keep this separate from pg_standby.  Considering that
> Andrew wrote this in 24 lines of Perl code (one-third of which are
> basically just there for logging purposes), I'm not that worried about
> code duplication, unless what we actually need is significantly more
> complicated.

I think my logic needs a tiny piece of adjustment, to ignore the
timeline part of the file name. But that will hardly involve a great
deal of extra code - just chop off the first 8 chars. It's not like the
code for this in pg_standby.c is terribly complex.

The virtue of a perl script is that it's very easily customizable, e.g.
you might only delete files if they are older than a certain age.


cheers

andrew



Re: [HACKERS] warning message in standby

2010-06-10 Thread Tom Lane
Robert Haas  writes:
> On Mon, Jun 7, 2010 at 9:21 AM, Fujii Masao  wrote:
>> When an error is found in the WAL streamed from the master, a warning
>> message is repeated without interval forever in the standby. This
>> consumes CPU load very much, and would interfere with read-only queries.
>> To fix this problem, we should add a sleep into emode_for_corrupt_record()
>> or somewhere? Or we should stop walreceiver and retry to read WAL from
>> pg_xlog or the archive?

> I ran into this problem at one point, too, but was in the middle of
> trying to investigate a different bug and didn't have time to track
> down what was causing it.

> I think the basic question here is - if there's an error in the WAL,
> how do we expect to EVER recover?  Even if we can read from the
> archive or pg_xlog, presumably it's the same WAL - why should we be
> any more successful the second time?

What "warning message" are we talking about?  All the error cases I can
think of in WAL-application are ERROR, or likely even PANIC.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Command to prune archive at restartpoints

2010-06-10 Thread Robert Haas
On Thu, Jun 10, 2010 at 3:28 AM, Simon Riggs  wrote:
> On Thu, 2010-06-10 at 10:18 +0300, Heikki Linnakangas wrote:
>> On 09/06/10 10:21, Simon Riggs wrote:
>> > On Tue, 2010-06-08 at 18:30 -0400, Andrew Dunstan wrote:
>> >
>> >> I prefer archive_cleanup_command. We should name things after their
>> >> principal function, not an implementation detail, IMNSHO.
>> >>
>> >> More importantly, we should include an example in the docs. I created
>> >> one the other day  when this was actually bothering me a bit (see
>> >> ).
>> >> That seemed to work ok, but maybe it's too long, and maybe people would
>> >> prefer a shell script to perl.
>> >
>> > I submitted a patch to make the command "pg_standby -a %r"
>> >
>> > That's a more portable solution, ISTM.
>> >
>> > I'll commit that and fix the docs.
>>
>> Huh, wait. There's no -a option in pg_standby, so I presume you're
>> planning to add that too. I don't like confusing pg_standby into this,
>> the docs are currently quite clear that if you want to use the built-in
>> standby mode, you can't use pg_standby, and this would muddy the waters.
>
> It won't kill us to change that sentence. "pg_standby is only used now
> within the cleanup command" etc
>
> pg_standby already contains the exact logic we need here. Having two
> sets of code for the same thing isn't how we do things.
>
>> Maybe we could add a new pg_cleanuparchive binary, but we'll need some
>> discussion...
>
> Which will go nowhere, as we both already know.

I have a feeling that I may be poking my nose into an incipient
shouting match, but FWIW I agree with Heikki that it would be
preferable to keep this separate from pg_standby.  Considering that
Andrew wrote this in 24 lines of Perl code (one-third of which are
basically just there for logging purposes), I'm not that worried about
code duplication, unless what we actually need is significantly more
complicated.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] InvalidXLogRecPtr in docs

2010-06-10 Thread Tom Lane
Heikki Linnakangas  writes:
> Even then, we wouldn't need to start from the beginning of the WAL 
> segment AFAICS. The point is to start from the Redo pointer, not from 
> the checkpoint record, because as soon as we read the checkpoint record 
> we'll need to start applying WAL from the Redo pointer, which is 
> earlier. The WAL file boundaries don't come into play there.

I don't believe it's a good idea to have SR not write full xlog segment
files.  Consider for example the following scenario:

1. SR writes some xlog file from the middle.
2. Filesystem says "ah-hah, I know about sparse storage" and doesn't
   allocate the first half of the file.
3. Failover: slave goes live.
4. xlog file gets recycled for re-use.
5. While reusing the file, we write into the first half ... or try to,
   but there's no disk space.
6. PANIC.

There are probably some other good reasons not to allow incomplete
copies of WAL files to exist on the slave system, anyway.
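
To make the sparse-file hazard concrete, here is a standalone demonstration
(illustrative only, not PostgreSQL code): writing at a non-zero offset lets
the filesystem leave the skipped range unallocated, so the disk space is
only claimed later when the hole is overwritten.

#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>

int
main(void)
{
    int     fd = open("sparse-demo.xlog", O_CREAT | O_WRONLY | O_TRUNC, 0600);
    char    buf[8192] = {0};

    if (fd < 0)
        return 1;

    /* skip the first 8 MB instead of zero-filling it */
    if (lseek(fd, (off_t) 8 * 1024 * 1024, SEEK_SET) < 0)
        return 1;

    /* only this block is physically allocated; the rest is a hole */
    if (write(fd, buf, sizeof(buf)) != (ssize_t) sizeof(buf))
        return 1;

    close(fd);

    /* compare "ls -l" (apparent size) with "du" (allocated blocks) */
    return 0;
}

Recycling such a file later forces the filesystem to allocate the hole,
which is exactly the out-of-space window in step 5 above.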

I'm not sure if it's worth the trouble, or even a particularly smart
idea, to force the output of the status function to be monotonic
regardless of what happens underneath.  I think removing that claim
from the docs altogether is the easiest answer.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] warning message in standby

2010-06-10 Thread Robert Haas
On Mon, Jun 7, 2010 at 9:21 AM, Fujii Masao  wrote:
> When an error is found in the WAL streamed from the master, a warning
> message is repeated without interval forever in the standby. This
> consumes CPU load very much, and would interfere with read-only queries.
> To fix this problem, we should add a sleep into emode_for_corrupt_record()
> or somewhere? Or we should stop walreceiver and retry to read WAL from
> pg_xlog or the archive?

I ran into this problem at one point, too, but was in the middle of
trying to investigate a different bug and didn't have time to track
down what was causing it.

I think the basic question here is - if there's an error in the WAL,
how do we expect to EVER recover?  Even if we can read from the
archive or pg_xlog, presumably it's the same WAL - why should we be
any more successful the second time?
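
Whatever the answer, the tight retry loop itself is cheap to tame. Here is
a standalone sketch of the throttle Fujii proposes (illustrative only; in
the server the sleep would live near emode_for_corrupt_record() or the
surrounding retry loop, and the message text here is made up):

#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/* stands in for "try to read the next WAL record"; always fails here */
static bool
read_next_record(void)
{
    return false;
}

int
main(void)
{
    int     attempts = 0;

    while (!read_next_record() && attempts++ < 5)
    {
        fprintf(stderr, "WARNING: invalid record in streamed WAL, retrying\n");
        sleep(1);               /* back off instead of spinning at 100% CPU */
    }
    return 0;
}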

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] LLVM / clang

2010-06-10 Thread Tom Lane
Peter Eisentraut  writes:
> Quick testing shows that clang doesn't get through the configure stage
> on this Debian system -- it looks like some amount of better integration
> with glibc might be needed.  Building with llvm-gcc works fine, but I
> understand that using llvm-gcc with native code generation isn't all
> that different from using gcc itself, so that's not a surprising result.
> The only issue is that the float8 regression test fails, so it is
> apparently not *exactly* the same.

There's a buildfarm animal using llvm-gcc, and it passes just fine ...
so the float8 failure sounds to me like another integration problem.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PERFORM] No hash join across partitioned tables?

2010-06-10 Thread Robert Haas
(moving to -hackers)

On Wed, Jun 9, 2010 at 4:11 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> In going back through emails I had marked as possibly needing another
>> look before 9.0 is released, I came across this issue again.  As I
>> understand it, analyze (or analyse) now collects statistics for both
>> the parent individually, and for the parent and its children together.
>>  However, as I further understand it, autovacuum won't actually fire
>> off an analyze unless there's enough activity on the parent table
>> considered individually to warrant it.  So if you have an empty parent
>> and a bunch of children with data in it, your stats will still stink,
>> unless you analyze by hand.
>
> Check.
>
>> Assuming my understanding of the problem is correct, we could:
>
>> (a) fix it,
>> (b) document that you should consider periodic manual analyze commands
>> in this situation, or
>> (c) do nothing.
>
>> Thoughts?
>
> The objections to (a) are that it might result in excessive ANALYZE work
> if not done intelligently, and that we haven't got a patch ready anyway.
> I would have liked to get to this for 9.0 but I feel it's a bit late
> now.

I guess I can't really disagree with that.  Should we try to document
this in some way?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] parser handling of large object OIDs

2010-06-10 Thread Robert Haas
On Wed, Jun 9, 2010 at 10:26 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Wed, Jun 9, 2010 at 5:02 PM, Robert Haas  wrote:
>>> I believe that the comment code is probably right, because I think
>>> IConst can only handle values < 2^31, whereas OIDs can be as large as
>>> 2^32-1.
>
>> I investigated this a little more and the above analysis turns out to
>> be correct.  ALTER LARGE OBJECT OWNER and GRANT ... ON LARGE OBJECT
>> don't work for large objects outside the range of a signed integer.
>
> Yup.
>
>> Session demonstrating the problem and proposed patch attached.
>
> This patch seems extremely grotty, though.  Surely that's not the way we
> were doing it in the comment code?

I pretty much just moved the existing code from CommentLargeObject()
into a new function oidparse().  I couldn't really figure out where to
put the oidparse() function so I eventually decided on oid.c, and
therefore also ripped out the trip through the fmgr layer in favor of
calling the appropriate code directly.  Other than that it's the same
code.  I'm open to suggestions, but this is basically just a small bit
of code rearrangement.
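
For context, the limitation is simply that IConst carries a signed 32-bit
value while an OID is an unsigned 32-bit value, so anything above
2147483647 needs another path through the grammar. Here is a standalone
sketch of the conversion (illustrative only; the actual patch, as described
above, moves the existing conversion code into oidparse() rather than
adding a helper like this):

#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Convert a decimal token to an OID, rejecting anything outside 0..2^32-1. */
static int
parse_oid(const char *token, uint32_t *result)
{
    char       *end;
    unsigned long val;

    errno = 0;
    val = strtoul(token, &end, 10);
    if (errno != 0 || end == token || *end != '\0' || val > UINT32_MAX)
        return -1;
    *result = (uint32_t) val;
    return 0;
}

int
main(void)
{
    uint32_t    oid;

    /* 3000000000 is above 2^31 - 1 but is still a perfectly valid OID */
    if (parse_oid("3000000000", &oid) == 0)
        printf("parsed OID %u\n", (unsigned int) oid);
    return 0;
}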

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Streaming Replication: Checkpoint_segment and wal_keep_segments on standby

2010-06-10 Thread Fujii Masao
On Thu, Jun 10, 2010 at 7:19 PM, Heikki Linnakangas
 wrote:
>> --- 1902,1908 
>>          for standby purposes, and the number of old WAL segments
>> available
>>          for standbys is determined based only on the location of the
>> previous
>>          checkpoint and status of WAL archiving.
>> +         This parameter has no effect on a restartpoint.
>>          This parameter can only be set in the
>> postgresql.conf
>>          file or on the server command line.
>>         
>
> Hmm, I wonder if wal_keep_segments should take effect during recovery too?
> We don't support cascading slaves, but if you have two slaves connected to
> one master (without an archive), and you perform failover to one of them,
> without wal_keep_segments the 2nd slave might not find all the files it
> needs in the new master. Then again, that won't work without an archive
> anyway, because we error out at a TLI mismatch in replication. Seems like
> this is 9.1 material..

Yep, since SR currently cannot cross a timeline (TLI) change, there is
little point in having wal_keep_segments take effect during recovery.

>> *** a/doc/src/sgml/wal.sgml
>> --- b/doc/src/sgml/wal.sgml
>> ***
>> *** 424,429 
>> --- 424,430 
>>    
>>     There will always be at least one WAL segment file, and will normally
>>     not be more than (2 + checkpoint_completion_target)
>> * checkpoint_segments + 1
>> +    or checkpoint_segments + wal_keep_segments + 1
>>     files.  Each segment file is normally 16 MB (though this size can be
>>     altered when building the server).  You can use this to estimate space
>>     requirements for WAL.
>
> That's not true, wal_keep_segments is the minimum number of files retained,
> independently of checkpoint_segments. The correct formula is max((2 +
> checkpoint_completion_target) * checkpoint_segments, wal_keep_segments)

You mean that the maximum number of WAL files is the following?

max {
  (2 + checkpoint_completion_target) * checkpoint_segments,
  wal_keep_segments
}

Just after a checkpoint removes old WAL files, there might be wal_keep_segments
WAL files. Additionally, checkpoint_segments WAL files might be generated before
the subsequent checkpoint removes old WAL files. So I think that the maximum
number is

max {
  (2 + checkpoint_completion_target) * checkpoint_segments,
  wal_keep_segments + checkpoint_segments
}

Am I missing something?
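
To make the two formulas concrete, here is a tiny standalone calculation
with made-up settings (checkpoint_segments = 3, checkpoint_completion_target
= 0.5, wal_keep_segments = 32; the values are purely illustrative):

#include <stdio.h>

int
main(void)
{
    double  checkpoint_completion_target = 0.5;
    int     checkpoint_segments = 3;
    int     wal_keep_segments = 32;

    /* bound driven by checkpoints alone */
    double  ckpt_term = (2 + checkpoint_completion_target) * checkpoint_segments;

    /* bound if wal_keep_segments files survive and another checkpoint's
     * worth of segments is written before the next removal */
    int     keep_term = wal_keep_segments + checkpoint_segments;

    printf("(2 + target) * checkpoint_segments      = %.1f segments\n", ckpt_term);
    printf("wal_keep_segments + checkpoint_segments = %d segments\n", keep_term);
    printf("maximum                                 = %.1f segments (~%.0f MB)\n",
           ckpt_term > keep_term ? ckpt_term : (double) keep_term,
           (ckpt_term > keep_term ? ckpt_term : (double) keep_term) * 16);
    return 0;
}

With these numbers the wal_keep_segments term dominates: up to 35 segments,
roughly 560 MB of pg_xlog at the default 16 MB segment size.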

>>    
>> +    In archive recovery or standby mode, the server periodically performs
>> +    restartpoints
>> +    which are similar to checkpoints in normal operation: the server
>> forces
>> +    all its state to disk, updates the pg_control file to
>> +    indicate that the already-processed WAL data need not be scanned
>> again,
>> +    and then recycles old log segment files if they are in the
>> +    pg_xlog directory. Note that this recycling is not
>> affected
>> +    by wal_keep_segments at all. A restartpoint is triggered,
>> +    if at least one checkpoint record has been replayed since the last
>> +    restartpoint, every checkpoint_timeout seconds, or every
>> +    checkpoint_segments log segments only in standby mode,
>> +    whichever comes first
>
> That last sentence is a bit unclear. How about:
>
> A restartpoint is triggered if at least one checkpoint record has been
> replayed and checkpoint_timeout seconds have passed since last
> restartpoint. In standby mode, a restartpoint is also triggered if
> checkpoint_segments log segments have been replayed since last
> restartpoint and at least one checkpoint record has been replayed since.

Thanks! Seems good.

>> ... In log shipping case, the checkpoint interval
>> +    on the standby is normally smaller than that on the master.
>> +   
>
> What does that mean? Restartpoints can't be performed more frequently than
> checkpoints in the master because restartpoints can only be performed at
> checkpoint records.

Yes, that's what I meant.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] failover vs. read only queries

2010-06-10 Thread Fujii Masao
On Thu, Jun 10, 2010 at 9:58 AM, Takahiro Itagaki
 wrote:
>
> Fujii Masao  wrote:
>
>> > 1. Reset max_standby_delay = 0 in postgresql.conf
>> > 2. pg_ctl reload
>> > 3. Create a trigger file
>>
>> As far as I read the HS code, SIGHUP is not checked while a recovery
>> is waiting for queries :(  So pg_ctl reload would have no effect on
>> the conflicting queries.
>>
>> Independently from the problem I raised, I think that we should call
>> HandleStartupProcInterrupts() in that sleep loop.
>
> Hmmm, if reload doesn't work, can we write a query like below?
>
>  SELECT pg_terminate_backend(pid)
>    FROM pg_locks
>   WHERE conflicted-with-recovery-process;

I'm not sure about that, but as you suggested, we can minimize the failover
time by using the following procedure even in 9.0.

1. Reset max_standby_delay = 0 in postgresql.conf
2. pg_ctl reload
3. Cancel all the queries or all the conflicting ones
4. Create a trigger file

For now, I'll use the above procedure when building an HA system with 9.0
and clusterware.
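
As a standalone illustration of the SIGHUP point quoted above (the flag and
handler here are made up, not PostgreSQL's; in the server the equivalent
fix is calling HandleStartupProcInterrupts() inside the wait loop):

#include <signal.h>
#include <stdio.h>
#include <unistd.h>

static volatile sig_atomic_t reload_pending = 0;

static void
sighup_handler(int signo)
{
    (void) signo;
    reload_pending = 1;
}

int
main(void)
{
    int     max_standby_delay = 30;     /* pretend this came from postgresql.conf */
    int     waited = 0;

    signal(SIGHUP, sighup_handler);

    /* "waiting for a conflicting query" loop */
    while (waited < max_standby_delay)
    {
        if (reload_pending)
        {
            reload_pending = 0;
            max_standby_delay = 0;      /* pretend the reloaded file says 0 */
            fprintf(stderr, "configuration reloaded, delay now 0\n");
        }
        sleep(1);
        waited++;
    }
    return 0;
}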

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] failover vs. read only queries

2010-06-10 Thread Fujii Masao
On Thu, Jun 10, 2010 at 5:06 AM, Tom Lane  wrote:
> Josh Berkus  writes:
>> The fact that failover currently does *not* terminate existing queries and
>> transactions was regarded as a feature by the audience, rather than a
>> bug, when I did demos of HS/SR.  Of course, they might not have been
>> thinking of the delay for writes.
>
>> If there were an easy way to make the trigger file cancel all running
>> queries, apply remaining logs and come up, then I'd vote for that for
>> 9.0.  I think it's the more desired behavior by most users.  However,
>> I'm opposed to any complex solutions which might delay 9.0 release.
>
> My feeling about it is that if you want fast failover you should not
> have your failover target server configured as hot standby at all, let
> alone hot standby with a long max_standby_delay.  Such a slave could be
> very far behind on applying WAL when the crunch comes, and no amount of
> query killing will save you from that.  Put your long-running standby
> queries on a different slave instead.
>
> We should consider whether we can improve the situation in 9.1, but it
> is not a must-fix for 9.0; especially when the correct behavior isn't
> immediately obvious.

OK. Let's revisit in 9.1.

I attached the proposed patch for 9.1. The patch treats max_standby_delay
as zero (i.e., cancels all the conflicting queries immediately) once the
trigger file has been created. So recovery can end without waiting for any
lock held by queries, which minimizes the failover time. OTOH, queries
which don't conflict with recovery survive the failover.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


prevent_lock_conflict_from_slowing_failover_v1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Streaming Replication: Checkpoint_segment and wal_keep_segments on standby

2010-06-10 Thread Heikki Linnakangas

On 10/06/10 09:14, Fujii Masao wrote:

On Thu, Jun 10, 2010 at 12:09 AM, Heikki Linnakangas
  wrote:

BTW, should there be doc changes for this? I didn't find anything explaining
how restartpoints are triggered, we should add a paragraph somewhere.


+1

What about the attached patch?


> (description of wal_keep_segments)

*** 1902,1907  SET ENABLE_SEQSCAN TO OFF;
--- 1902,1908 
  for standby purposes, and the number of old WAL segments available
  for standbys is determined based only on the location of the previous
  checkpoint and status of WAL archiving.
+ This parameter has no effect on a restartpoint.
  This parameter can only be set in the postgresql.conf
  file or on the server command line.
 


Hmm, I wonder if wal_keep_segments should take effect during recovery 
too? We don't support cascading slaves, but if you have two slaves 
connected to one master (without an archive), and you perform failover 
to one of them, without wal_keep_segments the 2nd slave might not find 
all the files it needs in the new master. Then again, that won't work 
without an archive anyway, because we error out at a TLI mismatch in 
replication. Seems like this is 9.1 material..



*** a/doc/src/sgml/wal.sgml
--- b/doc/src/sgml/wal.sgml
***
*** 424,429 
--- 424,430 

 There will always be at least one WAL segment file, and will normally
 not be more than (2 + checkpoint_completion_target) * 
checkpoint_segments + 1
+or checkpoint_segments + wal_keep_segments + 1
 files.  Each segment file is normally 16 MB (though this size can be
 altered when building the server).  You can use this to estimate space
 requirements for WAL.


That's not true, wal_keep_segments is the minimum number of files 
retained, independently of checkpoint_segments. The correct formula is
max((2 + checkpoint_completion_target) * checkpoint_segments, wal_keep_segments)




+In archive recovery or standby mode, the server periodically performs
+restartpoints
+which are similar to checkpoints in normal operation: the server forces
+all its state to disk, updates the pg_control file to
+indicate that the already-processed WAL data need not be scanned again,
+and then recycles old log segment files if they are in the
+pg_xlog directory. Note that this recycling is not affected
+by wal_keep_segments at all. A restartpoint is triggered,
+if at least one checkpoint record has been replayed since the last
+restartpoint, every checkpoint_timeout seconds, or every
+checkpoint_segments log segments only in standby mode,
+whichever comes first


That last sentence is a bit unclear. How about:

A restartpoint is triggered if at least one checkpoint record has been 
replayed and checkpoint_timeout seconds have passed since 
last restartpoint. In standby mode, a restartpoint is also triggered if 
checkpoint_segments log segments have been replayed since 
last restartpoint and at least one checkpoint record has been replayed 
since.



... In log shipping case, the checkpoint interval
+on the standby is normally smaller than that on the master.
+   


What does that mean? Restartpoints can't be performed more frequently 
than checkpoints in the master because restartpoints can only be 
performed at checkpoint records.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] InvalidXLogRecPtr in docs

2010-06-10 Thread Heikki Linnakangas

On 10/06/10 11:37, Fujii Masao wrote:

On Thu, Jun 10, 2010 at 5:04 PM, Heikki Linnakangas
  wrote:

I believe that starting from the beginning of the WAL segment is just
paranoia, to avoid creating a WAL file that's missing some data from the
beginning. Right?


Only when the recovery starting record (i.e., the record at the checkpoint
redo location) is not found, we need to start replication from the
beginning
of the segment, I think. That is, fetching_ckpt = true case in the
following
code.


if (PrimaryConnInfo)
{
RequestXLogStreaming(
fetching_ckpt ? RedoStartLSN : *RecPtr,
PrimaryConnInfo);
continue;
}


Even then, we wouldn't need to start from the beginning of the WAL segment
AFAICS. The point is to start from the Redo pointer, not from the checkpoint
record, because as soon as we read the checkpoint record we'll need to start
applying WAL from the Redo pointer, which is earlier. The WAL file
boundaries don't come into play there.


You mean that the WAL file containing the Redo pointer is guaranteed to exist
if we could read the checkpoint record, so we don't need to start from the
beginning of the segment? This is probably true. But what if we could not read
the checkpoint record? In this case, the WAL file containing the Redo pointer
also might not exist.


Oh, I think I understand the issue now: we need the header in the 
beginning of the WAL segment to be valid, even if the first record we're 
interested in is in the middle of the file. I missed that.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] LLVM / clang

2010-06-10 Thread Peter Eisentraut
On tis, 2010-06-08 at 12:12 +0200, P. Caillaud wrote:
> I'd like to experiment with compiling postgres with LLVM (either llvm-gcc or
> clang) on Linux. Is it supported? Where should I start?

The way to choose a compiler is

./configure CC=your-cc ...other...options...

We support a fair number of non-GCC compilers, so supporting one or two
more should be possible.

Quick testing shows that clang doesn't get through the configure stage
on this Debian system -- it looks like some amount of better integration
with glibc might be needed.  Building with llvm-gcc works fine, but I
understand that using llvm-gcc with native code generation isn't all
that different from using gcc itself, so that's not a surprising result.
The only issue is that the float8 regression test fails, so it is
apparently not *exactly* the same.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] LLVM / clang

2010-06-10 Thread Peter Eisentraut
On ons, 2010-06-09 at 09:59 +0200, Florian Pflug wrote:
> The most heavily platform dependent part of the code is the spinlock
> implementation. You might want to check that it actually uses the
> version optimized for your platform, not the (much slower) generic
> implementation based on semaphores.

You only get the slow implementation if you configure explicitly with
--disable-spinlocks.  A toolchain that didn't support spinlocks would
fail the build and then the user could use that option to get past that
problem.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] InvalidXLogRecPtr in docs

2010-06-10 Thread Fujii Masao
On Thu, Jun 10, 2010 at 5:04 PM, Heikki Linnakangas
 wrote:
>>> Should we:
>>> 1. Just document that,
>>> 2. Change pg_last_xlog_location() to not move backwards in that case, or
>>> 3. Change the behavior so that we start streaming at the exact byte
>>> location
>>> where we left off?
>>
>> I'm for 2 as follows.
>>
>> diff --git a/src/backend/replication/walreceiver.c
>> b/src/backend/replication/walreceiver.c
>> index 26aeca6..f0fd813 100644
>> --- a/src/backend/replication/walreceiver.c
>> +++ b/src/backend/replication/walreceiver.c
>> @@ -524,7 +524,8 @@ XLogWalRcvFlush(void)
>>
>>                 /* Update shared-memory status */
>>                 SpinLockAcquire(&walrcv->mutex);
>> -               walrcv->receivedUpto = LogstreamResult.Flush;
>> +               if (XLByteLT(walrcv->receivedUpto, LogstreamResult.Flush))
>> +                       walrcv->receivedUpto = LogstreamResult.Flush;
>>                 SpinLockRelease(&walrcv->mutex);
>
> That's not enough, because we set receivedUpto in RequestXLogStreaming()
> already.

Ah, you are right.

>>> I believe that starting from the beginning of the WAL segment is just
>>> paranoia, to avoid creating a WAL file that's missing some data from the
>>> beginning. Right?
>>
>> Only when the recovery starting record (i.e., the record at the checkpoint
>> redo location) is not found, we need to start replication from the
>> beginning
>> of the segment, I think. That is, fetching_ckpt = true case in the
>> following
>> code.
>>
>>> if (PrimaryConnInfo)
>>> {
>>>        RequestXLogStreaming(
>>>                fetching_ckpt ? RedoStartLSN : *RecPtr,
>>>                PrimaryConnInfo);
>>>        continue;
>>> }
>
> Even then, we wouldn't need to start from the beginning of the WAL segment
> AFAICS. The point is to start from the Redo pointer, not from the checkpoint
> record, because as soon as we read the checkpoint record we'll need to start
> applying WAL from the Redo pointer, which is earlier. The WAL file
> boundaries don't come into play there.

You mean that the WAL file containing the Redo pointer is guaranteed to exist
if we could read the checkpoint record, so we don't need to start from the
beginning of the segment? This is probably true. But what if we could not read
the checkpoint record? In this case, the WAL file containing the Redo pointer
also might not exist.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] InvalidXLogRecPtr in docs

2010-06-10 Thread Heikki Linnakangas

On 10/06/10 10:43, Fujii Masao wrote:

On Thu, Jun 10, 2010 at 4:07 PM, Heikki Linnakangas
  wrote:

BTW, the docs claim about pg_last_xlog_location() that "While streaming
replication is in progress this will increase monotonically." That's a bit
misleading: when the replication connection is broken for some reason and we
restart it, we begin streaming from the beginning of the last WAL segment.
So at that moment, pg_last_xlog_location() moves backwards to the beginning
of the WAL segment.

Should we:
1. Just document that,
2. Change pg_last_xlog_location() to not move backwards in that case, or
3. Change the behavior so that we start streaming at the exact byte location
where we left off?


I'm for 2 as follows.

diff --git a/src/backend/replication/walreceiver.c
b/src/backend/replication/walreceiver.c
index 26aeca6..f0fd813 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -524,7 +524,8 @@ XLogWalRcvFlush(void)

 /* Update shared-memory status */
 SpinLockAcquire(&walrcv->mutex);
-   walrcv->receivedUpto = LogstreamResult.Flush;
+   if (XLByteLT(walrcv->receivedUpto, LogstreamResult.Flush))
+   walrcv->receivedUpto = LogstreamResult.Flush;
 SpinLockRelease(&walrcv->mutex);


That's not enough, because we set receivedUpto in RequestXLogStreaming() 
already.



I believe that starting from the beginning of the WAL segment is just
paranoia, to avoid creating a WAL file that's missing some data from the
beginning. Right?


Only when the recovery starting record (i.e., the record at the checkpoint
redo location) is not found, we need to start replication from the beginning
of the segment, I think. That is, fetching_ckpt = true case in the following
code.


if (PrimaryConnInfo)
{
RequestXLogStreaming(
fetching_ckpt ? RedoStartLSN : *RecPtr,
PrimaryConnInfo);
continue;
}


Even then, we wouldn't need to start from the beginning of the WAL 
segment AFAICS. The point is to start from the Redo pointer, not from 
the checkpoint record, because as soon as we read the checkpoint record 
we'll need to start applying WAL from the Redo pointer, which is 
earlier. The WAL file boundaries don't come into play there.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] variable TriggerFile can be declared as static

2010-06-10 Thread Heikki Linnakangas

On 10/06/10 10:25, Fujii Masao wrote:

Currently the variable TriggerFile is declared as extern, but
it's not used in other source file than xlog.c. How about
declaring it as static? Here is the patch.


Thanks, applied.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] InvalidXLogRecPtr in docs

2010-06-10 Thread Fujii Masao
On Thu, Jun 10, 2010 at 4:07 PM, Heikki Linnakangas
 wrote:
> Ah, I just committed a patch to do the same, before seeing your email.
> Thanks anyway.

Yeah, thanks a lot!

> BTW, the docs claim about pg_last_xlog_location() that "While streaming
> replication is in progress this will increase monotonically." That's a bit
> misleading: when the replication connection is broken for some reason and we
> restart it, we begin streaming from the beginning of the last WAL segment.
> So at that moment, pg_last_xlog_location() moves backwards to the beginning
> of the WAL segment.
>
> Should we:
> 1. Just document that,
> 2. Change pg_last_xlog_location() to not move backwards in that case, or
> 3. Change the behavior so that we start streaming at the exact byte location
> where we left off?

I'm for 2 as follows.

diff --git a/src/backend/replication/walreceiver.c
b/src/backend/replication/walreceiver.c
index 26aeca6..f0fd813 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -524,7 +524,8 @@ XLogWalRcvFlush(void)

/* Update shared-memory status */
SpinLockAcquire(&walrcv->mutex);
-   walrcv->receivedUpto = LogstreamResult.Flush;
+   if (XLByteLT(walrcv->receivedUpto, LogstreamResult.Flush))
+   walrcv->receivedUpto = LogstreamResult.Flush;
SpinLockRelease(&walrcv->mutex);


> I believe that starting from the beginning of the WAL segment is just
> paranoia, to avoid creating a WAL file that's missing some data from the
> beginning. Right?

Only when the recovery starting record (i.e., the record at the checkpoint
redo location) is not found, we need to start replication from the beginning
of the segment, I think. That is, fetching_ckpt = true case in the following
code.

> if (PrimaryConnInfo)
> {
>   RequestXLogStreaming(
>   fetching_ckpt ? RedoStartLSN : *RecPtr,
>   PrimaryConnInfo);
>   continue;
> }

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Command to prune archive at restartpoints

2010-06-10 Thread Simon Riggs
On Thu, 2010-06-10 at 10:18 +0300, Heikki Linnakangas wrote:
> On 09/06/10 10:21, Simon Riggs wrote:
> > On Tue, 2010-06-08 at 18:30 -0400, Andrew Dunstan wrote:
> >
> >> I prefer archive_cleanup_command. We should name things after their
> >> principal function, not an implementation detail, IMNSHO.
> >>
> >> More importantly, we should include an example in the docs. I created
> >> one the other day  when this was actually bothering me a bit (see
> >> ).
> >> That seemed to work ok, but maybe it's too long, and maybe people would
> >> prefer a shell script to perl.
> >
> > I submitted a patch to make the command "pg_standby -a %r"
> >
> > That's a more portable solution, ISTM.
> >
> > I'll commit that and fix the docs.
> 
> Huh, wait. There's no -a option in pg_standby, so I presume you're 
> planning to add that too. I don't like confusing pg_standby into this, 
> the docs are currently quite clear that if you want to use the built-in 
> standby mode, you can't use pg_standby, and this would muddy the waters.

It won't kill us to change that sentence. "pg_standby is only used now
within the cleanup command" etc

pg_standby already contains the exact logic we need here. Having two
sets of code for the same thing isn't how we do things.

> Maybe we could add a new pg_cleanuparchive binary, but we'll need some 
> discussion...

Which will go nowhere, as we both already know.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] variable TriggerFile can be declared as static

2010-06-10 Thread Fujii Masao
Hi,

Currently the variable TriggerFile is declared as extern, but
it's not used in other source file than xlog.c. How about
declaring it as static? Here is the patch.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


definition_triggerfile_v1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Command to prune archive at restartpoints

2010-06-10 Thread Heikki Linnakangas

On 09/06/10 10:21, Simon Riggs wrote:

On Tue, 2010-06-08 at 18:30 -0400, Andrew Dunstan wrote:


I prefer archive_cleanup_command. We should name things after their
principal function, not an implementation detail, IMNSHO.

More importantly, we should include an example in the docs. I created
one the other day  when this was actually bothering me a bit (see
).
That seemed to work ok, but maybe it's too long, and maybe people would
prefer a shell script to perl.


I submitted a patch to make the command "pg_standby -a %r"

That's a more portable solution, ISTM.

I'll commit that and fix the docs.


Huh, wait. There's no -a option in pg_standby, so I presume you're 
planning to add that too. I don't like confusing pg_standby into this, 
the docs are currently quite clear that if you want to use the built-in 
standby mode, you can't use pg_standby, and this would muddy the waters.


Maybe we could add a new pg_cleanuparchive binary, but we'll need some 
discussion...


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] InvalidXLogRecPtr in docs

2010-06-10 Thread Heikki Linnakangas

On 10/06/10 09:42, Fujii Masao wrote:

On Thu, Jun 10, 2010 at 11:56 AM, Tom Lane  wrote:

Robert Haas  writes:

On Wed, Jun 9, 2010 at 9:46 PM, Takahiro Itagaki
  wrote:

I found a term "InvalidXLogRecPtr" in 9.0 docs.
http://developer.postgresql.org/pgdocs/postgres/functions-admin.html#FUNCTIONS-RECOVERY-INFO-TABLE
| ... then the return value will be InvalidXLogRecPtr (0/0).



Maybe we should be returning NULL instead of 0/0.


+1 for using NULL instead of an artificially chosen value, for both of
those functions.


Okay, the attached patch makes those functions return NULL in that case.


Ah, I just committed a patch to do the same, before seeing your email. 
Thanks anyway.


BTW, the docs claim about pg_last_xlog_location() that "While streaming 
replication is in progress this will increase monotonically." That's a 
bit misleading: when the replication connection is broken for some 
reason and we restart it, we begin streaming from the beginning of the 
last WAL segment. So at that moment, pg_last_xlog_location() moves 
backwards to the beginning of the WAL segment.


Should we:
1. Just document that,
2. Change pg_last_xlog_location() to not move backwards in that case, or
3. Change the behavior so that we start streaming at the exact byte 
location where we left off?


I believe that starting from the beginning of the WAL segment is just 
paranoia, to avoid creating a WAL file that's missing some data from the 
beginning. Right?


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] InvalidXLogRecPtr in docs

2010-06-10 Thread Heikki Linnakangas

On 10/06/10 05:56, Tom Lane wrote:

Robert Haas  writes:

On Wed, Jun 9, 2010 at 9:46 PM, Takahiro Itagaki
  wrote:

I found a term "InvalidXLogRecPtr" in 9.0 docs.
http://developer.postgresql.org/pgdocs/postgres/functions-admin.html#FUNCTIONS-RECOVERY-INFO-TABLE
| ... then the return value will be InvalidXLogRecPtr (0/0).



Maybe we should be returning NULL instead of 0/0.


+1 for using NULL instead of an artificially chosen value, for both of
those functions.


Agreed, committed a patch to do that.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers