Re: [HACKERS] pg_restore --no-post-data and --post-data-only

2011-12-08 Thread Josh Berkus
On 12/8/11 9:18 PM, Joachim Wieland wrote:
> If you ask pg_restore to restore a section out of an archive which
> doesn't have this section, there is no error and the command just
> succeeds. This is what I expected and I think it's the right thing to
> do, but maybe others think there should be a warning.

Andrew and I discussed this previously.  It's consistent with how we
treat other options in pg_restore.  It may be that we should be
consistently treating all options differently, but I don't think that's
specific to this patch.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_restore --no-post-data and --post-data-only

2011-12-08 Thread Joachim Wieland
On Tue, Nov 15, 2011 at 6:14 PM, Andrew Dunstan  wrote:
> Updated version with pg_restore included is attached.

The patch now applies with some fuzz, but it compiles without errors or warnings.

The feature just works, and it doesn't add a lot of new code: it basically
parses the given options and then skips steps depending on the
selected section.

I verified the equivalence of -a and -s to the respective sections in
the different archive formats, and no surprise here either: they were
equivalent except for the header (which has a timestamp).

If you ask pg_restore to restore a section out of an archive which
doesn't have this section, there is no error and the command just
succeeds. This is what I expected and I think it's the right thing to
do, but maybe others think there should be a warning.

In pg_restore, pre-data cannot be run in parallel; it only runs
serially. Data and post-data can run in parallel, though. This is also
what I had expected, but it might be worth adding a note about this to
the documentation.

What I didn't like about the implementation was the two set_section()
functions; I'd prefer that they move to a file shared between
pg_dump and pg_restore and become one function...
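To make the review concrete, here is a sketch of how the new option might be used. The section names are those discussed in this thread; `mydb`/`newdb` are placeholder database names, and the exact option spelling may differ in the committed version:

```shell
# Dump once in custom format, then restore in three phases.
# Per the review above, pre-data runs serially; data and post-data
# can use parallel jobs (-j).
pg_dump -Fc mydb > mydb.dump

pg_restore --section=pre-data       -d newdb mydb.dump   # schema: tables, types, ...
pg_restore --section=data      -j 4 -d newdb mydb.dump   # table data
pg_restore --section=post-data -j 4 -d newdb mydb.dump   # indexes, constraints, triggers
```

As noted above, asking for a section that the archive doesn't contain simply succeeds without error.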


Minor issues:

{"section", required_argument, NULL, 5} in pg_dump.c is not in alphabetical
order with the other options.

 ./pg_restore --section=foobar
pg_restore: unknown section name "foobar")

Note the trailing ')'; it comes from a misplaced parenthesis around the _(...) call.

Some of the lines in the patch have trailing spaces, and in the
documentation part tabs and spaces are mixed.

int skip is used as bool skip in dumpDumpableObject().


Joachim



Re: [HACKERS] [PATCH] PostgreSQL fails to build with 32bit MinGW-w64

2011-12-08 Thread NISHIYAMA Tomoaki
Hi,

> FYI I've been testing with the attached patch.
> We'll need to construct a configure test for HAVE_CRTDEFS_H.

Isn't it enough to add the name in configure.in and run autoconf to
update configure, and autoheader to update pg_config.h.in?
The win32 check before the largefile test should perhaps also go into configure.in;
otherwise it will be wiped out by the next autoconf run.

I recreated the patch, removing the #undef
but adding a conditional to skip AC_SYS_LARGEFILE in configure.in, and
updating configure via autoconf.

diff --git a/config/ac_func_accept_argtypes.m4 b/config/ac_func_accept_argtypes.m4
index 1e77179..a82788d 100644
--- a/config/ac_func_accept_argtypes.m4
+++ b/config/ac_func_accept_argtypes.m4
@@ -46,7 +46,7 @@ AC_DEFUN([AC_FUNC_ACCEPT_ARGTYPES],
  [AC_CACHE_VAL(ac_cv_func_accept_arg1,dnl
   [AC_CACHE_VAL(ac_cv_func_accept_arg2,dnl
[AC_CACHE_VAL(ac_cv_func_accept_arg3,dnl
-[for ac_cv_func_accept_return in 'int' 'unsigned int PASCAL' 'SOCKET'; do
+[for ac_cv_func_accept_return in 'int' 'unsigned int PASCAL' 'SOCKET WSAAPI'; do
   for ac_cv_func_accept_arg1 in 'int' 'unsigned int' 'SOCKET'; do
for ac_cv_func_accept_arg2 in 'struct sockaddr *' 'const struct sockaddr *' 'void *'; do
 for ac_cv_func_accept_arg3 in 'int' 'size_t' 'socklen_t' 'unsigned int' 'void'; do
diff --git a/configure b/configure
index de9ba5a..dfdd034 100755
--- a/configure
+++ b/configure
@@ -10056,7 +10056,8 @@ done
 
 
 
-for ac_header in crypt.h dld.h fp_class.h getopt.h ieeefp.h ifaddrs.h langinfo.h poll.h pwd.h sys/ioctl.h sys/ipc.h sys/poll.h sys/pstat.h sys/resource.h sys/select.h sys/sem.h sys/shm.h sys/socket.h sys/sockio.h sys/tas.h sys/time.h sys/ucred.h sys/un.h termios.h ucred.h utime.h wchar.h wctype.h kernel/OS.h kernel/image.h SupportDefs.h
+
+for ac_header in crypt.h dld.h fp_class.h getopt.h ieeefp.h ifaddrs.h langinfo.h poll.h pwd.h sys/ioctl.h sys/ipc.h sys/poll.h sys/pstat.h sys/resource.h sys/select.h sys/sem.h sys/shm.h sys/socket.h sys/sockio.h sys/tas.h sys/time.h sys/ucred.h sys/un.h termios.h ucred.h utime.h wchar.h wctype.h kernel/OS.h kernel/image.h SupportDefs.h crtdefs.h
 do
 as_ac_Header=`$as_echo "ac_cv_header_$ac_header" | $as_tr_sh`
 if { as_var=$as_ac_Header; eval "test \"\${$as_var+set}\" = set"; }; then
@@ -17997,6 +17998,7 @@ fi
 # compiler characteristic, but you'd be wrong.  We must check this before
 # probing existence of related functions such as fseeko, since the largefile
 # defines can affect what is generated for that.
+if test "$PORTNAME" != "win32" ; then
 # Check whether --enable-largefile was given.
 if test "${enable_largefile+set}" = set; then
   enableval=$enable_largefile;
@@ -18353,7 +18355,7 @@ rm -rf conftest*
   fi
 fi
 
-
+fi
 # Check for largefile support (must be after AC_SYS_LARGEFILE)
 # The cast to long int works around a bug in the HP C Compiler
 # version HP92453-01 B.11.11.23709.GP, which incorrectly rejects
@@ -18808,7 +18810,7 @@ else
  if test "${ac_cv_func_accept_arg3+set}" = set; then
   $as_echo_n "(cached) " >&6
 else
-  for ac_cv_func_accept_return in 'int' 'unsigned int PASCAL' 'SOCKET'; do
+  for ac_cv_func_accept_return in 'int' 'unsigned int PASCAL' 'SOCKET WSAAPI'; do
   for ac_cv_func_accept_arg1 in 'int' 'unsigned int' 'SOCKET'; do
for ac_cv_func_accept_arg2 in 'struct sockaddr *' 'const struct sockaddr *' 'void *'; do
 for ac_cv_func_accept_arg3 in 'int' 'size_t' 'socklen_t' 'unsigned int' 'void'; do
diff --git a/configure.in b/configure.in
index 5591b93..79180df 100644
--- a/configure.in
+++ b/configure.in
@@ -985,7 +985,7 @@ AC_SUBST(OSSP_UUID_LIBS)
 ##
 
 dnl sys/socket.h is required by AC_FUNC_ACCEPT_ARGTYPES
-AC_CHECK_HEADERS([crypt.h dld.h fp_class.h getopt.h ieeefp.h ifaddrs.h langinfo.h poll.h pwd.h sys/ioctl.h sys/ipc.h sys/poll.h sys/pstat.h sys/resource.h sys/select.h sys/sem.h sys/shm.h sys/socket.h sys/sockio.h sys/tas.h sys/time.h sys/ucred.h sys/un.h termios.h ucred.h utime.h wchar.h wctype.h kernel/OS.h kernel/image.h SupportDefs.h])
+AC_CHECK_HEADERS([crypt.h dld.h fp_class.h getopt.h ieeefp.h ifaddrs.h langinfo.h poll.h pwd.h sys/ioctl.h sys/ipc.h sys/poll.h sys/pstat.h sys/resource.h sys/select.h sys/sem.h sys/shm.h sys/socket.h sys/sockio.h sys/tas.h sys/time.h sys/ucred.h sys/un.h termios.h ucred.h utime.h wchar.h wctype.h kernel/OS.h kernel/image.h SupportDefs.h crtdefs.h])
 
 # On BSD, cpp test for net/if.h will fail unless sys/socket.h
 # is included first.
@@ -1174,8 +1174,9 @@ fi
 # compiler characteristic, but you'd be wrong.  We must check this before
 # probing existence of related functions such as fseeko, since the largefile
 # defines can affect what is generated for that.
+if test "$PORTNAME" != "win32" ; then
 AC_SYS_LARGEFILE
-
+fi
 # Check for largefile support (must be after AC_SYS_LARGEFILE)
 AC_CHECK_SIZEOF([off_t])
 
diff --git a/src/include/c.h b/src/include/c.h
index 0391860..14f6443 100644
--- a/src/include/c.h
++

Re: [HACKERS] [v9.2] Fix Leaky View Problem

2011-12-08 Thread Kohei KaiGai
2011/12/8 Robert Haas :
> On Sat, Dec 3, 2011 at 3:19 AM, Kohei KaiGai  wrote:
>> I rebased my patch set. New functions in pg_proc.h prevented to apply
>> previous revision cleanly. Here is no functional changes.
>
> I was thinking that my version of this (attached to an email from
> earlier today) might be about ready to commit.  But while I was
> trolling through the archives on this problem trying to figure out who
> to credit, I found an old complaint of Tom's that we never fixed, and
> which represents a security exposure for this patch:
>
> rhaas=# create table foo (a integer);
> CREATE TABLE
> rhaas=# insert into foo select generate_series(1,10);
> INSERT 0 10
> rhaas=# insert into foo values (1);
> INSERT 0 1
> rhaas=# analyze foo;
> ANALYZE
> rhaas=# create view safe_foo with (security_barrier) as select * from
> foo where a > 5;
> CREATE VIEW
> rhaas=# grant select on safe_foo to bob;
> GRANT
>
> Secure in the knowledge that Bob will only be able to see rows where a
> is 6 or higher, we go to bed.  But Bob finds a way to outsmart us:
>
> rhaas=> create or replace function leak(integer,integer) returns
> boolean as $$begin raise notice 'leak % %', $1, $2; return false;
> end$$ language plpgsql;
> CREATE FUNCTION
> rhaas=> create operator !! (procedure = leak, leftarg = integer,
> rightarg = integer, restrict = eqsel);
> CREATE OPERATOR
> rhaas=> explain select * from safe_foo where a !! 0;
> NOTICE:  leak 1 0
>                         QUERY PLAN
> -
>  Subquery Scan on safe_foo  (cost=0.00..2.70 rows=1 width=4)
>   Filter: (safe_foo.a !! 0)
>   ->  Seq Scan on foo  (cost=0.00..1.14 rows=6 width=4)
>         Filter: (a > 5)
> (4 rows)
>
> OOPS.  The *executor* has been persuaded not to apply the
> possibly-nefarious operator !! to the data until after applying the
> security-critical qual "a > 5".  But the *planner* has no such
> compunctions, and has cheerfully leaked the most common value in the
> table, which the user wasn't supposed to see.  I guess it's hopeless
> to suppose that we're going to completely conceal the list of MCVs
> from the user, since it might change the plan - and even if
> ProcessUtility_hook or somesuch is used to disable EXPLAIN, the user
> can still try to ferret out the MCVs via a timing attack.  That having
> been said, the above behavior doesn't sit well with me: letting the
> user probe for MCVs via a timing attack or a plan change is one thing;
> printing them out on request is a little bit too convenient for my
> taste.  :-(
>
Sorry, I missed this scenario, and have not investigated this code path
in detail yet.

My first thought goes back to an idea that I proposed before, even
though it got a negative response due to user-visible changes:
require superuser privilege to create new operators, on the
assumption that the superuser does not set up harmful configurations.

I still think it is a workable idea. Or maybe we can adopt a somewhat weaker
restriction: functions being used in operators must have the "leakproof" property.

Is it worthwhile to have this discussion again?
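The weaker restriction could look something like this (the syntax is purely hypothetical at this point; no such property exists yet):

```sql
-- Hypothetical sketch: only functions explicitly marked as leakproof
-- (i.e. they raise no errors or messages that depend on their inputs)
-- would be allowed to be evaluated below a security_barrier view's quals.
CREATE FUNCTION int4_noteq(integer, integer) RETURNS boolean
    AS 'SELECT $1 <> $2'
    LANGUAGE sql IMMUTABLE LEAKPROOF;   -- LEAKPROOF: proposed marker, not existing syntax
```

Under such a rule, Bob's leak() function above, which raises a NOTICE containing its arguments, could not be marked leakproof and so could not be used by the planner in a way that exposes the hidden rows.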

Thanks,
-- 
KaiGai Kohei 



[HACKERS] Note about bogus pg_amop entries for commutator operators

2011-12-08 Thread Tom Lane
While reviewing the SP-Gist patch I noticed an error that I've seen
before, notably in the rangetypes patch: cross-type operators should
not be included in an opclass merely because their commutators are.
To be useful in an index opclass, an operator has to take the indexed
column's datatype as its *left* input, because index scan conditions
are always "indexed_column OP comparison_value".  So, for example,
if you've got a GiST opclass on points, it might be useful to include
"point <@ box" in that opclass, but there is no reason to include
"box @> point" in it.  (If the operators are marked as commutators,
the planner can figure out what to do with "box @> indexed_point_col",
but this is not dependent on any opclass entries.)
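For example (a sketch; the table and index names are made up):

```sql
CREATE TABLE places (loc point);
CREATE INDEX places_loc_idx ON places USING gist (loc);

-- Indexable: the indexed column is the *left* input.
SELECT * FROM places WHERE loc <@ box '((0,0),(10,10))';

-- "box @> point" need not be in the opclass: since @> and <@ are
-- marked as commutators, the planner rewrites
--   WHERE box '((0,0),(10,10))' @> loc
-- into the indexable form above on its own.
```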

Both of the aforementioned patches contained not only useless pg_amop
entries, but utterly broken "support" code that would have crashed
if it ever could have been reached, because it made the wrong
assumptions about which input would be on which side of the index
clause.

So it finally occurred to me to add an opr_sanity test case that
accounts for this, and lo and behold, it found three similarly bogus
entries that were already in the code:

***
*** 1077,1082 
--- 1092,1115 
  -+---
  (0 rows)
  
+ -- Check that each operator listed in pg_amop has an associated opclass,
+ -- that is one whose opcintype matches oprleft (possibly by coercion).
+ -- Otherwise the operator is useless because it cannot be matched to an index.
+ -- (In principle it could be useful to list such operators in 
multiple-datatype
+ -- btree opfamilies, but in practice you'd expect there to be an opclass for
+ -- every datatype the family knows about.)
+ SELECT p1.amopfamily, p1.amopstrategy, p1.amopopr
+ FROM pg_amop AS p1
+ WHERE NOT EXISTS(SELECT 1 FROM pg_opclass AS p2
+  WHERE p2.opcfamily = p1.amopfamily
+AND binary_coercible(p2.opcintype, p1.amoplefttype));
+  amopfamily | amopstrategy | amopopr 
+ +--+-
+1029 |   27 | 433
+1029 |   47 | 757
+1029 |   67 | 759
+ (3 rows)
+ 
  -- Operators that are primary members of opclasses must be immutable (else
  -- it suggests that the index ordering isn't fixed).  Operators that are
  -- cross-type members need only be stable, since they are just shorthands

(Those are in the gist point_ops opclass, if you were wondering.)

I'm planning to leave it like this for the moment in the spgist patch,
and then go back and clean up the useless point_ops entries and their
underlying dead code in a separate commit.  Just FYI.

regards, tom lane



Re: [HACKERS] fix for pg_upgrade

2011-12-08 Thread panam
OK, works now with the recent update.

Thanks




Re: [HACKERS] [PATCH] PostgreSQL fails to build with 32bit MinGW-w64

2011-12-08 Thread Andrew Dunstan



On 12/05/2011 06:27 PM, Andrew Dunstan wrote:



   $ cat regression.diffs
   *** C:/MinGW/msys/1.0/home/pgrunner/bf/root32/HEAD/pgsql/src/test/regress/expected/float8-exp-three-digits-win32.out	Fri Nov 25 14:24:49 2011
   --- C:/MinGW/msys/1.0/home/pgrunner/bf/root32/HEAD/pgsql/src/test/regress/results/float8.out	Mon Dec  5 18:17:36 2011

   ***
   *** 382,388 
 SET f1 = FLOAT8_TBL.f1 * '-1'
 WHERE FLOAT8_TBL.f1 > '0.0';
  SELECT '' AS bad, f.f1 * '1e200' from FLOAT8_TBL f;
   ! ERROR:  value out of range: overflow
  SELECT '' AS bad, f.f1 ^ '1e200' from FLOAT8_TBL f;
  ERROR:  value out of range: overflow
  SELECT 0 ^ 0 + 0 ^ 1 + 0 ^ 0.0 + 0 ^ 0.5;
   --- 382,396 
 SET f1 = FLOAT8_TBL.f1 * '-1'
 WHERE FLOAT8_TBL.f1 > '0.0';
  SELECT '' AS bad, f.f1 * '1e200' from FLOAT8_TBL f;
   !  bad | ?column?
   ! -+--
   !  |0
   !  |  -3.484e+201
   !  | -1.0043e+203
   !  |-Infinity
   !  | -1.2345678901234
   ! (5 rows)
   !
  SELECT '' AS bad, f.f1 ^ '1e200' from FLOAT8_TBL f;
  ERROR:  value out of range: overflow
  SELECT 0 ^ 0 + 0 ^ 1 + 0 ^ 0.0 + 0 ^ 0.5;

   ==






This is apparently an optimization bug in the compiler. If I turn 
optimization off (CFLAGS=-O0) it goes away. Ick.


So at the moment I'm a bit blocked. I can't really file a bug because 
the compiler can't currently be used to build postgres, I don't have 
time to construct a self-contained test case, and I don't want to commit 
changes to enable the compiler until the issue is solved.


FYI I've been testing with the attached patch. We'll need to construct a 
configure test for HAVE_CRTDEFS_H.


cheers

andrew


diff --git a/config/ac_func_accept_argtypes.m4 b/config/ac_func_accept_argtypes.m4
index 1e77179..a82788d 100644
--- a/config/ac_func_accept_argtypes.m4
+++ b/config/ac_func_accept_argtypes.m4
@@ -46,7 +46,7 @@ AC_DEFUN([AC_FUNC_ACCEPT_ARGTYPES],
  [AC_CACHE_VAL(ac_cv_func_accept_arg1,dnl
   [AC_CACHE_VAL(ac_cv_func_accept_arg2,dnl
[AC_CACHE_VAL(ac_cv_func_accept_arg3,dnl
-[for ac_cv_func_accept_return in 'int' 'unsigned int PASCAL' 'SOCKET'; do
+[for ac_cv_func_accept_return in 'int' 'unsigned int PASCAL' 'SOCKET WSAAPI'; do
   for ac_cv_func_accept_arg1 in 'int' 'unsigned int' 'SOCKET'; do
for ac_cv_func_accept_arg2 in 'struct sockaddr *' 'const struct sockaddr *' 'void *'; do
 for ac_cv_func_accept_arg3 in 'int' 'size_t' 'socklen_t' 'unsigned int' 'void'; do
diff --git a/configure b/configure
index de9ba5a..fbafa4b 100755
--- a/configure
+++ b/configure
@@ -17998,6 +17998,7 @@ fi
 # probing existence of related functions such as fseeko, since the largefile
 # defines can affect what is generated for that.
 # Check whether --enable-largefile was given.
+if test "$PORTNAME" != "win32" ; then
 if test "${enable_largefile+set}" = set; then
   enableval=$enable_largefile;
 fi
@@ -18352,6 +18353,7 @@ esac
 rm -rf conftest*
   fi
 fi
+fi
 
 
 # Check for largefile support (must be after AC_SYS_LARGEFILE)
@@ -18808,7 +18810,7 @@ else
  if test "${ac_cv_func_accept_arg3+set}" = set; then
   $as_echo_n "(cached) " >&6
 else
-  for ac_cv_func_accept_return in 'int' 'unsigned int PASCAL' 'SOCKET'; do
+  for ac_cv_func_accept_return in 'int' 'unsigned int PASCAL' 'SOCKET WSAAPI'; do
   for ac_cv_func_accept_arg1 in 'int' 'unsigned int' 'SOCKET'; do
for ac_cv_func_accept_arg2 in 'struct sockaddr *' 'const struct sockaddr *' 'void *'; do
 for ac_cv_func_accept_arg3 in 'int' 'size_t' 'socklen_t' 'unsigned int' 'void'; do
diff --git a/src/include/c.h b/src/include/c.h
index 0391860..cb9b150 100644
--- a/src/include/c.h
+++ b/src/include/c.h
@@ -58,7 +58,8 @@
 #endif
 #include "postgres_ext.h"
 
-#if _MSC_VER >= 1400 || defined(WIN64)
+#define HAVE_CRTDEFS_H 1
+#if _MSC_VER >= 1400 || defined(HAVE_CRTDEFS_H)
 #define errcode __msvc_errcode
 #include 
 #undef errcode
diff --git a/src/include/port/win32.h b/src/include/port/win32.h
index 34f4004..49da684 100644
--- a/src/include/port/win32.h
+++ b/src/include/port/win32.h
@@ -31,7 +31,7 @@
  * The Mingw64 headers choke if this is already defined - they
  * define it themselves.
  */
-#if !defined(WIN64) || defined(WIN32_ONLY_COMPILER)
+#if !defined(__MINGW64_VERSION_MAJOR) || defined(WIN32_ONLY_COMPILER)
 #define _WINSOCKAPI_
 #endif
 #include 
@@ -225,9 +225,13 @@ int			setitimer(int which, const struct itimerval * value, struct itimerval * ov
 #define fseeko(stream, offset, origin) _fseeki64(stream, offset, origin)
 #define ftello(stream) _ftelli64(stream)
 #else
+#ifndef fseeko
 #define fseeko(stream, offset, origin) fseeko64(stream, offset, origin)
+#endif
+#ifndef ftello
 #define ftello(stream) ftello64(stream)
 #endif
+#endif
 
 /*
  * Supplement to .
@@ -264,6 +268,7 @@ typedef int p

Re: [HACKERS] pg_dump --exclude-table-data

2011-12-08 Thread Andrew Dunstan



On 12/08/2011 11:13 AM, Robert Haas wrote:

On Wed, Dec 7, 2011 at 10:19 PM, Andrew Dunstan  wrote:

I'm also a bit concerned about the relationship between this and the
existing -s option.  It seems odd that you use --schema-only to get
the behavior database-wide, and --exclude-table-data to get it for
just one table.  Is there some way we can make that a bit more
consistent?

It's consistent, and was designed to be, with the --exclude-table option.
I'm not sure what you want it to look like instead. But TBH I'm more
interested in getting the functionality than in how it's spelled.

Ah, hmm.  Well, maybe it's fine the way that you have it.  But I'd be
tempted to at least add a sentence to the sgml documentation for each
option referring to the other, e.g. "To dump only schema for all
tables in the database, see --schema-only." and "To exclude table data
for only a subset of tables in the database, see
--exclude-table-data.", or something along those lines.



Sure, no problem with that.

cheers

andrew



Re: [HACKERS] pg_dump --exclude-table-data

2011-12-08 Thread Robert Haas
On Wed, Dec 7, 2011 at 10:19 PM, Andrew Dunstan  wrote:
>> I'm also a bit concerned about the relationship between this and the
>> existing -s option.  It seems odd that you use --schema-only to get
>> the behavior database-wide, and --exclude-table-data to get it for
>> just one table.  Is there some way we can make that a bit more
>> consistent?
>
> It's consistent, and was designed to be, with the --exclude-table option.
> I'm not sure what you want it to look like instead. But TBH I'm more
> interested in getting the functionality than in how it's spelled.

Ah, hmm.  Well, maybe it's fine the way that you have it.  But I'd be
tempted to at least add a sentence to the sgml documentation for each
option referring to the other, e.g. "To dump only schema for all
tables in the database, see --schema-only." and "To exclude table data
for only a subset of tables in the database, see
--exclude-table-data.", or something along those lines.
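The relationship between the two options can be illustrated like this (the `--exclude-table-data` spelling is from this patch; `mydb` and `big_log_table` are placeholder names):

```shell
# Whole database, schema only -- no data for any table:
pg_dump --schema-only mydb > schema.sql

# Everything, but skip the data of one large table; the pattern
# syntax follows the existing --exclude-table option:
pg_dump --exclude-table-data=big_log_table mydb > dump.sql
```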

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Lots of FSM-related fragility in transaction commit

2011-12-08 Thread Tom Lane
Heikki Linnakangas  writes:
> On 08.12.2011 08:20, Tom Lane wrote:
>> So this is really a whole lot worse than our behavior was in pre-FSM
>> days, and it needs to get fixed.

> This bug was actually introduced only recently. Notice how the log says 
> "consistent recovery state reached at 0/5D71BA8". This interacts badly 
> with Fujii's patch I committed last week:

You're right, I was testing on HEAD not 9.1.x.

> That was harmless until last week, because reachedMinRecoveryPoint was 
> not used for anything unless you're doing archive recovery and hot 
> standby was enabled, but IMO the "consistent recovery state reached" log 
> message was misleading even then. I propose that we apply the attached 
> patch to master and backbranches.

Looks sane to me, though I've not tested to see what effect it has on
the case I was testing.

regards, tom lane



Re: [HACKERS] Allow substitute allocators for PGresult.

2011-12-08 Thread Kyotaro HORIGUCHI
Hello,

 The documentation had slipped my mind.

 This is the patch to add documentation for PGresult custom
 storage. It appears in section '31.19. Alternative result
 storage'.

 regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index 252ff8c..dc2acb6 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -7229,6 +7229,325 @@ int PQisthreadsafe();
  
 
 
+ 
+  Alternative result storage
+
+  
+   PGresult
+   PGconn
+  
+
+  
+   In standard usage, users get the result of command
+   execution from a PGresult acquired
+   with PQgetResult
+   from a PGconn. The memory areas for
+   the PGresult are allocated with malloc() internally within calls of
+   command execution functions such as PQexec
+   and PQgetResult. If it is inconvenient to
+   handle the result records in the form of a PGresult, you can instruct the
+   PGconn to store them into your own storage instead.
+  
+
+  
+   
+
+ PQregisterTupleAdder
+ 
+  PQregisterTupleAdder
+ 
+
+
+
+ 
+   Sets a function that allocates memory for each tuple and its column
+   values, and adds the completed tuple into your storage.
+
+void PQregisterTupleAdder(PGconn *conn,
+  addTupleFunction func,
+  void *param);
+
+ 
+ 
+ 
+   
+	 
+	   conn
+	   
+	 
+	   The connection object on which to set the tuple adder
+	   function. A PGresult created from this connection calls
+	   this function to store the result tuples instead of
+	   storing them into its internal storage.
+	 
+	   
+	 
+	 
+	   func
+	   
+	 
+	   Tuple adder function to set. NULL means to use the
+	   default storage.
+	 
+	   
+	 
+	 
+	   param
+	   
+	 
+	   A pointer to contextual parameter passed
+	   to func.
+	 
+	   
+	 
+   
+ 
+
+   
+  
+
+  
+   
+
+ addTupleFunction
+ 
+  addTupleFunction
+ 
+
+
+
+ 
+   The type for the callback function to serve memory blocks for
+   each tuple and its column values, and to add the constructed
+   tuple into your own storage.
+
+typedef enum 
+{
+  ADDTUP_ALLOC_TEXT,
+  ADDTUP_ALLOC_BINARY,
+  ADDTUP_ADD_TUPLE
+} AddTupFunc;
+
+void *(*addTupleFunction)(PGresult *res,
+  AddTupFunc func,
+  int id,
+  size_t size);
+
+ 
+
+ 
+   Generally this function must return NULL on failure, and should
+   set the error message
+   with PGsetAddTupleErrMes if the cause is
+   anything other than out of memory. This function must not throw any
+   exception. This function is called in the following sequence.
+
+   
+	 
+	   Call with func
+	   = ADDTUP_ALLOC_BINARY
+	   and id = -1 to request the memory
+	   for tuple used as an array
+	   of PGresAttValue 
+	 
+	 
+	   Call with func
+	   = ADDTUP_ALLOC_TEXT
+	   or ADDTUP_ALLOC_BINARY
+	   and id a zero or positive number
+	   to request the memory for each column value in the current
+	   tuple.
+	 
+	 
+	   Call with func
+	   = ADDTUP_ADD_TUPLE to request the
+	   constructed tuple to store.
+	 
+   
+ 
+ 
+   Calling addTupleFunction
+   with func =
+   ADDTUP_ALLOC_TEXT tells the function to return a
+   memory block of at least size bytes,
+   which need not be aligned to a word boundary.
+   id is a zero or positive number that
+   distinguishes the usage of the requested memory block, namely the
+   position of the column for which the memory block is used.
+ 
+   When func
+   = ADDTUP_ALLOC_BINARY, this function is
+   asked to return a memory block of at
+   least size bytes which is aligned to a
+   word boundary.
+   id is an identifier that distinguishes the
+   usage of the requested memory block. -1 means that it is used as an
+   array of PGresAttValue to store the tuple. Zero or
+   positive numbers have the same meanings as for
+   ADDTUP_ALLOC_TEXT.
+ 
+ When func
+   = ADDTUP_ADD_TUPLE, this function is
+   asked to store the PGresAttValue structure
+   constructed by the caller into your storage. The pointer to the
+   tuple structure is not passed, so you should remember the
+   pointer to the memory block handed to the caller on the
+   func
+   = ADDTUP_ALLOC_BINARY call
+   with id = -1. This function must return
+   any non-NULL value on success. You must properly free the
+   memory blocks handed to the caller by this function if needed.
+ 
+ 
+   
+	 res
+	 
+	   
+	 A pointer to the PGresult object.
+	   
+	 
+   
+   
+	 func
+	 
+	   
+	 An enum value telling the function to perform.
+	   
+	 
+   
+   
+	 param
+	 
+	   
+	 A pointer to contextual parameter passed to func.
+	   
+	 
+   
+ 
+
+   
+  
+
+  
+

Re: [HACKERS] Lots of FSM-related fragility in transaction commit

2011-12-08 Thread Heikki Linnakangas

On 08.12.2011 08:20, Tom Lane wrote:

I thought that removing the unreadable file would let me restart,
but after "rm 52860_fsm" and trying again to start the server,
there's a different problem:

LOG:  database system was interrupted while in recovery at 2011-12-08 00:56:11 
EST
HINT:  This probably means that some data is corrupted and you will have to use 
the last backup for recovery.
LOG:  database system was not properly shut down; automatic recovery in progress
LOG:  consistent recovery state reached at 0/5D71BA8
LOG:  redo starts at 0/5100050
WARNING:  page 18 of relation base/27666/52860 is uninitialized
CONTEXT:  xlog redo visible: rel 1663/27666/52860; blk 18
PANIC:  WAL contains references to invalid pages
CONTEXT:  xlog redo visible: rel 1663/27666/52860; blk 18
LOG:  startup process (PID 14507) was terminated by signal 6
LOG:  aborting startup due to startup process failure

Note that this isn't even the same symptom Shraibman hit, since
apparently he was failing on replaying the commit record.  How is it
that the main table file managed to have uninitialized pages?
I suspect that "redo visible" WAL processing is breaking one or more of
the fundamental WAL rules, perhaps by not generating a full-page image
when it needs to.

So this is really a whole lot worse than our behavior was in pre-FSM
days, and it needs to get fixed.


This bug was actually introduced only recently. Notice how the log says 
"consistent recovery state reached at 0/5D71BA8". This interacts badly 
with Fujii's patch I committed last week:



commit 1e616f639156b2431725f7823c999486ca46c1ea
Author: Heikki Linnakangas 
Date:   Fri Dec 2 10:49:54 2011 +0200

During recovery, if we reach consistent state and still have entries in the
invalid-page hash table, PANIC immediately. Immediate PANIC is much better
than waiting for end-of-recovery, which is what we did before, because the
end-of-recovery might not come until months later if this is a standby
server.
...


Recovery thinks it has reached consistency early on, so it PANICs as 
soon as it sees a reference to a page that belongs to a table that was 
later dropped.


The bug here is that we consider having immediately reached consistency 
during crash recovery. That's a false notion: during crash recovery the 
database isn't consistent until *all* WAL has been replayed. We should 
not set reachedMinRecoveryPoint during crash recovery at all. And 
perhaps the flag should be renamed to reachedConsistency, to make it 
clear that during crash recovery it's not enough to reach the nominal 
minRecoveryPoint.


That was harmless until last week, because reachedMinRecoveryPoint was 
not used for anything unless you're doing archive recovery and hot 
standby was enabled, but IMO the "consistent recovery state reached" log 
message was misleading even then. I propose that we apply the attached 
patch to master and backbranches.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 9bec660..9d96044 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -562,7 +562,13 @@ static TimeLineID lastPageTLI = 0;
 static XLogRecPtr minRecoveryPoint;		/* local copy of
 		 * ControlFile->minRecoveryPoint */
 static bool updateMinRecoveryPoint = true;
-bool reachedMinRecoveryPoint = false;
+
+/*
+ * Have we reached a consistent database state? In crash recovery, we have
+ * to replay all the WAL, so reachedConsistency is never set. During archive
+ * recovery, the database is consistent once minRecoveryPoint is reached.
+ */
+bool reachedConsistency = false;
 
 static bool InRedo = false;
 
@@ -6894,9 +6900,16 @@ static void
 CheckRecoveryConsistency(void)
 {
 	/*
+	 * During crash recovery, we don't reach a consistent state until we've
+	 * replayed all the WAL.
+	 */
+	if (XLogRecPtrIsInvalid(minRecoveryPoint))
+		return;
+
+	/*
 	 * Have we passed our safe starting point?
 	 */
-	if (!reachedMinRecoveryPoint &&
+	if (!reachedConsistency &&
 		XLByteLE(minRecoveryPoint, EndRecPtr) &&
 		XLogRecPtrIsInvalid(ControlFile->backupStartPoint))
 	{
@@ -6906,7 +6919,7 @@ CheckRecoveryConsistency(void)
 		 */
 		XLogCheckInvalidPages();
 
-		reachedMinRecoveryPoint = true;
+		reachedConsistency = true;
 		ereport(LOG,
 (errmsg("consistent recovery state reached at %X/%X",
 		EndRecPtr.xlogid, EndRecPtr.xrecoff)));
@@ -6919,7 +6932,7 @@ CheckRecoveryConsistency(void)
 	 */
 	if (standbyState == STANDBY_SNAPSHOT_READY &&
 		!LocalHotStandbyActive &&
-		reachedMinRecoveryPoint &&
+		reachedConsistency &&
 		IsUnderPostmaster)
 	{
 		/* use volatile pointer to prevent code rearrangement */
diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c
index 350d434..b280434 100644
--- a/src/backend/access/transam/xlogutils.c
+++ b/src/backend/access/transam/xlogutils.c
@@ -85,7 +85,7 @@ log_invalid_pag