Re: [HACKERS] Re: [COMMITTERS] pgsql: Properly set relpersistence for fake relcache entries.

2012-09-16 Thread Andres Freund
On Monday, September 17, 2012 07:35:06 AM Tom Lane wrote:
> Robert Haas  writes:
> > On Sep 15, 2012, at 11:32 PM, Tom Lane  wrote:
> >> Right, but we do a shutdown checkpoint at the end of crash recovery.
(as noted somewhere else and tackled by Simon, an END_OF_RECOVERY checkpoint 
didn't sync those before)

> > Yes, but that only writes the buffers that are dirty. It doesn't fix the
> > lack of a BM_PERMANENT flag on a buffer that ought to have had one. So
> > that page can now get modified AGAIN, after recovery, and not
> > checkpointed.
> 
> Ugh.  Yeah, we need to fix this ASAP.  I've notified pgsql-packagers to
> expect a release this week.
Btw, I played with this some more on Saturday, and I think that, while 
definitely a bad bug, the actual consequences aren't as bad as I, at least, 
initially feared.

Fake relcache entries are currently set in 3 scenarios during recovery:
1. removal of ALL_VISIBLE in heapam.c
2. incomplete splits and incomplete deletions in nbtxlog.c
3. incomplete splits in ginxlog.c

As Jeff nicely showed, it's easy to corrupt the visibility map with this. 
Fortunately in < 9.2 that doesn't have too bad consequences and will be fixed 
by the next vacuum. In 9.2 it obviously can result in wrong query results, but 
it will still be fixed by a later (probably full-table) vacuum.

To hit 2) and 3) the server needs to have crashed (or a strange 
recovery_target_* set) while doing a multilevel operation. I have only 
cursorily looked at gin, but it looks to me like in both nbtree and gin the 
window during which the multiple steps of a split/deletion are logged is 
fairly short, so it's not likely that we crashed exactly during that. Only if 
we haven't read the complete multistep operation during recovery will we ever 
create fake relcache entries for those relations/buffers! And even if that 
edge case is hit, it seems somewhat likely that the few pages that are read 
with the fake entry are still in the cache (the incomplete operation has to 
have happened shortly before) and thus won't get the bad relpersistence flag 
set.

So I think while that bug had the possibility of being really bad we were 
pretty lucky...

Greetings,

Andres
-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Properly set relpersistence for fake relcache entries.

2012-09-16 Thread Tom Lane
Robert Haas  writes:
> On Sep 15, 2012, at 11:32 PM, Tom Lane  wrote:
>> Right, but we do a shutdown checkpoint at the end of crash recovery.

> Yes, but that only writes the buffers that are dirty. It doesn't fix the lack 
> of a BM_PERMANENT flag on a buffer that ought to have had one. So that page 
> can now get modified AGAIN, after recovery, and not checkpointed.

Ugh.  Yeah, we need to fix this ASAP.  I've notified pgsql-packagers to
expect a release this week.

regards, tom lane




Re: [HACKERS] [ADMIN] pg_upgrade from 9.1.3 to 9.2 failed

2012-09-16 Thread Rural Hunter

On 2012-09-17 12:47:11, Tom Lane wrote:

Bruce Momjian  writes:

On Sun, Sep 16, 2012 at 09:48:58PM -0400, Tom Lane wrote:

Well, that's even stranger, because (1) information_schema.sql_features
ought to have a toast table in either version, and (2) neither pg_dump
nor pg_upgrade ought to be attempting to dump or transfer that table.



I bet pg_upgrade is picking it up from the old cluster because it has an
oid >= FirstNormalObjectId and the table is not in the information
schema.


If it *isn't* in information_schema, but is just some random table that
happens to be named sql_features, then it's hard to explain why there's
anything going wrong at all.  My money is on the OP having done a reload
of the information_schema (as per, eg, the release notes for 9.1.2), and
somehow that's confusing pg_dump and/or pg_upgrade.
ah yes yes, now I can remember it! I have followed the release notes 
and re-created the whole information_schema schema.


regards, tom lane








Re: [HACKERS] [ADMIN] pg_upgrade from 9.1.3 to 9.2 failed

2012-09-16 Thread Rural Hunter

On 2012-09-17 12:32:36, Bruce Momjian wrote:

On Sun, Sep 16, 2012 at 06:04:16PM -0400, Tom Lane wrote:

Bruce Momjian  writes:

On Sun, Sep 16, 2012 at 12:38:37PM +0800, Rural Hunter wrote:

I ran the pg_upgrade with the patch and found the problematic object
is a toast object.



OK, this is exactly what I wanted to see, and it explains why pg_dump
didn't show it.  Can you find out what table references this toast
table?  Try this query on the old cluster:



select oid, * from pg_class WHERE reltoastrelid = 16439148;



I believe it will have an oid of 16439145, or it might not exist.


Most likely what's happened is that the table has a toast table that
it doesn't need, as a result of having dropped the only wide column(s)
in it.  So when the table is recreated in the new cluster, there's no
toast table for it.

So what you need to do is get rid of that check, or relax it so that it
doesn't insist on toast tables matching up exactly.  It seems possible
that there could be discrepancies in the other direction too, ie,
new cluster created a toast table when old cluster didn't have one.


pg_dump.c already has this code:

    if (OidIsValid(pg_class_reltoastrelid))
    {
        /*
         * One complexity is that the table definition might not require
         * the creation of a TOAST table, and the TOAST table might have
         * been created long after table creation, when the table was
         * loaded with wide data.  By setting the TOAST oid we force
         * creation of the TOAST heap and TOAST index by the backend so we
         * can cleanly copy the files during binary upgrade.
         */

        appendPQExpBuffer(upgrade_buffer,
                          "SELECT binary_upgrade.set_next_toast_pg_class_oid('%u'::pg_catalog.oid);\n",
                          pg_class_reltoastrelid);

        /* every toast table has an index */
        appendPQExpBuffer(upgrade_buffer,
                          "SELECT binary_upgrade.set_next_index_pg_class_oid('%u'::pg_catalog.oid);\n",
                          pg_class_reltoastidxid);
    }

As you can see, we look at the existing TOAST usage and force the new
cluster to match.  As I remember we replay the DROP COLUMN in binary
upgrade mode so the new cluster always matches the old cluster's TOAST
usage.  I certainly have never seen this bug reported before.

I think the big question is why did this case fail?  I can say that the
query that pulls details from each cluster skips information_schema or
oid < FirstNormalObjectId.  I wonder if there is a mismatch between what
pg_dump filters out and pg_upgrade.  Can you tell us the schema of the
'sql_features' table?

# select * from pg_tables where tablename='sql_features';
     schemaname     |  tablename   | tableowner | tablespace | hasindexes | hasrules | hastriggers
--------------------+--------------+------------+------------+------------+----------+-------------
 information_schema | sql_features | postgres   |            | f          | f        | f
(1 row)


Also, does it appear in the pg_dump --schema-only output?  I don't think
it does because it wasn't reported in the pg_dump --schema-only diff I
requested, and pg_dump wouldn't have dumped it from the new cluster.

right. I checked the dump from the old cluster and it's not there.


What that means is that 'sql_features' got a TOAST table in the old
cluster but while 'sql_features' also has a TOAST table in the new
cluster, it isn't processed by pg_upgrade because it is in the
information schema and has an oid < FirstNormalObjectId.








Re: [HACKERS] [ADMIN] pg_upgrade from 9.1.3 to 9.2 failed

2012-09-16 Thread Tom Lane
Bruce Momjian  writes:
> On Sun, Sep 16, 2012 at 09:48:58PM -0400, Tom Lane wrote:
>> Well, that's even stranger, because (1) information_schema.sql_features
>> ought to have a toast table in either version, and (2) neither pg_dump
>> nor pg_upgrade ought to be attempting to dump or transfer that table.

> I bet pg_upgrade is picking it up from the old cluster because it has an
> oid >= FirstNormalObjectId and the table is not in the information
> schema.

If it *isn't* in information_schema, but is just some random table that
happens to be named sql_features, then it's hard to explain why there's
anything going wrong at all.  My money is on the OP having done a reload
of the information_schema (as per, eg, the release notes for 9.1.2), and
somehow that's confusing pg_dump and/or pg_upgrade.

regards, tom lane




Re: [HACKERS] [ADMIN] pg_upgrade from 9.1.3 to 9.2 failed

2012-09-16 Thread Bruce Momjian
On Sun, Sep 16, 2012 at 09:48:58PM -0400, Tom Lane wrote:
> Rural Hunter  writes:
> > # select oid, * from pg_class WHERE reltoastrelid = 16439148;
> >oid|   relname| relnamespace | reltype  | reloftype | 
> > relowner | relam | relfilenode | reltablespace | relpages | reltuples | 
> > reltoastrelid | reltoastidxid | relhasindex | relisshared | 
> > relpersistence | relkind | relnatts | relchecks | relhasoids | 
> > relhaspkey | relhasrules | relhastriggers | relhassubclass | 
> > relfrozenxid | relacl  | reloptions
> > --+--+--+--+---+--+---+-+---+--+---+---+---+-+-++-+--+---+++-+++--+-+
> >  16439145 | sql_features | 16438995 | 16439147 | 0 |   
> > 10 | 0 |16439145 | 0 |0 | 0 |  
> > 16439148 | 0 | f   | f   | p  | 
> > r   |7 | 0 | f  | f  | f   
> > | f  | f  |630449585 | 
> > {postgres=arwdDxt/postgres,=r/postgres} |
> > (1 row)
> 
> Well, that's even stranger, because (1) information_schema.sql_features
> ought to have a toast table in either version, and (2) neither pg_dump
> nor pg_upgrade ought to be attempting to dump or transfer that table.

I bet pg_upgrade is picking it up from the old cluster because it has an
oid >= FirstNormalObjectId and the table is not in the information
schema.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [HACKERS] [ADMIN] pg_upgrade from 9.1.3 to 9.2 failed

2012-09-16 Thread Bruce Momjian
On Sun, Sep 16, 2012 at 06:04:16PM -0400, Tom Lane wrote:
> Bruce Momjian  writes:
> > On Sun, Sep 16, 2012 at 12:38:37PM +0800, Rural Hunter wrote:
> >> I ran the pg_upgrade with the patch and found the problematic object
> >> is a toast object.
> 
> > OK, this is exactly what I wanted to see, and it explains why pg_dump
> > didn't show it.  Can you find out what table references this toast
> > table?  Try this query on the old cluster:
> 
> > select oid, * from pg_class WHERE reltoastrelid = 16439148;
> 
> > I believe it will have an oid of 16439145, or it might not exist.
> 
> Most likely what's happened is that the table has a toast table that
> it doesn't need, as a result of having dropped the only wide column(s)
> in it.  So when the table is recreated in the new cluster, there's no
> toast table for it.
> 
> So what you need to do is get rid of that check, or relax it so that it
> doesn't insist on toast tables matching up exactly.  It seems possible
> that there could be discrepancies in the other direction too, ie,
> new cluster created a toast table when old cluster didn't have one.

pg_dump.c already has this code:

    if (OidIsValid(pg_class_reltoastrelid))
    {
        /*
         * One complexity is that the table definition might not require
         * the creation of a TOAST table, and the TOAST table might have
         * been created long after table creation, when the table was
         * loaded with wide data.  By setting the TOAST oid we force
         * creation of the TOAST heap and TOAST index by the backend so we
         * can cleanly copy the files during binary upgrade.
         */

        appendPQExpBuffer(upgrade_buffer,
                          "SELECT binary_upgrade.set_next_toast_pg_class_oid('%u'::pg_catalog.oid);\n",
                          pg_class_reltoastrelid);

        /* every toast table has an index */
        appendPQExpBuffer(upgrade_buffer,
                          "SELECT binary_upgrade.set_next_index_pg_class_oid('%u'::pg_catalog.oid);\n",
                          pg_class_reltoastidxid);
    }

As you can see, we look at the existing TOAST usage and force the new
cluster to match.  As I remember we replay the DROP COLUMN in binary
upgrade mode so the new cluster always matches the old cluster's TOAST
usage.  I certainly have never seen this bug reported before.

I think the big question is why did this case fail?  I can say that the
query that pulls details from each cluster skips information_schema or
oid < FirstNormalObjectId.  I wonder if there is a mismatch between what
pg_dump filters out and pg_upgrade.  Can you tell us the schema of the
'sql_features' table?

Also, does it appear in the pg_dump --schema-only output?  I don't think
it does because it wasn't reported in the pg_dump --schema-only diff I
requested, and pg_dump wouldn't have dumped it from the new cluster. 

What that means is that 'sql_features' got a TOAST table in the old
cluster but while 'sql_features' also has a TOAST table in the new
cluster, it isn't processed by pg_upgrade because it is in the
information schema and has an oid < FirstNormalObjectId.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [HACKERS] [WIP] Patch : Change pg_ident.conf parsing to be the same as pg_hba.conf

2012-09-16 Thread Amit Kapila
On Monday, September 17, 2012 1:39 AM Jeff Janes wrote:
On Wed, Jul 4, 2012 at 7:38 AM, Amit Kapila  wrote:
> From: Robert Haas [mailto:robertmh...@gmail.com]
> Sent: Tuesday, July 03, 2012 9:43 PM
> On Mon, Jul 2, 2012 at 8:08 AM, Amit Kapila 
wrote:
 Suggestions?
>
>>> I suggest you add this to the next CommitFest.  :-)
>
>>> https://commitfest.postgresql.org/action/commitfest_view?id=14
>
>>> Meanwhile, we have this CommitFest to get finished with...
>
>> I shall add it to next CF.

> I put myself down as a reviewer for this a while ago, but I never got
> beyond the stage of verifying that it applies and compiles, and that
> it does what it says it does, at least in the broad sense, and that
> what it does is desirable (I think it is).

> I haven't done a detailed code review, or tested for corner cases.
> Unfortunately I don't think I'll be able to do either of those during
> this commitfest, or at least not until toward the end of it.  So I
> took myself off as reviewer so the commit-fest manager can assign
> someone else.  Sorry about that.

  Thank you for confirming the sanity of the patch at a broad level. 

With Regards,
Amit Kapila.






Re: [HACKERS] [ADMIN] pg_upgrade from 9.1.3 to 9.2 failed

2012-09-16 Thread Tom Lane
Rural Hunter  writes:
> On 2012-09-17 9:48:58, Tom Lane wrote:
>> I wonder whether you dropped and recreated the information_schema in
>> the lifetime of this database?  We have recommended doing that in the
>> past, IIRC.  Could such a thing have confused pg_dump?

> No, I have never manually re-created the table.

I think you must have, because the query output shows that sql_features,
its rowtype, and the information_schema all have OIDs much larger than
they would have had in a virgin installation.  The large relfilenode
could have been explained by a VACUUM FULL, but the other OIDs wouldn't
have been changed by that.

> This is the first time 
> I see the name. But I'm not sure other things I installed before 
> recreated it or not, such as pg_buffercache etc. One more thing, is 
> this a hidden table? I can see it with '\d 
> information_schema.sql_features' but it's not in the list of '\d'.

That just means that information_schema is not in your search_path.

regards, tom lane




Re: [HACKERS] [ADMIN] pg_upgrade from 9.1.3 to 9.2 failed

2012-09-16 Thread Rural Hunter

On 2012-09-17 9:48:58, Tom Lane wrote:

Rural Hunter  writes:

# select oid, * from pg_class WHERE reltoastrelid = 16439148;
oid|   relname| relnamespace | reltype  | reloftype |
relowner | relam | relfilenode | reltablespace | relpages | reltuples |
reltoastrelid | reltoastidxid | relhasindex | relisshared |
relpersistence | relkind | relnatts | relchecks | relhasoids |
relhaspkey | relhasrules | relhastriggers | relhassubclass |
relfrozenxid | relacl  | reloptions
--+--+--+--+---+--+---+-+---+--+---+---+---+-+-++-+--+---+++-+++--+-+
  16439145 | sql_features | 16438995 | 16439147 | 0 |
10 | 0 |16439145 | 0 |0 | 0 |
16439148 | 0 | f   | f   | p  |
r   |7 | 0 | f  | f  | f
| f  | f  |630449585 |
{postgres=arwdDxt/postgres,=r/postgres} |
(1 row)


Well, that's even stranger, because (1) information_schema.sql_features
ought to have a toast table in either version, and (2) neither pg_dump
nor pg_upgrade ought to be attempting to dump or transfer that table.

I wonder whether you dropped and recreated the information_schema in
the lifetime of this database?  We have recommended doing that in the
past, IIRC.  Could such a thing have confused pg_dump?

regards, tom lane



No, I have never manually re-created the table. This is the first time 
I see the name. But I'm not sure other things I installed before 
recreated it or not, such as pg_buffercache etc. One more thing, is 
this a hidden table? I can see it with '\d 
information_schema.sql_features' but it's not in the list of '\d'.





Re: [HACKERS] [ADMIN] pg_upgrade from 9.1.3 to 9.2 failed

2012-09-16 Thread Tom Lane
Rural Hunter  writes:
> # select oid, * from pg_class WHERE reltoastrelid = 16439148;
>oid|   relname| relnamespace | reltype  | reloftype | 
> relowner | relam | relfilenode | reltablespace | relpages | reltuples | 
> reltoastrelid | reltoastidxid | relhasindex | relisshared | 
> relpersistence | relkind | relnatts | relchecks | relhasoids | 
> relhaspkey | relhasrules | relhastriggers | relhassubclass | 
> relfrozenxid | relacl  | reloptions
> --+--+--+--+---+--+---+-+---+--+---+---+---+-+-++-+--+---+++-+++--+-+
>  16439145 | sql_features | 16438995 | 16439147 | 0 |   
> 10 | 0 |16439145 | 0 |0 | 0 |  
> 16439148 | 0 | f   | f   | p  | 
> r   |7 | 0 | f  | f  | f   
> | f  | f  |630449585 | 
> {postgres=arwdDxt/postgres,=r/postgres} |
> (1 row)

Well, that's even stranger, because (1) information_schema.sql_features
ought to have a toast table in either version, and (2) neither pg_dump
nor pg_upgrade ought to be attempting to dump or transfer that table.

I wonder whether you dropped and recreated the information_schema in
the lifetime of this database?  We have recommended doing that in the
past, IIRC.  Could such a thing have confused pg_dump?

regards, tom lane




Re: [HACKERS] [ADMIN] pg_upgrade from 9.1.3 to 9.2 failed

2012-09-16 Thread Rural Hunter

On 2012-09-17 1:17:46, Bruce Momjian wrote:

On Sun, Sep 16, 2012 at 12:38:37PM +0800, Rural Hunter wrote:

OK, I see many new ALTER TABLE commands, but nothing that would cause a
difference in relation count.

Attached is a patch that will return the OID of the old/new mismatched
entries.  Please research the pg_class objects on the old/new clusters
that have the mismatch and let me know.  It might be something that
isn't in the old cluster, or not in the new cluster.


I ran the pg_upgrade with the patch and found the problematic object
is a toast object.
Copying user relation files
/raid/pgsql/base/6087920/6088238
Mismatch of relation OID in database "forummon": old OID 16439148,
new OID 16439322

In old cluster:
# select * from pg_class WHERE oid=16439148;
relname | relnamespace | reltype | reloftype | relowner | relam |
relfilenode | reltablespace | relpages | reltuples | reltoastrelid |
reltoastidxid | relhasindex | relisshared | relpersistence | relkind
| relnatts | relchecks | relhasoids | relhaspkey | relhasrules |
relhastriggers | relhassubclass | relfrozenxid | relacl | reloptions
---+--+--+---+--+---+-+---+--+---+---+---+-+-++-+--+---+++-+++--++
pg_toast_16439145 | 99 | 16439149 | 0 | 10 | 0 | 16439148 | 0 | 0 |
0 | 0 | 16439150 | t | f | p | t | 3 | 0 | f | t | f | f | f |
630449585 | |
(1 row)

But it doesn't exist in new cluster:
select * from pg_class WHERE oid=16439148;
relname | relnamespace | reltype | reloftype | relowner | relam |
relfilenode | reltablespace | relpages | reltuples | relallvisible |
reltoastrelid | reltoastidxid | relhasindex | relisshared |
relpersistence | relkind | relnatts | relchecks | relhasoids |
relhaspkey | relhasrules | relhastriggers | relhassubclass |
relfrozenxid | relacl | reloptions
-+--+-+---+--+---+-+---+--+---+---+---+---+-+-++-+--+---+++-+++--++
(0 rows)


[ Thread moved to hackers list.]

OK, this is exactly what I wanted to see, and it explains why pg_dump
didn't show it.  Can you find out what table references this toast
table?  Try this query on the old cluster:

select oid, * from pg_class WHERE reltoastrelid = 16439148;

I believe it will have an oid of 16439145, or it might not exist.


# select oid, * from pg_class WHERE reltoastrelid = 16439148;
  oid|   relname| relnamespace | reltype  | reloftype | 
relowner | relam | relfilenode | reltablespace | relpages | reltuples | 
reltoastrelid | reltoastidxid | relhasindex | relisshared | 
relpersistence | relkind | relnatts | relchecks | relhasoids | 
relhaspkey | relhasrules | relhastriggers | relhassubclass | 
relfrozenxid | relacl  | reloptions

--+--+--+--+---+--+---+-+---+--+---+---+---+-+-++-+--+---+++-+++--+-+
16439145 | sql_features | 16438995 | 16439147 | 0 |   
10 | 0 |16439145 | 0 |0 | 0 |  
16439148 | 0 | f   | f   | p  | 
r   |7 | 0 | f  | f  | f   
| f  | f  |630449585 | 
{postgres=arwdDxt/postgres,=r/postgres} |

(1 row)

It's not a table I created; I haven't seen this name before and I'm not sure 
why it exists. So what's the next thing I can do?






Re: [HACKERS] [PATCH 2/2] Add a new function pg_relation_by_filenode to look up a relation given the tablespace and the filenode OIDs

2012-09-16 Thread Andres Freund
On Monday, September 17, 2012 12:35:32 AM Tom Lane wrote:
> Andres Freund  writes:
> > This requires the previously added RELFILENODE syscache.
> 
> [ raised eyebrow... ]  There's a RELFILENODE syscache?  I don't see one,
> and I doubt it would work given that the contents of
> pg_class.relfilenode aren't unique (the zero entries are the problem).
Well, there's one patch upthread ;). It mentions the problem of the key not 
being unique, due to relfilenode in (reltablespace, relfilenode) being 0 for 
shared/nailed catalogs.

I am not really sure yet how big a problem it is for the caching 
infrastructure that values which shouldn't ever get queried (because the 
relfilenode is actually different) are duplicated. Reading code about all that 
atm.

Robert suggested writing a specialized cache akin to what's done in 
attoptcache.c or such.

I haven't formed an opinion on what's the way forward on that topic. But 
anyway, I don't see how the WAL decoding stuff can progress without some 
variant of that mapping, so I sure hope I/we can build something. Changing 
that aspect of the patch should be trivial...

Greetings,

Andres
-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [PATCH 2/2] Add a new function pg_relation_by_filenode to look up a relation given the tablespace and the filenode OIDs

2012-09-16 Thread Tom Lane
Andres Freund  writes:
> This requires the previously added RELFILENODE syscache.

[ raised eyebrow... ]  There's a RELFILENODE syscache?  I don't see one,
and I doubt it would work given that the contents of
pg_class.relfilenode aren't unique (the zero entries are the problem).

regards, tom lane




Re: [HACKERS] too much pgbench init output

2012-09-16 Thread Tomas Vondra
On 5.9.2012 06:17, Robert Haas wrote:
> On Tue, Sep 4, 2012 at 11:31 PM, Peter Eisentraut  wrote:
>> On Tue, 2012-09-04 at 23:14 -0400, Robert Haas wrote:
>>> Actually, this whole thing seems like a solution in search of a
>>> problem to me.  We just reduced the verbosity of pgbench -i tenfold in
>>> the very recent past - I would have thought that was enough to address
>>> this problem.  But maybe not.
>>
>> The problem is that
>>
>> a) It blasts out too much output and everything scrolls off the screen,
>> and
>>
>> b) There is no indication of where the end is.
>>
>> These are independent problems, and I'd be happy to address them
>> separately if there are such specific concerns attached to this.
>>
>> Speaking of tenfold, we could reduce the output frequency tenfold to
>> once every 100, which would alleviate this problem for a while
>> longer.
> 
> Well, I wouldn't object to displaying a percentage on each output
> line.  But I don't really like the idea of having them less frequent
> than they already are, because if you run into a situation that makes
> pgbench -i run slowly, as I occasionally do, it's marginal to tell the
> difference between "slow" and "completely hung" even with the current
> level of verbosity.
> 
> However, we could add a -q flag to run more quietly, or something like
> that.  Actually, I'd even be fine with making the default quieter,
> though we can't use -v for verbose since that's already taken.  But
> I'd like to preserve the option of getting the current amount of
> output because sometimes I need that to troubleshoot problems.
> Actually it'd be nice to even get a bit more output: say, a timestamp
> on each line, and a completion percentage... but now I'm getting
> greedy.

Hi,

I've been thinking about this a bit more, and I propose to add an option
that determines the "logging step", i.e. the number of items (specified
either directly or as a percentage) between log lines.

The attached patch defines a new option "--logging-step" that accepts
either integers or percentages. For example, if you want to print a line
every 1000 rows, you can do this

  $ pgbench -i -s 1000 --logging-step 1000 testdb

and if you want to print a line every 5%, you can do this

  $ pgbench -i -s 1000 --logging-step 5% testdb

and that's it.

Moreover, the patch adds a record of the elapsed time and an estimate of the
remaining time. So for example with 21% you may get this:

creating tables...
21000 of 100000 tuples (21%) done (elapsed 1.56 s, remaining 5.85 s).
42000 of 100000 tuples (42%) done (elapsed 3.15 s, remaining 4.35 s).
63000 of 100000 tuples (63%) done (elapsed 4.73 s, remaining 2.78 s).
84000 of 100000 tuples (84%) done (elapsed 6.30 s, remaining 1.20 s).
100000 of 100000 tuples (100%) done (elapsed 8.17 s, remaining 0.00 s).
vacuum...
set primary keys...

Now, I've had a hard time with the patch - no matter what I do, I do get
"invalid option" error whenever I try to run that from command line for
some reason. But when I run it from gdb, it works just fine.

kind regards
Tomas
diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
index f5ac3b1..ce7e240 100644
--- a/contrib/pgbench/pgbench.c
+++ b/contrib/pgbench/pgbench.c
@@ -130,6 +130,11 @@ int foreign_keys = 0;
 int unlogged_tables = 0;
 
 /*
+ * logging step (inserts)
+ */
+int log_step = 100000;
+
+/*
  * tablespace selection
  */
 char *tablespace = NULL;
@@ -356,6 +361,8 @@ usage(void)
 		   "   create tables in the specified tablespace\n"
 		   "  --unlogged-tables\n"
 		   "   create tables as unlogged tables\n"
+		   "  --logging-step NUM\n"
+		   "   how often to print info about init progress\n"
 		   "\nBenchmarking options:\n"
 		   "  -c NUM   number of concurrent database clients (default: 1)\n"
 		   "  -C   establish new connection for each transaction\n"
@@ -1340,6 +1347,10 @@ init(bool is_no_vacuum)
 	char		sql[256];
 	int			i;
 
+	/* used to track elapsed time and estimate of the remaining time */
+	instr_time	start, diff;
+	double		elapsed_sec, remaining_sec;
+
 	if ((con = doConnect()) == NULL)
 		exit(1);
 
@@ -1408,6 +1419,8 @@ init(bool is_no_vacuum)
 	}
 	PQclear(res);
 
+	INSTR_TIME_SET_CURRENT(start);
+
 	for (i = 0; i < naccounts * scale; i++)
 	{
 		int			j = i + 1;
@@ -1419,10 +1432,18 @@ init(bool is_no_vacuum)
 			exit(1);
 		}
 
-		if (j % 100000 == 0)
-			fprintf(stderr, "%d of %d tuples (%d%%) done.\n",
+		if (j % log_step == 0 || j == scale * naccounts)
+		{
+			INSTR_TIME_SET_CURRENT(diff);
+			INSTR_TIME_SUBTRACT(diff, start);
+

[HACKERS] [PATCH 2/2] Add a new function pg_relation_by_filenode to look up a relation given the tablespace and the filenode OIDs

2012-09-16 Thread Andres Freund

This requires the previously added RELFILENODE syscache.
---
 doc/src/sgml/func.sgml | 23 -
 src/backend/utils/adt/dbsize.c | 78 ++
 src/include/catalog/pg_proc.h  |  2 ++
 src/include/utils/builtins.h   |  1 +
 4 files changed, 103 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index f8f63d8..708da35 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -15170,7 +15170,7 @@ postgres=# SELECT * FROM pg_xlogfile_name_offset(pg_stop_backup());
 

 The functions shown in  assist
-in identifying the specific disk files associated with database objects.
+in identifying the specific disk files associated with database objects or doing the reverse.

 

@@ -15179,6 +15179,9 @@ postgres=# SELECT * FROM pg_xlogfile_name_offset(pg_stop_backup());

 pg_relation_filepath

+   
+pg_relation_by_filenode
+   
 

 Database Object Location Functions
@@ -15207,6 +15210,15 @@ postgres=# SELECT * FROM pg_xlogfile_name_offset(pg_stop_backup());
 File path name of the specified relation

   
+  
+   
+pg_relation_by_filenode(tablespace oid, filenode oid)
+
+   regclass
+   
+Find the associated relation of a filenode
+   
+  
  
 

@@ -15230,6 +15242,15 @@ postgres=# SELECT * FROM pg_xlogfile_name_offset(pg_stop_backup());
 the relation.

 
+   
+pg_relation_by_filenode is the reverse of
+pg_relation_filenode. Given a tablespace OID and
+a filenode it returns the associated relation. The default
+tablespace for user tables can be replaced with 0. Check the
documentation of pg_relation_filenode for an explanation of why
this cannot always be easily answered by querying pg_class.
+   
+
   
 
   
diff --git a/src/backend/utils/adt/dbsize.c b/src/backend/utils/adt/dbsize.c
index cd23334..841a445 100644
--- a/src/backend/utils/adt/dbsize.c
+++ b/src/backend/utils/adt/dbsize.c
@@ -741,6 +741,84 @@ pg_relation_filenode(PG_FUNCTION_ARGS)
 }
 
 /*
+ * Get the relation via (reltablespace, relfilenode)
+ *
+ * This is expected to be used when somebody wants to match an individual file
+ * on the filesystem back to its table. That's not trivially possible via
+ * pg_class because that doesn't contain the relfilenodes of shared and nailed
+ * tables.
+ *
+ * We don't fail but return NULL if we cannot find a mapping.
+ *
+ * Instead of knowing DEFAULTTABLESPACE_OID you can pass 0.
+ */
+Datum
+pg_relation_by_filenode(PG_FUNCTION_ARGS)
+{
+	Oid			reltablespace = PG_GETARG_OID(0);
+	Oid			relfilenode = PG_GETARG_OID(1);
+	Oid			lookup_tablespace = reltablespace;
+	Oid			result = InvalidOid;
+	HeapTuple	tuple;
+
+	if (reltablespace == 0)
+		reltablespace = DEFAULTTABLESPACE_OID;
+
+	/* pg_class stores 0 instead of DEFAULTTABLESPACE_OID */
+	if (reltablespace == DEFAULTTABLESPACE_OID)
+		lookup_tablespace = 0;
+
+	tuple = SearchSysCache2(RELFILENODE,
+							ObjectIdGetDatum(lookup_tablespace),
+							ObjectIdGetDatum(relfilenode));
+
+	/* found it in the system catalog, so it is not a shared/nailed table */
+	if (HeapTupleIsValid(tuple))
+	{
+		result = HeapTupleHeaderGetOid(tuple->t_data);
+		ReleaseSysCache(tuple);
+	}
+	else
+	{
+		if (reltablespace == GLOBALTABLESPACE_OID)
+		{
+			result = RelationMapFilenodeToOid(relfilenode, true);
+		}
+		else
+		{
+			Form_pg_class relform;
+
+			result = RelationMapFilenodeToOid(relfilenode, false);
+
+			if (result != InvalidOid)
+			{
+				/* check that we found the correct relation */
+				tuple = SearchSysCache1(RELOID,
+										ObjectIdGetDatum(result));
+
+				if (!HeapTupleIsValid(tuple))
+				{
+					elog(ERROR, "could not refind previously looked-up relation with OID %u",
+						 result);
+				}
+
+				relform = (Form_pg_class) GETSTRUCT(tuple);
+
+				if (relform->reltablespace != reltablespace)
+					result = InvalidOid;
+
+				ReleaseSysCache(tuple);
+			}
+		}
+	}
+
+	if (!OidIsValid(result))
+		PG_RETURN_NULL();
+	else
+		PG_RETURN_OID(result);
+}
+
+/*
  * Get the pathname (relative to $PGDATA) of a relation
  *
  * See comments for pg_relation_filenode.
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index b5b886b..c8233cd 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -3430,6 +3430,8 @@ DATA(insert OID = 2998 ( pg_indexes_size		PGNSP PGUID 12 1 0 0 0 f f f f t f v 1
 DESCR("disk space usage for all indexes attached to the specified table");
 DATA(insert OID = 2999 ( pg_relation_filenode	PGNSP PGUID 12 1 0 0 0 f f f f t f s 1 0 26 "2205" _null_ _null_ _null_ _null_ pg_relation_filenode _null_ _null_ _null_ ));
 DESCR("filenode identifier of relation");
+DATA(insert OID = 3170 ( pg_relation_by_filenode PGNSP PGUID 12 1 0 0 0 f f f f t f s 2 0 2205 "26 26" _null_ _null_ _null_ _null_ pg_relation_by_filenode _null_ _null_ _null_ ));
+DESCR("relation OID of a filenode");
 DATA(insert OID = 3034 ( pg_relatio

[HACKERS] [PATCH 1/2] Add a new relmapper.c function RelationMapFilenodeToOid that acts as a reverse of RelationMapOidToFilenode

2012-09-16 Thread Andres Freund
---
 src/backend/utils/cache/relmapper.c | 53 +
 src/include/utils/relmapper.h   |  2 ++
 2 files changed, 55 insertions(+)

diff --git a/src/backend/utils/cache/relmapper.c b/src/backend/utils/cache/relmapper.c
index 6f21495..771f34d 100644
--- a/src/backend/utils/cache/relmapper.c
+++ b/src/backend/utils/cache/relmapper.c
@@ -180,6 +180,59 @@ RelationMapOidToFilenode(Oid relationId, bool shared)
 	return InvalidOid;
 }
 
+/* RelationMapFilenodeToOid
+ *
+ * Do the reverse of the normal direction of mapping done in
+ * RelationMapOidToFilenode.
+ *
+ * This is not supposed to be used during normal running but rather for
+ * information purposes when looking at the filesystem or the xlog.
+ *
+ * Returns InvalidOid if the OID is not known, which can easily happen if the
+ * filenode does not belong to a relation that is nailed or shared, or if it
+ * simply doesn't exist anywhere.
+ */
+Oid
+RelationMapFilenodeToOid(Oid filenode, bool shared)
+{
+	const RelMapFile *map;
+	int32		i;
+
+	/* If there are active updates, believe those over the main maps */
+	if (shared)
+	{
+		map = &active_shared_updates;
+		for (i = 0; i < map->num_mappings; i++)
+		{
+			if (filenode == map->mappings[i].mapfilenode)
+return map->mappings[i].mapoid;
+		}
+		map = &shared_map;
+		for (i = 0; i < map->num_mappings; i++)
+		{
+			if (filenode == map->mappings[i].mapfilenode)
+return map->mappings[i].mapoid;
+		}
+	}
+	else
+	{
+		map = &active_local_updates;
+		for (i = 0; i < map->num_mappings; i++)
+		{
+			if (filenode == map->mappings[i].mapfilenode)
+return map->mappings[i].mapoid;
+		}
+		map = &local_map;
+		for (i = 0; i < map->num_mappings; i++)
+		{
+			if (filenode == map->mappings[i].mapfilenode)
+return map->mappings[i].mapoid;
+		}
+	}
+
+	return InvalidOid;
+}
+
 /*
  * RelationMapUpdateMap
  *
diff --git a/src/include/utils/relmapper.h b/src/include/utils/relmapper.h
index 111a05c..4e56508 100644
--- a/src/include/utils/relmapper.h
+++ b/src/include/utils/relmapper.h
@@ -36,6 +36,8 @@ typedef struct xl_relmap_update
 
 extern Oid	RelationMapOidToFilenode(Oid relationId, bool shared);
 
+extern Oid	RelationMapFilenodeToOid(Oid filenode, bool shared);
+
 extern void RelationMapUpdateMap(Oid relationId, Oid fileNode, bool shared,
 	 bool immediate);
 

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Add pg_relation_by_filenode(reltbspc, filenode) admin function

2012-09-16 Thread Andres Freund
Now that I have proposed a new syscache upthread, it's easy to provide
pg_relation_by_filenode, which I have wished for multiple times in the past
when looking at filesystem activity and wondering which table does what. You
can sort of get the same result via

SELECT oid FROM (
   SELECT oid, pg_relation_filenode(oid::regclass) filenode
   FROM pg_class WHERE relkind != 'v'
) map
WHERE map.filenode = ...;

but that's neither efficient nor obvious.

So, two patches to do this:


Did others need this in the past? I can live with the 2nd patch living in a
private extension somewhere. The first one would also be useful for some
error/debug messages during decoding...

Greetings,

Andres




Re: [HACKERS] [ADMIN] pg_upgrade from 9.1.3 to 9.2 failed

2012-09-16 Thread Tom Lane
Bruce Momjian  writes:
> On Sun, Sep 16, 2012 at 12:38:37PM +0800, Rural Hunter wrote:
>> I ran the pg_upgrade with the patch and found the problematic object
>> is a toast object.

> OK, this is exactly what I wanted to see, and it explains why pg_dump
> didn't show it.  Can you find out what table references this toast
> table?  Try this query on the old cluster:

>   select oid, * from pg_class WHERE reltoastrelid = 16439148;

> I believe it will have an oid of 16439145, or it might not exist.

Most likely what's happened is that the table has a toast table that
it doesn't need, as a result of having dropped the only wide column(s)
in it.  So when the table is recreated in the new cluster, there's no
toast table for it.

So what you need to do is get rid of that check, or relax it so that it
doesn't insist on toast tables matching up exactly.  It seems possible
that there could be discrepancies in the other direction too, ie,
new cluster created a toast table when old cluster didn't have one.

regards, tom lane




Re: [HACKERS] Question about SSI, subxacts, and aborted read-only xacts

2012-09-16 Thread Kevin Grittner
Jeff Davis  wrote:
On Mon, 2012-09-10 at 11:15 -0500, Kevin Grittner wrote:
 
>> Do you have any suggested wording [...] ?
 
> Attached. I thought about putting it as a "note", but it seems like
> it's easy to go overboard with those.
 
I agree about a note being overkill for this.
 
I'm attaching an alternative proposal, with changes for the following
reasons:
 
(1)  The complete re-wrap of that first paragraph made it really hard
to see what the actual change to the documentation was.  I would
rather change it like this and have a separate patch to re-wrap the
paragraph (with no content change) or maybe restrict the reformatting
to two or three lines.
 
(2)  The second paragraph starts with "There may still be
serialization anomalies involving aborted transactions" which seems
a bit alarming, seems to bend the definition of serialization
anomalies and seems odd to pick out for special attention when the
same could be said of data read in transactions at other isolation
levels if those transactions roll back from a deferred constraint or
a write conflict.
 
(3)  There is a significant exception to this caveat which I felt
would be useful to people who wanted to generate big reports without
waiting for transaction commit: deferrable read-only transactions
offer applications a way to count on data as soon as it is read.
 
I'm not sure whether the omission of this from the docs should be
considered a big enough hazard to merit a back-patch, or if it should
just be committed to HEAD.
 
-Kevin




ssi_doc-v2.patch
Description: Binary data



Re: [HACKERS] [WIP] Patch : Change pg_ident.conf parsing to be the same as pg_hba.conf

2012-09-16 Thread Jeff Janes
On Wed, Jul 4, 2012 at 7:38 AM, Amit Kapila  wrote:
> From: Robert Haas [mailto:robertmh...@gmail.com]
> Sent: Tuesday, July 03, 2012 9:43 PM
> On Mon, Jul 2, 2012 at 8:08 AM, Amit Kapila  wrote:
>>> Suggestions?
>
>> I suggest you add this to the next CommitFest.  :-)
>
>> https://commitfest.postgresql.org/action/commitfest_view?id=14
>
>> Meanwhile, we have this CommitFest to get finished with...
>
> I shall add it to next CF.

I put my self down as a reviewer for this a while ago, but I never got
beyond the stage of verifying that it applies and compiles, and that
it does what it says it does, at least in the broad sense, and that
what it does is desirable (I think it is).

I haven't done a detailed code review, or tested for corner cases.
Unfortunately I don't think I'll be able to do either of those during
this commitfest, or at least not until toward the end of it.  So I
took myself off as reviewer so the commit-fest manager can assign
someone else.  Sorry about that.

Cheers,

Jeff




Re: [HACKERS] _FORTIFY_SOURCE by default?

2012-09-16 Thread Tom Lane
Peter Eisentraut  writes:
> On Sun, 2012-09-16 at 00:41 -0400, Tom Lane wrote:
>> Doesn't seem like a good idea to me to add platform-specific options
>> with unspecified effects to platform-independent upstream sources.

> It's effectively a warning option, and we end up fixing all the warnings
> anyway, so I don't see the point of deferring that effort.  We could
> rephrase this request as, how about adding this new warning option, it's
> occasionally useful -- which we frequently do.

> We add platform-specific warning and optimization options in many
> places, and I don't think this is much different.

Maybe we're talking past each other.  What I thought you meant was
adding this #define unconditionally, without any awareness of what it
might do on particular platforms.  If you are thinking of adding it
only on platforms where it is considered standard, I can live with that.

Another point to consider here is that (at least on Red Hat) I believe
this enables address-space randomization; which is something I very much
do not want to happen in debug builds.

Bottom line is if you just want a new configure option for this, I don't
particularly object ... but on the other hand I also don't see the
usefulness.  It's already possible for packagers who want the flag
to inject it, and for buildfarm owners who want to test the case to do
so.

regards, tom lane




Re: [HACKERS] _FORTIFY_SOURCE by default?

2012-09-16 Thread Peter Eisentraut
On Sun, 2012-09-16 at 10:36 -0400, Andrew Dunstan wrote:
> Might be worth having a buildfarm animal or two building with it, say by 
> setting CFLAGS before configure?

I don't really see the value in that.  Either this is part of our
standard set of warnings and checks that we are interested in, and then
everyone should see it, or we don't care about this, and then we should
ignore the issue altogether.  Creating different diagnostics sets for
different people and different circumstances without clear purpose one
way or the other just creates friction.





Re: [HACKERS] _FORTIFY_SOURCE by default?

2012-09-16 Thread Peter Eisentraut
On Sun, 2012-09-16 at 00:41 -0400, Tom Lane wrote:
> Peter Eisentraut  writes:
> > _FORTIFY_SOURCE=2 appears to be the default for package building on many
> > Linux distributions now, as part of hardening or security options.  But we
> > often hear about problems related to this only when we hand the source
> > over to the packagers.  So I think we might as well add this to our
> > standard compilation options, for example in src/include/port/linux.h.
> > What do you think?
> 
> Doesn't seem like a good idea to me to add platform-specific options
> with unspecified effects to platform-independent upstream sources.

It's effectively a warning option, and we end up fixing all the warnings
anyway, so I don't see the point of deferring that effort.  We could
rephrase this request as, how about adding this new warning option, it's
occasionally useful -- which we frequently do.

We add platform-specific warning and optimization options in many
places, and I don't think this is much different.






Re: [HACKERS] embedded list v2

2012-09-16 Thread Andres Freund
Hi,

On Sunday, September 16, 2012 04:23:14 PM Andres Freund wrote:
> What do you think about something like:
> 
> typedef struct dlist_iter
> {
> 	/*
> 	 * Use a union with equivalent storage as dlist_node to make it possible
> 	 * to initialize the struct inside a macro without multiple evaluation.
> 	 */
>   union {
>   struct {
>   dlist_node *cur;
>   dlist_node *end;
>   };
>   dlist_node init;
>   };
> } dlist_iter;
> 
> typedef struct dlist_mutable_iter
> {
>   union {
>   struct {
>   dlist_node *cur;
>   dlist_node *end;
>   };
>   dlist_node init;
>   };
>   dlist_node *next;
> } dlist_mutable_iter;
> 
> #define dlist_iter_foreach(iter, ptr) \
> 	for (iter.init = (ptr)->head; iter.cur != iter.end; \
> 		 iter.cur = iter.cur->next)
> 
> #define dlist_iter_foreach_modify(iter, ptr) \
> 	for (iter.init = (ptr)->head, iter.next = iter.cur->next; \
> 		 iter.cur != iter.end; \
> 		 iter.cur = iter.next, iter.next = iter.cur->next)
> 
> With that and some trivial changes *all* multiple evaluation possibilities
> are gone.
> 
> (_iter_ in there would go, that's just so I can have both in the same file
> for now).

I am thinking whether a macro like:

#if __GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 6)
#define assert_compatible_types(a, b) _Static_assert(\
__builtin_types_compatible_p(a, __typeof__ (b) ), \
"variable `" #b "` is not compatible to type `" #a "`" )
#else
#define assert_compatible_types(a, b) (void)0
#endif

used like:

#define dlist_iter_foreach(iter, ptr) \
	assert_compatible_types(dlist_iter, iter); \
	for (iter.init = (ptr)->head; iter.cur != iter.end; \
		 iter.cur = iter.cur->next)

would be useful.

If you use the wrong type you get an error like:

error: static assertion failed: "variable `iter` is not compatible to type 
`dlist_iter`"

Do people think this is something worthwhile for some of the macros in pg? At 
times the compiler errors that get generated in larger macros can be a bit 
confusing and something like that would make it easier to see the originating 
error.

I found __builtin_types_compatible_p while perusing the gcc docs to find whether 
there is something like __builtin_constant_p for checking the pureness of an 
expression ;)

Andres
-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [ADMIN] pg_upgrade from 9.1.3 to 9.2 failed

2012-09-16 Thread Bruce Momjian
On Sun, Sep 16, 2012 at 12:38:37PM +0800, Rural Hunter wrote:
> >OK, I see many new ALTER TABLE commands, but nothing that would cause a
> >difference in relation count.
> >
> >Attached is a patch that will return the OID of the old/new mismatched
> >entries.  Please research the pg_class objects on the old/new clusters
> >that have the mismatch and let me know.  It might be something that
> >isn't in the old cluster, or not in the new cluster.
> >
> I ran the pg_upgrade with the patch and found the problematic object
> is a toast object.
> Copying user relation files
> /raid/pgsql/base/6087920/6088238
> Mismatch of relation OID in database "forummon": old OID 16439148,
> new OID 16439322
> 
> In old cluster:
> # select * from pg_class WHERE oid=16439148;
> relname | relnamespace | reltype | reloftype | relowner | relam |
> relfilenode | reltablespace | relpages | reltuples | reltoastrelid |
> reltoastidxid | relhasindex | relisshared | relpersistence | relkind
> | relnatts | relchecks | relhasoids | relhaspkey | relhasrules |
> relhastriggers | relhassubclass | relfrozenxid | relacl | reloptions
> ---+--+--+---+--+---+-+---+--+---+---+---+-+-++-+--+---+++-+++--++
> pg_toast_16439145 | 99 | 16439149 | 0 | 10 | 0 | 16439148 | 0 | 0 |
> 0 | 0 | 16439150 | t | f | p | t | 3 | 0 | f | t | f | f | f |
> 630449585 | |
> (1 row)
> 
> But it doesn't exist in new cluster:
> select * from pg_class WHERE oid=16439148;
> relname | relnamespace | reltype | reloftype | relowner | relam |
> relfilenode | reltablespace | relpages | reltuples | relallvisible |
> reltoastrelid | reltoastidxid | relhasindex | relisshared |
> relpersistence | relkind | relnatts | relchecks | relhasoids |
> relhaspkey | relhasrules | relhastriggers | relhassubclass |
> relfrozenxid | relacl | reloptions
> -+--+-+---+--+---+-+---+--+---+---+---+---+-+-++-+--+---+++-+++--++
> (0 rows)

[ Thread moved to hackers list.]

OK, this is exactly what I wanted to see, and it explains why pg_dump
didn't show it.  Can you find out what table references this toast
table?  Try this query on the old cluster:

select oid, * from pg_class WHERE reltoastrelid = 16439148;

I believe it will have an oid of 16439145, or it might not exist.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [HACKERS] Possible fix for occasional failures on castoroides etc

2012-09-16 Thread Andrew Dunstan


On 09/16/2012 12:04 PM, Tom Lane wrote:

It's annoying that the buildfarm animals running on older versions of
Solaris randomly fail with "Connection refused" errors, such as in
today's example:
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=castoroides&dt=2012-09-15%2015%3A42%3A52

I believe what's probably happening there is that the kernel has a small
hard-wired limit on the length of the postmaster's accept queue, and you
get this failure if too many connection attempts arrive faster than the
postmaster can service them.  If that theory is correct, we could
probably prevent these failures by reducing the number of tests run in
parallel, which could be done by adding say
MAX_CONNECTIONS=5
to the environment in which the regression tests run.  I'm not sure
though if that's "build_env" or some other setting for the buildfarm
script --- Andrew?





Yes, in the build_env section of the config file.

It's in the distributed sample config file, commented out.

cheers

andrew





[HACKERS] Possible fix for occasional failures on castoroides etc

2012-09-16 Thread Tom Lane
It's annoying that the buildfarm animals running on older versions of
Solaris randomly fail with "Connection refused" errors, such as in
today's example:
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=castoroides&dt=2012-09-15%2015%3A42%3A52

I believe what's probably happening there is that the kernel has a small
hard-wired limit on the length of the postmaster's accept queue, and you
get this failure if too many connection attempts arrive faster than the
postmaster can service them.  If that theory is correct, we could
probably prevent these failures by reducing the number of tests run in
parallel, which could be done by adding say
MAX_CONNECTIONS=5
to the environment in which the regression tests run.  I'm not sure
though if that's "build_env" or some other setting for the buildfarm
script --- Andrew?

regards, tom lane




Re: [HACKERS] Patch to include c.h

2012-09-16 Thread Tom Lane
Gurjeet Singh  writes:
> I noticed that xlog.h uses PGDLLIMPORT, but it does not include c.h
> directly or indirectly.

In general, all include files in Postgres assume that you've included
postgres.h or postgres_fe.h (as appropriate) first; and practically
all of them have far more dependencies on that than you mention here.
If we were to decorate them with explicit inclusions such as you propose,
we would accomplish little except to bloat the sources and slow down
compilation.

The general rule about that is "thou shalt have no other gods before
c.h" --- that is, it is *necessary* that c.h be included before anything
else, in *every* Postgres file.  Otherwise you can run into
platform-specific problems.  An example I remember is individual files
having different ideas of whether 64-bit or 32-bit filesystem APIs are
in use, as a consequence of  being read before pg_config_os.h
has defined symbols controlling that.  Needless to say, this doesn't
work well at runtime.  You can find actual examples of that sort of
thing in the archives from years ago, before we began to enforce the
rule rigidly.

regards, tom lane




Re: [HACKERS] _FORTIFY_SOURCE by default?

2012-09-16 Thread Tom Lane
Andrew Dunstan  writes:
> On 09/16/2012 12:41 AM, Tom Lane wrote:
>> Doesn't seem like a good idea to me to add platform-specific options
>> with unspecified effects to platform-independent upstream sources.

> Might be worth having a buildfarm animal or two building with it, say by 
> setting CFLAGS before configure?

Possibly, but only if we actually pay attention to build warnings on
those animals.

regards, tom lane




Re: [HACKERS] _FORTIFY_SOURCE by default?

2012-09-16 Thread Andrew Dunstan


On 09/16/2012 12:41 AM, Tom Lane wrote:

Peter Eisentraut  writes:

_FORTIFY_SOURCE=2 appears to be the default for package building on many
Linux distributions now, as part of hardening or security options.  But we
often hear about problems related to this only when we hand the source
over to the packagers.  So I think we might as well add this to our
standard compilation options, for example in src/include/port/linux.h.
What do you think?

Doesn't seem like a good idea to me to add platform-specific options
with unspecified effects to platform-independent upstream sources.

To the extent that this option finds anything useful (which in my
experience is a negligibly small percentage anyway), it's the
responsibility of the packagers (including me) to report it.





Might be worth having a buildfarm animal or two building with it, say by 
setting CFLAGS before configure?


cheers

andrew






Re: [HACKERS] embedded list v2

2012-09-16 Thread Andres Freund
On Saturday, September 15, 2012 07:21:44 PM Tom Lane wrote:
> Andres Freund  writes:
> > On Saturday, September 15, 2012 07:32:45 AM Tom Lane wrote:
> >> Well, actually, that just brings us to the main point which is: I do not
> >> believe that circular links are a good design choice here.
> > 
> > I think I have talked about the reasoning on the list before, but here it
> > goes: The cases where I primarily used them are cases where the list
> > *manipulation* is a considerable part of the expense. In those situations
> > having less branches is beneficial and, to my knowledge, cannot be done
> > in normal flat lists.
> > For the initial user of those lists - the slab allocator for postgres
> > which I intend to finish at some point - the move to circular lists
> > instead of classical lists was an improvement in the 12-15% range.
> 
> Let me make my position clear: I will not accept performance as the sole
> figure of merit for this datatype.  It also has to be easy and reliable
> to use, and the current design is a failure on those dimensions.
> This question of ease and reliability of use isn't just academic, since
> you guys had trouble converting catcache.c without introducing bugs.
> That isn't exactly a positive showing for this design.
Uhm. I had introduced a bug there, true (Maybe Alvaro as well, I can't remember 
anything right now). But it was something like getting the tail instead of the 
head element due to copy paste. Nothing will be able to protect the code from 
me.

> Furthermore, one datapoint for one use-case (probably measured on only
> one CPU architecture) isn't even a very convincing case for the
> performance being better.  To take a handy example, if we were to
> convert dynahash to use dlists, having to initialize each hash bucket
> header this way would probably be a pretty substantial cost for a lot
> of hashtable use-cases.  We have a lot of short-lived dynahash tables.

What do you think about doing something like:

#define DLIST_INIT(name) {{&name.head, &name.head}}
static dlist_head DatabaseList = DLIST_INIT(DatabaseList);

That works, but obviously the initialization will have to be performed at some 
point.

> This also ties into the other problem, since it's hard to code such
> loop control as a macro without multiple evaluation of the list-head
> argument.  To me that's a show stopper of the first order.  I'm not
> going to accept a replacement for foreach() that introduces bug hazards
> that aren't in foreach().
What do you think about something like:

typedef struct dlist_iter
{
	/*
	 * Use a union with equivalent storage as dlist_node to make it possible
	 * to initialize the struct inside a macro without multiple evaluation.
	 */
union {
struct {
dlist_node *cur;
dlist_node *end;
};
dlist_node init;
};
} dlist_iter;

typedef struct dlist_mutable_iter
{
union {
struct {
dlist_node *cur;
dlist_node *end;
};
dlist_node init;
};
dlist_node *next;
} dlist_mutable_iter;

#define dlist_iter_foreach(iter, ptr) \
	for (iter.init = (ptr)->head; iter.cur != iter.end; \
		 iter.cur = iter.cur->next)

#define dlist_iter_foreach_modify(iter, ptr) \
	for (iter.init = (ptr)->head, iter.next = iter.cur->next; \
		 iter.cur != iter.end; \
		 iter.cur = iter.next, iter.next = iter.cur->next)

With that and some trivial changes *all* multiple evaluation possibilities are 
gone.

(_iter_ in there would go, that's just so I can have both in the same file for 
now).

> There are certainly some multiple-evaluation macros, but I don't think
> they are associated with core data types.  You will not find any in
> pg_list.h for instance.  I think it's important that these new forms
> of list be as easy and reliable to use as List is.  I'm willing to trade
> off some micro-performance to have that.
Just from what I had in open emacs frames without switching buffers when I read 
that email:
Min/Max in c.h and about half of the macros in htup.h (I had the 9.1 tree open 
at that point)...

Greetings,

Andres
-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




[HACKERS] Patch to include c.h

2012-09-16 Thread Gurjeet Singh
I noticed that xlog.h uses PGDLLIMPORT, but it does not include c.h
directly or indirectly. Also, in timestamp.h different code is enabled
depending on HAVE_INT64_TIMESTAMP being defined, but even though that macro
is defined in pg_config.h, it does not automatically trickle down into this
file, as it is supposed to.

Apparently all .c files manage to include c.h (directly or indirectly)
before including these files, so the compilers do the right thing. But I
caught these since I tried using an IDE, and it grays out sections, or
shows error-markers when it doesn't find a macro included directly or
indirectly.

I vaguely remember a discussion where (I think) Bruce mentioned that we do
or intend to do some kind of compilability check on individual h files. DO
we have something in place to catch such things? I am sure there are many
other places where such inclusions are omitted, but these are the ones I
found on my first attempts to use this IDE.

Best regards,
-- 
Gurjeet Singh

http://gurjeet.singh.im/


include_c.h.patch
Description: Binary data



Re: [HACKERS] Re: [COMMITTERS] pgsql: Properly set relpersistence for fake relcache entries.

2012-09-16 Thread Robert Haas
On Sep 15, 2012, at 11:32 PM, Tom Lane  wrote:
> Right, but we do a shutdown checkpoint at the end of crash recovery.

Yes, but that only writes the buffers that are dirty. It doesn't fix the lack 
of a BM_PERMANENT flag on a buffer that ought to have had one. So that page can 
now get modified AGAIN, after recovery, and not checkpointed.

> I could believe that this case gets missed, but it's not clear from
> what's been said --- and if that case *is* broken, there should have
> been a whole lot more corruption reports recently than what we've seen.

It's pretty clear from Jeff's example at the top of the thread, which involves 
two crashes and no archive recovery.

...Robert
