Re: [HACKERS] casting between range types

2011-08-30 Thread Heikki Linnakangas

On 31.08.2011 09:14, Jeff Davis wrote:

At one point, the question of casting between range types came up. At
first, this seemed like a fairly reasonable suggestion, but now I don't
think I like the semantics.

A normal cast changes between essentially equivalent values in different
domains. For instance 3 as an int4 is equivalent to 3.0 as a numeric.

However, if we take the simple approach with range types and cast the
bounds, we end up with some weird situations.

First, a range is really a set. So if we take '[1,10)'::int4range and
cast that to numrange, we end up moving from a set of exactly 9 elements
to a set of an infinite number of elements. Going the other way is
probably worse.

Sometimes casts are a bit "lossy" and I suppose we could write that off.

But things get weirder when the total order is different (e.g. different
text collations). Then you end up with a completely different set of
values, which doesn't sound like a cast to me at all.

So, I'm leaning toward just not providing any casts from one range type
to another.


Can you only provide casts that make sense, like between int4 and 
numeric range types, and leave out the ones that don't?


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



[HACKERS] casting between range types

2011-08-30 Thread Jeff Davis
At one point, the question of casting between range types came up. At
first, this seemed like a fairly reasonable suggestion, but now I don't
think I like the semantics.

A normal cast changes between essentially equivalent values in different
domains. For instance 3 as an int4 is equivalent to 3.0 as a numeric.

However, if we take the simple approach with range types and cast the
bounds, we end up with some weird situations.

First, a range is really a set. So if we take '[1,10)'::int4range and
cast that to numrange, we end up moving from a set of exactly 9 elements
to a set of an infinite number of elements. Going the other way is
probably worse.
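
To make the mismatch concrete (a sketch, run against a build with the
range-types patch; the constructor spelling may differ from the WIP code,
and the final cast is the hypothetical one under discussion):

$ psql -c "SELECT '[1,10)'::int4range"            # exactly the 9 integers 1..9
$ psql -c "SELECT numrange(1, 10, '[)')"          # infinitely many numeric values
$ psql -c "SELECT '[1,10)'::int4range::numrange"  # naive bound-casting conversion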

Sometimes casts are a bit "lossy" and I suppose we could write that off.

But things get weirder when the total order is different (e.g. different
text collations). Then you end up with a completely different set of
values, which doesn't sound like a cast to me at all.

So, I'm leaning toward just not providing any casts from one range type
to another.

Thoughts?

Regards
Jeff Davis




Re: [HACKERS] Comparing two PostgreSQL databases -- order of pg_dump output

2011-08-30 Thread Greg Sabino Mullane

> Note that what I'm looking for is something to compare just about 
> EVERYTHING DDL under the PostgreSQL sun: tables, types, functions, 
> operators, etc. The description of same_schema appears to imply only a 
> subset of objects are compared (in fact, looking at the code, I can 
> confirm that limitation).

You should try the latest version in git (which will soon be released 
as 2.18.0). The same_schema check has been overhauled, and now can also 
store a copy of a database's state to allow checking the same database
over time to see what has changed. It doesn't check *everything* yet, 
but the only things missing are some of the more obscure items such 
as custom conversions. It should be pretty easy to add in anything 
that is not already covered, even for someone not versed in Perl.

> BTW, I tried installing check_postgres, but not being much into Perl and 
> not knowing what dependencies it has, "make test" failed 38/42 tests.

That's not much to worry about. It's a pretty straightforward script, 
in that it is very easy to determine if it is working for you or not, 
even if some of the tests fail. :)

>> I'm not exactly sure how it does it

check_postgres queries the system catalogs, normalizes some things based 
on the version, and creates a Perl object representation of the database. 
It then compares that to the same thing from a different database/server, 
or to a frozen version of an earlier scan.
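
For concreteness, a two-server comparison would be invoked something like
this (a sketch only; flag spellings have varied across check_postgres
releases, so consult the documentation for the version you actually run):

$ check_postgres.pl --action=same_schema \
      --dbhost=prod.example.com --dbname=app \
      --dbhost2=dev.example.com --dbname2=app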

-- 
Greg Sabino Mullane g...@turnstep.com
End Point Corporation http://www.endpoint.com/
PGP Key: 0x14964AC8 201108302203
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8




Re: [HACKERS] compile from git repository

2011-08-30 Thread Andrew Dunstan


On 08/30/2011 09:50 PM, 权宗亮 wrote:
> Thank you everyone.
>
> Now, there is still one error:
>
> Build started: Project: postgres, Configuration: Release|Win32
> Running flex on src\backend\parser\scan.l
> VCBUILD : cygwin warning :
> MS-DOS style path detected: src\backend\parser\scan.l
> Preferred POSIX equivalent is: src/backend/parser/scan.l
> CYGWIN environment variable option "nodosfilewarning" turns off this
> warning.
> Consult the user's guide for more details about POSIX paths:
> http://cygwin.com/cygwin-ug-net/using.html#using-pathnames
> flex: fatal internal error, exec failed
> Project : error PRJ0002: Error result 3328 returned from
> 'C:\Windows\system32\cmd.exe'.
> postgres - 1 error(s), 0 warning(s)
> The command exited with code 1.
>
>

You are probably not setting the M4 environment variable correctly. The
instructions state:

To run this version of flex, you will need to set the M4 environment 
variable
to point to your installation of m4. You need one of these in order to run
both bison and flex, and it sometimes comes packaged with bison.

If you are using this in a buildfarm member, you can set that environment
variable in its config file, in the build_env stanza, something like:

M4 => 'c:\path\to\bin\m4.exe',

cheers

andrew




Re: [HACKERS] compile from git repository

2011-08-30 Thread 权宗亮
Thank you everyone.

Now, there is still one error:

  Build started: Project: postgres, Configuration: Release|Win32
  Running flex on src\backend\parser\scan.l
VCBUILD : cygwin warning :
MS-DOS style path detected: src\backend\parser\scan.l
Preferred POSIX equivalent is: src/backend/parser/scan.l
CYGWIN environment variable option "nodosfilewarning" turns off this
warning.
Consult the user's guide for more details about POSIX paths:
  http://cygwin.com/cygwin-ug-net/using.html#using-pathnames
  flex: fatal internal error, exec failed
Project : error PRJ0002: Error result 3328 returned from
'C:\Windows\system32\cmd.exe'.
  postgres - 1 error(s), 0 warning(s)
  The command exited with code 1.

2011/8/31 Andrew Dunstan 

>
>
> On 08/30/2011 10:43 AM, Magnus Hagander wrote:
>
>> On Tue, Aug 30, 2011 at 16:30, Tom Lane  wrote:
>>
>>> Andrew Dunstan  writes:
>>>
>>>> On 08/30/2011 07:11 AM, 权宗亮 wrote:
>>>>
>>>>> I try to compile from git repository and get some errors.
>>>>>
>>>>> Windows 7 x86
>>>>> VC Express 2008
>>>>> perl v5.8.9
>>>>> flex 2.5.4a-1
>>>>> bison 2.4.1
>>>>
>>>> Your version of flex is too old.

>>> Hmm ... had the OP been using configure, he'd have been told that by the
>>> build scripts.  Should the MSVC build scripts be improved to check the
>>> versions of bison and flex?
>>>
>> We do it for bison already. It probably shouldn't be too hard to do
>> the same for flex.
>>
>>
> Done.
>
> cheers
>
> andrew
>



-- 
权 宗亮(Quan Zongliang)
quanzongli...@gmail.com


Re: [HACKERS] spinlocks on HP-UX

2011-08-30 Thread Tom Lane
I wrote:
> No I/O anywhere.  I'm thinking the reported idle time must correspond to
> spinlock delays that are long enough to reach the select() calls in
> s_lock.  If so, 38% is depressingly high, but it's not out of line with
> what we've seen in the past in tests designed to provoke spinlock
> contention.

I tried increasing MAX_SPINS_PER_DELAY from 1000 to 10000.  (Again, this
is with the unlocked test added to TAS_SPIN.)  This resulted in a very
significant drop in the reported idle-time percentage, down to 10% or so
at full load; but unfortunately the TPS numbers got worse for the higher
end of the curve:

pgbench -c 1 -j 1 -S -T 300 bench   tps = 4526.914824 (including ...
pgbench -c 2 -j 1 -S -T 300 bench   tps = 8183.815526 (including ...
pgbench -c 8 -j 4 -S -T 300 bench   tps = 34637.075173 (including ...
pgbench -c 16 -j 8 -S -T 300 bench  tps = 68792.550304 (including ...
pgbench -c 32 -j 16 -S -T 300 bench tps = 159195.038317 (including ...
pgbench -c 64 -j 32 -S -T 300 bench tps = 220544.912947 (including ...
pgbench -c 96 -j 48 -S -T 300 bench tps = 147367.793544 (including ...
pgbench -c 128 -j 64 -S -T 300 bench  tps = 79187.042252 (including ...
pgbench -c 160 -j 80 -S -T 300 bench  tps = 43957.912879 (including ...

So that confirms the idea that the reported idle time corresponds to
s_lock select() sleeps.  Unfortunately, it doesn't appear to lead to
anything that would result in increasing performance.  I suppose the
reason that performance gets worse, even though we've presumably
eliminated some process context swaps, is that we have more cache line
contention for whichever spinlock(s) they're all fighting over.

regards, tom lane



Re: [HACKERS] Comparing two PostgreSQL databases -- order of pg_dump output

2011-08-30 Thread Joe Abbate

Hi Stephen,

On 08/30/2011 07:11 PM, Stephen Frost wrote:

* Joe Abbate (j...@freedomcircle.com) wrote:

In order to compare the schema of two presumably identical
databases, I've been diffing the output of pg_dump -Osx.


I'm not sure exactly how it does it, but check_postgres.pl offers this.

http://bucardo.org/wiki/Check_postgres

It also offers a whole slew of other useful things to monitor.


Note that what I'm looking for is something to compare just about 
EVERYTHING DDL under the PostgreSQL sun: tables, types, functions, 
operators, etc. The description of same_schema appears to imply only a 
subset of objects are compared (in fact, looking at the code, I can 
confirm that limitation).


BTW, I tried installing check_postgres, but not being much into Perl and 
not knowing what dependencies it has, "make test" failed 38/42 tests.


Joe



Re: [HACKERS] spinlocks on HP-UX

2011-08-30 Thread Robert Haas
On Tue, Aug 30, 2011 at 7:21 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Tue, Aug 30, 2011 at 6:33 PM, Tom Lane  wrote:
>>> I ran it up to "pgbench -c 200 -j 200 -S -T 300 bench" and still see
>>> vmstat numbers around 50% user time, 12% system time, 38% idle.
>>> So no lseek problem here, boss. Kernel calls itself 2.6.32-192.el6.x86_64.
>
>> Eh, wait a minute.  38% idle time?  Did you use a scale factor that
>> doesn't fit in shared_buffers?
>
> Nope: -s 100, 8GB shared_buffers, same as all the other tests.
>
> Typical strace of one backend looks like
>
> recvfrom(9, "Q\0\0\0?SELECT abalance FROM pgbenc"..., 8192, 0, NULL, NULL) = 
> 64
> lseek(10, 0, SEEK_END)                  = 269213696
> lseek(11, 0, SEEK_END)                  = 224641024
> sendto(9, 
> "T\0\0\0!\0\1abalance\0\0\0\241\267\0\3\0\0\0\27\0\4\377\377\377\377"..., 66, 
> 0, NULL, 0) = 66
> recvfrom(9, "Q\0\0\0?SELECT abalance FROM pgbenc"..., 8192, 0, NULL, NULL) = 
> 64
> lseek(10, 0, SEEK_END)                  = 269213696
> lseek(11, 0, SEEK_END)                  = 224641024
> sendto(9, 
> "T\0\0\0!\0\1abalance\0\0\0\241\267\0\3\0\0\0\27\0\4\377\377\377\377"..., 66, 
> 0, NULL, 0) = 66
> recvfrom(9, "Q\0\0\0?SELECT abalance FROM pgbenc"..., 8192, 0, NULL, NULL) = 
> 64
> select(0, NULL, NULL, NULL, {0, 1000})  = 0 (Timeout)
> lseek(10, 0, SEEK_END)                  = 269213696
> lseek(11, 0, SEEK_END)                  = 224641024
> select(0, NULL, NULL, NULL, {0, 1000})  = 0 (Timeout)
> select(0, NULL, NULL, NULL, {0, 1000})  = 0 (Timeout)
> sendto(9, 
> "T\0\0\0!\0\1abalance\0\0\0\241\267\0\3\0\0\0\27\0\4\377\377\377\377"..., 66, 
> 0, NULL, 0) = 66
> recvfrom(9, "Q\0\0\0?SELECT abalance FROM pgbenc"..., 8192, 0, NULL, NULL) = 
> 64
> lseek(10, 0, SEEK_END)                  = 269213696
> lseek(11, 0, SEEK_END)                  = 224641024
> select(0, NULL, NULL, NULL, {0, 1000})  = 0 (Timeout)
> sendto(9, 
> "T\0\0\0!\0\1abalance\0\0\0\241\267\0\3\0\0\0\27\0\4\377\377\377\377"..., 66, 
> 0, NULL, 0) = 66
>
> No I/O anywhere.  I'm thinking the reported idle time must correspond to
> spinlock delays that are long enough to reach the select() calls in
> s_lock.  If so, 38% is depressingly high, but it's not out of line with
> what we've seen in the past in tests designed to provoke spinlock
> contention.
>
> (BTW, this is with the unlocked test added to TAS_SPIN.)

Well, that is mighty interesting.  That strace looks familiar, but I
have never seen a case where the idle time was more than a few
percentage points on this test (well, assuming you're using 9.2
sources, anyway).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] spinlocks on HP-UX

2011-08-30 Thread Tom Lane
Robert Haas  writes:
> On Tue, Aug 30, 2011 at 6:33 PM, Tom Lane  wrote:
>> I ran it up to "pgbench -c 200 -j 200 -S -T 300 bench" and still see
>> vmstat numbers around 50% user time, 12% system time, 38% idle.
>> So no lseek problem here, boss. Kernel calls itself 2.6.32-192.el6.x86_64.

> Eh, wait a minute.  38% idle time?  Did you use a scale factor that
> doesn't fit in shared_buffers?

Nope: -s 100, 8GB shared_buffers, same as all the other tests.

Typical strace of one backend looks like

recvfrom(9, "Q\0\0\0?SELECT abalance FROM pgbenc"..., 8192, 0, NULL, NULL) = 64
lseek(10, 0, SEEK_END)  = 269213696
lseek(11, 0, SEEK_END)  = 224641024
sendto(9, 
"T\0\0\0!\0\1abalance\0\0\0\241\267\0\3\0\0\0\27\0\4\377\377\377\377"..., 66, 
0, NULL, 0) = 66
recvfrom(9, "Q\0\0\0?SELECT abalance FROM pgbenc"..., 8192, 0, NULL, NULL) = 64
lseek(10, 0, SEEK_END)  = 269213696
lseek(11, 0, SEEK_END)  = 224641024
sendto(9, 
"T\0\0\0!\0\1abalance\0\0\0\241\267\0\3\0\0\0\27\0\4\377\377\377\377"..., 66, 
0, NULL, 0) = 66
recvfrom(9, "Q\0\0\0?SELECT abalance FROM pgbenc"..., 8192, 0, NULL, NULL) = 64
select(0, NULL, NULL, NULL, {0, 1000})  = 0 (Timeout)
lseek(10, 0, SEEK_END)  = 269213696
lseek(11, 0, SEEK_END)  = 224641024
select(0, NULL, NULL, NULL, {0, 1000})  = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 1000})  = 0 (Timeout)
sendto(9, 
"T\0\0\0!\0\1abalance\0\0\0\241\267\0\3\0\0\0\27\0\4\377\377\377\377"..., 66, 
0, NULL, 0) = 66
recvfrom(9, "Q\0\0\0?SELECT abalance FROM pgbenc"..., 8192, 0, NULL, NULL) = 64
lseek(10, 0, SEEK_END)  = 269213696
lseek(11, 0, SEEK_END)  = 224641024
select(0, NULL, NULL, NULL, {0, 1000})  = 0 (Timeout)
sendto(9, 
"T\0\0\0!\0\1abalance\0\0\0\241\267\0\3\0\0\0\27\0\4\377\377\377\377"..., 66, 
0, NULL, 0) = 66

No I/O anywhere.  I'm thinking the reported idle time must correspond to
spinlock delays that are long enough to reach the select() calls in
s_lock.  If so, 38% is depressingly high, but it's not out of line with
what we've seen in the past in tests designed to provoke spinlock
contention.

(BTW, this is with the unlocked test added to TAS_SPIN.)

regards, tom lane



Re: [HACKERS] Comparing two PostgreSQL databases -- order of pg_dump output

2011-08-30 Thread Stephen Frost
* Joe Abbate (j...@freedomcircle.com) wrote:
> In order to compare the schema of two presumably identical
> databases, I've been diffing the output of pg_dump -Osx.  

I'm not sure exactly how it does it, but check_postgres.pl offers this.

http://bucardo.org/wiki/Check_postgres

It also offers a whole slew of other useful things to monitor.

Thanks,

Stephen




Re: [HACKERS] Comparing two PostgreSQL databases -- order of pg_dump output

2011-08-30 Thread Joe Abbate

On 08/30/2011 06:07 PM, Tom Lane wrote:

Yeah, we've been around on that before.  pg_dump does actually sort the
output items (modulo dependency requirements), but it sorts by the same
"tag" values that are printed by pg_restore -l, and those aren't currently
designed to be unique.  It's not too clear if we could get away with
changing the definitions of the tag strings.


The approach suggested by Jaime works fairly well.  The only change I 
would make is to add OPERATOR args to the pg_restore -l output, e.g.,


1843; 2617 699799 OPERATOR public <(geometry, geometry) jma
1861; 2617 700565 OPERATOR public <(geography, geography) jma

Joe



Re: [HACKERS] spinlocks on HP-UX

2011-08-30 Thread Robert Haas
On Tue, Aug 30, 2011 at 6:33 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Tue, Aug 30, 2011 at 4:37 PM, Tom Lane  wrote:
>>>> If this is on Linux, I am surprised
>>>> that you didn't get killed by the lseek() contention problem on a
>>>> machine with that many cores.
>
>>> Hm ... now that you mention it, all of these tests have been using
>>> the latest-and-greatest unreleased RHEL kernels.
>
>> It should be pretty easy to figure it out, though.   Just fire up
>> pgbench with lots of clients (say, 160) and run vmstat in another
>> window.  If the machine reports 10% system time, it's fixed.  If it
>> reports 90% system time, it's not.
>
> I ran it up to "pgbench -c 200 -j 200 -S -T 300 bench" and still see
> vmstat numbers around 50% user time, 12% system time, 38% idle.
> So no lseek problem here, boss.  Kernel calls itself 2.6.32-192.el6.x86_64.

Eh, wait a minute.  38% idle time?  Did you use a scale factor that
doesn't fit in shared_buffers?  If so you're probably testing how fast
you pass BufFreelistLock around...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] spinlocks on HP-UX

2011-08-30 Thread Tom Lane
Robert Haas  writes:
> On Tue, Aug 30, 2011 at 4:37 PM, Tom Lane  wrote:
>>> If this is on Linux, I am surprised
>>> that you didn't get killed by the lseek() contention problem on a
>>> machine with that many cores.

>> Hm ... now that you mention it, all of these tests have been using
>> the latest-and-greatest unreleased RHEL kernels.

> It should be pretty easy to figure it out, though.   Just fire up
> pgbench with lots of clients (say, 160) and run vmstat in another
> window.  If the machine reports 10% system time, it's fixed.  If it
> reports 90% system time, it's not.

I ran it up to "pgbench -c 200 -j 200 -S -T 300 bench" and still see
vmstat numbers around 50% user time, 12% system time, 38% idle.
So no lseek problem here, boss.  Kernel calls itself 2.6.32-192.el6.x86_64.

regards, tom lane



Re: [HACKERS] Comparing two PostgreSQL databases -- order of pg_dump output

2011-08-30 Thread Tom Lane
Joe Abbate  writes:
> In order to compare the schema of two presumably identical databases, 
> I've been diffing the output of pg_dump -Osx.  However, I've found that 
> the order of the output is not very reliable.

Yeah, we've been around on that before.  pg_dump does actually sort the
output items (modulo dependency requirements), but it sorts by the same
"tag" values that are printed by pg_restore -l, and those aren't currently
designed to be unique.  It's not too clear if we could get away with
changing the definitions of the tag strings.

regards, tom lane



Re: [HACKERS] Comparing two PostgreSQL databases -- order of pg_dump output

2011-08-30 Thread Joe Abbate

On 08/30/2011 05:33 PM, Jaime Casanova wrote:

On Tue, Aug 30, 2011 at 3:14 PM, Joe Abbate  wrote:

Hola Jaime,

On 08/30/2011 03:24 PM, Jaime Casanova wrote:


what about using pg_dump -Fc -Osx and use pg_restore -l to list
objects. then you can sort and compare objects and then a script that
compare schema of objects extracting them with -P, -T or -t


That appears to be of limited use (i.e., it would only work for functions,
triggers and tables).  pg_restore -L/--use-list is more comprehensive.
So the script would have to do something like the following:

$ pg_dump -Fc -Osx postgis > postgis.dump
$ pg_restore -l postgis.dump | sort -k4 > postgis.list


why not "sort -k4,5"?


sort -k4 sorts from the fourth field, the object type, to the end of 
line.  -k4,5 would sort on the type and schema name.  I want to sort on 
object name/attributes as well.  BTW, I figured out why it doesn't fully 
work.  For functions, the arguments are listed, e.g.,


82; 1255 700618 FUNCTION public _st_covers(geography, geography) jma
459; 1255 700259 FUNCTION public _st_covers(geometry, geometry) jma

Unfortunately, for operators, the operand types are not included:

843; 2617 699799 OPERATOR public < jma
1861; 2617 700565 OPERATOR public < jma

so the pg_restore -L still keeps the original dump order (geometry 
before geography).


Joe



Re: [HACKERS] Comparing two PostgreSQL databases -- order of pg_dump output

2011-08-30 Thread Jaime Casanova
On Tue, Aug 30, 2011 at 3:14 PM, Joe Abbate  wrote:
> Hola Jaime,
>
> On 08/30/2011 03:24 PM, Jaime Casanova wrote:
>>
>> what about using pg_dump -Fc -Osx and use pg_restore -l to list
>> objects. then you can sort and compare objects and then a script that
>> compare schema of objects extracting them with -P, -T or -t
>
> That appears to be of limited use (i.e., it would only work for functions,
> triggers and tables).  pg_restore -L/--use-list is more comprehensive.
> So the script would have to do something like the following:
>
> $ pg_dump -Fc -Osx postgis > postgis.dump
> $ pg_restore -l postgis.dump | sort -k4  > postgis.list

why not "sort -k4,5"?

-- 
Jaime Casanova         www.2ndQuadrant.com
Professional PostgreSQL: 24x7 support and training



Re: [HACKERS] spinlocks on HP-UX

2011-08-30 Thread Robert Haas
On Tue, Aug 30, 2011 at 4:37 PM, Tom Lane  wrote:
>> If this is on Linux, I am surprised
>> that you didn't get killed by the lseek() contention problem on a
>> machine with that many cores.
>
> Hm ... now that you mention it, all of these tests have been using
> the latest-and-greatest unreleased RHEL kernels.  Maybe Red Hat already
> fixed that contention problem in their kernel?  Have you got a RH
> bugzilla number for the issue?

No, I haven't had much luck filing bugs against Red Hat releases, so
I've sort of given up on that.  I did have some off-list
correspondence with a Red Hat engineer who read my blog post, though.

It should be pretty easy to figure it out, though.   Just fire up
pgbench with lots of clients (say, 160) and run vmstat in another
window.  If the machine reports 10% system time, it's fixed.  If it
reports 90% system time, it's not.
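
If you'd rather not eyeball it, something like this averages the CPU split
over a whole run (a sketch, assuming vmstat's default output layout, in
which user/system/idle are the 13th-15th fields):

$ pgbench -c 160 -j 80 -S -T 300 bench &
$ vmstat 5 60 | awk 'NR > 2 { us += $13; sy += $14; id += $15; n++ }
      END { printf "avg: %d%% user, %d%% system, %d%% idle\n", us/n, sy/n, id/n }'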

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] pg_upgrade automatic testing

2011-08-30 Thread Tom Lane
Magnus Hagander  writes:
> On Tue, Aug 30, 2011 at 22:25, Tom Lane  wrote:
>> How can we improve things here?  I've toyed with the idea of installing
>> pg_regress.so so that we can refer to it relative to $libdir, but that
>> might be a bit invasive, especially if we were to try to back-patch it
>> as far as 8.3.

> Would turning pg_regress.so into an extension and using that way fix
> it? That won't help for the 9.0->9.1 stage, but it would for
> 9.1->9.2...

Not really excited about that.  The contrib regression tests already
exercise the extension functionality, so making pg_regress.so into
another one would just reduce the number of code paths being covered.

In any case, if we don't find a way to allow automated testing of
pg_upgrade from the pre-9.1 versions, we have not fixed the problem.

regards, tom lane



Re: [HACKERS] WIP: Fast GiST index build

2011-08-30 Thread Alexander Korotkov
On Tue, Aug 30, 2011 at 9:29 PM, Heikki Linnakangas <
heikki.linnakan...@enterprisedb.com> wrote:

> On 30.08.2011 13:29, Alexander Korotkov wrote:
>
>> On Tue, Aug 30, 2011 at 1:13 PM, Heikki Linnakangas<
>> heikki.linnakan...@enterprisedb.com>  wrote:
>>
>>> So, over 50% of the CPU time is spent in choosing a block from the
>>> temporary files. That should be pretty easy to improve..
>>>
>> Yes, probably we can just remove free blocks sorting.
>
> Ok, the first results are in for that:
>
>          testname          | nrows |    duration     | accesses
> ---------------------------+-------+-----------------+---------
>  points unordered buffered | 25000 | 06:00:23.707579 |  4049832
>
> From the previous test runs, the unbuffered index build took under 4 hours,
> so even though this is a lot better than with the sorting, it's still a loss
> compared to non-buffered build.
>
> I had vmstat running during most of this index build. At a quick glance, it
> doesn't seem to be CPU bound anymore. I suspect the buffers in the temporary
> file gets very fragmented. Or, we're reading it in backwards order because
> the buffers work in a LIFO fashion. The system seems to be doing about 5
> MB/s of I/O during the build, which sounds like a figure you'd get for more
> or less random I/O.


So, we still have two questions:
1) Why does the buffering build algorithm show no benefit in these tests?
2) Why do the test results on your test setup differ from those on my test
setup?

I can propose the following answers now:
1) I think it's because of high overlap in the tree. As I mentioned before,
high overlap can cause only a fraction of the tree to be used for actual
inserts. For comparison, with my split algorithm (which produces almost no
overlap on a uniform dataset) the buffering index build took 4 hours, while
the regular build is still running (already more than 8 days = 192 hours)!
2) Probably it's because of different OS cache behaviour. For example, on my
test setup the OS evicts unused pages from the cache very slowly, so the
buffering algorithm showed a benefit nevertheless.

Also, it seems that I'm starting to understand the problem with the new
linear splitting algorithm. On a dataset of 1M rows it produces almost no
overlap, while it already produces significant overlap at 10M rows (drama!).
Probably nobody tested it on large enough datasets (either during the
original research or before commit). I'll dig into it in more detail and
provide some testing results.

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] spinlocks on HP-UX

2011-08-30 Thread Tom Lane
Robert Haas  writes:
> I am a bit surprised by your test results, because I also tried x86_64
> with an unlocked test, also on pgbench -S, and I am pretty sure I got
> a regression.  Maybe I'll try rerunning that.  It seems possible that
> the x86_64 results depend on the particular sub-architecture and/or
> whether HT is in use, which would be kind of a nuisance.

Well, if you consider Opteron as a sub-architecture of x86_64, that was
already true the last time we did this.  So far there have not been
cases where something really good for one implementation was really bad
for another, but someday we'll probably hit that.

> Also, did you happen to measure the amount of user time vs. system
> time that your test runs used?

Did not think about that.  I was considering how to measure the average
context swap rate over each run, so that we could keep an eye out for
the "context swap storm" behavior that's the usual visible-in-top
symptom of these sorts of problems.  But it'd have to be automated;
I'm not going to keep my eyes glued to "top" output for several hours.

I'd be happy to re-run these tests with any RHEL-compatible measurement
scaffolding somebody else provides, but if I have to write it, it
probably won't happen very soon.
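
For what it's worth, a crude version of that scaffolding can be built from
vmstat alone (a sketch, assuming the default output layout in which the
context-switch column is the 12th field):

$ vmstat 5 60 | awk 'NR > 2 { cs += $12; n++ }
      END { print "average context switches/sec:", cs/n }' &
$ pgbench -c 64 -j 32 -S -T 300 bench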

> If this is on Linux, I am surprised
> that you didn't get killed by the lseek() contention problem on a
> machine with that many cores.

Hm ... now that you mention it, all of these tests have been using
the latest-and-greatest unreleased RHEL kernels.  Maybe Red Hat already
fixed that contention problem in their kernel?  Have you got a RH
bugzilla number for the issue?

regards, tom lane



Re: [HACKERS] pg_upgrade automatic testing

2011-08-30 Thread Magnus Hagander
On Tue, Aug 30, 2011 at 22:25, Tom Lane  wrote:
> Peter Eisentraut  writes:
>> +# contrib/pg_upgrade/test.sh
>> +#
>> +# Test driver for pg_upgrade.  Initializes a new database cluster,
>> +# runs the regression tests (to put in some data), runs pg_dumpall,
>> +# runs pg_upgrade, runs pg_dumpall again, compares the dumps.
>
> Hm .. my experience is that that doesn't work at all, because the
> regression tests set up assorted C functions whose implementations are
> in pg_regress.so, and it creates them with absolute path references
> to pg_regress.so.  When you try to load that into another installation
> that's a different version of PG, it quite properly fails.  So I think
> that as given, this script is only useful for testing pg_upgrade of
> $currentversion to $currentversion.  Which is surely better than no test
> at all, but it would not for example have caught the 8.3 incompatibility
> that was just reported.
>
> How can we improve things here?  I've toyed with the idea of installing
> pg_regress.so so that we can refer to it relative to $libdir, but that
> might be a bit invasive, especially if we were to try to back-patch it
> as far as 8.3.

Would turning pg_regress.so into an extension and using that way fix
it? That won't help for the 9.0->9.1 stage, but it would for
9.1->9.2...

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] pg_upgrade automatic testing

2011-08-30 Thread Tom Lane
Peter Eisentraut  writes:
> +# contrib/pg_upgrade/test.sh
> +#
> +# Test driver for pg_upgrade.  Initializes a new database cluster,
> +# runs the regression tests (to put in some data), runs pg_dumpall,
> +# runs pg_upgrade, runs pg_dumpall again, compares the dumps.

Hm .. my experience is that that doesn't work at all, because the
regression tests set up assorted C functions whose implementations are
in pg_regress.so, and it creates them with absolute path references
to pg_regress.so.  When you try to load that into another installation
that's a different version of PG, it quite properly fails.  So I think
that as given, this script is only useful for testing pg_upgrade of
$currentversion to $currentversion.  Which is surely better than no test
at all, but it would not for example have caught the 8.3 incompatibility
that was just reported.

How can we improve things here?  I've toyed with the idea of installing
pg_regress.so so that we can refer to it relative to $libdir, but that
might be a bit invasive, especially if we were to try to back-patch it
as far as 8.3.
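
The absolute references are easy to see in a cluster that has just run the
regression tests (a sketch; the path in the output is illustrative):

$ psql regression -Atc "SELECT DISTINCT probin FROM pg_proc WHERE probin LIKE '%pg_regress%'"
/home/postgres/pgsql/src/test/regress/pg_regress.so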

regards, tom lane



Re: [HACKERS] spinlocks on HP-UX

2011-08-30 Thread Robert Haas
On Tue, Aug 30, 2011 at 4:05 PM, Tom Lane  wrote:
> This suggests that (1) an unlocked test in TAS_SPIN might be a good idea
> on x86_64 after all, and (2) this test scenario may not be pushing the
> system hard enough to expose limitations of the spinlock implementation.
>
> I am now thinking that the reason we saw clear differences in spinlock
> implementations years ago, and now are not seeing them except on insane
> hardware, is mainly that we've managed to reduce contention at higher
> levels of the system.  That doesn't mean spinlocks have become
> uninteresting, just that "pgbench -S" isn't the ideal test case for
> stressing them.  I'm thinking maybe we need a test scenario that
> generates sinval traffic, for example, or forces snapshots to be taken
> more often.  Ideas anyone?

On current sources, with a workload that fits into shared_buffers,
pgbench -S hammers the spinlock protecting ProcArrayLock extremely
hard.   I'm sure it's possible to come up with a test case that
hammers them harder, but using a real workload can expose issues (like
aggregate memory bandwidth) that you might not see otherwise.

I am a bit surprised by your test results, because I also tried x86_64
with an unlocked test, also on pgbench -S, and I am pretty sure I got
a regression.  Maybe I'll try rerunning that.  It seems possible that
the x86_64 results depend on the particular sub-architecture and/or
whether HT is in use, which would be kind of a nuisance.

Also, did you happen to measure the amount of user time vs. system
time that your test runs used?  If this is on Linux, I am surprised
that you didn't get killed by the lseek() contention problem on a
machine with that many cores.  I found it to be visible at 32 and
crippling at 64, so I can't even imagine what it would be like at 160.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Comparing two PostgreSQL databases -- order of pg_dump output

2011-08-30 Thread Joe Abbate

Hola Jaime,

On 08/30/2011 03:24 PM, Jaime Casanova wrote:

what about using pg_dump -Fc -Osx and use pg_restore -l to list
objects. then you can sort and compare objects and then a script that
compare schema of objects extracting them with -P, -T or -t


That appears to be of limited use (i.e., it would only work for 
functions, triggers and tables).  pg_restore -L/--use-list is more 
comprehensive.  So the script would have to do something like the following:


$ pg_dump -Fc -Osx postgis > postgis.dump
$ pg_restore -l postgis.dump | sort -k4  > postgis.list
$ pg_restore -L postgis.list postgis.dump > postgis.sorted

Rinse and repeat on the second database and then diff the .sorted files. 
 Tried it and although it doesn't completely do the trick it's much 
better than diffing the plain text pg_dump outputs (3000+ diff lines vs. 
less than 200 and about half of that are actual differences).


Thanks,

Joe



Re: [HACKERS] spinlocks on HP-UX

2011-08-30 Thread Tom Lane
I wrote:
> I am hoping to do a similar test on another machine with $bignum Xeon
> processors, to see if Intel hardware reacts any differently.  But that
> machine is in the Westford office which is currently without power,
> so it will have to wait a few days.

OK, the lights are on again in Westford, so here are some results from
an 8-socket Fujitsu PRIMEQUEST 1800 with 10-core Xeon E7-8870 processors,
hyperthreading enabled for a total of 160 virtual processors.
All test conditions the same as from my Opteron runs yesterday,
except just for the heck of it I ran it up to 160 backends.

Stock git head (of a couple of days ago now):

pgbench -c 1 -j 1 -S -T 300 bench   tps = 4401.589257 (including ...
pgbench -c 2 -j 1 -S -T 300 bench   tps = 8585.789827 (including ...
pgbench -c 8 -j 4 -S -T 300 bench   tps = 36315.227334 (including ...
pgbench -c 16 -j 8 -S -T 300 bench  tps = 73841.195884 (including ...
pgbench -c 32 -j 16 -S -T 300 bench tps = 155309.526039 (including ...
pgbench -c 64 -j 32 -S -T 300 bench tps = 77477.101725 (including ...
pgbench -c 96 -j 48 -S -T 300 bench tps = 41301.481915 (including ...
pgbench -c 128 -j 64 -S -T 300 bench  tps = 30443.815506 (including ...
pgbench -c 160 -j 80 -S -T 300 bench  tps = 24600.584202 (including ...

Non-locked test in TAS():

pgbench -c 1 -j 1 -S -T 300 bench   tps = 4412.336573 (including ...
pgbench -c 2 -j 1 -S -T 300 bench   tps = 8739.900806 (including ...
pgbench -c 8 -j 4 -S -T 300 bench   tps = 32957.710818 (including ...
pgbench -c 16 -j 8 -S -T 300 bench  tps = 71538.032629 (including ...
pgbench -c 32 -j 16 -S -T 300 bench tps = 153892.469308 (including ...
pgbench -c 64 -j 32 -S -T 300 bench tps = 127786.277182 (including ...
pgbench -c 96 -j 48 -S -T 300 bench tps = 92108.895423 (including ...
pgbench -c 128 -j 64 -S -T 300 bench  tps = 75382.131814 (including ...
pgbench -c 160 -j 80 -S -T 300 bench  tps = 67277.057981 (including ...

Non-locked test in TAS_SPIN() only:

pgbench -c 1 -j 1 -S -T 300 bench   tps = 4006.626861 (including ...
pgbench -c 2 -j 1 -S -T 300 bench   tps = 9020.124850 (including ...
pgbench -c 8 -j 4 -S -T 300 bench   tps = 36507.582318 (including ...
pgbench -c 16 -j 8 -S -T 300 bench  tps = 69668.921550 (including ...
pgbench -c 32 -j 16 -S -T 300 bench tps = 150886.395754 (including ...
pgbench -c 64 -j 32 -S -T 300 bench tps = 216697.745497 (including ...
pgbench -c 96 -j 48 -S -T 300 bench tps = 171013.266643 (including ...
pgbench -c 128 -j 64 -S -T 300 bench  tps = 115205.718495 (including ...
pgbench -c 160 -j 80 -S -T 300 bench  tps = 92073.704665 (including ...

This suggests that (1) an unlocked test in TAS_SPIN might be a good idea
on x86_64 after all, and (2) this test scenario may not be pushing the
system hard enough to expose limitations of the spinlock implementation.

I am now thinking that the reason we saw clear differences in spinlock
implementations years ago, and now are not seeing them except on insane
hardware, is mainly that we've managed to reduce contention at higher
levels of the system.  That doesn't mean spinlocks have become
uninteresting, just that "pgbench -S" isn't the ideal test case for
stressing them.  I'm thinking maybe we need a test scenario that
generates sinval traffic, for example, or forces snapshots to be taken
more often.  Ideas anyone?

regards, tom lane



Re: [HACKERS] symbol mismatches on minor version upgrades

2011-08-30 Thread Tom Lane
Peter Eisentraut  writes:
> A while ago, I blogged about the following problem:
> (http://petereisentraut.blogspot.com/2011/07/undefined-symbol.html)

While not wishing to deny that this can be a problem, I think you're
overstating this aspect:

> Now if this had been, say, plpython, which is also developed closely
> together with the backend, but is probably shipped in a separate binary
> package and has extra dependencies, so it might reasonably not be
> upgraded at the same time, there would be additional problems.  We
> should figure out a way to advise packagers about putting in tight
> enough version dependencies when this happens.

This is not possible at least in the Red Hat world, because all the
subpackages have exact-version-and-release dependencies tying them
together.  That's distro policy not just my whim, and I'd expect other
server-grade distros to have similar policies.

You're right though that doing a "yum update" underneath a running
server could cause transient failures until the server was restarted
with the new postgres executable.

regards, tom lane



Re: [HACKERS] Comparing two PostgreSQL databases -- order of pg_dump output

2011-08-30 Thread Jaime Casanova
On Tue, Aug 30, 2011 at 2:07 PM, Joe Abbate  wrote:
> Hi,
>
> In order to compare the schema of two presumably identical databases, I've
> been diffing the output of pg_dump -Osx.  However, I've found that the order
> of the output is not very reliable.

what about using pg_dump -Fc -Osx and use pg_restore -l to list
objects. then you can sort and compare objects and then a script that
compare schema of objects extracting them with -P, -T or -t

-- 
Jaime Casanova         www.2ndQuadrant.com
Professional PostgreSQL: 24x7 support and training



[HACKERS] Comparing two PostgreSQL databases -- order of pg_dump output

2011-08-30 Thread Joe Abbate

Hi,

In order to compare the schema of two presumably identical databases, 
I've been diffing the output of pg_dump -Osx.  However, I've found that 
the order of the output is not very reliable.  For example, after 
recreating the Pagila sample database, I find the following:


--- pagila.dmp  2011-08-26 14:34:48.0 -0400
+++ pagila.dev-dmp  2011-08-26 14:34:47.0 -0400
@@ -1140,7 +1140,7 @@
 --

 CREATE TRIGGER last_updated
-BEFORE UPDATE ON city
+BEFORE UPDATE ON actor
 FOR EACH ROW
 EXECUTE PROCEDURE last_updated();

@@ -1160,7 +1160,7 @@
 --

 CREATE TRIGGER last_updated
-BEFORE UPDATE ON customer
+BEFORE UPDATE ON category
 FOR EACH ROW
 EXECUTE PROCEDURE last_updated();
...

The same triggers exist on both databases, it's just that the order is 
different (apparently they're output in creation order).  This even more 
crucial with PostGIS databases, which have several hundred function and 
operator pairs where the only difference is one takes arguments of type 
geometry and the other uses type geography.  There the pg_dump diff 
approach is nearly useless.


I thought that comparing database schemas would be quite desirable, 
e.g., between development/test and production databases.  Is there 
perhaps some mechanism or tool that people use for this purpose, or is 
this not a requirement?


Incidentally, these comparisons are for the Pyrseas tools I'm 
developing.  The output of dbtoyaml is predictable (not because of 
anything I wrote, but because pyyaml outputs everything in alphabetical 
order), and I can compare the YAML outputs quite nicely (however, it 
doesn't show me things I haven't implemented yet, e.g., OPERATOR CLASSes 
in the case of PostGIS).


Joe



[HACKERS] symbol mismatches on minor version upgrades

2011-08-30 Thread Peter Eisentraut
A while ago, I blogged about the following problem:
(http://petereisentraut.blogspot.com/2011/07/undefined-symbol.html)

Between PG 8.4.4 and 8.4.5, a new symbol PinPortal was added to the
backend and plpgsql was changed to call it.  So in that particular case,
upgrading plpgsql without also upgrading the backend at the same time
would cause plpgsql to fail.

I thought this might have been an accident, but it happened again with
the upgrade from 8.4.7 to 8.4.8 (do_convert_tuple).  So now I'm thinking
we ought to do something about this.

Obviously, we don't want to prevent bug fixes introducing new functions,
when necessary.  But perhaps we should advise users about this or make
sure they don't install mismatching versions.  Note that we have a
system in place to prevent calling mismatching major versions, but we
don't track mismatching minor versions.
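
One low-tech way to spot the hazard is to compare a module's undefined
symbols against what the running server binary exports (a sketch; the
paths are illustrative):

$ nm -D /usr/lib/postgresql/8.4/lib/plpgsql.so | grep -w PinPortal
         U PinPortal
$ nm -D /usr/lib/postgresql/8.4/bin/postgres | grep -cw PinPortal
0

The "U" marks a symbol plpgsql expects the backend to provide; a count of
zero from a pre-8.4.5 postgres binary means the load will fail.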

There are a number of levels where this might be a problem:

As it happens, plpgsql and the backend are shipped in the same binary
package, so upgrading them together is really not a problem, you just
need to arrange to restart the server right away.  This can be avoided
by loading the library using shared_preload_libraries.  Perhaps that
should be promoted more for libraries like plpgsql that are closely
tied to the backend?

Now if this had been, say, plpython, which is also developed closely
together with the backend, but is probably shipped in a separate binary
package and has extra dependencies, so it might reasonably not be
upgraded at the same time, there would be additional problems.  We
should figure out a way to advise packagers about putting in tight
enough version dependencies when this happens.

In the future, we are presumably hoping for more people to write and
package extensions.  So if someone writes some module and thinks, oh,
I'll use this do_convert_tuple() function, then their binary won't work
with older versions of 8.4.

How could we address these issues?





Re: [HACKERS] postgresql.conf archive_command example

2011-08-30 Thread Brendan Jurd
On 31 August 2011 04:39, Peter Eisentraut  wrote:
> I think it would be useful to add the following explanation and sample
> to the postgresql.conf sample file:
>

Good idea Peter, +1.

Cheers,
BJ



[HACKERS] postgresql.conf archive_command example

2011-08-30 Thread Peter Eisentraut
I think it would be useful to add the following explanation and sample
to the postgresql.conf sample file:

diff --git i/src/backend/utils/misc/postgresql.conf.sample 
w/src/backend/utils/misc/postgresql.conf.sample
--- i/src/backend/utils/misc/postgresql.conf.sample
+++ w/src/backend/utils/misc/postgresql.conf.sample
@@ -186,6 +186,9 @@
 #archive_mode = off# allows archiving to be done
# (change requires restart)
 #archive_command = ''  # command to use to archive a logfile segment
+   # placeholders: %p = path of file to archive
+   #   %f = file name only
+   # e.g. 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'
 #archive_timeout = 0   # force a logfile segment switch after this
# number of seconds; 0 disables
 
This corresponds to what we have in the documentation and mirrors the
example in recovery.conf.sample.

Objections?





Re: [HACKERS] pg_upgrade automatic testing

2011-08-30 Thread Peter Eisentraut
Here is an updated version of the pg_upgrade test script I posted a
while ago.  I've cleaned it up so that it offers moderately
user-friendly feedback, it supports check and installcheck mode, and
should use all the things from the right directories in either case.

diff --git i/contrib/pg_upgrade/.gitignore w/contrib/pg_upgrade/.gitignore
index 03ec737..79c8cdb 100644
--- i/contrib/pg_upgrade/.gitignore
+++ w/contrib/pg_upgrade/.gitignore
@@ -1 +1,5 @@
 /pg_upgrade
+# Generated by test suite
+delete_old_cluster.sh
+/log/
+/tmp_check/
diff --git i/contrib/pg_upgrade/Makefile w/contrib/pg_upgrade/Makefile
index 8f3fd7c..dcd2c04 100644
--- i/contrib/pg_upgrade/Makefile
+++ w/contrib/pg_upgrade/Makefile
@@ -11,6 +11,14 @@ OBJS = check.o controldata.o dump.o exec.o file.o function.o info.o \
 PG_CPPFLAGS  = -DFRONTEND -DDLSUFFIX=\"$(DLSUFFIX)\" -I$(srcdir) -I$(libpq_srcdir)
 PG_LIBS = $(libpq_pgport)
 
+check: test.sh
+	MAKE=$(MAKE) bindir=$(bindir) libdir=$(libdir) $(SHELL) $< --install
+
+installcheck: test.sh
+	MAKE=$(MAKE) bindir=$(bindir) libdir=$(libdir) $(SHELL) $<
+
+EXTRA_CLEAN = delete_old_cluster.sh log/ tmp_check/
+
 ifdef USE_PGXS
 PG_CONFIG = pg_config
 PGXS := $(shell $(PG_CONFIG) --pgxs)
diff --git i/contrib/pg_upgrade/test.sh w/contrib/pg_upgrade/test.sh
index e69de29..7660951 100644
--- i/contrib/pg_upgrade/test.sh
+++ w/contrib/pg_upgrade/test.sh
@@ -0,0 +1,95 @@
+#!/bin/sh
+
+# contrib/pg_upgrade/test.sh
+#
+# Test driver for pg_upgrade.  Initializes a new database cluster,
+# runs the regression tests (to put in some data), runs pg_dumpall,
+# runs pg_upgrade, runs pg_dumpall again, compares the dumps.
+#
+# Portions Copyright (c) 1996-2011, PostgreSQL Global Development Group
+# Portions Copyright (c) 1994, Regents of the University of California
+
+set -e
+
+: ${MAKE=make}
+: ${PGPORT=50432}
+export PGPORT
+
+temp_root=$PWD/tmp_check
+
+if [ "$1" = '--install' ]; then
+	temp_install=$temp_root/install
+	bindir=$temp_install/$bindir
+	libdir=$temp_install/$libdir
+
+	"$MAKE" -s -C ../.. install DESTDIR="$temp_install"
+	"$MAKE" -s -C ../pg_upgrade_support install DESTDIR="$temp_install"
+	"$MAKE" -s -C . install DESTDIR="$temp_install"
+
+	# platform-specific magic to find the shared libraries; see pg_regress.c
+	LD_LIBRARY_PATH=$libdir:$LD_LIBRARY_PATH
+	export LD_LIBRARY_PATH
+	DYLD_LIBRARY_PATH=$libdir:$DYLD_LIBRARY_PATH
+	export DYLD_LIBRARY_PATH
+	LIBPATH=$libdir:$LIBPATH
+	export LIBPATH
+	PATH=$libdir:$PATH
+
+	# We need to make it use psql from our temporary installation,
+	# because otherwise the installcheck run below would try to
+	# use psql from the proper installation directory, which might
+	# be outdated or missing.
+	EXTRA_REGRESS_OPTS=--psqldir=$bindir
+	export EXTRA_REGRESS_OPTS
+fi
+
+PATH=$bindir:$PATH
+export PATH
+
+PGDATA=$temp_root/data
+export PGDATA
+rm -rf "$PGDATA" "$PGDATA".old
+
+logdir=$PWD/log
+rm -rf "$logdir"
+mkdir "$logdir"
+
+set -x
+
+initdb
+pg_ctl start -l "$logdir/postmaster1.log" -w
+if "$MAKE" -C ../.. installcheck; then
+	pg_dumpall >"$temp_root"/dump1.sql || pg_dumpall1_status=$?
+else
+	make_installcheck_status=$?
+fi
+pg_ctl -m fast stop
+if [ -n "$make_installcheck_status" ]; then
+	exit 1
+fi
+if [ -n "$pg_dumpall1_status" ]; then
+	echo "pg_dumpall of pre-upgrade database cluster failed"
+	exit 1
+fi
+
+mv "${PGDATA}" "${PGDATA}.old"
+
+initdb
+
+pg_upgrade -d "${PGDATA}.old" -D "${PGDATA}" -b "$bindir" -B "$bindir"
+
+pg_ctl start -l "$logdir/postmaster2.log" -w
+pg_dumpall >"$temp_root"/dump2.sql || pg_dumpall2_status=$?
+pg_ctl -m fast stop
+if [ -n "$pg_dumpall2_status" ]; then
+	echo "pg_dumpall of post-upgrade database cluster failed"
+	exit 1
+fi
+
+if diff -q "$temp_root"/dump1.sql "$temp_root"/dump2.sql; then
+	echo PASSED
+	exit 0
+else
+	echo "dumps were not identical"
+	exit 1
+fi
diff --git i/src/makefiles/pgxs.mk w/src/makefiles/pgxs.mk
index 84a296a..7dc8742 100644
--- i/src/makefiles/pgxs.mk
+++ w/src/makefiles/pgxs.mk
@@ -207,7 +207,7 @@ ifdef OBJS
 	rm -f $(OBJS)
 endif
 ifdef EXTRA_CLEAN
-	rm -f $(EXTRA_CLEAN)
+	rm -rf $(EXTRA_CLEAN)
 endif
 ifdef REGRESS
 # things created by various check targets
diff --git i/src/test/regress/GNUmakefile w/src/test/regress/GNUmakefile
index 90aea6c..ec29391 100644
--- i/src/test/regress/GNUmakefile
+++ w/src/test/regress/GNUmakefile
@@ -132,7 +132,7 @@ tablespace-setup:
 ## Run tests
 ##
 
-REGRESS_OPTS = --dlpath=.
+REGRESS_OPTS = --dlpath=. $(EXTRA_REGRESS_OPTS)
 
 check: all tablespace-setup
 	$(pg_regress_check) $(REGRESS_OPTS) --schedule=$(srcdir)/parallel_schedule $(MAXCONNOPT) $(TEMP_CONF) $(EXTRA_TESTS)



Re: [HACKERS] Inputting relative datetimes

2011-08-30 Thread Tom Lane
Robert Haas  writes:
> Well, I'm fine with not back-patching it, but think the existing
> behavior is flat wrong.

I'm not arguing that this way isn't better, just that it's different.
There have been zero user complaints about this behavior since Tom
Lockhart put it in, more than ten years ago.  That sort of militates
against a hard-line "it's flat wrong" stance.

But more to the point, since there wasn't an error before and there
isn't an error now, this is just a silent behavioral change, and we
avoid doing those in released branches.  People don't want to have to
retest their applications against minor releases.

regards, tom lane



Re: [HACKERS] WIP: Fast GiST index build

2011-08-30 Thread Heikki Linnakangas

On 30.08.2011 13:29, Alexander Korotkov wrote:

On Tue, Aug 30, 2011 at 1:13 PM, Heikki Linnakangas<
heikki.linnakan...@enterprisedb.com>  wrote:


So, over 50% of the CPU time is spent in choosing a block from the
temporary files. That should be pretty easy to improve..


Yes, probably we can just remove free blocks sorting.


Ok, the first results are in for that:

         testname          | nrows |    duration     | accesses
---------------------------+-------+-----------------+---------
 points unordered buffered | 25000 | 06:00:23.707579 |  4049832

From the previous test runs, the unbuffered index build took under 4 
hours, so even though this is a lot better than with the sorting, it's 
still a loss compared to non-buffered build.


I had vmstat running during most of this index build. At a quick glance, 
it doesn't seem to be CPU bound anymore. I suspect the buffers in the 
temporary file gets very fragmented. Or, we're reading it in backwards 
order because the buffers work in a LIFO fashion. The system seems to be 
doing about 5 MB/s of I/O during the build, which sounds like a figure 
you'd get for more or less random I/O.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] dropdb and dropuser: IF EXISTS

2011-08-30 Thread Robert Haas
On Tue, Aug 30, 2011 at 12:44 PM, Josh Kupershmidt  wrote:
> On Tue, Aug 30, 2011 at 11:14 AM, Robert Haas  wrote:
>> Committed with some edits.  I stole the documentation language from
>> the DROP DATABASE and DROP USER pages and just copied it over, instead
>> of saying the same thing in different words.  And I rearranged the
>> options handling to be more consistent with what we do elsewhere.
>
> Thanks. I think you accidentally copied the DROP DATABASE snippet into
> dropuser.sgml as well.

D'oh.  Fixed.
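
For the archives, the new switches look like this in use (a sketch):

$ dropdb --if-exists no_such_db
$ dropuser --if-exists no_such_role

Both now emit a notice and exit successfully when the target is missing,
instead of failing.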

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] dropdb and dropuser: IF EXISTS

2011-08-30 Thread Josh Kupershmidt
On Tue, Aug 30, 2011 at 11:14 AM, Robert Haas  wrote:
> Committed with some edits.  I stole the documentation language from
> the DROP DATABASE and DROP USER pages and just copied it over, instead
> of saying the same thing in different words.  And I rearranged the
> options handling to be more consistent with what we do elsewhere.

Thanks. I think you accidentally copied the DROP DATABASE snippet into
dropuser.sgml as well.

Josh



Re: [HACKERS] strange row number estimates in pg9.1rc1

2011-08-30 Thread Tom Lane
"Sergey E. Koposov"  writes:
> I wonder what should be the best way to proceed for already migrated 
> databases -- running analyze repeatedly may not be the best way for 
> very large clusters with large tables...

You can just manually update pg_class.reltuples to a more reasonable
value.
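
Something along these lines, run as a superuser (a sketch; the table name
and row count are placeholders):

$ psql mydb -c "UPDATE pg_class SET reltuples = 5000000 WHERE oid = 'public.mybigtable'::regclass"

A subsequent VACUUM or ANALYZE will refine the figure from there.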

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] strange row number estimates in pg9.1rc1

2011-08-30 Thread Sergey E. Koposov

On Tue, 30 Aug 2011, Tom Lane wrote:


Sure enough, that's the problem.  pg_upgrade leaves relpages/reltuples
set to zero, but it also imports the visibility map pages from the old
cluster.  If the old visibility map shows the table as all-visible,
then this happens when you try to VACUUM ANALYZE the table:

1. VACUUM doesn't process any pages, so it has no tuple density
estimate.  It leaves reltuples set to zero, but it does set relpages.

2. ANALYZE scans some part of the table.  It gets a tuple density
estimate for those pages ... but if that's only a small fraction of
the table, it believes the zero estimate of tuple density elsewhere.
So you get only a small update of reltuples.


Thanks for figuring this out.
I wonder what should be the best way to proceed for already migrated 
databases -- running analyze repeatedly may not be the best way for 
very large clusters with large tables...


S

***
Sergey E. Koposov, PhD
Institute for Astronomy, Cambridge/Sternberg Astronomical Institute
Web: http://lnfm1.sai.msu.ru/~math
E-mail: m...@sai.msu.ru

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] compile from git repository

2011-08-30 Thread Andrew Dunstan



On 08/30/2011 10:43 AM, Magnus Hagander wrote:

On Tue, Aug 30, 2011 at 16:30, Tom Lane  wrote:

Andrew Dunstan  writes:

On 08/30/2011 07:11 AM, 权宗亮 wrote:

I try to compile from git repository and get some errors.

Windows 7 x86
VC Express 2008
perl v5.8.9
flex 2.5.4a-1
bison 2.4.1

Your version of flex is too old.

Hmm ... had the OP been using configure, he'd have been told that by the
build scripts.  Should the MSVC build scripts be improved to check the
versions of bison and flex?

We do it for bison already. It probably shouldn't be too hard to do
the same for flex.



Done.

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] strange row number estimates in pg9.1rc1

2011-08-30 Thread Tom Lane
I wrote:
> "Sergey E. Koposov"  writes:
>> I'm seeing something weird which looks like a bug in 9.1rc1 after the 
>> upgrade 8.4->9.0->9.1 done using pg_upgrade.

> Hm, I wonder what pg_upgrade left relpages/reltuples set to ...

Sure enough, that's the problem.  pg_upgrade leaves relpages/reltuples
set to zero, but it also imports the visibility map pages from the old
cluster.  If the old visibility map shows the table as all-visible,
then this happens when you try to VACUUM ANALYZE the table:

1. VACUUM doesn't process any pages, so it has no tuple density
estimate.  It leaves reltuples set to zero, but it does set relpages.

2. ANALYZE scans some part of the table.  It gets a tuple density
estimate for those pages ... but if that's only a small fraction of
the table, it believes the zero estimate of tuple density elsewhere.
So you get only a small update of reltuples.

(The above behavior is new as of commit
b4b6923e03f4d29636a94f6f4cc2f5cf6298b8c8, BTW.)

Basically, step 1 is buggy here: if we aren't making an update to
reltuples, we shouldn't set relpages either.  Setting it nonzero
changes the implied tuple density from "unknown" to "known zero",
which is wrong.
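
To spot already-affected tables, it's enough to look for the
contradictory combination directly (a sketch; 'mytable' is a
placeholder):

SELECT relname, relpages, reltuples
  FROM pg_class
 WHERE relname = 'mytable';
-- relpages > 0 together with reltuples = 0 is the "known zero" state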

I'll go fix that, but I think it might be a good idea for pg_upgrade
to think about preserving the relpages/reltuples columns ...

regards, tom lane

PS: right now, you cannot reproduce this in a 9.0 -> HEAD upgrade,
because of this patch:

commit 00a7c9014a8fbb7388a807daeba3e0a85b49a747
Author: Bruce Momjian 
Date:   Fri Aug 19 11:20:30 2011 -0400

In pg_upgrade, don't copy visibility map files from clusters that did not
have crash-safe visibility maps to clusters that expect crash-safety.

Request from Robert Haas.

I did reproduce it in a 9.0->9.1 test.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] dropdb and dropuser: IF EXISTS

2011-08-30 Thread Robert Haas
On Mon, Aug 29, 2011 at 7:34 PM, Josh Kupershmidt  wrote:
> On Fri, Aug 26, 2011 at 10:42 PM, Robert Haas  wrote:
>> +1 for --if-exists, but -X isn't doing a lot for me, especially since
>> we've used -X for other purposes in other commands.  I'd just skip
>> having a short form for this one.
>
> Fine by me. Updated patch attached.

Committed with some edits.  I stole the documentation language from
the DROP DATABASE and DROP USER pages and just copied it over, instead
of saying the same thing in different words.  And I rearranged the
options handling to be more consistent with what we do elsewhere.
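
For the archives, the new switch has no short form; typical invocations
look like this:

    dropdb --if-exists mydb
    dropuser --if-exists someuser

With the switch, a nonexistent database or role produces a notice
instead of an error.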

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Inputting relative datetimes

2011-08-30 Thread Robert Haas
On Tue, Aug 30, 2011 at 11:52 AM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Sun, Aug 28, 2011 at 5:39 AM, Dean Rasheed  
>> wrote:
>>> The attached patch makes "today", "tomorrow" and "yesterday" only set
>>> the year, month and day fields. All the other fields are already
>>> initialised to 0 at the start, and may be set non-zero before or after
>>> encountering these special date values. The result should now be
>>> independent of the order of the fields.
>
>> OK, committed.  Perhaps it should be back-patched,
>
> No, I don't think so.  This is an incompatible behavioral change with a
> small-but-not-zero probability of breaking existing applications.

Well, I'm fine with not back-patching it, but think the existing
behavior is flat wrong.  Having '04:00:00 yesterday' return midnight
yesterday is pretty well unjustifiable.  An error would be reasonable,
and DWIM is reasonable, but anything else is the wrong answer.  How
much worse would it have to be to qualify as a bug?  What if we didn't
set the hour, minute, and second at all, and returned a value based on
whatever garbage was left over in the relevant memory location?  What
if we returned 4 BC?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Inputting relative datetimes

2011-08-30 Thread Tom Lane
Robert Haas  writes:
> On Sun, Aug 28, 2011 at 5:39 AM, Dean Rasheed  
> wrote:
>> The attached patch makes "today", "tomorrow" and "yesterday" only set
>> the year, month and day fields. All the other fields are already
>> initialised to 0 at the start, and may be set non-zero before or after
>> encountering these special date values. The result should now be
>> independent of the order of the fields.

> OK, committed.  Perhaps it should be back-patched,

No, I don't think so.  This is an incompatible behavioral change with a
small-but-not-zero probability of breaking existing applications.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Inputting relative datetimes

2011-08-30 Thread Robert Haas
On Sun, Aug 28, 2011 at 5:39 AM, Dean Rasheed  wrote:
> On 28 August 2011 00:39, Robert Haas  wrote:
>> On Sat, Aug 27, 2011 at 7:43 AM, Dean Rasheed  
>> wrote:
>>> On 27 August 2011 12:29, Dean Rasheed  wrote:
 ... if nothing else it has been a
 fun exercise figuring out how the datetime string parsing code works.
>>>
>>> While looking through the current code, I spotted the following oddity:
>>>
>>> select timestamp 'yesterday 10:30';
>>>      timestamp
>>> ---------------------
>>>  2011-08-26 10:30:00
>>>
>>> which is what you'd expect, however:
>>>
>>> select timestamp '10:30 yesterday';
>>>      timestamp
>>> ---------------------
>>>  2011-08-26 00:00:00
>>>
>>> Similarly "today" and "tomorrow" reset any time fields so far, but
>>> ISTM that they should really be preserving the hour, min, sec fields
>>> decoded so far.
>>
>> Sounds right to me.  Want to send a patch?
>
> The attached patch makes "today", "tomorrow" and "yesterday" only set
> the year, month and day fields. All the other fields are already
> initialised to 0 at the start, and may be set non-zero before or after
> encountering these special date values. The result should now be
> independent of the order of the fields.

OK, committed.  Perhaps it should be back-patched, but since this was
only discovered during code-reading and not in the wild, I didn't
bother.
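
With the patch, the two orderings should now agree -- illustrative
output, reusing the dates from the quoted examples:

select timestamp '10:30 yesterday';
      timestamp
---------------------
 2011-08-26 10:30:00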

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Join push-down for foreign tables

2011-08-30 Thread Tom Lane
Shigeru Hanada  writes:
> BTW, is adding a foreign server oid to RelOptInfo acceptable?

No objection here.

> This field
> is set in build_simple_rel() or build_join_rel() if the RelOptInfo
> itself is a foreign scan, or if it is a foreign join and both the inner
> and outer RelOptInfo have the same valid foreign server oid.

I think you mean "if all the base rels in the join rel come from the
same foreign server".

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] compile from git repository

2011-08-30 Thread Magnus Hagander
On Tue, Aug 30, 2011 at 16:30, Tom Lane  wrote:
> Andrew Dunstan  writes:
>> On 08/30/2011 07:11 AM, 权宗亮 wrote:
>>> I try to compile from git repository and get some errors.
>>>
>>> Windows 7 x86
>>> VC Express 2008
>>> perl v5.8.9
>>> flex 2.5.4a-1
>>> bison 2.4.1
>
>> Your version of flex is too old.
>
> Hmm ... had the OP been using configure, he'd have been told that by the
> build scripts.  Should the MSVC build scripts be improved to check the
> versions of bison and flex?

We do it for bison already. It probably shouldn't be too hard to do
the same for flex.
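
Something along these lines ought to do it (a rough sketch, not the
committed check; 2.5.31 is the minimum that configure requires):

my $verout = `flex -V`;
if ($verout =~ /(\d+)\.(\d+)\.(\d+)/)
{
    die "flex 2.5.31 or later is required\n"
        if ($1 * 10000 + $2 * 100 + $3) < 20531;
}
else
{
    die "could not determine flex version\n";
}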

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] compile from git repository

2011-08-30 Thread Tom Lane
Andrew Dunstan  writes:
> On 08/30/2011 07:11 AM, 权宗亮 wrote:
>> I try to compile from git repository and get some errors.
>> 
>> Windows 7 x86
>> VC Express 2008
>> perl v5.8.9
>> flex 2.5.4a-1
>> bison 2.4.1

> Your version of flex is too old.

Hmm ... had the OP been using configure, he'd have been told that by the
build scripts.  Should the MSVC build scripts be improved to check the
versions of bison and flex?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] postgesql-9.0.4 compile on AIX 6.1 using gcc 4.4.6

2011-08-30 Thread Weiss, Wilfried
Hello,

I am just trying to compile postgresql-9.0.4 on AIX 6100-06-03-1048
using gcc 4.4.6.

For many years I did this successfully on other versions of AIX using
xlc. However, my version 9 of xlc no longer runs on AIX 6.1, and no
one is willing to sponsor a new version of xlc. So I started trying to
compile postgresql with gcc 4.4.6.

The installation of gcc was a nightmare. There were unresolved
dependencies that had to be ignored with rpm's nodeps parameter when
installing!?

Unfortunately, that was not all.

There was also:

"[Bug target/46072] AIX linker chokes on debug info for uninitialized
static variables"

This is an IBM bug in AIX's assembler (as) which produces corrupt object
code that crashes when you try to execute it.

As far as I know, IBM has still not delivered a fix for this. It seems
they are not interested, as IBM's xlc does not use the assembler to
create object code.

Does anyone know whether there is an alternative way to compile
postgresql on AIX 6.1 using gcc???

I appreciate even the smallest hint!

Regards

WW

 



Re: [HACKERS] compile from git repository

2011-08-30 Thread Andrew Dunstan


On 08/30/2011 07:11 AM, 权宗亮 wrote:

Hi, hackers

I try to compile from git repository and get some errors.

Windows 7 x86
VC Express 2008
perl v5.8.9
flex 2.5.4a-1
bison 2.4.1







Your version of flex is too old. You can try the one at 
, 
or the one that comes with Msysgit should also be suitable if you're 
using that.




cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Join push-down for foreign tables

2011-08-30 Thread Shigeru Hanada
Thanks for the comments.

(2011/08/30 1:42), Tom Lane wrote:
>> Costs of ForeignJoinPath are estimated by FDW via new routine
>> PlanForeignJoin, and SQL based FDW would need to generate remote SQL
>> here.  If a FDW can't push down that join, then it can set disable_cost
>> (1.0e10) to tell planner to not choose that path.
> 
> disable_cost is not a positive guarantee that a path won't be chosen.
> Particularly not for foreign table accesses, where the estimated costs
> could be pretty darn large in themselves.  You need to pick an API
> wherein refusal is unmistakable.  Probably, returning NULL instead of a
> Path structure is the appropriate way to signal "can't do this join".

Agreed.  Returning NULL seems fine.
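
To make the convention concrete, a hypothetical callback might be shaped
like this (the name, signature, and the serverid field are assumptions
drawn from this discussion, not a settled API):

/* Return a path for pushing the whole join down, or NULL to refuse. */
static Path *
examplePlanForeignJoin(PlannerInfo *root, RelOptInfo *joinrel)
{
    /* refuse unless all base rels in the join come from one server */
    if (!OidIsValid(joinrel->serverid))
        return NULL;

    /* build and cost the foreign join path here (hypothetical helper) */
    return make_example_foreign_join_path(root, joinrel);
}

Returning NULL is then an unmistakable refusal, whereas a disable_cost'd
path could still win against a sufficiently expensive alternative.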

>> In this design, cost of ForeignJoinPath is compared to other join nodes
>> such as NestPath and MergePath.  If ForeignJoinPath is the cheapest one
>> among the join candidates, planner will generates ForeignJoin plan node
>> and put it into plan tree as a leaf node.  In other words, joined
>> foreign tables are merged into upper ForeignJoin node.
> 
> Hmmm ... are you trying to describe what happens when three or more
> foreign tables are all to be joined at the remote end?

Yes, that's what I wanted to say :)

> I agree that's
> an important use-case, and that we probably want just one Plan node to
> result from it, but I'm less sure about what the Path representation
> ought to be.  It might be better to retain the Path tree showing what
> we'd concluded about what the join order ought to be, with the idea that
> the transmitted query could be constructed to reflect that, saving the
> remote-end planner from having to repeat that work.

That seems like a fine solution.  Somehow I had assumed that one path
node should map to exactly one plan node; in fact a merge join path node
might be expanded into multiple plan nodes, which is the reverse of the
foreign join case.  I'm going to implement this idea, and hopefully post
a proof-of-concept patch for the next CF.

BTW, is adding a foreign server oid to RelOptInfo acceptable?  This field
is set in build_simple_rel() or build_join_rel() if the RelOptInfo
itself is a foreign scan, or if it is a foreign join and both the inner
and outer RelOptInfo have the same valid foreign server oid.  I think
this field could avoid a recursive search into the foreign join subtree.

Regards,
-- 
Shigeru Hanada

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] compile from git repository

2011-08-30 Thread 权宗亮
Hi, hackers

I try to compile from git repository and get some errors.

Windows 7 x86
VC Express 2008
perl v5.8.9
flex 2.5.4a-1
bison 2.4.1

git clone git://git.postgresql.org/git/postgresql.git
git checkout REL9_0_4

cd ...\src\tools\msvc
build.bat


  Build started: Project: postgres, Configuration: Release|Win32
  Running flex on src\backend\parser\scan.l
  "src\backend\parser\scan.l", line 112: unrecognized %option: reentrant
  "src\backend\parser\scan.l", line 113: unrecognized %option: bison-bridge
  "src\backend\parser\scan.l", line 114: unrecognized %option:
bison-locations
  "src\backend\parser\scan.l", line 121: unrecognized %option: yyalloc
  "src\backend\parser\scan.l", line 122: unrecognized %option: yyrealloc
  "src\backend\parser\scan.l", line 123: unrecognized %option: yyfree
Project : error PRJ0002: Error result 1 returned from
'C:\Windows\system32\cmd.exe'.
  postgres - 1 error(s), 0 warning(s)
  The command exited with code 1.

Help me, please.

-- 
Quan Zongliang
quanzongli...@gmail.com


Re: [HACKERS] WIP: Fast GiST index build

2011-08-30 Thread Heikki Linnakangas

On 30.08.2011 13:38, Alexander Korotkov wrote:

On Tue, Aug 30, 2011 at 1:08 PM, Heikki Linnakangas<
heikki.linnakan...@enterprisedb.com>  wrote:



Thanks. Meanwhile, I hacked together my own set of test scripts, and let
them run over the weekend. I'm still running tests with ordered data, but
here are some preliminary results:

          testname           |   nrows   |    duration     | accesses
-----------------------------+-----------+-----------------+----------
 points unordered auto       | 25000     | 08:08:39.174956 |  3757848
 points unordered buffered   | 25000     | 09:29:16.47012  |  4049832
 points unordered unbuffered | 25000     | 03:48:10.999861 |  4564986

As you can see, the results are very disappointing :-(. The buffered builds
take a lot *longer* than unbuffered ones. I was expecting the buffering to
be very helpful at least in these unordered tests. On the positive side, the
buffering made index quality somewhat better (accesses column, smaller is
better), but that's not what we're aiming at.

What's going on here? This data set was large enough to not fit in RAM, the
table was about 8.5 GB in size (and I think the index is even larger than
that), and the box has 4GB of RAM. Does the buffering only help with even
larger indexes that exceed the cache size even more?


This seems pretty strange to me. The time of the unbuffered index build shows
that there is no bottleneck at IO. That radically differs from my
experiments. I'm going to try your test script on my test setup.
For now my only tentative explanation is that the random function may be
of poor quality, so the unordered dataset behaves like an ordered one.


Oh. Doing a simple "SELECT * FROM points LIMIT 10", it looks pretty 
random to me. The data should be uniformly distributed in a rectangle 
from (0, 0) to (10, 10).



 Can you
rerun the tests on your test setup with the dataset generated on the backend,
like this?
CREATE TABLE points AS (SELECT point(random(), random()) FROM
generate_series(1,1000));


Ok, I'll queue up that test after the ones I'm running now.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Fast GiST index build

2011-08-30 Thread Heikki Linnakangas

On 30.08.2011 13:29, Alexander Korotkov wrote:

On Tue, Aug 30, 2011 at 1:13 PM, Heikki Linnakangas<
heikki.linnakan...@enterprisedb.com>  wrote:


So, over 50% of the CPU time is spent in choosing a block from the
temporary files. That should be pretty easy to improve..


Yes, probably we can just remove the free-block sorting.


I'm re-running the tests with that change now. It seems like using the 
list of free blocks as a simple stack would be better anyway, it 
probably yields a better cache hit ratio when we re-use blocks that have 
just been released.
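
A minimal sketch of that idea (illustrative names, not the actual
gistbuildbuffers.c structures):

/* LIFO free-block list: the most recently freed block is reused first,
 * which should also give the best chance of OS cache hits. */
typedef struct FreeBlockStack
{
    long   *blocks;     /* free block numbers */
    int     nblocks;    /* entries in use */
    int     capacity;   /* allocated length of the array */
} FreeBlockStack;

static void
pushFreeBlock(FreeBlockStack *stack, long blkno)
{
    if (stack->nblocks >= stack->capacity)
    {
        stack->capacity *= 2;
        stack->blocks = (long *) repalloc(stack->blocks,
                                          stack->capacity * sizeof(long));
    }
    stack->blocks[stack->nblocks++] = blkno;
}

static long
popFreeBlock(FreeBlockStack *stack)
{
    if (stack->nblocks == 0)
        return -1;      /* no free block; caller extends the temp file */
    return stack->blocks[--stack->nblocks];
}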


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Single pass vacuum - take 2

2011-08-30 Thread Pavan Deolasee
On Tue, Aug 23, 2011 at 2:47 AM, Jim Nasby  wrote:
> On Aug 22, 2011, at 1:22 AM, Pavan Deolasee wrote:
>> Hi All,
>>
>> Here is a revised patch based on our earlier discussion. I implemented
>> Robert's idea of tracking the vacuum generation number in the line
>> pointer itself. For LP_DEAD line pointers, the lp_off/lp_len is unused
>> (and always set to 0 for heap tuples). We use those 30 bits to store
>> the generation number of the vacuum which would have potentially
>> removed the corresponding index pointers, if the vacuum finished
>> successfully. The pg_class information is used to know the status of
>> the vacuum, whether it failed or succeeded. 30-bit numbers are large
>> enough that we can ignore any wrap-around related issues. With this
>
> +        * Note: We don't worry about the wrap-around issues here since it 
> would
> +        * take a 1 Billion vacuums on the same relation for the vacuum 
> generation
> +        * to wrap-around. That would take ages to happen and even if it 
> happens,
> +        * the chances that we might have dead-vacuumed line pointers still
> +        * stamped with the old (failed) vacuum are infinitely small since 
> some
> +        * other vacuum cycle would have taken care of them.
>
> It would be good if some comment explained how we're safe in the case of an 
> aborted vacuum. I'm guessing that when vacuum finds any line pointers that 
> don't match the last successful vacuum exactly it will go and re-examine them 
> from scratch?
>

Yeah. If we don't know the status of the vacuum that collected the
line pointer and marked it vacuum-dead, the next vacuum will pick it
up again and stamp it with its own generation number.
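
For illustration only (not the actual patch code), stamping and reading
a 30-bit generation in a dead line pointer could look like this --
ItemIdData's lp_off and lp_len are 15 bits each and unused for LP_DEAD
pointers:

static inline void
ItemIdSetDeadVacuumGen(ItemId itemId, uint32 gen)
{
    itemId->lp_off = gen & 0x7FFF;          /* low 15 bits */
    itemId->lp_len = (gen >> 15) & 0x7FFF;  /* high 15 bits */
}

static inline uint32
ItemIdGetDeadVacuumGen(ItemId itemId)
{
    return ((uint32) itemId->lp_len << 15) | itemId->lp_off;
}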

Thanks,
Pavan

-- 
Pavan Deolasee
EnterpriseDB     http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Fast GiST index build

2011-08-30 Thread Alexander Korotkov
On Tue, Aug 30, 2011 at 1:08 PM, Heikki Linnakangas <
heikki.linnakan...@enterprisedb.com> wrote:

>
> Thanks. Meanwhile, I hacked together my own set of test scripts, and let
> them run over the weekend. I'm still running tests with ordered data, but
> here are some preliminary results:
>
>           testname           |   nrows   |    duration     | accesses
> -----------------------------+-----------+-----------------+----------
>  points unordered auto       | 25000     | 08:08:39.174956 |  3757848
>  points unordered buffered   | 25000     | 09:29:16.47012  |  4049832
>  points unordered unbuffered | 25000     | 03:48:10.999861 |  4564986
>
> As you can see, the results are very disappointing :-(. The buffered builds
> take a lot *longer* than unbuffered ones. I was expecting the buffering to
> be very helpful at least in these unordered tests. On the positive side, the
> buffering made index quality somewhat better (accesses column, smaller is
> better), but that's not what we're aiming at.
>
> What's going on here? This data set was large enough to not fit in RAM, the
> table was about 8.5 GB in size (and I think the index is even larger than
> that), and the box has 4GB of RAM. Does the buffering only help with even
> larger indexes that exceed the cache size even more?
>
This seems pretty strange to me. The time of the unbuffered index build shows
that there is no bottleneck at IO. That radically differs from my
experiments. I'm going to try your test script on my test setup.
For now my only tentative explanation is that the random function may be
of poor quality, so the unordered dataset behaves like an ordered one. Can you
rerun the tests on your test setup with the dataset generated on the backend,
like this?
CREATE TABLE points AS (SELECT point(random(), random()) FROM
generate_series(1,1000));

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] WIP: Fast GiST index build

2011-08-30 Thread Alexander Korotkov
On Tue, Aug 30, 2011 at 1:13 PM, Heikki Linnakangas <
heikki.linnakan...@enterprisedb.com> wrote:

> So, over 50% of the CPU time is spent in choosing a block from the
> temporary files. That should be pretty easy to improve..
>
Yes, probably we can just remove the free-block sorting.

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] WIP: Fast GiST index build

2011-08-30 Thread Heikki Linnakangas

On 30.08.2011 12:08, Heikki Linnakangas wrote:

What's going on here? This data set was large enough to not fit in RAM,
the table was about 8.5 GB in size (and I think the index is even larger
than that), and the box has 4GB of RAM. Does the buffering only help
with even larger indexes that exceed the cache size even more?


The tests are still running, so I decided to try oprofile. The build is 
in the final emptying phase, according to the log, and has been for over 
half an hour now. Oprofile output looks very interesting:


samples  %image name   symbol name
228590   30.3045  postgres pg_qsort
200558   26.5882  postgres gistBuffersFreeBlocksCmp
49397 6.5486  postgres gistchoose
45563 6.0403  postgres gist_box_penalty
31425 4.1661  postgres AllocSetAlloc
24182 3.2058  postgres FunctionCall3Coll
22671 3.0055  postgres rt_box_union
20147 2.6709  postgres gistpenalty
17007 2.2546  postgres DirectFunctionCall2Coll
15790 2.0933  no-vmlinux   /no-vmlinux
14148 1.8756  postgres XLogInsert
10612 1.4068  postgres gistdentryinit
10542 1.3976  postgres MemoryContextAlloc
9466  1.2549  postgres FunctionCall1Coll
9190  1.2183  postgres gist_box_decompress
6681  0.8857  postgres med3
4941  0.6550  libc-2.12.so isnanf

So, over 50% of the CPU time is spent in choosing a block from the 
temporary files. That should be pretty easy to improve..


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Fast GiST index build

2011-08-30 Thread Heikki Linnakangas

On 26.08.2011 17:18, Alexander Korotkov wrote:

On Thu, Aug 25, 2011 at 11:08 PM, Heikki Linnakangas<
heikki.linnakan...@enterprisedb.com>  wrote:


Could you share the test scripts, patches and data sets etc. needed to
reproduce the tests you've been running? I'd like to try them out on a test
server.



1) I've updated the links to the datasets on the wiki page.
2) The script for index quality testing, fastbuild_test.php, is attached.
In order to run it you need PHP with the pdo and pdo_pgsql modules. The
plantuner module is also required (it is used to force the planner to use
a specific index). After running that script, the following query returns
a relative score of index quality:

select indexname, avg(count::real/(select count from test_result a2 where
a2.indexname = 'usnoa2_idx3' and a2.predicate = a1.predicate and
a2.tablename = a1.tablename)::real) from test_result a1 where a1.tablename =
'usnoa2' group by indexname;

where 'usnoa2' is the table name and 'usnoa2_idx3' is the name of the index
whose quality is taken as 1.
3) A patch which makes plantuner work with HEAD is also attached.
4) A patch with my split algorithm implementation is attached. Its current
form is appropriate only for testing purposes.
5) For index creation I use a simple script, attached as 'indexes.sql'.
I also run a similar script, with different index names, with my split
patch.

Feel free to ask questions about all this stuff.


Thanks. Meanwhile, I hacked together my own set of test scripts, and let 
them run over the weekend. I'm still running tests with ordered data, 
but here are some preliminary results:


          testname           |   nrows   |    duration     | accesses
-----------------------------+-----------+-----------------+----------
 points unordered auto       | 25000     | 08:08:39.174956 |  3757848
 points unordered buffered   | 25000     | 09:29:16.47012  |  4049832
 points unordered unbuffered | 25000     | 03:48:10.999861 |  4564986

As you can see, the results are very disappointing :-(. The buffered 
builds take a lot *longer* than unbuffered ones. I was expecting the 
buffering to be very helpful at least in these unordered tests. On the 
positive side, the buffering made index quality somewhat better 
(accesses column, smaller is better), but that's not what we're aiming at.


What's going on here? This data set was large enough to not fit in RAM, 
the table was about 8.5 GB in size (and I think the index is even larger 
than that), and the box has 4GB of RAM. Does the buffering only help 
with even larger indexes that exceed the cache size even more?



Test methodology


These tests consist of creating a gist index using the point datatype.

     Table "public.points"
 Column |  Type   | Modifiers
--------+---------+-----------
 x      | integer |
 y      | integer |

CREATE INDEX testindex ON points_ordered USING gist (point(x,y)) WITH 
(buffering = 'on');


The points in the table are uniformly distributed. In the 'unordered' 
tests, they are in random order. The ordered tests use the exact same 
data, but sorted by x, y coordinates.


The 'accesses' column measures the quality of the produced index. 
Smaller is better. It is calculated by performing a million queries on 
the table, selecting points within a small square at evenly spaced 
locations. Like:


(SELECT COUNT(*) FROM points WHERE point(x,y) <@ box(point(xx-20, 
yy-20), point(xx+20, yy+20)));


The number of index pages touched by those queries is counted from 
pg_statio_user_indexes, and that number is reported in the 'accesses' 
column.
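
Roughly like this (a sketch; it assumes the statistics counters were
reset before the million queries ran):

SELECT idx_blks_read + idx_blks_hit AS accesses
  FROM pg_statio_user_indexes
 WHERE indexrelname = 'testindex';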


I've attached the whole script used. Pass the number of rows to use in 
the test as argument, and the script does the rest.


--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com



rungisttests.sh
Description: Bourne shell script

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers