Re: [HACKERS] pg_upgrade: convert on read is dead end

2008-10-13 Thread Heikki Linnakangas

Zdenek Kotala wrote:
PostgreSQL stores all of a table's external data in one TOAST table, and the 
stored data does not contain any clue about the datatype. When PostgreSQL 
reads a TOASTed value, it is not possible to detect what type it is 
and perform varlena conversion on composite datatypes or arrays.


It could be converted in the detoast_datum function, but that means the datum 
has to be retoasted and stored back on new pages. The old page MUST be kept 
in the old format, because any page conversion loses the information about the 
version, and different datatypes can be stored on one PostgreSQL page.


Hmm. There are two things here:
1. The format of the toast page, and the toast chunk tuples.
2. The format of the toasted tuple, stored within the chunk tuples.

The format of the toast page and the varlena headers of the byteas in 
the toast tuples could be converted when the page is read in, as usual. 
The format of the toasted tuple within the chunks is indeed trickier. 
Retoasting all tuples on the master heap page when the page is read in 
is probably simplest, as you say. But they don't necessarily need to be 
stored on new pages, any toast pages will do, because we keep track of 
1. separately.


In my opinion, this issue completely kills the convert-on-read solution, and 
only the "read all formats" solution is the right one.


It is quite heavy-weight, I agree, but doesn't completely kill the 
idea in my opinion.


Besides, the "read all formats" approach wouldn't really avoid it 
either. If you passed a toasted datum to a function, when the function 
needs to detoast it, detoast_datum still wouldn't know whether the datum 
is in old or new format. You'd still need to detoast all values in the 
tuple somewhere before they could be passed around.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] [PATCH] Extending pg_class info + more flexible TOAST chunk size

2008-10-13 Thread Zdenek Kotala

Tom Lane wrote:

Zdenek Kotala [EMAIL PROTECTED] writes:
Because MAX_TOAST_CHUNK_SIZE is tied to the page layout version, I need the 
toast chunk size to be more flexible.



The attached patch adds three new columns to pg_class:


relblocksize - currently always BLCKSZ. I put it there for completeness, but it 
could be used in future development to specify a different BLCKSZ per relation.


relsegsize - currently always RELSEG_FILE. I performed basic adjustments in 
smgr.c and md.c; now only smgropen contains a reference to RELSEG_FILE. The 
problem with making it fully dynamic is how to pass the rel_rd->relsegsize 
information down into smgropen. One idea is to extend relfilenode, but I'm not 
sure about it.


relmaxitemsize - set to TOAST_MAX_CHUNK_SIZE; other relations have this value 
set to zero for now. The toast functions are fully aware of this setting and 
use it. This column will be converted to an int2vector during pg_upgrade 
development (I need to track the information for each page version).


There is not one of these things that we have any intention of being
allowed to vary on a per-relation basis.  Why don't you read them out of
pg_control?


The problem I need to solve is how to handle the different TOAST chunk sizes 
that come with an upgrade. If you insert a new record into a TOAST table, it 
will be created on a new page which has a different max chunk size, because 
that depends on the page header size. It means that one TOAST table will have 
more than one chunk size. Using the old value from the previous version is 
possible, but it could waste space when the page header in the new version is 
bigger.
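
To make the dependency concrete, here is a minimal C sketch of why a max chunk 
size derived from the page header size shifts between page-layout versions; the 
constants and the formula are illustrative assumptions, not PostgreSQL's actual 
macros:

    #include <stdio.h>

    /* Illustrative only: TOAST chunks are sized so that a fixed number of
     * chunk tuples fit on a page after per-page and per-tuple overheads.
     * A bigger page header in a new layout version therefore yields a
     * slightly smaller maximum chunk size for pages written under it. */
    static int
    max_chunk_size(int blcksz, int page_header, int tuple_overhead,
                   int chunks_per_page)
    {
        return (blcksz - page_header) / chunks_per_page - tuple_overhead;
    }

    int main(void)
    {
        /* hypothetical old (20-byte) vs. new (24-byte) page header */
        printf("old layout: %d bytes/chunk\n", max_chunk_size(8192, 20, 32, 4));
        printf("new layout: %d bytes/chunk\n", max_chunk_size(8192, 24, 32, 4));
        return 0;
    }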


Another solution is to take the first chunk size and say: that is the max chunk size.

Relsegsize is related to the tablespace, but when you upgrade you might want to 
use the new size for new tables. It is not important for the pg_upgrade project, though.


Blocksize is more of a nice-to-have at this moment, but it makes sense to me to 
have different block sizes for a toast table and its heap. I know that this 
idea requires a lot of changes, including the buffer cache and so on.


Zdenek

--
Zdenek Kotala  Sun Microsystems
Prague, Czech Republic http://sun.com/postgresql




Re: [HACKERS] [PATCH] Extending pg_class info + more flexible TOAST chunk size

2008-10-13 Thread Heikki Linnakangas

Zdenek Kotala wrote:
The problem I need to solve is how to handle the different TOAST chunk 
sizes that come with an upgrade. If you insert a new record into a TOAST 
table, it will be created on a new page which has a different max chunk 
size, because that depends on the page header size. It means that one TOAST 
table will have more than one chunk size. Using the old value from the 
previous version is possible, but it could waste space when the page header 
in the new version is bigger.


Another solution is to take the first chunk size and say: that is the max 
chunk size.


Not all chunks need to be the same size. We do currently require that, 
but AFAICS it's only because that allows random access to a given offset 
within a datum. That's of course nice, but I think we could live without 
it. Or try random access with the new toast size first, and if the 
chunks turn out to be a different size, fall back to reading all chunks 
sequentially. And if we have to retoast all values before they're 
accessed, per the other thread, then we can just assume that all toast 
chunks that we need random access to are of the new size.
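
A minimal sketch of that fallback, assuming a hypothetical per-value flag that 
records whether every chunk already uses the new uniform size; this is not the 
actual tuptoaster code:

    #include <stdbool.h>
    #include <stddef.h>

    /* A toasted value modeled as an array of per-chunk sizes; after an
     * in-place upgrade a single value may mix old- and new-format chunks. */
    typedef struct
    {
        const size_t *sizes;    /* actual size of each chunk */
        size_t        nchunks;
        bool          uniform;  /* assumption: all chunks use new_size */
    } ToastValue;

    /* Map a byte position to (chunk index, offset within chunk).  Fast
     * path: plain division when every chunk has the new size.  Slow path:
     * walk the chunks sequentially, accumulating their actual sizes. */
    static void
    toast_locate(const ToastValue *v, size_t new_size, size_t pos,
                 size_t *chunk, size_t *off)
    {
        if (v->uniform)
        {
            *chunk = pos / new_size;
            *off = pos % new_size;
            return;
        }
        size_t c = 0;
        while (c < v->nchunks && pos >= v->sizes[c])
            pos -= v->sizes[c++];
        *chunk = c;
        *off = pos;
    }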


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] [PATCH] Extending pg_class info + more flexible TOAST chunk size

2008-10-13 Thread Zdenek Kotala

ITAGAKI Takahiro wrote:

Zdenek Kotala [EMAIL PROTECTED] wrote:

Because MAX_TOAST_CHUNK_SIZE is tied to the page layout version, I need the 
toast chunk size to be more flexible.


I agree that a flexible toast threshold is useful, but I have
several questions about your implementation.

relblocksize - currently always BLCKSZ. I put it there for completeness, but it 
could be used in future development to specify a different BLCKSZ per relation.


We need many infrastructural changes for making it work:

  1. Change BLCKSZ from a constant value into a variable.
 It allows us to have initdb --blocksize=16kB.
  2. Make bufmgr handle different sizes of pages in a server instance.
 I think it is a difficult task.
  3. Have a SQL interface for 2.

You suggested only 3, but 1 and 2 are more important.


I know about the problem's complexity. If you look at how many places BLCKSZ 
is used, you get the basic picture.


relsegsize - currently always RELSEG_FILE. I performed basic adjustments in 
smgr.c and md.c; now only smgropen contains a reference to RELSEG_FILE.


I'm not sure why relsegsize should be adjusted on a per-relation basis.
If there are no limitations in the filesystem or in backup
utilities, a large relsegsize is always better, no?
Is it enough to have initdb --segsize=4GB?


It is related to tablespaces. Usually a tablespace is stored on a different 
volume, which can have a different filesystem. The question is how to handle 
relation migration from one tablespace to another. In most cases it requires a 
physical copy, which could change the segment size anyway. Maybe extending 
pg_tablespace makes more sense.



relmaxitemsize - set to TOAST_MAX_CHUNK_SIZE.


You added new columns to pg_class, but we have another choice: using
pg_class.reloptions. What is the reason you chose the new-columns way?


Good point, I will look into it.


Another question: perhaps we should have per-column toast control
options, not only per-relation ones. Do we also need to modify
pg_attribute in the same way?


Currently we have one TOAST table per heap table (if it contains a varlena 
attribute). Data in the TOAST table is stored without any notion of what data 
type it is.


Thanks Zdenek

--
Zdenek Kotala  Sun Microsystems
Prague, Czech Republic http://sun.com/postgresql




Re: [HACKERS] Convert check constraints into One-Time_Filter on prepared statements

2008-10-13 Thread Gregory Stark
ITAGAKI Takahiro [EMAIL PROTECTED] writes:

  EXPLAIN EXECUTE p('2008-07-01');
 ---------------------------------------------------------------------
   Append
     ->  Result
           One-Time Filter: (('2008-01-01' <= $1) AND ($1 < '2009-01-01'))
     ->  Index Scan on test_2008_t_key
           Index Cond: (($1 <= t) AND (t < ($1 + '1 mon')))

 We can avoid internal scans when One-Time Filter returns false for each
 partition. So we can reuse the plan and receive benefit from CE.

 Is this concept ok and worth trying?
 If it is reasonable, I'll try it. Comments welcome.

It's tempting to go straight for the special case where we can detect that the
constraints are mutually exclusive and we can build a lookup table or do a
binary search to find the single partition which is included.

But this would handle the general case and would be a huge help. I've
looked at plans where partitioning represented a huge slowdown and the time
seemed to just disappear down the hole. All the time was being spent in just
the index lookup startup and probe for dozens of partitions which returned no
rows.

In combination with the ordered append node -- which is still sitting in my
experimental directory -- this would basically fix partitioning. The planner
would recognize that the result is ordered and merge-joinable and it wouldn't
waste any time doing any index probes. 

There would be a much smaller waste of time setting up the degenerate heap
merge and checking all those conditions in the executor. These things will
matter when we get to scaling up to hundreds of plan nodes but the situation
would still be much better than today.

The other reason I think it's still worth doing this even though it's not as
effective as covering the special case of mutually exclusive partitions is
that there's still a place for this even once we have a more mature
partitioning scheme. 


-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's On-Demand Production Tuning



Re: [HACKERS] Block nested loop join

2008-10-13 Thread Bramandia Ramadhana
Dear All,

I took a look at the source code for hash join this morning and I realized
that the block nested loop join is somewhat similar to that.

Thanks for the discussions.

Bramandia R.

On Fri, Oct 10, 2008 at 8:19 PM, Tom Lane [EMAIL PROTECTED] wrote:

 Gregory Stark [EMAIL PROTECTED] writes:
  So the use case of a real block nested loop would be doing a cartesian join of
  two large tables where neither fits in RAM. That does seem like it might be
  kind of narrow given how large the output would be.

 Yeah.  If you have a hashable join condition then our existing batched
 hash join code should be roughly equivalent to this method.  So the use
 case is joining a couple of large tables with an un-hashable,
 un-indexable join condition (or none at all, ie cross product) and that
 just isn't something we hear people wanting to do a lot.  I can't really
 see why we'd bother maintaining extra code for block nested loop.

regards, tom lane



Re: [HACKERS] [PATCH] Extending pg_class info + more flexible TOAST chunk size

2008-10-13 Thread Heikki Linnakangas

Zdenek Kotala wrote:

Heikki Linnakangas wrote:
Not all chunks need to be the same size. We do currently require that, 
but AFAICS it's only because that allows random access to a given 
offset within a datum. That's of course nice, but I think we could 
live without it. 


Good point. I think it is good to keep this feature.


Yeah. At the moment, it's only used for substring(), I think.

Or try random access with the new toast size first, and if the chunks 
turn out to be a different size, fall back to reading all chunks 
sequentially. 


I think it is not necessary to read it sequentially; you only need to 
recompute the new position.


Yeah, true.

It occurs to me that instead of storing a chunk id, we could store a 
byte offset of the chunk. That would allow random access even if every 
chunk was of different size. You probably don't want any new changes you 
need to deal with in the upgrade, though :-).
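
A sketch of how that could work, under the assumption that each chunk's 
starting byte offset is stored in ascending order (starts[0] == 0); a binary 
search then finds the covering chunk regardless of chunk sizes:

    #include <stddef.h>

    /* Return the index of the chunk containing byte position pos, given
     * the (hypothetical) stored per-chunk starting offsets. */
    static size_t
    chunk_for_pos(const size_t *starts, size_t nchunks, size_t pos)
    {
        size_t lo = 0;
        size_t hi = nchunks;        /* invariant: starts[lo] <= pos */

        while (hi - lo > 1)
        {
            size_t mid = lo + (hi - lo) / 2;

            if (starts[mid] <= pos)
                lo = mid;           /* mid still starts at or before pos */
            else
                hi = mid;
        }
        return lo;
    }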


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] TODO item: adding VERBOSE option to CLUSTER [with patch]

2008-10-13 Thread Tom Lane
Heikki Linnakangas [EMAIL PROTECTED] writes:
 No, I was thinking of something along the lines of:
 INFO:  clustering public.my_c
 INFO:  complete, was 33%, now 100% clustered
 The only such measure that we have is the correlation, which isn't very 
 good anyway, so I'm not sure if that's worthwhile.

It'd be possible to count the number of order reversals during the
indexscan, ie the number of tuples with CTID lower than the previous
one's.  But I'm not sure how useful that number really is.  Also it's
not clear how to preserve such functionality if cluster is
re-implemented with a sort.
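
As a rough illustration of the counting (the CTID representation here is a 
simplification, not the actual cluster.c code), the check is one comparison 
against the previous tuple's CTID:

    #include <stdio.h>

    /* A CTID modeled as (block number, line pointer offset). */
    typedef struct { unsigned blk; unsigned off; } Ctid;

    static int
    ctid_less(Ctid a, Ctid b)
    {
        return a.blk < b.blk || (a.blk == b.blk && a.off < b.off);
    }

    int main(void)
    {
        /* heap positions of tuples in the order the index scan visits them */
        Ctid visited[] = {{0,1}, {0,2}, {1,1}, {0,3}, {2,1}, {1,2}};
        size_t n = sizeof visited / sizeof visited[0];
        size_t reversals = 0;

        /* a reversal: the current CTID is lower than the previous one's */
        for (size_t i = 1; i < n; i++)
            if (ctid_less(visited[i], visited[i - 1]))
                reversals++;

        printf("%zu reversals out of %zu transitions\n", reversals, n - 1);
        return 0;
    }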

regards, tom lane



Re: [HACKERS] [PATCH] Extending pg_class info + more flexible TOAST chunk size

2008-10-13 Thread Zdenek Kotala

Heikki Linnakangas wrote:

Zdenek Kotala wrote:
The problem I need to solve is how to handle the different TOAST 
chunk sizes that come with an upgrade. If you insert a new record into a 
TOAST table, it will be created on a new page which has a different max 
chunk size, because that depends on the page header size. It means that one 
TOAST table will have more than one chunk size. Using the old value from 
the previous version is possible, but it could waste space when the page 
header in the new version is bigger.


Another solution is to take the first chunk size and say: that is the max 
chunk size.


Not all chunks need to be the same size. We do currently require that, 
but AFAICS it's only because that allows random access to a given offset 
within a datum. That's of course nice, but I think we could live without 
it. 


Good point. I think it is good to keep this feature.

Or try random access with the new toast size first, and if the 
chunks turn out to be a different size, fall back to reading all chunks 
sequentially. 


I think it is not necessary to read it sequentially; you only need to 
recompute the new position.


Zdenek


--
Zdenek Kotala  Sun Microsystems
Prague, Czech Republic http://sun.com/postgresql




Re: [HACKERS] The Axe list

2008-10-13 Thread Markus Wanner
Hi,

Josh Berkus wrote:
 So it sounds like intagg is still in use/development.  But ... is it
 more of an example, or is it useful as a type/function in production?

We use it in production for quite remarkable speedups of operations on
int4[].

Having reviewed the last commit fest's intagg patch as well, I thought
we agreed that more general functionality is wanted in core. But as
long as we don't have that, I'd like intagg to stay in contrib.

Regards

Markus Wanner




Re: [HACKERS] WITH RECURSIVE ... CYCLE in vanilla SQL: issues with arrays of rows

2008-10-13 Thread Merlin Moncure
On Wed, Oct 8, 2008 at 4:27 PM, Tom Lane [EMAIL PROTECTED] wrote:
 Merlin Moncure [EMAIL PROTECTED] writes:
 On Tue, Oct 7, 2008 at 9:58 PM, Tom Lane [EMAIL PROTECTED] wrote:
 * Instead of the above, we could try to make
 ROW(some columns) = ANY (array variable)
 work.  This is shorter than the above syntax and would presumably have
 a lot less overhead too.  But it doesn't work right now, not even for
 named rowtypes much less anonymous ones.

 By extension, would this also mean things like
 select row(1,2,3)::foo = foo from foo;
 would work (presumably as row-wise comparison does)?

 Well, it turned out to be easier than I thought to get the base case
 working --- all that's necessary is to define an array type for RECORD
 and add generic comparison functions, and the cases that are needed for
 recursive cycle detection Just Work!  See attached WIP patch, and
 particularly note the new test case in with.sql.

 The part that actually seems to need some thought is the relationship
 between this and operations on named rowtypes.  In the patch I tweaked
 parse_coerce.c to treat coercion from a named rowtype's array type to
 record[] as an allowed binary-compatible case, but I didn't do the
 other direction yet (I'm not fully convinced that it's necessary).

 What I'm really not clear about is the extent to which record[] ought
 to be treated like a polymorphic type --- should we consider that
 it acts like an anyrecordarray type, or is that a distinct notion?
 Do we even want that?  record itself is not considered a polymorphic
 type, though it has some similar qualities.

 Another point worth making is that I made the comparisons work like
 IS NOT DISTINCT, ie, they don't return NULL just because some field of
 the row is NULL.  This is *not* like SQL-spec row comparison, but we
 can't build a btree opclass from these functions if they insist on
 returning null for null fields.  (Our array comparisons work like this,
 too.)  I'm not sure how big a deal that is either way, but I am pretty
 sure that IS NOT DISTINCT is the semantics you need to have if you want
 cycle detection to work reliably.  (Hm, is that a bug in the spec?
 They say to use = rather than DISTINCT in cycle checking ...)
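
 A quick illustration of the difference, using standard PostgreSQL
 row-comparison behavior (not part of the patch):

   -- SQL-spec row equality returns NULL when corresponding fields are
   -- NULL, so "=" could never confirm that a cycle has been reached:
   SELECT ROW(1, NULL) = ROW(1, NULL);                     -- NULL

   -- IS NOT DISTINCT FROM treats NULLs as equal, which is what reliable
   -- cycle detection needs:
   SELECT ROW(1, NULL) IS NOT DISTINCT FROM ROW(1, NULL);  -- true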

 Comments?  Do we want to go forward with this?

The record ops work as promised.  IMO this patch stands on its own
merits with or without the CTE changes.  I played around with it and
noticed a couple of oddities:

select foo = foo from foo;  --works
but

select distinct foo from foo; --yields
ERROR:  could not identify an equality operator for type foo
also,

select foo from foo order by foo;
ERROR:  could not identify an ordering operator for type foo

postgres=# create index foo_idx on foo((foo));
ERROR:  cache lookup failed for type 0

The above are probably not required to fulfill the CTE
requirements...but would be nice.

merlin



Re: [HACKERS] out-of-date comment in auto-generated oidjoins.sql

2008-10-13 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes:
 This file was moved lo, these many years

Done ... but why did your patch change oidjoins.out and not oidjoins.sql?

regards, tom lane



Re: [HACKERS] WITH RECURSIVE ... CYCLE in vanilla SQL: issues with arrays of rows

2008-10-13 Thread Tom Lane
Merlin Moncure [EMAIL PROTECTED] writes:
 The record ops work as promised.  IMO this patch stands in its own
 merits with or without the CTE changes.  I played around with it and
 noticed a couple of oddities:

 select foo = foo from foo;  --works
 but

 select distinct foo from foo; --yields
 ERROR:  could not identify an equality operator for type foo
 also,

 select foo from foo order by foo;
 ERROR:  could not identify an ordering operator for type foo

Yeah, these are because of the incomplete handling of named record
types.  I'm not sure how far we want to go in that direction.

 postgres=# create index foo_idx on foo((foo));
 ERROR:  cache lookup failed for type 0

Hm, that's not good ...

regards, tom lane



Re: [HACKERS] [PATCH] Extending pg_class info + more flexible TOAST chunk size

2008-10-13 Thread Zdenek Kotala

Heikki Linnakangas wrote:



It occurs to me that instead of storing a chunk id, we could store a 
byte offset of the chunk. That would allow random access even if every 
chunk was of different size. You probably don't want any new changes you 
need to deal with in the upgrade, though :-).


Yes, no change please :-)

Zdenek



--
Zdenek Kotala  Sun Microsystems
Prague, Czech Republic http://sun.com/postgresql




Re: [HACKERS] pg_upgrade: convert on read is dead end

2008-10-13 Thread Zdenek Kotala

Heikki Linnakangas wrote:

Zdenek Kotala wrote:

Heikki Linnakangas wrote:
Besides, the "read all formats" approach wouldn't really avoid it 
either. If you passed a toasted datum to a function, when the 
function needs to detoast it, detoast_datum still wouldn't know 
whether the datum is in old or new format. You'd still need to 
detoast all values in the tuple somewhere before they could be passed 
around.


If you look into the pg_upgrade prototype patch, I added page version 
information into the HeapTupleData structure. It keeps information about 
the format. All chunked data are stored on pages with the same page 
version. I think these two things are enough to have all the necessary 
information.


Functions are passed just a Datum, not HeapTupleData.



But the executor works with the HeapTupleData structure, and it should convert 
the tuple to the new format before passing it to a function. My idea is to 
convert the tuple in ExecTupleStore, or to invent a new node specifically for 
tuple conversion.


I expect that a function cannot get a toast pointer; it can get only detoasted 
attributes. Correct me if I'm wrong.


Zdenek

--
Zdenek Kotala  Sun Microsystems
Prague, Czech Republic http://sun.com/postgresql




Re: [HACKERS] WITH RECURSIVE ... CYCLE in vanilla SQL: issues with arrays of rows

2008-10-13 Thread Tom Lane
I wrote:
 Merlin Moncure [EMAIL PROTECTED] writes:
 select foo from foo order by foo;
 ERROR:  could not identify an ordering operator for type foo

 Yeah, these are because of the incomplete handling of named record
 types.  I'm not sure how far we want to go in that direction.

On looking closer, all these cases fail because I forgot to teach
IsBinaryCoercible() that any composite type should be considered
binary-coercible to RECORD.  Which is clearly sensible.

I'm inclined to apply the patch with binary-coercibility adjustments
and not try to turn RECORD or RECORD[] into full-fledged polymorphic
types.  It's not immediately clear what the use of that would be
anyway.

regards, tom lane



Re: [HACKERS] out-of-date comment in auto-generated oidjoins.sql

2008-10-13 Thread Gregory Stark
Tom Lane [EMAIL PROTECTED] writes:

 Gregory Stark [EMAIL PROTECTED] writes:
 This file was moved lo, these many years

 Done ... but why did your patch change oidjoins.out and not oidjoins.sql?

I had other changes in my tree; I just diffed the files individually rather
than doing a cvs diff.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's 24x7 Postgres support!



[HACKERS] out-of-date comment in auto-generated oidjoins.sql

2008-10-13 Thread Gregory Stark

This file was moved lo, these many years


diff --git a/src/tools/findoidjoins/make_oidjoins_check 
b/src/tools/findoidjoins/make_oidjoins_check
index 5061634..9c6b850 100755
--- a/src/tools/findoidjoins/make_oidjoins_check
+++ b/src/tools/findoidjoins/make_oidjoins_check
@@ -54,7 +54,7 @@ $AWK -F'[ \.]' '\
{
printf \
 --\n\
--- This is created by pgsql/contrib/findoidjoins/make_oidjoin_check\n\
+-- This is created by pgsql/src/tools/findoidjoins/make_oidjoins_check\n\
 --\n;
}
{
diff --git a/src/test/regress/expected/oidjoins.out 
b/src/test/regress/expected/oidjoins.out
index 3d5d3db..5b1c22b 100644
--- a/src/test/regress/expected/oidjoins.out
+++ b/src/test/regress/expected/oidjoins.out
@@ -1,5 +1,5 @@
 --
--- This is created by pgsql/contrib/findoidjoins/make_oidjoin_check
+-- This is created by pgsql/src/tools/findoidjoins/make_oidjoins_check
 --
 SELECT ctid, aggfnoid 
 FROM   pg_catalog.pg_aggregate fk 



-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!



Re: [HACKERS] TODO item: adding VERBOSE option to CLUSTER [with patch]

2008-10-13 Thread Jim Cox
On Mon, Oct 13, 2008 at 8:30 AM, Tom Lane [EMAIL PROTECTED] wrote:

 Heikki Linnakangas [EMAIL PROTECTED] writes:
  No, I was thinking of something along the lines of:
  INFO:  clustering public.my_c
  INFO:  complete, was 33%, now 100% clustered
  The only such measure that we have is the correlation, which isn't very
  good anyway, so I'm not sure if that's worthwhile.

 It'd be possible to count the number of order reversals during the
 indexscan, ie the number of tuples with CTID lower than the previous
 one's.  But I'm not sure how useful that number really is.  Also it's
 not clear how to preserve such functionality if cluster is
 re-implemented with a sort.

regards, tom lane


Another version of the patch should be attached, this time counting the
number of inversions (pairs of tuples in the table that are in the wrong
order) as a measure of the sortedness of the original data (scanned/live
numbers still reported as an indication of the extent to which the table was
vacuumed).

N.B. -- I'm not familiar enough with Postgres internals to know if the
included inversion_count() method is a valid way to identify inversions.

In any case, example VERBOSE output:

postgres=# CLUSTER public.my_c VERBOSE ;
INFO:  clustering public.my_c
INFO:  complete, 15 tuples scanned, 10 tuples now live, 2 inversions
DETAIL:  CPU 0.00s/0.00u sec elapsed 0.00 sec.
CLUSTER
Index: src/backend/commands/cluster.c
===
RCS file: /projects/cvsroot/pgsql/src/backend/commands/cluster.c,v
retrieving revision 1.177
diff -c -r1.177 cluster.c
*** src/backend/commands/cluster.c	19 Jun 2008 00:46:04 -	1.177
--- src/backend/commands/cluster.c	13 Oct 2008 14:25:09 -
***
*** 46,51 
--- 46,52 
  #include "utils/snapmgr.h"
  #include "utils/syscache.h"
  #include "utils/tqual.h"
+ #include "utils/pg_rusage.h"
  
  
  /*
***
*** 59,70 
  	Oid			indexOid;
  } RelToCluster;
  
! 
! static void cluster_rel(RelToCluster *rv, bool recheck);
! static void rebuild_relation(Relation OldHeap, Oid indexOid);
! static TransactionId copy_heap_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex);
  static List *get_tables_to_cluster(MemoryContext cluster_context);
! 
  
  
  /*---
--- 60,80 
  	Oid			indexOid;
  } RelToCluster;
  
! /*
!  * This struct is used to collect stats for VERBOSE output.
!  */
! typedef struct
! {
! 	double	tuples_scanned ;
! 	double	tuples_copied ;
! 	double	inversions ;
! } ClusterStatistics;
! 
! static void cluster_rel(RelToCluster *rv, bool recheck, int elevel);
! static void rebuild_relation(Relation OldHeap, Oid indexOid, ClusterStatistics *cs);
! static TransactionId copy_heap_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex, ClusterStatistics *cs);
  static List *get_tables_to_cluster(MemoryContext cluster_context);
! static int inversion_count(BlockIdData *bid_old, OffsetNumber off_old, BlockIdData *bid, OffsetNumber off) ;
  
  
  /*---
***
*** 176,182 
  		heap_close(rel, NoLock);
  
  		/* Do the job */
! 		cluster_rel(rvtc, false);
  	}
  	else
  	{
--- 186,192 
  		heap_close(rel, NoLock);
  
  		/* Do the job */
! 		cluster_rel(rvtc, false, stmt->verbose ? INFO : DEBUG2);
  	}
  	else
  	{
***
*** 225,231 
  			StartTransactionCommand();
  			/* functions in indexes may want a snapshot set */
  			PushActiveSnapshot(GetTransactionSnapshot());
! 			cluster_rel(rvtc, true);
  			PopActiveSnapshot();
  			CommitTransactionCommand();
  		}
--- 235,241 
  			StartTransactionCommand();
  			/* functions in indexes may want a snapshot set */
  			PushActiveSnapshot(GetTransactionSnapshot());
! 			cluster_rel(rvtc, true, stmt->verbose ? INFO : DEBUG2);
  			PopActiveSnapshot();
  			CommitTransactionCommand();
  		}
***
*** 253,261 
   * them incrementally while we load the table.
   */
  static void
! cluster_rel(RelToCluster *rvtc, bool recheck)
  {
  	Relation	OldHeap;
  
  	/* Check for user-requested abort. */
  	CHECK_FOR_INTERRUPTS();
--- 263,273 
   * them incrementally while we load the table.
   */
  static void
! cluster_rel(RelToCluster *rvtc, bool recheck, int elevel)
  {
  	Relation	OldHeap;
+ 	PGRUsage	ru0 ;
+ 	ClusterStatistics	cs ;
  
  	/* Check for user-requested abort. */
  	CHECK_FOR_INTERRUPTS();
***
*** 343,349 
  	check_index_is_clusterable(OldHeap, rvtc->indexOid, recheck);
  
  	/* rebuild_relation does all the dirty work */
! 		rebuild_relation(OldHeap, rvtc->indexOid);
  
  	/* NB: rebuild_relation does heap_close() on OldHeap */
  }
--- 355,373 
  	check_index_is_clusterable(OldHeap, rvtc->indexOid, recheck);
  
  	/* rebuild_relation does all the dirty work */
! 	ereport(elevel,
! 			(errmsg("clustering \"%s.%s\"",
! 	

Re: [HACKERS] patch: Allow the UUID type to accept non-standard formats

2008-10-13 Thread Decibel!

On Oct 10, 2008, at 3:40 PM, Robert Haas wrote:

I dislike all home-grown creations, because nobody will understand them and
will do the wrong thing; using non-standard formats is a bad thing. If it's
necessary, then whoever needs it can find it on pgFoundry. But why
smudge core?


I'm opposed to smudging core, but I'm in favor of this patch.  :-)

Of course, I'm biased, because I wrote it.  But I think that providing
input and output functions that make it easy to read and write common
formats, even if they happen to be non-standard, is useful.



I tend to agree, but I have a hard time swallowing that when it means 
a 2-3% performance penalty for those that aren't using that 
functionality. I could perhaps see adding a function that accepted 
common UUID formats and spat out the standard one.


If you could get rid of the performance hit this might be more 
interesting. Perhaps default to assuming a good format and only fall 
back to something else if that doesn't work?
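
A minimal sketch of that approach (hypothetical helpers, not the patch's 
actual uuid_in): verify the canonical 8-4-4-4-12 layout first, and only fall 
back to a permissive pass when it doesn't match:

    #include <ctype.h>
    #include <stdbool.h>
    #include <string.h>

    static int
    hexval(char c)
    {
        return isdigit((unsigned char) c)
            ? c - '0'
            : tolower((unsigned char) c) - 'a' + 10;
    }

    /* Permissive pass: skip dashes, braces and whitespace; require
     * exactly 32 hex digits overall. */
    static bool
    parse_uuid_relaxed(const char *s, unsigned char out[16])
    {
        int nibbles = 0;

        for (; *s; s++)
        {
            if (isxdigit((unsigned char) *s))
            {
                if (nibbles == 32)
                    return false;
                if (nibbles % 2 == 0)
                    out[nibbles / 2] = (unsigned char) (hexval(*s) << 4);
                else
                    out[nibbles / 2] |= (unsigned char) hexval(*s);
                nibbles++;
            }
            else if (*s != '-' && *s != '{' && *s != '}' &&
                     !isspace((unsigned char) *s))
                return false;
        }
        return nibbles == 32;
    }

    /* Fast path: canonical 8-4-4-4-12 is recognized with a length test
     * and four byte compares, so standard input never pays for the
     * permissive logic's generality. */
    static bool
    parse_uuid(const char *s, unsigned char out[16])
    {
        if (strlen(s) == 36 && s[8] == '-' && s[13] == '-' &&
            s[18] == '-' && s[23] == '-')
        {
            int nibbles = 0;

            for (int i = 0; i < 36; i++)
            {
                if (i == 8 || i == 13 || i == 18 || i == 23)
                    continue;               /* the four dashes */
                if (!isxdigit((unsigned char) s[i]))
                    return false;
                if (nibbles % 2 == 0)
                    out[nibbles / 2] = (unsigned char) (hexval(s[i]) << 4);
                else
                    out[nibbles / 2] |= (unsigned char) hexval(s[i]);
                nibbles++;
            }
            return true;
        }
        return parse_uuid_relaxed(s, out);  /* non-canonical fallback */
    }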

--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828






Re: [HACKERS] Building Postgres in Eclipse

2008-10-13 Thread Chris Mayfield

I have added a link at the bottom of the page.
All the info on how to use Eclipse with Postgres can be found at 
http://wiki.postgresql.org/wiki/Working_with_Eclipse


All suggestions and comments are welcome of course.
Emmanuel


Looks great, thanks!

--Chris



Re: [HACKERS] TODO item: adding VERBOSE option to CLUSTER [with patch]

2008-10-13 Thread Heikki Linnakangas

Jim Cox wrote:

On Mon, Oct 13, 2008 at 8:30 AM, Tom Lane [EMAIL PROTECTED] wrote:

Heikki Linnakangas [EMAIL PROTECTED] writes:

No, I was thinking of something along the lines of:
INFO:  clustering public.my_c
INFO:  complete, was 33%, now 100% clustered
The only such measure that we have is the correlation, which isn't very
good anyway, so I'm not sure if that's worthwhile.

It'd be possible to count the number of order reversals during the
indexscan, ie the number of tuples with CTID lower than the previous
one's.  But I'm not sure how useful that number really is.  


It will look bad for patterns like:
2
1
4
3
6
5
..

which for all practical purposes is just as good as a perfectly sorted 
table. So no, I don't think that's a very useful metric either without 
somehow taking caching effects into account.



Another version of the patch should be attached, this time counting the
number of inversions (pairs of tuples in the table that are in the wrong
order) as a measure of the sortedness of the original data (scanned/live
numbers still reported as an indication of the extent to which the table was
vacuumed).


Until we have a better metric for sortedness, my earlier suggestion to 
print it was probably a bad idea. If anything, we should probably print the 
same correlation metric that ANALYZE calculates, so that it would at 
least match what the planner uses for decision-making.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] pg_upgrade: convert on read is dead end

2008-10-13 Thread Zdenek Kotala

Heikki Linnakangas wrote:

Zdenek Kotala wrote:
PostgreSQL stores all of a table's external data in one TOAST table, and the 
stored data does not contain any clue about the datatype. When PostgreSQL 
reads a TOASTed value, it is not possible to detect what type it is 
and perform varlena conversion on composite datatypes or arrays.


It could be converted in the detoast_datum function, but that means the 
datum has to be retoasted and stored back on new pages. The old page 
MUST be kept in the old format, because any page conversion loses the 
information about the version, and different datatypes can be stored on one 
PostgreSQL page.


Hmm. There are two things here:
1. The format of the toast page, and the toast chunk tuples.
2. The format of the toasted tuple, stored within the chunk tuples.

The format of the toast page and the varlena headers of the byteas in 
the toast tuples could be converted when the page is read in, as usual. 


Converting the page and the chunk varlena headers is not a problem; it works. 
You can get into trouble when the new max chunk size is smaller, but that is 
not the current case.


The format of the toasted tuple within the chunks is indeed trickier. 
Retoasting all tuples on the master heap page when the page is read in 
is probably simplest, as you say. But they don't necessarily need to be 
stored on new pages, any toast pages will do, because we keep track of 
1. separately.


You can store it on any page in the new format with enough free space, but in 
reality it will mostly be a new page. You cannot convert an old page, because 
it could contain chunks from a different tuple. You could check it and convert 
it only if all chunks are related to one datum, but in general you need a 
mechanism to solve the problem of multi-datum chunk pages.



In my opinion, this issue completely kills the convert-on-read solution, and 
only the "read all formats" solution is the right one.


It is quite heavy-weight, I agree, but doesn't completely kill the 
idea in my opinion.


Besides, the "read all formats" approach wouldn't really avoid it 
either. If you passed a toasted datum to a function, when the function 
needs to detoast it, detoast_datum still wouldn't know whether the datum 
is in old or new format. You'd still need to detoast all values in the 
tuple somewhere before they could be passed around.


If you look into the pg_upgrade prototype patch, I added page version 
information into the HeapTupleData structure. It keeps information about the 
format. All chunked data are stored on pages with the same page version. I 
think these two things are enough to have all the necessary information.


Zdenek


--
Zdenek Kotala  Sun Microsystems
Prague, Czech Republic http://sun.com/postgresql




Re: [HACKERS] CLUSTER, REINDEX, VACUUM in read only transaction?

2008-10-13 Thread Simon Riggs

On Fri, 2008-10-10 at 09:41 -0400, Tom Lane wrote:

 So I was looking for other omissions in utility.c, and I noticed that
 check_xact_readonly() doesn't reject CLUSTER, REINDEX, or VACUUM.
 Now the notion of read only that we're trying to enforce is pretty
 weak (I think it's effectively no writes to non-temp tables).
 But I can't see that CLUSTER is a read-only operation even under the
 weakest definitions, and I'm not seeing the rationale for REINDEX or
 VACUUM here either.

I think you should add a few more to the list.

* LISTEN, UNLISTEN, NOTIFY
* nextval()
* ANALYZE
* CHECKPOINT
* GRANT, REVOKE, REASSIGN
* DISCARD ALL should not run UnlistenAll

These are all gaps I will have to plug for Hot Standby, and in fact
wrote code this morning for some of those before I saw this post.

(Yes, I went through every command).

If you're gonna do it, do it soon please, so I know not to bother any
further.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [HACKERS] pg_upgrade: convert on read is dead end

2008-10-13 Thread Heikki Linnakangas

Zdenek Kotala wrote:
I expect that a function cannot get a toast pointer; it can get only 
detoasted attributes. Correct me if I'm wrong.


A function is passed a raw datum, which can be toasted. Now, practically 
all functions call one of the PG_GETARG_* macros that detoast the datum 
before doing anything else, but there is also PG_GETARG_RAW_VARLENA_P 
macro that returns the toast pointer. Although, grepping through the 
source code suggests that it's not actually used anywhere.
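
For reference, the usual shape of that pattern; a minimal sketch of a 
hypothetical V1 function in which the PG_GETARG_* macro does the detoasting 
before the body ever sees the bytes:

    #include "postgres.h"
    #include "fmgr.h"

    PG_MODULE_MAGIC;

    PG_FUNCTION_INFO_V1(byte_length_demo);

    /* PG_GETARG_TEXT_P detoasts the first argument (fetching it from the
     * toast table and decompressing it if necessary), so the body only
     * ever works with an inline, fully materialized value. */
    Datum
    byte_length_demo(PG_FUNCTION_ARGS)
    {
        text *t = PG_GETARG_TEXT_P(0);

        PG_RETURN_INT32(VARSIZE(t) - VARHDRSZ);
    }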


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] pg_upgrade: convert on read is dead end

2008-10-13 Thread Heikki Linnakangas

Zdenek Kotala wrote:

Heikki Linnakangas wrote:
Besides, the "read all formats" approach wouldn't really avoid it 
either. If you passed a toasted datum to a function, when the function 
needs to detoast it, detoast_datum still wouldn't know whether the 
datum is in old or new format. You'd still need to detoast all values 
in the tuple somewhere before they could be passed around.


If you look into the pg_upgrade prototype patch, I added page version 
information into the HeapTupleData structure. It keeps information about 
the format. All chunked data are stored on pages with the same page version. 
I think these two things are enough to have all the necessary information.


Functions are passed just a Datum, not HeapTupleData.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



[HACKERS] Convert check constraints into One-Time_Filter on prepared statements

2008-10-13 Thread ITAGAKI Takahiro
Hello,

Constraint exclusion (CE) is very useful for partitioned tables,
but prepared statements interfere with CE if the parameter
contains partition keys, because CE is a planner-time optimization
but the actual parameters are given at executor-time.

I have an idea to use constraint exclusion and prepared statements
together -- converting check constraints into One-Time Filter.

For example, when we have a test table partitioned by test_{year}:

  CREATE TABLE test PARTITIONED BY
    PARTITION test_2008 CHECK('2008-01-01' <= t AND t < '2009-01-01')
    PARTITION test_2009 CHECK('2009-01-01' <= t AND t < '2010-01-01')
    PARTITION test_2010 CHECK('2010-01-01' <= t AND t < '2011-01-01')

and prepare a statement that have a partitioned key in the parameter:

  PREPARE p(timestamp) AS
    SELECT * FROM test WHERE $1 <= t AND t < $1 + '1 mon';

Then planner converts check constraints into One-Time Filter.
Plan will be the following:

 EXPLAIN EXECUTE p('2008-07-01');
---------------------------------------------------------------------
 Append
   ->  Result
         One-Time Filter: (('2008-01-01' <= $1) AND ($1 < '2009-01-01'))
   ->  Index Scan on test_2008_t_key
         Index Cond: (($1 <= t) AND (t < ($1 + '1 mon')))
   ->  Result
         One-Time Filter: (('2009-01-01' <= $1) AND ($1 < '2010-01-01'))
   ->  Index Scan on test_2009_t_key
         Index Cond: (($1 <= t) AND (t < ($1 + '1 mon')))
   ->  Result
         One-Time Filter: (('2010-01-01' <= $1) AND ($1 < '2011-01-01'))
   ->  Index Scan on test_2010_t_key
         Index Cond: (($1 <= t) AND (t < ($1 + '1 mon')))

We can avoid internal scans when One-Time Filter returns false for each
partition. So we can reuse the plan and receive benefit from CE.

Is this concept ok and worth trying?
If it is reasonable, I'll try it. Comments welcome.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center




Re: [HACKERS] TODO item: adding VERBOSE option to CLUSTER [with patch]

2008-10-13 Thread Heikki Linnakangas

Jim Cox wrote:

On Fri, Oct 10, 2008 at 10:23 AM, Heikki Linnakangas [EMAIL PROTECTED] wrote:

Kevin Grittner wrote:

Jim Cox [EMAIL PROTECTED] wrote:
if present an INFO message is generated which displays the schema.tblname 
just before actual clustering is kicked off (see example below).

postgres=# CLUSTER VERBOSE ;
INFO:  clustering public.my_b
INFO:  clustering public.my_c
INFO:  clustering public.my_a
CLUSTER


 Would it make sense to display the pg_total_relation_size before and
after?


Assuming you run CLUSTER as a replacement for VACUUM FULL, yes. More
interesting would be a metric of clusteredness, I think.



Something more like the following?

postgres=# CLUSTER VERBOSE ;
INFO:  clustering public.my_b
INFO:  complete, 0 rows scanned, 0 rows now live
DETAIL:  CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  clustering public.my_c
INFO:  complete, 20 rows scanned, 10 rows now live
DETAIL:  CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  clustering public.my_a
INFO:  complete, 10 rows scanned, 10 rows now live
DETAIL:  CPU 0.00s/0.00u sec elapsed 0.00 sec.
CLUSTER


No, I was thinking of something along the lines of:
INFO:  clustering public.my_c
INFO:  complete, was 33%, now 100% clustered

The only such measure that we have is the correlation, which isn't very 
good anyway, so I'm not sure if that's worthwhile.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] A small note about the difficulty of managing subprojects

2008-10-13 Thread David E. Wheeler

On Oct 12, 2008, at 20:15, Joshua D. Drake wrote:

Yeah, I would agree with that. I find that the only real issue with
PgFoundry is PgFoundry itself (aka Gforge), not the quality of the
projects being hosted.


The other thing that could use some love is searching for projects.  
Google doesn't rank pgFoundry stuff very highly, and Gforge's search  
functionality leaves something to be desired. As a lover of CPAN, I  
have to say that I don't use CPAN itself all that much; rather, I use  
search.cpan.org, which makes it dead easy to search for modules that  
have functionality I'm looking for.


But improving search should come after fixing/upgrading Gforge, IMHO.

Best,

David



[HACKERS] Year 2038 Bug?

2008-10-13 Thread David E. Wheeler

Howdy,

Not sure if PostgreSQL uses time_t, but if so, this project provides  
useful code (a drop-in replacement for time.h) to address the 2038 bug  
on 32-bit platforms.


  http://code.google.com/p/y2038/

Useful for PostgreSQL?

Best,

David



Re: [HACKERS] Year 2038 Bug?

2008-10-13 Thread Tom Lane
David E. Wheeler [EMAIL PROTECTED] writes:
 Not sure if PostgreSQL uses time_t,

We got rid of that some time ago.

regards, tom lane



Re: [HACKERS] Year 2038 Bug?

2008-10-13 Thread David E. Wheeler

On Oct 13, 2008, at 11:01, Tom Lane wrote:


David E. Wheeler [EMAIL PROTECTED] writes:

Not sure if PostgreSQL uses time_t,


We got rid of that some time ago.


Probably no problem, then. Do dates in PostgreSQL work for their  
entire documented ranges on 32bit processors?


Thanks,

David




Re: [HACKERS] Year 2038 Bug?

2008-10-13 Thread Tom Lane
David E. Wheeler [EMAIL PROTECTED] writes:
 Probably no problem, then. Do dates in PostgreSQL work for their  
 entire documented ranges on 32bit processors?

As long as the C compiler supports int64 ...

regards, tom lane



Re: [HACKERS] Year 2038 Bug?

2008-10-13 Thread David E. Wheeler

On Oct 13, 2008, at 11:13, Tom Lane wrote:


David E. Wheeler [EMAIL PROTECTED] writes:

Probably no problem, then. Do dates in PostgreSQL work for their
entire documented ranges on 32bit processors?


As long as the C compiler supports int64 ...


I was afraid you'd say that. See:

  http://code.google.com/p/y2038/wiki/WhyBother

Especially the "64 bit CPU doesn't mean 2038 clean" section. Again, 
maybe this doesn't apply to PostgreSQL; I'm just doing a bit of 
diligence. :-)


Cheers,

David



Re: [HACKERS] Year 2038 Bug?

2008-10-13 Thread Zdenek Kotala

David E. Wheeler wrote:

On Oct 13, 2008, at 11:13, Tom Lane wrote:


David E. Wheeler [EMAIL PROTECTED] writes:

Probably no problem, then. Do dates in PostgreSQL work for their
entire documented ranges on 32bit processors?


As long as the C compiler supports int64 ...


I was afraid you'd say that. See:

  http://code.google.com/p/y2038/wiki/WhyBother

Especially the "64 bit CPU doesn't mean 2038 clean" section. Again, 
maybe this doesn't apply to PostgreSQL; I'm just doing a bit of 
diligence. :-)


PostgreSQL 8.4 uses a 64-bit data type for time. But if you use the system 
timezone data, you can get into trouble if the system does not support 64-bit 
zic files.


Zdenek


--
Zdenek Kotala  Sun Microsystems
Prague, Czech Republic http://sun.com/postgresql




Re: [HACKERS] Year 2038 Bug?

2008-10-13 Thread Andrew Chernow

David E. Wheeler wrote:

On Oct 13, 2008, at 11:13, Tom Lane wrote:


David E. Wheeler [EMAIL PROTECTED] writes:

Probably no problem, then. Do dates in PostgreSQL work for their
entire documented ranges on 32bit processors?


As long as the C compiler supports int64 ...


I was afraid you'd say that. See:

  http://code.google.com/p/y2038/wiki/WhyBother

Especially the "64 bit CPU doesn't mean 2038 clean" section. Again, 
maybe this doesn't apply to PostgreSQL; I'm just doing a bit of 
diligence. :-)


Cheers,

David



PostgreSQL doesn't use the standard time_t and time functions for its 
timestamp types.  Therefore, any limitations in regards to 64-bit time_t 
values on 32-bit platforms don't apply; other than the limitation Tom 
spoke of ... no 64-bit int.
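
A back-of-the-envelope check of that representation, assuming the 
integer-datetimes format (a signed 64-bit count of microseconds around a 
2000-01-01 epoch):

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        /* microseconds in a (non-leap) year */
        const int64_t usec_per_year = (int64_t) 365 * 24 * 3600 * 1000000;

        /* years representable on either side of the epoch; vastly more
         * headroom than a 32-bit time_t's 2038 horizon */
        printf("~%lld years\n", (long long) (INT64_MAX / usec_per_year));
        return 0;
    }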


--
Andrew Chernow
eSilo, LLC
every bit counts
http://www.esilo.com/



Re: [HACKERS] Year 2038 Bug?

2008-10-13 Thread David E. Wheeler

On Oct 13, 2008, at 11:22, Zdenek Kotala wrote:

PostgreSQL 8.4 uses a 64-bit data type for time. But if you use the 
system timezone data, you can get into trouble if the system does not 
support 64-bit zic files.


I've never noticed a problem with the TZinfo database that ships on  
systems I've used. How would I know that there was a problem? What  
sort of trouble could I get into?


Thanks,

David




Re: [HACKERS] Year 2038 Bug?

2008-10-13 Thread David E. Wheeler

On Oct 13, 2008, at 11:24, Andrew Chernow wrote:

PostgreSQL doesn't use the standard time_t and time functions for  
its timestamp types.  Therefore, any limitations in regards to 64- 
bit time_t values on 32-bit platforms don't apply; other than the  
limitation Tom spoke of ... no 64-bit int.


Gotcha, thanks for the clarification.

Best,

David




Re: [HACKERS] Year 2038 Bug?

2008-10-13 Thread Zdenek Kotala

David E. Wheeler wrote:

On Oct 13, 2008, at 11:22, Zdenek Kotala wrote:

PostgreSQL 8.4 uses a 64-bit data type for time. But if you use the system 
timezone data, you can get into trouble if the system does not support 64-bit 
zic files.


I've never noticed a problem with the TZinfo database that ships on 
systems I've used. How would I know that there was a problem? What sort 
of trouble could I get into?


When you use --with-system-tzdata and run make check on head, it fails on 
systems without 64-bit tzinfo support.


Zdenek



--
Zdenek Kotala  Sun Microsystems
Prague, Czech Republic http://sun.com/postgresql




[HACKERS] There's some sort of race condition with the new FSM stuff

2008-10-13 Thread Tom Lane
Two different buildfarm machines are currently showing the same failure:
http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=kududt=2008-10-13%2015:30:00
http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=dragonflydt=2008-10-13%2016:30:01

The postmaster log in each case shows

ERROR:  could not fsync segment 0 of relation 1663/16384/29270/1: No such file or directory
ERROR:  checkpoint request failed
HINT:  Consult recent messages in the server log for details.
STATEMENT:  DROP TABLESPACE testspace;
ERROR:  could not fsync segment 0 of relation 1663/16384/29270/1: No such file or directory
ERROR:  checkpoint request failed
HINT:  Consult recent messages in the server log for details.
STATEMENT:  DROP TABLESPACE testspace;

which looks like an fsync request has been allowed to hang around too
long, ie, after its file is already gone.  Since /1 indicates the FSM
fork, I suppose this has been introduced by the new FSM code.  I haven't
dug any more deeply than that though.

regards, tom lane



Re: [HACKERS] Year 2038 Bug?

2008-10-13 Thread David E. Wheeler

On Oct 13, 2008, at 11:37, Zdenek Kotala wrote:

When you use --with-system-tzdata and run make check on head, it 
fails on systems without 64-bit tzinfo support.


Oh. Is it not preferable to use the tzdata that ships with PostgreSQL?

Thanks,

David




Re: [HACKERS] Year 2038 Bug?

2008-10-13 Thread Tom Lane
David E. Wheeler [EMAIL PROTECTED] writes:
 On Oct 13, 2008, at 11:37, Zdenek Kotala wrote:
 when you use --with-system-tzdata and run make check on head it  
 fails on systemes without 64bit tzinfo support.

 Oh. Is it not preferable to use the tzdata that ships with PostgreSQL?

Not necessarily; the system might have a more up-to-date tzdata.

Generally you'd use --with-system-tzdata on a platform where you expect
to receive routine package updates for the tzdata files, independently
of the Postgres release cycle.  It seems reasonable to assume that
anyone currently shipping tzdata is offering 64-bit files.  (But we do
have that regression test check in there to make sure.)

regards, tom lane



Re: [HACKERS] Year 2038 Bug?

2008-10-13 Thread David E. Wheeler

On Oct 13, 2008, at 11:53, Tom Lane wrote:

Oh. Is it not preferable to use the tzdata that ships with  
PostgreSQL?


Not necessarily; the system might have a more up-to-date tzdata.


Gotcha.

Generally you'd use --with-system-tzdata on a platform where you expect
to receive routine package updates for the tzdata files, independently
of the Postgres release cycle.  It seems reasonable to assume that
anyone currently shipping tzdata is offering 64-bit files.  (But we do
have that regression test check in there to make sure.)


Understood. Thanks for the explanation (and the regression test!).

Best,

David




Re: [HACKERS] TODO item: adding VERBOSE option to CLUSTER [with patch]

2008-10-13 Thread Jeff Davis
On Mon, 2008-10-13 at 08:30 -0400, Tom Lane wrote:
 Heikki Linnakangas [EMAIL PROTECTED] writes:
  No, I was thinking of something along the lines of:
  INFO:  clustering public.my_c
  INFO:  complete, was 33%, now 100% clustered
  The only such measure that we have is the correlation, which isn't very 
  good anyway, so I'm not sure if that's worthwhile.
 
 It'd be possible to count the number of order reversals during the
 indexscan, ie the number of tuples with CTID lower than the previous
 one's.  But I'm not sure how useful that number really is.  Also it's
 not clear how to preserve such functionality if cluster is
 re-implemented with a sort.
 

I assume here you mean a CTID with a lower page number, as the line
pointer wouldn't make any difference, right?

I think it would be a useful metric to decide whether or not to use an
index scan (I don't know how easy it is to estimate this from a sample,
but a CLUSTER could clearly get an exact number). It would solve the
problem where synchronized scans used by pg_dump could result in poor
correlation on restore and therefore not choose index scans (which is
what prompted turning off sync scans for pg_dump).

Regards,
Jeff Davis






Re: [HACKERS] Year 2038 Bug?

2008-10-13 Thread Zdenek Kotala

Tom Lane wrote:


Generally you'd use --with-system-tzdata on a platform where you expect
to receive routine package updates for the tzdata files, independently
of the Postgres release cycle.  It seems reasonable to assume that
anyone currently shipping tzdata is offering 64-bit files.  (But we do
have that regression test check in there to make sure.)


Unfortunately, you are not correct here :( see:

http://bugs.opensolaris.org/view_bug.do?bug_id=4246033

Zdenek

--
Zdenek Kotala  Sun Microsystems
Prague, Czech Republic http://sun.com/postgresql




Re: [HACKERS] WITH RECURSIVE ... CYCLE in vanilla SQL: issues with arrays of rows

2008-10-13 Thread Merlin Moncure
On Mon, Oct 13, 2008 at 9:56 AM, Tom Lane [EMAIL PROTECTED] wrote:
 I wrote:
 Merlin Moncure [EMAIL PROTECTED] writes:
 select foo from foo order by foo;
 ERROR:  could not identify an ordering operator for type foo

 Yeah, these are because of the incomplete handling of named record
 types.  I'm not sure how far we want to go in that direction.

 On looking closer, all these cases fail because I forgot to teach
 IsBinaryCoercible() that any composite type should be considered
 binary-coercible to RECORD.  Which is clearly sensible.

 I'm inclined to apply the patch with binary-coercibility adjustments
 and not try to turn RECORD or RECORD[] into full-fledged polymorphic
 types.  It's not immediately clear what the use of that would be
 anyway.


...meaning, that you would not be able to create a function taking
generic 'record' as a parameter?  In that case I agree...any chance of
getting an updated patch?

merlin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Year 2038 Bug?

2008-10-13 Thread David E. Wheeler

On Oct 13, 2008, at 12:35, Zdenek Kotala wrote:


Tom Lane wrote:

Generally you'd use --with-system-tzdata on a platform where you expect
to receive routine package updates for the tzdata files, independently
of the Postgres release cycle.  It seems reasonable to assume that
anyone currently shipping tzdata is offering 64-bit files.  (But we do
have that regression test check in there to make sure.)


Unfortunately, you are not correct here :( see:

http://bugs.opensolaris.org/view_bug.do?bug_id=4246033


So ideally all OS vendors would ship 64-bit tzdata files, eh?

Best,

David


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Well its official, replicator is BSD

2008-10-13 Thread Joshua Drake
Hello,

We finally got around to releasing Replicator as FOSS. It is BSD
licensed and available here:

https://projects.commandprompt.com/public/replicator/wiki

(Yes, it is a self-signed cert; it's on the list to fix.)

Enjoy folks!

Sincerely,

Joshua D. Drake


-- 
The PostgreSQL Company since 1997: http://www.commandprompt.com/ 
PostgreSQL Community Conference: http://www.postgresqlconference.org/
United States PostgreSQL Association: http://www.postgresql.us/



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TODO item: adding VERBOSE option to CLUSTER [with patch]

2008-10-13 Thread Gregory Stark

Heikki Linnakangas [EMAIL PROTECTED] writes:

 Jim Cox wrote:
 On Mon, Oct 13, 2008 at 8:30 AM, Tom Lane [EMAIL PROTECTED] wrote:
 Heikki Linnakangas [EMAIL PROTECTED] writes:

 It'd be possible to count the number of order reversals during the
 indexscan, ie the number of tuples with CTID lower than the previous
 one's.  But I'm not sure how useful that number really is.  

Incidentally it finally occurred to me that "sortedness" is actually a pretty
good term to search on. I found several papers on estimating metrics of
sortedness, even from samples. Though the best looks like it requires a sample
of size O(sqrt(n)), which is more than we currently take.

The two metrics which seem popular are the length of the longest sorted
subsequence and the number of sorted subsequences. I think the latter is
equivalent to counting the inversions.

I didn't find any papers which claimed to present good ways to draw
conclusions based on these metrics, but I only did a quick search. I imagine
that if everyone is looking for ways to estimate them then they must be
useful for something...

For some reason my access to the ACM digital library stopped working. Does
anyone else have access?


 It will look bad for patterns like:
 2
 1
 4
 3
 6
 5
 ..

Hm, you could include some measure of how far the inversion goes -- but I
think that's counter-productive. Sure some of them will be cached but others
won't and that'll be equally bad regardless of how far back it goes.

 Until we have a better metric for sortedness, my earlier suggestion to print
 it was probably a bad idea. If anything, we should probably print the same
 correlation metric that ANALYZE calculates, so that it would at least match
 what the planner uses for decision-making.

I agree with that. I like the idea of printing a message though -- we should
just have it print the correlation for now and when we improve the stats we'll
print the new metric.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's On-Demand Production Tuning

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] There's some sort of race condition with the new FSM stuff

2008-10-13 Thread Tom Lane
I wrote:
 Two different buildfarm machines are currently showing the same failure:
 ERROR:  could not fsync segment 0 of relation 1663/16384/29270/1: No such file or directory
 ERROR:  checkpoint request failed

Some tests show that when the serial regression tests are run in a
freshly initdb'd installation, HEAD assigns OID 29270 to bmscantest
in the bitmapops test.  So that's been dropped some time before the
failure occurs; which means that this isn't a narrow-window race
condition; which raises the question of why we're not seeing it on more
machines.  I notice now that kudu and dragonfly are actually the same
machine ... could this be an OS-specific problem?  Kris, has there been
any system-software change on that machine recently?
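
For reference, in a live cluster the file number in the message could be
mapped back to a relation with a query along these lines; here it would
return no row, since the relation has already been dropped:

-- 29270 is the relfilenode component of the failing path
SELECT relname FROM pg_class WHERE relfilenode = 29270;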

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TODO item: adding VERBOSE option to CLUSTER [with patch]

2008-10-13 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes:
 Heikki Linnakangas [EMAIL PROTECTED] writes:
 Until we have a better metric for sortedness, my earlier suggestion to print
 it was probably a bad idea. If anything, we should probably print the same
 correlation metric that ANALYZE calculates, so that it would at least match
 what the planner uses for decision-making.

 I agree with that. I like the idea of printing a message though -- we should
 just have it print the correlation for now and when we improve the stats we'll
 print the new metric.

Short of actually running an ANALYZE, I'm not seeing a good way to
derive the same number it derives.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TODO item: adding VERBOSE option to CLUSTER [with patch]

2008-10-13 Thread Gregory Stark
Tom Lane [EMAIL PROTECTED] writes:

 Gregory Stark [EMAIL PROTECTED] writes:
 I agree with that. I like the idea of printing a message though -- we should
 just have it print the correlation for now and when we improve the stats
 we'll print the new metric.

 Short of actually running an ANALYZE, I'm not seeing a good way to
 derive the same number it derives.

Well we could print the _old_ value at least. So if you run cluster
periodically you can see whether you're running it often enough.
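
That old value is already visible by hand, for example (hypothetical
table and column names):

SELECT tablename, attname, correlation
FROM pg_stats
WHERE tablename = 'my_c' AND attname = 'indexed_col';

so having CLUSTER print it is mostly a convenience.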

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's 24x7 Postgres support!

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TODO item: adding VERBOSE option to CLUSTER [with patch]

2008-10-13 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes:
 Tom Lane [EMAIL PROTECTED] writes:
 Short of actually running an ANALYZE, I'm not seeing a good way to
 derive the same number it derives.

 Well we could print the _old_ value at least.

+1 ... seems an appropriate amount of effort for the likely value.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TODO item: adding VERBOSE option to CLUSTER [with patch]

2008-10-13 Thread Kevin Grittner
 Tom Lane [EMAIL PROTECTED] wrote: 
 Gregory Stark [EMAIL PROTECTED] writes:
 Tom Lane [EMAIL PROTECTED] writes:
 Short of actually running an ANALYZE, I'm not seeing a good way to
 derive the same number it derives.
 
 Well we could print the _old_ value at least.
 
 +1 ... seems an appropriate amount of effort for the likely value.
 
That seems fine for sortedness, but am I the only one who would like
the verbose mode to show the bloat reduction?  Essentially, an INFO
line to show the same information you could get by bracketing the
CLUSTER with a couple SELECTs:
 
ccdev=# select pg_total_relation_size('DbTranImageStatus');
 pg_total_relation_size
------------------------
                  253952
(1 row)

ccdev=# cluster DbTranImageStatus;
CLUSTER
ccdev=# select pg_total_relation_size('DbTranImageStatus');
 pg_total_relation_size
------------------------
                   32768
(1 row)
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] There's some sort of race condition with the new FSM stuff

2008-10-13 Thread Kris Jurka



On Mon, 13 Oct 2008, Tom Lane wrote:

I notice now that kudu and dragonfly are actually the same machine ... 
could this be an OS-specific problem?  Kris, has there been any 
system-software change on that machine recently?


This is a VM that I haven't touched in some time.  It was turned off after 
a host kernel upgrade no longer allowed vmware to work on it.  I recently 
turned it back on after switching from vmware workstation 5.5 to vmware 
server 2.0.  On the VM itself the only change I've made was to switch the 
timezone from Mountain to Pacific.


http://pgbuildfarm.org/cgi-bin/show_history.pl?nm=dragonfly&br=HEAD

Kris Jurka

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TODO item: adding VERBOSE option to CLUSTER [with patch]

2008-10-13 Thread Joshua Drake
On Mon, 13 Oct 2008 15:34:04 -0500
Kevin Grittner [EMAIL PROTECTED] wrote:

 ccdev=# select pg_total_relation_size('DbTranImageStatus');
  pg_total_relation_size
 ------------------------
                   253952
 (1 row)
 
 ccdev=# cluster DbTranImageStatus;
 CLUSTER
 ccdev=# select pg_total_relation_size('DbTranImageStatus');
  pg_total_relation_size
 ------------------------
                    32768
 (1 row)
  
 -Kevin

Although I think it is an interesting bit of information, I find that
if I am going to be clustering, I have already done the above.


Sincerely,

Joshua D. Drake





-- 
The PostgreSQL Company since 1997: http://www.commandprompt.com/ 
PostgreSQL Community Conference: http://www.postgresqlconference.org/
United States PostgreSQL Association: http://www.postgresql.us/



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How is random_page_cost=4 ok?

2008-10-13 Thread Michael Renner

Greg Smith wrote:

The drives themselves, and possibly the OS and disk controller, are all
running read-ahead algorithms to accelerate this case.  In fact, this
*exact* case is what the Linux read-ahead stuff that just went mainline
recently targets: http://kerneltrap.org/node/6642


Apparently only the simple stuff hit mainline (see [1] and [2]); it's
not clear how this turns out for pg-style loads, especially compared to
the full-fledged patch.


Readahead is probably such a beast that no one dares to touch it with a
3-foot pole, short of a large team with good standing in the kernel
community and concerted regression testing across whatever environments
Linux is used in these days...



michael


[1] http://lwn.net/Articles/235164/
[2] 
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=history;f=mm/readahead.c


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 8.3 .4 + Vista + MingW + initdb = ACCESS_DENIED

2008-10-13 Thread Andrew Dunstan



Magnus Hagander wrote:

Second, once I've successfully built and installed postgres, I run into
a bigger problem.  When using initdb, I get this error:

creating template1 database in c:/Data/postgres30/base/1 ... FATAL:
could not create shared memory segment: 5
DETAIL:  Failed system call was CreateFileMapping(size=1802240,
name=Global\PostgreSQL:c:/Data/postgres).

A bit of googling and reading MSDN docs shows that applications that
don't run in Session 0 on Vista are not allowed to create shared memory
in the Global namespace.  Since initdb is invoked from the command line,
it runs in Session 1.



Where did you find that information? I've been specifically looking for
it, but my searches didn't turn up anything conclusive.

The latest versions contain a fix for the global namespace code. Dave
noticed that this caused issues on Vista and thus manually reverted the
patch in the official binary installer. But since we haven't (hadn't)
yet found documentation as to *why* it was failing, the patch has not
yet been reverted in the main source tree.

This is why it's working, probably, and it's not related to how it's built.

If you want to revert the patch in your local tree, this is the one:
http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/port/win32_shmem.c?r1=1.4&r2=1.5

Specifically, you can reintroduce the old bug (that I think is what made
it work on Vista) by removing the +18 in the lowest loop there.


  


I am getting this error on XP Pro. The buildfarm members run happily 
from the scheduler, but when run by hand from the command line they 
fail. This is true of both MinGW and MSVC.


This is in *URGENT* need of a fix.

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WITH RECURSIVE ... CYCLE in vanilla SQL: issues with arrays of rows

2008-10-13 Thread Tom Lane
Merlin Moncure [EMAIL PROTECTED] writes:
 On Mon, Oct 13, 2008 at 9:56 AM, Tom Lane [EMAIL PROTECTED] wrote:
 I'm inclined to apply the patch with binary-coercibility adjustments
 and not try to turn RECORD or RECORD[] into full-fledged polymorphic
 types.  It's not immediately clear what the use of that would be
 anyway.

 ...meaning, that you would not be able to create a function taking
 generic 'record' as a parameter?

Well, you've never been able to do that, although for many of the PLs
there doesn't seem to be any very fundamental reason why not.  But
I was actually wondering about something beyond that: should we have the
equivalent of the polymorphic-type behaviors for composites?  That would
mean rules along the line of "all records mentioned in the call and
result are the same composite type" and "record[] means the array type
corresponding to whichever type record is".

We don't seem to need these things in order to solve the recursion cycle
detection problem, so I'm not very excited about pursuing the line of
thought any further right now.
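
For the archives, the cycle-detection shape in question looks roughly
like this (a sketch with a hypothetical graph(id, link) table; path is a
record[] accumulating ROW() values, and the = ANY(path) comparison is
what needed the binary-coercibility fix):

WITH RECURSIVE search_graph(id, link, path, cycle) AS (
    SELECT g.id, g.link, ARRAY[ROW(g.id)], false
    FROM graph g
  UNION ALL
    SELECT g.id, g.link, path || ROW(g.id), ROW(g.id) = ANY(path)
    FROM graph g, search_graph sg
    WHERE g.id = sg.link AND NOT cycle
)
SELECT * FROM search_graph;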

 In that case I agree...any chance of
 getting an updated patch?

See CVS HEAD ...

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Window Functions patch v06

2008-10-13 Thread David Rowley
Hitoshi Harada wrote:
I made up my mind to scrap the former window functions patch and
redesign a completely new execution model, based on the discussion with
Heikki.
Attached is the v06 against HEAD today.
http://umitanuki.net/pgsql/wfv06/design.html

First off, fantastic work!

In my eyes this and WITH RECURSIVE are a big step for both Postgres and open
source RDBMSs.

Only one small query about LEAD() and LAG():

Going by http://www.wiscorp.com/sql200n.zip 

The lead and lag functions each take three arguments, a value expression
VE, an exact numeric literal OFFSET, and a value expression DEFAULT. For
each row R within the window partition P of R defined by a window
structure descriptor, the lag function returns the value of VE evaluated
on a row that is OFFSET number of rows before R within P, and the lead
function returns the value of VE evaluated on a row that is OFFSET
number of rows after R within P. The value of DEFAULT is returned as the
result if there is no row corresponding to the OFFSET number of rows
before R within P (for the lag function) or after R within P (for the
lead function). In addition, RESPECT NULLS or IGNORE NULLS can be
specified to indicate whether the rows within P for which VE evaluates
to the null value are preserved or eliminated.

So going by that:
SELECT name,LAG(name,1,'None') OVER (ORDER BY employeeid) FROM employee;

Would use 'None' for rows that would be out of the bounds of the window.
 
The current patch only seems to accept 2 arguments.
ERROR:  function lag(character varying, integer, unknown) does not exist
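
For what it's worth, the default can be faked with the two-argument form
(a workaround sketch; note that unlike the spec's DEFAULT argument, this
also substitutes when name itself is NULL):

SELECT name,
       COALESCE(LAG(name, 1) OVER (ORDER BY employeeid), 'None')
FROM employee;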



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WITH RECURSIVE ... CYCLE in vanilla SQL: issues with arrays of rows

2008-10-13 Thread Alvaro Herrera
Tom Lane wrote:
 Merlin Moncure [EMAIL PROTECTED] writes:
  On Mon, Oct 13, 2008 at 9:56 AM, Tom Lane [EMAIL PROTECTED] wrote:
  I'm inclined to apply the patch with binary-coercibility adjustments
  and not try to turn RECORD or RECORD[] into full-fledged polymorphic
  types.  It's not immediately clear what the use of that would be
  anyway.
 
  ...meaning, that you would not be able to create a function taking
  generic 'record' as a parameter?
 
 Well, you've never been able to do that, although for many of the PLs
 there doesn't seem to be any very fundamental reason why not.

Yeah, it seems an arbitrary restriction for no very good reason.  When I
was working on PL/php (years ago) I tried to make it work because I
found it useful for some use case I was trying, but couldn't.  I don't
remember the details (and PL/php has been pretty much abandoned since
then anyway.)

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Window Functions patch v06

2008-10-13 Thread Hitoshi Harada
2008/10/14 David Rowley [EMAIL PROTECTED]:
 Hitoshi Harada wrote:
I made up my mind to scrap the former window functions patch and
redesign a completely new execution model, based on the discussion with
Heikki.
Attached is the v06 against HEAD today.
http://umitanuki.net/pgsql/wfv06/design.html

 First off, fantastic work!

 In my eyes this and WITH RECURSIVE are a big step for both Postgres and open
 source RDBMSs.

 Only one small query about LEAD() and LAG():

 Going by http://www.wiscorp.com/sql200n.zip

 The lead and lag functions each take three arguments, a value expression
 VE, an exact numeric literal OFFSET, and a value expression DEFAULT. For
 each row R within the window partition P of R defined by a window
 structure descriptor, the lag function returns the value of VE evaluated
 on a row that is OFFSET number of rows before R within P, and the lead
 function returns the value of VE evaluated on a row that is OFFSET
 number of rows after R within P. The value of DEFAULT is returned as the
 result if there is no row corresponding to the OFFSET number of rows
 before R within P (for the lag function) or after R within P (for the
 lead function). In addition, RESPECT NULLS or IGNORE NULLS can be
 specified to indicate whether the rows within P for which VE evaluates
 to the null value are preserved or eliminated.

 So going by that:
 SELECT name,LAG(name,1,'None') OVER (ORDER BY employeeid) FROM employee;

 Would use 'None' for rows that would be out of the bounds of the window.

 The current patch only seems to accept 2 arguments.
 ERROR:  function lag(character varying, integer, unknown) does not exist




Thanks for your feedback.

I agree I need to work on that. Also, per the spec, RESPECT NULLS /
IGNORE NULLS may be specified, but that is not supported yet. This
syntax falls outside the general Postgres function-call syntax, so I
wonder whether those functions will need to be treated specially or not.

Regards,


-- 
Hitoshi Harada

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Window Functions patch v06

2008-10-13 Thread Tom Lane
Hitoshi Harada [EMAIL PROTECTED] writes:
 I agree I need to work on that. Also, per the spec, RESPECT NULLS /
 IGNORE NULLS may be specified, but that is not supported yet. This
 syntax falls outside the general Postgres function-call syntax, so I
 wonder whether those functions will need to be treated specially or not.

Egad, the SQL committee has certainly been taken over by creeping
COBOL-itis when it comes to inventing random new syntax ...

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] xact_desc

2008-10-13 Thread Simon Riggs
ISTM that xact_desc routines do not work properly when called with
WAL_DEBUG enabled from XLogInsert().

LOG:  INSERT @ 0/3740978: prev 0/3740938; xid 5699: Transaction -
commit: 2008-10-14 03:14:14.866437+01; subxacts: 10447936 0
STATEMENT:  commit;
LOG:  INSERT @ 0/37409F0: prev 0/37409B0; xid 5702: Transaction -
commit: 2008-10-14 03:14:17.687843+01; subxacts: 10447936 0
STATEMENT:  commit;

The arrays... work fine in recovery, just not prior to inserting.

Anyway, that led me a merry dance with other code. 

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] xact_desc

2008-10-13 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes:
 ISTM that xact_desc routines do not work properly when called with
 WAL_DEBUG enabled from XLogInsert().

Well, now that you mention it, that code is utterly, completely broken,
and always has been.  It's passing only the first rdata-chunk of the WAL
record to the print routine :-(

AFAICS the only way to fix it would be to allocate some workspace and
assemble the chunks of the record into that.  Doubtful that it's worth
it --- I'd be inclined to just remove the code instead.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Updates of SE-PostgreSQL 8.4devel patches

2008-10-13 Thread KaiGai Kohei

Andrew Sullivan wrote:

On Fri, Oct 10, 2008 at 01:44:49PM +0900, KaiGai Kohei wrote:

Andrew Sullivan wrote:

I want to focus on this description, because you appear to be limiting
the problem scope tremendously here.  We've moved from "general
security policy for database system" to "security policy for database
system as part of a web-application stack".

"The general security policy for database system" is an incorrect term.
SELinux does not cover the database system only; it covers the operating
system and applications that manage objects (like database objects, X
window objects, ...). Thus, it should be described as a "general security
policy for the operating system, database system, and so on".


Ok, then let's use the broader case, which is "general security policy
for an entire computing system including an RDBMS subsystem" (call this
GSPECS+DB, say).  This highlights even more the issue that considering
primarily the application stack does not actually cover all the cases.

I'm not suggesting, even a little bit, that securing an application
stack as you propose is a waste of time.  It could be, actually, that
this more modest goal is the more appropriate one, and that
SE-PostgreSQL would be a killer feature in this space (because it
would, if it worked, solve a lot of problems that other systems have,
as you have pointed out).  But it is not GSPECS+DB, because of all the
corner case problems whose behaviour still needs working out.


Indeed, SE-PostgreSQL is an important piece of GSPECS+DB, but it cannot
reach the ultimate goal all by itself.
Do you know of other efforts to apply SELinux security policy to objects
managed in userspace? One prior example is the X window system. Its
resources are managed by the X server, so in-kernel SELinux cannot trap
accesses to those objects. Some of them (like the cut-and-paste buffer
and key input events) can be shared between several processes, so they
should be controlled by the policy. We can call it GSPECS+X, likewise.

As is widely known, security is endless work. The ultimate goal might
not be within easy reach, but that does not mean it is unnecessary to
fill in the pieces that help us get there.

 But plainly, others who need to look after the code will want to know
 what the exact goal is before committing themselves to future maintenance.

It is the same thing I have repeated several times.
The goal is to apply a centrally manageable security policy (SELinux) to
database objects, just as SELinux does for filesystem objects.
This feature can help web-application security, for example.

Thanks,
--
OSS Platform Development Division, NEC
KaiGai Kohei [EMAIL PROTECTED]

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Updates of SE-PostgreSQL 8.4devel patches - Patent problems?

2008-10-13 Thread KaiGai Kohei
Andres Freund wrote:
 Hi,
 
 It might be relevant for the whole discussion about inclusion of some form of
 row level permissions, whatever they are based on, that there exist heaps of
 (in my eyes conflicting) patents on row level permissions for relational
 databases. I don't have any real clue about patent issues, but I fear that
 makes inclusion into an open source product rather hard...

I'm not a lawyer, so we cannot actually decide whether there is a patent
issue until we get an adjudication. However, from an engineering
viewpoint I don't think these conflict with the existing patents.

 Data security system and method - 5751949 - MCI Corp. - 1998
It says row-level access controls are applied by forcing users to access
tables via views. It does not conflict with our design.

 Rule based database security system and method - 6820082 - Allegis 
 Corporation - 2004
It says row-level access controls are applied based on query modification.
The legacy implementation of SE-PostgreSQL indeed modified the WHERE clause
of given queries to apply row-level access controls, but the current one
does not.

 Row-level security in a relational database management system - 7240046 - IBM 
 - 2007
It says row-level access controls are applied based on a hierarchical
relationship between subject and object, well known as the Bell-LaPadula
security model. SE-PostgreSQL does not have any rules by itself, because
it depends on an external security feature (SELinux). The database ACL is
obviously not a hierarchical security model.

 Database fine-grained access control - 7281003 - Oracle - 2007
It says row-level access controls are applied based on query modification,
as patent 6820082 does. It does not conflict with SE-PostgreSQL.

Thanks,
-- 
OSS Platform Development Division, NEC
KaiGai Kohei [EMAIL PROTECTED]

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers