Re: [HACKERS] pg_upgrade: convert on read is dead end
Zdenek Kotala wrote: PostgreSQL stores all of a table's external data in one TOAST table, and the stored data does not contain any clue about its datatype. When PostgreSQL reads a TOASTed value, it is not possible to detect what type it is and perform varlena conversion on composite datatypes or arrays. It could be converted in the detoast_datum function, but that means the datum has to be retoasted and stored back on new pages. The old page MUST be kept in the old format, because any page conversion would lose the version information, and different datatypes can be stored on one page. Hmm. There are two things here: 1. The format of the toast page, and the toast chunk tuples. 2. The format of the toasted tuple, stored within the chunk tuples. The format of the toast page and the varlena headers of the byteas in the toast tuples could be converted when the page is read in, as usual. The format of the toasted tuple within the chunks is indeed trickier. Retoasting all tuples on the master heap page when the page is read in is probably simplest, as you say. But they don't necessarily need to be stored on new pages; any toast pages will do, because we keep track of 1. separately. In my opinion, this issue completely kills the convert-on-read solution, and only the "read all formats" solution is the right one. It is quite heavy-weight, I agree, but it doesn't completely kill the idea in my opinion. Besides, the "read all formats" approach wouldn't really avoid it either. If you passed a toasted datum to a function, when the function needs to detoast it, detoast_datum still wouldn't know whether the datum is in the old or new format. You'd still need to detoast all values in the tuple somewhere before they could be passed around. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Extending pg_class info + more flexible TOAST chunk size
Tom Lane napsal(a): Zdenek Kotala [EMAIL PROTECTED] writes: Because MAX_TOAST_CHUNK_SIZE is related to the page layout version, I need to have the toast chunk size more flexible. The attached patch adds three new columns into pg_class: relblocksize - which is always BLCKSZ. I put it there for completeness, but it could be used in future development to specify a different BLCKSZ per relation. relsegsize - currently it is always RELSEG_FILE. I performed a basic adjustment in smgr.c and md.c; now only smgropen contains a reference to RELSEG_FILE. The problem with making it fully dynamic is how to pass the information in rel->rd_rel->relsegsize down into smgropen. One idea is to extend relfilenode, but I'm not sure about it. relmaxitemsize - it is set to TOAST_MAX_CHUNK_SIZE. Other relations have this value set to zero for now. The toast functions are fully aware of this setting and use it. This column will be converted to an int2vector during pg_upgrade development (I need to track the information for each page version). There is not one of these things that we have any intention of allowing to vary on a per-relation basis. Why don't you read them out of pg_control? The problem I need to solve is how to handle the different TOAST chunk size that comes with an upgrade. If you insert a new record into the TOAST table, it will be created on a new page, which has a different max chunk size, because that depends on the page header size. It means that one TOAST table will have more than one chunk size. Using the old value from the previous version is possible, but it could waste space when the page header in the new version is bigger. Another solution is to take the first chunk's size and declare it the max chunk size. Relsegsize is related to the tablespace, but when you upgrade you could want to use the new size for new tables. It is not important for the pg_upgrade project, though. Block size is more of a nice-to-have at this moment, but it makes sense to me to have different block sizes for a TOAST table and its heap.
I know that this idea requires a lot of changes, including the buffer cache and so on. Zdenek -- Zdenek Kotala Sun Microsystems Prague, Czech Republic http://sun.com/postgresql
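Zdenek's point that the max chunk size is derived from the page layout can be illustrated with back-of-the-envelope arithmetic. The sketch below is only a model of the dependency, not the server's real formula; all constants (header, tuple overhead, chunks per page) are hypothetical stand-ins for what lives in the backend headers:

```c
#include <stddef.h>

/* Hypothetical layout constants -- the real values live in the server's
 * headers (BLCKSZ, SizeOfPageHeaderData, etc.) and differ by version. */
#define BLCKSZ          8192
#define ITEMID_SIZE        4    /* one line pointer per chunk tuple */
#define TUPLE_OVERHEAD    32    /* assumed tuple header + chunk metadata */
#define CHUNKS_PER_PAGE    4    /* TOAST aims to fit a few chunks per page */

/* Largest chunk payload such that CHUNKS_PER_PAGE chunks fit on a page
 * whose header occupies page_header bytes. */
static size_t
max_chunk_size(size_t page_header)
{
    size_t per_chunk = (BLCKSZ - page_header) / CHUNKS_PER_PAGE;

    return per_chunk - ITEMID_SIZE - TUPLE_OVERHEAD;
}
```

With these numbers, a page header that grows from 24 to 32 bytes shrinks the per-chunk payload, which is exactly why one upgraded TOAST table ends up holding chunks of two different sizes.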
Re: [HACKERS] [PATCH] Extending pg_class info + more flexible TOAST chunk size
Zdenek Kotala wrote: The problem I need to solve is how to handle the different TOAST chunk size that comes with an upgrade. If you insert a new record into the TOAST table, it will be created on a new page, which has a different max chunk size, because that depends on the page header size. It means that one TOAST table will have more than one chunk size. Using the old value from the previous version is possible, but it could waste space when the page header in the new version is bigger. Another solution is to take the first chunk's size and declare it the max chunk size. Not all chunks need to be the same size. We do currently require that, but AFAICS it's only because it allows random access to a given offset within a datum. That's of course nice, but I think we could live without it. Or try random access with the new toast size first, and if the chunks turn out to be of a different size, fall back to reading all chunks sequentially. And if we have to retoast all values before they're accessed, per the other thread, then we can just assume that all toast chunks that we need random access to are of the new size. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] [PATCH] Extending pg_class info + more flexible TOAST chunk size
ITAGAKI Takahiro napsal(a): Zdenek Kotala [EMAIL PROTECTED] wrote: Because MAX_TOAST_CHUNK_SIZE is related to the page layout version, I need to have the toast chunk size more flexible. I agree that a flexible toast threshold is useful, but I have several questions about your implementation. relblocksize - which is always BLCKSZ. I put it there for completeness, but it could be used in future development to specify a different BLCKSZ per relation. We need many infrastructural changes to make it work: 1. Change BLCKSZ from a constant value to a variable. It allows us to have initdb --blocksize=16kB. 2. Make bufmgr handle different sizes of pages in a server instance. I think it is a difficult task. 3. Have a SQL interface for 2. You suggested only 3, but 1 and 2 are more important. I know about the problem's complexity. If you look at how many places BLCKSZ is used, you get the basic picture. relsegsize - currently it is always RELSEG_FILE. I performed a basic adjustment in smgr.c and md.c; now only smgropen contains a reference to RELSEG_FILE. I'm not sure why relsegsize should be adjusted on a per-relation basis. If there are no limitations in the filesystem and in backup utilities, a large relsegsize is always better, no? Is it enough to have initdb --segsize=4GB? It is related to the tablespace. Usually each tablespace is stored on a different volume, which can have a different FS. The question is how to handle relation migration from one tablespace to another. In most cases it requires a physical copy, which could change the segment size anyway. Maybe extending pg_tablespace makes more sense. relmaxitemsize - it is set to TOAST_MAX_CHUNK_SIZE. You added new columns in pg_class, but we have another choice: use pg_class.reloptions. Why did you choose the new-columns way? Good point, I will look into it. Another question is whether we should have per-column toast control options, not only per-relation ones. Do we also need to modify pg_attribute in the same way?
Currently we have one TOAST table per heap table (if it contains a varlena attribute). Data in the TOAST table are stored without any indication of what data type they are. Thanks Zdenek -- Zdenek Kotala Sun Microsystems Prague, Czech Republic http://sun.com/postgresql
Re: [HACKERS] Convert check constraints into One-Time_Filter on prepared statements
ITAGAKI Takahiro [EMAIL PROTECTED] writes: EXPLAIN EXECUTE p('2008-07-01'); - Append - Result One-Time Filter: (('2008-01-01' <= $1) AND ($1 < '2009-01-01')) - Index Scan on test_2008_t_key Index Cond: (($1 <= t) AND (t < ($1 + '1 mon'))) We can avoid internal scans when the One-Time Filter returns false for each partition. So we can reuse the plan and still receive the benefit of CE. Is this concept ok and worth trying? If it is reasonable, I'll try it. Comments welcome. It's tempting to go straight for the special case where we can detect that the constraints are mutually exclusive, and build a lookup table or do a binary search to find the single partition which is included. But this would handle the general case and would be a huge help. I've looked at plans where partitioning represented a huge slowdown and the time seemed to just disappear down the hole. All the time was being spent in just the index lookup startup and probe for dozens of partitions which returned no rows. In combination with the ordered append node -- which is still sitting in my experimental directory -- this would basically fix partitioning. The planner would recognize that the result is ordered and merge-joinable, and it wouldn't waste any time doing any index probes. There would be a much smaller waste of time setting up the degenerate heap merge and checking all those conditions in the executor. These things will matter when we get to scaling up to hundreds of plan nodes, but the situation would still be much better than today. The other reason I think it's still worth doing this, even though it's not as effective as covering the special case of mutually exclusive partitions, is that there's still a place for this even once we have a more mature partitioning scheme.
-- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's On-Demand Production Tuning
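The "special case" Greg mentions -- mutually exclusive range partitions where a lookup table or binary search finds the single included partition -- can be sketched in a few lines. The int bounds and the function name below are illustrative only, not any PostgreSQL API:

```c
#include <stddef.h>

/* Given partitions sorted by (mutually exclusive) lower bound, binary-search
 * for the single partition containing key, instead of probing every child
 * plan.  Returns the index of the last partition whose lower bound <= key,
 * or -1 if key precedes every partition. */
static int
find_partition(const int *lower_bounds, int nparts, int key)
{
    int lo = 0, hi = nparts - 1, found = -1;

    while (lo <= hi)
    {
        int mid = lo + (hi - lo) / 2;

        if (lower_bounds[mid] <= key)
        {
            found = mid;        /* candidate; look for a later match */
            lo = mid + 1;
        }
        else
            hi = mid - 1;
    }
    return found;
}
```

This replaces O(n) one-time-filter evaluations (or index probe startups) with O(log n) comparisons, which is the difference that matters once plans grow to hundreds of partitions.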
Re: [HACKERS] Block nested loop join
Dear All, I took a look at the source code for hash join this morning and I realized that the block nested loop join is somewhat similar to that. Thanks for the discussions. Bramandia R. On Fri, Oct 10, 2008 at 8:19 PM, Tom Lane [EMAIL PROTECTED] wrote: Gregory Stark [EMAIL PROTECTED] writes: So the use case of a real block nested loop would be doing a cartesian join of two large tables where neither fits in RAM. That does seem like it might be kind of narrow given how large the output would be. Yeah. If you have a hashable join condition then our existing batched hash join code should be roughly equivalent to this method. So the use case is joining a couple of large tables with an un-hashable, un-indexable join condition (or none at all, ie cross product) and that just isn't something we hear people wanting to do a lot. I can't really see why we'd bother maintaining extra code for block nested loop. regards, tom lane
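For reference, the algorithm under discussion can be sketched as follows: a block nested loop buffers a block of outer rows and scans the inner relation once per block rather than once per outer row. Arrays stand in for relations that wouldn't fit in RAM, and all names (including the emit callback) are illustrative:

```c
#include <stddef.h>

#define BLOCK 2                 /* rows of the outer relation buffered at once */

/* Cross product via block nested loop: one pass over the inner input
 * serves a whole block of outer rows.  Returns the number of output pairs;
 * emit may be NULL when only the count is wanted. */
static size_t
block_nested_loop(const int *outer, size_t n_outer,
                  const int *inner, size_t n_inner,
                  void (*emit)(int o, int i))
{
    size_t pairs = 0;

    for (size_t start = 0; start < n_outer; start += BLOCK)
    {
        size_t end = start + BLOCK < n_outer ? start + BLOCK : n_outer;

        /* One scan of the inner relation for the whole buffered block. */
        for (size_t i = 0; i < n_inner; i++)
            for (size_t o = start; o < end; o++)
            {
                if (emit)
                    emit(outer[o], inner[i]);
                pairs++;
            }
    }
    return pairs;
}
```

The resemblance to batched hash join that Bramandia noticed is exactly this structure: both amortize a full inner scan over a memory-sized batch of the outer input; hash join just adds a hash lookup inside the batch, which is why Tom argues it already covers the hashable cases.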
Re: [HACKERS] [PATCH] Extending pg_class info + more flexible TOAST chunk size
Zdenek Kotala wrote: Heikki Linnakangas napsal(a): Not all chunks need to be the same size. We do currently require that, but AFAICS it's only because it allows random access to a given offset within a datum. That's of course nice, but I think we could live without it. Good point. I think it is good to keep this feature. Yeah. At the moment, it's only used for substring(), I think. Or try random access with the new toast size first, and if the chunks turn out to be of a different size, fall back to reading all chunks sequentially. I think it is not necessary to read it sequentially; you only need to recompute the new position. Yeah, true. It occurs to me that instead of storing a chunk id, we could store a byte offset of the chunk. That would allow random access even if every chunk was of a different size. You probably don't want any new changes you need to deal with in the upgrade, though :-). -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
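Zdenek's "recompute the new position" remark amounts to simple arithmetic. A minimal sketch, assuming the simplest mixed case -- the first n_old chunks of a datum use the old chunk size and all later chunks the new one (the function name and the two-size split are assumptions for illustration, not the server's code):

```c
#include <stddef.h>

/* Map a byte offset within a toasted datum to (chunk number, offset within
 * that chunk), when chunks 0..n_old-1 hold old_size bytes each and later
 * chunks hold new_size bytes each. */
static void
locate_chunk(size_t offset, size_t n_old, size_t old_size, size_t new_size,
             size_t *chunk_no, size_t *chunk_off)
{
    size_t old_bytes = n_old * old_size;

    if (offset < old_bytes)
    {
        *chunk_no = offset / old_size;
        *chunk_off = offset % old_size;
    }
    else
    {
        size_t rest = offset - old_bytes;

        *chunk_no = n_old + rest / new_size;
        *chunk_off = rest % new_size;
    }
}
```

Heikki's byte-offset idea would make even this unnecessary: if each chunk tuple stored its starting byte offset instead of a chunk id, an index scan on (valueid, offset) could find the right chunk directly, whatever the chunk sizes.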
Re: [HACKERS] TODO item: adding VERBOSE option to CLUSTER [with patch]
Heikki Linnakangas [EMAIL PROTECTED] writes: No, I was thinking of something along the lines of: INFO: clustering public.my_c INFO: complete, was 33%, now 100% clustered The only such measure that we have is the correlation, which isn't very good anyway, so I'm not sure if that's worthwhile. It'd be possible to count the number of order reversals during the indexscan, ie the number of tuples with CTID lower than the previous one's. But I'm not sure how useful that number really is. Also it's not clear how to preserve such functionality if cluster is re-implemented with a sort. regards, tom lane
Re: [HACKERS] [PATCH] Extending pg_class info + more flexible TOAST chunk size
Heikki Linnakangas napsal(a): Zdenek Kotala wrote: The problem I need to solve is how to handle the different TOAST chunk size that comes with an upgrade. If you insert a new record into the TOAST table, it will be created on a new page, which has a different max chunk size, because that depends on the page header size. It means that one TOAST table will have more than one chunk size. Using the old value from the previous version is possible, but it could waste space when the page header in the new version is bigger. Another solution is to take the first chunk's size and declare it the max chunk size. Not all chunks need to be the same size. We do currently require that, but AFAICS it's only because it allows random access to a given offset within a datum. That's of course nice, but I think we could live without it. Good point. I think it is good to keep this feature. Or try random access with the new toast size first, and if the chunks turn out to be of a different size, fall back to reading all chunks sequentially. I think it is not necessary to read it sequentially; you only need to recompute the new position. Zdenek -- Zdenek Kotala Sun Microsystems Prague, Czech Republic http://sun.com/postgresql
Re: [HACKERS] The Axe list
Hi, Josh Berkus wrote: So it sounds like intagg is still in use/development. But ... is it more of an example, or is it useful as a type/function in production? We use it in production for quite remarkable speedups of operations on int4[]. Having reviewed the last commit fest's intagg patch as well, I thought we agreed that a more general functionality is wanted for core. But as long as we don't have that, I'd like intagg to stay in contrib. Regards Markus Wanner
Re: [HACKERS] WITH RECURSIVE ... CYCLE in vanilla SQL: issues with arrays of rows
On Wed, Oct 8, 2008 at 4:27 PM, Tom Lane [EMAIL PROTECTED] wrote: Merlin Moncure [EMAIL PROTECTED] writes: On Tue, Oct 7, 2008 at 9:58 PM, Tom Lane [EMAIL PROTECTED] wrote: * Instead of the above, we could try to make ROW(some columns) = ANY (array variable) work. This is shorter than the above syntax and would presumably have a lot less overhead too. But it doesn't work right now, not even for named rowtypes much less anonymous ones. By extension, would this also mean things like select row(1,2,3)::foo = foo from foo; would work (presumably as row-wise comparison does)? Well, it turned out to be easier than I thought to get the base case working --- all that's necessary is to define an array type for RECORD and add generic comparison functions, and the cases that are needed for recursive cycle detection Just Work! See attached WIP patch, and particularly note the new test case in with.sql. The part that actually seems to need some thought is the relationship between this and operations on named rowtypes. In the patch I tweaked parse_coerce.c to treat coercion from a named rowtype's array type to record[] as an allowed binary-compatible case, but I didn't do the other direction yet (I'm not fully convinced that it's necessary). What I'm really not clear about is the extent to which record[] ought to be treated like a polymorphic type --- should we consider that it acts like an anyrecordarray type, or is that a distinct notion? Do we even want that? record itself is not considered a polymorphic type, though it has some similar qualities. Another point worth making is that I made the comparisons work like IS NOT DISTINCT, ie, they don't return NULL just because some field of the row is NULL. This is *not* like SQL-spec row comparison, but we can't build a btree opclass from these functions if they insist on returning null for null fields. (Our array comparisons work like this, too.)
I'm not sure how big a deal that is either way, but I am pretty sure that IS NOT DISTINCT is the semantics you need to have if you want cycle detection to work reliably. (Hm, is that a bug in the spec? They say to use = rather than DISTINCT in cycle checking ...) Comments? Do we want to go forward with this? The record ops work as promised. IMO this patch stands on its own merits with or without the CTE changes. I played around with it and noticed a couple of oddities: select foo = foo from foo; --works but select distinct foo from foo; --yields ERROR: could not identify an equality operator for type foo also, select foo from foo order by foo; ERROR: could not identify an ordering operator for type foo postgres=# create index foo_idx on foo((foo)); ERROR: cache lookup failed for type 0 The above are probably not required to fulfill the CTE requirements...but would be nice. merlin
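Tom's point about needing IS NOT DISTINCT semantics for reliable cycle detection can be made concrete with a small sketch. With SQL '=' a NULL field makes the comparison unknown, so a cycle passing through a NULL is never caught; with NOT DISTINCT, NULL matches NULL. Single-field rows for brevity, and the struct and function names are illustrative, not the patch's types:

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct
{
    bool isnull;
    int value;
} Field;

/* a IS NOT DISTINCT FROM b: unlike '=', NULL compares equal to NULL
 * instead of yielding unknown. */
static bool
field_not_distinct(Field a, Field b)
{
    if (a.isnull || b.isnull)
        return a.isnull && b.isnull;
    return a.value == b.value;
}

/* The cycle check: true if 'row' already appears in the path accumulated
 * by the recursive query so far. */
static bool
row_in_path(const Field *path, size_t len, Field row)
{
    size_t i;

    for (i = 0; i < len; i++)
        if (field_not_distinct(path[i], row))
            return true;
    return false;
}
```

If field_not_distinct used plain '=' semantics (returning false/unknown whenever either side is NULL), a graph edge whose key is NULL would be re-visited forever -- which is why the spec's prescription of '=' in cycle checking looks like a bug.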
Re: [HACKERS] out-of-date comment in auto-generated oidjoins.sql
Gregory Stark [EMAIL PROTECTED] writes: This file was moved lo, these many years Done ... but why did your patch change oidjoins.out and not oidjoins.sql? regards, tom lane
Re: [HACKERS] WITH RECURSIVE ... CYCLE in vanilla SQL: issues with arrays of rows
Merlin Moncure [EMAIL PROTECTED] writes: The record ops work as promised. IMO this patch stands in its own merits with or without the CTE changes. I played around with it and noticed a couple of oddities: select foo = foo from foo; --works but select distinct foo from foo; --yields ERROR: could not identify an equality operator for type foo also, select foo from foo order by foo; ERROR: could not identify an ordering operator for type foo Yeah, these are because of the incomplete handling of named record types. I'm not sure how far we want to go in that direction. postgres=# create index foo_idx on foo((foo)); ERROR: cache lookup failed for type 0 Hm, that's not good ... regards, tom lane
Re: [HACKERS] [PATCH] Extending pg_class info + more flexible TOAST chunk size
Heikki Linnakangas napsal(a): It occurs to me that instead of storing a chunk id, we could store a byte offset of the chunk. That would allow random access even if every chunk was of different size. You probably don't want any new changes you need to deal with in the upgrade, though :-). Yes, no change please :-) Zdenek -- Zdenek Kotala Sun Microsystems Prague, Czech Republic http://sun.com/postgresql
Re: [HACKERS] pg_upgrade: convert on read is dead end
Heikki Linnakangas napsal(a): Zdenek Kotala wrote: Heikki Linnakangas napsal(a): Besides, the "read all formats" approach wouldn't really avoid it either. If you passed a toasted datum to a function, when the function needs to detoast it, detoast_datum still wouldn't know whether the datum is in the old or new format. You'd still need to detoast all values in the tuple somewhere before they could be passed around. If you look into the pg_upgrade prototype patch, I added page version information into the HeapTupleData structure. It keeps information about the format. All chunked data are stored on pages with the same page version. I think these two things are enough to have all the necessary information. Functions are passed just a Datum, not HeapTupleData. But the executor works with the HeapTupleData structure, and it should convert the tuple to the new format before it passes it to a function. My idea is to convert the tuple in ExecTupleStore, or to invent a new node specially for tuple conversion. I expect that a function cannot get a toast pointer; it can get only detoasted attributes. Correct me if I'm wrong. Zdenek -- Zdenek Kotala Sun Microsystems Prague, Czech Republic http://sun.com/postgresql
Re: [HACKERS] WITH RECURSIVE ... CYCLE in vanilla SQL: issues with arrays of rows
I wrote: Merlin Moncure [EMAIL PROTECTED] writes: select foo from foo order by foo; ERROR: could not identify an ordering operator for type foo Yeah, these are because of the incomplete handling of named record types. I'm not sure how far we want to go in that direction. On looking closer, all these cases fail because I forgot to teach IsBinaryCoercible() that any composite type should be considered binary-coercible to RECORD. Which is clearly sensible. I'm inclined to apply the patch with binary-coercibility adjustments and not try to turn RECORD or RECORD[] into full-fledged polymorphic types. It's not immediately clear what the use of that would be anyway. regards, tom lane
Re: [HACKERS] out-of-date comment in auto-generated oidjoins.sql
Tom Lane [EMAIL PROTECTED] writes: Gregory Stark [EMAIL PROTECTED] writes: This file was moved lo, these many years Done ... but why did your patch change oidjoins.out and not oidjoins.sql? I had other changes in my tree, I just diffed the files individually rather than doing a cvs diff -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's 24x7 Postgres support!
[HACKERS] out-of-date comment in auto-generated oidjoins.sql
This file was moved lo, these many years diff --git a/src/tools/findoidjoins/make_oidjoins_check b/src/tools/findoidjoins/make_oidjoins_check index 5061634..9c6b850 100755 --- a/src/tools/findoidjoins/make_oidjoins_check +++ b/src/tools/findoidjoins/make_oidjoins_check @@ -54,7 +54,7 @@ $AWK -F'[ \.]' '\ { printf \ --\n\ --- This is created by pgsql/contrib/findoidjoins/make_oidjoin_check\n\ +-- This is created by pgsql/src/tools/findoidjoins/make_oidjoins_check\n\ --\n; } { diff --git a/src/test/regress/expected/oidjoins.out b/src/test/regress/expected/oidjoins.out index 3d5d3db..5b1c22b 100644 --- a/src/test/regress/expected/oidjoins.out +++ b/src/test/regress/expected/oidjoins.out @@ -1,5 +1,5 @@ -- --- This is created by pgsql/contrib/findoidjoins/make_oidjoin_check +-- This is created by pgsql/src/tools/findoidjoins/make_oidjoins_check -- SELECT ctid, aggfnoid FROM pg_catalog.pg_aggregate fk -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's Slony Replication support!
Re: [HACKERS] TODO item: adding VERBOSE option to CLUSTER [with patch]
On Mon, Oct 13, 2008 at 8:30 AM, Tom Lane [EMAIL PROTECTED] wrote: Heikki Linnakangas [EMAIL PROTECTED] writes: No, I was thinking of something along the lines of: INFO: clustering public.my_c INFO: complete, was 33%, now 100% clustered The only such measure that we have is the correlation, which isn't very good anyway, so I'm not sure if that's worthwhile. It'd be possible to count the number of order reversals during the indexscan, ie the number of tuples with CTID lower than the previous one's. But I'm not sure how useful that number really is. Also it's not clear how to preserve such functionality if cluster is re-implemented with a sort. regards, tom lane Another version of the patch should be attached, this time counting the number of inversions (pairs of tuples in the table that are in the wrong order) as a measure of the sortedness of the original data (scanned/live numbers are still reported as an indication of the extent to which the table was vacuumed). N.B. -- I'm not familiar enough with Postgres internals to know if the included inversion_count() method is a valid way to identify inversions. In any case, example VERBOSE output: postgres=# CLUSTER public.my_c VERBOSE ; INFO: clustering public.my_c INFO: complete, 15 tuples scanned, 10 tuples now live, 2 inversions DETAIL: CPU 0.00s/0.00u sec elapsed 0.00 sec. CLUSTER Index: src/backend/commands/cluster.c === RCS file: /projects/cvsroot/pgsql/src/backend/commands/cluster.c,v retrieving revision 1.177 diff -c -r1.177 cluster.c *** src/backend/commands/cluster.c 19 Jun 2008 00:46:04 - 1.177 --- src/backend/commands/cluster.c 13 Oct 2008 14:25:09 - *** *** 46,51 --- 46,52 #include "utils/snapmgr.h" #include "utils/syscache.h" #include "utils/tqual.h" + #include "utils/pg_rusage.h" /* *** *** 59,70 Oid indexOid; } RelToCluster; ! ! static void cluster_rel(RelToCluster *rv, bool recheck); ! static void rebuild_relation(Relation OldHeap, Oid indexOid); !
static TransactionId copy_heap_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex); static List *get_tables_to_cluster(MemoryContext cluster_context); ! /*--- --- 60,80 Oid indexOid; } RelToCluster; ! /* ! * This struct is used to collect stats for VERBOSE output. ! */ ! typedef struct ! { ! double tuples_scanned; ! double tuples_copied; ! double inversions; ! } ClusterStatistics; ! ! static void cluster_rel(RelToCluster *rv, bool recheck, int elevel); ! static void rebuild_relation(Relation OldHeap, Oid indexOid, ClusterStatistics *cs); ! static TransactionId copy_heap_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex, ClusterStatistics *cs); static List *get_tables_to_cluster(MemoryContext cluster_context); ! static int inversion_count(BlockIdData *bid_old, OffsetNumber off_old, BlockIdData *bid, OffsetNumber off); /*--- *** *** 176,182 heap_close(rel, NoLock); /* Do the job */ ! cluster_rel(rvtc, false); } else { --- 186,192 heap_close(rel, NoLock); /* Do the job */ ! cluster_rel(rvtc, false, stmt->verbose ? INFO : DEBUG2); } else { *** *** 225,231 StartTransactionCommand(); /* functions in indexes may want a snapshot set */ PushActiveSnapshot(GetTransactionSnapshot()); ! cluster_rel(rvtc, true); PopActiveSnapshot(); CommitTransactionCommand(); } --- 235,241 StartTransactionCommand(); /* functions in indexes may want a snapshot set */ PushActiveSnapshot(GetTransactionSnapshot()); ! cluster_rel(rvtc, true, stmt->verbose ? INFO : DEBUG2); PopActiveSnapshot(); CommitTransactionCommand(); } *** *** 253,261 * them incrementally while we load the table. */ static void ! cluster_rel(RelToCluster *rvtc, bool recheck) { Relation OldHeap; /* Check for user-requested abort. */ CHECK_FOR_INTERRUPTS(); --- 263,273 * them incrementally while we load the table. */ static void ! cluster_rel(RelToCluster *rvtc, bool recheck, int elevel) { Relation OldHeap; + PGRUsage ru0; + ClusterStatistics cs; /* Check for user-requested abort.
*/ CHECK_FOR_INTERRUPTS(); *** *** 343,349 check_index_is_clusterable(OldHeap, rvtc->indexOid, recheck); /* rebuild_relation does all the dirty work */ ! rebuild_relation(OldHeap, rvtc->indexOid); /* NB: rebuild_relation does heap_close() on OldHeap */ } --- 355,373 check_index_is_clusterable(OldHeap, rvtc->indexOid, recheck); /* rebuild_relation does all the dirty work */ ! ereport(elevel, ! (errmsg("clustering \"%s.%s\"", !
Re: [HACKERS] patch: Allow the UUID type to accept non-standard formats
On Oct 10, 2008, at 3:40 PM, Robert Haas wrote: I dislike all homegrown creations, because nobody will understand them and people will do the wrong thing; using non-standard formats is a bad thing. If it's necessary, then whoever needs it can find it on pgfoundry. But why smudge core? I'm opposed to smudging core, but I'm in favor of this patch. :-) Of course, I'm biased, because I wrote it. But I think that providing input and output functions that make it easy to read and write common formats, even if they happen to be non-standard, is useful. I tend to agree, but I have a hard time swallowing that when it means a 2-3% performance penalty for those that aren't using that functionality. I could perhaps see adding a function that accepted common UUID formats and spit out the standard one. If you could get rid of the performance hit, this might be more interesting. Perhaps default to assuming a good format and only fall back to something else if that doesn't work? -- Decibel!, aka Jim C. Nasby, Database Architect [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828
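Decibel's "assume a good format first" suggestion can be sketched as a strict fast path plus a permissive fallback. The lenient rules below (skip hyphens and braces anywhere) are an assumption for illustration, not necessarily what the patch under review accepts:

```c
#include <stdbool.h>
#include <string.h>

static int
hexval(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Fast path: the standard 8-4-4-4-12 form, nothing else. */
static bool
uuid_parse_strict(const char *s, unsigned char out[16])
{
    int nb = 0, i;

    if (strlen(s) != 36)
        return false;
    for (i = 0; i < 36;)
    {
        if (i == 8 || i == 13 || i == 18 || i == 23)
        {
            if (s[i] != '-')
                return false;
            i++;
            continue;
        }
        {
            int hi = hexval(s[i]), lo = hexval(s[i + 1]);

            if (hi < 0 || lo < 0)
                return false;
            out[nb++] = (unsigned char) ((hi << 4) | lo);
            i += 2;
        }
    }
    return true;
}

/* Slow path: accept 32 hex digits with hyphens/braces sprinkled anywhere. */
static bool
uuid_parse_lenient(const char *s, unsigned char out[16])
{
    int nb = 0, hi = -1;

    for (; *s; s++)
    {
        int v;

        if (*s == '-' || *s == '{' || *s == '}')
            continue;
        v = hexval(*s);
        if (v < 0 || nb == 16)
            return false;
        if (hi < 0)
            hi = v;
        else
        {
            out[nb++] = (unsigned char) ((hi << 4) | v);
            hi = -1;
        }
    }
    return nb == 16 && hi < 0;
}

static bool
uuid_parse(const char *s, unsigned char out[16])
{
    return uuid_parse_strict(s, out) || uuid_parse_lenient(s, out);
}
```

Standard-form input never pays for the permissive scan, which is the shape of answer to the 2-3% penalty objection: the cost lands only on the non-standard inputs that asked for it.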
Re: [HACKERS] Building Postgres in Eclipse
I have added a link at the bottom of the page. All the info on how to use Eclipse with Postgres can be found at http://wiki.postgresql.org/wiki/Working_with_Eclipse All suggestions and comments are welcome of course. Emmanuel Looks great, thanks! --Chris
Re: [HACKERS] TODO item: adding VERBOSE option to CLUSTER [with patch]
Jim Cox wrote: On Mon, Oct 13, 2008 at 8:30 AM, Tom Lane [EMAIL PROTECTED] wrote: Heikki Linnakangas [EMAIL PROTECTED] writes: No, I was thinking of something along the lines of: INFO: clustering public.my_c INFO: complete, was 33%, now 100% clustered The only such measure that we have is the correlation, which isn't very good anyway, so I'm not sure if that's worthwhile. It'd be possible to count the number of order reversals during the indexscan, ie the number of tuples with CTID lower than the previous one's. But I'm not sure how useful that number really is. It will look bad for patterns like: 2 1 4 3 6 5 ... which for all practical purposes is just as good as a perfectly sorted table. So no, I don't think that's a very useful metric either, without somehow taking caching effects into account. Another version of the patch should be attached, this time counting the number of inversions (pairs of tuples in the table that are in the wrong order) as a measure of the sortedness of the original data (scanned/live numbers are still reported as an indication of the extent to which the table was vacuumed). Until we have a better metric for sortedness, my earlier suggestion to print it was probably a bad idea. If anything, it should probably print the same correlation metric that ANALYZE calculates, so that it would at least match what the planner uses for decision-making. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
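Heikki's objection is easy to verify numerically with the reversal metric Tom described (each tuple whose position sorts below its predecessor's counts once). Plain ints stand in for CTIDs in this sketch:

```c
#include <stddef.h>

/* Count order reversals in scan order: the number of elements that sort
 * below their immediate predecessor. */
static size_t
count_reversals(const int *ctids, size_t n)
{
    size_t reversals = 0, i;

    for (i = 1; i < n; i++)
        if (ctids[i] < ctids[i - 1])
            reversals++;
    return reversals;
}
```

For the pattern 2 1 4 3 6 5 this reports 3 reversals out of 5 transitions -- it looks more than half unsorted -- even though every tuple is at most one slot out of place and sequential access is essentially as cheap as on a perfectly sorted table, which is exactly why the metric overstates disorder for such patterns.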
Re: [HACKERS] pg_upgrade: convert on read is dead end
Heikki Linnakangas napsal(a): Zdenek Kotala wrote: PostgreSQL stores all of a table's external data in one TOAST table, and the stored data does not contain any clue about the datatype. When PostgreSQL reads a TOASTed value, it is not possible to detect what type it is and perform varlena conversion on composite datatypes or arrays. It could be converted in the detoast_datum function, but that means the datum has to be retoasted and stored back on new pages. The old page MUST be kept in the old format, because any page conversion loses the version information, and different datatypes can be stored on one page. Hmm. There are two things here: 1. The format of the toast page, and the toast chunk tuples. 2. The format of the toasted tuple, stored within the chunk tuples. The format of the toast page and the varlena headers of the byteas in the toast tuples could be converted when the page is read in, as usual. Converting the page and the chunk varlena headers is not a problem. It works. You can get into trouble when the new max chunk size is smaller, but that is not currently the case. The format of the toasted tuple within the chunks is indeed trickier. Retoasting all tuples on the master heap page when the page is read in is probably simplest, as you say. But they don't necessarily need to be stored on new pages, any toast pages will do, because we keep track of 1. separately. You can store it on any page in the new format with enough free space, but in reality it will mostly be a new page. You cannot convert an old page, because it could contain chunks from a different tuple. You could check this and convert it only if all chunks belong to one datum, but in general you need a mechanism to handle pages with chunks from multiple datums. In my opinion, this issue completely kills the convert-on-read solution, and the read-all-formats solution is the right one. It is quite heavy-weight, I agree, but doesn't completely kill the idea in my opinion. Besides, the read-all-formats approach wouldn't really avoid it either.
If you passed a toasted datum to a function, when the function needs to detoast it, detoast_datum still wouldn't know whether the datum is in the old or new format. You'd still need to detoast all values in the tuple somewhere before they could be passed around. If you look into the pg_upgrade prototype patch, I added page version information into the HeapTupleData structure. It keeps information about the format. All chunked data are stored on pages with the same page version. I think these two things are enough to have all the necessary information. Zdenek -- Zdenek Kotala Sun Microsystems Prague, Czech Republic http://sun.com/postgresql
Re: [HACKERS] CLUSTER, REINDEX, VACUUM in read only transaction?
On Fri, 2008-10-10 at 09:41 -0400, Tom Lane wrote: So I was looking for other omissions in utility.c, and I noticed that check_xact_readonly() doesn't reject CLUSTER, REINDEX, or VACUUM. Now the notion of read only that we're trying to enforce is pretty weak (I think it's effectively no writes to non-temp tables). But I can't see that CLUSTER is a read-only operation even under the weakest definitions, and I'm not seeing the rationale for REINDEX or VACUUM here either. I think you should add a few more to the list. * LISTEN, UNLISTEN, NOTIFY * nextval() * ANALYZE * CHECKPOINT * GRANT, REVOKE, REASSIGN * DISCARD ALL should not run UnlistenAll These are all gaps I will have to plug for Hot Standby, and in fact wrote code this morning for some of those before I saw this post. (Yes, I went through every command). If you're gonna do it, do it soon please, so I know not to bother any further. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_upgrade: convert on read is dead end
Zdenek Kotala wrote: I expect that a function cannot get a toast pointer. It can get only detoasted attributes. Correct me if I'm wrong. A function is passed a raw datum, which can be toasted. Now, practically all functions call one of the PG_GETARG_* macros that detoast the datum before doing anything else, but there is also a PG_GETARG_RAW_VARLENA_P macro that returns the toast pointer. Although, grepping through the source code suggests that it's not actually used anywhere. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] pg_upgrade: convert on read is dead end
Zdenek Kotala wrote: Heikki Linnakangas napsal(a): Besides, the read-all-formats approach wouldn't really avoid it either. If you passed a toasted datum to a function, when the function needs to detoast it, detoast_datum still wouldn't know whether the datum is in the old or new format. You'd still need to detoast all values in the tuple somewhere before they could be passed around. If you look into the pg_upgrade prototype patch, I added page version information into the HeapTupleData structure. It keeps information about the format. All chunked data are stored on pages with the same page version. I think these two things are enough to have all the necessary information. Functions are passed just a Datum, not HeapTupleData. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
[HACKERS] Convert check constraints into One-Time_Filter on prepared statements
Hello, Constraint exclusion (CE) is very useful for partitioned tables, but prepared statements interfere with CE if the parameter contains partition keys, because CE is a planner-time optimization but the actual parameters are given at executor time. I have an idea to use constraint exclusion and prepared statements together -- converting check constraints into a One-Time Filter. For example, when we have a test table partitioned by test_{year}:

CREATE TABLE test PARTITIONED BY
  PARTITION test_2008 CHECK ('2008-01-01' <= t AND t < '2009-01-01')
  PARTITION test_2009 CHECK ('2009-01-01' <= t AND t < '2010-01-01')
  PARTITION test_2010 CHECK ('2010-01-01' <= t AND t < '2011-01-01')

and prepare a statement that has a partition key in the parameter:

PREPARE p(timestamp) AS SELECT * FROM test WHERE $1 <= t AND t < $1 + '1 mon';

Then the planner converts the check constraints into One-Time Filters. The plan will be the following:

EXPLAIN EXECUTE p('2008-07-01');
-> Append
   -> Result
        One-Time Filter: (('2008-01-01' <= $1) AND ($1 < '2009-01-01'))
        -> Index Scan on test_2008_t_key
             Index Cond: (($1 <= t) AND (t < ($1 + '1 mon')))
   -> Result
        One-Time Filter: (('2009-01-01' <= $1) AND ($1 < '2010-01-01'))
        -> Index Scan on test_2009_t_key
             Index Cond: (($1 <= t) AND (t < ($1 + '1 mon')))
   -> Result
        One-Time Filter: (('2010-01-01' <= $1) AND ($1 < '2011-01-01'))
        -> Index Scan on test_2010_t_key
             Index Cond: (($1 <= t) AND (t < ($1 + '1 mon')))

We can avoid internal scans when the One-Time Filter returns false for a partition. So we can reuse the plan and still receive the benefit of CE. Is this concept ok and worth trying? If it is reasonable, I'll try it. Comments welcome. Regards, --- ITAGAKI Takahiro NTT Open Source Software Center
Re: [HACKERS] TODO item: adding VERBOSE option to CLUSTER [with patch]
Jim Cox wrote: On Fri, Oct 10, 2008 at 10:23 AM, Heikki Linnakangas [EMAIL PROTECTED] wrote: Kevin Grittner wrote: Jim Cox [EMAIL PROTECTED] wrote: if present an INFO message is generated which displays the schema.tblname just before actual clustering is kicked off (see example below).

postgres=# CLUSTER VERBOSE ;
INFO: clustering public.my_b
INFO: clustering public.my_c
INFO: clustering public.my_a
CLUSTER

Would it make sense to display the pg_total_relation_size before and after? Assuming you run CLUSTER as a replacement for VACUUM FULL, yes. More interesting would be a metric of clusteredness, I think. Something more like the following?

postgres=# CLUSTER VERBOSE ;
INFO: clustering public.my_b
INFO: complete, 0 rows scanned, 0 rows now live
DETAIL: CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: clustering public.my_c
INFO: complete, 20 rows scanned, 10 rows now live
DETAIL: CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: clustering public.my_a
INFO: complete, 10 rows scanned, 10 rows now live
DETAIL: CPU 0.00s/0.00u sec elapsed 0.00 sec.
CLUSTER

No, I was thinking of something along the lines of:

INFO: clustering public.my_c
INFO: complete, was 33%, now 100% clustered

The only such measure that we have is the correlation, which isn't very good anyway, so I'm not sure if that's worthwhile. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] A small note about the difficulty of managing subprojects
On Oct 12, 2008, at 20:15, Joshua D. Drake wrote: Yeah I would agree with that. I find that only real issue with PgFoundry is the PgFoundry (aka Gforge) not the quality of the projects being hosted. The other thing that could use some love is searching for projects. Google doesn't rank pgFoundry stuff very highly, and Gforge's search functionality leaves something to be desired. As a lover of CPAN, I have to say that I don't use CPAN itself all that much; rather, I use search.cpan.org, which makes it dead easy to search for modules that have functionality I'm looking for. But improving search should come after fixing/upgrading Gforge, IMHO. Best, David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Year 2038 Bug?
Howdy, Not sure if PostgreSQL uses time_t, but if so, this project provides useful code (a drop-in replacement for time.h) to address the 2038 bug on 32-bit platforms. http://code.google.com/p/y2038/ Useful for PostgreSQL? Best, David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Year 2038 Bug?
David E. Wheeler [EMAIL PROTECTED] writes: Not sure if PostgreSQL uses time_t, We got rid of that some time ago. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Year 2038 Bug?
On Oct 13, 2008, at 11:01, Tom Lane wrote: David E. Wheeler [EMAIL PROTECTED] writes: Not sure if PostgreSQL uses time_t, We got rid of that some time ago. Probably no problem, then. Do dates in PostgreSQL work for their entire documented ranges on 32bit processors? Thanks, David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Year 2038 Bug?
David E. Wheeler [EMAIL PROTECTED] writes: Probably no problem, then. Do dates in PostgreSQL work for their entire documented ranges on 32bit processors? As long as the C compiler supports int64 ... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Year 2038 Bug?
On Oct 13, 2008, at 11:13, Tom Lane wrote: David E. Wheeler [EMAIL PROTECTED] writes: Probably no problem, then. Do dates in PostgreSQL work for their entire documented ranges on 32bit processors? As long as the C compiler supports int64 ... I was afraid you'd say that. See: http://code.google.com/p/y2038/wiki/WhyBother Especially the 64 bit CPU doesn't mean 2038 clean section. Again, maybe this doesn't apply to PostgreSQL; I'm just doing a bit of diligence. :-) Cheers, David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Year 2038 Bug?
David E. Wheeler napsal(a): On Oct 13, 2008, at 11:13, Tom Lane wrote: David E. Wheeler [EMAIL PROTECTED] writes: Probably no problem, then. Do dates in PostgreSQL work for their entire documented ranges on 32bit processors? As long as the C compiler supports int64 ... I was afraid you'd say that. See: http://code.google.com/p/y2038/wiki/WhyBother Especially the 64 bit CPU doesn't mean 2038 clean section. Again, maybe this doesn't apply to PostgreSQL; I'm just doing a bit of diligence. :-) PostgreSQL 8.4 uses a 64-bit data type for time. But if you use the system timezone data, then you can get into trouble if the system does not support 64-bit zic files. Zdenek -- Zdenek Kotala Sun Microsystems Prague, Czech Republic http://sun.com/postgresql
Re: [HACKERS] Year 2038 Bug?
David E. Wheeler wrote: On Oct 13, 2008, at 11:13, Tom Lane wrote: David E. Wheeler [EMAIL PROTECTED] writes: Probably no problem, then. Do dates in PostgreSQL work for their entire documented ranges on 32bit processors? As long as the C compiler supports int64 ... I was afraid you'd say that. See: http://code.google.com/p/y2038/wiki/WhyBother Especially the 64 bit CPU doesn't mean 2038 clean section. Again, maybe this doesn't apply to PostgreSQL; I'm just doing a bit of diligence. :-) Cheers, David PostgreSQL doesn't use the standard time_t and time functions for its timestamp types. Therefore, any limitations in regards to 64-bit time_t values on 32-bit platforms don't apply; other than the limitation Tom spoke of ... no 64-bit int. -- Andrew Chernow eSilo, LLC every bit counts http://www.esilo.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Year 2038 Bug?
On Oct 13, 2008, at 11:22, Zdenek Kotala wrote: PostgreSQL 8.4 uses 64bit data type for time. But if you use system timezone then you can get in trouble if system does not support 64bit zic files. I've never noticed a problem with the TZinfo database that ships on systems I've used. How would I know that there was a problem? What sort of trouble could I get into? Thanks, David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Year 2038 Bug?
On Oct 13, 2008, at 11:24, Andrew Chernow wrote: PostgreSQL doesn't use the standard time_t and time functions for its timestamp types. Therefore, any limitations in regards to 64- bit time_t values on 32-bit platforms don't apply; other than the limitation Tom spoke of ... no 64-bit int. Gotcha, thanks for the clarification. Best, David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Year 2038 Bug?
David E. Wheeler napsal(a): On Oct 13, 2008, at 11:22, Zdenek Kotala wrote: PostgreSQL 8.4 uses a 64-bit data type for time. But if you use the system timezone data, then you can get into trouble if the system does not support 64-bit zic files. I've never noticed a problem with the TZinfo database that ships on systems I've used. How would I know that there was a problem? What sort of trouble could I get into? When you use --with-system-tzdata and run make check on head, it fails on systems without 64-bit tzinfo support. Zdenek -- Zdenek Kotala Sun Microsystems Prague, Czech Republic http://sun.com/postgresql
[HACKERS] There's some sort of race condition with the new FSM stuff
Two different buildfarm machines are currently showing the same failure: http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=kududt=2008-10-13%2015:30:00 http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=dragonflydt=2008-10-13%2016:30:01 The postmaster log in each case shows ERROR: could not fsync segment 0 of relation 1663/16384/29270/1: No such file or directory ERROR: checkpoint request failed HINT: Consult recent messages in the server log for details. STATEMENT: DROP TABLESPACE testspace; ERROR: could not fsync segment 0 of relation 1663/16384/29270/1: No such file or directory ERROR: checkpoint request failed HINT: Consult recent messages in the server log for details. STATEMENT: DROP TABLESPACE testspace; which looks like an fsync request has been allowed to hang around too long, ie, after its file is already gone. Since /1 indicates the FSM fork, I suppose this has been introduced by the new FSM code. I haven't dug any more deeply than that though. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Year 2038 Bug?
On Oct 13, 2008, at 11:37, Zdenek Kotala wrote: when you use --with-system-tzdata and run make check on head it fails on systemes without 64bit tzinfo support. Oh. Is it not preferable to use the tzdata that ships with PostgreSQL? Thanks, David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Year 2038 Bug?
David E. Wheeler [EMAIL PROTECTED] writes: On Oct 13, 2008, at 11:37, Zdenek Kotala wrote: when you use --with-system-tzdata and run make check on head it fails on systemes without 64bit tzinfo support. Oh. Is it not preferable to use the tzdata that ships with PostgreSQL? Not necessarily; the system might have a more up-to-date tzdata. Generally you'd use --with-system-tzdata on a platform where you expect to receive routine package updates for the tzdata files, independently of the Postgres release cycle. It seems reasonable to assume that anyone currently shipping tzdata is offering 64-bit files. (But we do have that regression test check in there to make sure.) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Year 2038 Bug?
On Oct 13, 2008, at 11:53, Tom Lane wrote: Oh. Is it not preferable to use the tzdata that ships with PostgreSQL? Not necessarily; the system might have a more up-to-date tzdata. Gotcha. Generally you'd use --with-system-tzdata on a platform where you expect to receive routine package updates for the tzdata files, independently of the Postgres release cycle. It seems reasonable to assume that anyone currently shipping tzdata is offering 64-bit files. (But we do have that regression test check in there to make sure.) Understood. Thanks for the explanation (and the regression test!). Best, David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TODO item: adding VERBOSE option to CLUSTER [with patch]
On Mon, 2008-10-13 at 08:30 -0400, Tom Lane wrote: Heikki Linnakangas [EMAIL PROTECTED] writes: No, I was thinking of something along the lines of: INFO: clustering public.my_c INFO: complete, was 33%, now 100% clustered The only such measure that we have is the correlation, which isn't very good anyway, so I'm not sure if that's worthwhile. It'd be possible to count the number of order reversals during the indexscan, ie the number of tuples with CTID lower than the previous one's. But I'm not sure how useful that number really is. Also it's not clear how to preserve such functionality if cluster is re-implemented with a sort. I assume here you mean a CTID with a lower page number, as the line pointer wouldn't make any difference, right? I think it would be a useful metric to decide whether or not to use an index scan (I don't know how easy it is to estimate this from a sample, but a CLUSTER could clearly get an exact number). It would solve the problem where synchronized scans used by pg_dump could result in poor correlation on restore and therefore not choose index scans (which is what prompted turning off sync scans for pg_dump). Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Year 2038 Bug?
Tom Lane napsal(a): Generally you'd use --with-system-tzdata on a platform where you expect to receive routine package updates for the tzdata files, independently of the Postgres release cycle. It seems reasonable to assume that anyone currently shipping tzdata is offering 64-bit files. (But we do have that regression test check in there to make sure.) Unfortunately, you are not correct here :( see: http://bugs.opensolaris.org/view_bug.do?bug_id=4246033 Zdenek -- Zdenek Kotala Sun Microsystems Prague, Czech Republic http://sun.com/postgresql -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WITH RECURSIVE ... CYCLE in vanilla SQL: issues with arrays of rows
On Mon, Oct 13, 2008 at 9:56 AM, Tom Lane [EMAIL PROTECTED] wrote: I wrote: Merlin Moncure [EMAIL PROTECTED] writes: select foo from foo order by foo; ERROR: could not identify an ordering operator for type foo Yeah, these are because of the incomplete handling of named record types. I'm not sure how far we want to go in that direction. On looking closer, all these cases fail because I forgot to teach IsBinaryCoercible() that any composite type should be considered binary-coercible to RECORD. Which is clearly sensible. I'm inclined to apply the patch with binary-coercibility adjustments and not try to turn RECORD or RECORD[] into full-fledged polymorphic types. It's not immediately clear what the use of that would be anyway. ...meaning, that you would not be able to create a function taking generic 'record' as a parameter? In that case I agree...any chance of getting an updated patch? merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Year 2038 Bug?
On Oct 13, 2008, at 12:35, Zdenek Kotala wrote: Tom Lane napsal(a): Generally you'd use --with-system-tzdata on a platform where you expect to receive routine package updates for the tzdata files, independently of the Postgres release cycle. It seems reasonable to assume that anyone currently shipping tzdata is offering 64-bit files. (But we do have that regression test check in there to make sure.) Unfortunately, you are not correct here :( see: http://bugs.opensolaris.org/view_bug.do?bug_id=4246033 So ideally all OS venders would ship 64-bit tzdata files, eh? Best, David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Well its official, replicator is BSD
Hello, We finally got around to releasing Replicator as FOSS. It is BSD licensed and available here: https://projects.commandprompt.com/public/replicator/wiki (Yes, it is a self-signed cert; it's on the list to fix.) Enjoy folks! Sincerely, Joshua D. Drake -- The PostgreSQL Company since 1997: http://www.commandprompt.com/ PostgreSQL Community Conference: http://www.postgresqlconference.org/ United States PostgreSQL Association: http://www.postgresql.us/
Re: [HACKERS] TODO item: adding VERBOSE option to CLUSTER [with patch]
Heikki Linnakangas [EMAIL PROTECTED] writes: Jim Cox wrote: On Mon, Oct 13, 2008 at 8:30 AM, Tom Lane [EMAIL PROTECTED] wrote: Heikki Linnakangas [EMAIL PROTECTED] writes: It'd be possible to count the number of order reversals during the indexscan, ie the number of tuples with CTID lower than the previous one's. But I'm not sure how useful that number really is. Incidentally, it finally occurred to me that sortedness is actually a pretty good term to search on. I found several papers for estimating metrics of sortedness from samples, even. Though the best looks like it requires a sample of size O(sqrt(n)), which is more than we currently take. The two metrics which seem popular are either the length of the longest sorted subsequence or the number of sorted subsequences. I think the latter is equivalent to counting the inversions. I didn't find any papers which claimed to present good ways to draw conclusions based on these metrics, but I only did a quick search. I imagine if everyone is looking for ways to estimate them then they must be useful for something... For some reason my access to the ACM digital library stopped working. Does anyone else have access? It will look bad for patterns like: 2 1 4 3 6 5 .. Hm, you could include some measure of how far the inversion goes -- but I think that's counter-productive. Sure, some of them will be cached but others won't, and that'll be equally bad regardless of how far back it goes. Until we have a better metric for sortedness, my earlier suggestion to print it was probably a bad idea. If anything, we should probably print the same correlation metric that ANALYZE calculates, so that it would at least match what the planner uses for decision-making. I agree with that. I like the idea of printing a message though -- we should just have it print the correlation for now, and when we improve the stats we'll print the new metric.
-- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's On-Demand Production Tuning
Re: [HACKERS] There's some sort of race condition with the new FSM stuff
I wrote: Two different buildfarm machines are currently showing the same failure: ERROR: could not fsync segment 0 of relation 1663/16384/29270/1: No such file or directory ERROR: checkpoint request failed Some tests show that when the serial regression tests are run in a freshly initdb'd installation, HEAD assigns OID 29270 to bmscantest in the bitmapops test. So that's been dropped some time before the failure occurs; which means that this isn't a narrow-window race condition; which raises the question of why we're not seeing it on more machines. I notice now that kudu and dragonfly are actually the same machine ... could this be an OS-specific problem? Kris, has there been any system-software change on that machine recently? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TODO item: adding VERBOSE option to CLUSTER [with patch]
Gregory Stark [EMAIL PROTECTED] writes: Heikki Linnakangas [EMAIL PROTECTED] writes: Until we have a better metric for sortedness, my earlier suggestion to print it was probably a bad idea. If anything, should probably print the same correlation metric that ANALYZE calculates, so that it would at least match what the planner uses for decision-making. I agree with that. I like the idea of printing a message though -- we should just have it print the correlation for now and when we improve the stats we'll print the new metric. Short of actually running an ANALYZE, I'm not seeing a good way to derive the same number it derives. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TODO item: adding VERBOSE option to CLUSTER [with patch]
Tom Lane [EMAIL PROTECTED] writes: Gregory Stark [EMAIL PROTECTED] writes: I agree with that. I like the idea of printing a message though -- we should just have it print the correlation for now and when we improve the stats we'll print the new metric. Short of actually running an ANALYZE, I'm not seeing a good way to derive the same number it derives. Well we could print the _old_ value at least. So if you run cluster periodically you can see whether you're running it often enough. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's 24x7 Postgres support! -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TODO item: adding VERBOSE option to CLUSTER [with patch]
Gregory Stark [EMAIL PROTECTED] writes: Tom Lane [EMAIL PROTECTED] writes: Short of actually running an ANALYZE, I'm not seeing a good way to derive the same number it derives. Well we could print the _old_ value at least. +1 ... seems an appropriate amount of effort for the likely value. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TODO item: adding VERBOSE option to CLUSTER [with patch]
Tom Lane [EMAIL PROTECTED] wrote: Gregory Stark [EMAIL PROTECTED] writes: Tom Lane [EMAIL PROTECTED] writes: Short of actually running an ANALYZE, I'm not seeing a good way to derive the same number it derives. Well we could print the _old_ value at least. +1 ... seems an appropriate amount of effort for the likely value. That seems fine for sortedness, but am I the only one who would like the verbose mode to show the bloat reduction? Essentially, an INFO line to show the same information you could get by bracketing the CLUSTER with a couple SELECTs:

ccdev=# select pg_total_relation_size('DbTranImageStatus');
 pg_total_relation_size
------------------------
                 253952
(1 row)

ccdev=# cluster DbTranImageStatus;
CLUSTER
ccdev=# select pg_total_relation_size('DbTranImageStatus');
 pg_total_relation_size
------------------------
                  32768
(1 row)

-Kevin
Re: [HACKERS] There's some sort of race condition with the new FSM stuff
On Mon, 13 Oct 2008, Tom Lane wrote: I notice now that kudu and dragonfly are actually the same machine ... could this be an OS-specific problem? Kris, has there been any system-software change on that machine recently? This is a VM that I haven't touched in some time. It was turned off after a host kernel upgrade no longer allowed vmware to work on it. I recently turned it back on after switching from vmware workstation 5.5 to vmware server 2.0. On the VM itself the only change I've made was to switch the timezone from Mountain to Pacific. http://pgbuildfarm.org/cgi-bin/show_history.pl?nm=dragonflybr=HEAD Kris Jurka -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TODO item: adding VERBOSE option to CLUSTER [with patch]
On Mon, 13 Oct 2008 15:34:04 -0500 Kevin Grittner [EMAIL PROTECTED] wrote: ccdev=# select pg_total_relation_size('DbTranImageStatus'); pg_total_relation_size 253952 (1 row) ccdev=# cluster DbTranImageStatus; CLUSTER ccdev=# select pg_total_relation_size('DbTranImageStatus'); pg_total_relation_size 32768 (1 row) -Kevin Although I think it is an interesting bit of information, I find that if I am going to be clustering, I have already done the above. Sincerely, Joshua D. Drake -- The PostgreSQL Company since 1997: http://www.commandprompt.com/ PostgreSQL Community Conference: http://www.postgresqlconference.org/ United States PostgreSQL Association: http://www.postgresql.us/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] How is random_page_cost=4 ok?
Greg Smith wrote: The drives themselves, and possibly the OS and disk controller, are all running read-ahead algorithms to accelerate this case. In fact, this is the *exact* case the Linux read-ahead work that just went mainline recently is aimed at: http://kerneltrap.org/node/6642 Apparently only the simple stuff hit mainline, see [1] and [2], so it's not known how this turns out for pg-style loads, especially compared to the full-fledged patch. Readahead is probably such a beast that no one dares to touch it with a three-foot pole, unless given a large team with good standing in the kernel community and concerted regression testing in whatever environments Linux is used in these days... michael [1] http://lwn.net/Articles/235164/ [2] http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=history;f=mm/readahead.c
Re: [HACKERS] 8.3 .4 + Vista + MingW + initdb = ACCESS_DENIED
Magnus Hagander wrote: Second, once I've successfully built and installed postgres, I run into a bigger problem. When using initdb, I get this error: creating template1 database in c:/Data/postgres30/base/1 ... FATAL: could not create shared memory segment: 5 DETAIL: Failed system call was CreateFileMapping(size=1802240, name=Global\PostgreSQL:c:/Data/postgres). A bit of googling and reading MSDN docs shows that applications that don't run in Session 0 on Vista are not allowed to create shared memory in the Global namespace. Since initdb is invoked from the command line, it runs in Session 1. Where did you find that information? I've been specifically looking for it, but my searches didn't turn up anything conclusive. The latest versions contain a fix for the global namespace code. Dave noticed that this caused issues on Vista and thus manually reverted the patch in the official binary installer. But since we haven't (hadn't) yet found documentation as to *why* it was failing, the patch has not yet been reverted in the main source tree. This is probably why it's working for you, and it's not related to how it's built. If you want to revert the patch in your local tree, this is the one: http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/port/win32_shmem.c?r1=1.4&r2=1.5 Specifically, you can reintroduce the old bug (which I think is what made it work on Vista) by removing the +18 in the lowest loop there. I am getting this error on XP Pro. The buildfarm members run happily from the scheduler, but when run by hand from the command line they fail. This is true of both MinGW and MSVC. This is in *URGENT* need of a fix. cheers andrew
Re: [HACKERS] WITH RECURSIVE ... CYCLE in vanilla SQL: issues with arrays of rows
Merlin Moncure [EMAIL PROTECTED] writes: On Mon, Oct 13, 2008 at 9:56 AM, Tom Lane [EMAIL PROTECTED] wrote: I'm inclined to apply the patch with binary-coercibility adjustments and not try to turn RECORD or RECORD[] into full-fledged polymorphic types. It's not immediately clear what the use of that would be anyway. ...meaning, that you would not be able to create a function taking generic 'record' as a parameter? Well, you've never been able to do that, although for many of the PLs there doesn't seem to be any very fundamental reason why not. But I was actually wondering about something beyond that: should we have the equivalent of the polymorphic-type behaviors for composites? That would mean rules along the line of all records mentioned in the call and result are the same composite type and record[] means the array type corresponding to whichever type record is. We don't seem to need these things in order to solve the recursion cycle detection problem, so I'm not very excited about pursuing the line of thought any further right now. In that case I agree...any chance of getting an updated patch? See CVS HEAD ... regards, tom lane
Re: [HACKERS] Window Functions patch v06
Hitoshi Harada wrote: I made up my mind to scratch former window functions and redesigned completely new execution model, based on the discussion with Heikki. Attached is the v06 against HEAD today. http://umitanuki.net/pgsql/wfv06/design.html First off, fantastic work! In my eyes this and WITH RECURSIVE are a big step for both Postgres and open-source RDBMSs. Just one small query about LEAD() and LAG(). Going by http://www.wiscorp.com/sql200n.zip The lead and lag functions each take three arguments: a value expression VE, an exact numeric literal OFFSET, and a value expression DEFAULT. For each row R within the window partition P of R defined by a window structure descriptor, the lag function returns the value of VE evaluated on a row that is OFFSET number of rows before R within P, and the lead function returns the value of VE evaluated on a row that is OFFSET number of rows after R within P. The value of DEFAULT is returned as the result if there is no row corresponding to the OFFSET number of rows before R within P (for the lag function) or after R within P (for the lead function). In addition, RESPECT NULLS or IGNORE NULLS can be specified to indicate whether the rows within P for which VE evaluates to the null value are preserved or eliminated. So going by that: SELECT name,LAG(name,1,'None') OVER (ORDER BY employeeid) FROM employee; would use 'None' for rows that would be out of the bounds of the window. The current patch only seems to accept 2 arguments. ERROR: function lag(character varying, integer, unknown) does not exist
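The OFFSET/DEFAULT behaviour quoted from the spec above can be modeled in a few lines. This is only an illustrative sketch of the intended semantics over one ordered partition, not PostgreSQL code; the function and variable names are invented for the example:

```python
def lag(values, offset=1, default=None):
    """Model of SQL LAG(VE, OFFSET, DEFAULT): for each row, return the value
    OFFSET rows earlier in the ordered partition, or DEFAULT out of bounds."""
    return [values[i - offset] if i - offset >= 0 else default
            for i in range(len(values))]

def lead(values, offset=1, default=None):
    """Model of SQL LEAD(VE, OFFSET, DEFAULT): same, but OFFSET rows later."""
    return [values[i + offset] if i + offset < len(values) else default
            for i in range(len(values))]

names = ["alice", "bob", "carol"]
print(lag(names, 1, "None"))   # ['None', 'alice', 'bob']
print(lead(names, 1, "None"))  # ['bob', 'carol', 'None']
```

With the three-argument form, the first row of the partition gets the DEFAULT value ('None' here) instead of NULL, which is the behaviour the query above expects.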
Re: [HACKERS] WITH RECURSIVE ... CYCLE in vanilla SQL: issues with arrays of rows
Tom Lane wrote: Merlin Moncure [EMAIL PROTECTED] writes: On Mon, Oct 13, 2008 at 9:56 AM, Tom Lane [EMAIL PROTECTED] wrote: I'm inclined to apply the patch with binary-coercibility adjustments and not try to turn RECORD or RECORD[] into full-fledged polymorphic types. It's not immediately clear what the use of that would be anyway. ...meaning, that you would not be able to create a function taking generic 'record' as a parameter? Well, you've never been able to do that, although for many of the PLs there doesn't seem to be any very fundamental reason why not. Yeah, it seems an arbitrary restriction for no very good reason. When I was working on PL/php (years ago) I tried to make it work because I found it useful for some use case I was trying, but couldn't. I don't remember the details (and PL/php has been pretty much abandoned since then anyway.) -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
Re: [HACKERS] Window Functions patch v06
2008/10/14 David Rowley [EMAIL PROTECTED]: Hitoshi Harada wrote: I made up my mind to scratch former window functions and redesigned completely new execution model, based on the discussion with Heikki. Attached is the v06 against HEAD today. http://umitanuki.net/pgsql/wfv06/design.html First off, fantastic work! In my eyes this and WITH RECURSIVE are a big step for both Postgres and open source RBDMS'. Only, one small query with LEAD() and LAG() Going by http://www.wiscorp.com/sql200n.zip The lead and lag functions each take three arguments, a value expression VE, an exact numeric literal OFFSET, and a value expression DEFAULT. For each row R within the window partition P of R defined by a window structure descriptor, the lag function returns the value of VE evaluated on a row that is OFFSET number of rows before R within P, and the lead function returns the value of VE evaluated on a row that is OFFSET number of rows after R within P. The value of DEFAULT is returned as the result if there is no row corresponding to the OFFSET number of rows before R within P (for the lag function) or after R within P (for the lead function). In addition, RESPECT NULLS or IGNORE NULLS can be specified to indicate whether the rows within P for which VE evaluates to the null value are preserved or eliminated So going by that: SELECT name,LAG(name,1,'None') OVER (ORDER BY employeeid) FROM employee; Would use 'None' for rows that would be out of the bounds of the window. The current patch only seems to accept 2 arguments. ERROR: function lag(character varying, integer, unknown) does not exist Thanks for your feedback. I agree I need to work on that. Also from the spec, RESPECT NULLS / IGNORE NULLS may be specified, but that is not supported yet. This syntax falls outside the general Postgres function-call syntax, so I wonder whether those functions need to be treated specially.
Regards, -- Hitoshi Harada
Re: [HACKERS] Window Functions patch v06
Hitoshi Harada [EMAIL PROTECTED] writes: I agree I need to work on that. Also from the spec, RESPECT NULLS / IGNORE NULLS may be specified but not supported yet. This syntax specification is out of the postgres general function call so I wonder if those functions are treated specially or not. Egad, the SQL committee has certainly been taken over by creeping COBOL-itis when it comes to inventing random new syntax ... regards, tom lane
[HACKERS] xact_desc
ISTM that xact_desc routines do not work properly when called with WAL_DEBUG enabled from XLogInsert(). LOG: INSERT @ 0/3740978: prev 0/3740938; xid 5699: Transaction - commit: 2008-10-14 03:14:14.866437+01; subxacts: 10447936 0 STATEMENT: commit; LOG: INSERT @ 0/37409F0: prev 0/37409B0; xid 5702: Transaction - commit: 2008-10-14 03:14:17.687843+01; subxacts: 10447936 0 STATEMENT: commit; The arrays... work fine in recovery, just not prior to inserting. Anyway, that led me a merry dance with other code. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
Re: [HACKERS] xact_desc
Simon Riggs [EMAIL PROTECTED] writes: ISTM that xact_desc routines do not work properly when called with WAL_DEBUG enabled from XLogInsert(). Well, now that you mention it, that code is utterly, completely broken, and always has been. It's passing only the first rdata-chunk of the WAL record to the print routine :-( AFAICS the only way to fix it would be to allocate some workspace and assemble the chunks of the record into that. Doubtful that it's worth it --- I'd be inclined to just remove the code instead. regards, tom lane
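The fix Tom sketches (walking the chain of rdata chunks and copying them into one workspace buffer before handing the record to the desc routine) can be modeled abstractly. This is only an illustrative sketch; the class and field names are invented for the example and do not reflect the actual XLogRecData layout in C:

```python
# Illustrative model: a WAL record is built from a linked chain of chunks.
# The desc routine needs the record contiguous, so the chain must be
# flattened into one buffer first; passing only the first chunk truncates it.
class RDataChunk:
    def __init__(self, data, next_chunk=None):
        self.data = data          # bytes belonging to this chunk
        self.next = next_chunk    # next chunk in the chain, or None

def assemble_record(chain):
    """Concatenate every chunk in the chain into one contiguous buffer."""
    buf = bytearray()
    chunk = chain
    while chunk is not None:
        buf += chunk.data
        chunk = chunk.next
    return bytes(buf)

chain = RDataChunk(b"header|", RDataChunk(b"xid|", RDataChunk(b"subxacts")))
print(assemble_record(chain))  # b'header|xid|subxacts'
print(chain.data)              # b'header|'  (all the broken code ever saw)
```

The truncated subxact arrays in Simon's log lines are consistent with this: only the first chunk's bytes ever reached the desc routine.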
Re: [HACKERS] Updates of SE-PostgreSQL 8.4devel patches
Andrew Sullivan wrote: On Fri, Oct 10, 2008 at 01:44:49PM +0900, KaiGai Kohei wrote: Andrew Sullivan wrote: I want to focus on this description, because you appear to be limiting the problem scope tremendously here. We've moved from general security policy for database system to security policy for database system as part of a web-application stack. "General security policy for a database system" is an incorrect term. SELinux does not cover the database system only. It covers the operating system and applications that manage objects (like database objects, X windows, ...). Thus, it should be described as a general security policy for the operating system, database system and so on. Ok, then let's use the broader case, which is general security policy for entire computing system including a RDBM subsystem (call this GSPECS+DB, say). This shows even more clearly the issue that considering primarily the application stack does not actually cover all the cases. I'm not suggesting, even a little bit, that securing an application stack as you propose is a waste of time. It could be, actually, that this more modest goal is the more appropriate one, and that SE-PostgreSQL would be a killer feature in this space (because it would, if it worked, solve a lot of problems that other systems have, as you have pointed out). But it is not GSPECS+DB, because of all the corner case problems whose behaviour still needs working out. Indeed, SE-PostgreSQL is an important piece of GSPECS+DB, but it cannot achieve the ultimate goal by itself. Do you know of other efforts to apply SELinux security policy to objects managed in userspace? One prior example is the X Window System. Its resources are managed by the X server, so in-kernel SELinux cannot trap accesses to those objects. Some of them (like the cut-and-paste buffer and key input events) can be shared by several processes, so they should be controlled by the policy. We can call this GSPECS+X. As is widely known, security is endless work.
The ultimate goal may not be within our immediate grasp, but that does not mean it is unnecessary to fill in the pieces that lead toward it. But plainly, others who need to look after the code will want to know what the exact goal is before committing themselves to future maintenance. It is the same thing I have repeated several times: the goal is to apply a centrally manageable security policy (SELinux) to database objects, just as SELinux does for filesystem objects. This feature can help web-application security, for example. Thanks, -- OSS Platform Development Division, NEC KaiGai Kohei [EMAIL PROTECTED]
Re: [HACKERS] Updates of SE-PostgreSQL 8.4devel patches - Patent problems?
Andres Freund wrote: Hi, It might be relevant for the whole discussion about inclusion of some form of row level permissions, whatever based on, that there exist heaps of (in my eyes conflicting) patents about row level permissions for relational databases. I don't have any real clue about patent issues, but I fear that makes inclusion into an open source product rather hard... I'm not a lawyer, so we cannot decide whether there is a patent issue until we actually get an adjudication. However, from an engineering viewpoint, I don't think our design conflicts with the existing patents. Data security system and method - 5751949 - MCI Corp. - 1998 It says the row-level access controls are applied by forcing users to access tables via views. It does not conflict with our design. Rule based database security system and method - 6820082 - Allegis Corporation - 2004 It says the row-level access controls are applied based on query modification. The legacy implementation of SE-PostgreSQL indeed modified the WHERE clause of given queries to apply row-level access controls, but the current one does not. Row-level security in a relational database management system - 7240046 - IBM - 2007 It says the row-level access controls are applied based on a hierarchical relationship between subject and object, which is well known as the Bell-LaPadula security model. SE-PostgreSQL does not have any rules by itself, because it depends on an external security feature (SELinux). A database ACL is obviously not a hierarchical security model. Database fine-grained access control - 7281003 - Oracle - 2007 It says the row-level access controls are applied based on query modification, as patent 6820082 does. It does not conflict with SE-PostgreSQL. Thanks, -- OSS Platform Development Division, NEC KaiGai Kohei [EMAIL PROTECTED]