Re: [HACKERS] BufferAccessStrategy for bulk insert
On Thu, 2008-10-30 at 23:05 -0400, Robert Haas wrote: > > Whatever timings you have are worth publishing. > > Here are the timings for copying the first ten million integers into a > one-column table created in the same transaction, with and without the > patch. As you can see, now that I've corrected my previous error of > not putting CREATE TABLE and COPY in the same transaction, the savings > are quite substantial, about 15%. Nice! I had faith. ;-) Can you test whether using the buffer access strategy is a win or a loss? Most of that gain is probably coming from the reduction in pinning. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] array_agg and array_accum (patch)
On Wed, 2008-10-29 at 00:08 -0400, Robert Haas wrote: > It's worth noting that this is the third version of this idea that has > been submitted. Ian Caulfield submitted a patch to add this, and so > did I. Someone should probably look at all three of them and compare. > If we include a function named array_accum(), it should return an empty array on no input to match the function in the docs: http://www.postgresql.org/docs/8.3/static/xaggr.html Your function returns NULL on no input, which seems more like array_agg(). Aside from that, I'm pretty open to anything, as long as one of our patches makes it. If there are potential problems with the standard (where we don't want to implement a violation), we should just do array_accum(). If not, we might as well do the standard array_agg(), perhaps without the ORDER BY clause. We could also do both, because it is a little annoying to coalesce the result of array_agg(). Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
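For reference, the array_accum() in the docs that Jeff is comparing against is defined with a non-NULL initial condition, which is exactly what makes it return an empty array on no input:

```sql
-- From the 8.3 "User-Defined Aggregates" docs: initcond = '{}' is
-- what yields an empty array, rather than NULL, for zero input rows.
CREATE AGGREGATE array_accum (anyelement)
(
    sfunc = array_append,
    stype = anyarray,
    initcond = '{}'
);
```

A SQL-standard-style array_agg() would instead leave initcond unset, yielding NULL on empty input.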
Re: [HACKERS] can we create a record of more than one field?
"Jaime Casanova" <[EMAIL PROTECTED]> writes: > in the spanish list someone reports that the following function > doesn't even compile: > CREATE FUNCTION select_temporal(OUT valor integer) RETURNS SETOF record AS > $BODY$ > begin > return query select 1::integer; > return; > end; > $BODY$ > LANGUAGE 'plpgsql' VOLATILE; > and this is the error message > ERROR: function result type must be integer because of OUT parameters It's not a record, it's an integer... if you had more than one out parameter then "record" would be sensible ... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] can we create a record of more than one field?
Hi, in the Spanish list someone reports that the following function doesn't even compile:

CREATE FUNCTION select_temporal(OUT valor integer) RETURNS SETOF record AS
$BODY$
begin
    return query select 1::integer;
    return;
end;
$BODY$
LANGUAGE 'plpgsql' VOLATILE;

and this is the error message:

ERROR: function result type must be integer because of OUT parameters

I was expecting the result to be a record because of the SETOF record. Is this a bug or intentional? This is in 8.3.4. -- regards, Jaime Casanova Soporte y capacitación de PostgreSQL Asesoría y desarrollo de sistemas Guayaquil - Ecuador Cel. +59387171157 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
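Per Tom's explanation, a single OUT parameter collapses the function's result type to that parameter's type (here SETOF integer), which conflicts with RETURNS SETOF record. With two OUT parameters the same shape compiles — a sketch (the second parameter is invented for illustration):

```sql
-- With two (or more) OUT parameters the result really is a record,
-- so RETURNS SETOF record is accepted.
CREATE FUNCTION select_temporal(OUT valor integer, OUT etiqueta text)
RETURNS SETOF record AS
$BODY$
begin
    return query select 1::integer, 'uno'::text;
    return;
end;
$BODY$
LANGUAGE 'plpgsql' VOLATILE;
```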
Re: [HACKERS] BufferAccessStrategy for bulk insert
> Whatever timings you have are worth publishing.

Here are the timings for copying the first ten million integers into a one-column table created in the same transaction, with and without the patch. As you can see, now that I've corrected my previous error of not putting CREATE TABLE and COPY in the same transaction, the savings are quite substantial, about 15%.

Trunk:
Time: 18931.516 ms
Time: 18251.732 ms
Time: 17284.274 ms
Time: 15900.131 ms
Time: 16439.617 ms

Patch:
Time: 14852.123 ms
Time: 15673.759 ms
Time: 15776.450 ms
Time: 14160.266 ms
Time: 13374.243 ms

...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
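The test script itself isn't shown in the thread; presumably it was something along these lines (the data-file path is hypothetical):

```sql
-- Sketch of the benchmark: CREATE TABLE and COPY in one transaction,
-- which (with WAL archiving off) lets COPY skip WAL-logging the data,
-- on top of the patch's reduced buffer pinning.
BEGIN;
CREATE TABLE ints (i integer);
COPY ints FROM '/tmp/ten_million_ints.dat';  -- hypothetical data file
COMMIT;
```

psql's \timing would produce the "Time: ... ms" lines quoted above.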
Re: [HACKERS] BufferAccessStrategy for bulk insert
> You should try profiling the patch. You can count the invocations of the > buffer access routines to check it's all working in the right ratios. *goes and learns how to profile PostgreSQL* OK, that was a good suggestion. It looks like part of my problem here is that I didn't put the CREATE TABLE and the COPY into the same transaction. As a result, a lot of time was spent on XLogInsert. Modified the test case, new profiling results attached. ...Robert gmon.trunk.txt.gz Description: GNU Zip compressed data gmon.patched.txt.gz Description: GNU Zip compressed data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Strange query behavior where clause produces odd behavior on '>' query
> -Original Message- > From: Tom Lane [mailto:[EMAIL PROTECTED] > Sent: Thursday, October 30, 2008 6:34 PM > To: Dann Corbit > Cc: [HACKERS]; Sherry Griffin > Subject: Re: [HACKERS] Strange query behavior where clause produces odd > behavior on '>' query > > "Dann Corbit" <[EMAIL PROTECTED]> writes: > >> What encoding/locale are you using? > > > Whatever the default encoding/locale is. > > "Whatever" is the wrong answer here. I just finished verifying that > the > sort order you're complaining about is the expected ordering in some > locales. I suggest that you take the trouble to find out. English (United States) is my locale. > > Are you unable to reproduce it? > > Well, I see this on a Fedora machine: > > $ cat foo > specd > SPECD > $ sort foo > SPECD > specd > $ LANG=en_US sort foo > specd > SPECD > $ The compare works as it should. The only bug was in my understanding. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Sun Studio compiler warnings
Ooooh yeah. Time for some caffeine. ...Robert On Thu, Oct 30, 2008 at 9:34 PM, Alvaro Herrera <[EMAIL PROTECTED]> wrote: > Robert Haas wrote: >> > The closing semicolon is strictly speaking not allowed here. We could >> > remove it, but that would probably upset pgindent? >> > >> > I recall that we used to have a bunch of similar problems with the AIX >> > compilers a long time ago. Does anyone recall the solution, and do we >> > still >> > care? (Note that it's only a warning in this case.) >> >> How about the good old >> >> do { >> ... >> } while (0) >> >> trick? > > That can't be used because the macro is defining a completely new > function. > > -- > Alvaro Herrera    http://www.CommandPrompt.com/ > PostgreSQL Replication, Consulting, Custom Development, 24x7 support > -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Strange query behavior where clause produces odd behavior on '>' query
"Dann Corbit" <[EMAIL PROTECTED]> writes: >> What encoding/locale are you using? > Whatever the default encoding/locale is. "Whatever" is the wrong answer here. I just finished verifying that the sort order you're complaining about is the expected ordering in some locales. I suggest that you take the trouble to find out. > Are you unable to reproduce it? Well, I see this on a Fedora machine: $ cat foo specd SPECD $ sort foo SPECD specd $ LANG=en_US sort foo specd SPECD $ regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Sun Studio compiler warnings
Robert Haas wrote: > > The closing semicolon is strictly speaking not allowed here. We could > > remove it, but that would probably upset pgindent? > > > > I recall that we used to have a bunch of similar problems with the AIX > > compilers a long time ago. Does anyone recall the solution, and do we still > > care? (Note that it's only a warning in this case.) > > How about the good old > > do { > ... > } while (0) > > trick? That can't be used because the macro is defining a completely new function. -- Alvaro Herrera    http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Strange query behavior where clause produces odd behavior on '>' query
> -Original Message- > From: Tom Lane [mailto:[EMAIL PROTECTED] > Sent: Thursday, October 30, 2008 5:31 PM > To: Dann Corbit > Cc: [HACKERS]; Sherry Griffin > Subject: Re: [HACKERS] Strange query behavior where clause produces odd > behavior on '>' query > > "Dann Corbit" <[EMAIL PROTECTED]> writes: > > The following query: > > SELECT * FROM Customers_connxstore where customerid > 'specd' > > Returns the row containing Customers_connxstore.customerid == 'SPECD' > > What encoding/locale are you using? Whatever the default encoding/locale is. I did not define any custom encoding, locale, or collating sequence. > And while I'm asking, which PG > version? All versions from PostgreSQL 7.1.3 to "PostgreSQL 8.3.3, compiled by Visual C++ build 1400" seem to display this behavior. Are you unable to reproduce it? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Sun Studio compiler warnings
> The closing semicolon is strictly speaking not allowed here. We could > remove it, but that would probably upset pgindent? > > I recall that we used to have a bunch of similar problems with the AIX > compilers a long time ago. Does anyone recall the solution, and do we still > care? (Note that it's only a warning in this case.) How about the good old do { ... } while (0) trick? ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
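For readers who haven't met it: the do { ... } while (0) trick turns a multi-statement macro into a single statement that cleanly absorbs a trailing semicolon, which is why it's the usual fix for this class of warning — though, as Alvaro notes downthread, it can't help when the macro expands to a whole function definition. A minimal sketch (macro name invented):

```c
/* Without the do/while(0) wrapper, a multi-statement macro breaks when
 * used as the body of an unbraced if/else; with it, the expansion plus
 * the caller's trailing ';' form exactly one statement. */
#define SWAP_INTS(a, b) \
    do { \
        int tmp_ = (a); \
        (a) = (b); \
        (b) = tmp_; \
    } while (0)

/* Returns 1 when the swap behaved as expected inside an unbraced if. */
static int swap_demo(void)
{
    int x = 1, y = 2;

    if (x < y)
        SWAP_INTS(x, y);   /* legal even without braces around the body */
    else
        x = 0;

    return (x == 2 && y == 1);
}
```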
Re: [HACKERS] Strange query behavior where clause produces odd behavior on '>' query
"Dann Corbit" <[EMAIL PROTECTED]> writes: > The following query: > SELECT * FROM Customers_connxstore where customerid > 'specd' > Returns the row containing Customers_connxstore.customerid == 'SPECD' What encoding/locale are you using? And while I'm asking, which PG version? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Strange query behavior where clause produces odd behavior on '>' query
The following query: SELECT * FROM Customers_connxstore where customerid > 'specd' Returns the row containing Customers_connxstore.customerid == 'SPECD' I would expect to get that row if the query was: SELECT * FROM Customers_connxstore where customerid >= 'specd' The other operators (<, <=, >=, =, !=) all work as expected. Sample file to reproduce the problem: http://cap.connx.com/bugreport/pgbug.sql.bz2 Is this a known issue with PostgreSQL or for some reason the desired behavior? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PG_PAGE_LAYOUT_VERSION 5 - time for change
Gregory Stark <[EMAIL PROTECTED]> writes: > Tom Lane <[EMAIL PROTECTED]> writes: >> ... 3b sounds good until you >> reflect that a genuinely variable chunk size would preclude random >> access to sub-ranges of a toast value. > Hm, Heikki had me convinced it wouldn't but now that I try to explain it I > can't get it to work. I think the idea is you start a scan at the desired > offset and scan until you reach a chunk which overruns the end of the desired > piece. However you really need to start scanning at the last chunk *prior* to > the desired offset. Yeah, that was my conclusion too. > I think you can actually do this with btrees but I don't know if our apis > support it. If you scan to find the first chunk > the desired offset and then > scan backwards one tuple you should be looking at the chunk in which the > desired offset lies. Well, that might work but it would typically cost you an extra fetch. Do we really have a use case for variable chunk size that is worth the cost? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PG_PAGE_LAYOUT_VERSION 5 - time for change
Tom Lane <[EMAIL PROTECTED]> writes: >> 2) Add page type (e.g. btree) and subtype (e.g. metapage) flag into page >> header. >> I think It will be useful when we will use share buffer for clog. > > I think this is a pretty bad idea, because it'll eat space on every page > for data that is only useful to indexes. I don't believe that clog will > find it interesting either. To share buffers with clog will require > a change in buffer lookup tags, not buffer contents. Another example application came to mind: if we ever wanted to do something like retail vacuum, pruning, or hint-bit setting from the bgwriter, it would have to know how to tell heap pages apart from index pages. I'm not sure whether that would have to be on the page or whether it could be in the buffer tag as well. If we do decide we want to do this, it wouldn't take much space: 16 page types with 16 subtypes each would be plenty, and that fits in a single byte. >> 3) TOAST modification >>a) TOAST table per attribute >>b) replace chunk id with offset+variable chunk size >>c) add column identification into first chunk > > I don't like 3a any more than Greg does. 3b sounds good until you > reflect that a genuinely variable chunk size would preclude random > access to sub-ranges of a toast value. Hm, Heikki had me convinced it wouldn't but now that I try to explain it I can't get it to work. I think the idea is you start a scan at the desired offset and scan until you reach a chunk which overruns the end of the desired piece. However you really need to start scanning at the last chunk *prior* to the desired offset. I think you can actually do this with btrees but I don't know if our apis support it. If you scan to find the first chunk > the desired offset and then scan backwards one tuple you should be looking at the chunk in which the desired offset lies. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's Slony Replication support!
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
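Greg's "find the first chunk past the offset, then step back one tuple" lookup can be sketched outside the btree machinery — a toy stand-in on an in-memory array of chunk start offsets, not the actual PostgreSQL APIs:

```c
/* Given sorted chunk start offsets (first chunk starts at 0), return
 * the index of the chunk containing `offset`: binary-search for the
 * first chunk starting *after* the offset, then step back one entry
 * -- the array analogue of "scan backwards one tuple". */
static int chunk_for_offset(const int *starts, int nchunks, int offset)
{
    int lo = 0, hi = nchunks;   /* invariant: answer index is in [lo, hi] */

    while (lo < hi)
    {
        int mid = (lo + hi) / 2;

        if (starts[mid] > offset)
            hi = mid;
        else
            lo = mid + 1;
    }
    return lo - 1;              /* chunk in which the offset lies */
}
```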
Re: [HACKERS] PG_PAGE_LAYOUT_VERSION 5 - time for change
Zdenek Kotala <[EMAIL PROTECTED]> writes: > 1) HeapTupleHeader modification > typedef struct HeapTupleFields > { > TransactionId t_xmin; /* inserting xact ID */ > TransactionId t_xmax; /* deleting or locking xact ID */ > union > { > CommandId t_cid; > TransactionId t_xvac; /* VACUUM FULL xact ID */ > } t_field3; >uint16 t_infomask; > } HeapTupleFields; This is unworkable (hint: the compiler will decide sizeof the struct must be a multiple of 4). I am also frightened to death of the proposal to swap various bits around between infomask and infomask2 --- that is *guaranteed* to break code silently. And you didn't explain exactly what it buys, anyway. Not space savings in the Datum form; alignment issues will prevent that. > 2) Add page type (e.g. btree) and subtype (e.g. metapage) flag into page > header. > I think It will be useful when we will use share buffer for clog. I think this is a pretty bad idea, because it'll eat space on every page for data that is only useful to indexes. I don't believe that clog will find it interesting either. To share buffers with clog will require a change in buffer lookup tags, not buffer contents. > 3) TOAST modification >a) TOAST table per attribute >b) replace chunk id with offset+variable chunk size >c) add column identification into first chunk I don't like 3a any more than Greg does. 3b sounds good until you reflect that a genuinely variable chunk size would preclude random access to sub-ranges of a toast value. A column ID might be worth adding for robustness purposes, though reducing the toast chunk payload size to make that possible will cause you fits for in-place upgrade. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
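Tom's sizeof point is easy to demonstrate with a toy model of the proposed struct (field types are stand-ins, not the real headers):

```c
/* A cut-down model of the proposed HeapTupleFields: three 4-byte
 * fields followed by a uint16.  The compiler rounds the struct size
 * up to a multiple of its alignment (4 on typical platforms), so the
 * last 2 bytes become tail padding and the rearrangement saves
 * nothing. */
typedef struct ProposedFields
{
    unsigned int   t_xmin;      /* stand-in for TransactionId */
    unsigned int   t_xmax;
    unsigned int   t_field3;    /* the t_cid / t_xvac union */
    unsigned short t_infomask;  /* 14 "useful" bytes so far ... */
} ProposedFields;               /* ... but sizeof is 16, not 14 */
```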
Re: [HACKERS] TABLE command
Peter Eisentraut wrote: > If I read this right, SQL would allow writing > > TABLE foo; Interesting; I managed to find it in the spec: 4) The <explicit table> TABLE <table or query name> is equivalent to ( SELECT * FROM <table or query name> ) So going by that, would the patch also have to support something like: WITH a AS (SELECT * FROM b) TABLE a; ? I'd probably find it hard to find a use case; I'm too used to using SELECT * FROM .. in psql. On the other hand, last night I read a good web page comparing the most popular RDBMSs for spec compliance, and PostgreSQL was probably the most compliant of all the ones listed (at least on the topics covered). Oracle fails badly on '' IS NULL being true. I enjoy seeing more spec-compliant things being added. But on the other hand, going with Tom's comments, if it's lots of work for little gain... -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
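Concretely, the equivalence being quoted (table names hypothetical):

```sql
-- Per the spec, TABLE <name> is shorthand for the full SELECT:
TABLE foo;
SELECT * FROM foo;

-- And the question raised above: whether the shorthand should also
-- work as the body of a WITH query.
WITH a AS (SELECT * FROM b)
TABLE a;
```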
Re: [HACKERS] User defined I/O conversion casts
Heikki Linnakangas <[EMAIL PROTECTED]> writes: > Here's a patch. It seems pretty straightforward, I'll apply this > tomorrow, barring objections. Note to self: bump the catalog version. Looks sane in a fast once-over. I didn't cross-check the pg_cast.h data changes, but that's what the regression test check is for ;-) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] WIP: Automatic view update rules
Please find attached my current patch for automatic view update rules. This is a stripped-down version of my former patch discussed before 8.3, without CHECK OPTION support and with less invasive changes to the rewriter itself. I'm currently cleaning up the code for all occurrences of multiple base relations (in progress), hence supporting SQL92 only at the moment. If we decide to go further with this approach I would like to work on the CHECK OPTION implementation based on some ideas we had in the past (e.g. rewriter/executor support). Note that I'm still working on this (for example, RETURNING is still missing). As always, discussion welcome ;) -- Thanks Bernd view_update.patch.bz2 Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PG_PAGE_LAYOUT_VERSION 5 - time for change
Zdenek Kotala <[EMAIL PROTECTED]> writes: > 3) TOAST modification > a) TOAST table per attribute > b) replace chunk id with offset+variable chunk size > c) add column identification into first chunk > > Thats all. I think infomask/infomask2 shuffle flag should be done. TOAST > modification complicates in-place upgrade. I don't think TOAST table per attribute is feasible. You would end up with thousands of toast tables. It might be interesting as an option if you plan to drop the column, but I don't see it as terribly interesting. What seemed to make sense to me for solving your problem was including the type oid in the toast chunks. I suppose attribute number might be just as good -- it would let you save upgrading chunks for dropped columns at the expense of having to look up the column info first. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's Slony Replication support! -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] User defined I/O conversion casts
Tom Lane wrote:
> Heikki Linnakangas <[EMAIL PROTECTED]> writes:
>> Yeah, a magical OID clearly has some issues. A new field in pg_cast is
>> the obvious alternative. How about adding a "castmethod" char field,
>> with values:
>> b = binary-compatible cast (CREATE CAST ... WITHOUT FUNCTION)
>> i = I/O coercion cast (the new beast, CREATE CAST ... WITH INOUT)
>> f = use cast function specified in castfunc field.
>> castfunc is 0 for methods b and i.
>
> Seems sane to me. If you do that, please add a check in the opr_sanity
> regression test that castfunc is nonzero if and only if castmethod is f.

Here's a patch. It seems pretty straightforward, I'll apply this tomorrow, barring objections. Note to self: bump the catalog version.

BTW, it looks like we don't have any test cases for the CREATE CAST command. I think I'll add one.

-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com

[The attached patch is mangled in the archive; what survives shows documentation edits to doc/src/sgml/catalogs.sgml (describing the new pg_cast.castmethod column with values f, i and b) and to doc/src/sgml/ref/create_cast.sgml (adding the CREATE CAST (sourcetype AS targettype) WITH INOUT syntax for I/O conversion casts); the diff is truncated.]
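The syntax being added, going by the ref-page changes in the patch (type names hypothetical):

```sql
-- An I/O conversion cast: runs mytype's output function and feeds
-- the result to myothertype's input function.  AS ASSIGNMENT makes
-- it implicitly invokable in assignment contexts.
CREATE CAST (mytype AS myothertype)
    WITH INOUT
    AS ASSIGNMENT;
```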
[HACKERS] Updated posix fadvise patch v19
Here is an updated posix_fadvise patch against CVS HEAD. It's my first patch generated using git, yay. I remembered to strip out configure (WHY is that STILL in CVS!?) and convert it to context diff, but if there's anything else I've missed just shout. Changes from before: 1) Based on discussions changed the parameter to effective_io_concurrency and modified the documentation discussion to include comments about SSDs and experimentation being necessary. 2) standardized all references to "prefetch" from the mixture of same and "preread" and "advise". This means the main entry point is now PrefetchBuffer(). 3) Added run-time check for buggy glibc versions with bad posix_fadvise() since it was pointed out that someone could run on a different version of glibc than it was compiled on (such as with binary installers). I kept the autoconf test though arguably it's redundant now and could be removed. Also, the previous patch included support for Zoltan's situation using POSIX_FADV_SEQUENTIAL for bulk sequential scans. I experimented with this and was not able to find any situation it improved. I've left it in so others can test and show it's helpful. This bit of code is a bit less polished -- it's missing documentation for the guc variable and there are some #defines which are probably in the wrong .h file (and probably should be an enum). This code is entirely separate from the prefetching code. They happen to both use posix_fadvise but I don't think they belong in the same abstraction for Postgres. PrefetchBuffer does just one thing -- prefetches a buffer. Sequential i/o needs to be set and tracked per-file. I have not changed the query costing side of things. I'm a bit leery about making those changes as I fear it will lead me to propose a major reworking of these parameters and I don't want to go there... posix-fadvise-v19.patch.gz Description: Binary data -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's 24x7 Postgres support! 
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] PG_PAGE_LAYOUT_VERSION 5 - time for change
It seems that we are going to bump the page layout version to version 5 (see the CRC patch for details). Maybe it is a good time to do some other changes. Here is a list of ideas (please, do not beat me :-). Some of them we discussed in Prato, and Greg maybe has more.

1) HeapTupleHeader modification

typedef struct HeapTupleFields
{
    TransactionId t_xmin;       /* inserting xact ID */
    TransactionId t_xmax;       /* deleting or locking xact ID */
    union
    {
        CommandId     t_cid;
        TransactionId t_xvac;   /* VACUUM FULL xact ID */
    } t_field3;
    uint16        t_infomask;
} HeapTupleFields;

typedef struct HeapTupleHeaderData
{
    union
    {
        HeapTupleFields  t_heap;
        DatumTupleFields t_datum;
    } t_choice;

    ItemPointerData t_ctid;     /* current TID of this or newer tuple */

    /* Fields below here must match MinimalTupleData! */
    uint16        t_infomask2;
    uint8         t_hoff;
    /* ^ - 23 bytes - ^ */
    bits8         t_bits[1];
} HeapTupleHeaderData;

This also requires shuffling flags between infomask and infomask2. infomask2 should contain only the flags HASNULL, HASOID, HASVARWIDTH and HASEXTERNAL, and a minimal tuple then does not need the infomask field, which would contain only transaction hint bits. Unfortunately, structure alignment is not very friendly here.

2) Add page type (e.g. btree) and subtype (e.g. metapage) flags into the page header. I think it will be useful when we use shared buffers for clog.

3) TOAST modification
   a) TOAST table per attribute
   b) replace chunk id with offset + variable chunk size
   c) add column identification into the first chunk

That's all. I think the infomask/infomask2 flag shuffle should be done. The TOAST modification complicates in-place upgrade. Comments, other ideas? Zdenek -- Zdenek Kotala Sun Microsystems Prague, Czech Republic http://sun.com/postgresql -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] contrib/pg_stat_statements
ITAGAKI Takahiro wrote:
>> But is the idea of extending QueryDesc generally acceptable? Is it OK
>> to make a copy of the query string?
>
> The only thing I'm worried about is that QueryDesc lives longer than
> its queryText. Can I assume it never occurs?

I just finished validating this -- it seems OK. All of the query strings that make it to CreateQueryDesc are either pstrdup-ed to the Portal, in a long-lived memory context, or just literals. So no need to make extra copies :)

regards, Martin

[The attached patch is mangled in the archive; what survives shows a queryString argument being added to CreateQueryDesc() and CreateUtilityQueryDesc(), with callers updated in commands/copy.c, commands/explain.c, commands/prepare.c, executor/functions.c, executor/spi.c and tcop/pquery.c; the diff is truncated.]
Re: [HACKERS] Block-level CRC checks
On Thu, Oct 30, 2008 at 03:41:17PM +, Gregory Stark wrote: > The CRC is chosen such that if you CRC the resulting packet including the CRC > you get a CRC of 0. That can be done for whatever offset the CRC appears at I > believe. IIRC, you calculate the CRC-32 of the page, then XOR it over where it's supposed to end up. No need to preseed (or more accurately, it doesn't matter how you preseed, the result is the same). For checking it doesn't matter either, just checksum the page and if you get zero it's correct. Have a nice day, -- Martijn van Oosterhout <[EMAIL PROTECTED]> http://svana.org/kleptog/ > Please line up in a tree and maintain the heap invariant while > boarding. Thank you for flying nlogn airlines. signature.asc Description: Digital signature
Re: [HACKERS] Block-level CRC checks
Alvaro Herrera wrote: > Hmm, oh I see another problem here -- the bit is not restored when > replayed heap_update's WAL record. I'm now wondering what other bits > are set without much care about correctly restoring them during replay. I'm now wondering whether it'd be easier to just ignore pd_flags in calculating the checksum. -- Alvaro Herrerahttp://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Block-level CRC checks
On Thu, Oct 30, 2008 at 12:14 PM, Alvaro Herrera <[EMAIL PROTECTED]> wrote: > Gregory Stark escribió: > >> What I'm wondering though -- are we going to make CRCs mandatory? Or set >> aside >> the 4 bytes even if you're not using them? Because if the size of the page >> header varies depending on whether you're using CRCs that sounds like it >> would >> be quite a pain. > > Not mandatory, but the space needs to be set aside. (Otherwise you > couldn't turn it on after running with it turned off, which would rule > out using the database after initdb). Agreed. -- Jonah H. Harris, Senior DBA myYearbook.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP patch: convert SQL-language functions to return tuplestores
> With session variables we could implement srf function in plpgsql like
> current C srf function. Like
>
> create or replace function foo()
> returns record as $$
> #option with_srf_context(datatype of srf context)
> begin
>   return row(...);
> end;
> $$ language plpgsql;

Oh, sure - but what you can do with this will be somewhat limited compared to a Perl hash reference off which you can chain any arbitrary data structure with ease. I'd want to see an actual use case for this before anyone bothered implementing it. I was actually thinking one way to do it would be to extend the variable declaration syntax so that you could declare n>=0 variables as SRF context variables, which I think is nicer, but even that I think is of limited usefulness.

I think the biggest value of PL/pgsql is the ability to RETURN QUERY, and I think the ability to push a lazy execution model down into that subordinate query is where the win is. That case won't be helped at all by this sort of alternate calling convention - in fact it'll be nearly impossible to even do that at all with this type of execution model.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] [PATCH] HeapTuple version extension + code cleanup
This patch requires the patch at:
http://archives.postgresql.org/message-id/[EMAIL PROTECTED]

It adds t_ver information to the HeapTuple structure and adds a DatumGetHeapTuple function, which allows code to be reduced in many places.

	Zdenek

-- 
Zdenek Kotala              Sun Microsystems
Prague, Czech Republic     http://sun.com/postgresql

heaptuple_ver.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Block-level CRC checks
Gregory Stark escribió: > What I'm wondering though -- are we going to make CRCs mandatory? Or set aside > the 4 bytes even if you're not using them? Because if the size of the page > header varies depending on whether you're using CRCs that sounds like it would > be quite a pain. Not mandatory, but the space needs to be set aside. (Otherwise you couldn't turn it on after running with it turned off, which would rule out using the database after initdb). -- Alvaro Herrerahttp://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Question about GetAttributeByNum(Name) ExecQual.c
Zdenek Kotala <[EMAIL PROTECTED]> writes: > Does it mean that these function are every time called with heap tuple or > untoasted row type (composite data type)? What is purpose of these function. Legacy support for third-party modules. They're really pretty much deprecated but I don't foresee removing them. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Block-level CRC checks
Tom Lane napsal(a):
> Zdenek Kotala <[EMAIL PROTECTED]> writes:
>> By the way, do you need CRC as a first page member? Is it for future
>> development like CLOG integration into buffers? Why not put it on the
>> end and mark it as a special? It will reduce space requirement when
>> CRC is not enabled.
>
> ... and make life tremendously more complex for indexes,

Indexes use the PageGetSpecial macro and already live with special space, and PageInit could do the correct placement. The only problems are the assert macros and the extra check which verifies the correct size of the special space.

> plus turning CRC checking on or off on-the-fly would be problematic.

Yeah, that is a problem.

> I think Alvaro has the right idea: just put the field there all the time.

Agree.

	Zdenek

-- 
Zdenek Kotala              Sun Microsystems
Prague, Czech Republic     http://sun.com/postgresql

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Block-level CRC checks
Zdenek Kotala <[EMAIL PROTECTED]> writes: > By the way, do you need CRC as a first page member? Is it for future > development > like CLOG integration into buffers? Why not put it on the end as and mark it > as > a special? It will reduce space requirement when CRC is not enabled. ... and make life tremendously more complex for indexes, plus turning CRC checking on or off on-the-fly would be problematic. I think Alvaro has the right idea: just put the field there all the time. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP patch: convert SQL-language functions to return tuplestores
I wrote: > I'm currently going to have a look at just what it would take to support > both lazy and eager evaluation in functions.c (independently of what > syntax, if any, we settle on to expose the choice to the user). If it's > either really awful or really easy we should know that before arguing > further. Attached is a draft patch that allows SQL functions to return sets using either value-per-call or materialize mode. It does not expose any control to the user; for the moment, the choice is driven by whether the call site is ExecMakeFunctionResult (which prefers value-per-call) or ExecMakeTableFunctionResult (which prefers materialize). I estimate that functions.c is two or three hundred lines longer than it would be if we stripped the value-per-call support and simplified the logic down to what I had in my prior patch. Which is not all that much in the big scheme of things, so I'll withdraw my argument for simplifying. I'm not sure if it's worth adding a control knob or not --- it's still true that materialize is faster on a tuple-by-tuple basis, but whether the difference is all that significant for nontrivial queries is debatable. Anyway I don't really want to work on that point right now. The next step is to make it actually support RETURNING queries, and if I don't get on with that I won't finish it before commitfest. regards, tom lane binpZLb0Tylz0.bin Description: sql-functions-2.patch.gz -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Block-level CRC checks
Alvaro Herrera napsal(a):
> Zdenek Kotala wrote:
>> Alvaro Herrera napsal(a):
>>> Simon Riggs wrote:
>>>> But perhaps writing a single WAL record if you scan whole page and set
>>>> all bits at once. Then it makes sense in some cases.
>>>
>>> So this is what I ended up doing; attached.
>>
>> Please, DO NOT MOVE position of page version in PageHeader structure!
>
> Hmm. The only way I see we could do that is to modify the checksum
> struct member to a predefined value before calculating the page's
> checksum.
>
> Ah, actually there's another alternative -- leave the checksum on its
> current position (start of struct) and move other members below
> pg_pagesize_version (leaning towards pd_tli and pd_flags). That'd leave
> the page version in the same position. (Hmm, maybe it's better to move
> pd_lower and pd_upper?)

No, please keep pd_lower and pd_upper in the same position. They are accessed more often than pd_tli and pd_flags, so the current placement is better for optimization.

By the way, do you need the CRC as the first page member? Is it for future development, like CLOG integration into buffers? Why not put it at the end and mark it as a special? That would reduce the space requirement when CRC is not enabled.

	Zdenek

-- 
Zdenek Kotala              Sun Microsystems
Prague, Czech Republic     http://sun.com/postgresql

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Block-level CRC checks
Alvaro Herrera <[EMAIL PROTECTED]> writes: > Ah, actually there's another alternative -- leave the checksum on its > current position (start of struct) and move other members below > pg_pagesize_version (leaning towards pd_tli and pd_flags). That'd leave > the page version in the same position. I don't understand why the position of anything matters here. Look at TCP packets for instance, the checksum is not at the beginning or end of anything. The CRC is chosen such that if you CRC the resulting packet including the CRC you get a CRC of 0. That can be done for whatever offset the CRC appears at I believe. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's 24x7 Postgres support! -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Postgres-R pacth
Hi, Imre Geczy wrote: > What kind of form or method must be used to patch that it can work correctly? I finally got around writing some installation instructions: http://www.postgres-r.org/documentation/installation Regards Markus Wanner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Block-level CRC checks
On Thu, Oct 30, 2008 at 11:27 AM, Tom Lane <[EMAIL PROTECTED]> wrote: > "Jonah H. Harris" <[EMAIL PROTECTED]> writes: >> On Thu, Oct 30, 2008 at 11:14 AM, Tom Lane <[EMAIL PROTECTED]> wrote: >>> Well, yeah, but it has to be able to tell which version it's dealing >>> with. I quite agree with Zdenek that keeping the version indicator >>> in a fixed location is appropriate. > >> Most of the other databases I've worked, which don't have different >> types of pages, put the page version as the first element of the page. >> That would let us put the crc right after it. Thoughts? > > "Fixed location" does not mean "let's move it". Just trying to be helpful. Just thought I might give some insight as to what others, who had implemented in-place upgrade functionality years before Postgres' existence, had done. -- Jonah H. Harris, Senior DBA myYearbook.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Block-level CRC checks
Jonah H. Harris napsal(a):
> On Thu, Oct 30, 2008 at 10:33 AM, Zdenek Kotala <[EMAIL PROTECTED]> wrote:
>> Please, DO NOT MOVE position of page version in PageHeader structure! And
>> PG_PAGE_LAYOUT_VERSION should be bump to 5.
>
> Umm, any in-place upgrade should be capable of handling changes to the
> page header. Or did I miss something significant in the in-place
> upgrade design?

Not any change: if you move the page header version field to another position, it will require some kind of magic to detect what version a page is. Any other field you can place anywhere :-), but do not touch the page version. That would bring a lot of problems...

	Zdenek

-- 
Zdenek Kotala              Sun Microsystems
Prague, Czech Republic     http://sun.com/postgresql

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Block-level CRC checks
"Jonah H. Harris" <[EMAIL PROTECTED]> writes: > On Thu, Oct 30, 2008 at 11:14 AM, Tom Lane <[EMAIL PROTECTED]> wrote: >> Well, yeah, but it has to be able to tell which version it's dealing >> with. I quite agree with Zdenek that keeping the version indicator >> in a fixed location is appropriate. > Most of the other databases I've worked, which don't have different > types of pages, put the page version as the first element of the page. > That would let us put the crc right after it. Thoughts? "Fixed location" does not mean "let's move it". regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Block-level CRC checks
Gregory Stark wrote: > Alvaro Herrera <[EMAIL PROTECTED]> writes: > > > Alvaro Herrera wrote: > > > >> The "other hint bits" are: > >> > >> - LP_DEAD as used by the various callers of ItemIdMarkDead. > >> - PD_PAGE_FULL > >> - BTPageOpaque->btpo_flags and btpo_cycleid > >> > >> All of them are changed with only SetBufferCommitInfoNeedsSave being > >> called afterwards. > > > > I think we could get away with WAL-logging LP_DEAD via ItemIdMarkDead > > similar to what is done to SetHintBits in the posted patch, and cope > > with the rest by marking the page with the invalid checksum; they are > > not so frequent anyway so the reliability loss is low. > > If PD_PAGE_FULL is set and that causes the crc to be set to the invalid sum > will we ever get another chance to set it? I should have qualified that a bit more. It's not setting PD_FULL that's not logged, but clearing it (heap_prune_page, line 282). It's set in heap_update. Hmm, oh I see another problem here -- the bit is not restored when replayed heap_update's WAL record. I'm now wondering what other bits are set without much care about correctly restoring them during replay. -- Alvaro Herrerahttp://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Block-level CRC checks
On Thu, Oct 30, 2008 at 11:14 AM, Tom Lane <[EMAIL PROTECTED]> wrote: > "Jonah H. Harris" <[EMAIL PROTECTED]> writes: >> On Thu, Oct 30, 2008 at 10:33 AM, Zdenek Kotala <[EMAIL PROTECTED]> wrote: >>> Please, DO NOT MOVE position of page version in PageHeader structure! And >>> PG_PAGE_LAYOUT_VERSION should be bump to 5. > >> Umm, any in-place upgrade should be capable of handling changes to the >> page header. > > Well, yeah, but it has to be able to tell which version it's dealing > with. I quite agree with Zdenek that keeping the version indicator > in a fixed location is appropriate. Most of the other databases I've worked, which don't have different types of pages, put the page version as the first element of the page. That would let us put the crc right after it. Thoughts? -- Jonah H. Harris, Senior DBA myYearbook.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Block-level CRC checks
"Jonah H. Harris" <[EMAIL PROTECTED]> writes: > On Thu, Oct 30, 2008 at 10:33 AM, Zdenek Kotala <[EMAIL PROTECTED]> wrote: >> Please, DO NOT MOVE position of page version in PageHeader structure! And >> PG_PAGE_LAYOUT_VERSION should be bump to 5. > Umm, any in-place upgrade should be capable of handling changes to the > page header. Well, yeah, but it has to be able to tell which version it's dealing with. I quite agree with Zdenek that keeping the version indicator in a fixed location is appropriate. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Block-level CRC checks
Zdenek Kotala wrote: > Alvaro Herrera napsal(a): >> Simon Riggs wrote: >> >>> But perhaps writing a single WAL record if you scan whole page and set >>> all bits at once. Then it makes sense in some cases. >> >> So this is what I ended up doing; attached. > > Please, DO NOT MOVE position of page version in PageHeader structure! Hmm. The only way I see we could do that is to modify the checksum struct member to a predefined value before calculating the page's checksum. Ah, actually there's another alternative -- leave the checksum on its current position (start of struct) and move other members below pg_pagesize_version (leaning towards pd_tli and pd_flags). That'd leave the page version in the same position. (Hmm, maybe it's better to move pd_lower and pd_upper?) > And PG_PAGE_LAYOUT_VERSION should be bump to 5. Easily done; thanks for the reminder. -- Alvaro Herrerahttp://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc. *** src/include/storage/bufpage.h 14 Jul 2008 03:22:32 - 1.83 --- src/include/storage/bufpage.h 30 Oct 2008 15:05:24 - *** *** 17,22 --- 17,23 #include "access/xlogdefs.h" #include "storage/item.h" #include "storage/off.h" + #include "utils/pg_crc.h" /* * A postgres disk page is an abstraction layered on top of a postgres *** *** 87,92 --- 88,94 * * space management information generic to any page * + * pd_checksum - the checksum of the page * pd_lsn - identifies xlog record for last change to this page. * pd_tli - ditto. * pd_flags - flag bits. *** *** 118,136 * the constraint on pagesize mod 256 is not an important restriction. * On the high end, we can only support pages up to 32KB because lp_off/lp_len * are 15 bits. */ typedef struct PageHeaderData { ! 
/* XXX LSN is member of *any* block, not only page-organized ones */ XLogRecPtr pd_lsn; /* LSN: next byte after last byte of xlog * record for last change to this page */ - uint16 pd_tli; /* least significant bits of the TimeLineID - * containing the LSN */ - uint16 pd_flags; /* flag bits, see below */ LocationIndex pd_lower; /* offset to start of free space */ LocationIndex pd_upper; /* offset to end of free space */ LocationIndex pd_special; /* offset to start of special space */ uint16 pd_pagesize_version; TransactionId pd_prune_xid; /* oldest prunable XID, or zero if none */ ItemIdData pd_linp[1]; /* beginning of line pointer array */ } PageHeaderData; --- 120,143 * the constraint on pagesize mod 256 is not an important restriction. * On the high end, we can only support pages up to 32KB because lp_off/lp_len * are 15 bits. + * + * Note that pd_tli appears in a rather awkward position in the struct; + * this is because we moved it to accomodate pd_checksum without changing + * pg_pagesize_version's offset. */ typedef struct PageHeaderData { ! /* XXX CRC & LSN are members of *any* block, not only page-organized ones */ ! pg_crc32 pd_checksum;/* The block-level checksum */ XLogRecPtr pd_lsn; /* LSN: next byte after last byte of xlog * record for last change to this page */ LocationIndex pd_lower; /* offset to start of free space */ LocationIndex pd_upper; /* offset to end of free space */ LocationIndex pd_special; /* offset to start of special space */ uint16 pd_pagesize_version; + uint16 pd_tli; /* least significant bits of the TimeLineID + * containing the LSN */ + uint16 pd_flags; /* flag bits, see below */ TransactionId pd_prune_xid; /* oldest prunable XID, or zero if none */ ItemIdData pd_linp[1]; /* beginning of line pointer array */ } PageHeaderData; *** *** 148,159 * PD_PAGE_FULL is set if an UPDATE doesn't find enough free space in the * page for its new tuple version; this suggests that a prune is needed. * Again, this is just a hint. 
*/ #define PD_HAS_FREE_LINES 0x0001 /* are there any unused line pointers? */ #define PD_PAGE_FULL 0x0002 /* not enough free space for new * tuple? */ ! #define PD_VALID_FLAG_BITS 0x0003 /* OR of all valid pd_flags bits */ /* * Page layout version number 0 is for pre-7.3 Postgres releases. --- 155,172 * PD_PAGE_FULL is set if an UPDATE doesn't find enough free space in the * page for its new tuple version; this suggests that a prune is needed. * Again, this is just a hint. + * + * PG_UNLOGGED_CHANGE indicates whether a process has set hint bits on the + * page. This is used to determine whether a WAL message needs to be emitted + * before writing the page to disk when page checksums are enabled. */ #define PD_HAS_FREE_LINES 0x0001 /* are there any unused line pointers? */ #define PD_PAGE_FULL 0x0002 /* not enough free space for new * tuple? */ + #define PD_UNLOGGED_CHANG
Re: [HACKERS] Block-level CRC checks
"Joshua D. Drake" <[EMAIL PROTECTED]> writes: > Jonah H. Harris wrote: >> On Thu, Oct 30, 2008 at 10:33 AM, Zdenek Kotala <[EMAIL PROTECTED]> wrote: >>> Please, DO NOT MOVE position of page version in PageHeader structure! And >>> PG_PAGE_LAYOUT_VERSION should be bump to 5. >> >> Umm, any in-place upgrade should be capable of handling changes to the >> page header. Of, did I miss something significant in the in-place > > I thought that was kind of the point of in place upgrade. Sure, but he has to have a reliable way to tell what version of the page header he's looking at... What I'm wondering though -- are we going to make CRCs mandatory? Or set aside the 4 bytes even if you're not using them? Because if the size of the page header varies depending on whether you're using CRCs that sounds like it would be quite a pain. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's 24x7 Postgres support! -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Block-level CRC checks
Jonah H. Harris wrote:
> On Thu, Oct 30, 2008 at 10:33 AM, Zdenek Kotala <[EMAIL PROTECTED]> wrote:
>> Please, DO NOT MOVE position of page version in PageHeader structure! And
>> PG_PAGE_LAYOUT_VERSION should be bump to 5.
>
> Umm, any in-place upgrade should be capable of handling changes to the
> page header. Of, did I miss something significant in the in-place

I thought that was kind of the point of in-place upgrade.

Joshua D. Drake

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Block-level CRC checks
On Thu, Oct 30, 2008 at 10:33 AM, Zdenek Kotala <[EMAIL PROTECTED]> wrote:
> Please, DO NOT MOVE position of page version in PageHeader structure! And
> PG_PAGE_LAYOUT_VERSION should be bump to 5.

Umm, any in-place upgrade should be capable of handling changes to the page header. Or did I miss something significant in the in-place upgrade design?

-- 
Jonah H. Harris, Senior DBA
myYearbook.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Block-level CRC checks
Alvaro Herrera napsal(a):
> Simon Riggs wrote:
>> But perhaps writing a single WAL record if you scan whole page and set
>> all bits at once. Then it makes sense in some cases.
>
> So this is what I ended up doing; attached. There are some gotchas in
> this patch:

Please, DO NOT MOVE the position of the page version in the PageHeader structure! And PG_PAGE_LAYOUT_VERSION should be bumped to 5.

	Thanks Zdenek

-- 
Zdenek Kotala              Sun Microsystems
Prague, Czech Republic     http://sun.com/postgresql

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Sun Studio compiler warnings
Peter Eisentraut <[EMAIL PROTECTED]> writes:
> CMPFUNC(tsquery_lt, res < 0);
> CMPFUNC(tsquery_le, res <= 0);
> CMPFUNC(tsquery_eq, res == 0);
> CMPFUNC(tsquery_ge, res >= 0);
> CMPFUNC(tsquery_gt, res > 0);
> CMPFUNC(tsquery_ne, res != 0);

> The closing semicolon is strictly speaking not allowed here. We could
> remove it, but that would probably upset pgindent?

If the warnings annoy you, do what PG_FUNCTION_INFO_V1 does.

#define PG_FUNCTION_INFO_V1(funcname) \
extern PGDLLIMPORT const Pg_finfo_record * CppConcat(pg_finfo_,funcname)(void); \
const Pg_finfo_record * \
CppConcat(pg_finfo_,funcname) (void) \
{ \
	static const Pg_finfo_record my_finfo = { 1 }; \
	return &my_finfo; \
} \
extern int no_such_variable

			regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Sun Studio compiler warnings
Sun Studio is producing these compiler warnings (among others):

"tsquery_op.c", line 193: warning: syntax error: empty declaration
"tsquery_op.c", line 194: warning: syntax error: empty declaration
"tsquery_op.c", line 195: warning: syntax error: empty declaration
"tsquery_op.c", line 196: warning: syntax error: empty declaration
"tsquery_op.c", line 197: warning: syntax error: empty declaration
"tsquery_op.c", line 198: warning: syntax error: empty declaration
"tsvector_op.c", line 177: warning: syntax error: empty declaration
"tsvector_op.c", line 178: warning: syntax error: empty declaration
"tsvector_op.c", line 179: warning: syntax error: empty declaration
"tsvector_op.c", line 180: warning: syntax error: empty declaration
"tsvector_op.c", line 181: warning: syntax error: empty declaration
"tsvector_op.c", line 182: warning: syntax error: empty declaration
"tsvector_op.c", line 183: warning: syntax error: empty declaration

This relates to the following sort of code:

#define CMPFUNC( NAME, CONDITION )	\
Datum					\
NAME(PG_FUNCTION_ARGS) {		\
	TSQuery a = PG_GETARG_TSQUERY_COPY(0);	\
	TSQuery b = PG_GETARG_TSQUERY_COPY(1);	\
	int res = CompareTSQ(a,b);		\
						\
	PG_FREE_IF_COPY(a,0);			\
	PG_FREE_IF_COPY(b,1);			\
						\
	PG_RETURN_BOOL( CONDITION );		\
}

CMPFUNC(tsquery_lt, res < 0);
CMPFUNC(tsquery_le, res <= 0);
CMPFUNC(tsquery_eq, res == 0);
CMPFUNC(tsquery_ge, res >= 0);
CMPFUNC(tsquery_gt, res > 0);
CMPFUNC(tsquery_ne, res != 0);

The closing semicolon is strictly speaking not allowed here. We could remove it, but that would probably upset pgindent?

I recall that we used to have a bunch of similar problems with the AIX compilers a long time ago. Does anyone recall the solution, and do we still care? (Note that it's only a warning in this case.)

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Block-level CRC checks
Alvaro Herrera <[EMAIL PROTECTED]> writes: > Alvaro Herrera wrote: > >> The "other hint bits" are: >> >> - LP_DEAD as used by the various callers of ItemIdMarkDead. >> - PD_PAGE_FULL >> - BTPageOpaque->btpo_flags and btpo_cycleid >> >> All of them are changed with only SetBufferCommitInfoNeedsSave being >> called afterwards. > > I think we could get away with WAL-logging LP_DEAD via ItemIdMarkDead > similar to what is done to SetHintBits in the posted patch, and cope > with the rest by marking the page with the invalid checksum; they are > not so frequent anyway so the reliability loss is low. If PD_PAGE_FULL is set and that causes the crc to be set to the invalid sum will we ever get another chance to set it? -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's RemoteDBA services! -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Please make sure your patches are on the wiki page
>> (1) moving all of the patches committed prior to 11/1 to a separate >> section or page > > Why? To reduce clutter, but I don't feel strongly about it. >> (2) sorting the pending patches by complexity or subject matter > > Sorting them by complexity would be great, if I thought I could do it. I'm > not sure I can. I think the biggest patches for this commitfest are SEPostgresql, Simon Riggs' work on hot standby (which is not on the commitfest page yet and probably supersedes some of the earlier patches that are still on there), window functions, DDL lock strength reduction (not sure how big it is but I would guess it probably has to be reviewed by -core), parallel restore, and maybe grouping sets (though it seems like that is a ways from being committable). I agree with Tom that it would probably be good to try to get these (and any other big ones that I may have missed) reviewed by core early and committed or feedback given quickly. It may require some of the other patches to be resnapped to head but that is probably the lesser of two evils. One other random note - I don't believe there has been a new version of the column-level permissions patch that Stephen Frost was working on since the last commitfest. Unless someone disagrees with Markus Wanner's conclusion that it wasn't ready for commit at that point, this one can probably be bounced from the get-go. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] User defined I/O conversion casts
Tom Lane wrote:
> Heikki Linnakangas <[EMAIL PROTECTED]> writes:
>> Yeah, a magical OID clearly has some issues. A new field in pg_cast is
>> the obvious alternative. How about adding a "castmethod" char field,
>> with values:
>> b = binary-compatible cast (CREATE CAST ... WITHOUT FUNCTION)
>> i = I/O coercion cast (the new beast, CREATE CAST ... WITH INOUT)
>> f = use cast function specified in castfunc field.
>> castfunc is 0 for methods b and i.
>
> Seems sane to me. If you do that, please add a check in the opr_sanity
> regression test that castfunc is nonzero if and only if castmethod is f.

Ok, will do.

-- 
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] User defined I/O conversion casts
Heikki Linnakangas <[EMAIL PROTECTED]> writes: > Yeah, a magical OID clearly has some issues. A new field in pg_cast is > the obvious alternative. How about adding a "castmethod" char field, > with values: > b = binary-compatible cast (CREATE CAST ... WITHOUT FUNCTION) > i = I/O coercion cast (the new beast, CREATE CAST ... WITH INOUT) > f = use cast function specified in castfunc field. > castfunc is 0 for methods b and i. Seems sane to me. If you do that, please add a check in the opr_sanity regression test that castfunc is nonzero if and only if castmethod is f. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] User defined I/O conversion casts
(Resurrecting an old thread.)

Tom Lane wrote:
> Heikki Linnakangas <[EMAIL PROTECTED]> writes:
>> Patch attached. I'm using a magic OID "1" in pg_cast.castfunc field to
>> mark these extra I/O conversion casts.
>
> Ugh. That's really unacceptable (doesn't it make the oidjoins regression
> test fail?),

Yeah, a magical OID clearly has some issues. A new field in pg_cast is the obvious alternative. How about adding a "castmethod" char field, with values:

b = binary-compatible cast (CREATE CAST ... WITHOUT FUNCTION)
i = I/O coercion cast (the new beast, CREATE CAST ... WITH INOUT)
f = use cast function specified in castfunc field.

castfunc is 0 for methods b and i.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
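The invariant proposed above - castfunc nonzero exactly when castmethod is 'f' - is easy to state as a single predicate. A minimal sketch in C, where the struct and field names merely mirror the proposal and are not the real pg_cast layout:

```c
#include <assert.h>

/* Illustrative mirror of the proposed pg_cast fields (not the real layout). */
typedef struct
{
	char		castmethod;		/* 'b', 'i', or 'f' */
	unsigned	castfunc;		/* cast function OID, or 0 */
} CastEntry;

/* The opr_sanity-style check: castfunc is nonzero iff castmethod is 'f'. */
static int
cast_entry_is_sane(const CastEntry *c)
{
	return (c->castmethod == 'f') == (c->castfunc != 0);
}
```

In the actual regression test this would be an SQL query over pg_cast rather than C, but the predicate is the same.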
Re: [HACKERS] Optimizing COPY
Heikki Linnakangas <[EMAIL PROTECTED]> writes:
> The basic idea is to replace the custom loop in CopyReadLineText with
> memchr(), because memchr() is very fast. To make that possible, perform
> the client-server encoding conversion on each raw block that we read in,
> before splitting it into lines. That way CopyReadLineText only needs to
> deal with server encodings, which is required for the memchr() to be safe.

Okay, so of course the trick with that is the block boundary handling. The protocol says the client can break the data apart however it likes. I see you've tried to deal with that, but this part seems wrong:

> !	if (convertable_bytes == 0)
> !	{
> !		/*
> !		 * EOF, and there was some unconvertable chars at the end.
> !		 * Call pg_client_to_server on the remaining bytes, to
> !		 * let it throw an error.
> !		 */
> !		cvt = pg_client_to_server(raw, inbytes);
> !		Assert(false);	/* pg_client_to_server should've errored */
> !	}

You're not (AFAICS) definitely at EOF here; you might just have gotten a pathologically short message.

			regards, tom lane
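The boundary problem Tom points out - a read can end in the middle of a multibyte character without being at EOF - is usually handled by converting only up to the last complete character and carrying the tail over to the next batch. A hedged standalone sketch of that backtracking for UTF-8 (an illustrative helper, not the patch's actual code):

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return the length of the longest prefix of buf that ends on a complete
 * UTF-8 character.  Bytes past that point are an incomplete trailing
 * character: save them for the next batch instead of converting them
 * (which would raise a spurious encoding error).
 */
static size_t
utf8_complete_prefix(const unsigned char *buf, size_t len)
{
	size_t		i = len;

	/* back over any continuation bytes (10xxxxxx) */
	while (i > 0 && (buf[i - 1] & 0xC0) == 0x80)
		i--;

	if (i > 0)
	{
		unsigned char lead = buf[i - 1];
		size_t		need;

		if (lead < 0x80)
			need = 1;			/* ASCII */
		else if (lead >= 0xF0)
			need = 4;
		else if (lead >= 0xE0)
			need = 3;
		else
			need = 2;			/* 0xC0..0xDF lead byte */

		if (len - (i - 1) < need)
			return i - 1;		/* last character incomplete: cut before it */
	}
	return len;
}
```

At true EOF, any leftover bytes really are unconvertible, and passing them to the conversion routine to raise the error (as the patch does) is then correct.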
Re: [HACKERS] Hot Standby: Caches and Locks
Simon Riggs <[EMAIL PROTECTED]> writes:
> On Thu, 2008-10-30 at 08:30 -0400, Tom Lane wrote:
>> What about using the existing syscache logic to re-derive inval
>> information from watching the update operations?
>
> That does sound possible, but it makes some big assumptions about
> transactional machinery being in place. It ain't. Subtransactions make
> everything about 5 times more difficult, so it seems pretty scary to me.

Um. Yeah, subtransactions would be a PITA. Never mind that then ...

			regards, tom lane
Re: [HACKERS] Block-level CRC checks
Alvaro Herrera wrote:
> The "other hint bits" are:
>
> - LP_DEAD as used by the various callers of ItemIdMarkDead.
> - PD_PAGE_FULL
> - BTPageOpaque->btpo_flags and btpo_cycleid
>
> All of them are changed with only SetBufferCommitInfoNeedsSave being
> called afterwards.

I think we could get away with WAL-logging LP_DEAD via ItemIdMarkDead, similar to what is done for SetHintBits in the posted patch, and cope with the rest by marking the page with the invalid checksum; they are not so frequent anyway, so the reliability loss is low.

--
Alvaro Herrera                         http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
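Marking a page "not checksummed" requires a checksum value that real data can never produce. One common way to arrange that, shown here with a deliberately toy checksum (not PostgreSQL's actual algorithm), is to reserve a sentinel value and remap any collision with it:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CHECKSUM_INVALID 0		/* sentinel: page not covered by a checksum */

/*
 * Toy page checksum, for illustration only.  The point is the last step:
 * the function never returns CHECKSUM_INVALID, so that value stored in a
 * page header unambiguously means "don't verify this page" - e.g. a page
 * whose hint-bit updates were not WAL-logged.
 */
static uint16_t
page_checksum(const unsigned char *page, size_t len)
{
	uint32_t	sum = 0;
	size_t		i;
	uint16_t	c;

	for (i = 0; i < len; i++)
		sum = sum * 31 + page[i];

	c = (uint16_t) (sum ^ (sum >> 16));
	return (c == CHECKSUM_INVALID) ? 1 : c;	/* never collide with sentinel */
}
```

The cost of the remap is that two checksum values (0 and 1 here) become indistinguishable, which slightly weakens detection but keeps the sentinel unambiguous.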
[HACKERS] Optimizing COPY
Back in March, I played around with various hacks to improve COPY FROM performance: http://archives.postgresql.org/pgsql-patches/2008-03/msg00145.php

I got busy with other stuff, but I have now gotten around to trying what I planned back then. I don't know if I have the time to finish this for 8.4, but I might as well post what I've got.

The basic idea is to replace the custom loop in CopyReadLineText with memchr(), because memchr() is very fast. To make that possible, perform the client-server encoding conversion on each raw block that we read in, before splitting it into lines. That way CopyReadLineText only needs to deal with server encodings, which is required for the memchr() to be safe.

Attached is a quick patch for that. Think of it as a prototype; I haven't tested it much, and I feel that it needs some further cleanup. Quick testing suggests that it gives a 0-20% speedup, depending on the data. Very narrow tables don't benefit much, but the wider the table, the bigger the gain. I haven't found a case where it performs worse.

I'm only using memchr() with non-CSV format at the moment. It could be used for CSV as well, but it's more complicated because in CSV mode we need to keep track of the escapes.

Some collateral damage: \. no longer works. If we only accept \. on a new line, like we already do in CSV mode, it shouldn't be hard or expensive to make it work again. The manual already suggests that we only accept it on a single line: "End of data can be represented by a single line containing just backslash-period (\.)."

Escaping a linefeed or carriage return by prepending it with a backslash no longer works. You have to use \n and \r. The manual already warns against doing that, so I think we could easily drop support for it. If we wanted, it wouldn't be very hard to make it work, though: after hitting a newline, scan back and count how many backslashes there are before the newline.
An odd number means that it's an escaped newline, even number (including 0) means it's a real newline. For the sake of simplifying the code, would anyone care if we removed support for COPY with protocol version 2? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com *** src/backend/commands/copy.c --- src/backend/commands/copy.c *** *** 147,153 typedef struct CopyStateData * can display it in error messages if appropriate. */ StringInfoData line_buf; - bool line_buf_converted; /* converted to server encoding? */ /* * Finally, raw_buf holds raw data read from the data source (file or --- 147,152 *** *** 160,165 typedef struct CopyStateData --- 159,168 char *raw_buf; int raw_buf_index; /* next byte to process */ int raw_buf_len; /* total # of bytes stored */ + + #define MAX_CONVERSION_GROWTH 4 /* from mbutils.c */ + char unconverted_buf[MAX_CONVERSION_GROWTH]; + int unconverted_buf_len; /* total # of bytes stored */ } CopyStateData; typedef CopyStateData *CopyState; *** *** 248,253 static void CopyOneRowTo(CopyState cstate, Oid tupleOid, --- 251,257 static void CopyFrom(CopyState cstate); static bool CopyReadLine(CopyState cstate); static bool CopyReadLineText(CopyState cstate); + static bool CopyReadLineCSV(CopyState cstate); static int CopyReadAttributesText(CopyState cstate, int maxfields, char **fieldvals); static int CopyReadAttributesCSV(CopyState cstate, int maxfields, *** *** 673,680 CopyLoadRawBuf(CopyState cstate) else nbytes = 0;/* no data need be saved */ ! inbytes = CopyGetData(cstate, cstate->raw_buf + nbytes, ! 1, RAW_BUF_SIZE - nbytes); nbytes += inbytes; cstate->raw_buf[nbytes] = '\0'; cstate->raw_buf_index = 0; --- 677,773 else nbytes = 0;/* no data need be saved */ ! if (cstate->need_transcoding) ! { ! char *cvt; ! int convertable_bytes; ! char *raw; ! ! /* ! * Read data directly to raw_buf. That way, if pg_client_to_server ! * doesn't need to do any conversion after all, we avoid one memcpy. ! 
* Prepend any unconverted bytes from the last batch first. ! */ ! raw = cstate->raw_buf + nbytes; ! ! memcpy(raw, cstate->unconverted_buf, cstate->unconverted_buf_len); ! ! inbytes = CopyGetData(cstate, raw + cstate->unconverted_buf_len, 1, ! (RAW_BUF_SIZE - nbytes - cstate->unconverted_buf_len) / MAX_CONVERSION_GROWTH); ! inbytes += cstate->unconverted_buf_len; ! ! if (inbytes > 0) ! { ! /* ! * Determine the number of bytes that can be converted in this ! * batch. ! */ ! if (cstate->client_encoding == PG_UTF8) ! { ! /* ! * Start from the end, and backtrack to the beginning of ! * the last character. XXX: this should be generalized and ! * pushed to wchar.c. I believe we could do the same for ! * many other encodings too. ! */ ! convertable_bytes = inbytes; !
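The memchr()-plus-backslash-counting idea described in the message above can be sketched in isolation. This is an illustrative standalone helper, not the patch itself: it finds the next newline that is not preceded by an odd number of backslashes, i.e. the next real record terminator.

```c
#include <assert.h>
#include <string.h>

/*
 * Find the next unescaped newline in buf[0..len).  A newline preceded by
 * an odd number of backslashes is an escaped (data) newline; an even
 * number (including zero) means a real end-of-line.  Returns NULL if no
 * record terminator is found in the buffer.
 */
static const char *
next_record_end(const char *buf, size_t len)
{
	const char *p = buf;
	const char *end = buf + len;

	while (p < end)
	{
		const char *nl = memchr(p, '\n', end - p);
		const char *q;
		int			backslashes = 0;

		if (nl == NULL)
			return NULL;

		/* count the backslashes immediately preceding the newline */
		for (q = nl; q > buf && q[-1] == '\\'; q--)
			backslashes++;

		if (backslashes % 2 == 0)
			return nl;			/* even count: a real newline */
		p = nl + 1;				/* odd count: escaped, keep scanning */
	}
	return NULL;
}
```

The fast path is the memchr() scan; the backward backslash count only runs once per newline found, so escaped newlines cost little unless the data is pathological.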
Re: [HACKERS] Hot Standby: Caches and Locks
On Thu, 2008-10-30 at 08:30 -0400, Tom Lane wrote:
> Simon Riggs <[EMAIL PROTECTED]> writes:
>>> We can't augment the commit/abort messages because we must cater for
>>> non-transactional invalidations also, plus commit xlrecs are already
>>> complex enough. So we log invalidations prior to commit, queue them
>>> and then trigger the send at commit (if it happens).
>
>> Augmenting the commit messages seems like the better approach. It allows
>> invalidation messages to be fired as they are read off the xlrec. Still
>> need the additional message type to handle nontransactional
>> invalidation. There are other messages possibly more complex than this
>> already.
>
> I guess I hadn't been paying attention, but: adding syscache inval
> traffic to WAL seems like a completely horrid idea, both from the
> complexity and performance standpoints.

Well, it's coming out fairly simple actually. Can you explain where you think the performance loss is? My expectation is less than a 0.1% WAL volume overhead for typical systems. My comment this morning was to say I've managed to augment the commit record, so we're not even sending many additional messages.

It also makes much of the Hot Standby patch fairly simple, even if it is large. Write something to WAL, act on it on the other side.

I've paid very close attention to minimising the effects on both the number of lock acquisitions and total WAL volume, but having said that I expect there to be many tuning opportunities.

> What about using the existing syscache logic to re-derive inval
> information from watching the update operations?

That does sound possible, but it makes some big assumptions about transactional machinery being in place. It ain't. Subtransactions make everything about 5 times more difficult, so it seems pretty scary to me.
--
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
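A hedged sketch of the shape being discussed above: a commit record that carries its pending invalidation messages so the standby can fire them as the commit replays. All types and fields here are illustrative, not the actual WAL record layout.

```c
#include <assert.h>

/* Illustrative shapes only; not the real xl_xact_commit layout. */
typedef struct
{
	int			cacheId;		/* which syscache to flush */
	unsigned	hashValue;		/* which entry within it */
} SharedInvalMsg;

typedef struct
{
	unsigned	xid;
	int			nmsgs;			/* invalidations piggybacked on the commit */
	const SharedInvalMsg *msgs;
} CommitRecord;

/*
 * On the standby: as a commit record replays, collect the cache ids of the
 * carried invalidations (a real implementation would flush the caches
 * instead of recording ids).  Returns the number applied.
 */
static int
replay_commit_invals(const CommitRecord *rec, int *applied_cache_ids, int max)
{
	int			i;

	for (i = 0; i < rec->nmsgs && i < max; i++)
		applied_cache_ids[i] = rec->msgs[i].cacheId;
	return i;
}
```

Non-transactional invalidations would still need their own record type, as noted above; only transactional ones can ride on the commit record.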
Re: [HACKERS] Question about GetAttributeByNum(Name) ExecQual.c
Tom Lane wrote:
> Zdenek Kotala <[EMAIL PROTECTED]> writes:
>> GetAttributeByNum has first parameter as HeapTupleHeader, but most
>> functions use Datum and after that they call DatumGetHeapTupleHeader.
>> The difference is that DatumGetHeapTupleHeader performs detoast on
>> tuple(row type). Is it intention do not detoast in these functions or
>> it is a bug?
>
> You would certainly not want a tuple to get separately detoasted for
> each attribute you pull from it. So having detoasting here would be the
> wrong thing IMHO.

Does this mean that these functions are always called with a heap tuple or an untoasted row type (composite data type)? What is the purpose of these functions? They are not used in the executor, and never in the core (only in examples and regression tests). Should they really be in the executor?

Thanks

Zdenek

--
Zdenek Kotala
Sun Microsystems
Prague, Czech Republic
http://sun.com/postgresql
Re: [HACKERS] Question about GetAttributeByNum(Name) ExecQual.c
Zdenek Kotala <[EMAIL PROTECTED]> writes:
> GetAttributeByNum has first parameter as HeapTupleHeader, but most
> functions use Datum and after that they call DatumGetHeapTupleHeader.
> The difference is that DatumGetHeapTupleHeader performs detoast on
> tuple(row type).
> Is it intention do not detoast in these functions or it is a bug?

You would certainly not want a tuple to get separately detoasted for each attribute you pull from it. So having detoasting here would be the wrong thing IMHO.

			regards, tom lane
Re: [HACKERS] Hot Standby: Caches and Locks
Simon Riggs <[EMAIL PROTECTED]> writes:
>> We can't augment the commit/abort messages because
>> we must cater for non-transactional invalidations also, plus commit
>> xlrecs are already complex enough. So we log invalidations prior to
>> commit, queue them and then trigger the send at commit (if it
>> happens).
>
> Augmenting the commit messages seems like the better approach. It allows
> invalidation messages to be fired as they are read off the xlrec. Still
> need the additional message type to handle nontransactional
> invalidation. There are other messages possibly more complex than this
> already.

I guess I hadn't been paying attention, but: adding syscache inval traffic to WAL seems like a completely horrid idea, both from the complexity and performance standpoints. What about using the existing syscache logic to re-derive inval information from watching the update operations?

			regards, tom lane
Re: [HACKERS] TABLE command
Peter Eisentraut <[EMAIL PROTECTED]> writes:
> If I read this right, SQL would allow writing
> TABLE foo;
> as a top-level command, equivalent to SELECT * FROM foo; (see production
> ). It can be used whereever a SELECT or VALUES can be used.
> This is probably about as useless as some of my other recent patches,
> but the implementation is simple (see attachment), so we could add it.
> Comments?

Considering how ugly it was to fit top-level VALUES into our documentation, I'm not really looking forward to this one. I'd vote no, at least until such time as we see some field demand for it.

			regards, tom lane
Re: [HACKERS] pre-MED
David Fetter <[EMAIL PROTECTED]> writes:
> On Wed, Oct 29, 2008 at 10:23:36PM -0400, Tom Lane wrote:
>> I would argue that it's already designed wrong if there's need for
>> PL-specific implementation effort.
>
> I'm not sure how else to do this. The current implementation returns
> char *, which doesn't translate uniformly into the PLs.

Surely they all have a way to call a SQL function that returns text.

			regards, tom lane
Re: [HACKERS] contrib/pg_stat_statements
Martin Pihlak <[EMAIL PROTECTED]> wrote:
> Attached is a patch that adds sourceText to QueryDesc.

It worked fine, surprisingly :). Internal and C functions don't use the executor, so we can ignore trivial function calls (ex. operators). Furthermore, it is ok if QueryDesc doesn't have queryText, because the result is counted up in the upper statement.

> But is the idea of extending QueryDesc generally acceptable? Is it OK
> to make a copy of the query string?

The only thing I'm worried about is that QueryDesc lives longer than its queryText. Can I assume it never occurs?

> I tested with modified pg_stat_statements (removed toplevel checks),

A stack of counters would be better. The attached is modified to do so. It might be worth thinking about adding counters that are equivalent to total_time and self_time in pg_stat_user_functions.

=# CREATE OR REPLACE FUNCTION plfn(integer) RETURNS bigint AS $$
DECLARE
    i bigint;
BEGIN
    SELECT count(*) INTO i FROM pg_class;
    SELECT count(*) INTO i FROM pg_class;
    SELECT count(*) INTO i FROM generate_series(1, $1);
    RETURN i;
END;
$$ LANGUAGE plpgsql;

=# SELECT sum(plfn(1)) FROM generate_series(1, 100);

=# SELECT query, calls, total_time, rows FROM pg_stat_statements;
                       query                       | calls | total_time | rows
---------------------------------------------------+-------+------------+------
 SELECT sum(plfn(1)) FROM generate_series(1, 100); |     1 |        428 |    1
 SELECT count(*) FROM pg_class                     |   200 |         32 |  200
 SELECT count(*) FROM generate_series(1, $1 )      |   100 |        309 |  100
(3 rows)

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
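The "stack of counters" idea can be sketched in a few lines. This is an illustrative model, not the actual pg_stat_statements code: each statement pushes a frame on entry, and on exit its elapsed time is charged to its parent, which is what separates total_time from self_time. It assumes single-threaded use and a bounded nesting depth.

```c
#include <assert.h>

#define MAX_DEPTH 32

/* time already charged by nested statements, per nesting level */
static double child_time[MAX_DEPTH];
static int	depth = 0;

/* push a frame when a statement starts executing */
static void
stmt_begin(void)
{
	child_time[depth++] = 0.0;
}

/*
 * Pop the frame when the statement finishes.  Returns self time: elapsed
 * minus the time spent inside nested statements.  The full elapsed time is
 * charged to the parent frame, if any.
 */
static double
stmt_end(double elapsed)
{
	double		self;

	depth--;
	self = elapsed - child_time[depth];
	if (depth > 0)
		child_time[depth - 1] += elapsed;
	return self;
}
```

In the example above, the outer SELECT's self_time would exclude the time accumulated by the statements run inside plfn().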
Re: [HACKERS] Updating FSM on recovery
Heikki Linnakangas <[EMAIL PROTECTED]> writes:
> Hmm. I think an enum is better than a bitmask here. At the moment, we need
> three different modes of operation:
> 1. Read the page as usual, throw an error on corrupted page (ReadBuffer())
> 2. Read the page, zero page on corruption (this is new)

Is this new? Would it make sense for zero_damaged_pages to use this? Perhaps the enum should have an option to error on damaged pages, warn and zero damaged pages, or just zero damaged pages. We might also want different behaviour for pages for which the crc doesn't match versus pages that have nonsensical page headers.

> 3. Don't read the page from disk, just allocate a buffer. (ReadOrZeroBuffer())

--
Gregory Stark
EnterpriseDB          http://www.enterprisedb.com
Ask me about EnterpriseDB's Slony Replication support!
[HACKERS] Question about GetAttributeByNum(Name) ExecQual.c
GetAttributeByNum has first parameter as HeapTupleHeader, but most functions use Datum and after that they call DatumGetHeapTupleHeader. The difference is that DatumGetHeapTupleHeader performs detoast on the tuple (row type). Is it intentional not to detoast in these functions, or is it a bug?

thanks

Zdenek

--
Zdenek Kotala
Sun Microsystems
Prague, Czech Republic
http://sun.com/postgresql
Re: [HACKERS] Multi CPU Queries - Feedback and/or suggestions wanted!
Bruce Momjian wrote:
> Greg Stark wrote:
>> I couldn't get async I/O to work on Linux. That is it "worked" but
>> performed the same as reading one block at a time. On solaris the
>> situation is reversed.
>>
>> In what way is fadvise a kludge?
>
> I think he is saying AIO gives us more flexibility, but I am unsure we
> need it.

absolutely. posix_fadvise is easy to implement and i would assume that it takes away a lot of "guessing" on the OS internals side. the database usually knows that it is gonna read a lot of data in a certain way and it cannot be a bad idea to give the kernel a hint here. especially synchronized seq scans and so on are real winners here as you stop confusing the kernel with XX concurrent readers on the same file. this can also be an issue with some controller firmwares and so on.

many thanks,

hans
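posix_fadvise() really is a one-call hint. A minimal sketch of advising the kernel ahead of a sequential read, with error handling elided; availability of POSIX_FADV_WILLNEED varies by platform, hence the guard:

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>

/*
 * Hint the kernel that fd's range [off, off+len) will be read soon, so it
 * can start readahead.  posix_fadvise returns an error number (0 on
 * success), not -1/errno.  A no-op where the advice is unavailable.
 */
static int
prefetch_range(int fd, off_t off, off_t len)
{
#ifdef POSIX_FADV_WILLNEED
	return posix_fadvise(fd, off, len, POSIX_FADV_WILLNEED);
#else
	(void) fd;
	(void) off;
	(void) len;
	return 0;
#endif
}
```

The advice is purely a hint: the kernel may ignore it entirely, which is exactly the behavior Greg reported seeing differ between Linux and Solaris for AIO.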
Re: [HACKERS] Hot Standby: Caches and Locks
On Tue, 2008-10-21 at 15:06 +0100, Simon Riggs wrote:
> We can't augment the commit/abort messages because
> we must cater for non-transactional invalidations also, plus commit
> xlrecs are already complex enough. So we log invalidations prior to
> commit, queue them and then trigger the send at commit (if it
> happens).

Augmenting the commit messages seems like the better approach. It allows invalidation messages to be fired as they are read off the xlrec. Still need the additional message type to handle nontransactional invalidation. There are other messages possibly more complex than this already.

--
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Re: [HACKERS] Updating FSM on recovery
On Thu, 2008-10-30 at 09:57 +0000, Simon Riggs wrote:
> On Thu, 2008-10-30 at 10:40 +0200, Heikki Linnakangas wrote:
>> So, attached is a patch using an enum. Barring objections, I'll commit
>> this.
>
> I probably agree with the changes from reading your post, but I'd ask
> that you hang fire on committing this for a few days.

Best thing from here is for me to just freeze my tree for the next few days. It will make my submission a few days out of date, but we can fix that up fairly quickly.

--
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Re: [HACKERS] Updating FSM on recovery
On Thu, 2008-10-30 at 10:40 +0200, Heikki Linnakangas wrote:
> So, attached is a patch using an enum. Barring objections, I'll commit
> this.

I probably agree with the changes from reading your post, but I'd ask that you hang fire on committing this for a few days. It's just going to prevent Koichi and myself from submitting clean patches on F-Day, or it will cause us to spend time on rework before we've even submitted the patch. I'd like to avoid the pileup for now, though don't have any problem with the rework after that point.

Thanks,

--
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Re: [HACKERS] Updating FSM on recovery
Tom Lane wrote:
> Heikki Linnakangas <[EMAIL PROTECTED]> writes:
>> The ReadBuffer() interface is already pretty complex, with all the
>> different variants. We should probably keep the good old ReadBuffer()
>> the same, for the sake of simplicity in the callers, but try to reduce
>> the number of other variatns.
>
> Indeed. Did you see the discussion about the similarly-too-complex
> heap_insert API a couple days ago in connection with bulk-write
> scenarios? The conclusion there was to try to shift stuff into a bitmask
> options argument, in hopes that future additions might not require
> touching every caller. Can we do it similarly here?

Hmm. I think an enum is better than a bitmask here. At the moment, we need three different modes of operation:

1. Read the page as usual, throw an error on corrupted page (ReadBuffer())
2. Read the page, zero page on corruption (this is new)
3. Don't read the page from disk, just allocate a buffer. (ReadOrZeroBuffer())

If we turned this into a bitmask, what would the bits be? Perhaps:

DONT_READ /* don't read the page from disk, just allocate buffer */
NO_ERROR_ON_CORRUPTION /* don't throw an error if page is corrupt */

With two bits, there's four different combinations. I don't think the DONT_READ | NO_ERROR_ON_CORRUPTION combination makes much sense. Also, negative arguments like that can be confusing, but if we inverted the meanings, most callers would have to pass both flags to get the normal behavior.

Looking into the crystal ball, there's two forthcoming features to the interface that I can see:

1. Pin the buffer if the page is in buffer cache. If it's not, do nothing. This is what Simon proposed for the B-tree vacuum interlocking, and I can see that it might be useful elsewhere as well.

2. The posix_fadvise() thing. Or async I/O. It looks like it's going to be a separate function you call before ReadBuffer(), but it could also be implemented as a new mode to ReadBuffer() that just allocates a buffer, issues a posix_fadvise(), and returns.
You would then pass the Buffer to another function to finish the read and make the contents of the buffer valid. Neither of these fits too well with the bitmask. Neither would make sense with DONT_READ or NO_ERROR_ON_CORRUPTION. So, attached is a patch using an enum. Barring objections, I'll commit this. There is a conflict with Simon's hot standby patch, though. Simon's patch adds yet another argument to XLogReadBufferWithFork(), to indicate whether a normal exclusive lock or a cleanup lock is taken on the buffer. I'm inclined to change the interface of XLogReadBufferExtended (as it's now called, after this patch) so that it only pins the page, and leave the locking to the caller. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com *** src/backend/access/gin/ginvacuum.c --- src/backend/access/gin/ginvacuum.c *** *** 155,164 xlogVacuumPage(Relation index, Buffer buffer) static bool ginVacuumPostingTreeLeaves(GinVacuumState *gvs, BlockNumber blkno, bool isRoot, Buffer *rootBuffer) { ! Buffer buffer = ReadBufferWithStrategy(gvs->index, blkno, gvs->strategy); ! Page page = BufferGetPage(buffer); bool hasVoidPage = FALSE; /* * We should be sure that we don't concurrent with inserts, insert process * never release root page until end (but it can unlock it and lock --- 155,168 static bool ginVacuumPostingTreeLeaves(GinVacuumState *gvs, BlockNumber blkno, bool isRoot, Buffer *rootBuffer) { ! Buffer buffer; ! Page page; bool hasVoidPage = FALSE; + buffer = ReadBufferExtended(gvs->index, MAIN_FORKNUM, blkno, + RBM_NORMAL, gvs->strategy); + page = BufferGetPage(buffer); + /* * We should be sure that we don't concurrent with inserts, insert process * never release root page until end (but it can unlock it and lock *** *** 241,253 static void ginDeletePage(GinVacuumState *gvs, BlockNumber deleteBlkno, BlockNumber leftBlkno, BlockNumber parentBlkno, OffsetNumber myoff, bool isParentRoot) { ! 
Buffer dBuffer = ReadBufferWithStrategy(gvs->index, deleteBlkno, gvs->strategy); ! Buffer lBuffer = (leftBlkno == InvalidBlockNumber) ? ! InvalidBuffer : ReadBufferWithStrategy(gvs->index, leftBlkno, gvs->strategy); ! Buffer pBuffer = ReadBufferWithStrategy(gvs->index, parentBlkno, gvs->strategy); Page page, parentPage; LockBuffer(dBuffer, GIN_EXCLUSIVE); if (!isParentRoot) /* parent is already locked by * LockBufferForCleanup() */ --- 245,268 ginDeletePage(GinVacuumState *gvs, BlockNumber deleteBlkno, BlockNumber leftBlkno, BlockNumber parentBlkno, OffsetNumber myoff, bool isParentRoot) { ! Buffer dBuffer; ! Buffer lBuffer; ! Buffer pBuffer; Page page, parentPage; + dBuffer = ReadBufferExtended(gvs->index, MAIN_FORKNUM, deleteBlkno, + RBM_NORMAL, gvs->strategy); + + if (leftBlkno != InvalidBlockNumber) + lBuffer = R
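The three modes enumerated in the message above would land as an enum along these lines. RBM_NORMAL appears in the patch fragments; the other two tag names below are illustrative guesses at the same naming scheme, and the dispatch function is only there to make the mode-to-behavior mapping concrete:

```c
#include <assert.h>
#include <string.h>

/* Sketch of the proposed read mode enum (only RBM_NORMAL is shown in the
 * patch above; the other names are illustrative). */
typedef enum
{
	RBM_NORMAL,				/* read the page; error on a corrupted page */
	RBM_ZERO_ON_ERROR,		/* read the page; zero it if corrupted */
	RBM_ZERO				/* don't read; just allocate a zeroed buffer */
} ReadBufferMode;

/* One behavior per mode - the property a bitmask would not guarantee,
 * since DONT_READ | NO_ERROR_ON_CORRUPTION is a meaningless combination. */
static const char *
mode_behavior(ReadBufferMode mode)
{
	switch (mode)
	{
		case RBM_NORMAL:
			return "read, error on corruption";
		case RBM_ZERO_ON_ERROR:
			return "read, zero on corruption";
		case RBM_ZERO:
			return "allocate only";
	}
	return "unknown";
}
```

An enum also leaves room for the forward-looking modes Heikki mentions (pin-if-cached, fadvise-then-return) as new tags rather than new flag combinations.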
Re: [HACKERS] SQL/MED compatible connection manager
> I have put together a draft that describes a possible implementation:
> http://wiki.postgresql.org/wiki/SqlMedConnectionManager
> I'll update this with an experimental patch.

This implements:

* System catalogs for FDW, SERVER and USER MAPPING
* regserver data type for servers (convert from text to oid etc).
* FDW placeholders as shared libraries -- currently provides dummy and pgsql wrappers.
* Connection lookup via FDW.
* SQL interface to the lookup function.

There is an example of how all this works at: http://wiki.postgresql.org/wiki/SqlMedConnectionManager#Current_implementation

I was hoping to get the internal features done by the start of November Commitfest. But right now this seems unlikely. Nevertheless, I'm all ears for suggestions and criticism.

PS. Jonah, does this somehow coincide with your approach to FDW-s?

regards,
Martin
[HACKERS] TABLE command
If I read this right, SQL would allow writing

TABLE foo;

as a top-level command, equivalent to SELECT * FROM foo; (see production ). It can be used wherever a SELECT or VALUES can be used. This is probably about as useless as some of my other recent patches, but the implementation is simple (see attachment), so we could add it. Comments?

diff -ur ../cvs-pgsql/src/backend/parser/gram.y ./src/backend/parser/gram.y
--- ../cvs-pgsql/src/backend/parser/gram.y	2008-10-29 13:37:47.0 +0200
+++ ./src/backend/parser/gram.y	2008-10-29 16:49:09.0 +0200
@@ -6416,6 +6416,25 @@
 				$$ = (Node *)n;
 			}
 		| values_clause			{ $$ = $1; }
+		| TABLE table_ref
+			{
+				/* same as SELECT * FROM table_ref */
+				ColumnRef *cr = makeNode(ColumnRef);
+				ResTarget *rt = makeNode(ResTarget);
+				SelectStmt *n = makeNode(SelectStmt);
+
+				cr->fields = list_make1(makeNode(A_Star));
+				cr->location = -1;
+
+				rt->name = NULL;
+				rt->indirection = NIL;
+				rt->val = (Node *)cr;
+				rt->location = -1;
+
+				n->targetList = list_make1(rt);
+				n->fromClause = list_make1($2);
+				$$ = (Node *)n;
+			}
 		| select_clause UNION opt_all select_clause
 			{
 				$$ = makeSetOp(SETOP_UNION, $3, $1, $4);
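The stated equivalence is simple enough to demonstrate with a toy textual rewrite. For illustration only: the patch does this properly in the grammar by building a SelectStmt node, not by string manipulation.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy: rewrite "TABLE foo" into the equivalent "SELECT * FROM foo". */
static void
rewrite_table_command(const char *in, char *out, size_t outlen)
{
	if (strncmp(in, "TABLE ", 6) == 0)
		snprintf(out, outlen, "SELECT * FROM %s", in + 6);
	else
		snprintf(out, outlen, "%s", in);
}
```

Doing it in the grammar instead means the TABLE form automatically composes with everything SELECT composes with (UNION, ORDER BY, subqueries), which a string rewrite would not get right.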