[HACKERS] Better support for whole-row operations and composite types
We have a number of issues revolving around the fact that composite types (row types) aren't first-class objects. I think it's past time to fix that. Here are some notes about doing it. I am not sure all these ideas are fully-baked ... comments appreciated.

When represented as a Datum, the format of a row-type object needs to be something like this:

* overall length: int4 (this makes the Datum a valid varlena item)
* row type id: Oid (either a composite type id or RECORDOID)
* row type typmod: int4 (see below for usage)
-- pad if needed to MAXALIGN boundary
* heap tuple representation, beginning with a HeapTupleHeaderData struct

If we do it exactly as above then we will be wasting some space, because the xmin/xmax/cmax and ctid fields of HeapTupleHeaderData are of no use in a row that isn't actually a table member row. It is very tempting to overlay the length and rowtype fields with the HeapTupleHeaderData struct. This would save some code as well as space --- see discussion below.

Only named composite types, not RECORD, will be allowed to be used as table column types. This ensures that any row object stored on disk will have a valid composite type ID embedded in it, so that the row structure can be retrieved when the row is read. However, we want to be able to support row objects in memory that are of transient record types (for example, the output of a function returning RECORD will have a record type determined by the query itself). I propose that we handle this case by setting the type id to RECORDOID and using the typmod to identify the particular record type --- the typmod will essentially be an index into a backend-local cache of record types. More detail below.

We'll add "tdtypeid" and "tdtypmod" fields to TupleDesc structs. This will make it easy to set the embedded type information correctly when manufacturing a row datum using a TupleDesc. For TupleDescs associated with relations, tdtypeid is just the relation's row type OID, and tdtypmod is -1.
For TupleDescs representing transient row types, we initially set tdtypeid to RECORDOID and tdtypmod to -1 (indicating a completely anonymous row type). If the row type actually needs to be identifiable then we establish a cache entry for it and set the typmod to an index for the cache entry. I think this will only need to happen when the query contains a function-returning-RECORD or a whole-row variable referencing what would otherwise be an anonymous row type, such as a JOIN result.

Composite types, as well as the RECORD type, will be marked in pg_type as pass-by-ref, varlena (typlen -1), typalign 'd'. (We will use the maximum alignment always to avoid any dependency on types of the contained columns.)

The present function call and return conventions involving TupleTableSlots will be replaced by simply passing and returning these row objects as pass-by-reference Datums. In the case of functions returning rowtypes, we'll continue to support the present ReturnSetInfo convention for returning a separate TupleDesc describing the result type --- but this will just be a crosscheck.

We will be able to make generic I/O routines for composite types, comparable to those used now for arrays. Not sure what a convenient external format would look like. (Possibly use the same conventions as for a 1-D array?) We will need to make the convention that the type OID of a composite type is passed to the input routine, in the same way that an array input routine gets the typelem OID; else the input routine won't know what to do.

We could also think about allowing functions that are declared as accepting RECORD (ie, polymorphic-across-row-types functions). They would use the same methods already used by polymorphic functions to find out the true types of their inputs. (Might be best to invent a separate pseudotype, say ANYRECORD, rather than overloading RECORD for this purpose.)
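The non-overlaid layout listed in the proposal (length, type id, typmod, then padding to a MAXALIGN boundary ahead of the tuple image) can be sketched in standalone C to show where the wasted space comes from. This is only an illustration, not backend code: ToyRowPrefix, the TOY_* macros and the toy_* functions are hypothetical names, and the 8-byte maximum alignment is an assumption that differs between platforms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Oid;

/* Assumed maximum alignment of 8 bytes; the real value is platform-dependent. */
#define TOY_MAXIMUM_ALIGNOF 8
#define TOY_MAXALIGN(len) \
    (((size_t) (len) + (TOY_MAXIMUM_ALIGNOF - 1)) & \
     ~((size_t) (TOY_MAXIMUM_ALIGNOF - 1)))

/* Hypothetical prefix placed in front of the heap tuple representation. */
typedef struct ToyRowPrefix
{
    int32_t length;   /* overall length; makes the datum a valid varlena item */
    Oid     type_id;  /* composite type OID, or RECORDOID for transient types */
    int32_t typmod;   /* -1, or an index into the backend-local record cache */
} ToyRowPrefix;

/* Offset at which the heap tuple representation would start. */
size_t toy_tuple_offset(void)
{
    return TOY_MAXALIGN(sizeof(ToyRowPrefix));
}

/* Bytes of padding this layout spends on every row datum. */
size_t toy_padding_bytes(void)
{
    return toy_tuple_offset() - sizeof(ToyRowPrefix);
}

/* Total datum size for a tuple image of tuple_len bytes. */
size_t toy_row_datum_size(size_t tuple_len)
{
    assert(tuple_len <= SIZE_MAX - toy_tuple_offset());  /* no overflow */
    return toy_tuple_offset() + tuple_len;
}
```

Where the prefix occupies 12 bytes, the tuple image starts at offset 16, so every row datum carries the prefix plus padding on top of header fields (xmin/xmax/cmax, ctid) that mean nothing outside a table. That overhead is what the overlay idea in the proposal is meant to avoid.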
The recently developed SRF API is a bit unfortunate since it exposes the assumption that a TupleTableSlot must be involved in returning a tuple. If we don't overlay the Datum header with HeapTupleHeader then I think we have to make TupleGetDatum copy the passed tuple and insert the row type info from the slot's tupledesc, which'd be pretty inefficient because it means making an extra copy of the row data. But if we do overlay the header fields, then I think we can set up backwards-compatibility definitions in which the slot is simply ignored. Specifically:

    TupleDescGetSlot: no-op, returns NULL
    TupleGetDatum: ignore slot, return tuple t_data pointer as datum

This will work because heap_formtuple and BuildTupleFromCStrings can return a HeapTuple whose t_data part is already a valid row Datum, simply by setting the appropriate length and type fields in it. (If the tuple is ever stored to disk as a regular table row, these fields will be overwritten with xmin/cmin info at that time.)

To convert a row Datum into something
Re: [HACKERS] Better support for whole-row operations and composite
Tom Lane wrote:
> We have a number of issues revolving around the fact that composite types (row types) aren't first-class objects. I think it's past time to fix that. Here are some notes about doing it. I am not sure all these ideas are fully-baked ... comments appreciated.
[snip]
> Only named composite types, not RECORD, will be allowed to be used as table column types.
[snip]

Interesting. I'm slightly curious to know if there's an external driver for this. Will this apply recursively (an a has a b which has an array of c's)? Are there indexing implications? Could one index on a subfield?

cheers

andrew

---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [HACKERS] Better support for whole-row operations and composite
Andrew Dunstan <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> Only named composite types, not RECORD, will be allowed to be used as
>> table column types.

> Interesting. I'm slightly curious to know if there's an external driver
> for this.

There's no place to store a permanent record of an anonymous rowtype's structure. To do otherwise would amount to executing an implicit CREATE TYPE AS for the user, so we might as well just say up front that you have to create the type.

> Will this apply recursively (an a has a b which has an array of c's)?

Yup.

> Are there indexing implications? Could one index on a subfield?

Using an expression index, sure. I don't think we need to support it as a "primitive" index type.

regards, tom lane
Re: [HACKERS] Better support for whole-row operations and composite
Tom Lane wrote: We have a number of issues revolving around the fact that composite types (row types) aren't first-class objects. I think it's past time to fix that. Here are some notes about doing it. I am not sure all these ideas are fully-baked ... comments appreciated. [Sorry for the delay in responding] Nice work, and in general it makes sense to me. A few comments below. We will be able to make generic I/O routines for composite types, comparable to those used now for arrays. Not sure what a convenient external format would look like. (Possibly use the same conventions as for a 1-D array?) So you mean like an array, but with possibly mixed datatypes? '{1 , "abc def", 2.3}' Seems to make sense. Another option might be to use the ROW keyword, something like: ROW[1 , 'abc', 2.3] We could also think about allowing functions that are declared as accepting RECORD (ie, polymorphic-across-row-types functions). They would use the same methods already used by polymorphic functions to find out the true types of their inputs. (Might be best to invent a separate pseudotype, say ANYRECORD, rather than overloading RECORD for this purpose.) Check. I really like this idea. TupleDescGetSlot: no-op, returns NULL TupleGetDatum: ignore slot, return tuple t_data pointer as datum This will work because heap_formtuple and BuildTupleFromCStrings can return a HeapTuple whose t_data part is already a valid row Datum, simply by setting the appropriate length and type fields in it. (If the tuple is ever stored to disk as a regular table row, these fields will be overwritten with xmin/cmin info at that time.) Is this the way you did things in your recent commit? To convert a row Datum into something that can be passed to heap_getattr, one could use a local variable of type HeapTupleData and set its t_data field to the datum's pointer value. t_len is copied from the datum contents, while the other fields of HeapTupleData can just be set to zeroes. 
I think I understand this, but an example would help. * We have to be able to re-use an already-existing cache entry if it matches a requested TupleDesc. For anonymous record types, how will that lookup be done efficiently? Can the hash key be an array of attribute oids? If an ALTER TABLE command does something that requires examining or changing every row of a table, it would presumably have to do the same to all entries in any composite-type column of the table's rowtype. To avoid surprises and interesting debates about who has permissions to do this, it might be wise to restrict on-disk composite columns to be only of standalone composite types (ie, those made with CREATE TYPE AS). This restriction would also avoid debates about whether table constraints apply to composite-type columns. I agree. As an aside, it would be quite useful to have support for arrays of tuples. Any idea on how to do that without needing to define an explicit array type for each tuple type? Joe ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] Better support for whole-row operations and composite
Tom Lane wrote: Joe Conway <[EMAIL PROTECTED]> writes: So you mean like an array, but with possibly mixed datatypes? '{1 , "abc def", 2.3}' Seems to make sense. The unresolved question in my mind is how to represent NULL elements. However, we have to solve that sooner or later for arrays too. Any thoughts? Good point. What's really ugly is that the external representation of string types differs depending on whether quotes are needed or not. If strings were *always* surrounded by quotes, we could just use the word NULL, without the quotes. Another option might be to use the ROW keyword, something like: ROW[1 , 'abc', 2.3] This is a separate issue, just as the ARRAY[] constructor has different uses from the array I/O representation. I do want some kind of runtime constructor, but ROW[...] doesn't get the job done because it doesn't provide any place to specify the rowtype name. Maybe we could combine ROW[...] with some sort of cast notation? ROW[1 , 'abc', 2.3] :: composite_type_name CAST(ROW[1 , 'abc', 2.3] AS composite_type_name) Does SQL99 provide any guidance here? The latter seems to agree with 6.12 () of SQL2003. I'd think we'd want the former supported anyway as an extension to standard. Almost. I ended up keeping TupleDescGetSlot as a live function, but its true purpose is only to ensure that the tupledesc gets registered with the type cache (see BlessTupleDesc() in CVS tip). The slot per se never gets used. I believe that CVS tip is source-code-compatible with existing SRFs, even though I adjusted all the ones in the distribution to stop using the TupleTableSlot stuff. Almost compatible. I found that, to my surprise, PL/R compiles with no changes after your commit. However it no segfaults (as I expected) on composite type arguments. Should be easy to fix though (I think, really haven't looked at it hard yet). The main point though is that row Datums now contain sufficient info embedded in them to allow runtime type lookup the same as we do for arrays. 
Sounds good to me. There are several in the PL sources now, for instance plpgsql does this with an incoming rowtype argument: Perfect -- thanks. As an aside, it would be quite useful to have support for arrays of tuples. Any idea on how to do that without needing to define an explicit array type for each tuple type? Hmm, messy ... I wonder now whether we still really need a separate pg_type entry for every array type. The original motivation for doing that has been at least partly subsumed by storing element type OIDs inside the arrays themselves. I wonder if we could go over to a scheme where, say, atttypid is the base type ID and attndims being nonzero is what you check to find out it's really an array of atttypid. Not sure how we could map that idea into function and expression args/results, though. Hmmm. I had thought maybe we could use a single datatype (anyarray?) with in/out functions that would need to do the right thing based on the element type. This would also allow, for example, arrays-of-arrays, which is the way that SQL99/2003 seem to allow for multidimensional arrays. Plan B would be to go ahead and create array types. Not sure I would want to do this for table rowtypes, but if we did it only for CREATE TYPE AS then it doesn't sound like an unreasonable amount of overhead. I was hoping we wouldn't need to do that. Joe ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
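Joe's suggestion above (always quote string values so that a bare, unquoted NULL is unambiguous) can be sketched in standalone C. This is a toy formatter under stated assumptions, not the format the backend uses: row_to_text is a hypothetical name, every non-null field is treated as a string and quoted, and embedded quotes and backslashes are backslash-escaped.

```c
#include <assert.h>
#include <stddef.h>

/* Append one character, always keeping room for the terminating NUL. */
static int put_char(char *out, size_t outsz, size_t *pos, char c)
{
    if (*pos + 1 >= outsz)
        return -1;
    out[(*pos)++] = c;
    return 0;
}

/*
 * Render fields as {"a","b",NULL}.  A NULL pointer becomes the bare word
 * NULL; every other field is quoted, so the string "NULL" stays distinct.
 * Returns the text length, or -1 if the buffer is too small.
 */
int row_to_text(const char *const *fields, int nfields, char *out, size_t outsz)
{
    size_t pos = 0;

    assert(out != NULL);
    if (put_char(out, outsz, &pos, '{'))
        return -1;
    for (int i = 0; i < nfields; i++)
    {
        if (i > 0 && put_char(out, outsz, &pos, ','))
            return -1;
        if (fields[i] == NULL)
        {
            for (const char *p = "NULL"; *p; p++)
                if (put_char(out, outsz, &pos, *p))
                    return -1;
            continue;
        }
        if (put_char(out, outsz, &pos, '"'))
            return -1;
        for (const char *p = fields[i]; *p; p++)
        {
            /* escape embedded quotes and backslashes */
            if ((*p == '"' || *p == '\\') && put_char(out, outsz, &pos, '\\'))
                return -1;
            if (put_char(out, outsz, &pos, *p))
                return -1;
        }
        if (put_char(out, outsz, &pos, '"'))
            return -1;
    }
    if (put_char(out, outsz, &pos, '}'))
        return -1;
    out[pos] = '\0';
    return (int) pos;
}
```

With this convention a null field and a field holding the four-character string NULL render differently (NULL versus "NULL"), which is the ambiguity the thread is worried about. The cost is that numbers get quoted too, which the existing array output does not do.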
Re: [HACKERS] Better support for whole-row operations and composite
Tom Lane wrote:
> Joe Conway <[EMAIL PROTECTED]> writes:
>> Almost compatible. I found that, to my surprise, PL/R compiles with no changes after your commit. However it now segfaults (as I expected) on composite type arguments. Should be easy to fix though (I think, really haven't looked at it hard yet).
>
> Let me know what you find out --- if I missed a trick on compatibility, there's still plenty of time to fix it.

I still haven't had time to look closely, and may well have been doing something non-standard all along, but in any case this is the current failing code:

    else if (function->arg_is_rel[i])
    {
        /* for tuple args, convert to a one row data.frame */
        TupleTableSlot *slot = (TupleTableSlot *) arg[i];
        HeapTuple       tuples = slot->val;
        TupleDesc       tupdesc = slot->ttc_tupleDescriptor;

        PROTECT(el = pg_tuple_get_r_frame(1, &tuples, tupdesc));
    }

The problem was (I think -- I'll check a little later) that slot->ttc_tupleDescriptor is now '\0'.

>> Hmmm. I had thought maybe we could use a single datatype (anyarray?) with in/out functions that would need to do the right thing based on the element type.
>
> If we have just one datatype, how will the parser determine the type of a "foo[subscript]" expression? After thinking a bit, I don't see how to do that except by adding an out-of-line decoration to the underlying type, somewhat like we do for "setof" or atttypmod. This is doable as far as the backend itself is concerned, but the compatibility implications for clients and user-written extensions seem daunting :-(

I'll think-about/play-with this some more, hopefully this weekend.

Thanks,

Joe
Re: [HACKERS] Better support for whole-row operations and composite
Tom Lane wrote:
> Joe Conway <[EMAIL PROTECTED]> writes:
>>     /* for tuple args, convert to a one row data.frame */
>>     TupleTableSlot *slot = (TupleTableSlot *) arg[i];
>>     HeapTuple tuples = slot->val;
>>     TupleDesc tupdesc = slot->ttc_tupleDescriptor;
>
> Um. Well, the arg is not a TupleTableSlot * anymore, so this is guaranteed to fail. This isn't part of what I thought the documented SRF API was though.

I'm sure you're correct. The SRF API was for user defined functions, not procedural languages anyway. I'll look at how the other procedural languages handle tuple arguments.

> If you take the arg[i] value and pass it to GetAttributeByName or GetAttributeByNum it will work (with some compiler warnings) and AFAICS we never documented more than that.

OK, thanks,

Joe
Re: [HACKERS] Better support for whole-row operations and composite
Tom Lane wrote:
> Joe Conway <[EMAIL PROTECTED]> writes:
>> ... The SRF API was for user defined functions, not procedural languages anyway. I'll look at how the other procedural languages handle tuple arguments.
>
> It was a dozen-or-so-lines change in each of the PL languages AFAIR. You will probably also want to look at what you do to return tuple results.

OK, thanks. Just for reference, what is arg[i] if it isn't a (TupleTableSlot *) anymore -- is it just a HeapTuple?

Joe
Re: [HACKERS] Better support for whole-row operations and composite
Tom Lane wrote:
> No, it's a HeapTupleHeader pointer. You need to reconstruct a HeapTuple on top of that to work with heap_getattr and most other core backend routines.

Thanks. For triggers, I was previously building up the arguments thus:

    slot = TupleDescGetSlot(tupdesc);
    slot->val = trigdata->tg_trigtuple;
    arg[7] = PointerGetDatum(slot);

I suppose now I should do this instead?

    arg[7] = PointerGetDatum(trigdata->tg_trigtuple->t_data);

> Also don't forget to ensure that you detoast the datum; this is not useful at the moment but will be important Real Soon Now. I added standard argument-fetch macros to fmgr.h to help with the detoasting bit.

OK. This is the net result:

    #ifdef PG_VERSION_75_COMPAT
        Oid             tupType;
        int32           tupTypmod;
        TupleDesc       tupdesc;
        HeapTuple       tuple = palloc(sizeof(HeapTupleData));
        HeapTupleHeader tuple_hdr = DatumGetHeapTupleHeader(arg[i]);

        tupType = HeapTupleHeaderGetTypeId(tuple_hdr);
        tupTypmod = HeapTupleHeaderGetTypMod(tuple_hdr);
        tupdesc = lookup_rowtype_tupdesc(tupType, tupTypmod);

        tuple->t_len = HeapTupleHeaderGetDatumLength(tuple_hdr);
        ItemPointerSetInvalid(&(tuple->t_self));
        tuple->t_tableOid = InvalidOid;
        tuple->t_data = tuple_hdr;

        PROTECT(el = pg_tuple_get_r_frame(1, &tuple, tupdesc));
        pfree(tuple);
    #else
        TupleTableSlot *slot = (TupleTableSlot *) arg[i];
        HeapTuple       tuple = slot->val;
        TupleDesc       tupdesc = slot->ttc_tupleDescriptor;

        PROTECT(el = pg_tuple_get_r_frame(1, &tuple, tupdesc));
    #endif /* PG_VERSION_75_COMPAT */

Given the above changes, it's almost working now -- only problem left is with triggers:

    insert into foo values(11,'cat99',1.89);
    + ERROR: record type has not been registered
    + CONTEXT: In PL/R function rejectfoo
    delete from foo;
    + ERROR: cache lookup failed for type 0
    + CONTEXT: In PL/R function rejectfoo

(and a few other similar failures)

Any ideas why the trigger tuple type isn't registered, or what I'm doing wrong?

Thanks,

Joe
Re: [HACKERS] Better support for whole-row operations and composite
Joe Conway wrote:
> Given the above changes, it's almost working now -- only problem left is with triggers:
>
>     insert into foo values(11,'cat99',1.89);
>     + ERROR: record type has not been registered
>     + CONTEXT: In PL/R function rejectfoo
>     delete from foo;
>     + ERROR: cache lookup failed for type 0
>     + CONTEXT: In PL/R function rejectfoo
>
> (and a few other similar failures)
>
> Any ideas why the trigger tuple type isn't registered, or what I'm doing wrong?

A little more info on this. It appears that the tuple type is set to either 2249 (RECORDOID) or 0. In the case of RECORDOID this traces all the way back to here:

    /*
     * CreateTemplateTupleDesc
     *
     * This function allocates and zeros a tuple descriptor structure.
     *
     * Tuple type ID information is initially set for an anonymous record
     * type; caller can overwrite this if needed.
     *
     */

But the type id is never overwritten for a BEFORE INSERT trigger. It appears that somewhere it is explicitly set to InvalidOid for both BEFORE DELETE and AFTER INSERT triggers (and possibly others).

My take is that we now need to explicitly set the tuple type id for INSERT/UPDATE/DELETE statements -- not sure where the best place to do that is though. Does this sound correct?

Joe
Re: [HACKERS] Better support for whole-row operations and composite
Joe Conway <[EMAIL PROTECTED]> writes:
>> Any ideas why the trigger tuple type isn't registered, or what I'm doing
>> wrong?

> A little more info on this. It appears that the tuple type is set to
> either 2249 (RECORDOID) or 0.

After further thought, we could possibly make it work for BEFORE triggers, but there's just no way for AFTER triggers: in that case what you are getting is an image of what went to disk, which is going to contain transaction info not type info.

If you really want the trigger API for PL/R to be indistinguishable from the function-call API, then I think you will need to copy the passed tuple and insert type information. This is more or less what ExecEvalVar does now in the whole-tuple case (the critical code is actually in heap_getsysattr though):

    HeapTupleHeader dtup;

    dtup = (HeapTupleHeader) palloc(tup->t_len);
    memcpy((char *) dtup, (char *) tup->t_data, tup->t_len);

    HeapTupleHeaderSetDatumLength(dtup, tup->t_len);
    HeapTupleHeaderSetTypeId(dtup, tupleDesc->tdtypeid);
    HeapTupleHeaderSetTypMod(dtup, tupleDesc->tdtypmod);

    result = PointerGetDatum(dtup);

regards, tom lane
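Why an on-disk tuple image cannot be passed directly as a row datum, and why copy-then-stamp fixes it, can be illustrated with a standalone toy model. It assumes the overlay described in the original proposal, where the length/type fields share storage with the transaction fields. ToyTupleHeader and toy_make_row_datum are hypothetical names; the real backend structs and macros are the ones shown in the messages above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint32_t Oid;

/* Toy header: transaction fields and datum fields share the same bytes. */
typedef struct ToyTupleHeader
{
    union
    {
        struct { uint32_t xmin, xmax, cmax; } heap;                 /* stored row */
        struct { int32_t len; int32_t typmod; Oid type_id; } datum; /* row datum */
    } u;
    unsigned char data[8];      /* stand-in for the column data */
} ToyTupleHeader;

/*
 * Copy a stored tuple image and overwrite the shared header bytes with
 * length and type information.  The original is left untouched, so its
 * transaction info stays valid.  Caller frees the result.
 */
ToyTupleHeader *toy_make_row_datum(const ToyTupleHeader *stored,
                                   Oid type_id, int32_t typmod)
{
    ToyTupleHeader *copy;

    assert(stored != NULL);
    copy = malloc(sizeof(*copy));
    if (copy == NULL)
        return NULL;
    memcpy(copy, stored, sizeof(*copy));
    copy->u.datum.len = (int32_t) sizeof(*copy);
    copy->u.datum.type_id = type_id;
    copy->u.datum.typmod = typmod;
    return copy;
}
```

In this model, reading u.datum.type_id from the stored image would just return whatever transaction data occupies those bytes, which matches the "record type has not been registered" and "cache lookup failed for type 0" symptoms reported earlier in the thread. The extra copy is the price of leaving the stored image alone.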
Re: [HACKERS] Better support for whole-row operations and composite
Tom Lane wrote: Joe Conway <[EMAIL PROTECTED]> writes: For triggers, I was previously building up the arguments thus: slot = TupleDescGetSlot(tupdesc); slot->val = trigdata->tg_trigtuple; arg[7] = PointerGetDatum(slot); I suppose now I should do this instead? arg[7] = PointerGetDatum(trigdata->tg_trigtuple->t_data); Hm, no, that won't work because a tuple being passed to a trigger probably isn't going to contain valid type information. The API for calling triggers is different from calling ordinary functions, so I never thought about trying to make it look the same. At what point are you trying to do the above, anyway? That's a shame -- it used to work fine -- done this way so the same function could handle tuple arguments to regular functions, and old/new tuples to trigger functions. It is in plr_trigger_handler(); vaguely similar to pltcl_trigger_handler(). I'll have to figure out a workaround I guess. Joe ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [HACKERS] Better support for whole-row operations and composite
Tom Lane wrote:
> If you really want the trigger API for PL/R to be indistinguishable from the function-call API, then I think you will need to copy the passed tuple and insert type information. This is more or less what ExecEvalVar does now in the whole-tuple case (the critical code is actually in heap_getsysattr though):

That got me there. It may not be the best in terms of pure speed, but it is easier and simpler than refactoring, at least at the moment. And I don't think the reason people will choose PL/R for triggers is speed in any case ;-)

Thanks!

Joe
Re: [HACKERS] Better support for whole-row operations and composite types
Tom Lane <[EMAIL PROTECTED]> writes: > We have a number of issues revolving around the fact that composite types > (row types) aren't first-class objects. I think it's past time to fix > that. ... > Only named composite types, not RECORD, will be allowed to be used as > table column types. If I understand what you're talking about, you would be allowed to CREATE TYPE a composite type, like say, "address" and then use that as a datatype all over your database? And then if you find "address" needs a new field you can add it to the type and automatically have it added all over your database to any table column using that type? Speaking as a user, that would be **very** nice. I've often found myself wishing for just such a feature. It would simplify data model maintenance a whole heck of a lot. How will client programs see the data if i do a "select *"? In my ideal world it would be shipped over in a binary representation that a driver would translate to a perl hash / php array / whatever. But maybe it would be simpler to just ship them over the subcolumns with names like "shipping.line_1" and "shipping.country". -- greg ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] Better support for whole-row operations and composite types
Greg Stark <[EMAIL PROTECTED]> writes:
> If I understand what you're talking about, you would be allowed to
> CREATE TYPE a composite type, like say, "address" and then use that as
> a datatype all over your database? And then if you find "address"
> needs a new field you can add it to the type and automatically have it
> added all over your database to any table column using that type?

I believe that would work, though you might have some issues with cached plans.

> How will client programs see the data if i do a "select *"?

TBD.

regards, tom lane
Re: [HACKERS] Better support for whole-row operations and composite types
Tom,

> We have a number of issues revolving around the fact that composite types
> (row types) aren't first-class objects. I think it's past time to fix
> that. Here are some notes about doing it. I am not sure all these ideas
> are fully-baked ... comments appreciated.

I'll want to add to the documentation on composite types, then. We'll need a stern warning to users not to abuse them. Easily done, I think.

Composite types are frequently abused by OO and Windows programmers to break the relational model. I used to be an MSDN member (thank you, I've recovered) and frequently ran into, on the mailing list, users getting themselves into some unresolvable mess because they'd used composite types in SQL Server to combine several rows ... or even effectively an entire child table ... into one field.

Otherwise, looks good to me. I don't think I'm qualified to second-guess you on the implementation.

--
-Josh Berkus
Aglio Database Solutions
San Francisco
Re: [HACKERS] Better support for whole-row operations and composite types
Joe Conway <[EMAIL PROTECTED]> writes: > Tom Lane wrote: >> We will be able to make generic I/O routines for composite types, >> comparable to those used now for arrays. Not sure what a convenient >> external format would look like. (Possibly use the same conventions >> as for a 1-D array?) > So you mean like an array, but with possibly mixed datatypes? > '{1 , "abc def", 2.3}' > Seems to make sense. The unresolved question in my mind is how to represent NULL elements. However, we have to solve that sooner or later for arrays too. Any thoughts? > Another option might be to use the ROW keyword, something like: > ROW[1 , 'abc', 2.3] This is a separate issue, just as the ARRAY[] constructor has different uses from the array I/O representation. I do want some kind of runtime constructor, but ROW[...] doesn't get the job done because it doesn't provide any place to specify the rowtype name. Maybe we could combine ROW[...] with some sort of cast notation? ROW[1 , 'abc', 2.3] :: composite_type_name CAST(ROW[1 , 'abc', 2.3] AS composite_type_name) Does SQL99 provide any guidance here? >> TupleDescGetSlot: no-op, returns NULL TupleGetDatum: ignore slot, >> return tuple t_data pointer as datum >> >> This will work because heap_formtuple and BuildTupleFromCStrings can >> return a HeapTuple whose t_data part is already a valid row Datum, >> simply by setting the appropriate length and type fields in it. (If >> the tuple is ever stored to disk as a regular table row, these fields >> will be overwritten with xmin/cmin info at that time.) > Is this the way you did things in your recent commit? Almost. I ended up keeping TupleDescGetSlot as a live function, but its true purpose is only to ensure that the tupledesc gets registered with the type cache (see BlessTupleDesc() in CVS tip). The slot per se never gets used. I believe that CVS tip is source-code-compatible with existing SRFs, even though I adjusted all the ones in the distribution to stop using the TupleTableSlot stuff. 
The main point though is that row Datums now contain sufficient info embedded in them to allow runtime type lookup the same as we do for arrays.

>> To convert a row Datum into something that can be passed to
>> heap_getattr, one could use a local variable of type HeapTupleData
>> and set its t_data field to the datum's pointer value. t_len is
>> copied from the datum contents, while the other fields of
>> HeapTupleData can just be set to zeroes.

> I think I understand this, but an example would help.

There are several in the PL sources now, for instance plpgsql does this with an incoming rowtype argument:

    if (!fcinfo->argnull[i])
    {
        HeapTupleHeader td;
        Oid             tupType;
        int32           tupTypmod;
        TupleDesc       tupdesc;
        HeapTupleData   tmptup;

        td = DatumGetHeapTupleHeader(fcinfo->arg[i]);
        /* Extract rowtype info and find a tupdesc */
        tupType = HeapTupleHeaderGetTypeId(td);
        tupTypmod = HeapTupleHeaderGetTypMod(td);
        tupdesc = lookup_rowtype_tupdesc(tupType, tupTypmod);
        /* Build a temporary HeapTuple control structure */
        tmptup.t_len = HeapTupleHeaderGetDatumLength(td);
        ItemPointerSetInvalid(&(tmptup.t_self));
        tmptup.t_tableOid = InvalidOid;
        tmptup.t_data = td;

        exec_move_row(&estate, NULL, row, &tmptup, tupdesc);
    }

This is okay because the HeapTupleData is not needed after the call to exec_move_row.

>> * We have to be able to re-use an already-existing cache entry if it
>> matches a requested TupleDesc.

> For anonymous record types, how will that lookup be done efficiently?
> Can the hash key be an array of attribute oids?

Right, that's the way I did it. See src/backend/utils/cache/typcache.c

> As an aside, it would be quite useful to have support for arrays of
> tuples. Any idea on how to do that without needing to define an explicit
> array type for each tuple type?

Hmm, messy ... I wonder now whether we still really need a separate pg_type entry for every array type.
The original motivation for doing that has been at least partly subsumed by storing element type OIDs inside the arrays themselves. I wonder if we could go over to a scheme where, say, atttypid is the base type ID and attndims being nonzero is what you check to find out it's really an array of atttypid. Not sure how we could map that idea into function and expression args/results, though. Plan B would be to go ahead and create array types. Not sure I would want to do this for table rowtypes, but if we did it only for CREATE TYPE AS then it doesn't sound like an unreasonable amount of overhead. regards, tom lane ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
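The idea of identifying a transient record type by a typmod that indexes a backend-local cache, with the column type OIDs as the lookup key, can be sketched in standalone C. This is a toy model and not the code in typcache.c: toy_assign_record_typmod, the fixed-size table and its limits are hypothetical, and a linear scan stands in for the hash lookup mentioned above to keep the example short.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t Oid;

#define TOY_MAX_ATTS         16   /* arbitrary limits for the sketch */
#define TOY_MAX_RECORD_TYPES 64

typedef struct ToyRecordShape
{
    int natts;
    Oid atttypes[TOY_MAX_ATTS];   /* the lookup key: column type OIDs */
} ToyRecordShape;

static ToyRecordShape toy_cache[TOY_MAX_RECORD_TYPES];
static int toy_cache_used = 0;

/*
 * Return the typmod for a record type with the given column types,
 * re-using an existing cache entry when one matches and registering a
 * new one otherwise.  Returns -1 if the request cannot be stored.
 */
int32_t toy_assign_record_typmod(const Oid *atttypes, int natts)
{
    assert(atttypes != NULL);
    if (natts < 0 || natts > TOY_MAX_ATTS)
        return -1;

    for (int i = 0; i < toy_cache_used; i++)
    {
        if (toy_cache[i].natts == natts &&
            memcmp(toy_cache[i].atttypes, atttypes,
                   (size_t) natts * sizeof(Oid)) == 0)
            return i;             /* already registered */
    }

    if (toy_cache_used == TOY_MAX_RECORD_TYPES)
        return -1;
    toy_cache[toy_cache_used].natts = natts;
    memcpy(toy_cache[toy_cache_used].atttypes, atttypes,
           (size_t) natts * sizeof(Oid));
    return toy_cache_used++;
}
```

Registering the same list of column types twice yields the same typmod, which is the re-use property the proposal asks for. Because the table is local to one backend, such a typmod would mean nothing on disk, which fits the rule that only named composite types may be table column types.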
Re: [HACKERS] Better support for whole-row operations and composite types
Joe Conway <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> ... I believe that CVS tip is source-code-compatible with
>> existing SRFs, even though I adjusted all the ones in the distribution
>> to stop using the TupleTableSlot stuff.

> Almost compatible. I found that, to my surprise, PL/R compiles with no
> changes after your commit. However it now segfaults (as I expected) on
> composite type arguments. Should be easy to fix though (I think, really
> haven't looked at it hard yet).

Let me know what you find out --- if I missed a trick on compatibility, there's still plenty of time to fix it.

>> ... I wonder if we could go over to a scheme where, say,
>> atttypid is the base type ID and attndims being nonzero is what you
>> check to find out it's really an array of atttypid. Not sure how we
>> could map that idea into function and expression args/results, though.

> Hmmm. I had thought maybe we could use a single datatype (anyarray?)
> with in/out functions that would need to do the right thing based on the
> element type.

If we have just one datatype, how will the parser determine the type of a "foo[subscript]" expression? After thinking a bit, I don't see how to do that except by adding an out-of-line decoration to the underlying type, somewhat like we do for "setof" or atttypmod. This is doable as far as the backend itself is concerned, but the compatibility implications for clients and user-written extensions seem daunting :-(

regards, tom lane
Re: [HACKERS] Better support for whole-row operations and composite types
Joe Conway <[EMAIL PROTECTED]> writes:
> I still haven't had time to look closely, and well may have been doing
> something non-standard all along, but in any case this is the current
> failing code:

>     /* for tuple args, convert to a one row data.frame */
>     TupleTableSlot *slot = (TupleTableSlot *) arg[i];
>     HeapTuple tuples = slot->val;
>     TupleDesc tupdesc = slot->ttc_tupleDescriptor;

Um. Well, the arg is not a TupleTableSlot * anymore, so this is guaranteed to fail. This isn't part of what I thought the documented SRF API was though. If you take the arg[i] value and pass it to GetAttributeByName or GetAttributeByNum it will work (with some compiler warnings) and AFAICS we never documented more than that.

regards, tom lane
Re: [HACKERS] Better support for whole-row operations and composite types
Joe Conway <[EMAIL PROTECTED]> writes:
> ... The SRF API was for user defined functions, not
> procedural languages anyway. I'll look at how the other procedural
> languages handle tuple arguments.

It was a dozen-or-so-lines change in each of the PL languages AFAIR. You will probably also want to look at what you do to return tuple results.

regards, tom lane
Re: [HACKERS] Better support for whole-row operations and composite types
Joe Conway <[EMAIL PROTECTED]> writes:
> Just for reference, what is arg[i] if it isn't a (TupleTableSlot *)
> anymore -- is it just a HeapTuple?

No, it's a HeapTupleHeader pointer. You need to reconstruct a HeapTuple on top of that to work with heap_getattr and most other core backend routines.

Also don't forget to ensure that you detoast the datum; this is not useful at the moment but will be important Real Soon Now. I added standard argument-fetch macros to fmgr.h to help with the detoasting bit.

regards, tom lane
Re: [HACKERS] Better support for whole-row operations and composite types
Joe Conway <[EMAIL PROTECTED]> writes:
> For triggers, I was previously building up the arguments thus:
>     slot = TupleDescGetSlot(tupdesc);
>     slot->val = trigdata->tg_trigtuple;
>     arg[7] = PointerGetDatum(slot);
> I suppose now I should do this instead?
>     arg[7] = PointerGetDatum(trigdata->tg_trigtuple->t_data);

Hm, no, that won't work because a tuple being passed to a trigger probably isn't going to contain valid type information. The API for calling triggers is different from calling ordinary functions, so I never thought about trying to make it look the same. At what point are you trying to do the above, anyway?

regards, tom lane