Re: [HACKERS] Better support for whole-row operations and composite types

2004-04-03 Thread Tom Lane
Joe Conway [EMAIL PROTECTED] writes:
 For triggers, I was previously building up the arguments thus:
  slot = TupleDescGetSlot(tupdesc);
  slot->val = trigdata->tg_trigtuple;
  arg[7] = PointerGetDatum(slot);

 I suppose now I should do this instead?
  arg[7] = PointerGetDatum(trigdata->tg_trigtuple->t_data);

Hm, no, that won't work because a tuple being passed to a trigger
probably isn't going to contain valid type information.  The API for
calling triggers is different from calling ordinary functions, so
I never thought about trying to make it look the same.  At what point
are you trying to do the above, anyway?

regards, tom lane

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] Better support for whole-row operations and composite types

2004-04-02 Thread Tom Lane
Joe Conway [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 We will be able to make generic I/O routines for composite types, 
 comparable to those used now for arrays.  Not sure what a convenient 
 external format would look like.  (Possibly use the same conventions
 as for a 1-D array?)

 So you mean like an array, but with possibly mixed datatypes?
 '{1 , abc def, 2.3}'
 Seems to make sense.

The unresolved question in my mind is how to represent NULL elements.
However, we have to solve that sooner or later for arrays too.  Any
thoughts?

 Another option might be to use the ROW keyword, something like:
 ROW[1 , 'abc', 2.3]

This is a separate issue, just as the ARRAY[] constructor has different
uses from the array I/O representation.  I do want some kind of runtime
constructor, but ROW[...] doesn't get the job done because it doesn't
provide any place to specify the rowtype name.  Maybe we could combine
ROW[...] with some sort of cast notation?

ROW[1 , 'abc', 2.3] :: composite_type_name
CAST(ROW[1 , 'abc', 2.3] AS composite_type_name)

Does SQL99 provide any guidance here?

 TupleDescGetSlot: no-op, returns NULL
 TupleGetDatum: ignore slot, return tuple t_data pointer as datum
 
 This will work because heap_formtuple and BuildTupleFromCStrings can 
 return a HeapTuple whose t_data part is already a valid row Datum,
 simply by setting the appropriate length and type fields in it.  (If
 the tuple is ever stored to disk as a regular table row, these fields
 will be overwritten with xmin/cmin info at that time.)

 Is this the way you did things in your recent commit?

Almost.  I ended up keeping TupleDescGetSlot as a live function, but its
true purpose is only to ensure that the tupledesc gets registered with
the type cache (see BlessTupleDesc() in CVS tip).  The slot per se never
gets used.  I believe that CVS tip is source-code-compatible with
existing SRFs, even though I adjusted all the ones in the distribution
to stop using the TupleTableSlot stuff.

The main point though is that row Datums now contain sufficient info
embedded in them to allow runtime type lookup the same as we do for arrays.

 To convert a row Datum into something that can be passed to
 heap_getattr, one could use a local variable of type HeapTupleData
 and set its t_data field to the datum's pointer value.  t_len is
 copied from the datum contents, while the other fields of
 HeapTupleData can just be set to zeroes.

 I think I understand this, but an example would help.

There are several in the PL sources now, for instance plpgsql does this
with an incoming rowtype argument:

    if (!fcinfo->argnull[i])
    {
        HeapTupleHeader td;
        Oid             tupType;
        int32           tupTypmod;
        TupleDesc       tupdesc;
        HeapTupleData   tmptup;

        td = DatumGetHeapTupleHeader(fcinfo->arg[i]);

        /* Extract rowtype info and find a tupdesc */
        tupType = HeapTupleHeaderGetTypeId(td);
        tupTypmod = HeapTupleHeaderGetTypMod(td);
        tupdesc = lookup_rowtype_tupdesc(tupType, tupTypmod);

        /* Build a temporary HeapTuple control structure */
        tmptup.t_len = HeapTupleHeaderGetDatumLength(td);
        ItemPointerSetInvalid(&(tmptup.t_self));
        tmptup.t_tableOid = InvalidOid;
        tmptup.t_data = td;

        exec_move_row(estate, NULL, row, &tmptup, tupdesc);
    }

This is okay because the HeapTupleData is not needed after the call to
exec_move_row.

 * We have to be able to re-use an already-existing cache entry if it 
 matches a requested TupleDesc.

 For anonymous record types, how will that lookup be done efficiently? 
 Can the hash key be an array of attribute oids?

Right, that's the way I did it.  See src/backend/utils/cache/typcache.c
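
A minimal standalone sketch of the idea, assuming a fixed-size table and an FNV-1a hash: the key is the array of attribute type OIDs, and the typmod handed back is the index of the cache entry. This is not the code in typcache.c. The names assign_record_typmod and hash_oids, the capacities, and the Oid stand-in are all hypothetical, chosen only for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t Oid;            /* stand-in for the backend's Oid typedef */

#define MAX_RECORD_TYPES 64      /* illustrative capacity, not a real limit */
#define MAX_ATTRS        16      /* illustrative per-row attribute limit */
#define NBUCKETS         64

typedef struct RecordEntry
{
    int natts;                   /* number of attributes in the row type */
    Oid atttypes[MAX_ATTRS];     /* the hash key: attribute type OIDs */
    int next;                    /* next entry in the same bucket, or -1 */
} RecordEntry;

static RecordEntry entries[MAX_RECORD_TYPES];
static int nentries = 0;
static int buckets[NBUCKETS];
static int buckets_ready = 0;

/* FNV-1a style hash over the OID array (an assumed choice of hash). */
static uint32_t hash_oids(const Oid *types, int natts)
{
    uint32_t h = 2166136261u;

    for (int i = 0; i < natts; i++)
    {
        h ^= types[i];
        h *= 16777619u;
    }
    return h;
}

/*
 * Return the typmod (cache index) for this list of attribute types,
 * re-using an existing entry when one matches.  Returns -1 if the
 * sketch's fixed limits are exceeded.  types must be non-NULL.
 */
int assign_record_typmod(const Oid *types, int natts)
{
    if (natts < 0 || natts > MAX_ATTRS)
        return -1;

    if (!buckets_ready)
    {
        for (int i = 0; i < NBUCKETS; i++)
            buckets[i] = -1;
        buckets_ready = 1;
    }

    uint32_t b = hash_oids(types, natts) % NBUCKETS;

    /* Look for an already-existing entry with the same key. */
    for (int e = buckets[b]; e != -1; e = entries[e].next)
    {
        if (entries[e].natts == natts &&
            memcmp(entries[e].atttypes, types,
                   (size_t) natts * sizeof(Oid)) == 0)
            return e;
    }

    if (nentries >= MAX_RECORD_TYPES)
        return -1;

    /* No match: create a new entry and link it into the bucket chain. */
    int e = nentries++;

    entries[e].natts = natts;
    memcpy(entries[e].atttypes, types, (size_t) natts * sizeof(Oid));
    entries[e].next = buckets[b];
    buckets[b] = e;
    return e;
}
```

Two requests with the same OID list (for example int4, text, float8, which are OIDs 23, 25 and 701) get the same typmod back, while a different list gets a new one. A real cache would presumably also compare per-attribute typmods; the sketch leaves that out.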

 As an aside, it would be quite useful to have support for arrays of 
 tuples. Any idea on how to do that without needing to define an explicit 
 array type for each tuple type?

Hmm, messy ...

I wonder now whether we still really need a separate pg_type entry for
every array type.  The original motivation for doing that has been at
least partly subsumed by storing element type OIDs inside the arrays
themselves.  I wonder if we could go over to a scheme where, say,
atttypid is the base type ID and attndims being nonzero is what you
check to find out it's really an array of atttypid.  Not sure how we
could map that idea into function and expression args/results, though.

Plan B would be to go ahead and create array types.  Not sure I would
want to do this for table rowtypes, but if we did it only for CREATE
TYPE AS then it doesn't sound like an unreasonable amount of overhead.

regards, tom lane



Re: [HACKERS] Better support for whole-row operations and composite types

2004-04-02 Thread Tom Lane
Joe Conway [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 ...  I believe that CVS tip is source-code-compatible with
 existing SRFs, even though I adjusted all the ones in the distribution
 to stop using the TupleTableSlot stuff.

 Almost compatible. I found that, to my surprise, PL/R compiles with no 
 changes after your commit. However it now segfaults (as I expected) on 
 composite type arguments. Should be easy to fix though (I think, really 
 haven't looked at it hard yet).

Let me know what you find out --- if I missed a trick on compatibility,
there's still plenty of time to fix it.

 ... I wonder if we could go over to a scheme where, say,
 atttypid is the base type ID and attndims being nonzero is what you
 check to find out it's really an array of atttypid.  Not sure how we
 could map that idea into function and expression args/results, though.

 Hmmm. I had thought maybe we could use a single datatype (anyarray?) 
 with in/out functions that would need to do the right thing based on the 
 element type.

If we have just one datatype, how will the parser determine the type of
a foo[subscript] expression?  After thinking a bit, I don't see how to
do that except by adding an out-of-line decoration to the underlying
type, somewhat like we do for setof or atttypmod.  This is doable as
far as the backend itself is concerned, but the compatibility
implications for clients and user-written extensions seem daunting :-(

regards, tom lane



Re: [HACKERS] Better support for whole-row operations and composite types

2004-04-02 Thread Tom Lane
Joe Conway [EMAIL PROTECTED] writes:
 I still haven't had time to look closely, and well may have been doing 
 something non-standard all along, but in any case this is the current 
 failing code:

  /* for tuple args, convert to a one row data.frame */
  TupleTableSlot *slot = (TupleTableSlot *) arg[i];
  HeapTuple tuples = slot->val;
  TupleDesc tupdesc = slot->ttc_tupleDescriptor;

Um.  Well, the arg is not a TupleTableSlot * anymore, so this is
guaranteed to fail.  This isn't part of what I thought the documented
SRF API was though.  If you take the arg[i] value and pass it to
GetAttributeByName or GetAttributeByNum it will work (with some compiler
warnings) and AFAICS we never documented more than that.

regards, tom lane



Re: [HACKERS] Better support for whole-row operations and composite types

2004-04-02 Thread Tom Lane
Joe Conway [EMAIL PROTECTED] writes:
 ... The SRF API was for user defined functions, not
 procedural languages anyway. I'll look at how the other procedural
 languages handle tuple arguments.

It was a dozen-or-so-lines change in each of the PL languages AFAIR.
You will probably also want to look at what you do to return tuple
results.

regards, tom lane



Re: [HACKERS] Better support for whole-row operations and composite types

2004-04-02 Thread Tom Lane
Joe Conway [EMAIL PROTECTED] writes:
 Just for reference, what is arg[i] if it isn't a (TupleTableSlot *) 
 anymore -- is it just a HeapTuple?

No, it's a HeapTupleHeader pointer.  You need to reconstruct a HeapTuple
on top of that to work with heap_getattr and most other core backend
routines.  Also don't forget to ensure that you detoast the datum;
this is not useful at the moment but will be important Real Soon Now.
I added standard argument-fetch macros to fmgr.h to help with the
detoasting bit.

regards, tom lane



Re: [HACKERS] Better support for whole-row operations and composite types

2004-03-30 Thread Josh Berkus
Tom,

 We have a number of issues revolving around the fact that composite types
 (row types) aren't first-class objects.  I think it's past time to fix
 that.  Here are some notes about doing it.  I am not sure all these ideas
 are fully-baked ... comments appreciated.

I'll want to add to the documentation on composite types, then.   We'll need a 
stern warning to users not to abuse them.  Easily done, I think.

Composite types are frequently abused by OO and Windows programmers to break 
the relational model.  I used to be an MSDN member (thank you, I've 
recovered) and frequently ran into, on the mailing list, users getting 
themselves into some unresolvable mess because they'd used composite types in 
SQL Server to combine several rows ... or even effectively an entire child 
table ... into one field.

Otherwise, looks good to me.   I don't think I'm qualified to second-guess you 
on the implementation.

-- 
-Josh Berkus
 Aglio Database Solutions
 San Francisco




[HACKERS] Better support for whole-row operations and composite types

2004-03-29 Thread Tom Lane
We have a number of issues revolving around the fact that composite types
(row types) aren't first-class objects.  I think it's past time to fix
that.  Here are some notes about doing it.  I am not sure all these ideas
are fully-baked ... comments appreciated.

When represented as a Datum, the format of a row-type object needs to be
something like this:

* overall length:   int4  (this makes the Datum a valid varlena item)
* row type id:      Oid   (either a composite type id or RECORDOID)
* row type typmod:  int4  (see below for usage)
-- pad if needed to MAXALIGN boundary
* heap tuple representation, beginning with a HeapTupleHeaderData struct

If we do it exactly as above then we will be wasting some space, because
the xmin/xmax/cmax and ctid fields of HeapTupleHeaderData are of no use
in a row that isn't actually a table member row.  It is very tempting to
overlay the length and rowtype fields with the HeapTupleHeaderData struct.
This would save some code as well as space --- see discussion below.
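
A rough standalone sketch of the non-overlaid layout listed above, to make the space cost concrete. The struct and function names are hypothetical, Oid is a stand-in typedef, and the 8-byte MAXALIGN is an assumption (typical of 64-bit platforms), not something stated in this proposal.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Oid;                 /* stand-in for the backend's Oid */

#define SKETCH_MAXALIGN 8             /* assumed maximum alignment */
#define SKETCH_ALIGN_UP(len) \
    (((size_t) (len) + (SKETCH_MAXALIGN - 1)) & \
     ~((size_t) (SKETCH_MAXALIGN - 1)))

/* The three fields that would precede the heap tuple representation. */
typedef struct RowDatumPrefix
{
    int32_t length;                   /* overall length (varlena header) */
    Oid     type_id;                  /* composite type id or RECORDOID */
    int32_t typmod;                   /* row type typmod */
} RowDatumPrefix;

/* Offset at which the HeapTupleHeaderData would start, after padding. */
size_t row_datum_tuple_offset(void)
{
    return SKETCH_ALIGN_UP(sizeof(RowDatumPrefix));
}

/* Total Datum size for a heap tuple representation of tuple_len bytes. */
size_t row_datum_total_size(size_t tuple_len)
{
    return row_datum_tuple_offset() + tuple_len;
}
```

Under these assumptions the prefix itself costs a MAXALIGN-padded block ahead of every row, on top of the header fields that go unused in a non-table row. Overlaying the length and rowtype fields onto the header would avoid both.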

Only named composite types, not RECORD, will be allowed to be used as
table column types.  This ensures that any row object stored on disk will
have a valid composite type ID embedded in it, so that the row structure
can be retrieved when the row is read.  However, we want to be able to
support row objects in memory that are of transient record types (for
example, the output of a function returning RECORD will have a record type
determined by the query itself).  I propose that we handle this case by
setting the type id to RECORDOID and using the typmod to identify the
particular record type --- the typmod will essentially be an index into
a backend-local cache of record types.  More detail below.

We'll add tdtypeid and tdtypmod fields to TupleDesc structs.  This
will make it easy to set the embedded type information correctly when
manufacturing a row datum using a TupleDesc.  For TupleDescs associated
with relations, tdtypeid is just the relation's row type OID, and tdtypmod
is -1.  For TupleDescs representing transient row types, we initially set
tdtypeid to RECORDOID and tdtypmod to -1 (indicating a completely
anonymous row type).  If the row type actually needs to be identifiable
then we establish a cache entry for it and set the typmod to an index for
the cache entry.  I think this will only need to happen when the query
contains a function-returning-RECORD or a whole-row variable referencing
what would otherwise be an anonymous row type, such as a JOIN result.

Composite types, as well as the RECORD type, will be marked in pg_type as
pass-by-ref, varlena (typlen -1), typalign 'd'.  (We will use the maximum
alignment always to avoid any dependency on types of the contained
columns.)

The present function call and return conventions involving TupleTableSlots
will be replaced by simply passing and returning these row objects as
pass-by-reference Datums.  In the case of functions returning rowtypes,
we'll continue to support the present ReturnSetInfo convention for
returning a separate TupleDesc describing the result type --- but this
will just be a crosscheck.

We will be able to make generic I/O routines for composite types,
comparable to those used now for arrays.  Not sure what a convenient
external format would look like.  (Possibly use the same conventions as
for a 1-D array?)  We will need to make the convention that the type OID
of a composite type is passed to the input routine, in the same way that
an array input routine gets the typelem OID; else the input routine won't
know what to do.

We could also think about allowing functions that are declared as
accepting RECORD (ie, polymorphic-across-row-types functions).  They would
use the same methods already used by polymorphic functions to find out the
true types of their inputs.  (Might be best to invent a separate
pseudotype, say ANYRECORD, rather than overloading RECORD for this purpose.)

The recently developed SRF API is a bit unfortunate since it exposes the
assumption that a TupleTableSlot must be involved in returning a tuple.
If we don't overlay the Datum header with HeapTupleHeader then I think we
have to make TupleGetDatum copy the passed tuple and insert the row type
info from the slot's tupledesc, which'd be pretty inefficient because it
means making an extra copy of the row data.  But if we do overlay the
header fields, then I think we can set up backwards-compatibility
definitions in which the slot is simply ignored.  Specifically:

TupleDescGetSlot: no-op, returns NULL
TupleGetDatum: ignore slot, return tuple t_data pointer as datum

This will work because heap_formtuple and BuildTupleFromCStrings can
return a HeapTuple whose t_data part is already a valid row Datum, simply
by setting the appropriate length and type fields in it.  (If the tuple is
ever stored to disk as a regular table row, these fields will be
overwritten with xmin/cmin info at that time.)

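To convert a row Datum into something that can be passed to heap_getattr,
one could use a local variable of type HeapTupleData and set its t_data
field to the datum's pointer value.  t_len is copied from the datum
contents, while the other fields of HeapTupleData can just be set to
zeroes.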

Re: [HACKERS] Better support for whole-row operations and composite types

2004-03-29 Thread Greg Stark

Tom Lane [EMAIL PROTECTED] writes:

 We have a number of issues revolving around the fact that composite types
 (row types) aren't first-class objects.  I think it's past time to fix
 that.  
...
 Only named composite types, not RECORD, will be allowed to be used as
 table column types.  

If I understand what you're talking about, you would be allowed to CREATE TYPE
a composite type, like say, address and then use that as a datatype all over
your database? And then if you find address needs a new field you can add it
to the type and automatically have it added all over your database to any
table column using that type?

Speaking as a user, that would be **very** nice. I've often found myself
wishing for just such a feature. It would simplify data model maintenance a
whole heck of a lot.

How will client programs see the data if I do a select *? In my ideal world
it would be shipped over in a binary representation that a driver would
translate to a perl hash / php array / whatever. But maybe it would be simpler
to just ship them over the subcolumns with names like shipping.line_1 and
shipping.country.



-- 
greg




Re: [HACKERS] Better support for whole-row operations and composite types

2004-03-29 Thread Tom Lane
Greg Stark [EMAIL PROTECTED] writes:
 If I understand what you're talking about, you would be allowed to
 CREATE TYPE a composite type, like say, address and then use that as
 a datatype all over your database? And then if you find address
 needs a new field you can add it to the type and automatically have it
 added all over your database to any table column using that type?

I believe that would work, though you might have some issues with cached
plans.

 How will client programs see the data if I do a select *?

TBD.

regards, tom lane
