The primary motivation of the attached patch is to support bytea
conversion with embedded nulls, which in turn allows supporting the
marshal module.
A secondary motivation is slightly improved performance in the conversion
routines for basic data types that have simple mappings between
Postgres and Python.
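To see why embedded nulls matter for the marshal module: marshal's serialized form routinely contains NUL bytes, so it cannot survive a round trip through a NUL-terminated cstring. A small hedged Python illustration (independent of the patch itself):

```python
import marshal

# marshal's wire format routinely contains NUL bytes, so it needs real
# bytea handling with embedded nulls rather than cstring conversion.
payload = marshal.dumps((1, "abc"))
assert b"\x00" in payload                      # embedded NULs are the norm

# Truncating at the first NUL, as C string handling would, destroys it:
truncated = payload.split(b"\x00")[0]
assert truncated != payload

# The intact byte string round-trips fine:
assert marshal.loads(payload) == (1, "abc")
```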
The primary design change is to switch the conversion routines from being
based on cstrings to datums, e.g.:
PLyBool_FromString(const char *) =>
PLyBool_FromBool(PLyDatumToOb, Datum);
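As a rough Python analogy of that signature change (the real code is C inside plpython; all names below are illustrative, not the patch's actual symbols):

```python
# Old style: the Postgres value arrives as its text output ("t"/"f") and
# must be parsed. New style: the converter receives the raw datum plus a
# type-info argument and converts directly, skipping the text round trip.

def bool_from_string(s):              # analogue of PLyBool_FromString
    return s == "t"

def bool_from_bool(typeinfo, datum):  # analogue of PLyBool_FromBool
    return bool(datum)

assert bool_from_string("t") is True
assert bool_from_bool(None, 1) is True
assert bool_from_bool(None, 0) is False
```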
I have reworked this patch a bit and extended the plpython test suite
around it. Current copy attached.
The remaining problem is that the patch loses domain checking on the
return types, because some paths no longer go through the data type's
input function. I have marked these places as FIXME, and the regression
tests also contain a failing test case for this.
What's needed here, I think, is an API that takes a datum plus type
information and checks whether the datum is valid within the domain. I
haven't found one that is exported, but maybe someone could give a tip.
I see an intersection between the work I'm currently doing on COPY BINARY
and this.
Basically, if you have an INT, you aren't going to make many checks.
However, for a TEXT value, Postgres needs to reject it if it contains a
NUL byte (which doesn't bother Python at all), or if it has characters
that are not valid in the current encoding, etc.
Many other types like TIMESTAMP have checks which are absolutely necessary
for correctness...
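A hedged Python sketch of the TEXT case above: a Python str happily holds NUL characters and arbitrary bytes, while Postgres TEXT must reject embedded NULs and bytes invalid in the server encoding (here approximated with UTF-8 decoding):

```python
# A NUL character is perfectly legal inside a Python string...
s = "abc\x00def"
assert "\x00" in s and len(s) == 7   # ...but Postgres TEXT would reject it

# Likewise, bytes invalid in the server encoding must be rejected; a
# rough stand-in for a UTF-8 server-encoding check:
bad = b"\xff\xfe"                    # not valid UTF-8
try:
    bad.decode("utf-8")
    raised = False
except UnicodeDecodeError:
    raised = True
assert raised
```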
> What's needed here, I think, is an API that takes a datum plus type
> information and checks whether the datum is valid within the domain. I
> haven't found one that is exported, but maybe someone could give a tip.
Problems:
- If the data you're trying to put in the Datum doesn't fit (for example:
an out-of-range value, a varchar that is too small for the value, etc.),
and you want a datum-type-specific function to check your datum and reject
it, how are you going to build the datum? Perhaps you can't, since your
value doesn't fit. It's a chicken-and-egg problem: the check function that
you expect to reject your invalid datum will not know it's invalid, since
you've trimmed it at the edges to make it fit in the required Datum type...
- You are going to build a datum that is perhaps valid, and perhaps not,
and send it to a function... having known-invalid datums moving around
may not be such a good idea...
Why not use the COPY BINARY format to communicate between Python and PG?
-> you write code to serialize Python objects to binary form
-> you call the type's recv function to get a Postgres datum
-> the recv function throws an error if there is any problem
-> as a bonus, you release your Python object <-> Postgres binary code as
a separate library, so people can use it to output data readable by COPY
BINARY and to parse COPY BINARY dumps.
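The serialization step above can be sketched in a few lines of Python. This is a hedged illustration of the COPY BINARY file layout as documented for PostgreSQL (11-byte signature, 32-bit flags, 32-bit header-extension length; per tuple a 16-bit field count and per field a 32-bit length followed by the bytes, with -1 meaning NULL; a 16-bit -1 trailer), not code from the patch:

```python
import struct

def copy_binary(rows):
    """Serialize rows of pre-encoded field bytes into COPY BINARY format."""
    out = b"PGCOPY\n\xff\r\n\x00"           # fixed 11-byte signature
    out += struct.pack("!II", 0, 0)          # flags, header extension length
    for row in rows:
        out += struct.pack("!h", len(row))   # field count for this tuple
        for field in row:
            if field is None:
                out += struct.pack("!i", -1)             # NULL marker
            else:
                out += struct.pack("!i", len(field)) + field
    out += struct.pack("!h", -1)             # end-of-data trailer
    return out

# An int4 travels as 4 big-endian bytes; bytea travels verbatim, embedded
# NULs and all -- exactly the case that motivates this thread.
blob = copy_binary([[struct.pack("!i", 42), b"py\x00obj"]])
assert blob.startswith(b"PGCOPY\n\xff\r\n\x00")
assert b"py\x00obj" in blob
```

The appeal is that the server's recv functions then perform all validity checking, so invalid values are rejected before a datum ever exists.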
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers