On 05/01/2013 10:41, Darren Duncan wrote:
My understanding about Postgres either compressing strings or using
"toast" segments for longer ones is that this is just an internal
implementation detail and that user-facing concepts like data types
should be ignoring these differences. Character data is just "text",
a sequence of characters of arbitrary length, and that's all there is
to it.
It does indeed:
http://www.postgresql.org/docs/9.2/static/storage-toast.html
So PostgreSQL's char, varchar, and text types can all store up to
around 1GB.
I'm currently trying to find some sort of equivalence between the RDBMS
types. For PostgreSQL, the distinction between what they might call char
and clob is hidden from the user, but it does kind of exist internally.
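To make the idea concrete, here's a rough sketch (in Python, purely
illustrative; the groupings and spellings below are my own assumptions for
discussion, not actual DBI/DBD type_info data) of the kind of equivalence
table I'm talking about:

```python
# Illustrative sketch only: a cross-RDBMS character-type equivalence table.
# The type names and groupings are assumptions for discussion, not real
# type_info output from any driver.
CHAR_TYPE_EQUIVALENCE = {
    # standard name -> per-RDBMS spelling
    "CHARACTER": {"PostgreSQL": "char(n)", "MySQL": "CHAR(n)"},
    "CHARACTER VARYING": {"PostgreSQL": "varchar(n)", "MySQL": "VARCHAR(n)"},
    "CHARACTER LARGE OBJECT": {"PostgreSQL": "text", "MySQL": "LONGTEXT"},
}

def equivalent_type(standard_name, rdbms):
    """Look up the RDBMS-specific spelling of a standard character type."""
    return CHAR_TYPE_EQUIVALENCE[standard_name][rdbms]
```

The catch, as discussed below, is that for PostgreSQL the right-hand column
is mostly the same type under the hood.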
I've got a copy of the standard, but I'm pretty sure I'd be breaking
some law if I copied and pasted bits. DECIMAL is supposed to be an
"exact numeric", whereas FLOAT, REAL, and DOUBLE PRECISION are
"approximate numeric". So I guess DOUBLE is a better fit as it's
supposed to be more accurate, but neither is a good match really.
Here's the thing. The most important difference is "exact numeric"
versus "approximate numeric". Your type list should clearly and
explicitly separate these into separate rows from each other, and
never cross types between them.
Yes, there is a good argument for this. I guess it would help to know
exactly what the DBD type_info is used for. I'm not sure if it has some
internal use, or whether it's just something to give to users who are
asking for particular types.
Things like integers or rationals or DECIMAL etc are exact numeric.
Things like FLOAT and DOUBLE are analogous to IEEE floats which are
inexact and are defined by using a certain number of bits to store
every value of their type.
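A quick way to see the exact/approximate split in practice — shown here in
Python, since its float is the same IEEE 754 double underneath, and its
decimal module behaves like an exact numeric:

```python
from decimal import Decimal

# Approximate numeric: IEEE 754 binary floats cannot represent 0.1
# exactly, so the familiar identity fails.
assert 0.1 + 0.2 != 0.3  # actually 0.30000000000000004

# Exact numeric: decimal values constructed from strings keep their
# exact decimal representation, as DECIMAL is required to.
assert Decimal("0.1") + Decimal("0.2") == Decimal("0.3")
```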
I don't recall which of the above REAL goes in.
REAL is approximate, inexact.
If different DBMSs use the same FOO to be exact in one or approximate
in another, still keep them in different rows.
Since the SQL standard, as well as some other programming languages,
defines "decimal" as being exact numeric, it is absolutely wrong
to map it to either FLOAT or DOUBLE. In fact, in Perl 5 the only
native type that can map to a DECIMAL is a character string of numeric
digits. Don't shoehorn this. There is no good reason to map DECIMAL
to a floating-point type.
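To illustrate why the string mapping matters (a Python analogy, since
Perl's numeric scalar is the same IEEE double): round-tripping a DECIMAL
value through a float silently changes it, while carrying it as a string
of digits keeps it exact. The value below is a made-up example, not from
any real driver:

```python
from decimal import Decimal

value_from_db = "0.10"  # hypothetical DECIMAL(10,2) value fetched as text

# Mapping it through a float loses the exact decimal value...
approx = float(value_from_db)
assert Decimal(approx) != Decimal(value_from_db)

# ...while keeping it as a string of digits preserves it exactly
# (including the scale), and lets an exact-numeric library use it.
exact = Decimal(value_from_db)
assert str(exact) == "0.10"
```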
I'm not overly familiar with Perl's internal handling of numbers. I guess
if you have a DECIMAL held as a character string, Perl will switch it out
to an approximate value the moment you do a calculation on it. Furthermore,
if the DBI (or the DBDs, I'm not sure where the distinction lies) is
already putting it into a Perl floating-point number, then the battle
has already been lost before it gets to me.
Likewise, under no circumstance, map integers to floating types.
It also has SQL_VARCHAR associated with TEXT instead of VARCHAR.
Not sure about this one either - if there was a reason for that,
I don't remember it offhand.
Thanks for doing this work, it's good to see someone poking at
this. +1 for getting the DBDs somewhat consistent, although it
may be tricky as some (looking at you, MySQL!) play more fast and
loose with their data type definitions than others. :)
Yes, the hard part is certainly trying to find consistency, or a way of
making them emulate some consistency.
Since in Postgres a VARCHAR is just a restricted TEXT, mapping
length-restricted character types to VARCHAR and unrestricted ones
like Perl strings to TEXT would carry the best semantics.
PostgreSQL does recommend using VARCHAR instead of CHAR; due to the way
they are implemented, VARCHAR is usually more efficient.
Lyle