On 05/01/2013 10:41, Darren Duncan wrote:
My understanding about Postgres either compressing strings or using "toast" segments for longer ones is that this is just an internal implementation detail and that user-facing concepts like data types should be ignoring these differences. Character data is just "text", a sequence of characters of arbitrary length, and that's all there is to it.

It does indeed:
http://www.postgresql.org/docs/9.2/static/storage-toast.html
So PostgreSQL's char, varchar and text types can all store up to around 1GB. I'm currently trying to find some sort of equivalence between the RDBMS types. In PostgreSQL the distinction between what other systems might call char and clob is hidden from the user, but it does kind of exist internally.

I've got a copy of the standard, but I'm pretty sure I'd be breaking some law if I copied and pasted bits. DECIMAL is supposed to be an "exact numeric", whereas FLOAT, REAL and DOUBLE PRECISION are "approximate numeric".

So I guess DOUBLE is a better fit as it's supposed to be more accurate. But neither is really a good match.

Here's the thing. The most important difference is "exact numeric" versus "approximate numeric". Your type list should clearly and explicitly separate these into different rows, and never cross types between them.

Yes, there is a good argument for this. I guess it would help to know exactly what the DBD type_info is used for. I'm not sure whether it has some internal use, or whether it's just something to give to users who are asking for particular types.

Things like integers or rationals or DECIMAL etc are exact numeric.

Things like FLOAT and DOUBLE are analogous to IEEE floats which are inexact and are defined by using a certain number of bits to store every value of their type.

I don't recall which of the above REAL goes in.

REAL is approximate, inexact.
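To illustrate the "approximate" part, here's a minimal sketch (in Python rather than Perl, purely to show the concept): IEEE 754 binary floats cannot represent most decimal fractions exactly, which is exactly why the standard calls FLOAT, REAL and DOUBLE PRECISION "approximate numeric".

```python
# Sketch (Python, not DBI code): binary floats cannot represent
# most decimal fractions exactly.
total = 0.1 + 0.2
print(total)         # 0.30000000000000004, not 0.3
print(total == 0.3)  # False
```

The same behaviour shows up in any language using IEEE 754 doubles, including Perl's NV.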

If different DBMSs use the same type name as exact in one and approximate in another, still keep them in different rows.

Since the SQL standard, as well as some other programming languages, defines "decimal" as exact numeric, it is absolutely wrong to map it to either FLOAT or DOUBLE. In fact, in Perl 5 the only native type that can hold a DECIMAL is a character string of numeric digits. Don't shoehorn this. There is no good reason to map DECIMAL to a floating-point type.

I'm not overly familiar with Perl's internal handling of numbers. I guess if you have a DECIMAL as a character string, Perl will switch it to an approximate value the moment you do a calculation on it. Furthermore, if the DBI (or the DBDs, I'm not sure where the distinction lies) is already converting it to a Perl floating-point number, then the battle has already been lost before it gets to me.
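For what it's worth, the trap can be sketched with Python's decimal module (standing in for any exact-decimal representation): keep the digit string and the value stays exact, but the moment it is routed through a native float the original decimal value is unrecoverable.

```python
from decimal import Decimal

# Exact: built straight from the digit string, the way a driver
# could pass a DECIMAL value through without loss.
exact = Decimal("0.1") + Decimal("0.2")
print(exact)                    # 0.3, exactly

# Approximate: the same digits routed through a native float first.
# Decimal() then exposes the value the float actually holds.
lossy = Decimal(float("0.1"))
print(lossy == Decimal("0.1"))  # False
```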

Likewise, under no circumstances map integers to floating-point types.
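The reason can be shown with another quick sketch (again Python, for illustration only): an IEEE 754 double has a 53-bit significand, so integers above 2**53 can no longer all be represented, and a round-trip through a float silently corrupts them.

```python
# Sketch of why integers must not be mapped to floating types:
# a double's 53-bit significand cannot hold every integer above 2**53.
n = 2**53 + 1
print(int(float(n)) == n)  # False: the round-trip through a double lost 1
```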

It also has SQL_VARCHAR associated with TEXT instead of VARCHAR.
Not sure about this one either - if there was a reason for that,
I don't remember it offhand.

Thanks for doing this work, it's good to see someone poking at
this. +1 for getting the DBDs somewhat consistent, although it
may be tricky as some (looking at you, MySQL!) play more fast and
loose with their data type definitions than others. :)

Yes, the hard part is certainly trying to find consistency, or a way of making
them emulate some consistency.

Since in Postgres a VARCHAR is just a restricted TEXT, map length-restricted character types to VARCHAR and unrestricted ones, like Perl strings, to TEXT; that would carry the best semantics.

PostgreSQL does recommend using VARCHAR instead of CHAR. Due to the way they are implemented, VARCHAR is usually more efficient.


Lyle
