Greg Stark wrote:

David Rysdam <[EMAIL PROTECTED]> writes:



In my brute force port, I just bulk copied the date
fields into temporary tables and then did a to_timestamp(field, 'Mon DD YYYY
HH:MI:SS:MSAM').





Again, I created a temporary table and did a decode(field, 'hex') to the
real table.



This is the standard approach. You're rather lucky these are the only data representation changes you've had to do so far. I fear you'll run into more and more complex changes over time and trying to avoid the temporary table will get harder and harder.



No, I think I'm OK there. These are programmatically-generated values and I've already been through them all once. Just the millisecond issue and the hex binary issue AFAIK.

If it were me I would consider processing the files in perl. It should be
pretty easy to do both of these modifications very quickly.



Very quick and easy to do one time. A little trickier to handle in an elegant, maintainable way for the dozens of data reloads I do every month for GBs of data onto two different server types.

If you really want to go with a custom C code then you might be able to just
grab the byteain/byteaout functions from src/backend/util/adt/varlena into a
separate module and create new functions with modified names. Load it with
CREATE FUNCTION byteain ... AS 'my_bytea_funcs.so' 'my_byteain';

Or maybe create the function as my_byteain in postgres and then update the
catalog entries somehow. I'm not sure how to do that but it shouldn't be too
hard. And it might make it easier to do the substitution for the data load and
then undo the change afterwards.



Why not create a type and then define the load function to be the equivalent of "decode('hex')"?

Doing the same for timmestamp is a bit trickier but you could copy
ParseDateTime from datetime.c as a static function for your module.

Be careful though, test this out thoroughly on a test database. I'm not sure
of all the impacts of altering the in/out functions for data types. I expect
it would break pg_dump, for example. And I would worry about the statistics
tables too.



This is kind of a hybrid of my suggestions and the problems are a hybrid as well. :)

1) Just change the timestamp type so that it allows a ':' delimiter for milliseconds. Potential problems: Other parts of the code won't expect it. People don't want that.

2) Create a new type. Potential problem: Things like date ranges probably wouldn't work anymore, since the server wouldn't know it's a date now.



---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

Reply via email to