Jim,

Thanks for the suggestions; they're excellent.

Partly, I'm using this current task as an opportunity to dig into the
subject of bulk imports in Postgres. So far, I've figured out:

* Prepared statements don't deliver any real speed gain.

* Yes, bundling operations into transactions with explicit BEGIN; and
COMMIT; statements helps a lot. By default, Postgres invisibly wraps
each statement in its own transaction, which makes lots of individual
inserts a bit expensive.

* DROP index before import and CREATE index after import is a real
time-saver, and it's not hard to automate. I've decided to leave unique
indexes in place: they're important, and dropping them means dropping
table constraints. As the docs say, "think twice." (There's a sketch of
the transaction/index dance right after this list.)
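
To make those last two concrete, here's roughly the shape of what my
import script does. A minimal sketch; the table and index names are
made up for illustration:

-- Drop non-unique indexes before the load; unique indexes stay put.
DROP INDEX IF EXISTS idx_sales_region;

-- One explicit transaction instead of an implicit one per statement.
BEGIN;
INSERT INTO sales (id, region, amount) VALUES (1, 'west', 100.00);
INSERT INTO sales (id, region, amount) VALUES (2, 'east', 250.00);
-- ...many thousands more...
COMMIT;

-- Recreate the index once, after all the data is in.
CREATE INDEX idx_sales_region ON sales (region);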

I had to spend a bit of time working out the full UPSERT (INSERT ... ON
CONFLICT) syntax in Postgres. At first my reaction was: eh? I was used
to MySQL, where it's pretty easy. In Postgres? A lot more
complicated-looking. Once I understood the reasoning, though, I was
pretty happy with what Postgres offers in this respect. You can, for
example, do something like this:

-- Add this row if it doesn't exist
-- If it does exist, update it with the new data
-- unless the new data is the same as the old data

It's pretty easy to do, particularly with some advance planning. (I'm
using a checksum on the values to distinguish between changed and
unchanged rows; see the sketch below.)
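
Concretely, the statement ends up looking something like this (a
sketch with invented names; md5() over the concatenated values stands
in for whatever checksum you prefer):

INSERT INTO summary (id, total, row_checksum)
VALUES (42, 1234.56, md5('42|1234.56'))
ON CONFLICT (id) DO UPDATE
SET total = EXCLUDED.total,
    row_checksum = EXCLUDED.row_checksum
WHERE summary.row_checksum IS DISTINCT FROM EXCLUDED.row_checksum;

The WHERE on the DO UPDATE is the part MySQL doesn't really give you:
conflicting rows whose checksum hasn't changed are skipped entirely,
so they aren't rewritten for no reason.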

Next up was to retest multi-value inserts and COPY... and then look at
files, but you guys are getting me excited about files now.
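
For reference, the two candidates look like this (file path, table,
and columns are invented):

-- Multi-row VALUES: one INSERT, many rows per statement.
INSERT INTO sales (id, region, amount) VALUES
  (1, 'west', 100.00),
  (2, 'east', 250.00),
  (3, 'north', 75.00);

-- COPY: usually the fastest route for flat files.
COPY sales (id, region, amount)
FROM '/tmp/sales.csv' WITH (FORMAT csv, HEADER true);

(From psql, \copy does the same thing client-side, so the file doesn't
have to live on the server.)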

There's likely to be a lot of importing on a regular basis, so it seems
fruitful to check out the details now. Ultimately, Postgres is just
another piece of infrastructure/plumbing that I want to get out of the
way. But given the potential scale of the data, ignorant design choices
could be pretty painful to redo.

For the record:

* 4D is for data grinding and prep: raw transactional/relational data.
* Postgres is serving as a data warehouse with summary data and extracts.
* Analytics will be done by something else.

So, it's a data pipeline with 4D at the first step (importing, cleaning,
and collating source data) and Postgres at the next step (storing and
exposing reporting/analysis data).

Loving the great answers, much appreciated!