> Now that your data is in place, what calls do you perform on it and do you
> have much in the way of add / update or is it all a query to see if you
> can't use that #, or you get a list of #'s that can be used with the name
> addy data as well?
>

Once the PG table is populated it is used for read-only purposes, to build
target VFP tables containing a subset of phone numbers within one or more
Area Codes.  To pull the desired records from the PG table I use a simple
parameterized remote view.  I initialize the memory variable used by the
parameterized view to '', so when I first open the remote view it does not
pull any records.  I then use a set of configuration tables that define
which area codes belong to which client, and set the memory variable to the
specific area code I want to pull into the remote view.  Once the records
start to flow over I fire off RECCOUNT() and they come across as quickly as
the machinery and LAN can run.  Then I simply SELECT the remote view's
record set into the target VFP table(s) as needed, one area code at a time,
append the individual area codes into the master update table for each
client, zip the .dbf file for each client, and park the zip file on the ftp
server for later automated retrieval by the client machinery.
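For anyone who wants the shape of that per-area-code pull without firing up
VFP, here is a minimal sketch in Python (the real work is a VFP
parameterized remote view against PG; the function and column names below
are purely my own illustration, not the actual code):

```python
# Hypothetical sketch of the per-area-code extraction flow.  In the real
# setup this is a VFP parameterized remote view against the PostgreSQL
# table; rows here are simple (area_code, phone) tuples for illustration.

def pull_area_code(dnc_rows, area_code):
    """Mimic the parameterized view: '' pulls nothing (the initial open),
    otherwise return only the rows for the requested area code."""
    if area_code == '':
        return []                      # first open of the view: no records
    return [row for row in dnc_rows if row[0] == area_code]

def build_client_tables(dnc_rows, client_area_codes):
    """One pull per area code, appended into a per-client master list --
    the analog of appending each area code into the master update
    table per client before zipping the .dbf."""
    tables = {}
    for client, codes in client_area_codes.items():
        master = []
        for code in codes:
            master.extend(pull_area_code(dnc_rows, code))
        tables[client] = master
    return tables

if __name__ == '__main__':
    rows = [('901', '901-555-0100'), ('615', '615-555-0101'),
            ('901', '901-555-0102'), ('212', '212-555-0103')]
    config = {'client_a': {'901', '615'}, 'client_b': {'212'}}
    tables = build_client_tables(rows, config)
    print(len(tables['client_a']), len(tables['client_b']))
```

The empty-parameter trick is the part worth noting: the view opens cheaply
with no rows, and only requerying with a real area code moves data.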

The VFP table for each client ends up with records for several area codes.
Once it is pulled into the client's server and unzipped, I update their
production DoNotCall.dbf table with the new record set, looking for records
to be deleted from or added to the production table.  Once that is done I
run a routine to scrub the Customer table phone #s against the updated
DoNotCall table.  The application that hits the Customer and DoNotCall
tables also uses tables with current customer Sales and Service activity,
which is how I determine whether the 18 month Existing Business
Relationship exemption overrides the FTC DoNotCall listing or not - that is
done on-the-fly in the app itself.
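As a rough illustration of that on-the-fly check - a sketch only, not the
actual app logic; the field names and the 548-day approximation of "18
months" are my own simplifications - it boils down to something like this
in Python:

```python
from datetime import date, timedelta

# Hypothetical sketch of the scrub decision: a number on the DNC list may
# still be callable if there was Sales/Service activity within the 18-month
# Existing Business Relationship window.  All names and the 548-day
# approximation of "18 months" are illustrative, not the real field names.

EBR_WINDOW = timedelta(days=548)   # roughly 18 months

def may_call(phone, dnc_numbers, last_activity, today):
    """last_activity maps phone -> date of most recent Sales/Service."""
    if phone not in dnc_numbers:
        return True                          # not on the DNC list at all
    last = last_activity.get(phone)
    if last is not None and today - last <= EBR_WINDOW:
        return True                          # EBR exemption applies
    return False                             # listed, no recent business
```

The point of keeping the Sales/Service activity tables in the app is that
the exemption is a moving window, so it has to be evaluated at call time
rather than baked into the scrubbed table.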

Before I outgrew VFP for the master Area Code table populated with the csv
files from the FTC, the pure VFP process worked fine, although somewhat
slower than the new VFP-PG hybrid version.  Even if I had not bumped over
the 2.0Gb file size limit, it would have been marginally worth moving to PG
just to save some processing time.

I have done some updateable remote view work with VFP-PG, but not with
tables as large as what I am now dealing with for DoNotCall purposes.  From
what I can tell, writing back to the PG table via a VFP application
interface seems snappy enough to run behind a live app.  I normally do not
deal with tables over 700,000 records, and they are usually under 200,000
records.  I have had a few VFP tables come close to the 2Gb limit, but by
purging out dead data I have always managed to not push over the top, until
this latest DoNotCall update from the FTC.  Time will tell how well PG
performs in a read-write-append live environment once I get around to that
part of a back end migration.  That may be a while, as nothing is yet
broken, so no need to correct it <g>...

Gil



> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] Behalf Of Stephen Russell
> Sent: Wednesday, April 02, 2008 10:33 AM
> To: [EMAIL PROTECTED]
> Subject: Re: [NF] RE: Microsoft New SQL OS >>> PostgreSQL info
>
>
> On Wed, Apr 2, 2008 at 9:13 AM, Gil Hale <[EMAIL PROTECTED]> wrote:
>
> > Well, I downloaded the latest FTC DoNotCall Area Code files Tuesday
> > afternoon, then unzipped them for processing.  Most of the files are
> > grouped by state, so they will contain more than a single Area Code.  A
> > few other files are for specific Area Codes, or for US territories with
> > one or more Area Codes (Puerto Rico, US Samoa, US Virgin Islands).
> >
> > The file names are rather long (50+ characters), but the core
> > State/Territory/Area Code value for each file is preceded and appended
> > with an underscore ("_") character.  Using the ADIR() function I renamed
> > each of the 67 files with a shorter name (State, Area Code or Territory,
> > preceded with "_") than originally provided, just to keep things simple
> > for me to keep track of.  As part of the file renaming code I also
> > created a SQL Script file to use with the PGAdminIII SQL command window
> > to import each of the csv files.  I ended up importing 9.07Gb of csv
> > data into the PostgreSQL database (plus another 4.3Gb for the indexes).
> > That represents 157,889,571 records.
> >
> > It took the machinery 3 hours and 34 minutes to import all the records
> > from all 67 csv files, which covers all Area Codes in the US and its
> > territories.  Normally I would have only imported the files I needed for
> > my clients' Area Codes they subscribed to, but curiosity got the better
> > of me this time <g>.  I guess I just wanted to see how well PostgreSQL
> > could handle this size table.  It did fine, but I see the "DoNotCall
> > table" is actually comprised of a series of files, each "only" 1Gb in
> > size.  So I guess PostgreSQL does its own internal file spanning for the
> > entire DoNotCall table.  Ditto for the index, the files are split into
> > 1Gb pieces.  Very interesting.
> >
> > After finishing the csv import process I ran my DoNotCall build routine
> > that creates the updated .dbf files for my client machinery based on the
> > Area Codes they subscribed to.  Part of the build routine includes the
> > Zipping of the .dbf file for each client/group using ZipGenius via CLI.
> > From there each client PC connects to my ftp Server on a timed basis,
> > downloads and unzips the updated .dbf file, then processes the contents
> > of the updated file against the Customer.dbf table's phone numbers.  All
> > automated.  Very cool.
> >
> > Were it not for the PostgreSQL piece of the puzzle to handle the
> > enormous table size (even for only the 23 Area Codes I normally would
> > have imported) I would not be able to provide the same level of
> > automation and flexibility with native VFP due to the size of the
> > required DoNotCall table alone.  And splitting the Area Codes into
> > multiple tables could have worked, but would not only have been a
> > kludge, but resulted in a far slower processing time overall.  All of
> > the VFP table builds took only a few minutes once the csv file imports
> > were done.  I did not notice any speed degradation between using a 2.5Gb
> > PostgreSQL table with only 23 Area Codes in it vs using a 9.07Gb
> > PostgreSQL table with all Area Codes in it.  The parameterized views ran
> > extremely fast against the fully populated PostgreSQL table - very
> > impressive of both PostgreSQL and VFP.  They certainly seem to play
> > together well.
> >
> > The significant time burden was due to me deciding to import all the
> > FTC csv files as opposed to only the ones I needed - no surprise there
> > (just over 20 minutes vs 3:34 hours).  I did note, however, that
> > importing the csv file into the PostgreSQL table via PGAdminIII was
> > significantly faster than if I had run an APPEND FROM <FileName> CSV
> > into a VFP table.  I have no recorded time comparisons, just my gut feel
> > based on years of watching VFP import records from large csv files
> > (millions of records at a crack).  And, for the record, when I do my
> > imports into a VFP table I leave any index/order set off, as having a
> > table with an active index/order will slow down an import process
> > horribly with large tables as VFP tries to rearrange imported records
> > into the active sort order...
> >
> > Now to test the process on the Compaq Server with 1/2 the RAM and a
> > simple SATA drive, but running Linux <g>...  heh-heh...  I may wait a
> > few weeks to do that.
> >
> > -------------------------------------------------
>
> Now that your data is in place, what calls do you perform on it and do you
> have much in the way of add / update or is it all a query to see if you
> can't use that #, or you get a list of #'s that can be used with the name
> addy data as well?
>
>
>
>
>
> --
> Stephen Russell
> Sr. Production Systems Programmer
> Mimeo.com
> Memphis TN
>
> 901.246-0159
>
>
> --- StripMime Report -- processed MIME parts ---
> multipart/alternative
>   text/plain (text body -- kept)
>   text/html
> ---
>
[excessive quoting removed by server]

_______________________________________________
Post Messages to: [email protected]
Subscription Maintenance: http://leafe.com/mailman/listinfo/profox
OT-free version of this list: http://leafe.com/mailman/listinfo/profoxtech
Searchable Archive: http://leafe.com/archives/search/profox
This message: http://leafe.com/archives/byMID/profox/[EMAIL PROTECTED]
** All postings, unless explicitly stated otherwise, are the opinions of the 
author, and do not constitute legal or medical advice. This statement is added 
to the messages for those lawyers who are too stupid to see the obvious.
