Re: [PERFORM] WAL + SSD = slow inserts?

2013-12-05 Thread Rosser Schwarz
On Thu, Dec 5, 2013 at 9:55 PM, Skarsol  wrote:

> The rule is being used to return the id of the insert...
>

Take a look at the RETURNING clause of the INSERT statement. That should
meet your needs here without having to bother with rules.
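
For example (the table and column names are hypothetical, since the original schema wasn't posted; RETURNING has been in core since 8.2):

```sql
-- "id" is assumed to be a serial primary key
INSERT INTO widgets (name)
VALUES ('foo')
RETURNING id;
```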

rls

-- 
:wq


Re: [PERFORM] Triggers and transactions

2013-01-28 Thread Rosser Schwarz
On Mon, Jan 28, 2013 at 10:54 AM, Craig James  wrote:

> But if the drop-and-restore-trigger operation blocks all access to the
> tables, that's a problem.
>

Were the triggers in question created with "CREATE CONSTRAINT TRIGGER"?  If
not, "ALTER TABLE foo DISABLE TRIGGER USER" may do what you need here.
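
Something along these lines, as a sketch ("foo" standing in for the real table; note that ALTER TABLE itself takes an exclusive lock on the table for the duration of the transaction):

```sql
BEGIN;
ALTER TABLE foo DISABLE TRIGGER USER;   -- all non-system triggers on foo
-- ... bulk operation here ...
ALTER TABLE foo ENABLE TRIGGER USER;
COMMIT;
```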

rls

-- 
:wq


Re: [PERFORM] Do I have a hardware or a software problem?

2012-12-11 Thread Rosser Schwarz
On Tue, Dec 11, 2012 at 5:17 PM, Evgeny Shishkin wrote:

> Actually most of low-end SSDs don't do write caching, they do not have
> enough ram for that.
>

AIUI, *all* SSDs do write-caching of a sort: writes are actually flushed to
the NAND media by erasing, and then overwriting the erased space, and
erasing is done in fixed-size blocks, usually much larger than a
filesystem's pages.  The drive's controller accumulates writes in an
on-board cache until it has an "erase block"'s worth of them, which are
then flushed.  From casual searching, a common erase block size is 256
kbytes, while filesystem-level pages are usually 4k.

Most low-end (and even many mid-range) SSDs, including Sandforce-based
drives, don't offer any form of protection (e.g., supercaps, as featured on
the Intel 320 and 710-series drives) for the data in that write cache,
however, which may be what you're thinking of.  I wouldn't let one of those
anywhere near one of my servers, unless it was a completely disposable,
load-balanced slave, and probably not even then.

rls

-- 
:wq


Re: [PERFORM] A very long running query....

2012-07-20 Thread Rosser Schwarz
On Fri, Jul 20, 2012 at 2:27 PM, Ioannis Anagnostopoulos
 wrote:
> On 20/07/2012 22:23, Claudio Freire wrote:
>> Misestimated row counts... did you try running an analyze, or upping
>> statistic targets?
> I have run analyse every so often. I think the problem is that as I get 16K
> new rows every minutes, the "stats" are always out... Possible?

It may not help much with any skew in your data that results from
divergent data appearing, but you can update the statistics targets
for those columns and analyze again, and the planner should have much
better information about the distributions of their data.  The max
stats target is 10000, but the default is 100.  Increasing it even
just to 500 or 1000 should help the planner significantly.
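
Concretely, something like the following (table and column names are stand-ins for yours):

```sql
-- Raise the per-column sample size, then re-gather statistics
ALTER TABLE mytable ALTER COLUMN mycolumn SET STATISTICS 500;
ANALYZE mytable;
```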

rls

-- 
:wq

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] SSD selection

2012-05-15 Thread Rosser Schwarz
On Tue, May 15, 2012 at 8:21 AM, Віталій Тимчишин  wrote:
> We are using Areca controller with BBU. So as for me, question is: Can 520
> series be set up to handle fsyncs correctly?

No.

The reason for capacitors on SSD logic boards is that writes aren't
flushed to NAND media, and hence persisted, immediately. SSDs are
divided into "pages", called "erase blocks" (usually much larger than
the filesystem-level block size; I don't know offhand what the block
size is on the 710, but on the older X-25 drives, it was 128K).  All
writes are accumulated in the on-board cache into erase block sized
chunks, and *then* flushed to the NAND media. In a power-loss
situation, the contents of that cache won't be preserved unless you
have a capacitor.  In some drives, you can disable the on-board cache,
but that does absolutely atrocious things both to your drive's
performance, and its longevity.

As the other posters in this thread have said, your best bet is
probably the Intel 710 series drives, though I'd still expect some
320-series drives in a RAID configuration to still be pretty
stupendously fast.

rls

-- 
:wq



Re: [PERFORM] How to improve insert speed with index on text column

2012-01-31 Thread Rosser Schwarz
On Mon, Jan 30, 2012 at 9:46 AM, Saurabh  wrote:
> I can not create the index after insertion because user can
> search the data as well while insertion.

Remember, DDL is transactional in PostgreSQL.  In principle, you
should be able to drop the index, do your inserts, and re-create the
index without affecting concurrent users, if you do all of that inside
an explicit transaction.  Doing the inserts inside a transaction may
speed them up, as well.
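
A sketch of that approach (the index definition is hypothetical; be aware that DROP INDEX takes an exclusive lock on the table, so other sessions will queue behind the transaction until it commits):

```sql
BEGIN;
DROP INDEX idx_docs_body;                   -- exclusive-locks the table until COMMIT
-- ... bulk INSERTs here ...
CREATE INDEX idx_docs_body ON docs (body);  -- rebuilt once, at the end
COMMIT;
```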

rls

-- 
:wq



Re: [PERFORM] Linux I/O schedulers - CFQ & random seeks

2011-03-04 Thread Rosser Schwarz
On Fri, Mar 4, 2011 at 10:34 AM, Glyn Astill  wrote:
> I'm wondering (and this may be a can of worms) what peoples opinions are on 
> these schedulers?  I'm going to have to do some real world testing myself 
> with postgresql too, but initially was thinking of switching from our current 
> CFQ back to deadline.

It was a few years ago now, but I went through a similar round of
testing, and thought CFQ was fine, until I deployed the box.  It fell
on its face, hard.  I can't find a reference offhand, but I remember
reading somewhere that CFQ is optimized for more desktop type
workloads, and that in its efforts to ensure fair IO access for all
processes, it can actively interfere with high-concurrency workloads
like you'd expect to see on a DB server -- especially one as big as
your specs indicate.  Then again, it's been a few years, so the
scheduler may have improved significantly in that span.

My standard approach since has just been to use noop.  We've shelled
out enough money for a RAID controller, if not a SAN, so it seems
silly to me not to defer to the hardware, and let it do its job.  With
big caches, command queueing, and direct knowledge of how the data is
laid out on the spindles, I'm hard-pressed to imagine a scenario where
the kernel is going to be able to do a better job of IO prioritization
than the controller.

I'd absolutely recommend testing with pg, so you can get a feel for
how it behaves under real-world workloads.  The critical thing there
is that your testing needs to create workloads that are in the
neighborhood of what you'll see in production.  In my case, the final
round of testing included something like 15-20% of the user-base for
the app the db served, and everything seemed fine.  Once we opened the
flood-gates, and all the users were hitting the new db, though,
nothing worked for anyone.  Minute-plus page-loads across the board,
when people weren't simply timing out.

As always, YMMV, the plural of anecdote isn't data, &c.

rls

-- 
:wq



Re: [PERFORM] prepared query performs much worse than regular query

2010-05-21 Thread Rosser Schwarz
On Fri, May 21, 2010 at 4:53 PM, Richard Yen  wrote:
> Any ideas why the query planner chooses a different query plan when using 
> prepared statements?

A prepared plan is the best one the planner can come up with *in
general* for the query in question.  If the distribution of the values
you're querying against -- in your case, "owner" and "assignment" --
aren't relatively uniform, that plan is going to be suboptimal, if not
downright pathological, for values farther out in the distribution.

Looking at your prepared plan, it seems that, on average, there are
177 rows for every "assignment", and 184 per "owner".  As it turns
out, though, nearly a quarter of your table has an "owner" of -1.
It's not terribly surprising, with a table that big and a distribution
skew of that magnitude, that this query plan, with these arguments,
ends up pretty firmly in the "pathological" category.
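
To see the skew's effect, compare the generic plan against one planned with the literal; something like the following (the table name is a stand-in, though "owner" and -1 are from your post):

```sql
PREPARE q(int) AS
  SELECT * FROM tickets WHERE owner = $1;

EXPLAIN EXECUTE q(-1);                            -- the one-size-fits-all prepared plan
EXPLAIN SELECT * FROM tickets WHERE owner = -1;   -- planned for the actual value
```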

rls

-- 
:wq



Re: [PERFORM] PgPool II configuration with PostgreSQL 8.4

2010-05-06 Thread Rosser Schwarz
On Wed, May 5, 2010 at 10:17 PM, Neha Mehta wrote:

> I am trying to have synchronous master-master replication in PostgreSQL8.4
> using PgPool II. I am not able to configure PgPool on the system, it gives
> me an error, libpq is not installed or libpq is old.
>
(FYI: This topic is probably more germane to the -ADMIN list, or at least
-GENERAL, than it is to -PERFORM.)

Is there a particular reason you're building pgpool, rather than installing
it via your distribution's package manager?  Most distributions have it
available these days.  (At a minimum, any distribution that's widely-used
and well-enough understood to warrant hosting something as critical as your
RDBMS should have it.)

FWIW, I'm successfully using pgpool-II against a pair of 8.4 instances (in
the connection pool mode, not replication, and all installed from the PGDG
RPM repository).  I'm also using Bucardo (in its multi-master/swap mode) to
handle the replication, as suggested by someone else down-thread.  So
there's an existence proof that it *can* work.

Finally, when PostgreSQL is installed, libpq.so.N is usually put under
/usr/lib(64)/, not under the postgres install directory.  Your distribution
should have a postgresql-devel package available which will provide a
pg_config command that can be used to pass the *actual* installed locations
to a configure invocation, as in:

./configure --with-pgsql-libdir=`pg_config --libdir`...

rls

-- 
:wq


Re: [PERFORM] DB is slow until DB is reloaded

2010-01-04 Thread Rosser Schwarz
On Mon, Jan 4, 2010 at 2:57 PM, Madison Kelly  wrote:
> As for autovacuum, I assumed (yes, I know) that all v8.x releases enabled it
> by default. How would I confirm that it's running or not?

I believe it's not enabled by default in 8.1-land, and is as of 8.2
and later.  Whether it's running or not, try "SELECT * FROM
pg_autovacuum;".  If that returns the null set, it's not doing
anything, as it hasn't been told it has anything to do.
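
You can also check whether the daemon is switched on at all; the autovacuum settings are plain GUCs as of 8.1 (a sketch):

```sql
SHOW autovacuum;
SELECT name, setting
  FROM pg_settings
 WHERE name LIKE 'autovacuum%';
```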

IME, however, if you really want to benefit from the autovacuum
daemon, you probably do want to be on something more recent than 8.1.
(And, yes, this is a bit of the pot calling the kettle black: I have a
mixed set of 8.1 and 8.3 hosts.  Autovacuum is only running on the
latter, while the former are queued for an upgrade.)

rls

-- 
:wq



Re: [PERFORM] performance drop on RAID5

2005-08-24 Thread Rosser Schwarz
On 8/24/05, Alexandre Barros <[EMAIL PROTECTED]> wrote:

> i wouldn't be so stunned if the newer machine was ( say ) twice faster
> than the older server, but over three times faster is disturbing.

RAID5 on so few spindles is a known losing case for PostgreSQL.  You'd
be far, far better off doing a pair of RAID1 sets or a single RAID10
set.

/rls

-- 
:wq

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] [SQL] ORDER BY Optimization

2005-05-06 Thread Rosser Schwarz
while you weren't looking, Derek Buttineau|Compu-SOLVE wrote:

> I'm hoping this is the right place to send this.

The PostgreSQL Performance list, pgsql-performance@postgresql.org
would be more appropriate. I'm copying my followup there, as well.

As for your query, almost all the time is actually spent in the
nestloop, not the sort.  Compare:

>   ->  Sort  (cost=31402.85..31405.06 rows=886 width=306) (actual
> time=87454.187..87454.240 rows=10 loops=1)

vs.

>  ->  Nested Loop  (cost=0.00..31359.47 rows=886 width=306)
> (actual time=4.740..86430.468 rows=26308 loops=1)

That's 50-ish ms versus 80-odd seconds.

It seems to me a merge join might be more appropriate here than a
nestloop. What's your work_mem set at?  Off-the-cuff numbers show the
dataset weighing in the sub-ten mbyte range.

Provided it's not already at least that big, and you don't want to up
it permanently, try saying:

SET work_mem = 10240; -- 10 mbytes

immediately before running this query (uncached, of course) and see
what happens.

Also, your row-count estimates look pretty off-base.  When were these
tables last VACUUMed or ANALYZEd?

/rls

-- 
:wq



Re: [PERFORM] How to improve db performance with $7K?

2005-04-14 Thread Rosser Schwarz
while you weren't looking, Kevin Brown wrote:

[reordering bursty reads]

> In other words, it's a corner case that I strongly suspect
> isn't typical in situations where SCSI has historically made a big
> difference.

[...]

> But I rather doubt that has to be a huge penalty, if any.  When a
> process issues an fsync (or even a sync), the kernel doesn't *have* to
> drop everything it's doing and get to work on it immediately.  It
> could easily gather a few more requests, bundle them up, and then
> issue them.

To make sure I'm following you here, are you or are you not suggesting
that the kernel could sit on -all- IO requests for some small handful
of ms before actually performing any IO to address what you "strongly
suspect" is a "corner case"?

/rls

-- 
:wq



Re: [PERFORM] Preventing query from hogging server

2005-03-24 Thread Rosser Schwarz
while you weren't looking, Matthew Nuzum wrote:

> select accountid, min(atime) as atime, sessionid from usage_access
> group by accountid,sessionid;

Try something along the lines of:

select ua.accountid
     , (select atime
          from usage_access
         where sessionid = ua.sessionid
           and accountid = ua.accountid
         order by atime asc
         limit 1) as atime
     , ua.sessionid
  from usage_access ua
 group by accountid
        , sessionid;

min() and max() currently do table scans, which, on large tables, or
even moderately sized tables with large numbers of accounts/sessions,
can add up.  You'll need to replace asc with desc in the subquery for
the max() version.

This form cheats a bit and uses the index to find the highest and
lowest values, provided you've created the appropriate indices.

This is, IIRC, in the FAQ.

/rls

-- 
:wq



Re: [PERFORM] which dual-CPU hardware/OS is fastest for PostgreSQL?

2005-01-11 Thread Rosser Schwarz
while you weren't looking, Greg Stark wrote:

> Back in the day, we used to have problems with our 1U dual pentiums. We
> attributed it to heat accelerating failure. I would fear four opterons in 1U
> would be damned hard to cool effectively, no?

Opterons actually run pretty coolly, comparatively.  If it's a big
concern, you can always drop a few more clams for the low-voltage
versions -- available in 1.4 and 2.0 GHz flavors, and of which I've
heard several accounts of their being run successfully /without/
active cooling -- or punt until later this year, when they ship
Winchester core Opterons (90nm SOI -- the current, uniprocessor
silicon fabbed with that process has some 3W heat dissipation idle,
~30W under full load; as a point of contrast, current 90nm P4s have
34W idle dissipation, and some 100W peak).

We have a number of 1U machines (P4s, I believe), and a Dell blade
server (six or seven P3 machines in a 3U cabinet) as our webservers,
and none of them seem to have any trouble with heat.  That's actually
a bigger deal than it might first seem, given how frighteningly
crammed with crap our machine room is.

/rls

-- 
:wq



Re: [PERFORM] which dual-CPU hardware/OS is fastest for PostgreSQL?

2005-01-11 Thread Rosser Schwarz
while you weren't looking, Merlin Moncure wrote:

> 2 way or 4 way Opteron depending on needs (looking on a price for 4-way?
> Go here: http://www.swt.com/qo3.html). 

Try also the Appro 1U 4-way Opteron server, at:
http://www.appro.com/product/server_1142h.asp

I specced a 4-way 842 (1.6 GHz: little to none of our db work is CPU
bound; there's just a lot of it going on at once) with 32G core for
within delta of what SWT wants /just/ for the 32G -- the price of the
box itself and anything else atop that.  Stepping up to a faster CPU
should increase the cost directly in line with the retail price for
the silicon.

We haven't yet ordered the machine (and the quote was from early last
month, so their prices will have fluctuated) and consequently, I can't
comment on their quality.  Their default warranty is three years,
"rapid exchange", though, and they offer on-site service for only
nominally more, IIRC.  Some slightly more than cursory googling hasn't
turned up anything overly negative, either.

As a 1U, the box has no appreciable storage of its own but we're
shopping for a competent, non bank-breaking fibre setup right now, so
that's not an issue for our situation.  While on the subject, anyone
here have anything to say about JMR fibre raid cabinets? 
(Fibre-to-fibre, not fibre-to-SATA or the like.)

/rls

-- 
:wq



Re: [PERFORM] PostgreSQL vs. Oracle vs. Microsoft

2005-01-10 Thread Rosser Schwarz
while you weren't looking, Gary Doades wrote:

> The .NET Runtime will be a part of the next MS SQLServer engine.

It won't be long before someone writes a procedural language binding
to PostgreSQL for Parrot [1].  That should offer us a handful or six
more languages that can be used, including BASIC, Ruby and Scheme,
Perl (5 and 6), Python and TCL for more or less free, and ... wait for
it, BrainF***.

IIRC, people have talked about porting C# to Parrot, as well.

/rls

[1] The new VM for Perl 6, &c: http://www.parrotcode.org

-- 
:wq



Re: [PERFORM] create index with substr function

2004-10-20 Thread Rosser Schwarz
while you weren't looking, Ray wrote:

> CREATE INDEX idx_doc_substr_doc_urn ON doc USING btree (SUBSTR(doc_urn,10));

CREATE INDEX idx_doc_substr_doc_urn ON doc USING btree ((SUBSTR(doc_urn,10)));

You need an additional set of parens around the SUBSTR() call.

/rls

-- 
:wq



Re: [PERFORM] Understanding explains

2004-10-11 Thread Rosser Schwarz
while you weren't looking, Francisco Reyes wrote:

> Is there any disadvantage of having the enable_seqscan off?

Plenty.

The planner will choose whichever plan looks "cheapest", based on the
information it has available (table size, statistics, &c).  If a
sequential scan looks cheaper, and in your case above it clearly is,
the planner will choose that query plan.  Setting enable_seqscan =
false doesn't actually disable sequential scans; it merely makes them
seem radically more expensive to the planner, in hopes of biasing its
choice towards another query plan.  In your case, that margin made an
index scan look less expensive than sequential scan, but your query
runtimes clearly suggest otherwise.

In general, it's best to let the planner make the appropriate choice
without any artificial constraints.  I've seen pathalogical cases
where the planner makes the wrong choice(s), but upon analysis,
they're almost always attributable to poor statistics, long
un-vacuumed tables, &c.
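
If you do want to compare plans by hand, keep the bias session-local and undo it afterward (the query itself is a made-up example):

```sql
SET enable_seqscan = off;   -- biases costs for this session only; doesn't disable
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;
RESET enable_seqscan;
```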

/rls

-- 
:wq



Re: [PERFORM] finding a max value

2004-07-07 Thread Rosser Schwarz
On Fri, 02 Jul 2004 20:50:26 +0200, Edoardo Ceccarelli <[EMAIL PROTECTED]> wrote:

> This is the query:
> select max(KA) from annuncio

> wasn't supposed to do an index scan? it takes about 1sec to get the result.

> TIP 5: Have you checked our extensive FAQ?

I believe this is a FAQ.

See: http://www.postgresql.org/docs/faqs/FAQ.html#4.8

Try "select KA from annuncio order by KA desc limit 1;"
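
For that rewrite to use an index scan at all, an index on KA has to exist; something like (index name hypothetical):

```sql
CREATE INDEX idx_annuncio_ka ON annuncio (ka);
SELECT ka FROM annuncio ORDER BY ka DESC LIMIT 1;
```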

/rls

-- 
:wq



Re: [PERFORM] PostgreSQL caching

2004-05-21 Thread Rosser Schwarz
while you weren't looking, Vitaly Belman wrote:

> So, I guess it has to do with PostgreSQL caching.. But how exactly
> does it work? What does it cache? And how can I control it?

PostgreSQL uses the operating system's disk cache.  You can hint to
the postmaster how much memory is available for caching with the
effective_cache_size directive in your postgresql.conf.  If you're
running a *nix OS, you can find this by watching `top` for a while;
in the header, there's a "cached" value (or something to that effect).
Watching this value, you can determine a rough average and set your
effective_cache_size to that rough average, or perhaps slightly less.
I'm not sure how to get this value on Windows.
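
In postgresql.conf that looks something like the following; in this era the value is given in 8K disk pages, and the number here is made up rather than a recommendation:

```ini
# ~96000 x 8K pages ~= 750MB of OS cache assumed available for Postgres data
effective_cache_size = 96000
```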

Pgsql uses the OS's disk cache instead of its own cache management
because the former is more likely to persist.  If the postmaster
managed the cache, as soon as the last connection died, the memory
allocated for caching would be released, and all the cached data
would be lost.  Relying instead on the OS to cache data means that,
whether or not there's a postmaster, so long as there has been one,
there'll be some data cached.

You can "prepopulate" the OS disk cache by periodically running a
handful of SELECT queries that pull from your most commonly accessed
tables in a background process.  (A good way of doing that is simply
to run your most commonly executed SELECTS.)  Those queries should
take the performance hit of fetching from disk, while your regular
queries hit the cache.

/rls

--
Rosser Schwarz
Total Card, Inc.




Re: [PERFORM] planner/optimizer question

2004-04-29 Thread Rosser Schwarz
while you weren't looking, Gary Doades wrote:

> Recently  I have been looking at raw performance (CPU, IO) 
> rather than the plans. I have some test queries that (as far
> as I can determine) use the same access plans on PostgreSQL
> and SQLServer. Getting to the detail, an index scan of an
> index on a integer column (222512 rows) takes 60ms on
> SQLServer and  540ms on PostgreSQL.

After a recent power outage, I had the opportunity to watch both
PostgreSQL and MS SQL come back from forced shutdowns (clean,
though there were active connections, in one case a bulk insert).
PostgreSQL was available and responsive as soon as the postmaster
had started.  MS SQL, on the other hand, took the better part of
an hour to become remotely usable again -- on a radically faster
machine (Dell 6650, versus the 6450 we run PostgreSQL on).

Digging a bit, I noted that once MS SQL was up again, it was
using nearly 2GB main memory even when more or less idle.  From
this, and having observed the performance differences between
the two, I'm left with little alternative but to surmise that
part of MS SQL's noted performance advantage [1] is due to its
forcibly storing its indices in main memory.  Its startup lag
(during which it was utterly unusable; even SELECTs blocked)
could be accounted for by reindexing the tables. [2]

Granted, this is only a hypothesis, rather unverifiable, and
probably belongs more on ADVOCACY than it does PERFORM, but it
seemed relevant.

It's also entirely possible your indices are using inaccurate
statistical information.  Have you ANALYZEd recently?

/rls

[1] Again, at least in our case, the comparison is entirely
invalid, as MS SQL gets a hell of a lot more machine than
PostgreSQL.  Even so, for day-to-day work and queries, even
our DBA, an until-recently fervent MS SQL advocate can't
fault PostgreSQL's SELECT, INSERT or DELETE performance.
We still can't get UPDATEs (at least bulk such) to pass
muster.

[2] This is further supported by having observed MS SQL run a
"recovery process" on databases that were entirely unused,
even for SELECT queries, at the time of the outage.  The
only thing it might conceivably need to recover on them
    is in-memory indices that were lost when power was lost.

--
Rosser Schwarz
Total Card, Inc.




Re: [PERFORM] atrocious update performance

2004-04-05 Thread Rosser Schwarz
while you weren't looking, Kevin Barnard wrote:

> Have you added indexes for the custid column for tables 
> account.acct accunt.orgacct and note?

They were indexed in the original case, yes.  There was no
need to index them in today's test case, as that was done
purely in attempt to rule in or out foreign key validation
as the cause of the performance hit.  No foreign keys that
might be validated, no need to index the foreign key columns.

> I haven't followed the entire thread but it you have 
> cascading FK on those tables without an index on the
> column that could cause your delay.

The issue is that the foreign keys are being validated at
all, when the column being referenced by those foreign keys
(account.cust.custid) is never touched.

Regardless of whether or not the referencing columns are
indexed, validating them at all--in this specific case--is
broken.  The column they refer to is never touched; they
should remain utterly ignorant of whatever happens to other
columns in the same row.

/rls

--
Rosser Schwarz
Total Card, Inc.




Re: [PERFORM] atrocious update performance

2004-04-05 Thread Rosser Schwarz
After deinstalling and scrubbing PostgreSQL from my server and doing
a clean build using a vanilla 7.4.2 tree, I'm rather more confident
that foreign key validation is at cause in my performance problems.

I recreated my schemas and ran the original update, with foreign
keys referring to the identity column of the target table.  The
update took roughly two days, as I'd predicted based on my analysis
of the previous installation.  (I can't say how long with certainty,
beyond that it finished some time between when I left work one night
and came in the next morning, the second day after starting the
query.)  I'm not sure what was wrong with the previous install, such
that the update took several days; two-ish days is long enough.

Just this morning, however, I created a copy of the target table (all
4.7M rows), with absolutely no foreign keys referring to it, and ran
the update against the copy.  That update took 2300 seconds.  The
join columns were indexed in both cases.

I'm in the process of migrating the machine to run kernel 2.6.4,
following the thread started by Gary, though I suspect that the
kernel revision is moot with respect to whether or not foreign keys
are being incorrectly validated.  I can keep the 2.4 kernel and
modules around to run using the current versions for testing
purposes, though any such work would necessarily be off-hours.

Please advise of anything I can do to help narrow down the specific
cause of the issue; I know just enough C to be mildly dangerous.

/rls

--
Rosser Schwarz
Total Card, Inc.




Re: [PERFORM] atrocious update performance

2004-03-24 Thread Rosser Schwarz
Greg Spiegelberg wrote:

> > Will advise.

After creating 100, 1K, 10K, 100K and 1M-row subsets of account.cust and
the corresponding rows/tables with foreign key constraints referring to
the table, I'm unable to reproduce the behavior at issue.

explain analyze looks like the following, showing the query run with the
join column indexed and not, respectively:

# explain analyze update test.cust100 set prodid = tempprod.prodid,
subprodid = tempprod.subprodid where origid = tempprod.debtid;
-- with index
QUERY PLAN
---
 Merge Join  (cost=0.00..25.64 rows=500 width=220) (actual
 time=0.241..13.091 rows=100 loops=1)
   Merge Cond: (("outer".origid)::text = ("inner".debtid)::text)
   ->  Index Scan using ix_origid_cust100 on cust100  (cost=0.00..11.50
   rows=500 width=204) (actual time=0.125..6.465 rows=100 loops=1)
   ->  Index Scan using ix_debtid on tempprod  (cost=0.00..66916.71
   rows=4731410 width=26) (actual time=0.057..1.497 rows=101 loops=1)
 Total runtime: 34.067 ms
(5 rows)

-- without index
QUERY PLAN
--
 Merge Join  (cost=7.32..16.71 rows=100 width=220) (actual
 time=4.415..10.918 rows=100 loops=1)
   Merge Cond: (("outer".debtid)::text = "inner"."?column22?")
   ->  Index Scan using ix_debtid on tempprod  (cost=0.00..66916.71
   rows=4731410 width=26) (actual time=0.051..1.291 rows=101 loops=1)
   ->  Sort  (cost=7.32..7.57 rows=100 width=204) (actual
   time=4.311..4.450 rows=100 loops=1)
 Sort Key: (cust100.origid)::text
 ->  Seq Scan on cust100  (cost=0.00..4.00 rows=100 width=204)
 (actual time=0.235..2.615 rows=100 loops=1)
 Total runtime: 25.031 ms
(7 rows)

With the join column indexed, it takes roughly .32ms/row on the first
four tests (100.. 100K), and about .48ms/row on 1M rows.  Without the
index, it runs 100 rows @ .25/row, 1000 @ .26, 10000 @ .27, 100000 @
.48 and .5 @ 1M rows.

In no case does the query plan reflect foreign key validation.  Failing
any other suggestions for diagnosis soon, I'm going to nuke the
PostgreSQL install, scour it from the machine and start from scratch.
Failing that, I'm going to come in some weekend and re-do the machine.

> Problem is when I recreate the indexes and add the constraints back
> on ORIG I end up with the same long running process.  The original
> UPDATE runs for about 30 minutes on a table of 400,000 with the
> WHERE matching about 70% of the rows.  The above runs for about 2
> minutes without adding the constraints or indexes however adding the
> constraints and creating the dropped indexes negates any gain.

Is this a frequently-run update?

In my experience, with my seemingly mutant install, dropping indices
and constraints to shave 14/15 off the update time would be worth the
effort.  Just script dropping, updating and recreating into one large
transaction.  It's a symptom-level fix, but re-creating the fifteen
indices on one of our 5M row tables doesn't take 28 minutes, and your
hardware looks to be rather less IO and CPU bound than ours.  I'd also
second Tom's suggestion of moving to 7.4.

/rls




Re: [PERFORM] atrocious update performance

2004-03-23 Thread Rosser Schwarz
Greg Spiegelberg wrote:

> I've been following this thread closely as I have the same problem
> with an UPDATE.  Everything is identical here right down to the
> strace output.

> Has anyone found a workaround or resolved the problem?  If not,
> I have test systems here which I can use to help up test and explore.

I'm still gathering data.  The explain analyze I'd expected to finish
Thursday afternoon hasn't yet.  I'm going to kill it and try a few
smaller runs, increasing in size, until the behavior manifests.

Will advise.

/rls




Re: [PERFORM] atrocious update performance

2004-03-17 Thread Rosser Schwarz
I wrote:

> Regardless, something thinks it's still there.  Is there any way that
> it is, and that I've somehow been running 7.3.2 all along?  `which
> psql`, &c show the bindir from my configure, but I'm not sure that's
> sufficient.

The weird thing is that I know I never built 7.3.anything with 32K
BLCKSZ, never built 7.3.anything at all.  If 7.3 were installed, would
it have any problem reading a 7.4 cluster?

/rls

--
Rosser Schwarz
Total Card, Inc.




Re: [PERFORM] atrocious update performance

2004-03-17 Thread Rosser Schwarz
while you weren't looking, Tom Lane wrote:

> I hate to break it to you, but that most definitely means you are
> running with BLCKSZ = 32K.  Whatever you thought you were rebuilding
> didn't take effect.

I saw that and thought so.  The other day, I was rooting around in
$PGDATA, and saw a lot of 32K files and wondered for a moment, too.
If that's the case, though, that's ... weird.
 
> I agree that the larger blocksize is of dubious value.  People used to
> do that back when the blocksize limited your row width, but these days
> I think you're probably best off with the standard 8K.

I'd been experimenting with larger blocksizes after we started seeing
a lot of seqscans in query plans.  32K proved quickly that it hurts
index scan performance, so I was--I thought--trying 16.

> If the big EXPLAIN ANALYZE is still running, would you get a dump of its
> open files (see "lsof -p") and correlate those with the tables being
> used in the query?  I'm trying to figure out what the different writes
> and reads represent.

It looks rather like it's hitting the foreign keys; one of the files
that shows is the account.note table, which has an fk to the pk of the
table being updated.  The file's zero size, but it's open.  The only
reason it should be open is if foreign keys are being checked, yes?

You'd said that the foreign keys were only checked if last-change is
after current-query, as of 7.3.4, yes?  `rpm -qa postgresql` comes up
with 7.3.2-3, which makes no sense, 'cos I know I removed it before
installing current; I remember making sure no-one was using pg on this
machine, and remember saying rpm -e.

Regardless, something thinks it's still there.  Is there any way that
it is, and that I've somehow been running 7.3.2 all along?  `which
psql`, &c show the bindir from my configure, but I'm not sure that's
sufficient.

How would I tell?  I don't remember any of the binaries having a
--version argument.
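(For the record, two quick ways to check, using only the database name
and data directory that appear elsewhere in this thread:)

```shell
# Ask the running server what it is:
psql -c "SELECT version();" tci

# pg_controldata reports, among other things, the block size
# the cluster was initdb'd with:
pg_controldata /var/lib/pgsql/data
```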

/rls

--
Rosser Schwarz
Total Card, Inc.




Re: [PERFORM] atrocious update performance

2004-03-17 Thread Rosser Schwarz
while you weren't looking, Tom Lane wrote:

> Hm.  It looks like you mistakenly traced psql rather than the backend,
> but since the delay went away we wouldn't have learned 
> anything anyhow.
> Have you got any idea what conditions may have changed between seeing
> delay and not seeing delay?

None, offhand.  I have noticed that when a large query is running,
the machine can sporadically just freeze--or at least take inordinately
long for some other process, be it top or ls, another query, or whatever
to start.  Nothing looks saturated when it happens, and, while you can
count on it to happen, it's not consistent enough to reproduce.

> This is pretty odd too.  It looks like it's doing checkpoints every so
> often (look for the writes to pg_control), which a backend engaged in
> a long-running query surely ought not be doing.  Need to think about
> why that might be...

Does the fact that all the reads and writes are 32K mean anything out
of the ordinary?  $PGSRC/src/include/pg_config_manual.h has BLCKSZ
#defined to 16384.  I was running previously with a 32K BLCKSZ, but
that turned out to be rather sub-optimal for as heavily indexed as our
tables are.  I've dumped and rebuilt several times since then.

/rls

--
Rosser Schwarz
Total Card, Inc.




Re: [PERFORM] atrocious update performance

2004-03-17 Thread Rosser Schwarz
{
SIG_DFL}, 8) = 0 <0.12>
 0.000136 rt_sigaction(SIGTSTP, {0x401ec910, [], SA_RESTORER,
0x420276f8}, {
SIG_DFL}, 8) = 0 <0.13>
 0.000136 rt_sigaction(SIGTTOU, {0x401ec910, [], SA_RESTORER,
0x420276f8}, {
SIG_DFL}, 8) = 0 <0.12>
 0.000136 rt_sigaction(SIGTTIN, {0x401ec910, [], SA_RESTORER,
0x420276f8}, {
SIG_DFL}, 8) = 0 <0.13>
 0.000212 rt_sigaction(SIGWINCH, {0x401ec9d0, [], SA_RESTORER,
0x420276f8},
{SIG_DFL}, 8) = 0 <0.12>
 0.000188 write(1, "\r\rtci=# \rtci=# ", 15) = 15 <0.19>
 0.000112 rt_sigprocmask(SIG_BLOCK, NULL, [33], 8) = 0 <0.12>
 0.000110 read(0, "\\", 1)  = 1 <18.366895>
18.368284 write(1, "\rtci=# \\\rtci=# \\", 16) = 16 <0.29>
 0.000134 rt_sigprocmask(SIG_BLOCK, NULL, [33], 8) = 0 <0.13>
 0.000125 read(0, "q", 1)   = 1 <0.117572>
 0.117719 write(1, "\rtci=# \\q\rtci=# \\q", 18) = 18 <0.20>
 0.000118 rt_sigprocmask(SIG_BLOCK, NULL, [33], 8) = 0 <0.12>
 0.000107 read(0, "\r", 1)  = 1 <1.767409>
 1.767604 write(1, "\n", 1) = 1 <0.32>
 0.000140 rt_sigprocmask(SIG_BLOCK, [INT], [33], 8) = 0 <0.13>
 0.000138 ioctl(0, SNDCTL_TMR_STOP, {B38400 opost isig icanon echo ...})
= 0
 <0.30>
 0.000143 rt_sigprocmask(SIG_SETMASK, [33], NULL, 8) = 0 <0.13>
 0.000111 rt_sigaction(SIGINT, {0x804d404, [], SA_RESTORER|SA_RESTART,
0x420
276f8}, {0x401ec910, [], SA_RESTORER, 0x420276f8}, 8) = 0 <0.14>
 0.000153 rt_sigaction(SIGTERM, {SIG_DFL}, {0x401ec910, [], SA_RESTORER,
0x4
20276f8}, 8) = 0 <0.13>
 0.000134 rt_sigaction(SIGQUIT, {SIG_DFL}, {0x401ec910, [], SA_RESTORER,
0x4
20276f8}, 8) = 0 <0.13>
 0.000134 rt_sigaction(SIGALRM, {SIG_DFL}, {0x401ec910, [], SA_RESTORER,
0x4
20276f8}, 8) = 0 <0.13>
 0.000133 rt_sigaction(SIGTSTP, {SIG_DFL}, {0x401ec910, [], SA_RESTORER,
0x4
20276f8}, 8) = 0 <0.13>
 0.000134 rt_sigaction(SIGTTOU, {SIG_DFL}, {0x401ec910, [], SA_RESTORER,
0x4
20276f8}, 8) = 0 <0.13>
 0.000134 rt_sigaction(SIGTTIN, {SIG_DFL}, {0x401ec910, [], SA_RESTORER,
0x4
20276f8}, 8) = 0 <0.12>
 0.000134 rt_sigaction(SIGWINCH, {SIG_DFL}, {0x401ec9d0, [],
SA_RESTORER, 0x
420276f8}, 8) = 0 <0.14>
 0.001271 rt_sigaction(SIGINT, {SIG_DFL}, {0x804d404, [],
SA_RESTORER|SA_RES
TART, 0x420276f8}, 8) = 0 <0.13>
 0.000532 rt_sigaction(SIGPIPE, {SIG_IGN}, {SIG_DFL}, 8) = 0 <0.14>
 0.000145 send(3, "X\0\0\0\4", 5, 0) = 5 <0.28>
 0.000126 rt_sigaction(SIGPIPE, {SIG_DFL}, {SIG_IGN}, 8) = 0 <0.13>
 0.000140 close(3)  = 0 <0.33>
 0.000147 rt_sigaction(SIGPIPE, {SIG_DFL}, {SIG_DFL}, 8) = 0 <0.13>
 0.000197 open("/var/lib/pgsql/.psql_history", O_WRONLY|O_CREAT|O_TRUNC, 0600) = 3 <0.000168>
 0.000694 write(3, "\\d payment.batch\nalter sequence "..., 16712) = 16712 <0.000209>
 0.000311 close(3)  = 0 <0.57>
 0.055587 munmap(0x4003, 4096)  = 0 <0.32>
 0.000130 exit_group(0) = ?

/rls

--
Rosser Schwarz
Total Card, Inc. 




Re: [PERFORM] atrocious update performance

2004-03-16 Thread Rosser Schwarz
while you weren't looking, Tom Lane wrote:

[trace]

`strace -p 21882` run behind the below query and plan ... below that.

# explain update account.cust set prodid = tempprod.prodid, subprodid =
tempprod.subprodid where origid = tempprod.debtid;
                              QUERY PLAN
-----------------------------------------------------------------------
 Merge Join  (cost=0.00..232764.69 rows=4731410 width=252)
   Merge Cond: (("outer".origid)::text = ("inner".debtid)::text)
   ->  Index Scan using ix_origid on cust  (cost=0.00..94876.83
   rows=4731410 width=236)
   ->  Index Scan using ix_debtid on tempprod  (cost=0.00..66916.71
   rows=4731410 width=26)
(4 rows)

--

recv(9, "Q\0\0\0}explain update account.cust"..., 8192, 0) = 126
gettimeofday({1079482151, 106228}, NULL) = 0
brk(0)  = 0x82d9000
brk(0x82db000)  = 0x82db000
open("/var/lib/pgsql/data/base/495616/6834170", O_RDWR|O_LARGEFILE) = 8
_llseek(8, 212402176, [212402176], SEEK_SET) = 0
write(8, "\342\1\0\0\0\314\374\6\24\0\0\0\214\7pG\360\177\1\200\320"...,
32768) = 32768
close(8)= 0
open("/var/lib/pgsql/data/base/495616/16635", O_RDWR|O_LARGEFILE) = 8
read(8, "\0\0\0\0\20\0\0\0\1\0\0\0\24\0\360\177\360\177\1\200b1"..., 32768)
=
32768
open("/var/lib/pgsql/data/base/495616/6834168", O_RDWR|O_LARGEFILE) = 10
_llseek(10, 60817408, [60817408], SEEK_SET) = 0
write(10, "\342\1\0\0`\334\5\7\24\0\0\0t\0010x\360\177\1\200\330\377"...,
32768) = 32768
close(10)   = 0
read(8, "\334\1\0\0h\217\270n\24\0\0\0H\0H|[EMAIL PROTECTED]"...,
32768)
= 32768
open("/var/lib/pgsql/data/base/495616/6834165", O_RDWR|O_LARGEFILE) = 10
_llseek(10, 130777088, [130777088], SEEK_SET) = 0
write(10, "\342\1\0\0<\341\7\7\24\0\0\0004\t0I\360\177\1\200\330\377"...,
32768) = 32768
close(10)   = 0
open("/var/lib/pgsql/data/base/495616/16595", O_RDWR|O_LARGEFILE) = 10
read(10, "[EMAIL PROTECTED]"...,
32768) = 32768
open("/var/lib/pgsql/data/base/495616/6834168", O_RDWR|O_LARGEFILE) = 11
_llseek(11, 145915904, [145915904], SEEK_SET) = 0
write(11, "\342\1\0\0\300\350\n\7\24\0\0\0\224\6\310Z\360\177\1\200"...,
32768) = 32768
close(11)   = 0
open("/var/lib/pgsql/data/base/495616/16614", O_RDWR|O_LARGEFILE) = 11
read(11, "\0\0\0\0\24\231P\306\16\0\0\0\24\0\360\177\360\177\1\200"...,
32768)
= 32768
open("/var/lib/pgsql/data/base/495616/6834166", O_RDWR|O_LARGEFILE) = 12
_llseek(12, 148570112, [148570112], SEEK_SET) = 0
write(12, "\342\1\0\0\274\365\22\7\24\0\0\0X\3\234o\360\177\1\200"...,
32768)
= 32768
close(12)   = 0
_llseek(11, 98304, [98304], SEEK_SET)   = 0
read(11, "\0\0\0\0\24\231P\306\16\0\0\0\34\0\234\177\360\177\1\200"...,
32768)
= 32768
open("/var/lib/pgsql/data/base/495616/6834163", O_RDWR|O_LARGEFILE) = 12
_llseek(12, 251789312, [251789312], SEEK_SET) = 0
write(12, "\342\1\0\0l\366\23\7\24\0\0\0\364\10\260J\360\177\1\200"...,
32768)
= 32768
close(12)   = 0
_llseek(11, 32768, [32768], SEEK_SET)   = 0
read(11, "\340\1\0\0\324\231\273\241\24\0\0\0\234\5\330\26\360\177"...,
32768)
= 32768
open("/var/lib/pgsql/data/base/495616/6834165", O_RDWR|O_LARGEFILE) = 12
_llseek(12, 117309440, [117309440], SEEK_SET) = 0
write(12, "\342\1\0\0d\36)\7\24\0\0\\tHI\360\177\1\200\330\377"...,
32768)
= 32768
close(12)   = 0
open("/var/lib/pgsql/data/base/495616/1259", O_RDWR|O_LARGEFILE) = 12
_llseek(12, 32768, [32768], SEEK_SET)   = 0
read(12, "\334\1\0\0\324v-p\24\0\0\\3\304\3\0\200\1\200<\377"..., 32768)
=
32768
open("/var/lib/pgsql/data/base/495616/6834173", O_RDWR|O_LARGEFILE) = 13
_llseek(13, 247824384, [247824384], SEEK_SET) = 0
write(13, "\342\1\0\0h *\7\24\0\0\0\204\4dm\360\177\1\200\340\377"...,
32768)
= 32768
close(13)   = 0
open("/var/lib/pgsql/data/base/495616/16613", O_RDWR|O_LARGEFILE) = 13
read(13, "\0\0\0\0\20\0\0\0\1\0\0\0\24\0\360\177\360\177\1\200b1"..., 32768)
=
32768
open("/var/lib/pgsql/data/base/495616/6834168", O_RDWR|O_LARGEFILE) = 14
_llseek(14, 204472320, [204472320], SEEK_SET) = 0
write(14, "\342\1\0\0\314\272:\7\24\0\0\0\324\t\354K\360\177\1\200"...,
32768)
= 32768
close(14)   = 0
read(13, "\340\1\0\0X\231\273\241\24\0\0\0\370\6Dk\360\177\1\200"..., 32768)
=
32768
open("/var/lib/pgsql/data/base/495616/6834166", O_RDWR|O_LARGEFILE) = 14
_llseek(14, 152010752, [152010752], SEEK_SET) = 0
write(14, "\342\1\0\0p\277<\7\24\0\0\0\364\n\220I\360\177\1\200\334"...,
32768) = 32768
close(14)   = 0
open("/var/lib/pgsql/data/base/495616/16610", O_RDWR|O_LARGEFILE) = 14
read(14, "\0\0\0\0\10\317\27\t\16\0\0\0\24\0\360\177\360\177\1\200"...,
32768)
= 32768
open("/var/lib/pgsql/data/base/495616/6834170", O_RDWR|O_LARGEFILE) = 15
_llseek(15, 86441984, [86441984], SEEK_SET) = 0
write(15, "\342\1\0\0\330B?\7\24\0\0\0\370\6 N\360\177\1\2

Re: [PERFORM] atrocious update performance

2004-03-16 Thread Rosser Schwarz
while you weren't looking, Tom Lane wrote:

> EXPLAIN won't tell you anything about triggers that might get fired
> during the UPDATE, so it's not much help for investigating possible
> FK performance problems.  EXPLAIN ANALYZE will give you some indirect
> evidence: the difference between the total query time and the total time
> reported for the topmost plan node represents the time spent running
> triggers and physically updating the tuples.  I suspect we are going
> to see a big difference.

It's still running.

> It took 20 seconds to EXPLAIN?  That's pretty darn odd in itself.

It struck me, too.

> I'm starting to think there must be something quite whacked-out about
> your installation, but I haven't got any real good ideas about what.

Built from source.  configure arguments:

./configure --prefix=/var/postgresql --bindir=/usr/bin
--enable-thread-safety --with-perl --with-python --with-openssl
--with-krb5=/usr/kerberos

I can answer more specific questions; otherwise, I'm not sure what to
look for, either.  If we could take the machine out of production (oh,
hell; I think I just volunteered myself for weekend work) long enough
to reinstall everything to get a fair comparison...

So far as I know, though, it's a more or less stock Red Hat.  2.4.20-
something.

> (I'm assuming of course that there weren't a ton of other jobs eating
> CPU while you tried to do the EXPLAIN.)

CPUs spiked sporadically, which throttled everything else, but it never
stays high.  top shows the current explain analyze running between 50-
ish% and negligible.  iostat -k 3 shows an average of 3K/sec written, for
a hundred-odd tps.

I can't get any finer-grained than that, unfortunately; the machine was
handed to me with a single, contiguous filesystem, in production use.

> [ thinks for awhile... ]  The only theory that comes to mind
> for making
> the planner so slow is oodles of dead tuples in pg_statistic.  Could I
> trouble you to run
>   vacuum full verbose pg_statistic;
> and send along the output?

INFO:  vacuuming "pg_catalog.pg_statistic"
INFO:  "pg_statistic": found 215 removable, 349 nonremovable row versions
in 7 pages
DETAIL:  0 dead row versions cannot be removed yet.
Nonremovable row versions range from 72 to 8132 bytes long.
There were 3 unused item pointers.
Total free space (including removable row versions) is 91572 bytes.
0 pages are or will become empty, including 0 at the end of the table.
7 pages containing 91572 free bytes are potential move destinations.
CPU 0.00s/0.00u sec elapsed 0.71 sec.
INFO:  index "pg_statistic_relid_att_index" now contains 349 row versions
in 2 pages
DETAIL:  215 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "pg_statistic": moved 120 row versions, truncated 7 to 5 pages
DETAIL:  CPU 0.03s/0.01u sec elapsed 0.17 sec.
INFO:  index "pg_statistic_relid_att_index" now contains 349 row versions
in 2 pages
DETAIL:  120 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  vacuuming "pg_toast.pg_toast_16408"
INFO:  "pg_toast_16408": found 12 removable, 12 nonremovable row versions
in 5 pages
DETAIL:  0 dead row versions cannot be removed yet.
Nonremovable row versions range from 660 to 8178 bytes long.
There were 0 unused item pointers.
Total free space (including removable row versions) is 91576 bytes.
2 pages are or will become empty, including 0 at the end of the table.
5 pages containing 91576 free bytes are potential move destinations.
CPU 0.00s/0.00u sec elapsed 0.27 sec.
INFO:  index "pg_toast_16408_index" now contains 12 row versions in 2 pages
DETAIL:  12 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.05 sec.
INFO:  "pg_toast_16408": moved 10 row versions, truncated 5 to 3 pages
DETAIL:  CPU 0.00s/0.00u sec elapsed 0.01 sec.
INFO:  index "pg_toast_16408_index" now contains 12 row versions in 2 pages
DETAIL:  10 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.

Having never more than glanced at the output of "vacuum verbose", I
can't say whether that makes the cut for oodles.  My suspicion is no.

/rls

--
Rosser Schwarz
Total Card, Inc.




Re: [PERFORM] atrocious update performance

2004-03-16 Thread Rosser Schwarz
teger |
 debtid  | character varying(10)   |
 acctno  | character varying(50)   |
Foreign-key constraints:
    "$1" FOREIGN KEY (custid) REFERENCES account.cust(custid)
        ON UPDATE CASCADE ON DELETE RESTRICT

And the table we were joining to get the new values for prodid and
subprodid:

# \d tempprod
        Table "public.tempprod"
  Column   |         Type          | Modifiers
-----------+-----------------------+-----------
 debtid    | character varying(10) | not null
 pool      | character varying(10) | not null
 port      | character varying(10) | not null
 subprodid | bigint                |
 prodid    | bigint                |
Indexes:
    "ix_debtid" btree (debtid)

/rls

--
Rosser Schwarz
Total Card, Inc.




Re: [PERFORM] atrocious update performance

2004-03-15 Thread Rosser Schwarz
> You can create a new table using 'create table as' to produce your
> target results. This is real fast ...
> I often see 2 orders of magnitude improvement doing this, and no
> need to vacuum.

Indeed:

"Query returned successfully with no result in 582761 ms."

Though I must say, ten minutes is nominally more than two orders of
magnitude performance improvement, versus several days.

Many thanks, Aaron.
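For the archives, the shape of that rewrite looks roughly like the
following. This is a sketch only: the real SELECT list is much longer,
and only the columns actually named in this thread appear below.

```sql
-- Sketch: account.cust has many more columns than shown here.
CREATE TABLE account.cust_new AS
SELECT c.custid,
       c.origid,
       t.prodid,
       t.subprodid
  FROM account.cust AS c
  LEFT JOIN tempprod AS t ON t.did = c.origid;

-- Swap the new table in, then rebuild indexes and constraints:
ALTER TABLE account.cust RENAME TO cust_old;
ALTER TABLE account.cust_new RENAME TO cust;
CREATE INDEX ix_origid ON account.cust (origid);
```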

/rls

--
Rosser Schwarz
Total Card, Inc.




Re: [PERFORM] atrocious update performance

2004-03-15 Thread Rosser Schwarz
> > # explain update account.cust set prodid = tempprod.prodid
> > where tempprod.did = origid;

> >  Merge Join  (cost=0.00..232764.69 rows=4731410 width=252)
> >Merge Cond: (("outer".origid)::text = ("inner".did)::text)
> >->  Index Scan using ix_origid on cust  (cost=0.00..94876.83
> >rows=4731410 width=244)
> >->  Index Scan using ix_did on tempprod  (cost=0.00..66916.71
> >rows=4731410 width=18)
 
> I'm going to hazard a guess and say you have a number of foreign keys
> that refer to account.cust.prodid? This is probably the time-consuming
> part -- perhaps even a missing index on one of those keys that refers
> to this field.

Actually, there are no foreign keys to those columns.  Once they're
populated, I'll apply a foreign key constraint and they'll refer to the
appropriate row in the prod and subprod tables, but nothing will 
reference account.cust.[sub]prodid.  There are, of course, several foreign
keys referencing account.cust.custid.

> Going the other way should be just as good for your purposes, and much
> faster since you're not updating several foreign key'd fields bound to
> account.cust.prodid.

> UPDATE tempprod SET prodid = cust.prodid
>   FROM account.cust
>  WHERE tempprod.did = cust.origid;

Not quite. Without this update, account.cust.[sub]prodid are null.  The
data was strewn across multiple tables in MS SQL; we're normalizing it
into one, hence the need to populate the two columns independently.
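For clarity, here is the statement actually in play, written with an
explicit FROM clause (the form earlier in the thread relied on the
implicit-FROM behavior of pre-8.1 releases):

```sql
-- Explicit-FROM equivalent of the update under discussion:
UPDATE account.cust
   SET prodid    = tempprod.prodid,
       subprodid = tempprod.subprodid
  FROM tempprod
 WHERE cust.origid = tempprod.did;
```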

/rls

--
Rosser Schwarz
Total Card, Inc. 




[PERFORM] atrocious update performance

2004-03-15 Thread Rosser Schwarz
rows=4731410 width=244)
   ->  Index Scan using ix_did on tempprod  (cost=0.00..66916.71
   rows=4731410 width=18)

The relevant bits from my postgreql.conf (note, we built with a BLCKSZ
of 16K):

shared_buffers = 4096
sort_mem = 32768
vacuum_mem = 32768
wal_buffers = 16384
checkpoint_segments = 64
checkpoint_timeout = 1800
checkpoint_warning = 30
commit_delay = 5
effective_cache_size = 131072

Any advice, suggestions or comments of the "You bleeding idiot, why do
you have frob set to x?!" sort welcome.  Unfortunately, if we can't
improve this, significantly, the powers what be will probably pass
on PostgreSQL, even though everything we've done so far--with this
marked exception--performs pretty spectacularly, all told.

/rls

--
Rosser Schwarz
Total Card, Inc.

