Re: [HACKERS] Analyze on large changes...

2002-06-11 Thread Bruce Momjian

Tom Lane wrote:
> Bruce Momjian <[EMAIL PROTECTED]> writes:
> >> Seems like a pretty serious (not to say fatal) objection to me.  Surely
> >> you can fix that.
> 
> > OK, suggestions.  I know CommandCounterIncrement will not help.  Should
> > I do more pfree'ing?
> 
> No, retail pfree'ing is not a maintainable solution.  I was thinking
> more along the lines of a MemoryContextResetAndDeleteChildren() on
> whatever the active context is.  If that doesn't work straight off,
> you might have to create a new working context and switch into it
> before calling the analyze subroutine --- then deleting that context
> would do the trick.

OK, how is this?

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  [EMAIL PROTECTED]                    |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026


Index: src/backend/commands/analyze.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/commands/analyze.c,v
retrieving revision 1.35
diff -c -r1.35 analyze.c
*** src/backend/commands/analyze.c  24 May 2002 18:57:55 -  1.35
--- src/backend/commands/analyze.c  12 Jun 2002 03:59:45 -
***
*** 156,170 
elevel = DEBUG1;
  
/*
-* Begin a transaction for analyzing this relation.
-*
-* Note: All memory allocated during ANALYZE will live in
-* TransactionCommandContext or a subcontext thereof, so it will all
-* be released by transaction commit at the end of this routine.
-*/
-   StartTransactionCommand();
- 
-   /*
 * Check for user-requested abort.  Note we want this to be inside a
 * transaction, so xact.c doesn't issue useless WARNING.
 */
--- 156,161 
***
*** 177,186 
if (!SearchSysCacheExists(RELOID,
  ObjectIdGetDatum(relid),
  0, 0, 0))
-   {
-   CommitTransactionCommand();
return;
-   }
  
/*
 * Open the class, getting only a read lock on it, and check
--- 168,174 
***
*** 196,202 
	elog(WARNING, "Skipping \"%s\" --- only table or database owner can ANALYZE it",
		 RelationGetRelationName(onerel));
relation_close(onerel, AccessShareLock);
-   CommitTransactionCommand();
return;
}
  
--- 184,189 
***
*** 211,217 
	elog(WARNING, "Skipping \"%s\" --- can not process indexes, views or special system tables",
		 RelationGetRelationName(onerel));
relation_close(onerel, AccessShareLock);
-   CommitTransactionCommand();
return;
}
  
--- 198,203 
***
*** 222,228 
strcmp(RelationGetRelationName(onerel), StatisticRelationName) == 0)
{
relation_close(onerel, AccessShareLock);
-   CommitTransactionCommand();
return;
}
  
--- 208,213 
***
*** 283,289 
if (attr_cnt <= 0)
{
relation_close(onerel, NoLock);
-   CommitTransactionCommand();
return;
}
  
--- 268,273 
***
*** 370,378 
 * entries we made in pg_statistic.)
 */
relation_close(onerel, NoLock);
- 
-   /* Commit and release working memory */
-   CommitTransactionCommand();
  }
  
  /*
--- 354,359 
Index: src/backend/commands/vacuum.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/commands/vacuum.c,v
retrieving revision 1.226
diff -c -r1.226 vacuum.c
*** src/backend/commands/vacuum.c   24 May 2002 18:57:56 -  1.226
--- src/backend/commands/vacuum.c   12 Jun 2002 03:59:54 -
***
*** 110,117 
  
  
  /* non-export function prototypes */
- static void vacuum_init(VacuumStmt *vacstmt);
- static void vacuum_shutdown(VacuumStmt *vacstmt);
  static List *getrels(const RangeVar *vacrel, const char *stmttype);
  static void vac_update_dbstats(Oid dbid,
   TransactionId vacuumXID,
--- 110,115 
***
*** 160,165 
--- 158,165 
  void
  vacuum(VacuumStmt *vacstmt)
  {
+   MemoryContext anl_context,
+ old_context;
const char *stmttype = vacstmt->vacuum ? "VACUUM" : "ANALYZE";
List   *vrl,
   *cur;
***
*** 178,190 
 * user's transaction too, which would certainly not be the desired
 * behavior.
 */
!   if (IsTransactionBlock())
elog(ERROR, "%s cannot run inside a BEGIN/END block", stmttype);

Re: [HACKERS] Analyze on large changes...

2002-06-11 Thread Tom Lane

Bruce Momjian <[EMAIL PROTECTED]> writes:
>> Seems like a pretty serious (not to say fatal) objection to me.  Surely
>> you can fix that.

> OK, suggestions.  I know CommandCounterIncrement will not help.  Should
> I do more pfree'ing?

No, retail pfree'ing is not a maintainable solution.  I was thinking
more along the lines of a MemoryContextResetAndDeleteChildren() on
whatever the active context is.  If that doesn't work straight off,
you might have to create a new working context and switch into it
before calling the analyze subroutine --- then deleting that context
would do the trick.

regards, tom lane

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



Re: [HACKERS] Analyze on large changes...

2002-06-11 Thread Bruce Momjian

Tom Lane wrote:
> Bruce Momjian <[EMAIL PROTECTED]> writes:
> > One change in this patch is that because analyze now runs in the outer
> > transaction, I can't clear the memory used to support each analyzed
> > relation.  Not sure if this is an issue.
> 
> Seems like a pretty serious (not to say fatal) objection to me.  Surely
> you can fix that.

OK, suggestions.  I know CommandCounterIncrement will not help.  Should
I do more pfree'ing?





Re: [HACKERS] Analyze on large changes...

2002-06-11 Thread Tom Lane

Bruce Momjian <[EMAIL PROTECTED]> writes:
> One change in this patch is that because analyze now runs in the outer
> transaction, I can't clear the memory used to support each analyzed
> relation.  Not sure if this is an issue.

Seems like a pretty serious (not to say fatal) objection to me.  Surely
you can fix that.

regards, tom lane




Re: [HACKERS] Analyze on large changes...

2002-06-11 Thread Bruce Momjian

Bruce Momjian wrote:
> Tom Lane wrote:
> > I tried to repeat this:
> > 
> > regression=# begin;
> > BEGIN
> > regression=# create table foo (f1 int);
> > CREATE
> > regression=# insert into foo [ ... some data ... ]
> > 
> > regression=# analyze foo;
> > ERROR:  ANALYZE cannot run inside a BEGIN/END block
> > 
> > This seems a tad silly; I can't see any reason why ANALYZE couldn't be
> > done inside a BEGIN block.  I think this is just a hangover from
> > ANALYZE's origins as part of VACUUM.  Can anyone see a reason not to
> > allow it?
> 
> The following patch allows analyze to be run inside a transaction.  
Vacuum and vacuum analyze still cannot be run in a transaction.

One change in this patch is that because analyze now runs in the outer
transaction, I can't clear the memory used to support each analyzed
relation.  Not sure if this is an issue.





Re: [HACKERS] Analyze on large changes...

2002-06-11 Thread Bruce Momjian

Tom Lane wrote:
> I tried to repeat this:
> 
> regression=# begin;
> BEGIN
> regression=# create table foo (f1 int);
> CREATE
> regression=# insert into foo [ ... some data ... ]
> 
> regression=# analyze foo;
> ERROR:  ANALYZE cannot run inside a BEGIN/END block
> 
> This seems a tad silly; I can't see any reason why ANALYZE couldn't be
> done inside a BEGIN block.  I think this is just a hangover from
> ANALYZE's origins as part of VACUUM.  Can anyone see a reason not to
> allow it?

The following patch allows analyze to be run inside a transaction.  
Vacuum and vacuum analyze still cannot be run in a transaction.



Index: src/backend/commands/analyze.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/commands/analyze.c,v
retrieving revision 1.35
diff -c -r1.35 analyze.c
*** src/backend/commands/analyze.c  24 May 2002 18:57:55 -  1.35
--- src/backend/commands/analyze.c  11 Jun 2002 21:38:51 -
***
*** 156,170 
elevel = DEBUG1;
  
/*
-* Begin a transaction for analyzing this relation.
-*
-* Note: All memory allocated during ANALYZE will live in
-* TransactionCommandContext or a subcontext thereof, so it will all
-* be released by transaction commit at the end of this routine.
-*/
-   StartTransactionCommand();
- 
-   /*
 * Check for user-requested abort.  Note we want this to be inside a
 * transaction, so xact.c doesn't issue useless WARNING.
 */
--- 156,161 
***
*** 177,186 
if (!SearchSysCacheExists(RELOID,
  ObjectIdGetDatum(relid),
  0, 0, 0))
-   {
-   CommitTransactionCommand();
return;
-   }
  
/*
 * Open the class, getting only a read lock on it, and check
--- 168,174 
***
*** 196,202 
	elog(WARNING, "Skipping \"%s\" --- only table or database owner can ANALYZE it",
		 RelationGetRelationName(onerel));
relation_close(onerel, AccessShareLock);
-   CommitTransactionCommand();
return;
}
  
--- 184,189 
***
*** 211,217 
	elog(WARNING, "Skipping \"%s\" --- can not process indexes, views or special system tables",
		 RelationGetRelationName(onerel));
relation_close(onerel, AccessShareLock);
-   CommitTransactionCommand();
return;
}
  
--- 198,203 
***
*** 222,228 
strcmp(RelationGetRelationName(onerel), StatisticRelationName) == 0)
{
relation_close(onerel, AccessShareLock);
-   CommitTransactionCommand();
return;
}
  
--- 208,213 
***
*** 283,289 
if (attr_cnt <= 0)
{
relation_close(onerel, NoLock);
-   CommitTransactionCommand();
return;
}
  
--- 268,273 
***
*** 370,378 
 * entries we made in pg_statistic.)
 */
relation_close(onerel, NoLock);
- 
-   /* Commit and release working memory */
-   CommitTransactionCommand();
  }
  
  /*
--- 354,359 
Index: src/backend/commands/vacuum.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/commands/vacuum.c,v
retrieving revision 1.226
diff -c -r1.226 vacuum.c
*** src/backend/commands/vacuum.c   24 May 2002 18:57:56 -  1.226
--- src/backend/commands/vacuum.c   11 Jun 2002 21:38:59 -
***
*** 110,117 
  
  
  /* non-export function prototypes */
- static void vacuum_init(VacuumStmt *vacstmt);
- static void vacuum_shutdown(VacuumStmt *vacstmt);
  static List *getrels(const RangeVar *vacrel, const char *stmttype);
  static void vac_update_dbstats(Oid dbid,
   TransactionId vacuumXID,
--- 110,115 
***
*** 178,190 
 * user's transaction too, which would certainly not be the desired
 * behavior.
 */
!   if (IsTransactionBlock())
elog(ERROR, "%s cannot run inside a BEGIN/END block", stmttype);
  
/* Running VACUUM from a function would free the function context */
!   if (!MemoryContextContains(QueryContext, vacstmt))
elog(ERROR, "%s cannot be executed from a function", stmttype);
! 
/*
 * Send info

Re: [HACKERS] Analyze on large changes...

2002-05-01 Thread Tom Lane

Lincoln Yeoh <[EMAIL PROTECTED]> writes:
> My limited understanding of current behaviour is the search for a valid 
> row's tuple goes from older tuples to newer ones via forward links

No.  Each tuple is independently indexed and independently visited.
Given the semantics of MVCC I think that's correct --- after all, what's
dead to you is not necessarily dead to someone else.

There's been some speculation about trying to determine whether a dead
tuple is dead to everyone (essentially the same test VACUUM makes), and
if so propagating a marker back to the index tuple so that future
indexscans don't have to make useless visits to the heap tuple.
(I don't think we want to try to actually remove the index tuple; that's
VACUUM's job, and there are locking problems if we try to make it happen
in plain SELECTs.  But we could set a marker bit in the index entry.)
Under normal circumstances where all transactions are short, it wouldn't
be very long before a dead tuple could be marked, so this should fix the
performance issue.  Doing it in a reasonably clean fashion is the sticky
part.

regards, tom lane




Re: [HACKERS] Analyze on large changes...

2002-05-01 Thread Lincoln Yeoh

Hi Tom,

(Please correct me where I'm wrong)

Is it possible to reduce the performance impact of dead tuples, especially 
when the index is used? Right now performance degrades gradually until we 
vacuum (something like a 1/x curve).

My limited understanding of current behaviour is that the search for a valid 
row's tuple goes from older tuples to newer ones via forward links (based 
on some old docs[1]).

How about searching from newer tuples to older tuples instead, using 
backward links?

My assumption is that newer tuples are more likely to be wanted than older 
ones, so the number of tuples to search through will be smaller this way.

**If index update is ok.
If a tuple is inserted, the index record is updated to point to the 
inserted tuple, and the inserted tuple is made to point to the previous tuple.
e.g.

Index-> old tuple->older tuple->oldest tuple
Index-> New tuple->old tuple->older tuple->oldest tuple

**If index update is not desirable.
Index points to first tuple (valid or not).

If a tuple is inserted, the first tuple is updated to point to the 
inserted tuple, and the inserted tuple is made to point to the previous tuple.
e.g.

Index-> first tuple->old tuple->older tuple->oldest tuple
Index-> first tuple-> New tuple->old tuple->older tuple->oldest tuple

If this is done, performance might not deteriorate as much when using index 
scans, right? I'm not sure if backward links would help for sequential 
scans, which are usually best done forward.

Regards,
Link.

[1] http://developer.postgresql.org/pdf/transactions.pdf
Tuple headers contain:
• xmin: transaction ID of inserting transaction
• xmax: transaction ID of replacing/ deleting transaction (initially NULL)
• forward link: link to newer version of same logical row, if any
Basic idea: tuple is visible if xmin is valid and xmax is not.  "Valid"
means "either committed or the current transaction".
If we plan to update rather than delete, we first add new version of row
to table, then set xmax and forward link in old tuple. Forward link will
be needed by concurrent updaters (but not by readers).

At 10:53 AM 5/1/02 -0400, Tom Lane wrote:
>estimates.  [ thinks... ]  Actually I think we might just be
>double-counting if we did.  The dead tuples surely should not count as
>part of the number of returned rows.  We already do account for the
>I/O effort to read them (because I/O is estimated based on the total







Re: [HACKERS] Analyze on large changes...

2002-05-01 Thread Tom Lane

"Rod Taylor" <[EMAIL PROTECTED]> writes:
> Since dead, or yet to be visible tuples affect the plan that should be
> taken (until vacuum anyway) are these numbers reflected in the stats
> anywhere?

Analyze just uses SnapshotNow visibility rules, so it sees the same set
of tuples that you would see if you did a SELECT.

It might be interesting to try to estimate the fraction of dead tuples
in the table, though I'm not sure quite how to fold that into the cost
estimates.  [ thinks... ]  Actually I think we might just be
double-counting if we did.  The dead tuples surely should not count as
part of the number of returned rows.  We already do account for the
I/O effort to read them (because I/O is estimated based on the total
number of blocks in the table, which will include the space used by
dead tuples).  We're only missing the CPU time involved in the tuple
validity check, which is pretty small.

> Took an empty table, with a transaction I inserted a number of records
> and before committing I ran analyze.

I tried to repeat this:

regression=# begin;
BEGIN
regression=# create table foo (f1 int);
CREATE
regression=# insert into foo [ ... some data ... ]

regression=# analyze foo;
ERROR:  ANALYZE cannot run inside a BEGIN/END block

This seems a tad silly; I can't see any reason why ANALYZE couldn't be
done inside a BEGIN block.  I think this is just a hangover from
ANALYZE's origins as part of VACUUM.  Can anyone see a reason not to
allow it?

regards, tom lane




[HACKERS] Analyze on large changes...

2002-05-01 Thread Rod Taylor

I've run into an interesting issue.  A very long-running transaction
doing data loads is getting quite slow.  I really don't want to break
up the transactions (and for now it's ok), but it makes me wonder what
exactly analyze counts.

Since dead, or yet to be visible tuples affect the plan that should be
taken (until vacuum anyway) are these numbers reflected in the stats
anywhere?

Took an empty table, with a transaction I inserted a number of records
and before committing I ran analyze.

Analyze obviously saw the table as empty, as the pg_statistic row for
that relation doesn't exist.

Commit, then analyze again and the values were taken into account.


Certainly for large data loads, doing an analyze on the table after a
substantial (non-committed) change has taken place would be worthwhile
for all elements involved.
be faster, but on the actual tuples in the table a sequential scan
might be best.

Of course, for small transactions no effect will be seen.  But this
may help with huge data loads, especially where triggers or
constraints are in effect.
--
Rod

