[PATCHES] bitmapscan changes patch review

2007-06-20 Thread Alexey Klyukin
Hi,

Here is a patch by Heikki Linnakangas that changes the amgetmulti index
access method to amgetbitmap, which returns all matching tuples at once.
The patch also introduces support for candidate matches. The original
post is here:
http://archives.postgresql.org/pgsql-patches/2007-03/msg00163.php

I've made minor changes to the patch:

- fixed all rejects when applying it to the current CVS head.
- fixed counting ntids in gistnext if TIDBitmap is not null.
- added missing expected/bitmapops.out

It passes all regression tests on my system.

Regards,
-- 
Alexey Klyukin http://www.commandprompt.com/
The PostgreSQL Company - Command Prompt, Inc.
diff -c -N -r ./src/backend/access/gin/ginget.c ../pgsql.new/src/backend/access/gin/ginget.c
*** ./src/backend/access/gin/ginget.c	2007-06-19 14:37:49.0 +0300
--- ../pgsql.new/src/backend/access/gin/ginget.c	2007-06-19 14:41:46.0 +0300
***
*** 476,509 
  #define GinIsVoidRes(s)		( ((GinScanOpaque) scan->opaque)->isVoidRes == true )
  
  Datum
! gingetmulti(PG_FUNCTION_ARGS)
  {
  	IndexScanDesc scan = (IndexScanDesc) PG_GETARG_POINTER(0);
! 	ItemPointer tids = (ItemPointer) PG_GETARG_POINTER(1);
! 	int32		max_tids = PG_GETARG_INT32(2);
! 	int32	   *returned_tids = (int32 *) PG_GETARG_POINTER(3);
  
  	if (GinIsNewKey(scan))
  		newScanKey(scan);
  
- 	*returned_tids = 0;
- 
  	if (GinIsVoidRes(scan))
! 		PG_RETURN_BOOL(false);
  
  	startScan(scan);
  
! 	do
  	{
! 		if (scanGetItem(scan, tids + *returned_tids))
! 			(*returned_tids)++;
! 		else
  			break;
! 	} while (*returned_tids < max_tids);
  
  	stopScan(scan);
  
! 	PG_RETURN_BOOL(*returned_tids == max_tids);
  }
  
  Datum
--- 476,510 
  #define GinIsVoidRes(s)		( ((GinScanOpaque) scan->opaque)->isVoidRes == true )
  
  Datum
! gingetbitmap(PG_FUNCTION_ARGS)
  {
  	IndexScanDesc scan = (IndexScanDesc) PG_GETARG_POINTER(0);
! 	TIDBitmap *tbm = (TIDBitmap *) PG_GETARG_POINTER(1);
! 	int32 ntids;
  
  	if (GinIsNewKey(scan))
  		newScanKey(scan);
  
  	if (GinIsVoidRes(scan))
! 		PG_RETURN_INT32(0);
  
  	startScan(scan);
  
! 	ntids = 0;
! 	for(;;)
  	{
! 		ItemPointerData iptr;
! 		
! 		if (!scanGetItem(scan, &iptr))
  			break;
! 
! 		ntids++;
! 		tbm_add_tuples(tbm, &iptr, 1, false);
! 	}
  
  	stopScan(scan);
  
! 	PG_RETURN_INT32(ntids);
  }
  
  Datum
diff -c -N -r ./src/backend/access/gist/gistget.c ../pgsql.new/src/backend/access/gist/gistget.c
*** ./src/backend/access/gist/gistget.c	2007-06-19 14:37:49.0 +0300
--- ../pgsql.new/src/backend/access/gist/gistget.c	2007-06-19 14:41:46.0 +0300
***
*** 22,28 
  
  static OffsetNumber gistfindnext(IndexScanDesc scan, OffsetNumber n,
  			 ScanDirection dir);
! static int	gistnext(IndexScanDesc scan, ScanDirection dir, ItemPointer tids, int maxtids, bool ignore_killed_tuples);
  static bool gistindex_keytest(IndexTuple tuple, IndexScanDesc scan,
    OffsetNumber offset);
  
--- 22,30 
  
  static OffsetNumber gistfindnext(IndexScanDesc scan, OffsetNumber n,
  			 ScanDirection dir);
! static int	gistnext(IndexScanDesc scan, ScanDirection dir, 
! 	 ItemPointer tid, TIDBitmap *tbm, 
! 	 bool ignore_killed_tuples);
  static bool gistindex_keytest(IndexTuple tuple, IndexScanDesc scan,
    OffsetNumber offset);
  
***
*** 114,145 
  	 * tuples, continue looping until we find a non-killed tuple that matches
  	 * the search key.
  	 */
! 	res = (gistnext(scan, dir, &tid, 1, scan->ignore_killed_tuples)) ? true : false;
  
  	PG_RETURN_BOOL(res);
  }
  
  Datum
! gistgetmulti(PG_FUNCTION_ARGS)
  {
  	IndexScanDesc scan = (IndexScanDesc) PG_GETARG_POINTER(0);
! 	ItemPointer tids = (ItemPointer) PG_GETARG_POINTER(1);
! 	int32		max_tids = PG_GETARG_INT32(2);
! 	int32	   *returned_tids = (int32 *) PG_GETARG_POINTER(3);
  
! 	*returned_tids = gistnext(scan, ForwardScanDirection, tids, max_tids, false);
  
! 	PG_RETURN_BOOL(*returned_tids == max_tids);
  }
  
  /*
   * Fetch a tuples that matchs the search key; this can be invoked
   * either to fetch the first such tuple or subsequent matching
!  * tuples. Returns true iff a matching tuple was found.
   */
  static int
! gistnext(IndexScanDesc scan, ScanDirection dir, ItemPointer tids,
! 		 int maxtids, bool ignore_killed_tuples)
  {
  	Page		p;
  	OffsetNumber n;
--- 116,152 
  	 * tuples, continue looping until we find a non-killed tuple that matches
  	 * the search key.
  	 */
! 	res = (gistnext(scan, dir, &tid, NULL, scan->ignore_killed_tuples)) ? true : false;
  
  	PG_RETURN_BOOL(res);
  }
  
  Datum
! gistgetbitmap(PG_FUNCTION_ARGS)
  {
  	IndexScanDesc scan = (IndexScanDesc) PG_GETARG_POINTER(0);
! 	TIDBitmap *tbm = (TIDBitmap *) PG_GETARG_POINTER(1);
! 	int32	   ntids;
  
! 	ntids = gistnext(scan, ForwardScanDirection, NULL, tbm, false);
  
! 	PG_RETURN_INT32(ntids);
  }
  
  /*
   * Fetch a tuples that matchs the search key; this can be invoked
   * either to fetch the first such t

Re: [PATCHES] [HACKERS] 'Waiting on lock'

2007-06-20 Thread Gregory Stark
"Tom Lane" <[EMAIL PROTECTED]> writes:

> Applied with some further revisions to improve the usefulness of the log
> messages.  There's now one issued when the deadlock timeout elapses, and
> another when the lock is finally obtained:
>
> LOG:  process 14945 still waiting for AccessExclusiveLock on relation 86076 
> of database 86042 after 1003.354 ms
> ...
> LOG:  process 14945 acquired AccessExclusiveLock on relation 86076 of 
> database 86042 after 5918.002 ms

Is it possible for unlocking the semaphore to wake a process other than our
own? In that case, checking log_lock_waits before signalling the semaphore
arguably locks us into having log_lock_waits be PGC_POSTMASTER. Currently it's
PGC_SIGHUP, which is odd since it could have been USERSET in the old regime.

Also, I did just think of a reason why maybe having the times in the messages
could be annoying: it makes it hard to write regression tests. I suppose
having the pids already interferes with regression tests though. Maybe we
should do something like optionally postprocess .out files with some sed
script like s/[0-9]+/###/ before running diff.
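The masking idea above can be sketched as a hypothetical post-processing pass (not an existing pg_regress feature; the log line and the `###` replacement token are just illustrative):

```shell
# Mask every run of digits in the .out files before diffing, so PIDs and
# millisecond timings in log_lock_waits messages don't cause spurious
# regression failures.  Requires a sed with -E (GNU or BSD sed).
line='LOG:  process 14945 acquired AccessExclusiveLock on relation 86076 of database 86042 after 5918.002 ms'
printf '%s\n' "$line" | sed -E 's/[0-9]+/###/g'
# -> LOG:  process ### acquired AccessExclusiveLock on relation ### of database ### after ###.### ms
```

Note that this also masks digits inside object names, so it is a blunt instrument; a narrower pattern anchored on "process" and "ms" would be safer in practice.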

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


[PATCHES] [EMAIL PROTECTED]: Re: [HACKERS] EXPLAIN omits schema?]

2007-06-20 Thread Alvaro Herrera
German sent this some time ago and it never reached the list.  He sent
it again from Gmail but again it was lost in the void.

I am forwarding it to improve the chances of it being delivered ...  The
patch in the fwd is not a nice MIME part but it should work without
problem anyway.

- Forwarded message from Germán Poó Caamaño <[EMAIL PROTECTED]> -

From: Germán Poó Caamaño <[EMAIL PROTECTED]>
To: Magnus Hagander <[EMAIL PROTECTED]>
Cc: Alvaro Herrera <[EMAIL PROTECTED]>,
Dave Page <[EMAIL PROTECTED]>,
PostgreSQL-development <[EMAIL PROTECTED]>
Date: Thu, 14 Jun 2007 17:53:37 -0400
Subject: Re: [HACKERS] EXPLAIN omits schema?
X-Bogosity: Ham, tests=bogofilter, spamicity=0.00, version=1.1.5

On Wed, 2007-06-13 at 14:49 +0200, Magnus Hagander wrote:
> On Wed, Jun 13, 2007 at 08:47:30AM -0400, Alvaro Herrera wrote:
> > Magnus Hagander wrote:
> > 
> > > Just to open a whole new can of worms ;-)
> > > 
> > > I read an article a couple of days ago about the "machine readable 
> > > showplan
> > > output" in SQL Server 2005 (basically, it's EXPLAIN output but in XML
format). It does make a lot of sense if your primary interface is !=
commandline (psql), such as pgadmin or phppgadmin. The idea being that you
> > > can stick in *all* the details you want, since you can't possibly clutter
> > > up the display. And you stick them in a well-defined XML format (or 
> > > another
> > > format if you happen to hate XML) where the client-side program can easily
> > > parse out whatever it needs. It's also future-proof - if you add a new
> > > field somewhere, the client program parser won't break.
> > > 
> > > Something worth doing? Not to replace the current explain output, but as a
> > > second option (EXPLAIN XML whatever)?
> > 
> > FYI a patch was posted for this some time ago, because a friend of mine
> > wanted to help a student to write an EXPLAIN parsing tool.
> 
> Didn't see that one. Explain in XML format? Got an URL for it, I can't seem
> to find it on -patches.

I never sent it, sorry.  It was made against 8.0 beta.  By the time of
that message, I had been told the pgAdmin folks were working on parsing
the text output, so I figured it was useless after all.

Anyway, I made a break and I have the patch working against CVS HEAD.

Please note this is a three-year-old patch.  Some things are missing, such
as show_sort_info, but they should be easy to add.

When this code was written, the 'XML' token didn't exist.  Today, while
doing the merge, I noticed there is now an XML_P, so I adjusted the
"xml" keyword handling.  I hope I haven't broken anything.

Any comments are welcome, even if this patch is total crap.

PS: I owe you the DTD.

-- 
Germán Poó Caamaño
Concepción - Chile

*** /dev/fd/63  2007-06-14 17:40:26.478023384 -0400
--- src/backend/commands/explain.c  2007-06-14 17:23:29.280056575 -0400
*** typedef struct ExplainState
*** 49,54 
--- 49,61 
List   *rtable; /* range table */
  } ExplainState;
  
+ typedef struct ExplainXML
+ {
+   /* options */
+   StringInfo  str;
+   int level;  /* level of childs */
+ } ExplainXML;
+ 
  static void ExplainOneQuery(Query *query, ExplainStmt *stmt,
const char *queryString,
ParamListInfo params, 
TupOutputState *tstate);
*** static double elapsed_time(instr_time *s
*** 56,72 
  static void explain_outNode(StringInfo str,
Plan *plan, PlanState *planstate,
Plan *outer_plan,
!   int indent, ExplainState *es);
  static void show_scan_qual(List *qual, const char *qlabel,
   int scanrelid, Plan *outer_plan, Plan *inner_plan,
!  StringInfo str, int indent, ExplainState *es);
  static void show_upper_qual(List *qual, const char *qlabel, Plan *plan,
!   StringInfo str, int indent, ExplainState *es);
  static void show_sort_keys(Plan *sortplan, int nkeys, AttrNumber *keycols,
   const char *qlabel,
!  StringInfo str, int indent, ExplainState *es);
  static void show_sort_info(SortState *sortstate,
!  StringInfo str, int indent, ExplainState *es);
  static const char *explain_get_index_name(Oid indexId);
  
  
--- 63,79 
  static void explain_outNode(StringInfo str,
Plan *plan, PlanState *planstate,
Plan *outer_plan,
!   int indent, ExplainState *es, ExplainXML *exml);
  static void show_scan_qual(List *qual, const char *qlabel,
   int scanrelid, Plan *outer_plan, Plan *inner_plan,
!  StringInfo str, int indent, ExplainState *es, 
ExplainXML *exml

Re: [PATCHES] more autovacuum fixes

2007-06-20 Thread Alvaro Herrera
Alvaro Herrera wrote:

> Turns out that this problem is not serious at all, because if that
> palloc() fails, the whole postmaster will exit with a FATAL out of
> memory message.
> 
> The problems in the worker code after fork are still an issue though.

It _is_ still an issue -- and a very serious issue at that.  If a worker
fails before getting its entry from the startingWorker pointer, then
(patched) autovacuum will no longer run any task.

Since it doesn't seem like it is possible to signal the launcher until
the worker has set up shared memory, I think we should leave the current
shenanigans in place.  They are really ugly code and I'd love to get rid
of them, but currently I don't see any better mechanism to deal with
this failure scenario.

Thanks for your attention ;-)

-- 
Alvaro Herrera http://www.amazon.com/gp/registry/CTMLCN8V17R4
Bob [Floyd] used to say that he was planning to get a Ph.D. by the "green
stamp method," namely by saving envelopes addressed to him as 'Dr. Floyd'.
After collecting 500 such letters, he mused, a university somewhere in
Arizona would probably grant him a degree.  (Don Knuth)



[PATCHES] Load Distributed Checkpoints, take 3

2007-06-20 Thread Heikki Linnakangas

Here's an updated WIP patch for load distributed checkpoints.

I added a spinlock to protect the signaling fields between bgwriter and 
backends. The current non-locking approach gets really difficult as the 
patch adds two new flags, and both are more important than the existing 
ckpt_time_warn flag.


In fact, I think there's a small race condition in CVS HEAD:

1. pg_start_backup() is called, which calls RequestCheckpoint
2. RequestCheckpoint takes note of the old value of ckpt_started
3. bgwriter wakes up from pg_usleep, and sees that we've exceeded 
checkpoint_timeout.

4. bgwriter increases ckpt_started to note that a new checkpoint has started
5. RequestCheckpoint signals bgwriter to start a new checkpoint
6. bgwriter calls CreateCheckpoint, with the force-flag set to false 
because this checkpoint was triggered by timeout
7. RequestCheckpoint sees that ckpt_started has increased, and starts to 
wait for ckpt_done to reach the new value.
8. CreateCheckpoint finishes immediately, because there was no XLOG 
activity since last checkpoint.

9. RequestCheckpoint sees that ckpt_done matches ckpt_started, and returns.
10. pg_start_backup() continues, with potentially the same redo location 
and thus history filename as previous backup.


Now I admit that the chances of that happening are extremely small; people 
don't usually do two pg_start_backup calls without *any* WAL-logged 
activity in between them, for example. But as we add the new 
flags, avoiding scenarios like that becomes harder.


Since last patch, I did some clean up and refactoring, and added a bunch 
of comments, and user documentation.


I haven't yet changed GetInsertRecPtr to use the almost up-to-date value 
protected by the info_lck per Simon's suggestion, and I need to do some 
correctness testing. After that, I'm done with the patch.


PS: In case you wonder what took so long since the last revision, I've 
spent a lot of time reviewing HOT.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
Index: doc/src/sgml/config.sgml
===
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/doc/src/sgml/config.sgml,v
retrieving revision 1.126
diff -c -r1.126 config.sgml
*** doc/src/sgml/config.sgml	7 Jun 2007 19:19:56 -	1.126
--- doc/src/sgml/config.sgml	19 Jun 2007 14:24:31 -
***
*** 1565,1570 
--- 1565,1608 

   
  
+  
+   checkpoint_smoothing (floating point)
+   
+checkpoint_smoothing configuration parameter
+   
+   
+
+ Specifies the target length of checkpoints, as a fraction of 
+ the checkpoint interval. The default is 0.3.
+ 
+ This parameter can only be set in the postgresql.conf
+ file or on the server command line.
+
+   
+  
+ 
+   checkpoint_rate (floating point)
+   
+checkpoint_rate configuration parameter
+   
+   
+
+ Specifies the minimum I/O rate used to flush dirty buffers during a
+ checkpoint, when there's not many dirty buffers in the buffer cache.
+ The default is 512 KB/s.
+ 
+ Note: the accuracy of this setting depends on
+ bgwriter_delaypostgresql.conf
+ file or on the server command line.
+
+   
+  
+ 
   
checkpoint_warning (integer)

Index: doc/src/sgml/wal.sgml
===
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/doc/src/sgml/wal.sgml,v
retrieving revision 1.43
diff -c -r1.43 wal.sgml
*** doc/src/sgml/wal.sgml	31 Jan 2007 20:56:19 -	1.43
--- doc/src/sgml/wal.sgml	19 Jun 2007 14:26:45 -
***
*** 217,225 

  

 There will be at least one WAL segment file, and will normally
 not be more than 2 * checkpoint_segments + 1
!files.  Each segment file is normally 16 MB (though this size can be
 altered when building the server).  You can use this to estimate space
 requirements for WAL.
 Ordinarily, when old log segment files are no longer needed, they
--- 217,245 

  

+If there is a lot of dirty buffers in the buffer cache, flushing them
+all at checkpoint will cause a heavy burst of I/O that can disrupt other
+activity in the system. To avoid that, the checkpoint I/O can be distributed
+over a longer period of time, defined with
+checkpoint_smoothing. It's given as a fraction of the
+checkpoint interval, as defined by checkpoint_timeout
+and checkpoint_segments. The WAL segment consumption
+and elapsed time is monitored and the I/O rate is adjusted during
+checkpoint so that it's finished when the given fraction of elapsed time
+or WAL segments has passed, whichever is sooner. However, that could lead
+to unnecessarily prolonged checkpoints when there's not many dirty buffers
+in the cache. To avoid that, checkpoint_rate can be used

Re: [PATCHES] [EMAIL PROTECTED]: Re: [HACKERS] EXPLAIN omits schema?]

2007-06-20 Thread Tom Lane
Alvaro Herrera <[EMAIL PROTECTED]> writes:
> I am forwarding it to improve the chances of it being delivered ...  The
> patch in the fwd is not a nice MIME part but it should work without
> problem anyway.

I'm not sure why anyone would want *both* xml and regular output
produced at once.  The patch's treatment of name quoting seems both
inconsistent and highly unlikely to be correct (how does XML deal
with embedded quotes in attribute values, anyway?).  The submitter
appears to have no clue about the maintenance details required when
adding a field to a Node struct.

But the big question is: where's the DTD?  Has he even tried to design
a sane XML representation, or just emitted whatever was convenient given
the existing code structure?  I'm fairly suspicious that a patch that
doesn't rearrange the existing code at all is probably not producing
the ideal XML structure.

regards, tom lane



Re: [PATCHES] Preliminary GSSAPI Patches

2007-06-20 Thread Magnus Hagander
On Tue, Jun 19, 2007 at 06:19:37PM -0700, Henry B. Hotz wrote:
> Such timing!
> 
> I just spent most of yesterday stepping though the gssapi sample  
> app's in Java 1.4 with someone here at work.  Was thinking I needed  
> to get back to the JDBC client and do what I promised.  Also finished  
> filtering the PG lists for stuff just before seeing this email.

:-)

> 
> >On Sun, May 20, 2007 at 01:28:40AM -0700, Henry B. Hotz wrote:
> >>I finally got to testing that updated patch.  It's fine per-se, but
> >>was missing the updated README.GSSAPI file.  Herewith fixed.
> >>
> >
> >I've been reviewing and updating this patch, for a while now.I've  
> >changed
> >quite a bit around, and I have it working fine, but I have one  
> >question.
> 
> Be curious to see what you've done, but if you're actively changing  
> things I'll let them settle.

I've got a bit more cleanup to do, but I'm almost there.

Much of it is just cleanup. I've changed the structs around to be more in
line with the surrounding code, and such. Refactored some of the code to
cut down on duplication. Added some things to make it work on Windows
(still just with MIT Kerberos, not native, though). Fixed two (I think it
was) small memory leaks.

Protocol-wise, it no longer piggybacks on the AuthenticationOk message -
instead we send an extra continue message followed right away by an
AuthenticationOk one.

Oh, and I've added the autoconf bits. Not complete yet, but getting there :-)

I'll post the updated patch shortly :-)

> >Is there a way to provoke GSSAPI into sending multiple packets in the
> >authentication? It doesn't seem to do that for me, and ISTM that  
> >the code
> >as it stands is broken in that case - but I'd like to verify it.
> 
> Remember wondering about that myself.  For SASL if you turn on all  
> the options you get an extra round trip.  Not for GSSAPI/Krb5, which  
> is pretty efficient in that respect.  The loop logic for SASL is just  
> different enough I can imagine messing up, but I would have thought  
> it would have made me get the logic right.
> 
> The only thing I can think of is to use a different GSSAPI  
> mechanism.  That opens an interesting can of worms that has nothing  
> to do with Postgresql.  First of all that means you need to use a  
> GSSAPI implementation that supports multiple mechanisms (which  
> precludes Java for now).  That in turn means either Sun, or MIT 1.6+,  
> or Heimdal 0.8+.  Second, you need another mechanism to install.   
> That means the commercial Entrust SPKM mechanism on Sun, or the UMICH  
> SPKM mechanism, or some SPNEGO mechanism.
> 
> Love says there are problems with the SPKM RFC and the UMICH  
> implementation won't interoperate with other implementations as a  
> result (but it works with itself).  I also know he's been  
> experimenting with other mechanisms.  Looking at the latest Heimdal  
> snapshot I have, it seems to have both SPNEGO and NTLM mechanism code  
> in it.
> 
> A configuration that used SPNEGO to negotiate Kerberos 5 ought to  
> take two round trips, at least.  Feel like trying it?

Not really ;-) I did check the code, and it certainly wasn't right the way
it was. I added some manual packet injection, and it ended up in the wrong
place. It looks right now, and I'll have to test that eventually (my test
packets end up in the right place now). But what I have now *seems* right.

> >Basically, pg_GSS_recvauth() is supposed to loop and read all  
> >"continuing
> >exchange packets", right? But the reading of packets from the  
> >network sits
> >*outside* the loop. So it basically just loops over and over on the  
> >same
> >data, which ISTM is wrong. It does send a proper ask-for-continue  
> >message
> >to the frontend inside the loop, but I can't figure out how it's  
> >supposed
> >to read the response.
> 
> I remember having to stand on my head to get it (apparently) right on  
> the client side.  I also remember the server side was a lot simpler,  
> but you're saying I may have oversimplified?  I can't answer the  
> question about the server side without reviewing the code.

The client side was much more complete, yes :) And with my other changes to
the protocol, it actually gets more than one packet.

As for the server-side fix, it was just moving a couple of lines of code
inside the loop...

//Magnus




Re: [PATCHES] [HACKERS] 'Waiting on lock'

2007-06-20 Thread Tom Lane
Gregory Stark <[EMAIL PROTECTED]> writes:
> Is it possible for unlocking the semaphore to wake another process other than
> our own? In which case checking log_lock_waits before signalling the semaphore
> arguably locks us into having log_lock_waits be PGC_POSTMASTER.

How you figure that?

> Currently it's PGC_SIGHUP which is odd since it could have been
> USERSET in the old regime.

Actually I changed it to SUSET yesterday.  I don't see any strong reason
why we should disallow different processes having different settings;
however, if the DBA is trying to gather this info, random users
shouldn't be able to turn it off.

regards, tom lane



Re: [PATCHES] [HACKERS] 'Waiting on lock'

2007-06-20 Thread Gregory Stark
"Tom Lane" <[EMAIL PROTECTED]> writes:

> Gregory Stark <[EMAIL PROTECTED]> writes:
>> Is it possible for unlocking the semaphore to wake another process other than
>> our own? In which case checking log_lock_waits before signalling the 
>> semaphore
>> arguably locks us into having log_lock_waits be PGC_POSTMASTER.
>
> How you figure that?

Well, I'm not clear on exactly what's going on with the semaphores here. If it's
possible for us to be printing the messages only as a result of another backend
unlocking the semaphore then making the PGSemaphoreUnlock conditional on
log_lock_waits means you can't enable log_lock_waits after startup and get
deterministic behaviour because whether you get messages will depend on which
other backend happens to wake you up.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




Re: [PATCHES] [HACKERS] 'Waiting on lock'

2007-06-20 Thread Tom Lane
Gregory Stark <[EMAIL PROTECTED]> writes:
> "Tom Lane" <[EMAIL PROTECTED]> writes:
>> How you figure that?

> Well, I'm not clear on exactly what's going on with the semaphores here. If it's
> possible for us to be printing the messages only as a result of another backend
> unlocking the semaphore then making the PGSemaphoreUnlock conditional on
> log_lock_waits means you can't enable log_lock_waits after startup and get
> deterministic behaviour because whether you get messages will depend on which
> other backend happens to wake you up.

I don't see how you arrive at that conclusion.  The message is printed
by the backend that is waiting for (or just obtained) a lock, dependent
on its own local setting of log_lock_waits, and not dependent on who
woke it up.

BTW, I just noticed that GUC allows deadlock_timeout to be set all the
way down to zero.  This seems bad --- surely the minimum value should at
least be positive?  As CVS HEAD stands, you're likely to get a lot of
spurious/useless log messages if you have log_lock_waits = true and
deadlock_timeout = 0.  Do we care?

regards, tom lane



[PATCHES] postgresql-icu patch status

2007-06-20 Thread Ireneusz Pluta
Hello,

there was a postgresql-icu patch for proper collation of the UTF8 character
set, published here:
http://people.freebsd.org/~girgen/postgresql-icu/. I have been using
it with 8.1 and it worked fine for me. Now I am migrating to 8.2 but,
sadly, there is no patch available for this version.

Does anybody know what happened to the development of this? Is no
committer interested, or have things become more complicated in the
area affected by this patch?

I am not a PostgreSQL developer, but having some minor C and CVS
experience I might try to make a patch based on the last one available,
for 8.1.4, and merge the changes into the latest release in a more or
less mechanical way. But before I even try, it would be nice to hear
something from here - were there major changes between 8.1.4 and 8.2.4
in the source code affected by the ICU patch, or any changes in other
areas that might affect the patch at all?

Thanks

Ireneusz Pluta




Re: [PATCHES] [HACKERS] 'Waiting on lock'

2007-06-20 Thread Gregory Stark
"Tom Lane" <[EMAIL PROTECTED]> writes:

> Gregory Stark <[EMAIL PROTECTED]> writes:
>> "Tom Lane" <[EMAIL PROTECTED]> writes:
>>> How you figure that?
>
>> Well, I'm not clear on exactly what's going on with the semaphores here. If it's
>> possible for us to be printing the messages only as a result of another backend
>> unlocking the semaphore then making the PGSemaphoreUnlock conditional on
>> log_lock_waits means you can't enable log_lock_waits after startup and get
>> deterministic behaviour because whether you get messages will depend on which
>> other backend happens to wake you up.
>
> I don't see how you arrive at that conclusion.  The message is printed
> by the backend that is waiting for (or just obtained) a lock, dependent
> on its own local setting of log_lock_waits, and not dependent on who
> woke it up.

But in your version of the patch you're not calling PGSemaphoreUnlock() unless
log_lock_waits is set in the process doing the waking. 

Hm, I suppose it'll wake itself up when its own deadlock timer runs out
anyway. So I guess the worst case is that it doesn't say anything after a
soft deadlock fixup.

> BTW, I just noticed that GUC allows deadlock_timeout to be set all the
> way down to zero.  This seems bad --- surely the minimum value should at
> least be positive?  As CVS HEAD stands, you're likely to get a lot of
> spurious/useless log messages if you have log_lock_waits = true and
> deadlock_timeout = 0.  Do we care?

Does that actually work? I would expect setitimer to turn off the alarm in
that case.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




Re: [PATCHES] [HACKERS] 'Waiting on lock'

2007-06-20 Thread Tom Lane
Gregory Stark <[EMAIL PROTECTED]> writes:
> "Tom Lane" <[EMAIL PROTECTED]> writes:
>> I don't see how you arrive at that conclusion.  The message is printed
>> by the backend that is waiting for (or just obtained) a lock, dependent
>> on its own local setting of log_lock_waits, and not dependent on who
>> woke it up.

> But in your version of the patch you're not calling PGSemaphoreUnlock() unless
> log_lock_waits is set in the process doing the waking. 

Which is always the same process:
PGSemaphoreUnlock(&MyProc->sem);

>> BTW, I just noticed that GUC allows deadlock_timeout to be set all the
>> way down to zero.  This seems bad --- surely the minimum value should at
>> least be positive?  As CVS HEAD stands, you're likely to get a lot of
>> spurious/useless log messages if you have log_lock_waits = true and
>> deadlock_timeout = 0.  Do we care?

> Does that actually work? I would expect setitimer to turn off the alarm in
> that case.

Good point, which renders it definitely broken.  I propose we just tweak
GUC to set a minimum deadlock_timeout of 1 msec.

regards, tom lane



Re: [PATCHES] more autovacuum fixes

2007-06-20 Thread Alvaro Herrera
Alvaro Herrera wrote:
> Alvaro Herrera wrote:
> 
> > Turns out that this problem is not serious at all, because if that
> > palloc() fails, the whole postmaster will exit with a FATAL out of
> > memory message.
> > 
> > The problems in the worker code after fork are still an issue though.
> 
> It _is_ still an issue -- and a very serious issue at that.  If a worker
> fails before getting its entry from the startingWorker pointer, then
> (patched) autovacuum will no longer run any task.

I figured that I could keep the old check there for when the worker
failed, but still add the new signalling mechanism so that a fork()
failure (which I would think is the most likely of all) is taken care of
in a more timely manner.

I've also improved the rest of the code and comments a bit, and the new
code does seem better now.  So I'll go ahead and commit it later today.

-- 
Alvaro Herrera http://www.flickr.com/photos/alvherre/
"Uno puede defenderse de los ataques; contra los elogios se esta indefenso"
Index: src/backend/postmaster/autovacuum.c
===
RCS file: /home/alvherre/Code/cvs/pgsql/src/backend/postmaster/autovacuum.c,v
retrieving revision 1.50
diff -c -p -r1.50 autovacuum.c
*** src/backend/postmaster/autovacuum.c	13 Jun 2007 21:24:55 -	1.50
--- src/backend/postmaster/autovacuum.c	20 Jun 2007 17:28:22 -
***
*** 4,9 
--- 4,45 
   *
   * PostgreSQL Integrated Autovacuum Daemon
   *
+  * The autovacuum system is structured in two different kinds of processes: the
+  * autovacuum launcher and the autovacuum worker.  The launcher is an
+  * always-running process, started by the postmaster when the autovacuum GUC
+  * parameter is set.  The launcher schedules autovacuum workers to be started
+  * when appropriate.  The workers are the processes which execute the actual
+  * vacuuming; they connect to a database as determined in the launcher, and
+  * once connected they examine the catalogs to select the tables to vacuum.
+  *
+  * The autovacuum launcher cannot start the worker processes by itself,
+  * because doing so would cause robustness issues (namely, failure to shut
+  * them down on exceptional conditions; and apart from that, since the launcher
+  * is connected to shared memory and is thus subject to corruption there, it is
+  * not as robust as the postmaster which is not).  So it leaves that task to
+  * the postmaster.
+  *
+  * There is an autovacuum shared memory area, where the launcher stores
+  * information about the database it wants vacuumed.  When it wants a new
+  * worker to start, it sets a flag in shared memory and sends a special signal
+  * to the postmaster.  Then postmaster knows nothing more than it must start a
+  * worker; so it forks a new child, which turns into a worker and examines
+  * shared memory to know what database it must connect to.
+  *
+  * If the fork() call fails, the postmaster sets a flag in the shared memory
+  * area, and it sends a signal to the launcher.  The launcher, upon noticing
+  * the flag, can try starting the worker again.  Note that the failure can only
+  * be transient (fork failure due to high load, memory pressure, too many
+  * processes, etc); more permanent problems, like failure to connect to a
+  * database, are detected later in the worker and dealt with in a different
+  * way.
+  *
+  * When the worker is done vacuuming it sends SIGUSR1 to the launcher.  The
+  * launcher then wakes up and is able to launch another worker, if the schedule
+  * is too tight.  Otherwise it will go back to sleeping and awake normally
+  * according to schedule.  At this time the launcher can also balance the
+  * settings for the various remaining workers' cost-based vacuum delay feature.
+  *
   *
   * Portions Copyright (c) 1996-2007, PostgreSQL Global Development Group
   * Portions Copyright (c) 1994, Regents of the University of California
*** typedef struct WorkerInfoData
*** 158,185 
  
  typedef struct WorkerInfoData *WorkerInfo;
  
  /*-
   * The main autovacuum shmem struct.  On shared memory we store this main
   * struct and the array of WorkerInfo structs.  This struct keeps:
   *
   * av_launcherpid	the PID of the autovacuum launcher
   * av_freeWorkers	the WorkerInfo freelist
   * av_runningWorkers the WorkerInfo non-free queue
   * av_startingWorker pointer to WorkerInfo currently being started (cleared by
   *	the worker itself as soon as it's up and running)
-  * av_rebalance		true when a worker determines that cost limits must be
-  * 	rebalanced
   *
!  * This struct is protected by AutovacuumLock.
   *-
   */
  typedef struct
  {
  	pid_t			av_launcherpid;
  	SHMEM_OFFSET	av_freeWorkers;
  	SHM_QUEUE		av_runningWorkers;
  	SHMEM_OFFSET	av_startingWorker;
- 	bool			av_rebalance;
  } AutoVacuumShmemStruct;
  
  static AutoVacuumShmemStruct *AutoVacuumShmem;
--- 194,233 
  
  

Re: [PATCHES] [HACKERS] 'Waiting on lock'

2007-06-20 Thread Gregory Stark

"Tom Lane" <[EMAIL PROTECTED]> writes:

> Which is always the same process:
>   PGSemaphoreUnlock(&MyProc->sem);

Aaah! I just grokked what's going on with the semaphores here. It all makes a
lot more sense now. Nevermind.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


[PATCHES] psql: flush output in cursor-fetch mode

2007-06-20 Thread Neil Conway
psql's "FETCH_COUNT" feature is useful for incrementally displaying the
results of a long-running query. However, psql fails to flush its output
stream as new rows from the partial result set are produced, which means
that partial query results may not be visible to the client until the
stdio buffer is eventually flushed or the query produces its complete
result set.

Example:

$ cat ~/test.sql
-- a contrived function to get a query that slowly produces
-- more rows
create function slow_func() returns boolean as
$$
begin
perform pg_sleep(2);
return true;
end;
$$ language plpgsql;

neilc=# \i ~/test.sql 
CREATE FUNCTION
neilc=# create table t1 (a int, b int);
CREATE TABLE
neilc=# insert into t1 values (5, 10), (10, 15), (20, 25), (30, 35);
INSERT 0 4
neilc=# \set FETCH_COUNT 1
neilc=# select * from t1 where slow_func() is true;

With CVS HEAD, no output is visible until the complete result set has
been produced, at which point all 4 rows are printed. This is
undesirable: since the client has gone to the trouble of FETCH'ing the
rows one-at-a-time, it should display the partial result set before
issuing another FETCH.

Attached is a patch that fixes this, by calling fflush() on the psql
output stream after each call to printQuery() in ExecQueryUsingCursor().

-Neil

Index: src/bin/psql/common.c
===
RCS file: /home/neilc/postgres/cvs_root/pgsql/src/bin/psql/common.c,v
retrieving revision 1.134
diff -p -c -r1.134 common.c
*** src/bin/psql/common.c	16 Apr 2007 20:15:38 -	1.134
--- src/bin/psql/common.c	20 Jun 2007 22:34:04 -
*** ExecQueryUsingCursor(const char *query, 
*** 1076,1081 
--- 1076,1087 
  
  		printQuery(results, &my_popt, pset.queryFout, pset.logfile);
  
+ 		/*
+ 		 * Make sure to flush the output stream, so intermediate
+ 		 * results are visible to the client immediately.
+ 		 */
+ 		fflush(pset.queryFout);
+ 
  		/* after the first result set, disallow header decoration */
  		my_popt.topt.start_table = false;
  		my_popt.topt.prior_records += ntuples;

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PATCHES] Load Distributed Checkpoints, take 3

2007-06-20 Thread Tom Lane
Heikki Linnakangas <[EMAIL PROTECTED]> writes:
> I added a spinlock to protect the signaling fields between bgwriter and 
> backends. The current non-locking approach gets really difficult as the 
> patch adds two new flags, and both are more important than the existing 
> ckpt_time_warn flag.

That may be, but you could minimize the cost and notational ugliness by
not using the spinlock where you don't have to.  Put the sig_atomic_t
fields back the way they were, and many of the uses of the spinlock go
away.  All you really need it for is the places where the additional
flags are set or read.

> In fact, I think there's a small race condition in CVS HEAD:

Yeah, probably --- the original no-locking design didn't have any side
flags.  The reason you need the lock is for a backend to be sure that
a newly-started checkpoint is using its requested flags.  But the
detection of checkpoint termination is still the same.

Some other comments:

I tend to agree with whoever said upthread that the combination of GUC
variables proposed here is confusing and ugly.  It'd make more sense to
have min and max checkpoint rates in KB/s, with the max checkpoint rate
only honored when we are predicting we'll finish before the next
checkpoint time.

The flow of control and responsibility between xlog.c, bgwriter.c and
bufmgr.c seems to have reached the breaking point of unintelligibility.
Can you think of a refactoring that would simplify it?  We might have
to resign ourselves to this much messiness, but I'd rather not...

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PATCHES] Load Distributed Checkpoints, take 3

2007-06-20 Thread ITAGAKI Takahiro
Heikki Linnakangas <[EMAIL PROTECTED]> wrote:

> Here's an updated WIP patch for load distributed checkpoints.
> Since last patch, I did some clean up and refactoring, and added a bunch 
> of comments, and user documentation.

The only thing I don't understand is the naming of 'checkpoint_smoothing'.
Can users guess that the unit of 'smoothing' is a fraction?

You explain the parameter with the word 'fraction'.
Why don't you simply name it 'checkpoint_fraction'?
| Specifies the target length of checkpoints, as a fraction of
| the checkpoint interval. The default is 0.3.

Sorry if I'm missing discussions about the naming.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center



---(end of broadcast)---
TIP 6: explain analyze is your friend