Re: [HACKERS] Vote needed: revert beta2 changes or not?

2005-10-06 Thread Jonah H. Harris
Just my two cents... but I prefer option 1.

2005/10/6, Alvaro Herrera <[EMAIL PROTECTED]>:
> On Thu, Oct 06, 2005 at 10:57:33PM -0300, Marc G. Fournier wrote:
> > On Thu, 6 Oct 2005, [EMAIL PROTECTED] wrote:
> >
> > >I don't get a vote - but I do want to suggest, as a user, that I get
> > >generally annoyed with the presence of interfaces with names that
> > >were chosen for historical reasons, but are maintained only for
> > >compatibility, and either never did, or no longer apply.
> > >
> > >I'd rather you left it fixed. Returning it to the old name, for the
> > >sake of process, and no other good reason, doesn't appeal to me.
>
> It's not just for the sake of process.  It's because the pgAdmin guys,
> who were the ones which invented the API and the users of it, are
> already using it with this interface.  Changing it means they take the
> compatibility hit.  However, I question how hard the compatibility hit
> is -- for the return type, isn't it a matter of testing two possible
> values instead of one?  The naming case is harder, but how much?
>
> My vote is to not change them again.
>
> > >It is
> > >a lesson learned. We move on. Enforce the process next time. Self
> > >inflicted punishment is somewhat masochistic. :-)
> >
> > If we don't enforce the process this time, why would we enforce it next
> > time?
>
> Because we will know better.
>
> --
> Alvaro Herrera Architect, http://www.EnterpriseDB.com
> "La fuerza no está en los medios físicos
> sino que reside en una voluntad indomable" (Gandhi)
>
> ---(end of broadcast)---
> TIP 3: Have you checked our extensive FAQ?
>
>http://www.postgresql.org/docs/faq
>


--
Respectfully,

Jonah H. Harris, Database Internals Architect
EnterpriseDB Corporation
http://www.enterprisedb.com/

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] slower merge join on sorted data chosen over

2005-10-06 Thread Kevin Grittner
I am absolutely sure that I never changed any of the enable_xxx
settings other than enable_mergejoin during these tests.  The only
other setting I played with was random_page_cost, but the nested
loop query always chose the plain index scan in my tests.
 
There was one odd thing.  I started testing on one machine, then
we dumped it and loaded it onto a new machine, to try a slightly
different configuration for comparison.  I re-ran the tests and
noticed that on both boxes, with no enable_xxx options turned off,
when I reduced the random_page_cost to 1.2 the merge case went
to a plain index scan rather than the bitmap scan.  What was odd was
that on the original box, the bitmap scan version was consistently
faster, while on the new box it was slightly slower.  I repeated the
runs a few times because I found that surprising, but it seemed to
hold.  That seems more likely to be related to the density of the
index pages following the restore than to the difference in
hardware or OS.  It wasn't a very dramatic difference either way.

I notice that on the fully cached runs, the actual time for the plain
index was (barely) faster than the bitmap heap scan:
 
   ->  Bitmap Heap Scan on "DbTranRepository" dtr  
(cost=297.07..47081.47 rows=25067 width=17) (actual time=30.432..427.463 
rows=39690 loops=1)
 
 ->  Index Scan using "DbTranRepository_timestamp" on 
"DbTranRepository" dtr  (cost=0.00..49419.45 rows=25067 width=17) (actual 
time=0.083..412.268 rows=39690 loops=1)
 
However, it doesn't seem to be predicting that result when you look
at the cost numbers, so it is a mystery.  Is there anything I can set to
tell it to spit out the cost info for every path it considers, so we can
see why it is making this choice?  It doesn't bias toward a lower
starting cost when it is going to use a nested index scan, does it?
(OK, "starting cost" is probably not the right argot -- I'm talking about
the "0.00" in "cost=0.00..49419.45".) 
 
As for expectations -- an experienced programmer who knew the
schema, the data, and had just written the queries to retrieve data,
was testing the application with various selection criteria, and
saying something like "This one should be sub-second. Check.
This one should take a few seconds.  Check.  Here's a big one, this'll
probably take two or three minutes.  Dang -- it took seven and a half
minutes.  Could you look at this and see why it's so slow?  It seems
like it should be able to use this index and get the job done faster."
Some might call this "gut feel"; others might describe it as putting
forward a series of hypotheses, testing each, and investigating
unexpected results.;-)
 
The good news is that we don't think this sort of query will be
chosen by the users very often -- a few times a month, perhaps,
among the roughly 200 management-level people who will have
access to this application.  An extra five minutes per query isn't a
deal breaker, although it would obviously be nice if it ran faster.
 
I'll review the archives for prior discussions of the problem.
 
-Kevin
 
 
>>> Tom Lane <[EMAIL PROTECTED]> 10/06/05 9:28 PM >>>
"Kevin Grittner" <[EMAIL PROTECTED]> writes:
> In both the 8.1beta2 and using a build from this morning's
> dev snapshot, this query ran slower than expected:

There's a known issue that the planner tends to overestimate the cost of
inner-index-scan nestloops, because it doesn't allow for the strong
caching effects associated with repeated scans of the same index (in
particular, that all the upper index levels tend to stay in cache).
See the archives for past inconclusive discussions about how to fix
this.

However, something else caught my eye:

>->  Bitmap Heap Scan on "DbTranRepository" dtr  
> (cost=297.07..47081.47 rows=25067 width=17) (actual time=69.056..5560.895 
> rows=39690 loops=1)

>  ->  Index Scan using "DbTranRepository_timestamp" on 
> "DbTranRepository" dtr  (cost=0.00..49419.45 rows=25067 width=17) (actual 
> time=33.625..11510.723 rows=39690 loops=1)

I don't understand why the second case chose a plain index scan when
there was no need for sorted output; the bitmap scan is faster both
per cost estimate and in reality.  Are you sure you turned off only
enable_mergejoin and not enable_bitmapscan?

Also, when you say "slower than expected", what is setting your
expectation?

regards, tom lane


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


[HACKERS] How to delete Large Object from Database?

2005-10-06 Thread Premsun Choltanwanich


Dear All,
 
I use '$libdir/lo' for manage my PostgreSQL Large Object. It work fine for me to get and put Large Object  from and to database. However I found something that may not correct when I try to backup my data. It seem that I cannot delete Large Object from database. It seem the thing I can do is only delete the reference oid from table but Object still in database.
 
A Detail shown below is  reason I think that I cannot delete Large Object out of database.
 
14MB of file size when BackUp Data (BLOB Included)

900KB of file size when BackUp Data (BLOB Excluded)

13MB of file size when BackUp Data after delete all data from table that has 'lo' column. (BLOB Included) 
 
So,How to delete Large Object from Database? or Is my understanding wrong?


Re: [HACKERS] Announcing Veil

2005-10-06 Thread Neil Conway
On Thu, 2005-06-10 at 23:56 -0400, Bruce Momjian wrote:
> True, but are people going to recompile PostgreSQL to use this feature?
> Seems they would have to.

They would need to recompile PostgreSQL to use more than the default
number of user-defined LWLocks, which seems perfectly reasonable to me.

-Neil



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Announcing Veil

2005-10-06 Thread Bruce Momjian
Tom Lane wrote:
> Bruce Momjian  writes:
> > Tom Lane wrote:
> >> With only one known request for a user-allocated lock, it's hard to
> >> justify the overhead of a GUC variable.
> 
> > True, but are people going to recompile PostgreSQL to use this feature?
> > Seems they would have to.
> 
> How you figure that?  The proposed default value was 4, which seems
> fine to me, given that the known worldwide demand amounts to 1.

Oh, so you are going to give him a few slots.  I thought we were going
to default to 0 and he was going to have to bump it up to use his
software.  That sounds fine to me.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Announcing Veil

2005-10-06 Thread Tom Lane
Bruce Momjian  writes:
> Tom Lane wrote:
>> With only one known request for a user-allocated lock, it's hard to
>> justify the overhead of a GUC variable.

> True, but are people going to recompile PostgreSQL to use this feature?
> Seems they would have to.

How you figure that?  The proposed default value was 4, which seems
fine to me, given that the known worldwide demand amounts to 1.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Announcing Veil

2005-10-06 Thread Bruce Momjian
Tom Lane wrote:
> Bruce Momjian  writes:
> > Tom Lane wrote:
> >> I'd be willing to add the proposed patch in 8.1 (style note:
> >> NUM_USER_DEFINED_LWLOCKS should be set in pg_config_manual.h not
> >> lwlock.h).
> 
> > Shouldn't it be something we can put in postgresql.conf?
> 
> No more than any of the other entries in pg_config_manual.h.
> With only one known request for a user-allocated lock, it's hard to
> justify the overhead of a GUC variable.

True, but are people going to recompile PostgreSQL to use this feature?
Seems they would have to.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Announcing Veil

2005-10-06 Thread Tom Lane
Bruce Momjian  writes:
> Tom Lane wrote:
>> I'd be willing to add the proposed patch in 8.1 (style note:
>> NUM_USER_DEFINED_LWLOCKS should be set in pg_config_manual.h not
>> lwlock.h).

> Shouldn't it be something we can put in postgresql.conf?

No more than any of the other entries in pg_config_manual.h.
With only one known request for a user-allocated lock, it's hard to
justify the overhead of a GUC variable.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Announcing Veil

2005-10-06 Thread Bruce Momjian
Tom Lane wrote:
> Bruce Momjian  writes:
> > I don't see NUM_USER_DEFINED_LWLOCKS defined in 8.0 or 8.1, so what
> > system do you propose to allow you to set this value?
> 
> I'd be willing to add the proposed patch in 8.1 (style note:
> NUM_USER_DEFINED_LWLOCKS should be set in pg_config_manual.h not
> lwlock.h).  However, it's certainly not going to magically appear in
> existing releases, so I dunno if that'd make Marc happy or not.

Shouldn't it be something we can put in postgresql.conf?

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Announcing Veil

2005-10-06 Thread Tom Lane
Bruce Momjian  writes:
> I don't see NUM_USER_DEFINED_LWLOCKS defined in 8.0 or 8.1, so what
> system do you propose to allow you to set this value?

I'd be willing to add the proposed patch in 8.1 (style note:
NUM_USER_DEFINED_LWLOCKS should be set in pg_config_manual.h not
lwlock.h).  However, it's certainly not going to magically appear in
existing releases, so I dunno if that'd make Marc happy or not.

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Announcing Veil

2005-10-06 Thread Bruce Momjian

I don't see NUM_USER_DEFINED_LWLOCKS defined in 8.0 or 8.1, so what
system do you propose to allow you to set this value?

---

Marc Munro wrote:
-- Start of PGP signed section.
> Tom,
> Thanks for your reponse.  Unless I am missing your point, to add more
> locks we require a minor code change to the postgres server.  I am happy
> to submit a patch but this will not help Veil work with existing
> versions of Postgres.  I am aiming for compatibility with 7.4 onward.
> Your views on this would be appreciated.
> 
> Assuming that simply allocating a few extra LWLocks for user-defined
> functions is acceptable, here are some patches:
> 
> --cut---
> *** ./src/backend/storage/lmgr/lwlock.c Sat Aug 20 16:26:24 2005
> --- lwlock.cWed Oct  5 08:20:31 2005
> ***
> *** 120,126 
>  */
> numLocks += 2 * NUM_SLRU_BUFFERS;
> 
> !   /* Perhaps create a few more for use by user-defined modules? */
> 
> return numLocks;
>   }
> --- 120,127 
>  */
> numLocks += 2 * NUM_SLRU_BUFFERS;
> 
> !   /* Create a few more for use by user-defined modules. */
> !   numLocks += NUM_USER_DEFINED_LWLOCKS;
> 
> return numLocks;
>   }
> --cut---
> *** ./src/include/storage/lwlock.h  Sat Aug 20 16:26:34 2005
> --- lwlock.hWed Oct  5 08:22:26 2005
> ***
> *** 53,58 
> --- 53,62 
> MaxDynamicLWLock = 10
>   } LWLockId;
>  
> + /*
> +  * Allocate a few LWLocks for user-defined functions.
> +  */
> + #define NUM_USER_DEFINED_LWLOCKS 4
> 
>   typedef enum LWLockMode
>   {
> --cut---
> 
> 
> __
> Marc Munro
> 
> On Tue, 2005-10-04 at 22:51 -0400, Tom Lane wrote:
> > Marc Munro <[EMAIL PROTECTED]> writes:
> > > Since I was unable to dynamically assign a LWLock using
> > > LWLockAssign (none available), I have fairly arbitrarily overloaded the
> > > use of existing LWLocks.  When the flames die down perhaps we can
> > > discuss making a small number (one would be enough for me) of LWLocks
> > > available.
> > 
> > Perhaps you missed the comment in NumLWLocks()?
> > 
> > regards, tom lane
-- End of PGP section, PGP failed!

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] prefix btree implementation

2005-10-06 Thread Qingqing Zhou


On Thu, 6 Oct 2005, Jim C. Nasby wrote:

> On Wed, Oct 05, 2005 at 03:40:43PM -0700, Qingqing Zhou wrote:
> > We do the prefix sharing when we build up index only, never on the fly.
>
> So are you saying that inserts of new data wouldn't make any use of
> this? ISTM that greatly reduces the usefulness, though I'm not objecting
> because compression during build is probably better than none at all. Is
> there a technical reason compression can't be used during normal
> operations?
>

Yes, there are. Think if we do it we when build up index, we can choose
the shared prefix optimally w.r.t. maximizing the number of index items on
the page. But on the fly, if we do so, I am afraid this will (1) kill the
performance; (2) introduce more complexities. I don't exclude the
possibility of doing prefix sharing on the fly, but for current stage, I
would like to first come up with a proof-of-concept patch not including
this part.

Regards,
Qingqing

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] prefix btree implementation

2005-10-06 Thread Bruce Momjian
Jim C. Nasby wrote:
> On Wed, Oct 05, 2005 at 03:40:43PM -0700, Qingqing Zhou wrote:
> > We do the prefix sharing when we build up index only, never on the fly.
> 
> So are you saying that inserts of new data wouldn't make any use of
> this? ISTM that greatly reduces the usefulness, though I'm not objecting
> because compression during build is probably better than none at all. Is
> there a technical reason compression can't be used during normal
> operations?

Added to TODO:

* Consider compressing indexes by storing key prefix values shared by
  several rows as a single index entry

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Vote needed: revert beta2 changes or not?

2005-10-06 Thread Alvaro Herrera
On Thu, Oct 06, 2005 at 10:57:33PM -0300, Marc G. Fournier wrote:
> On Thu, 6 Oct 2005, [EMAIL PROTECTED] wrote:
> 
> >I don't get a vote - but I do want to suggest, as a user, that I get
> >generally annoyed with the presence of interfaces with names that
> >were chosen for historical reasons, but are maintained only for
> >compatibility, and either never did, or no longer apply.
> >
> >I'd rather you left it fixed. Returning it to the old name, for the
> >sake of process, and no other good reason, doesn't appeal to me.

It's not just for the sake of process.  It's because the pgAdmin guys,
who were the ones which invented the API and the users of it, are
already using it with this interface.  Changing it means they take the
compatibility hit.  However, I question how hard the compatibility hit
is -- for the return type, isn't it a matter of testing two possible
values instead of one?  The naming case is harder, but how much?

My vote is to not change them again.

> >It is
> >a lesson learned. We move on. Enforce the process next time. Self
> >inflicted punishment is somewhat masochistic. :-)
> 
> If we don't enforce the process this time, why would we enforce it next 
> time?

Because we will know better.

-- 
Alvaro Herrera Architect, http://www.EnterpriseDB.com
"La fuerza no está en los medios físicos
sino que reside en una voluntad indomable" (Gandhi)

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] slower merge join on sorted data chosen over nested loop

2005-10-06 Thread Tom Lane
"Kevin Grittner" <[EMAIL PROTECTED]> writes:
> In both the 8.1beta2 and using a build from this morning's
> dev snapshot, this query ran slower than expected:

There's a known issue that the planner tends to overestimate the cost of
inner-index-scan nestloops, because it doesn't allow for the strong
caching effects associated with repeated scans of the same index (in
particular, that all the upper index levels tend to stay in cache).
See the archives for past inconclusive discussions about how to fix
this.

However, something else caught my eye:

>->  Bitmap Heap Scan on "DbTranRepository" dtr  
> (cost=297.07..47081.47 rows=25067 width=17) (actual time=69.056..5560.895 
> rows=39690 loops=1)

>  ->  Index Scan using "DbTranRepository_timestamp" on 
> "DbTranRepository" dtr  (cost=0.00..49419.45 rows=25067 width=17) (actual 
> time=33.625..11510.723 rows=39690 loops=1)

I don't understand why the second case chose a plain index scan when
there was no need for sorted output; the bitmap scan is faster both
per cost estimate and in reality.  Are you sure you turned off only
enable_mergejoin and not enable_bitmapscan?

Also, when you say "slower than expected", what is setting your
expectation?

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Vote needed: revert beta2 changes or not?

2005-10-06 Thread Joshua D. Drake




1.  Leave it as-is.



+1 From here..

Joshua D. Drake

--
Your PostgreSQL solutions company - Command Prompt, Inc. 1.800.492.2240
PostgreSQL Replication, Consulting, Custom Programming, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: plPHP, plPerlNG - http://www.commandprompt.com/


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] Vote needed: revert beta2 changes or not?

2005-10-06 Thread Bruce Momjian
[EMAIL PROTECTED] wrote:
> I don't get a vote - but I do want to suggest, as a user, that I get
> generally annoyed with the presence of interfaces with names that
> were chosen for historical reasons, but are maintained only for
> compatibility, and either never did, or no longer apply.
> 
> I'd rather you left it fixed. Returning it to the old name, for the
> sake of process, and no other good reason, doesn't appeal to me. It is
> a lesson learned. We move on. Enforce the process next time. Self
> inflicted punishment is somewhat masochistic. :-)

I agree with this sentiment.  It is not like this happens regularly and
we need to punish someone.  Mistakes happen in process, but it is
usually not intentional, meaning fear of punishment isn't effective, or
even desirable.

If it happened regularly by a single individual, that would be a
different story.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Vote needed: revert beta2 changes or not?

2005-10-06 Thread David Fetter
On Thu, Oct 06, 2005 at 09:27:55PM -0400, Tom Lane wrote:
> Just before 8.1beta2 went out, Neil made the following changes:
> 
>   Rename pg_complete_relation_size() to
>   pg_total_relation_size(), for the sake of brevity and clarity.
>   
>   Make pg_reload_conf(), pg_rotate_logfile(), and pg_cancel_backend()
>   return a boolean rather than an integer to indicate success or
>   failure.
> 
> (BTW, this is by no means solely Neil's fault, because both Bruce and I
> encouraged him to proceed.)
> 
> Several people have opined that we ought to revert one or both of these
> changes.  The arguments in favor of reversion are basically
> 
> (a) we failed to follow normal development process.  The names and
> APIs of these functions were already hashed out in long discussions
> months ago, so second-guessing them with relatively little discussion
> is at best impolite.
> 
> (b) pg_cancel_backend() was already in 8.0, and so changing it now
> represents an API break, for which being "a little cleaner" is not
> sufficient justification.
> 
> As against that, changing them back now might just confuse matters even
> more.  And I tend to agree with Neil's judgment that the new definitions
> are cleaner in themselves.
> 
> We need to make a decision before releasing beta3.  We've already forced
> an initdb for beta3, so we can change "for free" now, but it's entirely
> possible that there will be no additional opportunity before 8.1 final.
> 
> Some private discussion among core didn't result in any clear consensus,
> so it seems the best thing to do is throw the matter out for a vote on
> pgsql-hackers.
> 
> The plausible alternatives seem to be:
> 
> 1.  Leave it as-is.

+1, for what it's worth.

Cheers,
D
-- 
David Fetter [EMAIL PROTECTED] http://fetter.org/
phone: +1 510 893 6100   mobile: +1 415 235 3778

Remember to vote!

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Vote needed: revert beta2 changes or not?

2005-10-06 Thread Greg Sabino Mullane

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

  
> The plausible alternatives seem to be:
>
> 1.  Leave it as-is.

I vote for this. It's not an ideal situation, but the names should
be changed at some point - better now than later, as it reduces the
lifetime of the "bad" names. Put a large warning (and a small apology)
in the release notes.

- --
Greg Sabino Mullane [EMAIL PROTECTED]
PGP Key: 0x14964AC8 200510062202
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8

-BEGIN PGP SIGNATURE-

iD8DBQFDRdenvJuQZxSWSsgRAniyAJ9hjJBYdGl1PttvZm1VrfR+vPnI1wCeMW/t
u8dv1J8fD4ayUUEFSkhPNrY=
=brzE
-END PGP SIGNATURE-



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Vote needed: revert beta2 changes or not?

2005-10-06 Thread Marc G. Fournier

On Thu, 6 Oct 2005, Tom Lane wrote:


Just before 8.1beta2 went out, Neil made the following changes:

Rename pg_complete_relation_size() to
pg_total_relation_size(), for the sake of brevity and clarity.

Make pg_reload_conf(), pg_rotate_logfile(), and pg_cancel_backend()
return a boolean rather than an integer to indicate success or
failure.

(BTW, this is by no means solely Neil's fault, because both Bruce and I
encouraged him to proceed.)

Several people have opined that we ought to revert one or both of these
changes.  The arguments in favor of reversion are basically

(a) we failed to follow normal development process.  The names and
APIs of these functions were already hashed out in long discussions
months ago, so second-guessing them with relatively little discussion
is at best impolite.

(b) pg_cancel_backend() was already in 8.0, and so changing it now
represents an API break, for which being "a little cleaner" is not
sufficient justification.

As against that, changing them back now might just confuse matters even
more.  And I tend to agree with Neil's judgment that the new definitions
are cleaner in themselves.

We need to make a decision before releasing beta3.  We've already forced
an initdb for beta3, so we can change "for free" now, but it's entirely
possible that there will be no additional opportunity before 8.1 final.

Some private discussion among core didn't result in any clear consensus,
so it seems the best thing to do is throw the matter out for a vote on
pgsql-hackers.

The plausible alternatives seem to be:

1.  Leave it as-is.

2.  Revert the result type of pg_cancel_backend() to int, but leave the
   rest as-is (minimum change to avoid a compatibility break with 8.0).

3.  Revert all three result-type changes, in the name of consistency.

4.  Revert all four changes, on the grounds that we shouldn't allow such
   a violation of process.


I vote for this one, else we are setting a precedent that this sort of 
thing during a beta freeze is acceptable, which it shouldn't be :(




Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email: [EMAIL PROTECTED]   Yahoo!: yscrappy  ICQ: 7615664

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Vote needed: revert beta2 changes or not?

2005-10-06 Thread Marc G. Fournier

On Thu, 6 Oct 2005, [EMAIL PROTECTED] wrote:


I don't get a vote - but I do want to suggest, as a user, that I get
generally annoyed with the presence of interfaces with names that
were chosen for historical reasons, but are maintained only for
compatibility, and either never did, or no longer apply.

I'd rather you left it fixed. Returning it to the old name, for the
sake of process, and no other good reason, doesn't appeal to me. It is
a lesson learned. We move on. Enforce the process next time. Self
inflicted punishment is somewhat masochistic. :-)


If we don't enforce the process this time, why would we enforce it next 
time?


You either always enforce it, or never ... you don't pick and choose 
though ...



Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email: [EMAIL PROTECTED]   Yahoo!: yscrappy  ICQ: 7615664

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Vote needed: revert beta2 changes or not?

2005-10-06 Thread Rod Taylor
On Thu, 2005-10-06 at 21:27 -0400, Tom Lane wrote:
> Just before 8.1beta2 went out, Neil made the following changes:
> 
>   Rename pg_complete_relation_size() to
>   pg_total_relation_size(), for the sake of brevity and clarity.
>   
>   Make pg_reload_conf(), pg_rotate_logfile(), and pg_cancel_backend()
>   return a boolean rather than an integer to indicate success or
>   failure.

> The plausible alternatives seem to be:
> 
> 1.  Leave it as-is.

+1

I prefer the changes and if the choice is do it now or do them in 8.2
(they are improvements), then I choose to take them now.

-- 


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Vote needed: revert beta2 changes or not?

2005-10-06 Thread mark
I don't get a vote - but I do want to suggest, as a user, that I get
generally annoyed with the presence of interfaces with names that
were chosen for historical reasons, but are maintained only for
compatibility, and either never did, or no longer apply.

I'd rather you left it fixed. Returning it to the old name, for the
sake of process, and no other good reason, doesn't appeal to me. It is
a lesson learned. We move on. Enforce the process next time. Self
inflicted punishment is somewhat masochistic. :-)

Cheers,
mark



On Thu, Oct 06, 2005 at 09:27:55PM -0400, Tom Lane wrote:
> Just before 8.1beta2 went out, Neil made the following changes:
> 
>   Rename pg_complete_relation_size() to
>   pg_total_relation_size(), for the sake of brevity and clarity.
>   
>   Make pg_reload_conf(), pg_rotate_logfile(), and pg_cancel_backend()
>   return a boolean rather than an integer to indicate success or
>   failure.
> 
> (BTW, this is by no means solely Neil's fault, because both Bruce and I
> encouraged him to proceed.)
> 
> Several people have opined that we ought to revert one or both of these
> changes.  The arguments in favor of reversion are basically
> 
> (a) we failed to follow normal development process.  The names and
> APIs of these functions were already hashed out in long discussions
> months ago, so second-guessing them with relatively little discussion
> is at best impolite.
> 
> (b) pg_cancel_backend() was already in 8.0, and so changing it now
> represents an API break, for which being "a little cleaner" is not
> sufficient justification.
> 
> As against that, changing them back now might just confuse matters even
> more.  And I tend to agree with Neil's judgment that the new definitions
> are cleaner in themselves.
> 
> We need to make a decision before releasing beta3.  We've already forced
> an initdb for beta3, so we can change "for free" now, but it's entirely
> possible that there will be no additional opportunity before 8.1 final.
> 
> Some private discussion among core didn't result in any clear consensus,
> so it seems the best thing to do is throw the matter out for a vote on
> pgsql-hackers.
> 
> The plausible alternatives seem to be:
> 
> 1.  Leave it as-is.
> 
> 2.  Revert the result type of pg_cancel_backend() to int, but leave the
> rest as-is (minimum change to avoid a compatibility break with 8.0).
> 
> 3.  Revert all three result-type changes, in the name of consistency.
> 
> 4.  Revert all four changes, on the grounds that we shouldn't allow such
> a violation of process.
> 
> Opinions please?
> 
>   regards, tom lane
> 
> ---(end of broadcast)---
> TIP 9: In versions below 8.0, the planner will ignore your desire to
>choose an index scan if your joining column's datatypes do not
>match
> 

-- 
[EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] 
__
.  .  _  ._  . .   .__.  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/|_ |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
   and in the darkness bind them...

   http://mark.mielke.cc/


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


[HACKERS] Vote needed: revert beta2 changes or not?

2005-10-06 Thread Tom Lane
Just before 8.1beta2 went out, Neil made the following changes:

Rename pg_complete_relation_size() to
pg_total_relation_size(), for the sake of brevity and clarity.

Make pg_reload_conf(), pg_rotate_logfile(), and pg_cancel_backend()
return a boolean rather than an integer to indicate success or
failure.

(BTW, this is by no means solely Neil's fault, because both Bruce and I
encouraged him to proceed.)

Several people have opined that we ought to revert one or both of these
changes.  The arguments in favor of reversion are basically

(a) we failed to follow normal development process.  The names and
APIs of these functions were already hashed out in long discussions
months ago, so second-guessing them with relatively little discussion
is at best impolite.

(b) pg_cancel_backend() was already in 8.0, and so changing it now
represents an API break, for which being "a little cleaner" is not
sufficient justification.

As against that, changing them back now might just confuse matters even
more.  And I tend to agree with Neil's judgment that the new definitions
are cleaner in themselves.

We need to make a decision before releasing beta3.  We've already forced
an initdb for beta3, so we can change "for free" now, but it's entirely
possible that there will be no additional opportunity before 8.1 final.

Some private discussion among core didn't result in any clear consensus,
so it seems the best thing to do is throw the matter out for a vote on
pgsql-hackers.

The plausible alternatives seem to be:

1.  Leave it as-is.

2.  Revert the result type of pg_cancel_backend() to int, but leave the
rest as-is (minimum change to avoid a compatibility break with 8.0).

3.  Revert all three result-type changes, in the name of consistency.

4.  Revert all four changes, on the grounds that we shouldn't allow such
a violation of process.

Opinions please?

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


[HACKERS] slower merge join on sorted data chosen over nested loop

2005-10-06 Thread Kevin Grittner
In both the 8.1beta2 and using a build from this morning's
dev snapshot, this query ran slower than expected:

select count(*)
  from "DbTranRepository" AS "dtr"
  inner join "DbTranLogRecord" AS "dtlr"
on ("dtlr"."countyNo"   = "dtr"."countyNo"
and "dtlr"."tranImageSeqNo" = "dtr"."tranImageSeqNo"
and "dtlr"."tableName"  = 'Cal'
   )
  where "dtr"."countyNo" = 40
and "dtr"."timestampValue" between '2005-09-30 00:00:00' and '2005-10-01 
00:00:00';

Fresh restore from dump.  ANALYZE run.  No other activity.
Autovacuum was disabled prior to the restore and test.
8GB dual Xeon SuSE 9.3 system hitting RAID 5.


With the OS cache flushed by dd:

 Aggregate  (cost=176218.64..176218.65 rows=1 width=0) (actual 
time=32081.623..32081.626 rows=1 loops=1)
   ->  Merge Join  (cost=175627.39..176218.09 rows=218 width=0) (actual 
time=31774.573..32071.776 rows=3007 loops=1)
 Merge Cond: ("outer"."?column3?" = "inner"."?column3?")
 ->  Sort  (cost=48913.06..48975.72 rows=25067 width=17) (actual 
time=5828.984..5954.706 rows=39690 loops=1)
   Sort Key: (dtr."tranImageSeqNo")::numeric
   ->  Bitmap Heap Scan on "DbTranRepository" dtr  
(cost=297.07..47081.47 rows=25067 width=17) (actual time=69.056..5560.895 
rows=39690 loops=1)
 Recheck Cond: ((("countyNo")::smallint = 40) AND 
(("timestampValue")::timestamp with time zone >= '2005-09-30 
00:00:00-05'::timestamp with time zone) AND (("timestampValue")::timestamp with 
time zone <= '2005-10-01 00:00:00-05'::timestamp with time zone))
 ->  Bitmap Index Scan on "DbTranRepository_timestamp"  
(cost=0.00..297.07 rows=25067 width=0) (actual time=52.675..52.675 rows=39690 
loops=1)
   Index Cond: ((("countyNo")::smallint = 40) AND 
(("timestampValue")::timestamp with time zone >= '2005-09-30 
00:00:00-05'::timestamp with time zone) AND (("timestampValue")::timestamp with 
time zone <= '2005-10-01 00:00:00-05'::timestamp with time zone))
 ->  Sort  (cost=126714.34..126945.94 rows=92640 width=17) (actual 
time=25130.457..25551.291 rows=109605 loops=1)
   Sort Key: (dtlr."tranImageSeqNo")::numeric
   ->  Bitmap Heap Scan on "DbTranLogRecord" dtlr  
(cost=1037.84..119071.84 rows=92640 width=17) (actual time=203.528..23588.122 
rows=114113 loops=1)
 Recheck Cond: ((40 = ("countyNo")::smallint) AND 
(("tableName")::text = 'Cal'::text))
 ->  Bitmap Index Scan on "DbTranLogRecord_tableName"  
(cost=0.00..1037.84 rows=92640 width=0) (actual time=146.412..146.412 
rows=114113 loops=1)
   Index Cond: ((40 = ("countyNo")::smallint) AND 
(("tableName")::text = 'Cal'::text))
 Total runtime: 32091.211 ms


With the OS cache populated by previous run of this plan:

 Aggregate  (cost=176218.64..176218.65 rows=1 width=0) (actual 
time=4510.590..4510.593 rows=1 loops=1)
   ->  Merge Join  (cost=175627.39..176218.09 rows=218 width=0) (actual 
time=4203.451..4500.785 rows=3007 loops=1)
 Merge Cond: ("outer"."?column3?" = "inner"."?column3?")
 ->  Sort  (cost=48913.06..48975.72 rows=25067 width=17) (actual 
time=686.388..812.160 rows=39690 loops=1)
   Sort Key: (dtr."tranImageSeqNo")::numeric
   ->  Bitmap Heap Scan on "DbTranRepository" dtr  
(cost=297.07..47081.47 rows=25067 width=17) (actual time=30.432..427.463 
rows=39690 loops=1)
 Recheck Cond: ((("countyNo")::smallint = 40) AND 
(("timestampValue")::timestamp with time zone >= '2005-09-30 
00:00:00-05'::timestamp with time zone) AND (("timestampValue")::timestamp with 
time zone <= '2005-10-01 00:00:00-05'::timestamp with time zone))
 ->  Bitmap Index Scan on "DbTranRepository_timestamp"  
(cost=0.00..297.07 rows=25067 width=0) (actual time=19.089..19.089 rows=39690 
loops=1)
   Index Cond: ((("countyNo")::smallint = 40) AND 
(("timestampValue")::timestamp with time zone >= '2005-09-30 
00:00:00-05'::timestamp with time zone) AND (("timestampValue")::timestamp with 
time zone <= '2005-10-01 00:00:00-05'::timestamp with time zone))
 ->  Sort  (cost=126714.34..126945.94 rows=92640 width=17) (actual 
time=2705.007..3124.966 rows=109605 loops=1)
   Sort Key: (dtlr."tranImageSeqNo")::numeric
   ->  Bitmap Heap Scan on "DbTranLogRecord" dtlr  
(cost=1037.84..119071.84 rows=92640 width=17) (actual time=96.046..1195.302 
rows=114113 loops=1)
 Recheck Cond: ((40 = ("countyNo")::smallint) AND 
(("tableName")::text = 'Cal'::text))
 ->  Bitmap Index Scan on "DbTranLogRecord_tableName"  
(cost=0.00..1037.84 rows=92640 width=0) (actual time=54.917..54.917 rows=114113 
loops=1)
   Index Cond: ((40 = ("countyNo")::smallint) AND 
(("tableName")::text = 'Cal'::text))
 Total runtime: 4520.130 ms


Without the option to use merge jo

Re: [HACKERS] COPY FROM with CSV header

2005-10-06 Thread Andrew Dunstan

Jim C. Nasby wrote:


Instead of ignoring the first line of a COPY FROM ... WITH CSV HEADER,
what about allowing the first line to be used as a list of field names?
This means you wouldn't have to include field order in the COPY command
if the names matched field names in the table.
 

It was debated ... could be dangerous, and you'd need extra syntax, 
since we couldn't make it the default.


cheers

andrew

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] prefix btree implementation

2005-10-06 Thread Jim C. Nasby
On Wed, Oct 05, 2005 at 03:40:43PM -0700, Qingqing Zhou wrote:
> We do the prefix sharing when we build up index only, never on the fly.

So are you saying that inserts of new data wouldn't make any use of
this? ISTM that greatly reduces the usefulness, though I'm not objecting
because compression during build is probably better than none at all. Is
there a technical reason compression can't be used during normal
operations?
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] [PERFORM] A Better External Sort?

2005-10-06 Thread Martijn van Oosterhout
On Thu, Oct 06, 2005 at 04:25:11PM -0400, Tom Lane wrote:
> Martijn van Oosterhout  writes:
> > Are we awfully worried about people still using 2.0 kernels? And it
> > would replace two calls with three in the worst case, we currently
> > lseek before every read.
> 
> That's utterly false.

Oops, you're right. I usually strace during a vacuum or a large query
and my screen fills up with:

lseek()
read()
lseek()
read()
...

So didn't wonder if the straight sequential read was optimised. Still,
I think pread() would be a worthwhile improvement, at least for Linux.

-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.


pgpu7YI0K0Ugf.pgp
Description: PGP signature


Re: [HACKERS] [PERFORM] A Better External Sort?

2005-10-06 Thread Martijn van Oosterhout
On Thu, Oct 06, 2005 at 03:57:38PM -0400, Tom Lane wrote:
> Martijn van Oosterhout  writes:
> > Indeed, one of the things on my list is to remove all the lseeks in
> > favour of pread. Halving the number of kernel calls has got to be worth
> > something right? Portability is an issue ofcourse...
> 
> Being sure that it's not a pessimization is another issue.  I note that
> glibc will emulate these functions if the kernel doesn't have them;
> which means you could be replacing one kernel call with three.

From the linux pread manpage:

HISTORY
   The pread and pwrite system calls were added to Linux in version
   2.1.60; the entries in the i386 system call table were added in
   2.1.69.  The libc support (including emulation on older kernels
   without the system calls) was added in glibc 2.1.

Are we awfully worried about people still using 2.0 kernels? And it
would replace two calls with three in the worst case, we currently
lseek before every read.

I don't know about other OSes.
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.


pgpxa4MjoQpmJ.pgp
Description: PGP signature


Re: [HACKERS] [PERFORM] A Better External Sort?

2005-10-06 Thread Alvaro Herrera
On Thu, Oct 06, 2005 at 03:57:38PM -0400, Tom Lane wrote:
> Martijn van Oosterhout  writes:
> > Indeed, one of the things on my list is to remove all the lseeks in
> > favour of pread. Halving the number of kernel calls has got to be worth
> > something right? Portability is an issue ofcourse...
> 
> Being sure that it's not a pessimization is another issue.  I note that
> glibc will emulate these functions if the kernel doesn't have them;
> which means you could be replacing one kernel call with three.
> 
> And I don't think autoconf has any way to determine whether a libc
> function represents a native kernel call or not ...

The problem kernels would be Linux 2.0, which I very much doubt is going
to be present in to-be-deployed database servers.

Unless someone runs glibc on top of some other kernel, I guess.  Is this
a common scenario?  I've never seen it.

-- 
Alvaro Herrera  http://www.amazon.com/gp/registry/DXLWNGRJD34
Oh, oh, las chicas galacianas, lo harán por las perlas,
¡Y las de Arrakis por el agua! Pero si buscas damas
Que se consuman como llamas, ¡Prueba una hija de Caladan! (Gurney Halleck)

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] [PERFORM] A Better External Sort?

2005-10-06 Thread Tom Lane
Martijn van Oosterhout  writes:
> Are we awfully worried about people still using 2.0 kernels? And it
> would replace two calls with three in the worst case, we currently
> lseek before every read.

That's utterly false.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] fixing LISTEN/NOTIFY

2005-10-06 Thread Greg Stark

Tom Lane <[EMAIL PROTECTED]> writes:

> I like your suggestion of "topic" for the notify name, and am tempted to
> go fix the documentation to use that term right now ...

Fwiw, I think the more conventional word here would be "channel". 
But whatever.

-- 
greg


---(end of broadcast)---
TIP 6: explain analyze is your friend


[HACKERS] Error messages

2005-10-06 Thread smile khmer
Dear all,

I made some change ot the function in btcompare.c and compiling and 
installation is success, but when I start to run with 

/usr/local/pgsql/bin/initdb -D /usr/local/pgsql/data

I've got this error message : creating template1 database in 
/usr/local/pgsql/data/base/1 ... FATAL:  duplicate key violates unique 
constraint "pg_type_typname_nsp_index"

What I've changed is : in function btnamecmp(), I've converted the arg0 and 
arg1 into lowercase and call strncmp.

regards,
Smile
 

-- 
___
Sign-up for Ads Free at Mail.com
http://promo.mail.com/adsfreejump.htm


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


[HACKERS] LDAP Authentication?

2005-10-06 Thread Magnus Hagander
People,


After writing dblink-ldap (http://pgfoundry.org/projects/dblink-ldap),
several people have contacted me asking if this will give LDAP
authentication to PostgreSQL, because they need this. And this is before
I've even released it, so apparantly there are a lot of people who want
this.

You can do this today using PAM authenication, but this is not always
possible. Notably it's never possible on Windows, and there are several
unix platforms/distros that don't support it without a lot of work.

It should be fairly easy to write a LDAP "backend" to password
authentication using openldap, winldap or whatever ldap library is
available.

Before I start working on anything I'd like to check if a patch for this
would be accepted?


//Magnus

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] [PERFORM] A Better External Sort?

2005-10-06 Thread Tom Lane
Martijn van Oosterhout  writes:
> Indeed, one of the things on my list is to remove all the lseeks in
> favour of pread. Halving the number of kernel calls has got to be worth
> something right? Portability is an issue ofcourse...

Being sure that it's not a pessimization is another issue.  I note that
glibc will emulate these functions if the kernel doesn't have them;
which means you could be replacing one kernel call with three.

And I don't think autoconf has any way to determine whether a libc
function represents a native kernel call or not ...

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] comments on prepared transactions ...

2005-10-06 Thread Tom Lane
=?ISO-8859-1?Q?Hans-J=FCrgen_Sch=F6nig?= <[EMAIL PROTECTED]> writes:
> in oracle it is possible to comment transactions:

> COMMIT COMMENT 'ORA-2PC-CRASH-TEST-n';

> if we added the possibility to comment prepared transactions it would be 
> far easier for DBAs to find out what to do with prepared transactions 
> once something has gone wrong (at least if an application adds some 
> useful data to the comment). usually when the DBA has to fix something 
> it has to be done FAST - some additional info would definitely help here ...

You can do that already.  All you have to do is make sure your prepared
transaction identifiers are somewhat human readable.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] comments on prepared transactions ...

2005-10-06 Thread Simon Riggs
On Thu, 2005-10-06 at 19:13 +0200, Hans-Jürgen Schönig wrote:
> i had to deal with oracle in the past couple of days (*mega sigh*)
> i have seen a very interesting feature which would make sense for 
> PostgreSQL users.
> 
> currently we have:
> 
> test=# \h PREPARE TRANSACTION
> Command: PREPARE TRANSACTION
> Description: prepare the current transaction for two-phase commit
> Syntax:
> PREPARE TRANSACTION transaction_id
> 
> in oracle it is possible to comment transactions:
> 
> COMMIT COMMENT 'ORA-2PC-CRASH-TEST-n';
> 
> if we added the possibility to comment prepared transactions it would be 
> far easier for DBAs to find out what to do with prepared transactions 
> once something has gone wrong (at least if an application adds some 
> useful data to the comment). usually when the DBA has to fix something 
> it has to be done FAST - some additional info would definitely help here ...
> 
> what do people think about this?

I like it. It allows fairly easy definition of which exact transaction
you recover to when using PITR.

For 2PC, the difficulty might be to match up the name used on one
resource manager with that used on another. So we should look at how
other resource managers implement that so a match is more easily
possible.

Best Regards, Simon Riggs


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] prefix btree implementation

2005-10-06 Thread Simon Riggs
On Thu, 2005-10-06 at 09:38 -0400, Tom Lane wrote:
> Simon Riggs <[EMAIL PROTECTED]> writes:
> > It might be worth teaching the optimiser that if it has an index on an
> > immutable function that if we have WHERE x = k and a functional index on
> > f(x) then we can access the functional index with 
> > f(x) = f(k), as long as we also reapply the original WHERE clause. 
> 
> As I just pointed out to Gaetano, this is utterly wrong.  We can't
> assume that much about the behavior of equality.

For any function, yes, because you can always construct a function that
violates that. But it seems straightforward to introduce another
declarative form for which it is true, similar to the way that IMMUTABLE
allows us to make other assumptions at parse time. That's only worth it
if you believe that many useful functions follow that rule; I would say
that they do.

Best Regards, Simon Riggs


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] [PERFORM] A Better External Sort?

2005-10-06 Thread Luke Lonergan
Andreas,

On 10/6/05 3:56 AM, "Zeugswetter Andreas DAZ SD" <[EMAIL PROTECTED]>
wrote:

> pg relys on the OS readahead (== larger block IO) to do efficient IO.
> Basically the pg scan performance should match a dd if=file of=/dev/null
> bs=8k,
> unless CPU bound.

Which it is.  Postgres will currently do a maximum of 120MB/s scan rate on
modern CPUs, no matter how fast the underlying I/O subsystem is.  Several
people on this list have confirmed that by posting 100MB/s numbers on faster
subsystems.  Others have posted that Oracle/DB2/etc run at near disk rates.
IMO, most of the problem is that there are too many small (tuple, single
block) operations in the I/O path and they need to be optimized out.

The first step is recognition that there is a problem worth solving, and it
seems that there are now many who have recognized the issue of poor I/O path
optimization in Postgres.  I think a prioritized short list might look like
this:

- Sort performance
  - The implementation of a possibly fine algorithm yields very poor use of
the underlying hardware.  It needs to be optimized to remove redundant I/O
operations and possibly (with different algorithm?) to make better use of
available RAM.  This would be practice for the next item on the list:

- Scan performance
  - Someone needs to trace the life of a tuple through the executor and
implement better buffering at each stage of the process.  There was a start
at this, but it only buffered in one node.  Yes, I know we have a shared
memory buffer, but it acts as a final destination, this is about the path
through the executor.
  - I suspect that after profiling (with system routines included), we will
find much more tuple-scale work that is interfering with the optimal flow of
tuples through the executor.  It's unlikely that the problem is isolated to
the COUNT() aggregate, but is present in many other nodes.  Redundant
allocation/free of memory on a per tuple basis should be replaced with more
persistent buffers on a page or larger basis.
  - Ultimately, after working through the above two issues, we will reach a
point of diminishing returns where async I/O is needed to be able to remove
producer/consumer problems in a single threaded executor. This can be
implemented using sliced processes or threading, or by using aio.  I believe
that even without async I/O, we should be seeing scan rates of 200-400 MB/s
on modern CPUs for the simple COUNT() aggregation pattern though.

FWIW,

- Luke 



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


[HACKERS] comments on prepared transactions ...

2005-10-06 Thread Hans-Jürgen Schönig

i had to deal with oracle in the past couple of days (*mega sigh*)
i have seen a very interesting feature which would make sense for 
PostgreSQL users.


currently we have:

test=# \h PREPARE TRANSACTION
Command: PREPARE TRANSACTION
Description: prepare the current transaction for two-phase commit
Syntax:
PREPARE TRANSACTION transaction_id

in oracle it is possible to comment transactions:

COMMIT COMMENT 'ORA-2PC-CRASH-TEST-n';

if we added the possibility to comment prepared transactions it would be 
far easier for DBAs to find out what to do with prepared transactions 
once something has gone wrong (at least if an application adds some 
useful data to the comment). usually when the DBA has to fix something 
it has to be done FAST - some additional info would definitely help here ...


what do people think about this?

best regards,

hans


--
Cybertec Geschwinde & Schönig GmbH
Schöngrabern 134; A-2020 Hollabrunn
Tel: +43/1/205 10 35 / 340
www.postgresql.at, www.cybertec.at

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] [PERFORM] A Better External Sort?

2005-10-06 Thread Martijn van Oosterhout
On Wed, Oct 05, 2005 at 07:54:15PM -0400, Ron Peacetree wrote:
> I asked some questions about physical layout and format translation
> overhead being possibly suboptimal that seemed to be agreed to, but
> specifics as to where we are taking the hit don't seem to have been
> made explicit yet.

This hit is easy to see and terribly hard to do anything about at the
same time. Any single row in a table stores its values but the offsets
arn't constant. If a field is NULL, it is skipped. If a field is
variable length, you have to look at the length before you can jump
over to the next value.

If you have no NULLs and no variable length fields, then you can
optimise access. This is already done and it's hard to see how you
could improve it further. To cut costs, many places use
heap_deform_tuple and similar routines so that the costs are reduced,
but they're still there.

Upping the data transfer rate from disk is a worthy goal, just some
people beleive it is of lower priority than improving CPU usage.

> We were also told that evidently we are CPU bound far before one
> would naively expect to be based on the performance specifications
> of the components involved.

As someone pointed out, calls to the C library are not counted
seperately, making it harder to see if we're overcalling some of them.
Pinpointing the performance bottleneck is hard work.

> Now I've asked for the quickest path to detailed understanding of the
> pg IO subsystem.  The goal being to get more up to speed on its
> coding details.  Certainly not to annoy you or anyone else.

Well, the work is all in storage/smgr and storage/file. It's not
terribly complicated, it just sometimes takes a while to understand
*why* it is done this way.

Indeed, one of the things on my list is to remove all the lseeks in
favour of pread. Halving the number of kernel calls has got to be worth
something right? Portability is an issue ofcourse...

But it's been a productive thread, absolutly. Progress has been made...

Have a nice day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.


pgp5wHEqj2N70.pgp
Description: PGP signature


Re: [HACKERS] [PERFORM] A Better External Sort?

2005-10-06 Thread Josh Berkus

Andreas,


pg relys on the OS readahead (== larger block IO) to do efficient IO.
Basically the pg scan performance should match a dd if=file of=/dev/null
bs=8k,
unless CPU bound.


FWIW, we could improve performance by creating larger write blocks when 
appropriate, particularly on Unixes like Solaris.  But that's a bad 
effort/result tradeoff for most OSes, so it's not the route I'd be 
suggesting for general scans.


However, the external sort code could possibly be improved by more 
appropriate block sizing, which I think someone has already suggested.


--Josh


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] PG function call

2005-10-06 Thread Martijn van Oosterhout
On Thu, Oct 06, 2005 at 10:01:55AM -0500, smile khmer wrote:
> but when I write the output to file (not standard out put), it won't finish, 
> so I interupted and
> there're more than 50.000 lines,...

What did you expect? PostgreSQL uses indexes for everything from
looking up functions to finding tables, types, operators, etc. The
compare function will be called for eachtime it wants to do a
comparison in the index, which could be many, many times. I had a trace
that called btcompare 79 million times in a single query.

If you want to know how it got there, I suggest you attach a debugger
to the backend, put a breakpoint there and use "bt" to get a backtrace
to see where it came from.

> and during the compiling process, there's alot of messages like these :
> .
> tsort: -: input contains a loop:

Ignore that, not relevent.

> then, if I want to change somepart of the code or write one function
> which will tell postgresql to call this function whenever It compares
> the string or character data type. I've found "citext" project but I
> still don't understand where exactly to modified code.

Once you have created the functions and compiled them, you use CREATE
FUNCTION so the backend knows about them...
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.


pgpXNCVJp5qt8.pgp
Description: PGP signature


Re: [HACKERS] execution of nested loop joins

2005-10-06 Thread Martijn van Oosterhout
On Thu, Oct 06, 2005 at 09:14:02PM +0530, Esha Palta wrote:
> ExecQual evaluates join conditions one at a time.It captures one 
> condition and passes it to function ExecEvalExpr which is actually a 
> macro that invokes another function evalfunc(a method of ExprState 
> structure).

It's not a "method" of the ExprState structure in the way object
oriented people might think. It's a function pointer that is set to the
function PostgreSQL wants to use to evaluate the expression. It's of
type:

typedef Datum (*ExprStateEvalFunc) (ExprState *expression,
ExprContext *econtext,
bool *isNull,
ExprDoneCond *isDone);

It's more like a virtual method where whoever created the structure
decides which method to use. the actual function called will probably
be one of the ones in backend/executor/execQual.c. Object-orientation
for C.

> I am not getting implementation and use of this evalfunc function. Is 
> this function used to evaluate the join condition or not. If yes, then 
> how it does this.

It does, using one of the defined expression evaluation functions.

Hope this helps,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.


pgpWYkvOdKFp4.pgp
Description: PGP signature


[HACKERS] execution of nested loop joins

2005-10-06 Thread Esha Palta

Hi all,

nodeNestloop.c executes nested loop joins. After getting a pair of inner 
and outer it test the inner and outer tuples to see if they satisfy the 
node's qualification using function ExecQual(joinqual, econtext, false).


ExecQual evaluates join conditions one at a time.It captures one 
condition and passes it to function ExecEvalExpr which is actually a 
macro that invokes another function evalfunc(a method of ExprState 
structure).


I am not getting implementation and use of this evalfunc function. Is 
this function used to evaluate the join condition or not. If yes, then 
how it does this.


thanx
esha


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] PG function call

2005-10-06 Thread smile khmer

- Original Message -
From: "Alvaro Herrera" <[EMAIL PROTECTED]>
To: "smile khmer" <[EMAIL PROTECTED]>
Subject: Re: [HACKERS] PG function call
Date: Thu, 6 Oct 2005 10:30:37 -0400

> 
> On Thu, Oct 06, 2005 at 09:06:59AM -0500, smile khmer wrote:
> > Dear all,
> >
> > Does anyone know how index searching work in PG. I've explored 
> > the source code of PG, for btree, for searching, it will call the 
> > functions in file btcompare.c. As I've made a printf in the 
> > functions of the file btcompare.c. When I compile and run PG, it 
> > get into loop,. the messages are printed and printed .
> 
> Not a loop -- the function may just be called a lot of times.

but when I write the output to file (not standard out put), it won't finish, so 
I interupted and
there're more than 50.000 lines,...
and during the compiling process, there's alot of messages like these :
.
tsort: -: input contains a loop:
tsort: fe-misc.o
tsort: fe-exec.o
tsort: -: input contains a loop:
tsort: fe-misc.o
tsort: fe-exec.o
tsort: -: input contains a loop:
tsort: fe-misc.o
tsort: fe-exec.o
...
.

> 
> > I want to know how the function called happen in Postgres, it 
> > seems that it doesn't call the function directly !
> 
> The code gets function pointers through the catalogs, so they will be
> called even if you don't see a direct function call.  At some time the
> code will ask for the equality function for such-and-such datatype, then
> call it.
>

then, if I want to change somepart of the code or write one function which will 
tell postgresql to call this function whenever It compares the string or 
character data type. I've found "citext" project but I still don't understand 
where exactly to modified code.

thanks,
Moni
 
> --
> Alvaro Herrera http://www.amazon.com/gp/registry/CTMLCN8V17R4
> "Acepta los honores y aplausos y perderás tu libertad"
> 
> ---(end of broadcast)---
> TIP 4: Have you searched our list archives?
> 
> http://archives.postgresql.org

-- 
___
Sign-up for Ads Free at Mail.com
http://promo.mail.com/adsfreejump.htm


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] How PG_FUNCTION_ARG works in PG

2005-10-06 Thread Tom Lane
sandeep satpal <[EMAIL PROTECTED]> writes:
> Whenever a function get called it receive one parameter 
> as
> PG_FUNCTION_ARG ( as in nbtcompare.c )
> I am not getting how it is interpreted and how it is used ??

It's a pointer to a struct containing the actual arguments.
You might find it helpful to read src/backend/utils/fmgr/README
which is old but still accurate.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


[HACKERS] How PG_FUNCTION_ARG works in PG

2005-10-06 Thread sandeep satpal


Hi all,

Whenever a function get called it receive one parameter 
as

PG_FUNCTION_ARG ( as in nbtcompare.c )
I am not getting how it is interpreted and how it is used ??

thank u



--
--
| Sandeep Satpal |
| M.Tech Student |
| Lab 212 KReSIT |
--

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] PG function call

2005-10-06 Thread Alvaro Herrera
On Thu, Oct 06, 2005 at 09:06:59AM -0500, smile khmer wrote:
> Dear all,
> 
> Does anyone know how index searching work in PG. I've explored the 
> source code of PG, for btree, for searching, it will call the 
> functions in file btcompare.c. As I've made a printf in the functions 
> of the file btcompare.c. When I compile and run PG, it get into 
> loop,. the messages are printed and printed .

Not a loop -- the function may just be called a lot of times.

> I want to know how the function called happen in Postgres, 
> it seems that it doesn't call the function directly ! 

The code gets function pointers through the catalogs, so they will be
called even if you don't see a direct function call.  At some time the
code will ask for the equality function for such-and-such datatype, then
call it.

-- 
Alvaro Herrera http://www.amazon.com/gp/registry/CTMLCN8V17R4
"Acepta los honores y aplausos y perderás tu libertad"

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] fixing LISTEN/NOTIFY

2005-10-06 Thread Tom Lane
Heikki Linnakangas <[EMAIL PROTECTED]> writes:
> It might make sense to change the semantics so that we never lose a 
> notification, if we're going to implement NOTIFY 'msg', but that's another 
> discussion.

That's pretty much a given --- the ability to pass some payload data in
notifications has been on the TODO list for a very long time.  I don't
think we're going to reimplement listen/notify without adding it.

I like your suggestion of "topic" for the notify name, and am tempted to
go fix the documentation to use that term right now ...

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Outer where pushed down

2005-10-06 Thread Gaetano Mendola
Tom Lane wrote:
> Gaetano Mendola <[EMAIL PROTECTED]> writes:
>> CREATE OR REPLACE VIEW v_current_connection AS
>> SELECT ul.id_user
>> FROM   user_login ul,
>>current_connection cc
>> WHERE ul.id_user = cc.id_user;
> 
>> # explain select * from v_current_connection_test where 
>> sp_connected_test(id_user) = FALSE;
> 
>> why postgres doesn't apply that function at table current_connection given 
>> the fact are extimated
>> only 919 vs 27024 rows?
> 
> Because the condition is on a field of the other table.
> 
> You seem to wish that the planner would use "ul.id_user = cc.id_user"
> to decide that "sp_connected_test(ul.id_user)" can be rewritten as
> "sp_connected_test(cc.id_user)", but in general this is not safe.
> The planner has little idea of what the datatype-specific semantics
> of equality are, and none whatsoever what the semantics of your
> function are.  As a real-world example: IEEE-standard floating
> point math considers that +0 and -0 are different bit patterns.
> They compare as equal, but it's very easy to come up with user-defined
> functions that will yield different results for the two inputs.
> So the proposed transformation is definitely unsafe for float8.

And what about to define for each type when this is safe and let
the planner make his best choice ?
Rewriting that view the execution time passed from 4 secs to 1 sec,
that is not bad if the planner can do it autonomously. In this very
example I can decide if it's better expose one column or the other
one but in other cases not...


Regards
Gaetano Mendola


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


[HACKERS] PG function call

2005-10-06 Thread smile khmer
Dear all,

Does anyone know how index searching work in PG. I've explored the 
source code of PG, for btree, for searching, it will call the 
functions in file btcompare.c. As I've made a printf in the functions 
of the file btcompare.c. When I compile and run PG, it get into 
loop,. the messages are printed and printed .
I want to know how the function called happen in Postgres, 
it seems that it doesn't call the function directly ! 

I would appreciate your help
regards,

Smile


-- 
___
Sign-up for Ads Free at Mail.com
http://promo.mail.com/adsfreejump.htm


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] prefix btree implementation

2005-10-06 Thread Martijn van Oosterhout
On Thu, Oct 06, 2005 at 08:53:25AM +0100, Simon Riggs wrote:
> It would be possible to compress on similar values, since we know the
> output of the comparison in the final stage of the sort of the index
> build. That wouldn't need to rely upon anything to do with the datatype,
> since "they are equal" is a fact outside the encapsulation, and is
> arrived at by use of the datatype's own comparison logic.

Well, one thing I would be curious about would be when you index a
multicolumn index, not storing the value of each column in each index
entry? Couldn't there be a btree on the first key, then at the leaf a
pointer to a new btree on the second key, etc.

It would save a lot of space in large multicolumn indexes, no? And
since ctid is an automatic member of each indexkey, it could
automatically remove common key elements.

Codingwise however, it sucks. The whole index_getattr can't happen and
you have remember more state going down. And you probably can't split
the ctid out anyway, because you might end up needing a whole page per
key value then.

Still, I know a large 7 column index where the first three columns
don't change that often and would benefit from this kind of
optimisation.

> It might be worth teaching the optimiser that if it has an index on an
> immutable function that if we have WHERE x = k and a functional index on
> f(x) then we can access the functional index with 
> f(x) = f(k), as long as we also reapply the original WHERE clause. 

Until we have a better idea about how much functions cost we should be
wary of this. Also, consider if you had an index on sign(x). It's
immutable sure, but useless from a planner prespective for the
optimisation you suggest. How would it know? Presumably there may be
statistics on this but still...

Have a ncie day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.


pgprEEjIB47Y3.pgp
Description: PGP signature


Re: [HACKERS] fixing LISTEN/NOTIFY

2005-10-06 Thread Heikki Linnakangas
First of all, I'd like to give a name to the thing that a backend listens 
to. The code talks about "listening to a relation", but as the 
comments there point out, the name doesn't actually identify a relation. 
I'm going to call it a topic from now on.


I'd like to point out that we don't really have a notification queue, 
but a notification flag for each topic. We're not constrained by the 
number of notifications, but by the number of topics.


It might make sense to change the semantics so that we never lose a 
notification, if we're going to implement NOTIFY 'msg', but that's another 
discussion.


I've been thinking about the options for shmem data structure. Replacing 
pg_listener in the straightforward way would give us an array:


struct {
  char topicname[NAMEDATALEN];
  int listenerpid;
  int notification;
} listenertable[max_subscriptions];

Where max_subscriptions is the maximum number of active LISTENs.

If we're ready to take the performance hit, we can exploit the fact that 
it's signal extra backends, as long as those backends can figure out that 
it was a false alarm. In fact, we can always signal all backends.


Exploiting that, we could have:

struct {
  char topicname[NAMEDATALEN];
  int subscriptions; /* number of active subscriptions for this topic */
  int notification_counter; /* increase by 1 on each NOTIFY */
} listenertable[max_topics]

NOTIFY increases the notification_counter by one, and signals all 
backends. Every backend keeps a private list of 
(topicname, notification_counter) pairs for the topics it's subscribed 
to, in addition to the shared memory table. The signal handler compares 
the notification_counter in the private list and in the listenertable. If 
they don't match, notify the client. If they match, it was a false alarm.


The shmem requirement of this is
  max_topics * (NAMEDATALEN + sizeof(int) * 2)

If we're not ready to take the performance hit, we can add the list of 
backends to shmem:


struct {
  char topicname[NAMEDATALEN];
  int subscriptions; /* number of active subscriptions for this topic */
  int notification_counter; /* increase by 1 on each NOTIFY */
  int subscribers[max_backends];
} listenertable[max_topics]

and only signal those backends that are in the subscribers array.

The shmem requirement of this is
  max_topics * (NAMEDATALEN + sizeof(int) * (2 + max_backends))


We can also do a tradeoff between shmem usage and unnecessary signals:

struct {
  char topicname[NAMEDATALEN];
  int subscriptions; /* number of active subscriptions for this topic */
  int notification_counter; /* increase by 1 on each NOTIFY */
  int subscriber_cache[cache_size];
} listenertable[max_topics]

Where cache_size can be any number
  max_topics * (NAMEDATALEN + sizeof(int) * (2 + cache_size))

Where cache_size can be anything between 0 and max_backends. If the cache 
gets full, NOTIFY signals all backends. Otherwise, only those that are in 
the cache.



If max_topics is large, a hash table should be used instead of an array of 
structs.


Now that I think of it, using the notification_counter, we *can* guarantee 
that no notification is lost. The signal handler just needs to notify the 
client (shmem notification_counter) - (private notification_counter) 
times.



- Heikki

On Thu, 6 Oct 2005, Neil Conway wrote:


Applications that frequently use LISTEN/NOTIFY can suffer from
performance problems because of the MVCC bloat created by frequent
insertions into pg_listener. A solution to this has been suggested in
the past: rewrite LISTEN/NOTIFY to use shared memory rather than system
catalogs.

The problem is that there is a static amount of shared memory and a
potentially unbounded number of notifications, so we can run out of
memory. There are two ways to solve this: we can do as sinval does and
clear the shared memory queue, then effectively issue a NOTIFY ALL that
awakens all listeners. I don't like this behaviour: it seems ugly to
expose an implementation detail (static sizing of shared memory) to
applications. While a lot of applications are only using LISTEN/NOTIFY
for cache invalidation (and so spurious notifications are just a
performance hit), this behaviour still seems unfortunate to me. Using
NOTIFY ALL also makes NOTIFY 'msg' far less useful, which is a feature
several users have asked for in the past.

I think it would be better to either fail the NOTIFY when there is not
enough shared memory to add a new notification to the queue, or have the
NOTIFY block until shared memory does become available (applications
could of course implement the latter on top of the former by using
savepoints and a loop, either on the client-side or in PL/PgSQL). I
guess we could add an option to NOTIFY to specify how to handle
failures.

A related question is when to add the notification to the shared memory
queue. We don't want the notification to fire until the NOTIFY's
transaction commits, so one alternative would be to delay appending to
the queue until transactio

Re: [HACKERS] fixing LISTEN/NOTIFY

2005-10-06 Thread Alvaro Herrera
On Thu, Oct 06, 2005 at 09:12:58AM -0400, Tom Lane wrote:
> Neil Conway <[EMAIL PROTECTED]> writes:
> > However, I don't really like the idea of blocking the backend for a
> > potentially significant amount of time in a state half-way between
> > "committed" and "ready for the next query".
> 
> I wonder whether we could use something comparable to pg_multixact
> or pg_subtrans, to convert the problem from one of "need to fit in
> fixed amount of memory" to one of "it's on disk with some buffers
> in memory".

The multixact mechanism seems a perfect fit to me (variable length
contents, identifiers produced serially and destroyed in a
not-too-distant future).  In fact I proposed it awhile back.

-- 
Alvaro Herrera http://www.amazon.com/gp/registry/CTMLCN8V17R4
"At least to kernel hackers, who really are human, despite occasional
rumors to the contrary" (LWN.net)

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] prefix btree implementation

2005-10-06 Thread Tom Lane
Simon Riggs <[EMAIL PROTECTED]> writes:
> It might be worth teaching the optimiser that if it has an index on an
> immutable function that if we have WHERE x = k and a functional index on
> f(x) then we can access the functional index with 
> f(x) = f(k), as long as we also reapply the original WHERE clause. 

As I just pointed out to Gaetano, this is utterly wrong.  We can't
assume that much about the behavior of equality.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


[HACKERS] resource monitor

2005-10-06 Thread MOCHERLA LAKSHMI NARASIMHAM

hi

i would like to know whether the feature for resource control is there or 
not... i mean controlling the allocated resources in pgsql sothat when a 
shortage is going tobe happened.. it automatically prompts the admin to 
kill some of the processes or preempt some of the resources ...


thanku

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [Slony1-general] Re: [HACKERS] Slony RPM issue

2005-10-06 Thread Philip Yarra
On Thu, 6 Oct 2005 05:10 am, elein wrote:
> Generally a short sed (or perl if you like) script will fix
> these up.  But it is really pretty obscure trail for people
> to find the exact problem.

Yeah, it's not that it's hard to fix when you know where to look, but my aim 
is to produce a site installation checklist for other people to follow. The 
simpler it is, the better the chance of it being successful. 

I'd prefer a site installation to be:
- install PostgreSQL RPMs
- unpack our specific postgresql.conf and pg_hba.conf 
- set up database from scripts
- install Slony RPM
- unpack our specific slony config files
- run minimal commands to get Slony going
- profit! (or at least I don't get woken up by phone calls at 3am)

That's why I'm so keen on the RPMs working out of the box, with minimal need 
for post-installation tweaks/fixes. 

Regards, Philip.

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] [PERFORM] A Better External Sort?

2005-10-06 Thread Michael Stone

On Wed, Oct 05, 2005 at 04:55:51PM -0700, Luke Lonergan wrote:

You've proven my point completely.  This process is bottlenecked in the CPU.
The only way to improve it would be to optimize the system (libc) functions
like "fread" where it is spending most of it's time.


Or to optimize its IO handling to be more efficient. (E.g., use larger
blocks to reduce the number of syscalls.) 


Mike Stone

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] Outer where pushed down

2005-10-06 Thread Tom Lane
Gaetano Mendola <[EMAIL PROTECTED]> writes:
> CREATE OR REPLACE VIEW v_current_connection AS
> SELECT ul.id_user
> FROM   user_login ul,
>current_connection cc
> WHERE ul.id_user = cc.id_user;

> # explain select * from v_current_connection_test where 
> sp_connected_test(id_user) = FALSE;

> why postgres doesn't apply that function at table current_connection given 
> the fact are extimated
> only 919 vs 27024 rows?

Because the condition is on a field of the other table.

You seem to wish that the planner would use "ul.id_user = cc.id_user"
to decide that "sp_connected_test(ul.id_user)" can be rewritten as
"sp_connected_test(cc.id_user)", but in general this is not safe.
The planner has little idea of what the datatype-specific semantics
of equality are, and none whatsoever what the semantics of your
function are.  As a real-world example: IEEE-standard floating
point math considers that +0 and -0 are different bit patterns.
They compare as equal, but it's very easy to come up with user-defined
functions that will yield different results for the two inputs.
So the proposed transformation is definitely unsafe for float8.

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] fixing LISTEN/NOTIFY

2005-10-06 Thread Tom Lane
Neil Conway <[EMAIL PROTECTED]> writes:
> However, I don't really like the idea of blocking the backend for a
> potentially significant amount of time in a state half-way between
> "committed" and "ready for the next query".

I wonder whether we could use something comparable to pg_multixact
or pg_subtrans, to convert the problem from one of "need to fit in
fixed amount of memory" to one of "it's on disk with some buffers
in memory".

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] version dependent compilation

2005-10-06 Thread Dave Page
 

> -Original Message-
> From: Peter Eisentraut [mailto:[EMAIL PROTECTED] 
> Sent: 06 October 2005 13:47
> To: Andreas Pflug
> Cc: pgsql-hackers@postgresql.org; Dave Page
> Subject: Re: [HACKERS] version dependent compilation
> 
> Am Donnerstag, 6. Oktober 2005 12:50 schrieb Andreas Pflug:
> > Apparently, there's currently no way to perform conditional 
> compiling
> > dependent on the version of pgsql. Currently we're facing 
> the problem
> > that ParseDateTime changed its parameters between 8.0.3 and 8.0.4,
> > breaking backward compatibility (for good reasons in this case).
> 
> You should write an autoconf test to detect this.

This is for a contrib-style module that uses pgxs.

Regards, Dave

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] version dependent compilation

2005-10-06 Thread Alvaro Herrera
On Thu, Oct 06, 2005 at 10:50:29AM +, Andreas Pflug wrote:
> Apparently, there's currently no way to perform conditional compiling 
> dependent on the version of pgsql. Currently we're facing the problem 
> that ParseDateTime changed its parameters between 8.0.3 and 8.0.4, 
> breaking backward compatibility (for good reasons in this case).

Hmm.  People have reported using CATALOG_VERSION_NO for this purpose,
but evidently it's not enough in this case, because that symbol wasn't
changed between 8.0.3 and 8.0.4.  So there's a case for real version
symbols, apparently.

-- 
Alvaro Herrera   Valdivia, Chile   ICBM: S 39º 49' 17.7", W 73º 14' 26.8"
"Uno combate cuando es necesario... ¡no cuando está de humor!
El humor es para el ganado, o para hacer el amor, o para tocar el
baliset.  No para combatir."  (Gurney Halleck)

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] version dependent compilation

2005-10-06 Thread Peter Eisentraut
Am Donnerstag, 6. Oktober 2005 12:50 schrieb Andreas Pflug:
> Apparently, there's currently no way to perform conditional compiling
> dependent on the version of pgsql. Currently we're facing the problem
> that ParseDateTime changed its parameters between 8.0.3 and 8.0.4,
> breaking backward compatibility (for good reasons in this case).

You should write an autoconf test to detect this.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] fixing LISTEN/NOTIFY

2005-10-06 Thread Alvaro Herrera
On Thu, Oct 06, 2005 at 01:32:32AM -0400, Neil Conway wrote:
> On Thu, 2005-06-10 at 01:14 -0400, Tom Lane wrote:
> > The idea of blocking during commit until shmem becomes available might
> > work.  There's some issues here about transaction atomicity, though:
> > how do you guarantee that all or none of your notifies get sent?
> > (Actually, supposing that the notifies ought to be sent post-commit,
> > "all" is the only acceptable answer.  So maybe you just never give up.)
> 
> Yeah, I think that would work. We could also write to shared memory
> before the commit proper, and embed an XID in the message to allow other
> backends to determine when/if to fire the notification.
> 
> However, I don't really like the idea of blocking the backend for a
> potentially significant amount of time in a state half-way between
> "committed" and "ready for the next query".

I don't like the idea of blocking indefinitely.  It means another global
DOS tool for anybody trying to NOTIFY: just do a LISTEN and sit there
doing nothing.

One idea would be to block for a while, with a timeout.  If it expires,
the receiving backend(s) has to copy the notification to local memory
and lets go of the one in shared memory.

-- 
Alvaro Herrerahttp://www.advogato.org/person/alvherre
"I'm always right, but sometimes I'm more right than other times."
  (Linus Torvalds)

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


[HACKERS] case insensitive joining in case of nested loop joins

2005-10-06 Thread Esha Palta

hi
I want to know which files to be explored so as to do case insensitive 
joining in case of nested loop joins

esha

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] prepared queries in plperl

2005-10-06 Thread Dmitry Karasik
Hi Dmitry!

On 27 Sep 05 at 16:16, "Dmitry" (Dmitry Karasik) wrote to "Andrew Dunstan":

 Andrew> Meanwhile, I will observe that this very desirable feature needs
 Andrew> an interface with spi_fetchrow()

I re-worked the patch ( http://www.karasik.eu.org/misc/plperl.diff ) and
now there's also spi_query_prepared(), that returns a cursor for a
previously prepared query that is to be used with subsequent spi_fetchrow()
calls.

I also modified spi_fetchrow() itself to avoid a couple of memory leaks,
and added spi_cursor_close() for cases where not all rows are fetched.

-- 
Sincerely,
Dmitry Karasik


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


[HACKERS] Outer where pushed down

2005-10-06 Thread Gaetano Mendola
Hi all,

consider this view:

CREATE OR REPLACE VIEW v_current_connection AS
SELECT ul.id_user
FROM   user_login ul,
   current_connection cc
WHERE ul.id_user = cc.id_user;


And this is the explain on a usage of that view:

# explain select * from v_current_connection_test where 
sp_connected_test(id_user) = FALSE;
QUERY PLAN
--
 Hash Join  (cost=42.79..1325.14 rows=451 width=5)
   Hash Cond: ("outer".id_user = "inner".id_user)
   ->  Seq Scan on user_login ul  (cost=0.00..1142.72 rows=27024 width=4)
 Filter: (sp_connected_test(id_user) = false)
   ->  Hash  (cost=40.49..40.49 rows=919 width=5)
 ->  Index Scan using idx_connected on current_connection cc  
(cost=0.00..40.49 rows=919 width=5)
   Index Cond: (connected = true)
(7 rows)

apart my initial surprise to see that function applied at rows not returned by 
the view
( Tom Lane explained me that the planner is able to push down the outer 
condition )
why postgres doesn't apply that function at table current_connection given the 
fact are extimated
only 919 vs 27024 rows?


redefining the view:

CREATE OR REPLACE VIEW v_current_connection AS
SELECT cc.id_user
FROM   user_login ul,
   current_connection cc
WHERE ul.id_user = cc.id_user;

then I obtain the "desidered" plan.


# explain select * from v_current_connection_test where 
sp_connected_test(id_user ) = FALSE;
QUERY PLAN
--
 Hash Join  (cost=46.23..1193.47 rows=452 width=5)
   Hash Cond: ("outer".id_user = "inner".id_user)
   ->  Seq Scan on user_login ul  (cost=0.00..872.48 rows=54048 width=4)
   ->  Hash  (cost=45.08..45.08 rows=460 width=5)
 ->  Index Scan using idx_connected on current_connection cc  
(cost=0.00..45.08 rows=460 width=5)
   Index Cond: (connected = true)
   Filter: (sp_connected_test(id_user) = false)
(7 rows)



Is not possible in any way push postgres to apply that function to the right 
table ?
Shall I rewrite the views figuring out wich column is better to expose ?



Regards
Gaetano Mendola
















---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] [PERFORM] A Better External Sort?

2005-10-06 Thread Zeugswetter Andreas DAZ SD

> Now I've asked for the quickest path to detailed 
> understanding of the pg IO subsystem.  The goal being to get 
> more up to speed on its coding details.  Certainly not to 
> annoy you or anyone else.

Basically pg does random 8k (compile time blocksize) reads/writes only.
Bitmap and sequential scans read 8k blocks in order.
Only WAL does n x 8k writes with one system call.

pg relys on the OS readahead (== larger block IO) to do efficient IO.
Basically the pg scan performance should match a dd if=file of=/dev/null
bs=8k,
unless CPU bound.

Andreas

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


[HACKERS] version dependent compilation

2005-10-06 Thread Andreas Pflug
Apparently, there's currently no way to perform conditional compiling 
dependent on the version of pgsql. Currently we're facing the problem 
that ParseDateTime changed its parameters between 8.0.3 and 8.0.4, 
breaking backward compatibility (for good reasons in this case).


IMHO it's quite helpful to have macros like

#define PGSQL_VERSION_MAJOR 8
#define PGSQL_VERSION_MINOR 0
#define PGSQL_VERSION_POINT 4

or whatever naming scheme seems convenient that allow to catch such 
deviations without having to maintain version specific sources manually.

Could be added for  8.1.0, 8.0.5, 7.4.10.

Regards,
Andreas

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [HACKERS] [PERFORM] A Better External Sort?

2005-10-06 Thread Hannu Krosing
On K, 2005-10-05 at 19:54 -0400, Ron Peacetree wrote:

> +I made the "from left field" suggestion that perhaps a pg native fs
> format would be worth consideration.  This is a major project, so
> the suggestion was to at least some extent tongue-in-cheek.

This idea is discussed about once a year on hackers. If you are more
interested in this, search the archives :)

-- 
Hannu Krosing <[EMAIL PROTECTED]>


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] [PERFORM] A Better External Sort?

2005-10-06 Thread Hannu Krosing
On K, 2005-10-05 at 13:21 -0400, Ron Peacetree wrote:
> First I wanted to verify that pg's IO rates were inferior to The Competition.
> Now there's at least an indication that someone else has solved similar
> problems.  Existence proofs make some things easier ;-)
> 
> Is there any detailed programmer level architectual doc set for pg?  I know
> "the best doc is the code",

For postgres it is often "best doc's are in the code, in form of
comments."

-- 
Hannu Krosing <[EMAIL PROTECTED]>


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] prefix btree implementation

2005-10-06 Thread Simon Riggs
On Wed, 2005-10-05 at 00:50 -0400, Tom Lane wrote:
> Qingqing Zhou <[EMAIL PROTECTED]> writes:
> > 1/ What types of prefix compression shall we support?
> 
> Given the requirement of datatype independence, this idea seems a
> complete nonstarter to me...

It would be possible to compress on similar values, since we know the
output of the comparison in the final stage of the sort of the index
build. That wouldn't need to rely upon anything to do with the datatype,
since "they are equal" is a fact outside the encapsulation, and is
arrived at by use of the datatype's own comparison logic.

But that isn't prefix compression, just compression.

But why do we want this? Its very easy to work out a data-aware
prefixing or compression scheme and then encapulate that in a function.
The function can then be used in a functional index and the usage hidden
behind a view.

It might be worth teaching the optimiser that if it has an index on an
immutable function that if we have WHERE x = k and a functional index on
f(x) then we can access the functional index with 
f(x) = f(k), as long as we also reapply the original WHERE clause. 

Best Regards, Simon Riggs



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] current_user versus current_role SOLVED

2005-10-06 Thread Pavel Stehule


Hi,

> I used info from current_user for log. about some operations (who, when,
> ..). What I can see, current_user is equal current_role function. I had
> problem with it, because user (if is member of any group role) can 
change
> his identity. example: peter is member of role users. But peter can do 
set
> role to users. From this moment I lost possibility of get identity if 
user,
> because current_user returns users and not peter. I can check it (if 
some
> user changed identity) ~ by exampl. test if role has login privilegia, 
and

> if not I can stop any operations.

I believe you can use session_user for this.

--

it's exactly it what I want.

Thank You
Pavel Stehule

_
Najdete si svou lasku a nove pratele na Match.com. http://www.msn.cz/


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match