Re: [HACKERS] EXPLAIN verbose?

2006-05-11 Thread Qingqing Zhou

"Cristiano Duarte" <[EMAIL PROTECTED]> wrote
> Does anyone know how to discover the tables (with schemas or oid) 
> involved
> on a select statement?
> I tried EXPLAIN but it only reports the table names, not schemas or 
> oid.
>

In general, this is a pretty difficult problem - consider that you may do a
selection on a function which itself involves some other tables. With EXPLAIN,
you can only see a function scan node.

Regards,
Qingqing 



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


[HACKERS] Continuing "pgstat update timeout" failures in buildfarm

2006-05-11 Thread Tom Lane
The reliability of that recent pgstat patch is clearly completely
unacceptable.  Please revert it until it's fixed.  I'm tired of
looking at intermittent buildfarm failures.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] [HACKERS] Big IN() clauses etc : feature proposal

2006-05-11 Thread Jim C. Nasby
On Thu, May 11, 2006 at 06:08:36PM -0400, Tom Lane wrote:
> "Jim C. Nasby" <[EMAIL PROTECTED]> writes:
> > I'd hope that wasn't what's happening... is the backend smart enough to
> > know not to fsync anything involved with the temp table?
> 
> The catalog entries required for it have to be fsync'd, unless you enjoy
> putting your entire database at risk (a bad block in pg_class, say,
> would probably take out more than one table).

Yeah, thought about that after sending... :(

> It's interesting to speculate about keeping such catalog entries in
> child tables of pg_class etc that are themselves temp tables.  Resolving
> the apparent circularity of this is left as an exercise for the reader.

Well, since it'd be a system table with a fixed OID there could
presumably be a special case in the recovery code for it, though that's
pretty fugly sounding.

Another alternative would be to support global temp tables... I think
that would handle all the complaints of the OP except for the cost of
analyze. I suspect this would be easier to do than creating a special
type of temp table that used tuplestore instead of the full table
framework, and it'd certainly be more general-purpose.
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.com  work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] [HACKERS] Big IN() clauses etc : feature proposal

2006-05-11 Thread Tom Lane
"Jim C. Nasby" <[EMAIL PROTECTED]> writes:
> I'd hope that wasn't what's happening... is the backend smart enough to
> know not to fsync anything involved with the temp table?

The catalog entries required for it have to be fsync'd, unless you enjoy
putting your entire database at risk (a bad block in pg_class, say,
would probably take out more than one table).

It's interesting to speculate about keeping such catalog entries in
child tables of pg_class etc that are themselves temp tables.  Resolving
the apparent circularity of this is left as an exercise for the reader.

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Compressing table images

2006-05-11 Thread Jim C. Nasby
On Thu, May 11, 2006 at 05:05:26PM -0400, Alvaro Herrera wrote:
> Joshua D. Drake wrote:
> > Brian Hurt wrote:
> > >My apologies if this subject has already been hashed to death, or if 
> > >this is the wrong list, but I was wondering if people had seen this paper:
> > >http://www.cwi.nl/htbin/ins1/publications?request=intabstract&key=ZuHeNeBo:ICDE:06
> > > 
> > >
> > >
> > >Basically it describes a compression algorithm for tables of a 
> > >database.  The huge advantage of  doing this is that it reduced the disk 
> > >traffic by (approximately) a factor of four- at the cost of more CPU 
> > >utilization.
> > >Any thoughts or comments?
> > 
> > I don't know if that is the algorithm we use but PostgreSQL will 
> > compress its data within the table.
> 
> But only in certain very specific cases.  And we compress on a
> per-attribute basis.  Compressing at the page level is pretty much out
> of the question; but compressing at the tuple level I think is doable.
> How much benefit that brings is another matter.  I think we still have
> more use for our limited manpower elsewhere.

Except that I think it would be highly useful to allow users to change
the limits used for both toasting and compressing on a per-table and/or
per-field basis. For example, if you have a varchar(1500) in a table
it's unlikely to ever be large enough to trigger toasting, but if that
field is rarely updated it could be a big win to store it toasted. Of
course you can always create a 'side table' (vertical partitioning), but
all of that framework already exists in the database; we just don't
provide the required knobs. I suspect it wouldn't be that hard to expose
those knobs. In fact, if we could agree on syntax, this is probably a
beginner TODO.

ISTR having this discussion on one of the lists recently, but I can't
find it, and don't see anything in the TODO. Basically, I think we'd
want knobs that say: if this field is over X size, compress it. If it's
over Y size, store it externally. Per-table and per-cluster (ie: GUC)
knobs for that would be damn handy as well.
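
For illustration, the only knob of this kind that exists today is the
per-column storage strategy, which changes the TOAST strategy but not the
size thresholds; the threshold syntax below is purely hypothetical (made-up
table, column and option names, not implemented):

-- Existing in 8.1: EXTERNAL = out-of-line but uncompressed,
-- MAIN = prefer compression, EXTENDED (the default) = both.
ALTER TABLE listing ALTER COLUMN description SET STORAGE EXTERNAL;

-- Hypothetical per-column thresholds as proposed above (NOT real syntax):
-- ALTER TABLE listing ALTER COLUMN description
--     SET (compress_threshold = 256, external_threshold = 1024);
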
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.com  work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] [HACKERS] Big IN() clauses etc : feature proposal

2006-05-11 Thread PFC




> > > 0.450 ms INSERT INTO tmp SELECT * FROM bookmarks ORDER BY annonce_id DESC
> > > LIMIT 20
> > > 0.443 ms ANALYZE tmp
> > > 0.365 ms SELECT * FROM tmp
> > > 0.310 ms DROP TABLE tmp
> > > 32.918 ms COMMIT
> > 
> > The 32 seconds for commit can hardly be catalog related. It seems the
> > file is fsynced before it is dropped.
> 
> I'd hope that wasn't what's happening... is the backend smart enough to
> know not to fsync anything involved with the temp table? ISTM that that
> transaction shouldn't actually be creating any WAL traffic at all.
> Though on the other hand there's no reason that DROP should be in the
> transaction at all; maybe that's gumming things up during the commit.

	I included the DROP to make it clear that the time was spent in
COMMITting, not in DROPping the table.
	Also, you can't use CREATE TEMP TABLE AS SELECT ... and at the same time
make it ON COMMIT DROP. You have to CREATE and INSERT.
	With an ON COMMIT DROP temp table, the global timings are the same whether
or not it is dropped before commit: it is always the COMMIT which takes
all the milliseconds.

	I still bet on system catalog updates being the main cause of the time
spent in COMMIT...

(because ANALYZE changes this time)
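
	For reference, the whole test as a single transaction (reconstructed
from the timings quoted above; bookmarks and annonce_id come from earlier
in this thread):

BEGIN;
CREATE TEMPORARY TABLE tmp (
    a INTEGER NOT NULL, b INTEGER NOT NULL,
    c TIMESTAMP NOT NULL, d INTEGER NOT NULL
) ON COMMIT DROP;
INSERT INTO tmp SELECT * FROM bookmarks ORDER BY annonce_id DESC LIMIT 20;
ANALYZE tmp;
SELECT * FROM tmp;
DROP TABLE tmp;   -- optional: the overall timing is the same without it
COMMIT;           -- this is where nearly all of the time goes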

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] hashagg, statistisics and excessive memory allocation

2006-05-11 Thread Jim C. Nasby
On Thu, May 11, 2006 at 08:36:25PM +0200, Stefan Kaltenbrunner wrote:
> Hi!
> 
> on irc somebody complained yesterday that a simple GROUP BY on a 25M-row
> integer table caused his backend to exhaust the 3GB process limit on his
> 32bit build (on a box with 16GB RAM).
> Some testing showed that the planner was seriously underestimating the
> number of distinct rows in the table (with the default statistics target
> it estimated ~150k rows while there are about 19M distinct values) and
> choosing a hashagg for the aggregate.
> Upping the statistics target to 1000 improves the estimate to about 5M
> rows, which unfortunately is still not enough to cause the planner to
> switch to a groupagg with work_mem set to 256000.
> Some testing seems to indicate that even with perfectly matching stats
> like (8.1.3 here):
> 
> foo=# create table testtable AS select a from generate_series(1,5000000)
> as a;
> SELECT
> foo=# CREATE INDEX test_idx on testtable (a);
> CREATE INDEX
> foo=# ANALYZE ;
> ANALYZE
> foo=# explain select a,count(*) from testtable group by a;
>                               QUERY PLAN
> ------------------------------------------------------------------------
>  HashAggregate  (cost=97014.73..159504.51 rows=4999182 width=4)
>    ->  Seq Scan on testtable  (cost=0.00..72018.82 rows=4999182 width=4)
> (2 rows)
> 
> the hashagg will use about 2.5x of what work_mem is set to; while that is
> partly expected, it seems quite dangerous that one can, even with only
> moderate underestimation of the expected result count (say 2x or 4x), run
> a server out of memory.

Well, the real problem is that hash operations will happily chew through
all available memory instead of either falling back to something else or
at least going to disk. I thought that was on the TODO to be fixed, but
I don't see it there. I also thought some improvement had been made
there, but maybe it's only in HEAD...?
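
In the meantime the practical workaround is on the user side: keep the
planner away from HashAggregate so the sort-based GroupAggregate (which can
spill to disk) is used. A sketch, using the example table from above
(enable_hashagg is an existing GUC):

SET enable_hashagg = off;
SELECT a, count(*) FROM testtable GROUP BY a;   -- planned as Sort + GroupAggregate
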
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.com  work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Compressing table images

2006-05-11 Thread Alvaro Herrera
Joshua D. Drake wrote:
> Brian Hurt wrote:
> >My apologies if this subject has already been hashed to death, or if 
> >this is the wrong list, but I was wondering if people had seen this paper:
> >http://www.cwi.nl/htbin/ins1/publications?request=intabstract&key=ZuHeNeBo:ICDE:06
> > 
> >
> >
> >Basically it describes a compression algorithm for tables of a 
> >database.  The huge advantage of  doing this is that it reduced the disk 
> >traffic by (approximately) a factor of four- at the cost of more CPU 
> >utilization.
> >Any thoughts or comments?
> 
> I don't know if that is the algorithm we use but PostgreSQL will 
> compress its data within the table.

But only in certain very specific cases.  And we compress on a
per-attribute basis.  Compressing at the page level is pretty much out
of the question; but compressing at the tuple level I think is doable.
How much benefit that brings is another matter.  I think we still have
more use for our limited manpower elsewhere.
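
If you want to see the per-attribute compression at work, one way is to
compare the logical and stored sizes of a compressible value, roughly like
this (a made-up example; pg_column_size() reports the size of the datum as
actually stored, so I'd expect it to be much smaller than octet_length()
here):

CREATE TABLE comp_test (t text);
INSERT INTO comp_test VALUES (repeat('abcdefgh', 2000));  -- very compressible

SELECT octet_length(t) AS logical_bytes,
       pg_column_size(t) AS stored_bytes
FROM comp_test;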

-- 
Alvaro Herrera            http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] [HACKERS] Big IN() clauses etc : feature proposal

2006-05-11 Thread Jim C. Nasby
On Thu, May 11, 2006 at 09:55:15AM +0200, Zeugswetter Andreas DCP SD wrote:
> 
> > Something else worth considering is not using the normal 
> > catalog methods
> > for storing information about temp tables, but hacking that together
> > would probably be a rather large task.
> 
> But the timings suggest that it cannot be the catalogs in the worst case
> he showed.
> 
> > 0.101 ms BEGIN
> > 1.451 ms CREATE TEMPORARY TABLE tmp ( a INTEGER NOT NULL, b INTEGER NOT
> > NULL, c TIMESTAMP NOT NULL, d INTEGER NOT NULL ) ON COMMIT DROP
> 
> 1.4 seconds is not great for create table, is that what we expect ?

milliseconds... :) Given the amount of code and locking that it looks
like is involved in creating a table, that might not be unreasonable...

> > 0.450 ms INSERT INTO tmp SELECT * FROM bookmarks ORDER BY annonce_id DESC
> > LIMIT 20
> > 0.443 ms ANALYZE tmp
> > 0.365 ms SELECT * FROM tmp
> > 0.310 ms DROP TABLE tmp
> > 32.918 ms COMMIT
> > 
> > CREATING the table is OK, but what happens on COMMIT ? I hear the disk
> > seeking frantically.
> 
> The 32 seconds for commit can hardly be catalog related. It seems the
> file is fsynced before it is dropped.

I'd hope that wasn't what's happening... is the backend smart enough to
know not to fsync anything involved with the temp table? ISTM that that
transaction shouldn't actually be creating any WAL traffic at all.
Though on the other hand there's no reason that DROP should be in the
transaction at all; maybe that's gumming things up during the commit.
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.com  work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] sblock state on FreeBSD 6.1

2006-05-11 Thread Jim C. Nasby
On Thu, May 11, 2006 at 09:50:14PM +0200, Martijn van Oosterhout wrote:
> On Thu, May 11, 2006 at 02:39:14PM -0500, Jim C. Nasby wrote:
> > On Thu, May 11, 2006 at 07:39:23PM +0200, Martijn van Oosterhout wrote:
> > > This is an idle backend waiting for the user.
> > 
> > So why would that be waiting to lock the socket? My understanding is
> > that nothing else should be contending for that socket, no?
> 
> I'm not sure about locks, but it will be blocking on the socket...

Yeah, talking to AndrewSN and some others on IRC they're wondering if
maybe it's running out of network buffers, which could possibly cause
this.

> > > > #0  0x00080137638c in sendto () from /lib/libc.so.6
> > > > #1  0x00535e67 in pgstat_report_tabstat () at pgstat.c:846
> > > 
> > > This is definitely the statistics collector, which is something that was
> > > speculated upthread. Do you get a lot of these?
> >  
> > I included everything that was captured, but of course that's only a
> > small sampling. If it's helpful we could probably setup something that
> > would automatically grab stack traces for a larger number of backends
> > and then see how many were in that state.
> 
> If you know the pids you should be able to, within gdb, just do
> attach/bt/detach. gdb has some rudimentary scripting capabilities so you
> might be able to do this fairly quickly.

Yeah, that's what I was thinking. But now that we've been investigating this
more, I'm suspecting this is more a matter of tuning than a possible bug. It
turns out we have to be doing over 2000 queries per
second on this dual opteron before the problem will happen...

> > Yeah, my suspicion is that those processes had moved past waiting on the
> > socket lock by the time gdb got to them. Any idea of how you could tell
> > what state (as reported by top) the process was in when gdb stopped it?
> 
> Heh. Attaching to a process has the same effect as sending it a signal.
> Any active system call is aborted and gdb traps it as it goes to
> userspace. So by definition it's in running state when gdb gets it ...

Heh, yeah... oh well.
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.com  work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Compressing table images

2006-05-11 Thread Joshua D. Drake

Brian Hurt wrote:
> My apologies if this subject has already been hashed to death, or if
> this is the wrong list, but I was wondering if people had seen this paper:
> http://www.cwi.nl/htbin/ins1/publications?request=intabstract&key=ZuHeNeBo:ICDE:06
> 
> Basically it describes a compression algorithm for tables of a
> database.  The huge advantage of doing this is that it reduced the disk
> traffic by (approximately) a factor of four- at the cost of more CPU
> utilization.
> 
> Any thoughts or comments?

I don't know if that is the algorithm we use but PostgreSQL will
compress its data within the table.

Joshua D. Drake

> Brian




--

   === The PostgreSQL Company: Command Prompt, Inc. ===
 Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
 Providing the most comprehensive  PostgreSQL solutions since 1997
http://www.commandprompt.com/



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] BEGIN inside transaction should be an error

2006-05-11 Thread Simon Riggs
On Wed, 2006-05-10 at 21:24 -0400, Tom Lane wrote:
> Martijn van Oosterhout  writes:
> > How do other database deal with this? Either they nest BEGIN/COMMIT or
> > they probably throw an error without aborting the transaction, which is
> > pretty much what we do. Is there a database that actually aborts a
> > whole transaction just for an extraneous begin?
> 
> Probably not.  The SQL99 spec does say (in describing START TRANSACTION,
> which is the standard spelling of BEGIN)
> 
>  1) If a <start transaction> statement is executed when an
> SQL-transaction is currently active, then an exception condition
> is raised: invalid transaction state - active SQL-transaction.
> 
> *However*, they are almost certainly expecting that that condition only
> causes the START command to be ignored; not that it should bounce the
> whole transaction.  So I think the argument that this is required by
> the spec is a bit off base.

If you interpret the standard that way then the correct behaviour in the
face of *any* exception condition should be to *not* abort the transaction.
In PostgreSQL, all exception conditions do abort the transaction, so why
not this one? Why would we special-case this?

-- 
  Simon Riggs
  EnterpriseDB  http://www.enterprisedb.com


---(end of broadcast)---
TIP 6: explain analyze is your friend


[HACKERS] Compressing table images

2006-05-11 Thread Brian Hurt
My apologies if this subject has already been hashed to death, or if 
this is the wrong list, but I was wondering if people had seen this paper:

http://www.cwi.nl/htbin/ins1/publications?request=intabstract&key=ZuHeNeBo:ICDE:06

Basically it describes a compression algorithm for tables of a 
database.  The huge advantage of  doing this is that it reduced the disk 
traffic by (approximately) a factor of four- at the cost of more CPU 
utilization. 


Any thoughts or comments?

Brian


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] BEGIN inside transaction should be an error

2006-05-11 Thread Jim C. Nasby
On Thu, May 11, 2006 at 08:05:57AM +0200, Tommi Maekitalo wrote:
> I'd vote for breaking broken applications and leave the 
> database-administrator 
> reactivate this currently broken behavior of postgresql via GUC.

+1...

As for whether this should or shouldn't abort the current transaction,
I'd argue that it should. Otherwise it's likely that your first commit
is actually bogus, which means you just hosed yourself.
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.com  work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] [HACKERS] Big IN() clauses etc : feature proposal

2006-05-11 Thread Jim C. Nasby
On Thu, May 11, 2006 at 08:43:46PM +0200, Martijn van Oosterhout wrote:
> On Thu, May 11, 2006 at 11:35:34AM -0400, Greg Stark wrote:
> > I can say that I've seen plenty of instances where the ability to create
> > temporary tables very quickly with no overhead over the original query would
> > be useful.
> 
> I wonder if this requires what the standard refers to as a global
> temporary table. As I read it (which may be wrong, I find the language
> obtuse), a global temporary table is a temporary table whose structure
> is predefined. So, you'd define it once, updating the catalog only once
> but still get a table that is emptied each startup.
> 
> Of course, it may not be what the standard means, but it still seems
> like a useful idea, to cut down on schema bloat.

IIRC that's the exact syntax Oracle uses:

CREATE GLOBAL TEMPORARY TABLE ...

I always found it a bit odd, since it always seemed to me like a global
temporary table would be one that every backend could read... something
akin to a real table that doesn't worry about fsync or any of that (and
is potentially not backed on disk at all).
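
For comparison, roughly what that looks like in Oracle versus what you have
to do per-session in PostgreSQL today (table and column names are just an
example):

-- Oracle: the definition is created once; each session only sees its own rows.
CREATE GLOBAL TEMPORARY TABLE session_scratch (
    id   integer,
    note varchar2(100)
) ON COMMIT DELETE ROWS;

-- PostgreSQL 8.1: the definition itself is per-session, so every session
-- pays the catalog cost of creating (and later dropping) the table.
CREATE TEMPORARY TABLE session_scratch (
    id   integer,
    note varchar(100)
) ON COMMIT DELETE ROWS;
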
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.com  work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] sblock state on FreeBSD 6.1

2006-05-11 Thread Alvaro Herrera
Jim C. Nasby wrote:

> Yeah, my suspicion is that those processes had moved past waiting on the
> socket lock by the time gdb got to them. Any idea of how you could tell
> what state (as reported by top) the process was in when gdb stopped it?

I think you could send SIGSTOP to all backends at once, then grab the
backtraces with GDB, then restart them.  That way, all backends will
appear to be in the same frozen state, instead of having some in an
"earlier" state than others.

-- 
Alvaro Herrera            http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] [PERFORM] Big IN() clauses etc : feature proposal

2006-05-11 Thread Jim C. Nasby
On Thu, May 11, 2006 at 08:03:19PM +0200, Martijn van Oosterhout wrote:
> On Thu, May 11, 2006 at 12:18:06PM -0500, Jim C. Nasby wrote:
> > > Yes, because there can be more than one active snapshot within a single
> > > transaction (think about volatile functions in particular).
> > 
> > Any documentation on how snapshots work? They're a big mystery to me.
> > :(
> 
> A snapshot is a particular view on a database. In particular, you have
> to be able to view a version of the database that doesn't have your own
> changes, otherwise an UPDATE would keep updating the same tuple. Also,
> for example, a cursor might see an older version of the database than
> queries being run. I don't know of any particular information about it
> though. Google wasn't that helpful.

Ahh, I'd forgotten that commands sometimes needed to see prior data. But
that's done with cmin/max, right?

In any case, going back to the original thought/question... my point was
that in a single-session table, it should be possible to maintain a
row counter. Worst case, you might have to keep a separate count for
each CID or XID, but that doesn't seem that unreasonable for a single
backend to do, unless you end up running a heck of a lot of commands.
More importantly, it seems a lot more feasible to at least know how
many rows there are every time you COMMIT, which means you can
potentially avoid having to ANALYZE.

> > > No; you forgot about subtransactions.
> > 
> > Oh, I thought those were done with cmin and cmax... if that's not what
> > cmin/cmax are for, then what is?
> 
> cmin/cmax are command counters. So in the sequence:
> 
> BEGIN;
> SELECT 1;
> SELECT 2;
> 
> The second query runs as the same transaction ID but a higher command
> ID so it can see the result of the previous query. Subtransactions are
> (AIUI anyway) done by having transactions depend on other transactions.
> When you start a savepoint you start a new transaction ID whose status
> is tied to its top-level transaction ID but can also be individually
> rolled back.

Hmmm, interesting. I would have thought it was tied to CID, but I guess
XID has more of that machinery around to support rollback.
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.com  work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] XLOG_BLCKSZ vs. wal_buffers table

2006-05-11 Thread Mark Wong

Simon Riggs wrote:
> On Wed, 2006-05-10 at 09:55 -0700, Mark Wong wrote:
> > Simon Riggs wrote:
> > > Could you turn full_page_writes = off and do a few more tests? I think
> > > the full page writes is swamping the xlog and masking the performance we
> > > might see for normal small xlog writes.
> > > I'd try XLOG_BLCKSZ = 4096 and 8192 to start with. Thanks.
> > 
> > Ok, got data for XLOG_BLCKSZ at 4096, 8192, and 32768 with
> > full_page_writes = off.  The new data is at the bottom of the page:
> >   http://developer.osdl.org/markw/pgsql/xlog_blcksz.html
> 
> Mark,
> 
> What do you think is causing the step changes at 30+ and 60+ minutes on
> these tests?



I'm at a loss to explain this myself.  The increase in the i/o wait and
decrease in the response time charts look backwards to me.  And the
vmstat charts show that the i/o seems fairly consistent other than
spikes in the i/o chart.  It sort of looks like the operating system stopped
doing something, but I'm not sure how to isolate that any further.


Anyone else have a theory?

Mark

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] sblock state on FreeBSD 6.1

2006-05-11 Thread Martijn van Oosterhout
On Thu, May 11, 2006 at 02:39:14PM -0500, Jim C. Nasby wrote:
> On Thu, May 11, 2006 at 07:39:23PM +0200, Martijn van Oosterhout wrote:
> > This is an idle backend waiting for the user.
> 
> So why would that be waiting to lock the socket? My understanding is
> that nothing else should be contending for that socket, no?

I'm not sure about locks, but it will be blocking on the socket...

> > > #0  0x00080137638c in sendto () from /lib/libc.so.6
> > > #1  0x00535e67 in pgstat_report_tabstat () at pgstat.c:846
> > 
> > This is definitely the statistics collector, which is something that was
> > speculated upthread. Do you get a lot of these?
>  
> I included everything that was captured, but of course that's only a
> small sampling. If it's helpful we could probably setup something that
> would automatically grab stack traces for a larger number of backends
> and then see how many were in that state.

If you know the pids you should be able to, within gdb, just do
attach/bt/detach. gdb has some rudimentary scripting capabilities so you
might be able to do this fairly quickly.

> Yeah, my suspicion is that those processes had moved past waiting on the
> socket lock by the time gdb got to them. Any idea of how you could tell
> what state (as reported by top) the process was in when gdb stopped it?

Heh. Attaching to a process has the same effect as sending it a signal.
Any active system call is aborted and gdb traps it as it goes to
userspace. So by definition it's in running state when gdb gets it ...

Have a nice day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to 
> litigate.


signature.asc
Description: Digital signature


Re: [HACKERS] sblock state on FreeBSD 6.1

2006-05-11 Thread Jim C. Nasby
On Thu, May 11, 2006 at 07:39:23PM +0200, Martijn van Oosterhout wrote:
> On Thu, May 11, 2006 at 12:09:56PM -0500, Jim C. Nasby wrote:
> > Unfortunately, I suspect some of these were grabbed after the process
> > had already moved past whatever was holding it in sblock.
> > 
> > Here's the traces that we captured...
> > 
> > Got 2 of these:
> > #0  0x00080135bd2c in recvfrom () from /lib/libc.so.6
> > #1  0x004f9898 in secure_read (port=0x834800, ptr=0x7cebe0,   
> > len=8192) at be-secure.c:320
> 
> This is an idle backend waiting for the user.

So why would that be waiting to lock the socket? My understanding is
that nothing else should be contending for that socket, no?

> > #0  0x00080137638c in sendto () from /lib/libc.so.6
> > #1  0x00535e67 in pgstat_report_tabstat () at pgstat.c:846
> 
> This is definitely the statistics collector, which is something that was
> speculated upthread. Do you get a lot of these?
 
I included everything that was captured, but of course that's only a
small sampling. If it's helpful we could probably setup something that
would automatically grab stack traces for a larger number of backends
and then see how many were in that state.

What's interesting is that while we were able to re-create the same
state, this time we didn't see any messages in the log about the
statistics collector filling its buffer.

BTW, I should point out that the goal is to try and ensure that the
machine doesn't end up in a state where all the CPU is going to system
calls, but I'm suspecting that maybe this is an OS issue and not a
PostgreSQL issue at this point...

> > I suspect the rest of these probably happened after the sblock state had
> > cleared, but here they are anyway in case I'm wrong. Also, I removed the
> > 'incriminating evidence' from the query strings; there wasn't anything
> > unusual about those queries, so I don't think it should matter.
> 
> The rest look like backends going through normal query functions...
> There was one waiting for a lock but that's unlikely to be
> significant...

Yeah, my suspicion is that those processes had moved past waiting on the
socket lock by the time gdb got to them. Any idea of how you could tell
what state (as reported by top) the process was in when gdb stopped it?
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.com  work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] XLOG_BLCKSZ vs. wal_buffers table

2006-05-11 Thread Simon Riggs
On Wed, 2006-05-10 at 09:55 -0700, Mark Wong wrote:
> Simon Riggs wrote:
> > Could you turn full_page_writes = off and do a few more tests? I think
> > the full page writes is swamping the xlog and masking the performance we
> > might see for normal small xlog writes.
> > I'd try XLOG_BLCKSZ = 4096 and 8192 to start with. Thanks.
> 
> Ok, got data for XLOG_BLCKSZ at 4096, 8192, and 32768 with
> full_page_writes = off.  The new data is at the bottom of the page:
>   http://developer.osdl.org/markw/pgsql/xlog_blcksz.html

Mark,

What do you think is causing the step changes at 30+ and 60+ minutes on
these tests?

-- 
  Simon Riggs
  EnterpriseDB  http://www.enterprisedb.com


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] hashagg, statistisics and excessive memory allocation

2006-05-11 Thread Josh Berkus
Stefan,

> Some testing showed that the planner was seriously underestimating the
> number of distinct rows in the table (with the default statistic target
> it estimated ~150k rows while there are about 19M distinct values) and
> chosing a hashagg for the aggregate.
> uping the statistics target to 1000 improves the estimate to about 5M
> rows which unfortunably is still not enough to cause the planner to
> switch to a groupagg with work_mem set to 256000.

Well, it's pretty well-known that we need to fix n-distinct estimation.  
But we also need to handle it gracefully if the estimate is still wrong 
and we start using too much memory.  Is there any way we can check how 
much memory the hashagg actually *is* using and spill to disk if it goes 
over work_mem?

-- 
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


[HACKERS] hashagg, statistisics and excessive memory allocation

2006-05-11 Thread Stefan Kaltenbrunner
Hi!

on irc somebody complained yesterday that a simple GROUP BY on a 25M-row
integer table caused his backend to exhaust the 3GB process limit on his
32bit build (on a box with 16GB RAM).
Some testing showed that the planner was seriously underestimating the
number of distinct rows in the table (with the default statistics target
it estimated ~150k rows while there are about 19M distinct values) and
choosing a hashagg for the aggregate.
Upping the statistics target to 1000 improves the estimate to about 5M
rows, which unfortunately is still not enough to cause the planner to
switch to a groupagg with work_mem set to 256000.
Some testing seems to indicate that even with perfectly matching stats
like (8.1.3 here):

foo=# create table testtable AS select a from generate_series(1,5000000)
as a;
SELECT
foo=# CREATE INDEX test_idx on testtable (a);
CREATE INDEX
foo=# ANALYZE ;
ANALYZE
foo=# explain select a,count(*) from testtable group by a;
                              QUERY PLAN
------------------------------------------------------------------------
 HashAggregate  (cost=97014.73..159504.51 rows=4999182 width=4)
   ->  Seq Scan on testtable  (cost=0.00..72018.82 rows=4999182 width=4)
(2 rows)

the hashagg will use about 2.5x of what work_mem is set to; while that is
partly expected, it seems quite dangerous that one can, even with only
moderate underestimation of the expected result count (say 2x or 4x), run a
server out of memory.
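
For reference, this is how the statistics target was raised on the grouping
column (the global equivalent is the default_statistics_target GUC):

ALTER TABLE testtable ALTER COLUMN a SET STATISTICS 1000;
ANALYZE testtable;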


Stefan

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] [HACKERS] Big IN() clauses etc : feature proposal

2006-05-11 Thread Martijn van Oosterhout
On Thu, May 11, 2006 at 11:35:34AM -0400, Greg Stark wrote:
> I can say that I've seen plenty of instances where the ability to create
> temporary tables very quickly with no overhead over the original query would
> be useful.

I wonder if this requires what the standard refers to as a global
temporary table. As I read it (which may be wrong, I find the language
obtuse), a global temporary table is a temporary table whose structure
is predefined. So, you'd define it once, updating the catalog only once
but still get a table that is emptied each startup.

Of course, it may not be what the standard means, but it still seems
like a useful idea, to cut down on schema bloat.

Have a nice day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to 
> litigate.


signature.asc
Description: Digital signature


Re: [HACKERS] [PERFORM] Big IN() clauses etc : feature proposal

2006-05-11 Thread Martijn van Oosterhout
On Thu, May 11, 2006 at 12:18:06PM -0500, Jim C. Nasby wrote:
> > Yes, because there can be more than one active snapshot within a single
> > transaction (think about volatile functions in particular).
> 
> Any documentation on how snapshots work? They're a big mystery to me.
> :(

A snapshot is a particular view on a database. In particular, you have
to be able to view a version of the database that doesn't have your own
changes, otherwise an UPDATE would keep updating the same tuple. Also,
for example, a cursor might see an older version of the database than
queries being run. I don't know of any particular information about it
though. Google wasn't that helpful.
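
A minimal two-session sketch of the cursor case in READ COMMITTED (table t
and column val are made up; my understanding is that the cursor keeps the
snapshot taken when it was opened):

-- Session 1
BEGIN;
DECLARE c CURSOR FOR SELECT val FROM t;

-- Session 2 (autocommit)
UPDATE t SET val = val + 1;

-- Session 1, still in the same transaction
FETCH ALL FROM c;    -- shows the old values: the cursor's older snapshot
SELECT val FROM t;   -- a new statement gets a new snapshot and sees the update
COMMIT;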

> > No; you forgot about subtransactions.
> 
> Oh, I thought those were done with cmin and cmax... if that's not what
> cmin/cmax are for, then what is?

cmin/cmax are command counters. So in the sequence:

BEGIN;
SELECT 1;
SELECT 2;

The second query runs as the same transaction ID but a higher command
ID so it can see the result of the previous query. Subtransactions are
(AIUI anyway) done by having transactions depend on other transactions.
When you start a savepoint you start a new transaction ID whose status
is tied to its top-level transaction ID but can also be individually
rolled back.
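
A small psql sketch of that, using the system columns (table t is made up;
the row inserted after the savepoint carries a different xmin because the
subtransaction has its own XID):

BEGIN;
INSERT INTO t VALUES (1);
INSERT INTO t VALUES (2);
SELECT xmin, cmin, * FROM t;   -- same xmin for both rows, cmin increases per command

SAVEPOINT s1;
INSERT INTO t VALUES (3);
SELECT xmin, cmin, * FROM t;   -- the new row shows the subtransaction's XID
ROLLBACK TO s1;                -- the new row disappears; rows 1 and 2 remain
COMMIT;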

Have a nice day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to 
> litigate.


signature.asc
Description: Digital signature


Re: [HACKERS] [PERFORM] Big IN() clauses etc : feature proposal

2006-05-11 Thread Scott Marlowe
On Thu, 2006-05-11 at 12:18, Jim C. Nasby wrote:
> On Wed, May 10, 2006 at 08:31:54PM -0400, Tom Lane wrote:
> > "Jim C. Nasby" <[EMAIL PROTECTED]> writes:
> > > On Tue, May 09, 2006 at 03:13:01PM -0400, Tom Lane wrote:
> > >> PFC <[EMAIL PROTECTED]> writes:
> > >>> Fun thing is, the rowcount from a temp table (which is the problem 
> > >>> here)  
> > >>> should be available without ANALYZE ; as the temp table is not 
> > >>> concurrent,  
> > >>> it would be simple to inc/decrement a counter on INSERT/DELETE...
> > >> 
> > >> No, because MVCC rules still apply.
> > 
> > > But can anything ever see more than one version of what's in the table?
> > 
> > Yes, because there can be more than one active snapshot within a single
> > transaction (think about volatile functions in particular).
> 
> Any documentation on how snapshots work? They're a big mystery to me.
> :(

http://www.postgresql.org/docs/8.1/interactive/mvcc.html

Does the concurrency doc not cover this subject well enough? (I'm not
being sarcastic, it's a real question.)

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] sblock state on FreeBSD 6.1

2006-05-11 Thread Martijn van Oosterhout
On Thu, May 11, 2006 at 12:09:56PM -0500, Jim C. Nasby wrote:
> Unfortunately, I suspect some of these were grabbed after the process
> had already moved past whatever was holding it in sblock.
> 
> Here's the traces that we captured...
> 
> Got 2 of these:
> #0  0x00080135bd2c in recvfrom () from /lib/libc.so.6
> #1  0x004f9898 in secure_read (port=0x834800, ptr=0x7cebe0,   
> len=8192) at be-secure.c:320

This is an idle backend waiting for the user.

> #0  0x00080137638c in sendto () from /lib/libc.so.6
> #1  0x00535e67 in pgstat_report_tabstat () at pgstat.c:846

This is definitely the statistics collector, which is something that was
speculated upthread. Do you get a lot of these?

> I suspect the rest of these probably happened after the sblock state had
> cleared, but here they are anyway in case I'm wrong. Also, I removed the
> 'incriminating evidence' from the query strings; there wasn't anything
> unusual about those queries, so I don't think it should matter.

The rest look like backends going through normal query functions...
There was one waiting for a lock but that's unlikely to be
significant...

Hope this helps,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to 
> litigate.


signature.asc
Description: Digital signature


Re: [HACKERS] [PERFORM] Big IN() clauses etc : feature proposal

2006-05-11 Thread Jim C. Nasby
On Wed, May 10, 2006 at 08:31:54PM -0400, Tom Lane wrote:
> "Jim C. Nasby" <[EMAIL PROTECTED]> writes:
> > On Tue, May 09, 2006 at 03:13:01PM -0400, Tom Lane wrote:
> >> PFC <[EMAIL PROTECTED]> writes:
> >>> Fun thing is, the rowcount from a temp table (which is the problem here)  
> >>> should be available without ANALYZE ; as the temp table is not 
> >>> concurrent,  
> >>> it would be simple to inc/decrement a counter on INSERT/DELETE...
> >> 
> >> No, because MVCC rules still apply.
> 
> > But can anything ever see more than one version of what's in the table?
> 
> Yes, because there can be more than one active snapshot within a single
> transaction (think about volatile functions in particular).

Any documentation on how snapshots work? They're a big mystery to me.
:(

> > Speaking of which, if a temp table is defined as ON COMMIT DROP or
> > DELETE ROWS, there shouldn't be any need to store xmin/xmax, only
> > cmin/cmax, correct?
> 
> No; you forgot about subtransactions.

Oh, I thought those were done with cmin and cmax... if that's not what
cmin/cmax are for, then what is?
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.com  work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] sblock state on FreeBSD 6.1

2006-05-11 Thread Jim C. Nasby
On Wed, May 10, 2006 at 05:23:29PM -0500, Jim C. Nasby wrote:
> On Thu, May 11, 2006 at 12:09:32AM +0200, Martijn van Oosterhout wrote:
> > > > This stack trace doesn't make any sense.  ClosePostmasterPorts is not
> > > > calling PostgresMain.  And pq_getbyte is not calling TouchSocketFile,
> > > > which in turn isn't calling secure_read.
> > > 
> > > So I see... that's rather disturbing... any idea why gdb would end up
> > > that confused?
> > 
> > Given you don't have debug enabled, it's likely all static symbols have
> > been dropped from the symbol table and gdb is guessing by listing the
> > function with the highest address before the actual function.
> > 
> > You could try to find a consistant call tree where the functions call
> > eachother in the right order. Or just compile with debug symbols
> > enabled. It doesn't make a difference in performance or memory usage,
> > only diskspace (about 16MB on my machine, probably more on yours).
> 
> Yeah, I was actually surprised when gdb spit out real function names and
> not just gibberish since I assumed that PostgreSQL wasn't compiled with
> debug symbols. I thought my assumption was wrong when that stuff came
> out, but I guess that was a bad guess...
> 
> I'll recompile and try again... thanks!

Ok, got some good traces this time, with debug. Even better, by upping
the number of pgpool connections to 395, we were able to exactly
duplicate the problem we saw earlier in production (I know, who'd think
a dual opteron would have trouble servicing 400 simultaneous connections
at once? ;)

Unfortunately, I suspect some of these were grabbed after the process
had already moved past whatever was holding it in sblock.

Here's the traces that we captured...

Got 2 of these:
#0  0x00080135bd2c in recvfrom () from /lib/libc.so.6
#1  0x004f9898 in secure_read (port=0x834800, ptr=0x7cebe0,  
len=8192) at be-secure.c:320
#2  0x004fed7b in pq_recvbuf () at pqcomm.c:697
#3  0x004fee27 in pq_getbyte () at pqcomm.c:738
#4  0x0055febf in PostgresMain (argc=4, argv=0x83b828,  
username=0x83b740 "www") at postgres.c:289
#5  0x0053a487 in ServerLoop () at postmaster.c:2851
#6  0x0053bab7 in PostmasterMain (argc=3,  
argv=0x7fffebb0) at postmaster.c:941
#7  0x00500436 in main (argc=3, argv=0x7fffebb0) at  
main.c:265

#0  0x00080137638c in sendto () from /lib/libc.so.6
#1  0x00535e67 in pgstat_report_tabstat () at pgstat.c:846
#2  0x0055ff75 in PostgresMain (argc=4, argv=0x83b828,  
username=0x83b740 "www") at postgres.c:3162
#3  0x0053a487 in ServerLoop () at postmaster.c:2851
#4  0x0053bab7 in PostmasterMain (argc=3,  
argv=0x7fffebb0) at postmaster.c:941
#5  0x00500436 in main (argc=3, argv=0x7fffebb0) at  
main.c:265


I suspect the rest of these probably happened after the sblock state had
cleared, but here they are anyway in case I'm wrong. Also, I removed the
'incriminating evidence' from the query strings; there wasn't anything
unusual about those queries, so I don't think it should matter.

#0  0x00080138fcec in __syscall () from /lib/libc.so.6
#1  0x0054e67f in FileSeek (file=199, offset=0, whence=2) at  
fd.c:1173
#2  0x0055aca3 in mdnblocks (reln=0x9f68c0) at md.c:972
#3  0x0055bb4a in smgrnblocks (reln=0x9f68c0) at smgr.c:571
#4  0x0052fbde in get_relation_info (relationObjectId=199,  
rel=0xad3728) at plancat.c:206
#5  0x00531041 in make_reloptinfo (root=0xac13e0, relid=6,  
reloptkind=RELOPT_BASEREL) at relnode.c:146
#6  0x00521ed3 in add_base_rels_to_query (root=0xac13e0,  
jtnode=0xad2600) at initsplan.c:88
#7  0x00521e93 in add_base_rels_to_query (root=0xac13e0,  
jtnode=0xac9cd8) at initsplan.c:94
#8  0x00521ed3 in add_base_rels_to_query (root=0xac13e0,  
jtnode=0xad2710) at initsplan.c:88
#9  0x005238cf in query_planner (root=0xac13e0,  
tlist=0xace648, tuple_fraction=10, cheapest_path=0x7fffd308,
 sorted_path=0x7fffd310, num_groups=0x7fffd318) at  
planmain.c:141
#10 0x005240cb in grouping_planner (root=0xac13e0,  
tuple_fraction=10) at planner.c:796
#11 0x005252c2 in subquery_planner (parse=0x9f0140,  
tuple_fraction=0, subquery_pathkeys=0x0) at planner.c:374
#12 0x00525423 in planner (parse=0x9f0140, isCursor=0 '\0',  
cursorOptions=0, boundParams=0x0) at planner.c:134
#13 0x0055ddec in pg_plan_query (querytree=0x9f0140,  
boundParams=0x0) at postgres.c:725
#14 0x0055de7c in pg_plan_queries (querytrees=0xc7,  
boundParams=0x0, needSnapshot=0 '\0') at postgres.c:793
#15 0x0055e986 in exec_simple_query (
 query_string=0x9f0250 ""...) at postgres.c:944
#16 0x0056031e in PostgresMain (argc=4, argv=0x83b828,  
username=0x83b740 "www") at postgres.c:3217
#17 0x0053a487 in ServerLoop () at postmaster.c:2851
#18 0x0053bab7 in PostmasterMain (argc=3,  
argv=0x7fffebb0) at p

Re: [HACKERS] Bug in signal handler

2006-05-11 Thread Martijn van Oosterhout
On Thu, May 11, 2006 at 10:11:00AM -0400, Tom Lane wrote:
> Douglas McNaught <[EMAIL PROTECTED]> writes:
> > I don't disagree with your reasoning, but does POSIX actually say
> > this?
> 
> The fact remains that the postmaster has *always* been coded like that,
> and we have *never* seen any problems.  Barring proof that there is a
> problem, I'm uninterested in rewriting it just because someone doesn't
> like it.

It should probably also be remembered that the "fix" would involve either
polling the status by having select() return more often, or using
sigsetjmp/siglongjmp. The cure is definitely worse than the disease.

In a sense the test for errno == EINTR there is redundant since the
backend has arranged that EINTR can never be returned (signals don't
interrupt system calls) and there's a fair bit of code that relies on
that...

Have a nice day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to 
> litigate.


signature.asc
Description: Digital signature


Re: [PERFORM] [HACKERS] Big IN() clauses etc : feature proposal

2006-05-11 Thread Greg Stark

"Jim C. Nasby" <[EMAIL PROTECTED]> writes:

> Perhaps it would be worth creating a class of temporary tables that used
> a tuplestore, although that would greatly limit what could be done with
> that temp table.

I can say that I've seen plenty of instances where the ability to create
temporary tables very quickly with no overhead over the original query would
be useful.

For instance, at one site I had to do exactly what I always advise others
against: use offset/limit to implement paging. So first I have to execute the
query with a count(*) aggregate to get the total, then execute the same query
a second time to fetch the actual page of interest. This would be (or could be
arranged to be) within the same transaction and doesn't require the ability to
execute any DML against the tuple store, which I imagine would be the main
issue?

For bonus points what would be real neat would be if the database could notice
shared plan segments, keep around the materialized tuple store, and substitute
it instead of reexecuting that segment of the plan. Of course this requires
keeping track of transaction snapshot states and making sure it's still
correct.

> Something else worth considering is not using the normal catalog methods
> for storing information about temp tables, but hacking that together
> would probably be a rather large task.

It would be nice if using this feature didn't interact poorly with preplanning
all your queries and using the cached plans. Perhaps if you had some way to
create a single catalog entry that defined all the column names and types and
then simply pointed it at a new tuplestore each time without otherwise
altering the catalog entry?
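
That interaction already bites in plpgsql today, where a cached plan ends up
pointing at a dropped temp table's OID. A rough sketch of the failure mode
(function and table names are made up):

CREATE FUNCTION use_scratch() RETURNS bigint AS $$
DECLARE
    n bigint;
BEGIN
    CREATE TEMPORARY TABLE scratch (id integer) ON COMMIT DROP;
    INSERT INTO scratch VALUES (1);
    SELECT INTO n count(*) FROM scratch;   -- plan is cached on the first call
    RETURN n;
END;
$$ LANGUAGE plpgsql;

-- The first call works; a later call in a new transaction fails with
-- "relation with OID ... does not exist", and the usual workaround is to
-- rewrite the statement as dynamic SQL with EXECUTE so it is re-planned.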

-- 
greg


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


[HACKERS] Upcoming releases

2006-05-11 Thread Tom Lane
Per discussion among pgsql-packagers, we'll release 8.1.4 and updates in
the back branches next week; tentative plan is tarballs wrapped Monday,
public release Wednesday.  Get those last-minute fixes in now ...

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Bug in signal handler

2006-05-11 Thread Tom Lane
Douglas McNaught <[EMAIL PROTECTED]> writes:
> Martijn van Oosterhout  writes:
>> Running unsafe functions within a signal handler is not unsafe per-se.
>> It's only unsafe if the main program could also be running unsafe
>> functions.

> I don't disagree with your reasoning, but does POSIX actually say
> this?

The fact remains that the postmaster has *always* been coded like that,
and we have *never* seen any problems.  Barring proof that there is a
problem, I'm uninterested in rewriting it just because someone doesn't
like it.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Bug in signal handler

2006-05-11 Thread Martijn van Oosterhout
On Thu, May 11, 2006 at 08:24:02AM -0400, Douglas McNaught wrote:
> Martijn van Oosterhout  writes:
> 
> > Running unsafe functions within a signal handler is not unsafe per-se.
> > It's only unsafe if the main program could also be running unsafe
> > functions.
> 
> I don't disagree with your reasoning, but does POSIX actually say
> this?

On my machine, signal(2) has the following:

   The routine handler must be very careful, since processing
   elsewhere was interrupted at some arbitrary point. POSIX has the
   concept of "safe function".  If a signal interrupts an unsafe
   function, and handler calls an unsafe function, then the
   behavior is undefined. Safe functions are listed explicitly in
   the various standards.  The POSIX 1003.1-2003 list is



I haven't read POSIX myself though...

Have a nice day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to 
> litigate.


signature.asc
Description: Digital signature


Re: [HACKERS] [COMMITTERS] pgsql: Clean up code associated with updating

2006-05-11 Thread Bruce Momjian
Tom Lane wrote:
> Alvaro Herrera <[EMAIL PROTECTED]> writes:
> > Minor comment: since you removed the - lines from the index_create
> > header comment, I think pgindent will mess up the alignment for that
> > comment, which probably will not be very nice.  Same in
> > index_update_stats.  Otherwise, nice work!  I will update my relminxid
> > patch to this and re-submit.
> 
> No, left-justified comments aren't subject to reindenting (correct Bruce?)

Correct.

-- 
  Bruce Momjian   http://candle.pha.pa.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] [COMMITTERS] pgsql: Clean up code associated with updating pg_class statistics

2006-05-11 Thread Tom Lane
Alvaro Herrera <[EMAIL PROTECTED]> writes:
> Minor comment: since you removed the - lines from the index_create
> header comment, I think pgindent will mess up the alignment for that
> comment, which probably will not be very nice.  Same in
> index_update_stats.  Otherwise, nice work!  I will update my relminxid
> patch to this and re-submit.

No, left-justified comments aren't subject to reindenting (correct Bruce?)
Certainly there are plenty of other function header comments that don't
have any ---.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Bug in signal handler

2006-05-11 Thread Douglas McNaught
Martijn van Oosterhout  writes:

> Running unsafe functions within a signal handler is not unsafe per-se.
> It's only unsafe if the main program could also be running unsafe
> functions.

I don't disagree with your reasoning, but does POSIX actually say
this?

-Doug

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Bug in signal handler [Was: [TODO] Allow commenting

2006-05-11 Thread Martijn van Oosterhout
On Thu, May 11, 2006 at 01:59:46PM +0200, Zdenek Kotala wrote:
> My conclusion is that Postgres uses signal-unsafe functions (fopen, ...)
> in its signal handler, which is not safe and may generate unpredictable
> behavior after signal processing. I would like to fix it, but there are
> some waiting patches for this source and I don't know how to proceed
> correctly (with minimal merge complications).

Look at the code more carefully. The restriction is that it is unsafe
to call non-reentrant functions from within a signal handler while
there may be a non-reentrant function run by the main program.

If you look at the code in postmaster.c, the only place the signal
handler can run is between the block (line 1223) and unblock (line
1231), the only function there is select() which is specifically listed
as being safe.

Running unsafe functions within a signal handler is not unsafe per-se.
It's only unsafe if the main program could also be running unsafe
functions.

Have a nice day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to 
> litigate.


signature.asc
Description: Digital signature


Re: [HACKERS] Bug in signal handler [Was: [TODO] Allow commenting

2006-05-11 Thread Zdenek Kotala




Alvaro Herrera wrote:
> Zdenek Kotala wrote:
> 
> > I performed some investigation and I found that the signal handler
> > (SIGHUP_handler) contains a lot of code and calls signal-unsafe
> > functions. That can generate a deadlock or damage some internal data
> > structure in the standard C library. See
> > http://www.opengroup.org/onlinepubs/009695399/functions/xsh_chap02_04.html
> > for details. In my opinion it is necessary to rewrite signal handling in
> > postmaster to avoid postgres malfunction.
> 
> Perhaps you missed these lines:
> 
> /*
>  * Block all signals until we wait again.  (This makes it safe for our
>  * signal handlers to do nontrivial work.)
>  */
> PG_SETMASK(&BlockSig);
> 
> postmaster.c 1227ff

Blocking signals is false safety. It is a race condition problem: you can
receive a signal before you block it. But the main problem is different. See
this example.

If you have the following function and signal handler:

char buffer[11];

void sig_handler(int signo)
{
    f1('B');               /* re-enters f1 from the handler */
}

void f1(char ch)
{
    int n;
    for (n = 0; n < 10; n++)
        buffer[n] = ch;    /* not atomic: can be interrupted mid-loop */
}

If you call f1('A') you expect that the content of buffer will be
"AAAAAAAAAA". That is true until you receive a signal. A signal is an
asynchronous event, and if you receive it during loop processing (for
example when n=3) then the signal handler calls f1 too, but with parameter
'B'. The content of buffer will be "BBBBBBBBBB" after signal processing.
And now f1 continues with n=3, 4, 5 ..., so the final result is
"BBBAAAAAAA" instead of the expected "AAAAAAAAAA". That is all. This is,
for example, why you must not use functions like printf: they contain
static internal buffers. It is the reason why there is only a small group
of signal-safe functions.

My conclusion is that Postgres uses signal-unsafe functions (fopen, ...) in
its signal handler, which is not safe and may generate unpredictable
behavior after signal processing. I would like to fix it, but there are
some waiting patches for this source and I don't know how to proceed
correctly (with minimal merge complications).

	Zdenek


Re: [PERFORM] [HACKERS] Big IN() clauses etc : feature proposal

2006-05-11 Thread Martijn van Oosterhout
On Thu, May 11, 2006 at 09:55:15AM +0200, Zeugswetter Andreas DCP SD wrote:
> > 0.101 ms BEGIN
> > 1.451 ms CREATE TEMPORARY TABLE tmp ( a INTEGER NOT NULL, b INTEGER NOT
> > NULL, c TIMESTAMP NOT NULL, d INTEGER NOT NULL ) ON COMMIT DROP
> 
> 1.4 seconds is not great for create table, is that what we expect ?

Hmm, I'm hoping ms means milliseconds...
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to 
> litigate.


signature.asc
Description: Digital signature


Re: [HACKERS] intarray internals

2006-05-11 Thread Volkan YAZICI
Hi,

First, thanks so much for your reply.

On May 10 04:01, Teodor Sigaev wrote:
> > Again, in g_int_decompress(), I couldn't figure out the functionality of
> > below lines:
> 
> gist__int_ops use rangeset compression technique, read about in "THE 
> RD-TREE: AN INDEX STRUCTURE FOR SETS", Joseph M. Hellerstein,
> http://www.sai.msu.su/~megera/postgres/gist/papers/rd-tree.ps

Thanks so much for the papers. I read the related section (and will read
whole today or tomorrow).

> * intarray_union.patch.0 - doesn't applied, but make small optimization to 
> reduce number non-unique values. I don't believe that one pass through 
> array with a lot of ifs will be faster than two pass with simple ifs. Did 
> you some tests?

IMHO, the only significant improvement in my proposal about _int_union()
is that this method will visit the arrays only once (at the extra price of
2x the condition checks), while the current one will also make a second
visit to the arrays to remove duplicates (with simple condition checks).

You may be right, maybe it isn't worth worrying about. The improvement
(if there is any) will probably only be visible for very long arrays.
(Sorry, no tests for this proposal.)

> * intarray_same.patch.0 - move SORT as you suggest, but don't touch 
> algorithm.
>  1) if (A[0] == B[0] && A[1] == B[1] && ...)
> 
>  2) if (A[0] == B[0] && A[  N] == B[  N] &&
> A[1] == B[1] && A[N-1] == B[N-1] &&
> ...)
> 
>   Why are you sure that second a much faster? Did you make tests? Number of 
> comparisons is the same...

Yep, both algorithms have O(n) comparisons in their worst cases. But
for general purposes, AFAICS, the second one will perform better. For
instance, consider the examples below:

 [Best case for 2nd algo.]
 Input: 1, 2, 3, ..., 6, 7, *9
 1st algo.: O(n)
 2nd algo.: O(1)

 [Worst case for 2nd algo.]
 Input: 1, 2, 3, 4, *4, 6, 7, 8, 9
 1st algo.: O(n/2)
 2nd algo.: O(n)

But as you can see, because our arrays are sorted, any missing (or
additional) element in the target array will produce a padding at the
end of the array --- assuming that arrays generally don't hold
duplicate values. Therefore, making comparisons for the tail elements
will perform better because of the unmatched values caused by padding.

Hope I managed to explain what I mean. Actually, IIRC, I saw this
method (both hacks for small sized arrays and comparisons for the tail
elements of a sorted array) in another FOSS project's source code ---
probably PHP, but I'm not sure.

As for testing: if you supply suitable inputs, there is quite a large
performance improvement.

> * intarray_sort.patch.0 - doesn't applied. isort() is very often called for 
> already sorted and unique arrays (which comes from index), so it should be 
> fast as possible for sorted arrays.

Uh, sorry. I missed that point.

> As I remember ordered array is a worst 
> case for qsort(). May be, it will be better choice to use mergesort.

I'll investigate alternative methods to sort already sorted arrays.


Regards.

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] [HACKERS] Big IN() clauses etc : feature proposal

2006-05-11 Thread Zeugswetter Andreas DCP SD

> Something else worth considering is not using the normal 
> catalog methods
> for storing information about temp tables, but hacking that together
> would probably be a rather large task.

But the timings suggest that it cannot be the catalogs in the worst case
he showed.

> 0.101 ms BEGIN
> 1.451 ms CREATE TEMPORARY TABLE tmp ( a INTEGER NOT NULL, b INTEGER NOT
> NULL, c TIMESTAMP NOT NULL, d INTEGER NOT NULL ) ON COMMIT DROP

1.4 seconds is not great for create table, is that what we expect ?

> 0.450 ms INSERT INTO tmp SELECT * FROM bookmarks ORDER BY annonce_id DESC
> LIMIT 20
> 0.443 ms ANALYZE tmp
> 0.365 ms SELECT * FROM tmp
> 0.310 ms DROP TABLE tmp
> 32.918 ms COMMIT
> 
>   CREATING the table is OK, but what happens on COMMIT ? I hear the disk
> seeking frantically.

The 32 seconds for commit can hardly be catalog related. It seems the
file is 
fsynced before it is dropped.

Andreas

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] BEGIN inside transaction should be an error

2006-05-11 Thread Marko Kreen

On 5/11/06, Martijn van Oosterhout  wrote:
> On Wed, May 10, 2006 at 04:03:51PM -0500, Jim C. Nasby wrote:
> > If the existing application is trying to start a new transaction from
> > within an existing one, I'd say it's already broken and we're just
> > hiding that fact.
> 
> Well maybe, except the extra BEGIN is harmless.

It is _not_ harmless, as it will probably be followed by an 'extra' commit.
The few cases where it does not happen do not matter in light of the cases
where it does.


--
marko

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] .pgpass file and unix domain sockets

2006-05-11 Thread Martijn van Oosterhout
On Wed, May 10, 2006 at 09:34:38PM -0400, Tom Lane wrote:
> I'm not sure if that's a bug or not.  Arguably, different socket paths
> might point to different servers for which you need different passwords.
> If we did want unix-socket connections to search for "localhost"
> regardless of socket path, it'd be a simple change (change the order of
> operations in connectOptions2).  But maybe the code is right and we
> should fix the documentation.  Or maybe this whole notion of using
> "localhost" is bogus and we should always use the socket path.

Maybe something like "unix:*" would match all sockets and "unix:/tmp"
would match just that one. Or maybe just allow the special string
"unix:" match any socket and leave the rest alone.

Have a nice day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to 
> litigate.


signature.asc
Description: Digital signature