Re: [HACKERS] Postgres 8.3 archive_command

2007-11-22 Thread Zeugswetter Andreas ADI SD
> > > Perhaps we should move the successful archived message to DEBUG1 now,
> > > except for the first message after the archiver starts or when the
> > > archive_command changes, plus one message every 255 segments?
> > > That would reduce the log volume in the normal case without
> > > endangering our ability to see what is happening.
> >
> > Wouldn't it be more useful to increase the WAL segment size on such
> > installations that switch WAL files so frequently that it is a problem
> > for the log?
> >
> > This currently needs a recompile. I have wondered for some time now
> > whether 16 MB isn't too low for current hardware. Maybe it is time to
> > make the WAL segment size changeable in the conf with a clean shutdown.
>
> I think it's too late in the release cycle to fully consider all the
> implications of that. 16MB is hardcoded in lots of places. The
> performance advantages of that have been mostly removed in 8.3, you
> should note.

Oh sorry, this was definitely not meant for 8.3. And here I didn't mean
the performance of the database, but an issue for archiving the WAL
files. I think most archiving systems are not too happy with extremely
frequent backup calls. Also, the overall handling of too many WAL files
is IMHO not handy.

Andreas


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] 8.3devel slower than 8.2 under read-only load

2007-11-22 Thread Simon Riggs
On Thu, 2007-11-22 at 00:30 +0100, Guillaume Smet wrote:

> Is the data identical on both systems?

Guillaume,

Sounds comprehensive, thanks for double checking. 

Would it be possible to do these tests?

1. Compare SELECT 1;
This will allow us to remove planner and indexscan overheads from the
results, though it will still include protocol and tcop stuff.

2. Compare SELECT ... WHERE values are constants
This will cause the clients to reuse the plan already made, so it
should effectively remove planner, but not indexscan, overheads from
the test.

3. Change the test to look at integer columns only for the WHERE
clause, so we can rule out any thought that this has anything to do
with text data, collation etc.

From those tests we should be able to narrow things down to something
planner-, executor- or indexscan-related.
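
A minimal libpq harness along these lines could drive those timings (a
sketch, not something posted in this thread; the connection settings,
default query and iteration count are placeholders):

	/*
	 * qtime.c -- time N round-trips of a query over one connection.
	 * Build with: cc -o qtime qtime.c -lpq
	 */
	#include <libpq-fe.h>
	#include <stdio.h>
	#include <stdlib.h>
	#include <sys/time.h>

	int
	main(int argc, char **argv)
	{
		const char *query = (argc > 1) ? argv[1] : "SELECT 1;";
		int			iters = (argc > 2) ? atoi(argv[2]) : 10000;
		PGconn	   *conn = PQconnectdb("");	/* uses PG* environment vars */
		struct timeval t0, t1;
		int			i;

		if (PQstatus(conn) != CONNECTION_OK)
		{
			fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
			return 1;
		}

		gettimeofday(&t0, NULL);
		for (i = 0; i < iters; i++)
		{
			PGresult   *res = PQexec(conn, query);

			if (PQresultStatus(res) != PGRES_TUPLES_OK)
				fprintf(stderr, "query failed: %s", PQerrorMessage(conn));
			PQclear(res);
		}
		gettimeofday(&t1, NULL);

		printf("%d x \"%s\": %.3f s\n", iters, query,
			   (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6);

		PQfinish(conn);
		return 0;
	}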

Thanks,

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Postgres 8.3 archive_command

2007-11-22 Thread Zeugswetter Andreas ADI SD

> I don't think that should even be a TODO item --- it seems far more
> likely to provide a foot-gun than useful capability.

On further reflection I think that initdb time is probably sufficient.
Do you think that would be a reasonable TODO?

> Whether 16MB is still a reasonable default segment size is worth
> questioning, though I don't think that increasing it is an
> open-and-shut proposition.  Larger segments mean more overhead in
> configurations that force frequent segment switches, for instance.

Yes, IMHO there is no one-size-fits-all (if there were, it would
probably be between 32 and 64 MB).
But there are installations where even 16 MB is too much, e.g. for an
embedded device.

Andreas

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] 8.3devel slower than 8.2 under read-only load

2007-11-22 Thread Gregory Stark

Tom Lane [EMAIL PROTECTED] writes:

> The weird thing is that after a couple of hours of poking at it with
> oprofile and other sharp objects, I have no idea *why* it's slower.
> oprofile shows just about the same relative percentages for all the
> hot-spot functions in the backend.

Out of curiosity have you recompiled 8.2.5 recently? That is, are they
compiled with the same version of gcc?

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL 
training!

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Postgres 8.3 archive_command

2007-11-22 Thread Simon Riggs
On Thu, 2007-11-22 at 10:34 +0100, Zeugswetter Andreas ADI SD wrote:
> > I don't think that should even be a TODO item --- it seems far more
> > likely to provide a foot-gun than useful capability.
>
> On further reflection I think that initdb time is probably sufficient.
> Do you think that would be a reasonable TODO?

I think you'd have to explain why this was needed. It was useful for
performance once, but not anymore.

There are many possible errors there, so it's not a road I would go
down without good reason. Extra configuration is just one more thing to
go wrong and one more thing for people to misconfigure.

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] strange bison, cannot remove reduce

2007-11-22 Thread Pavel Stehule
On 22/11/2007, Peter Eisentraut [EMAIL PROTECTED] wrote:
> Pavel Stehule wrote:
> > I am playing with methods. It's more or less a function with a hidden
> > first argument.
> >
> > example: sin(10) ~ (10).sin() -- the two are equivalent.
> > substring('',1,3).upper() is legal too, etc.
> >
> > I spent some time with bison (without success).
>
> I don't think you can actually resolve this in the parser.  For example
>
> a.b(x)
>
> could be: call function b(x) in schema a, or call function b(a, x).
>
> You need to resolve this later, with catalog access, it appears.


Yes, I know, but I have to get through the parser first.

Pavel

> --
> Peter Eisentraut
> http://developer.postgresql.org/~petere/


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


[HACKERS] Internal document bug?

2007-11-22 Thread Tatsuo Ishii
Hi,

53.3. Database Page Layout

in table 53-2, 

ItemPointerData: Array of (offset,length) pairs pointing to the actual
items. 4 bytes per item.

The explanation should be for ItemIdData, not for ItemPointerData, I think.
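
For reference, the 4-byte line pointer being described looks roughly
like this in src/include/storage/itemid.h (quoted from memory, so treat
the details as approximate):

	typedef struct ItemIdData
	{
		unsigned	lp_off:15,		/* offset to tuple (from start of page) */
					lp_flags:2,		/* state of item pointer */
					lp_len:15;		/* byte length of tuple */
	} ItemIdData;
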
--
Tatsuo Ishii
SRA OSS, Inc. Japan

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] run_build.pl ~ By Andrew Dunstan

2007-11-22 Thread Andrew Dunstan



cinu wrote:
> Hi All,
>
> I was exploring the BuildFarm perl script run_build.pl. I executed the
> script and it went on fine, downloading the latest PostgreSQL source
> code (8.3Beta2); after successful completion of the script it creates
> the required set of logfiles in the lastrun-logs folder.
>
> Now inside this script there are certain places where the rmtree
> command is used for removing the installation. If I comment out these
> places it should not remove the current installation, but in fact it
> is doing so: it installs the latest version of PostgreSQL in the
> location /HEAD/inst/bin and after the execution of the script it
> cleans out the folder inst.
>
> Please tell me:
> 1) Is rmtree the command used for removing the installation, or is
> there any other command for the same? Please specify.
> 2) Is there any site or portal where I can get a detailed description
> of the perl script run_build.pl?
> 3) If I rename the perl script run_build.pl to some other file like
> test.pl, will it execute or will it give errors?
>
> For your information: I am running this script through the postgres
> user.


First, -hackers is not the appropriate forum for questions regarding the 
buildfarm code.


Second, there should not be any need to change the script. There are 
switches to do most of the things you might want to do.


Third, unless you show the exact changes you have made, it is impossible
to comment further. Why don't you say what it is you actually want to
do, and why?


cheers

andrew



---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


[HACKERS] Ordered Append Node

2007-11-22 Thread Gregory Stark

I've been hacking on the idea of an Append node which maintains the
ordering of its subtables, merging their records in order. This is
important for partitioned tables, since otherwise a lot of plans, such
as merge joins, are not available.

The logic I've followed is as follows:

1) Go through all subrels asking for any interesting pathkey lists. Gather up
   the union of all of these.

2) Go back through all the subrels and for each accumulated pathkey list ask
   for the best path for that subrel for that pathkey list. 

3) Generate two append paths for each of these pathkey lists, one using the
   best startup_cost subpath and one with the best total_cost subpath
   available for the pathkey list for each child rel. If there is none
   available, take the best unordered path and put a sort node above it.

4) Additionally advertise the traditional unordered append node, which our
   parent could choose to put a sort node above, same as ever.

5) The new Append plan nodes look a lot like Sort plan nodes glued onto an
   Append plan node, with sort function oids and so on.

6) The Append exec state nodes look a lot like (a few fields from) a
   Tuplesortstate glued onto an Append node. They have the ScanKey array and
   the fmgr sort functions. They also have an array of TupleTableSlots and a
   heap of indexes into that array.

7) ExecAppend does something similar to tuplesort's bounded sort (I fear I'm
   going to get typecasted) or, more to the point, similar to the final merge
   of a merge sort. It directly returns the slot the child rel last returned
   to it.
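
To make the merge in step 7 concrete, here is a plain-C sketch of the
heap discipline involved (illustrative only, with invented names -- this
is not code from the patch):

	typedef int (*child_cmp_fn) (int child_a, int child_b, void *arg);

	typedef struct MergeHeap
	{
		int		   *heap;		/* child indexes; heap[0] is the smallest */
		int			size;
		child_cmp_fn cmp;		/* compares two children's current tuples */
		void	   *arg;
	} MergeHeap;

	static void
	sift_down(MergeHeap *h, int i)
	{
		for (;;)
		{
			int		left = 2 * i + 1;
			int		right = 2 * i + 2;
			int		best = i;
			int		tmp;

			if (left < h->size && h->cmp(h->heap[left], h->heap[best], h->arg) < 0)
				best = left;
			if (right < h->size && h->cmp(h->heap[right], h->heap[best], h->arg) < 0)
				best = right;
			if (best == i)
				break;
			tmp = h->heap[i];
			h->heap[i] = h->heap[best];
			h->heap[best] = tmp;
			i = best;
		}
	}

	/*
	 * Returns the child whose current tuple comes next in the merged
	 * output, or -1 once every child is exhausted.  After the caller
	 * loads that child's next tuple (or swaps in heap[size - 1] and
	 * shrinks the heap if the child ran out), sift_down(h, 0) restores
	 * the invariant.
	 */
	static int
	merge_top(MergeHeap *h)
	{
		return (h->size > 0) ? h->heap[0] : -1;
	}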


Open questions:

1) I still haven't completely figured out what to do with equivalence classes.
   My hack of just stuffing all the append subrel vars into there seems to
   work fine. I need to understand what's going on to see if there's really a
   problem with it or not.

2) I'm not sure this code will work when the append rel is a target (ie UPDATE
   and DELETE stmts).

3) It does seem to work when the columns in the subrels don't line up but I
   didn't do anything special to handle this case.

4) I haven't handled mark/restore or random access. I think they could be
   handled and they'll probably be worth the complexity but I'm not sure.

5) Is it considering too many paths? Are there some which maybe aren't worth
   considering? For example, maybe it only makes sense to take the best
   startup_cost paths, since if that plan doesn't dominate then the best
   total_cost plan is likely to be the sequential scans + unordered append
   + sort.

6) I haven't looked at setops yet but this is particularly attractive for the
   UNION (as opposed to UNION ALL) case.

7) I copied/adapted a bunch of bits from tuplesort to maintain the heap and do
   the scankey comparisons. I could refactor that code back into tuplesort but
   it would mean twisting tuplesort around quite a bit. Essentially it would
   mean introducing a new type of tuplesort which would start off in
   FINAL_MERGE state, only it would have to behave differently, since we don't
   want it prefetching lots of records the way FINAL_MERGE does, I don't
   think.

Some example plans (though note that the first was with a lot of enable_*
parameters set to off).

                                                       QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=76.03..107.63 rows=12 width=12) (actual time=0.435..0.706 rows=11 loops=1)
   Merge Cond: (public.z.i = x.i)
   ->  Append  (cost=22.36..53.66 rows=12 width=8) (actual time=0.365..0.440 rows=12 loops=1)
         ->  Index Scan using zi on z  (cost=0.00..8.27 rows=1 width=8) (actual time=0.089..0.091 rows=1 loops=1)
         ->  Index Scan using z1i on z1 z  (cost=0.00..12.28 rows=2 width=8) (actual time=0.063..0.068 rows=2 loops=1)
         ->  Index Scan using z2i on z2 z  (cost=0.00..12.30 rows=3 width=8) (actual time=0.060..0.066 rows=3 loops=1)
         ->  Index Scan using z3i on z3 z  (cost=0.00..12.33 rows=5 width=8) (actual time=0.059..0.070 rows=5 loops=1)
         ->  Index Scan using zzi on zz z  (cost=0.00..8.27 rows=1 width=8) (actual time=0.076..0.079 rows=1 loops=1)
   ->  Materialize  (cost=53.67..53.79 rows=12 width=8) (actual time=0.051..0.170 rows=12 loops=1)
         ->  Append  (cost=22.36..53.66 rows=12 width=8) (actual time=0.036..0.104 rows=12 loops=1)
               ->  Index Scan using zi on z x  (cost=0.00..8.27 rows=1 width=8) (actual time=0.006..0.006 rows=1 loops=1)
               ->  Index Scan using z1i on z1 x  (cost=0.00..12.28 rows=2 width=8) (actual time=0.004..0.009 rows=2 loops=1)
               ->  Index Scan using z2i on z2 x  (cost=0.00..12.30 rows=3 width=8) (actual time=0.005..0.014 rows=3 loops=1)
               ->  Index Scan using z3i on z3 x  (cost=0.00..12.33 rows=5 width=8) (actual time=0.004..0.016 rows=5 loops=1)
               ->

Re: [HACKERS] 8.3devel slower than 8.2 under read-only load

2007-11-22 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes:
> Tom Lane [EMAIL PROTECTED] writes:
> > The weird thing is that after a couple of hours of poking at it with
> > oprofile and other sharp objects, I have no idea *why* it's slower.
> > oprofile shows just about the same relative percentages for all the
> > hot-spot functions in the backend.
>
> Out of curiosity have you recompiled 8.2.5 recently? That is, are they
> compiled with the same version of gcc?

CVS tip of both branches, freshly compiled for this test.

regards, tom lane

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] Internal document bug?

2007-11-22 Thread Tom Lane
Tatsuo Ishii [EMAIL PROTECTED] writes:
> 53.3. Database Page Layout
>
> in table 53-2,
>
> ItemPointerData: Array of (offset,length) pairs pointing to the actual
> items. 4 bytes per item.
>
> The explanation should be for ItemIdData, not for ItemPointerData, I think.

Yeah, you're right.  Please fix.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Postgres 8.3 archive_command

2007-11-22 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes:
> On Thu, 2007-11-22 at 10:34 +0100, Zeugswetter Andreas ADI SD wrote:
> > On further reflection I think that initdb time is probably sufficient.
> > Do you think that would be a reasonable TODO?
>
> I think you'd have to explain why this was needed. It was useful for
> performance once, but not anymore.

Yeah, some demonstration that choosing another size is actually useful
would be a good idea before we go to the trouble of making it runtime
configurable.  It's already build-time configurable, and I'm not
convinced that that's not sufficient.
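
For reference, "build-time configurable" here means editing a
compile-time constant and rebuilding. In the sources of this era the
segment size boils down to a define along these lines (the macro name
is real; treat its exact location as from memory), and changing it
requires a recompile plus a fresh initdb:

	#define XLOG_SEG_SIZE	(16 * 1024 * 1024)	/* stock 16 MB segments */

	/* e.g., for fewer, larger segments on a high write-rate system: */
	/* #define XLOG_SEG_SIZE	(64 * 1024 * 1024) */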

regards, tom lane

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] 8.3devel slower than 8.2 under read-only load

2007-11-22 Thread Tom Lane
Guillaume Smet [EMAIL PROTECTED] writes:
> On Nov 22, 2007 6:44 AM, Tom Lane [EMAIL PROTECTED] wrote:
> > Are you examining only trivial queries?  I've been able to identify a
> > couple of new planner hotspots that could explain some slowdown if the
> > planning time is material compared to the execution time.  If you're
> > seeing a slowdown on queries that run for awhile, that would be
> > something else ...
>
> Yes, I kept only queries with no join and a couple of where
> conditions. As I explained previously, I can reproduce the behavior
> with a single index scan on only one table (plan posted previously).
> If anyone is interested I can post the contents of this table (there's
> nothing confidential in it, so I should have the customer's permission)
> and a couple of instructions to reproduce the test case.

I don't think you need to --- the read-only transaction case built
into pgbench is probably an equivalent test.  What it looks like to
me is that the EquivalenceClass mechanism has added a little bit of
overhead, which isn't actually buying much of anything in these
trivial cases.  I'll look at whether it can be short-circuited.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] 8.3devel slower than 8.2 under read-only load

2007-11-22 Thread Guillaume Smet
On Nov 22, 2007 5:00 PM, Tom Lane [EMAIL PROTECTED] wrote:
> Gregory Stark [EMAIL PROTECTED] writes:
> > Out of curiosity have you recompiled 8.2.5 recently? That is, are they
> > compiled with the same version of gcc?
>
> CVS tip of both branches, freshly compiled for this test.

And in my case, a vanilla 8.2.5 compiled on the same box with the same compiler.

--
Guillaume

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] 8.3devel slower than 8.2 under read-only load

2007-11-22 Thread Guillaume Smet
Tom,

On Nov 22, 2007 10:29 AM, Simon Riggs [EMAIL PROTECTED] wrote:
> Sounds comprehensive, thanks for double checking.
>
> Would it be possible to do these tests?
>
> [snip additional tests]

Do you want me to perform additional tests or are you pretty sure of
what the problem is?

I thought I could also perform a test on CVS head every month from
December 2006 to now to see if it can give us a better idea of when
the overhead first appeared. Ping me if you're interested in it.

--
Guillaume

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] 8.3devel slower than 8.2 under read-only load

2007-11-22 Thread Jonah H. Harris
On Nov 22, 2007 12:45 PM, Guillaume Smet [EMAIL PROTECTED] wrote:
> I thought I could also perform a test on CVS head every month from
> December 2006 to now to see if it can give us a better idea of when
> the overhead first appeared. Ping me if you're interested in it.

If I recall correctly, this issue came in somewhere between HOT and the
select-only improvement fix.  Though I'm not sure whether I had tested
it prior to that.

-- 
Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324
EnterpriseDB Corporation| fax: 732.331.1301
499 Thornall Street, 2nd Floor  | [EMAIL PROTECTED]
Edison, NJ 08837| http://www.enterprisedb.com/

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] 8.3devel slower than 8.2 under read-only load

2007-11-22 Thread Tom Lane
Guillaume Smet [EMAIL PROTECTED] writes:
> I thought I could also perform a test on CVS head every month from
> December 2006 to now to see if it can give us a better idea of when
> the overhead first appeared. Ping me if you're interested in it.

If you feel like doing that, it might be interesting just on general
principles ...

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


[HACKERS] Autovacuum and OldestXmin

2007-11-22 Thread Simon Riggs
I notice that Slony records the oldestxmin that was current when it
last ran a VACUUM on its tables. This allows Slony to avoid running a
VACUUM when it would be clearly pointless to do so.

AFAICS autovacuum does not do this, or did I miss that?

It seems easy to add (another, groan) column onto pg_stat_user_tables
to record the oldestxmin as of when the table was last vacuumed
(last_autovacuum_xmin).

That would avoid pointless VACUUMs for all users (in 8.4).
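
A rough sketch of the kind of check this would enable (hypothetical
code, not from any posted patch; last_autovacuum_xmin is the invented
column above, while GetOldestXmin() is the real backend function):

	static bool
	vacuum_would_be_futile(TransactionId last_autovacuum_xmin)
	{
		TransactionId oldest = GetOldestXmin(true, true);

		/*
		 * If the global xmin horizon has not advanced past the value
		 * recorded at the previous vacuum, no additional tuples can have
		 * become removable in the meantime, so a new VACUUM would find
		 * nothing to reclaim.
		 */
		return TransactionIdIsValid(last_autovacuum_xmin) &&
			   !TransactionIdPrecedes(last_autovacuum_xmin, oldest);
	}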

Strangely HOT does this at the page level to avoid useless work, yet
stranger still VACUUM doesn't evaluate PageIsPrunable() at all and
always scans each page regardless.

Why isn't VACUUM optimised the same way HOT is?
Why doesn't VACUUM continue on to the next block when !PageIsPrunable()?
Nothing is documented, though it seems obvious that it should be.

Perhaps an integration oversight?

[Also, there is a comment saying "this is a bug" in autovacuum.c.
Are we thinking to go production with that phrase in the code?]

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com


---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] Autovacuum and OldestXmin

2007-11-22 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
> Simon Riggs wrote:
> > [Also, there is a comment saying "this is a bug" in autovacuum.c.
> > Are we thinking to go production with that phrase in the code?]
>
> Yeah, well, it's only a comment ;-)  The problem is that a worker can
> decide that a table needs to be vacuumed even if another worker has
> finished vacuuming it in the last 500 ms.  I proposed a mechanism to
> close the hole but it was too much of a hassle.
>
> Maybe we could remove the comment for the final release? :-)

What, you think we should try to hide our shortcomings?  There are
hundreds of XXX and FIXME comments in the sources.

regards, tom lane

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] Autovacuum and OldestXmin

2007-11-22 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes:
> Why isn't VACUUM optimised the same way HOT is?

It doesn't do the same things HOT does.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Autovacuum and OldestXmin

2007-11-22 Thread Alvaro Herrera
Simon Riggs wrote:
> I notice that Slony records the oldestxmin that was current when it
> last ran a VACUUM on its tables. This allows Slony to avoid running a
> VACUUM when it would be clearly pointless to do so.
>
> AFAICS autovacuum does not do this, or did I miss that?

Hmm, I think it's just because nobody suggested it and I didn't come up
with the idea.

Whether it's a useful thing to do is a different matter.  Why store it
per table and not more widely?  Perhaps per database would be just as
useful; and maybe it would allow us to skip running autovac workers
when there is no point in doing so.


> Why isn't VACUUM optimised the same way HOT is?
> Why doesn't VACUUM continue on to the next block when !PageIsPrunable()?
> Nothing is documented, though it seems obvious that it should be.
>
> Perhaps an integration oversight?

Yeah.

> [Also, there is a comment saying "this is a bug" in autovacuum.c.
> Are we thinking to go production with that phrase in the code?]

Yeah, well, it's only a comment ;-)  The problem is that a worker can
decide that a table needs to be vacuumed even if another worker has
finished vacuuming it in the last 500 ms.  I proposed a mechanism to
close the hole but it was too much of a hassle.

Maybe we could remove the comment for the final release? :-)

-- 
Alvaro Herrera   Valdivia, Chile   ICBM: S 39º 49' 18.1, W 73º 13' 56.4
Management by consensus: I have decided; you concede.
(Leonard Liu)

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] Autovacuum and OldestXmin

2007-11-22 Thread Heikki Linnakangas

Simon Riggs wrote:
> On Thu, 2007-11-22 at 13:21 -0500, Tom Lane wrote:
> > Simon Riggs [EMAIL PROTECTED] writes:
> > > Why isn't VACUUM optimised the same way HOT is?
> >
> > It doesn't do the same things HOT does.
>
> Thanks for the enlightenment :-)
>
> Clearly much of the code in heap_page_prune_opt() differs, yet the test
> for if (!PageIsPrunable(...)) could be repeated inside the main block
> scan loop in lazy_scan_heap().
>
> My thought-experiment:
>
> - a long running transaction is in progress
> - HOT cleans a block and then the block is not touched for a while;
>   the total of all uncleanable updates causes a VACUUM to be
>   triggered, which then scans the table, sees the block and scans
>   the block again because...
>
> a) it could have checked !PageIsPrunable(), but didn't
>
> b) it is important that it attempt to clean the block again for
>    reason...?


There might be dead tuples left over by aborted INSERTs, for example,
which don't set the Prunable-flag.

Even if we could use PageIsPrunable, it would be a bad thing from a
robustness point of view. If we ever failed to set the Prunable-flag on
a page for some reason, VACUUM would never remove the dead tuples.

Besides, I don't remember anyone complaining about VACUUM's CPU usage,
so it doesn't really matter.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] Autovacuum and OldestXmin

2007-11-22 Thread Simon Riggs
On Thu, 2007-11-22 at 19:02 +0000, Heikki Linnakangas wrote:

> Even if we could use PageIsPrunable, it would be a bad thing from a
> robustness point of view. If we ever failed to set the Prunable-flag on
> a page for some reason, VACUUM would never remove the dead tuples.

That's a killer reason, I suppose. I was really trying to uncover what
the thinking was, so we can document it. Having VACUUM ignore it
completely seems wrong.

> Besides, I don't remember anyone complaining about VACUUM's CPU usage,
> so it doesn't really matter.

Recall anybody saying how much they love it? ;-)

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Autovacuum and OldestXmin

2007-11-22 Thread Simon Riggs
On Thu, 2007-11-22 at 15:20 -0300, Alvaro Herrera wrote:
> Simon Riggs wrote:
> > I notice that Slony records the oldestxmin that was current when it
> > last ran a VACUUM on its tables. This allows Slony to avoid running
> > a VACUUM when it would be clearly pointless to do so.
> >
> > AFAICS autovacuum does not do this, or did I miss that?
>
> Hmm, I think it's just because nobody suggested it and I didn't come
> up with the idea.

OK, well, me neither :-(

...and I never thought to look at slony before now.

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Autovacuum and OldestXmin

2007-11-22 Thread Simon Riggs
On Thu, 2007-11-22 at 13:21 -0500, Tom Lane wrote:
> Simon Riggs [EMAIL PROTECTED] writes:
> > Why isn't VACUUM optimised the same way HOT is?
>
> It doesn't do the same things HOT does.

Thanks for the enlightenment :-)

Clearly much of the code in heap_page_prune_opt() differs, yet the test
for if (!PageIsPrunable(...)) could be repeated inside the main block
scan loop in lazy_scan_heap().

My thought-experiment:

- a long running transaction is in progress
- HOT cleans a block and then the block is not touched for a while; the
  total of all uncleanable updates causes a VACUUM to be triggered,
  which then scans the table, sees the block and scans the block again
  because...

a) it could have checked !PageIsPrunable(), but didn't

b) it is important that it attempt to clean the block again for
   reason...?

Seems like the thought experiment could occur frequently.

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Ordered Append Node

2007-11-22 Thread Markus Schiltknecht

Hello Gregory,

Gregory Stark wrote:

> I've been hacking on the idea of an Append node which maintains the
> ordering of its subtables, merging their records in order. This is
> important for partitioned tables, since otherwise a lot of plans,
> such as merge joins, are not available.

Cool!

Some time ago, I was trying something very similar, but didn't get
very far. I'd like to share my thoughts anyway.

First off, I envisioned that append node to be applicable also to
unordered reading from multiple partitions, simply interleaving between
the partitions.



> The logic I've followed is as follows:
>
> 1) Go through all subrels asking for any interesting pathkey lists.
>    Gather up the union of all of these.

I also tried to modify the Append node first, then figured that it might
be better to base it on the merge join node instead. While this seems
farther away, I had the hope that a binary tree of such 'plain merge'
nodes would require fewer comparisons in total.

Plus, I found it a lot simpler to cope with exactly two input relations
instead of n, as with the append node. :-)



> 2) Go back through all the subrels and for each accumulated pathkey list
>    ask for the best path for that subrel for that pathkey list.
>
> 3) Generate two append paths for each of these pathkey lists, one using
>    the best startup_cost subpath and one with the best total_cost subpath
>    available for the pathkey list for each child rel. If there is none
>    available, take the best unordered path and put a sort node above it.
>
> 4) Additionally advertise the traditional unordered append node, which
>    our parent could choose to put a sort node above, same as ever.
>
> 5) The new Append plan nodes look a lot like Sort plan nodes glued onto
>    an Append plan node, with sort function oids and so on.
>
> 6) The Append exec state nodes look a lot like (a few fields from) a
>    Tuplesortstate glued onto an Append node. They have the ScanKey array
>    and the fmgr sort functions. They also have an array of
>    TupleTableSlots and a heap of indexes into that array.
>
> 7) ExecAppend does something similar to tuplesort's bounded sort (I fear
>    I'm going to get typecasted) or, more to the point, similar to the
>    final merge of a merge sort. It directly returns the slot the child
>    rel last returned to it.


Uh.. well, yeah. I guess you have a better understanding of the planner
and executor than I do.



> Open questions:
>
> 1) I still haven't completely figured out what to do with equivalence
>    classes. My hack of just stuffing all the append subrel vars into
>    there seems to work fine. I need to understand what's going on to see
>    if there's really a problem with it or not.
>
> 2) I'm not sure this code will work when the append rel is a target
>    (ie UPDATE and DELETE stmts).
>
> 3) It does seem to work when the columns in the subrels don't line up
>    but I didn't do anything special to handle this case.


Uh.. is there any use case for that? WRT partitioning certainly not, is 
there?


I'll have a look at your patch.

Regards

Markus


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Ordered Append Node

2007-11-22 Thread Gregory Stark

Markus Schiltknecht [EMAIL PROTECTED] writes:

> > 1) Go through all subrels asking for any interesting pathkey lists.
> >    Gather up the union of all of these.
>
> I also tried to modify the Append node first, then figured that it might
> be better to base it on the merge join node instead. While this seems
> farther away, I had the hope that a binary tree of such 'plain merge'
> nodes would require fewer comparisons in total.

It is kind of like a merge join but not quite. It's interleaving rows
rather than matching them up. It's more like the final merge of a sort,
which also uses a heap to efficiently find the next value from the
source tapes.

> > 3) It does seem to work when the columns in the subrels don't line up
> >    but I didn't do anything special to handle this case.
>
> Uh.. is there any use case for that? WRT partitioning certainly not, is
> there?

Not necessarily but it is something Postgres supports and I don't think we
want to break it. Actually it's useful for partitioned tables if you build the
new partition in a separate table and then add it to the partitioned table. In
that case you may have gone through several steps of adding columns and
dropping them to get the structure to line up.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's PostGIS support!

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Test lab

2007-11-22 Thread Simon Riggs
On Thu, 2007-11-08 at 13:34 -0800, Joshua D. Drake wrote:
> On Sun, 04 Nov 2007 18:55:59 +0000
> Simon Riggs [EMAIL PROTECTED] wrote:
> > > What I have up and have ready access to is a HP DL 585. It has 8
> > > cores (Opteron), 32GB of RAM and 28 spindles over 4 channels.
> > >
> > > My question, -hackers, is: who wants first bite and what do they
> > > want :)
> >
> > I'll take a few slots, probably 3 x 1 days, at least a week apart.
> > Won't be able to start before 19th Nov.
>
> Sorry I missed this. We are awaiting provisioning.

Is there any update on this please?

I'm thinking of some performance regression testing to see what else is
lurking around the corner for us.

Thanks,

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com


---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] Test lab

2007-11-22 Thread Heikki Linnakangas

Simon Riggs wrote:
> On Thu, 2007-11-08 at 13:34 -0800, Joshua D. Drake wrote:
> > On Sun, 04 Nov 2007 18:55:59 +0000
> > Simon Riggs [EMAIL PROTECTED] wrote:
> > > > What I have up and have ready access to is a HP DL 585. It has
> > > > 8 cores (Opteron), 32GB of RAM and 28 spindles over 4 channels.
> > > >
> > > > My question, -hackers, is: who wants first bite and what do they
> > > > want :)
> > >
> > > I'll take a few slots, probably 3 x 1 days, at least a week apart.
> > > Won't be able to start before 19th Nov.
> >
> > Sorry I missed this. We are awaiting provisioning.
>
> Is there any update on this please?
>
> I'm thinking of some performance regression testing to see what else
> is lurking around the corner for us.


If you have something you can just throw over the fence, I can run stuff 
on Imola as well.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] [GENERAL] possible to create multivalued index from xpath() results in 8.3?

2007-11-22 Thread Peter Eisentraut
Tom Lane wrote:
> Given the actual behavior of xmltotext_with_xmloption, it certainly
> seems like a pretty useless error check.  Also, xml_out doesn't behave
> that way, so why should xmltotext?
>
> The volatility markings of xml_in and texttoxml seem wrong too.

This is the patch that came out of it.

> Should we force initdb to correct these pg_proc entries, or just quietly
> change pg_proc.h?

Considering the extent of the changes, I'd be in favor of forcing an initdb.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/
diff -ur ../cvs-pgsql/src/backend/utils/adt/xml.c ./src/backend/utils/adt/xml.c
--- ../cvs-pgsql/src/backend/utils/adt/xml.c	2007-11-22 21:52:30.0 +0100
+++ ./src/backend/utils/adt/xml.c	2007-11-22 23:21:25.0 +0100
@@ -535,7 +535,8 @@
 {
 	xmltype    *data = PG_GETARG_XML_P(0);
 
-	PG_RETURN_TEXT_P(xmltotext_with_xmloption(data, xmloption));
+	/* It's actually binary compatible. */
+	return (text *) data;
 }
 
 
diff -ur ../cvs-pgsql/src/include/catalog/pg_cast.h ./src/include/catalog/pg_cast.h
--- ../cvs-pgsql/src/include/catalog/pg_cast.h	2007-11-16 21:25:20.0 +0100
+++ ./src/include/catalog/pg_cast.h	2007-11-22 23:34:23.0 +0100
@@ -292,7 +292,7 @@
 DATA(insert (  650	 25  730 a ));
 DATA(insert (  869	 25  730 a ));
 DATA(insert (	16	 25 2971 a ));
-DATA(insert (  142	 25 2922 a ));
+DATA(insert (  142	 25	0 a ));
 DATA(insert (	25	142 2896 e ));
 
 /*
@@ -303,7 +303,7 @@
 DATA(insert (  650 1043  730 a ));
 DATA(insert (  869 1043  730 a ));
 DATA(insert (	16 1043 2971 a ));
-DATA(insert (  142 1043 2922 a ));
+DATA(insert (  142 1043	0 a ));
 DATA(insert ( 1043	142 2896 e ));
 
 /*
@@ -314,7 +314,7 @@
 DATA(insert (  650 1042  730 a ));
 DATA(insert (  869 1042  730 a ));
 DATA(insert (	16 1042 2971 a ));
-DATA(insert (  142 1042 2922 a ));
+DATA(insert (  142 1042	0 a ));
 DATA(insert ( 1042	142 2896 e ));
 
 /*
diff -ur ../cvs-pgsql/src/include/catalog/pg_proc.h ./src/include/catalog/pg_proc.h
--- ../cvs-pgsql/src/include/catalog/pg_proc.h	2007-11-16 21:25:20.0 +0100
+++ ./src/include/catalog/pg_proc.h	2007-11-22 23:21:07.0 +0100
@@ -4028,13 +4028,13 @@
 DESCR("release all advisory locks");
 
 /* XML support */
-DATA(insert OID = 2893 (  xml_in		   PGNSP PGUID 12 1 0 f f t f i 1 142 2275 _null_ _null_ _null_ xml_in - _null_ _null_ ));
+DATA(insert OID = 2893 (  xml_in		   PGNSP PGUID 12 1 0 f f t f s 1 142 2275 _null_ _null_ _null_ xml_in - _null_ _null_ ));
 DESCR("I/O");
 DATA(insert OID = 2894 (  xml_out		   PGNSP PGUID 12 1 0 f f t f i 1 2275 142 _null_ _null_ _null_ xml_out - _null_ _null_ ));
 DESCR("I/O");
 DATA(insert OID = 2895 (  xmlcomment	   PGNSP PGUID 12 1 0 f f t f i 1 142 25 _null_ _null_ _null_ xmlcomment - _null_ _null_ ));
 DESCR("generate an XML comment");
-DATA(insert OID = 2896 (  xml			   PGNSP PGUID 12 1 0 f f t f i 1 142 25 _null_ _null_ _null_ texttoxml - _null_ _null_ ));
+DATA(insert OID = 2896 (  xml			   PGNSP PGUID 12 1 0 f f t f s 1 142 25 _null_ _null_ _null_ texttoxml - _null_ _null_ ));
 DESCR("perform a non-validating parse of a character string to produce an XML value");
 DATA(insert OID = 2897 (  xmlvalidate	   PGNSP PGUID 12 1 0 f f t f i 2 16 142 25 _null_ _null_ _null_ xmlvalidate - _null_ _null_ ));
 DESCR("validate an XML value");
@@ -4046,7 +4046,7 @@
 DESCR("aggregate transition function");
 DATA(insert OID = 2901 (  xmlagg		   PGNSP PGUID 12 1 0 t f f f i 1 142 142 _null_ _null_ _null_ aggregate_dummy - _null_ _null_ ));
 DESCR("concatenate XML values");
-DATA(insert OID = 2922 (  text			   PGNSP PGUID 12 1 0 f f t f s 1 25 142 _null_ _null_ _null_ xmltotext - _null_ _null_ ));
+DATA(insert OID = 2922 (  text			   PGNSP PGUID 12 1 0 f f t f i 1 25 142 _null_ _null_ _null_ xmltotext - _null_ _null_ ));
 DESCR("serialize an XML value to a character string");
 
 DATA(insert OID = 2923 (  table_to_xml  PGNSP PGUID 12 100 0 f f t f s 4 142 2205 16 16 25 _null_ _null_ "{tbl,nulls,tableforest,targetns}" table_to_xml - _null_ _null_ ));
diff -ur ../cvs-pgsql/src/test/regress/expected/opr_sanity.out ./src/test/regress/expected/opr_sanity.out
--- ../cvs-pgsql/src/test/regress/expected/opr_sanity.out	2007-09-12 20:57:22.0 +0200
+++ ./src/test/regress/expected/opr_sanity.out	2007-11-22 23:40:13.0 +0100
@@ -288,6 +288,9 @@
 -- those are binary-compatible while the reverse way goes through rtrim().
 -- As of 8.2, this finds the cast from cidr to inet, because that is a
 -- trivial binary coercion while the other way goes through inet_to_cidr().
+-- As of 8.3, this finds the casts from xml to text, varchar, and bpchar,
+-- because those are binary-compatible while the reverse goes through
+-- texttoxml(), which does an XML syntax check.
 SELECT *
 FROM pg_cast c
 WHERE c.castfunc = 0 AND
@@ -300,7 +303,10 @@
   25 |   1042 |        0 | i
 1043 |   1042 |        0 | i
  650 |    869 |        0 | i
-(3 rows)
+ 142 |     25 |        0 | a
+
Re: [HACKERS] wrong behavior using to_char() again

2007-11-22 Thread Euler Taveira de Oliveira
Bruce Momjian wrote:

> I am confused.  You stated in your earlier email:
>
> > Looking again at bug report [1], I agree that's a glibc bug.  Numbers
> > in pt_BR have the format 1.234.567,89; sometimes the format 1234567,89
> > is acceptable too, ie, the thousand separator is optional. I guess
>
> so I assumed that you were OK with having "." be the thousands
> separator.  I think we have to try to get a proper fix even if glibc is
> incorrect. The problem we had with psql print.c is that when we didn't
> provide a "." default we had people complaining about that.  The idea I
> think is that if people are asking for a thousands separator in the
> to_char() format they certainly want to see a thousands separator.
 
Maybe I'm not being clear (too little caffeine), but what I tried to say
(suggest) is that we could accept the thousands_sep from glibc instead
of guessing it ("."). I'm fine with the current behavior (at least in
pt_BR) but I'm afraid we have broken some locales (those presented in
lcnumeric.diff).


-- 
  Euler Taveira de Oliveira
  http://www.timbira.com/

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] wrong behavior using to_char() again

2007-11-22 Thread Bruce Momjian
Euler Taveira de Oliveira wrote:
> Bruce Momjian wrote:
>
> > OK, I researched this and realized it should have been obvious to me
> > when I added this code in 2006 that making the thousands separator
> > always "," for a locale of "" was going to cause a problem.
>
> I tested your patch and IMHO it breaks the glibc behavior. I'm providing
> a SQL script [1] and a diff [2] showing the differences between before
> and after applying it. In [2], I see a lot of commonly used (pt_*, es_*,
> and fr_*) locales that will be changed. Is this the behavior we want to
> support? I think we shouldn't try to fix a glibc bug inside PostgreSQL
> (in this case, we should accept "" as a possible value for
> thousands_sep).

I am confused.  You stated in your earlier email:

> Looking again at bug report [1], I agree that's a glibc bug.  Numbers
> in pt_BR have the format 1.234.567,89; sometimes the format 1234567,89
> is acceptable too, ie, the thousand separator is optional. I guess

so I assumed that you were OK with having "." be the thousands
separator.  I think we have to try to get a proper fix even if glibc is
incorrect. The problem we had with psql print.c is that when we didn't
provide a "." default we had people complaining about that.  The idea I
think is that if people are asking for a thousands separator in the
to_char() format they certainly want to see a thousands separator.

The backend behavior now matches the psql numericlocale behavior which
was accepted a while back.
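
The rule being described amounts to something like this (a hypothetical
sketch, not the actual to_char()/psql code):

	#include <locale.h>

	static const char *
	effective_thousands_sep(void)
	{
		struct lconv *lc = localeconv();

		/* trust the locale when it actually supplies a separator */
		if (lc->thousands_sep && lc->thousands_sep[0] != '\0')
			return lc->thousands_sep;

		/*
		 * The locale reported "": fall back to a default that does not
		 * collide with the decimal point.
		 */
		return (lc->decimal_point && lc->decimal_point[0] == ',') ? "." : ",";
	}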

> > I don't think there is any change needed for the C locale.  That part
> > seems fine, as Alvaro already pointed out.
>
> I don't know about the C locale, but it's broken too. In PostgreSQL,
> it's following the en_US behavior. Comments?
>
> [EMAIL PROTECTED]:/a/pgsql$ ./a.out C
> decimal_point: .
> thousands_sep:
> [EMAIL PROTECTED]:/a/pgsql$ ./a.out en_US
> decimal_point: .
> thousands_sep: ,

Yes, I think that is correct.

-- 
  Bruce Momjian  [EMAIL PROTECTED]http://momjian.us
  EnterpriseDB http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] wrong behavior using to_char() again

2007-11-22 Thread Euler Taveira de Oliveira
Bruce Momjian wrote:

> OK, I researched this and realized it should have been obvious to me
> when I added this code in 2006 that making the thousands separator
> always "," for a locale of "" was going to cause a problem.

I tested your patch and IMHO it breaks the glibc behavior. I'm providing
a SQL script [1] and a diff [2] showing the differences between before
and after applying it. In [2], I see a lot of commonly used (pt_*, es_*,
and fr_*) locales that will be changed. Is this the behavior we want to
support? I think we shouldn't try to fix a glibc bug inside PostgreSQL
(in this case, we should accept "" as a possible value for
thousands_sep).


> I don't think there is any change needed for the C locale.  That part
> seems fine, as Alvaro already pointed out.

I don't know about the C locale, but it's broken too. In PostgreSQL,
it's following the en_US behavior. Comments?

[EMAIL PROTECTED]:/a/pgsql$ ./a.out C
decimal_point: .
thousands_sep: 
[EMAIL PROTECTED]:/a/pgsql$ ./a.out en_US
decimal_point: .
thousands_sep: ,

[1] http://timbira.com/tmp/lcn3.sql
[2] http://timbira.com/tmp/lcnumeric.diff
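
For reference, the ./a.out above is presumably a tiny localeconv() test
program along these lines (a sketch; the original wasn't posted):

	#include <locale.h>
	#include <stdio.h>

	int
	main(int argc, char **argv)
	{
		struct lconv *lc;

		if (argc < 2 || setlocale(LC_NUMERIC, argv[1]) == NULL)
		{
			fprintf(stderr, "usage: %s <locale>\n", argv[0]);
			return 1;
		}

		/* report what glibc thinks this locale's numeric formatting is */
		lc = localeconv();
		printf("decimal_point: %s\n", lc->decimal_point);
		printf("thousands_sep: %s\n", lc->thousands_sep);
		return 0;
	}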


-- 
  Euler Taveira de Oliveira
  http://www.timbira.com/

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] [GENERAL] possible to create multivalued index from xpath() results in 8.3?

2007-11-22 Thread Tom Lane
Peter Eisentraut [EMAIL PROTECTED] writes:
> Tom Lane wrote:
> > Should we force initdb to correct these pg_proc entries, or just quietly
> > change pg_proc.h?
>
> Considering the extent of the changes, I'd be in favor of forcing an initdb.

Well, if you're going to change the contents of pg_cast then there is
little choice.  I was considering something less invasive ...

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Autovacuum and OldestXmin

2007-11-22 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes:
> That's a killer reason, I suppose. I was really trying to uncover what
> the thinking was, so we can document it. Having VACUUM ignore it
> completely seems wrong.

What you seem to be forgetting is that VACUUM is charged with cleaning
out LP_DEAD tuples, which HOT cannot do.  And the page header fields are
set (quite properly so) with HOT's interests in mind not VACUUM's.

regards, tom lane

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] Autovacuum and OldestXmin

2007-11-22 Thread Christopher Browne
The world rejoiced as [EMAIL PROTECTED] (Alvaro Herrera) wrote:
> Simon Riggs wrote:
> > I notice that Slony records the oldestxmin that was current when it
> > last ran a VACUUM on its tables. This allows Slony to avoid running
> > a VACUUM when it would be clearly pointless to do so.
> >
> > AFAICS autovacuum does not do this, or did I miss that?
>
> Hmm, I think it's just because nobody suggested it and I didn't come
> up with the idea.
>
> Whether it's a useful thing to do is a different matter.  Why store it
> per table and not more widely?  Perhaps per database would be just as
> useful; and maybe it would allow us to skip running autovac workers
> when there is no point in doing so.

I think I need to take blame for that feature in Slony-I ;-).

I imagine it might be useful to add it to autovac, too.  I thought it
was pretty neat that this could be successfully handled by comparison
with a single value (e.g. the eldest xmin), and I expect that using a
single quasi-global value should be good enough for autovac.

If there is some elderly, long-running transaction that isn't a
VACUUM, that will indeed inhibit VACUUM from doing any good, globally,
across the cluster, until such time as that transaction ends.

At that point, inhibiting autovac from bothering to run VACUUM would
seem like a good move.  There is still value in running ANALYZE on
tables, so it doesn't warrant stopping autovac altogether, but this
scenario suggests a case for suppressing futile vacuuming, at least...
-- 
If this was helpful, http://svcs.affero.net/rm.php?r=cbbrowne rate me
http://linuxfinances.info/info/slony.html
It's hard to tell if someone is inconspicuous. 

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq