Re: [HACKERS] .NET or Mono functions in PG
Tom Lane wrote: It is also pretty well established that if pltcl or plperl cause the backend to become multithreaded, things break horribly. I strongly

Isn't that something that needs to be fixed? It's one thing not to allow multiple threads into your code, but not having a threaded library linked in is going to become more and more of a problem.

James

---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] buildenv.pl/buildenv.bat
Magnus Hagander wrote: Hi! When you redid the msvc build stuff you seem to have missed the documentation. Specifically, I notice that buildenv.pl isn't documented - docs still say buildenv.bat is the way to go. Also, do we now have both buildenv.bat and buildenv.pl? (I have them both in my environment, but that could be a result of being lazy)

builddoc.bat still seems to require buildenv.bat.

/D
[HACKERS] buildenv.pl/buildenv.bat
Hi! When you redid the msvc build stuff you seem to have missed the documentation. Specifically, I notice that buildenv.pl isn't documented - docs still say buildenv.bat is the way to go. Also, do we now have both buildenv.bat and buildenv.pl? (I have them both in my environment, but that could be a result of being lazy)

//Magnus
[HACKERS] Regression testing
Hi All, I am trying to run the regression tests against the installed version. On the installed version of PostgreSQL 8.2.4, with the service started, I run gmake installcheck from the regress folder, and I get the following message:

GNUmakefile:15: ../../../src/Makefile.global: No such file or directory
GNUmakefile:78: /src/Makefile.shlib: No such file or directory
gmake: *** No rule to make target `/src/Makefile.shlib'. Stop.

Can anyone please tell me why I am getting this message, and the solution for the same? Thanks in advance.

Regards Cinu Kuriakose
Re: [HACKERS] Sorting Improvements for 8.4
On Fri, 2007-11-30 at 12:07 -0800, Jeff Davis wrote: On Tue, 2007-11-27 at 18:03 +, Simon Riggs wrote: 5. DYNAMIC RUN HANDLING (in Final Merge) Another way of addressing a) is to simply make better use of memory itself. Let's look at that in more detail: The number of runs that can be merged at once is currently fixed, based upon available memory. This has the underlying assumption that all runs will be concurrently active during final merging, which may not always be true. If we have random data then almost all runs will overlap with all other runs, i.e. the min and max values are sufficiently wide that the runs do all overlap. In many cases, data arrives in somewhat sorted order, e.g. financial data is fairly regular with some late payers but not many, and those trail off with a fairly tight decay. In the somewhat-sorted case we find that the actual overlap is less than total, so there are many later runs that don't overlap the earlier ones. In the best case we would find that runs 1 and 2 overlap, runs 2 and 3 overlap, then 3 and 4 overlap.

I have spoken with Len Shapiro, a professor at Portland State University, regarding sorting before. He suggests that PostgreSQL should implement forecasting, which is similar to what you're describing. Forecasting does not require that entire runs are disjoint; it works by tracking the maximum values from the last block read from every run. This allows you to know which run you will need more blocks from the soonest. I'm still looking into the problem to understand it better, but the algorithm is in Knuth Vol. 3. I can look at it in more detail, but have you already looked into this idea? Is there a reason we don't do this currently?

Interesting, I hadn't read that part. Knuth's Algorithm F covers how to do a P-way merge using 2P + 2 buffers. My ideas cover how to do a P-way merge when you don't have enough memory for that many buffers. The current sort code makes two assumptions, amongst others:

1. minimizing the number of runs is always worth it
2. there is a single fixed maximum size of P, depending upon memory

I'm challenging both of those. Only runs that overlap need to be merged simultaneously, so if the runs aren't overlapping then it's OK to allow more runs to be formed. If it's OK to allow more runs, then reducing heap size to allow better CPU efficiency is possible. So Algorithm F is somewhat orthogonal to what I've proposed, though maybe still interesting. What we do now is fairly close, but you should look at the code in tuplesort.c and logtape.c to see how well it matches. That might lead to an increase in the limit on the number of concurrent runs mergeable at any one time.

-- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com
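The overlap observation above can be sketched in a few lines. This is an illustrative sketch only (not PostgreSQL code): given the (min, max) key range of each sorted run, group together runs whose ranges overlap. Runs in different groups never need to be open simultaneously, so non-overlapping runs can simply be emitted in range order.

```python
# Hypothetical sketch: group sorted runs by overlapping key ranges.
# Only runs inside one group must be merged simultaneously.

def plan_merge_groups(runs):
    """runs: list of (lo, hi) key ranges, one per sorted run."""
    groups = []
    for lo, hi in sorted(runs):
        if groups and lo <= groups[-1][1]:      # overlaps the current group
            glo, ghi, members = groups[-1]
            groups[-1] = (glo, max(ghi, hi), members + [(lo, hi)])
        else:                                   # disjoint: start a new group
            groups.append((lo, hi, [(lo, hi)]))
    return [members for _, _, members in groups]

# Somewhat-sorted input: runs 1-2 overlap and runs 3-4 overlap,
# but the two pairs are disjoint, so a 2-way merge capability suffices.
print(plan_merge_groups([(0, 50), (40, 90), (100, 150), (140, 200)]))
# -> [[(0, 50), (40, 90)], [(100, 150), (140, 200)]]
```

With random data every range overlaps every other, all runs land in one group, and this degenerates to the current fixed P-way merge.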
Re: [HACKERS] Stored procedure issue
Hello

What I have noticed is that when I don't use the procedure at all, there are only 2-5 locks in pg_locks; after I start the application which uses the stored procedure, the number in pg_locks increases rapidly to a steady 75, even to 130 at certain moments. Any clue why procedure usage might increase locks so heavily?

Sincerely

--- Dragan Zubac [EMAIL PROTECTED] wrote: Hello I have a stored procedure which does the billing stuff in our system. It works OK, but if I put it in production, where there are some 5-10 billing events per second, the whole database slows down. It won't even drop some test table, reindex, or vacuum - things which were done before in the blink of an eye. If I stop the application which calls the procedure, all is back to normal. We didn't implement any special locking mechanism in the procedure; all is default. The procedure is updating the user's balance in table 'users'. On the other hand, a couple of 'heavy load' tables have foreign keys pointing to table 'users'. Is it a matter of concurrency and some locking issue, or maybe the existence of all those foreign keys pointing to table 'users', or maybe something else which we're not aware of at the moment?

Sincerely Pera
Re: [HACKERS] Regression testing
cinu wrote: GNUmakefile:15: ../../../src/Makefile.global: No such file or directory GNUmakefile:78: /src/Makefile.shlib: No such file or directory gmake: *** No rule to make target `/src/Makefile.shlib'. Stop.

You need to run configure first (found in the top-level directory), so that certain files are generated.

-- Alvaro Herrera http://www.flickr.com/photos/alvherre/ You're _really_ hosed if the person doing the hiring doesn't understand relational systems: you end up with a whole raft of programmers, none of whom has had a Date with the clue stick. (Andrew Sullivan)
Re: [GENERAL] [HACKERS] Stored procedure issue
Hello

On 03/12/2007, Dragan Zubac [EMAIL PROTECTED] wrote: Hello What I have noticed is that when I don't use procedure at all, there's only 2-5 locks in pg_locks; after I start application which uses stored procedure the number in pg_locks increases rapidly to steady 75, even to 130 at certain moments. Any clue why procedure usage might increase locks so heavily? Sincerely

Upgrade to 8.2? There is a shared lock and there are fewer problems with locks. But I am not sure if this solves your problem. The general protection against locks is doing all things with maximal speed. Are your queries well optimized?

Regards Pavel Stehule

--- Dragan Zubac [EMAIL PROTECTED] wrote: Hello I have a stored procedure which does the billing stuff in our system. It works OK, but if I put it in production, where there are some 5-10 billing events per second, the whole database slows down. It won't even drop some test table, reindex, or vacuum - things which were done before in the blink of an eye. If I stop the application which calls the procedure, all is back to normal. We didn't implement any special locking mechanism in the procedure; all is default. The procedure is updating the user's balance in table 'users'. On the other hand, a couple of 'heavy load' tables have foreign keys pointing to table 'users'. Is it a matter of concurrency and some locking issue, or maybe the existence of all those foreign keys pointing to table 'users', or maybe something else which we're not aware of at the moment? Sincerely Pera
Re: [HACKERS] buildenv.pl/buildenv.bat
Magnus Hagander [EMAIL PROTECTED] writes: When you redid the msvc build stuff you seem to have missed the documentation. Specifically, I notice that buildenv.pl isn't documented - docs still say buildenv.bat is the way to go.

On a related note, I was wondering yesterday if there are any obsolete statements here: http://developer.postgresql.org/pgdocs/postgres/install-win32-libpq.html

regards, tom lane
[HACKERS] Integer datatimes vs msvc
Did we ever come to a conclusion on what to do with integer datetimes on msvc for 8.3? For the uninformed - 8.2 and earlier had the binary build with floating point timestamps, 8.3 has integer timestamps. (We're just talking the binary build here) //Magnus
Re: [HACKERS] buildenv.pl/buildenv.bat
On Mon, Dec 03, 2007 at 10:52:15AM -0500, Tom Lane wrote: Magnus Hagander [EMAIL PROTECTED] writes: When you redid the msvc build stuff you seem to have missed the documentation. Specifically, I notice that buildenv.pl isn't documented - docs still say buildenv.bat is the way to go. On a related note, I was wondering yesterday if there are any obsolete statements here: http://developer.postgresql.org/pgdocs/postgres/install-win32-libpq.html

No, I refreshed that page back when. It's used for the frontend-only build that can use earlier versions of Visual Studio. It also supports 64-bit libpq, which the main build doesn't (yet).

//Magnus
Re: [HACKERS] Integer datatimes vs msvc
Magnus Hagander [EMAIL PROTECTED] writes: Did we ever come to a conclusion on what to do with integer datetimes on msvc for 8.3? For the uninformed - 8.2 and earlier had the binary build with floating point timestamps, 8.3 has integer timestamps. (We're just talking the binary build here) I believe we have consensus that 8.4 is the time to do that --- see thread here: http://archives.postgresql.org/pgsql-patches/2007-05/msg00046.php So my recommendation is that the MSVC build should follow the source change, ie, not change until 8.4. regards, tom lane
[HACKERS] 8.3 beta 4 crash on windows 2003 64-bit
Hi, I loaded pg 8.3 beta 4 on a win2k3 64-bit box (8GB RAM). I was able to restore a database into it (about 30GB, 5 schemas and about 2000 tables), but whenever I try to open the tables in pgAdmin III, the database crashes. I've tried with both the default postgresql.conf file and with some more tuned ones (based on 8.2.5 running on the same box before, holding the same db):

2007-12-03 16:38:01 GMT LOG: database system is ready to accept connections
2007-12-03 16:38:01 GMT LOG: autovacuum launcher started
This application has requested the Runtime to terminate it in an unusual way. Please contact the application's support team for more information.
2007-12-03 16:38:22 GMT LOG: server process (PID 3716) exited with exit code 3
2007-12-03 16:38:22 GMT LOG: terminating any other active server processes
2007-12-03 16:38:22 GMT WARNING: terminating connection because of crash of another server process
2007-12-03 16:38:22 GMT DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2007-12-03 16:38:22 GMT HINT: In a moment you should be able to reconnect to the database and repeat your command.
2007-12-03 16:38:22 GMT LOG: all server processes terminated; reinitializing
2007-12-03 16:38:23 GMT FATAL: pre-existing shared memory block is still in use
2007-12-03 16:38:23 GMT HINT: Check if there are any old server processes still running, and terminate them.

~ filip
Re: [HACKERS] Integer datatimes vs msvc
On Dec 3, 2007, at 11:27 , Tom Lane wrote: I believe we have consensus that 8.4 is the time to do that --- see thread here: http://archives.postgresql.org/pgsql-patches/2007-05/msg00046.php

Is there anything in the release notes (or elsewhere) for 8.3 that mentions this as the intention, similar to the formerly deprecated money type? Would we want to?

Michael Glaesemann grzm seespotcode net
Re: [HACKERS] Integer datatimes vs msvc
Andrew Dunstan wrote: Magnus Hagander wrote: Did we ever come to a conclusion on what to do with integer datetimes on msvc for 8.3? For the uninformed - 8.2 and earlier had the binary build with floating point timestamps, 8.3 has integer timestamps. (We're just talking the binary build here) If you are preparing a binary package you can use whatever options you like. But I think in principle the MSVC defaults in our source code should mirror the Unix defaults. Right. I know there are certainly linux distributions that change it to integer timestamps (debian for example), so it's just a matter of what we want to ship in the installer. But if that's the consensus - Dave, unless you wrapped the new beta already, please change that in your config file before you do! //Magnus
Re: [HACKERS] Integer datatimes vs msvc
Magnus Hagander wrote: Did we ever come to a conclusion on what to do with integer datetimes on msvc for 8.3? For the uninformed - 8.2 and earlier had the binary build with floating point timestamps, 8.3 has integer timestamps. (We're just talking the binary build here) If you are preparing a binary package you can use whatever options you like. But I think in principle the MSVC defaults in our source code should mirror the Unix defaults. cheers andrew
Re: [HACKERS] 8.3 beta 4 crash on windows 2003 64-bit
Filip Wuytack wrote: Hi, I loaded pg8.3 beta 4 on a win2k3 64bit box (8GB ram). I was able to restore a database into it (about 30GB,5schemas and about 2000 tables) but whenever I try to open the tables in pgadmin3, the database crashes (i've tried with both the default postgresql.conf file and with some more tuned ones (based on 8.2.5 running on the same box before holding the same db): Where did you get the binary from? How was it built? cheers andrew
Re: [HACKERS] 8.3 beta 4 crash on windows 2003 64-bit
From the postgresql ftp server. I initially wanted to download beta3, but spotted beta4 so I gave that a go. ftp.postgresql.org /pub/binary/v8.3beta4/win32/ ~ Filip -Original Message- From: Andrew Dunstan [mailto:[EMAIL PROTECTED] Sent: 03 December 2007 17:09 To: Filip Wuytack Cc: pgsql-hackers@postgresql.org Subject: Re: [HACKERS] 8.3 beta 4 crash on windows 2003 64-bit Filip Wuytack wrote: Hi, I loaded pg8.3 beta 4 on a win2k3 64bit box (8GB ram). I was able to restore a database into it (about 30GB,5schemas and about 2000 tables) but whenever I try to open the tables in pgadmin3, the database crashes (i've tried with both the default postgresql.conf file and with some more tuned ones (based on 8.2.5 running on the same box before holding the same db): Where did you get the binary from? How was it built? cheers andrew
Re: [HACKERS] Integer datatimes vs msvc
Magnus Hagander wrote: Andrew Dunstan wrote: Magnus Hagander wrote: Did we ever come to a conclusion on what to do with integer datetimes on msvc for 8.3? For the uninformed - 8.2 and earlier had the binary build with floating point timestamps, 8.3 has integer timestamps. (We're just talking the binary build here) If you are preparing a binary package you can use whatever options you like. But I think in principle the MSVC defaults in our source code should mirror the Unix defaults. Right. I know there are certainly linux distributions that change it to integer timestamps (debian for example), so it's just a matter of what we want to ship in the installer. But if that's the consensus - Dave, unless you wrapped the new beta already, please change that in your config file before you do! Hmm. Too late to pull that one, but we'll make the change for next beta/rc then. //Magnus
Re: [HACKERS] buildenv.pl/buildenv.bat
Magnus Hagander wrote: Hi! When you redid the msvc build stuff you seem to have missed the documentation. Specifically, I notice that buildenv.pl isn't documented - docs still say buildenv.bat is the way to go. Also, do we now have both buildenv.bat and buildenv.pl? (I have them both in my environment, but that could be a result of being lazy)

My recollection is that I changed the minimum amount necessary, because I was expecting us to go into beta at any moment (silly me). That might be why we still have both. There was an expectation that some cleanup might be required during 8.4 development. I know I left other .bat files as wrappers for the perl scripts, but that's obviously not appropriate here. I'll see if I can adjust builddoc.bat so we can get rid of buildenv.bat. As for documentation, you're probably right, I could easily have missed it. I'll look into it.

cheers andrew
Re: [HACKERS] 8.3 beta 4 crash on windows 2003 64-bit
Andrew, As to how it was built: I was under the impression that the 8.3 release is built using MSVC? I was keen to give it a try to see if this version would work more optimally on Windows, as the pg instance (8.2.5) I run normally suffers from seriously high pages/sec (by looking at perfmon on Windows), slowing the whole thing down (it's connected to a SAN on a 4Gb card and holds 80 disks...so file I/O should not be the bottleneck). Now a quick restore back to 8.2.5... ~ Filip -Original Message- From: Filip Wuytack Sent: 03 December 2007 17:12 To: 'Andrew Dunstan' Cc: 'pgsql-hackers@postgresql.org' Subject: RE: [HACKERS] 8.3 beta 4 crash on windows 2003 64-bit From the postgresql ftp server. I initially wanted to download beta3, but spotted beta4 so I gave that a go ftp.postgresql.org /pub/binary/v8.3beta4/win32/ ~ Filip -Original Message- From: Andrew Dunstan [mailto:[EMAIL PROTECTED] Sent: 03 December 2007 17:09 To: Filip Wuytack Cc: pgsql-hackers@postgresql.org Subject: Re: [HACKERS] 8.3 beta 4 crash on windows 2003 64-bit Filip Wuytack wrote: Hi, I loaded pg8.3 beta 4 on a win2k3 64bit box (8GB ram). I was able to restore a database into it (about 30GB,5schemas and about 2000 tables) but whenever I try to open the tables in pgadmin3, the database crashes (i've tried with both the default postgresql.conf file and with some more tuned ones (based on 8.2.5 running on the same box before holding the same db): Where did you get the binary from? How was it built? cheers andrew
[HACKERS] Kludge in pg_standby.c
There's a suspicious ifdef in pg_standby for WIN32 which smells like a kludge added to work around a Windows problem, which makes it work but at great expense:

#ifdef WIN32
    /*
     * Windows reports that the file has the right number of bytes
     * even though the file is still being copied and cannot be
     * opened by pg_standby yet. So we wait for sleeptime secs
     * before attempting to restore. If that is not enough, we
     * will rely on the retry/holdoff mechanism.
     */
    pg_usleep(sleeptime * 1000000L);
#endif

This happens before we return *any* WAL file to be processed. That means it slows down the processing of any file by 1s. On a server which has fallen behind, this means it can't process files as quickly as it can copy them; it's limited to at most 1/s. I think it wouldn't be hard to do this properly. We can try to open the file, handle the expected Windows error by sleeping for 1s, and repeat until we can successfully open it. Something like (untested):

    bool    success = false;
    int     fd, tries = 10;

    while (--tries)
    {
        fd = open(WALFilePath, O_RDONLY);
        if (fd >= 0)
        {
            close(fd);
            success = true;
            break;
        }
        else if (errno == EWINDOWSBLOWS)
        {
            pg_usleep(1000000L);
        }
        else
        {
            perror("pg_standby open:");
            exit(2);
        }
    }

    if (!success)
    {
        fprintf(stderr, "pg_standby: couldn't open file \"%s\" due to \"%s\"\n",
                WALFilePath, strerror(EWINDOWSBLOWS));
        exit(2);
    }

-- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's RemoteDBA services!
Re: [HACKERS] Sorting Improvements for 8.4
On Mon, 2007-12-03 at 11:51 +, Simon Riggs wrote: So Algorithm F is somewhat orthogonal to what I've proposed, though maybe still interesting. What we do now is fairly close, but you should look at the code in tuplesort.c and logtape.c to see how well it matches. That might lead to an increase in the limit of the number of concurrent runs mergeable at any one time. tuplesort.c: * When merging runs, we use a heap containing just the frontmost tuple from * each source run; we repeatedly output the smallest tuple and insert the * next tuple from its source tape (if any). When the heap empties, the merge * is complete. The basic merge algorithm thus needs very little memory --- * only M tuples for an M-way merge, and M is constrained to a small number. * However, we can still make good use of our full workMem allocation by * pre-reading additional tuples from each source tape. Without prereading, * our access pattern to the temporary file would be very erratic; on average * we'd read one block from each of M source tapes during the same time that * we're writing M blocks to the output tape, so there is no sequentiality of * access at all, defeating the read-ahead methods used by most Unix kernels. * Worse, the output tape gets written into a very random sequence of blocks * of the temp file, ensuring that things will be even worse when it comes * time to read that tape. A straightforward merge pass thus ends up doing a * lot of waiting for disk seeks. We can improve matters by prereading from * each source tape sequentially, loading about workMem/M bytes from each tape * in turn. Then we run the merge algorithm, writing but not reading until * one of the preloaded tuple series runs out. Then we switch back to preread * mode, fill memory again, and repeat. This approach helps to localize both * read and write accesses. 
The idea of prefetching, as I understand it, is that we don't blindly preread workMem/M bytes from each of M tapes; instead we predict which tapes we will need tuples from next through forecasting. If I understand correctly, we just keep track of the maximum value of the last block read from each run, and then always read from the run in which the last block read has the lowest maximum. It seems as if this would allow a variable number of runs to be merged at once, but if the data really *is* random, we'd want it to degrade gracefully into something resembling the current implementation. I'm being somewhat vague here because I haven't taken the time to really understand it. If you think this idea has potential I will look into it in more detail. Regards, Jeff Davis
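The read-selection rule described above is small enough to sketch. This is a hedged illustration of the forecasting idea from Knuth Vol. 3, not tuplesort.c code: remember the largest key in the block most recently read from each run; the run whose last-read block has the smallest maximum is the run whose buffered tuples will be exhausted soonest, so it should be preread from next.

```python
# Hypothetical sketch of forecasting-based preread selection.

def next_run_to_read(last_block_max):
    """last_block_max: dict mapping run id -> max key in its last-read block."""
    return min(last_block_max, key=last_block_max.get)

# Run 2's buffered tuples top out at 17, below runs 1 and 3, so run 2
# will run dry first and is the one to preread from next.
print(next_run_to_read({1: 42, 2: 17, 3: 99}))  # -> 2
```

With random data the per-run maxima stay tightly interleaved and reads rotate across all runs, which is the graceful degradation mentioned above.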
Re: [HACKERS] buildenv.pl/buildenv.bat
On Mon, 2007-12-03 at 12:15 -0500, Andrew Dunstan wrote: Magnus Hagander wrote: Hi! When you redid the msvc build stuff you seem to have missed the documentatino.. Specifically, I notice that buildenv.pl isn't documented - docs still say buildenv.bat is the way to go. Also, do we now have both buildenv.bat and buildenv.pl? (I have them both in my environment, but that could be a result of being lazy) My recollection is that I changed the minimum amount necessary, because I was expecting us to go into beta at anmy moment (silly me). That might be why we still have both. There was an expectation that some cleanup might be required during 8.4 development. I know I left other .bat files as wrappers for the perl scripts, but that's obviously not appropriate here. I'll see if I can adjust builddoc.bat so we can get rid of buildenv.bat. As for documentation, you're probably right, I could easily have missed it. I'll look into it. Great. Let me know if you need me to poke at anything. //Magnus
Re: [HACKERS] Sorting Improvements for 8.4
On Mon, 2007-12-03 at 10:32 -0800, Jeff Davis wrote: If I understand correctly, we just keep track of the maximum value of the last block read from each run, and then always read from the run in which the last block read has the lowest maximum.

Yep, sounds like Algorithm F.

It seems as if this would allow a variable number of runs to be merged at once, but if the data really *is* random, we'd want it to degrade gracefully into something resembling the current implementation.

If we also keep track of the endpoints of runs that we haven't yet read from, then yes, that would link my ideas with Algorithm F, so we just have a single implementation. (F++?) Probably easiest to store the endpoint tuples directly, with some sane limits for when we have very large tuples. You'll still need to do run-level forecasting as I had proposed, to tell whether you need to do any intermediate merging prior to the final merge. So the two sets of ideas can't be brought together completely.

I'm being somewhat vague here because I haven't taken the time to really understand it. If you think this idea has potential I will look into it in more detail.

Yes, F++ sounds like it will use memory more effectively than we do currently. That's likely to improve performance when the number of runs approaches the limit for the size of work_mem. So this will improve external sorts with too small memory allocations, but it won't do anything about sorts with too large a memory allocation. That's probably the order of importance for tackling sort performance, so that's good. Probably best to test with:

- 1M - 4M work_mem, so we see the full benefit of any improvements in memory utilisation in a typical context
- number of runs nearly at the limit for memory
- total sort very large, so we see real I/O issues starkly

You'll need to instrument things carefully so you can tell how many runs are being merged at any one time and how that affects elapsed time/row.

-- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com
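The run-level forecasting mentioned above can be sketched as a sweep over run endpoints. This is a hypothetical illustration (names and data are invented, not from tuplesort.c): with the (min, max) endpoints of every run recorded at run-formation time, count how many runs' ranges contain each boundary value; the peak count is the largest number of runs that must be open simultaneously, and if it exceeds the affordable number of buffers, an intermediate merge pass is needed before the final merge.

```python
# Hypothetical sketch: how many runs must be open at once, given endpoints?

def max_concurrent_runs(runs):
    """runs: list of (lo, hi) endpoint pairs, one per sorted run."""
    # +1 event where a run's range opens, -1 where it closes;
    # at equal coordinates, opens sort before closes (they do overlap there).
    events = [(lo, 1) for lo, hi in runs] + [(hi, -1) for lo, hi in runs]
    peak = active = 0
    for _, delta in sorted(events, key=lambda e: (e[0], -e[1])):
        active += delta
        peak = max(peak, active)
    return peak

# Chained overlap (runs 1-2, 2-3, 3-4 overlap pairwise): at most two runs
# are ever open at once, so four runs need only a 2-way merge capability.
print(max_concurrent_runs([(0, 10), (8, 20), (18, 30), (28, 40)]))  # -> 2
```

If the peak equals the total run count (the random-data case), every run overlaps every other and the plan collapses to a single all-runs final merge, as now.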
[HACKERS] Is postgres.gif missing in cvs?
Hi, I tried to run make postgres.pdf (PostgreSQL 8.3beta4), and got this error:

cp ./../graphics/postgres.gif .
cp: cannot stat `./../graphics/postgres.gif': No such file or directory
make: *** [postgres.gif] Error 1

CVS/tarball does not include this file, and I think we should add it.

Regards, -- Devrim GÜNDÜZ , RHCE PostgreSQL Replication, Consulting, Custom Development, 24x7 support Managed Services, Shared and Dedicated Hosting Co-Authors: plPHP, ODBCng - http://www.commandprompt.com/
Re: [HACKERS] Sorting Improvements for 8.4
Simon Riggs [EMAIL PROTECTED] writes: On Mon, 2007-12-03 at 10:32 -0800, Jeff Davis wrote: If I understand correctly, we just keep track of the maximum value of the last block read from each run, and then always read from the run in which the last block read has the lowest maximum.

So it sounds like the use case where this is the biggest win would be a situation where you have presorted input which has been sliced up. So for example sorting by zip code in a table which was clustered by city. The alphabetic order of the cities isn't correlated to the results, but all the zip codes for a city are in a contiguous block somewhere in the output. In such a case, after doing a single pass we would have a bunch of tapes, each of which corresponded to a single city and was able to completely reorder the zip codes in that city to be ordered. So the desired results would be, for example, all the tuples from tape 17 (NYC) followed by all the tuples from tape 3 (Buffalo) followed by all the tuples from tape 1 (Albuquerque), etc. We currently preread an equal amount from each tape and then would empty all the preread tuples from tape 17, refill them, preread them again, and repeat until tape 17 is empty, then move on to tape 3. All the tuples except those from the currently active tape are completely idle. I think the way to do what you're proposing is to preread one tuple from each tape, then when one preread bunch is emptied refill it with twice as many, and repeat. In this case you would end up with nearly all of work_mem full of tuples from NYC until you're done with NYC. That would increase the prereading block size by a factor of 20 in this case. So the question is just how many seeks we are doing during sorting. If we're doing 0.1% seeks and 99.9% sequential I/O then eliminating the 0.1% entirely (which we can't do) isn't going to speed things up all that much. If we're doing 20% seeks and can get that down to 10% it might be worthwhile.
I'm not sure where the idea of keeping the current bounds of the input tapes comes into it. We only preread when we run out of tuples anyways and then we don't really have a choice about which tape we want to preread from. And it's a good thing too since maintaining such a list of bounds and finding the lowest or highest would mean maintaining a second heap which would basically double the cpu cost of sorting. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's 24x7 Postgres support!
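The doubling-refill scheme proposed above is easy to simulate. This is a hedged sketch (pure Python, not tuplesort.c): preread a single tuple from each tape to start, and each time a tape's preread buffer empties, refill it with twice as many tuples as before. A tape that keeps supplying the smallest tuples (all of NYC, say) quickly grows its buffer to fill most of the available memory.

```python
from heapq import heappush, heappop

# Hypothetical sketch: merge sorted 'tapes' (plain lists), doubling a
# tape's preread size each time its buffer runs dry.

def adaptive_merge(tapes):
    pos = [0] * len(tapes)          # next unread position on each tape
    size = [1] * len(tapes)         # how many tuples to preread next time
    buf = [[] for _ in tapes]       # per-tape preread buffers
    heap, out = [], []

    def preread(t):
        n = min(size[t], len(tapes[t]) - pos[t])
        buf[t] = list(tapes[t][pos[t]:pos[t] + n])
        pos[t] += n
        size[t] *= 2                # next refill is twice as large
        if buf[t]:
            heappush(heap, (buf[t][0], t))

    for t in range(len(tapes)):
        preread(t)                  # start with one tuple per tape
    while heap:
        _, t = heappop(heap)
        out.append(buf[t].pop(0))   # emit the smallest buffered tuple
        if buf[t]:
            heappush(heap, (buf[t][0], t))
        else:
            preread(t)              # buffer emptied: refill, doubled
    return out

print(adaptive_merge([[1, 4, 7], [2, 5, 8], [3, 6, 9]]))
# -> [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

On the city-clustered input above, the dominating tape's refills grow 1, 2, 4, 8, ... while the idle tapes stay at one buffered tuple each, which is exactly the memory skew the proposal is after.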
Re: [HACKERS] Is postgres.gif missing in cvs?
On Mon, 3 Dec 2007, Devrim GÜNDÜZ wrote: I tried to run make postgres.pdf (PostgreSQL 8.3beta4), and got this error: You need to say make postgres-US.pdf or -A4. The .gif file is coming from make's default rules. Perhaps we should provide a real target matching postgres.pdf and error out with a helpful message, as this isn't the first time people have been bitten by that change. Kris Jurka
Re: [HACKERS] Sorting Improvements for 8.4
On Mon, 2007-12-03 at 20:40 +, Gregory Stark wrote: I think the way to do what you're proposing is... I don't understand that. Algorithm F covers that already, doesn't it? So the question is just how many seeks we are doing during sorting. If we're doing 0.1% seeks and 99.9% sequential I/O, then eliminating that 0.1% entirely (which we can't do) isn't going to speed things up all that much. If we're doing 20% seeks and can get that down to 10%, it might be worthwhile. The buffer size at max tapes is an optimum - a trade-off between avoiding intermediate merging and merging efficiently. Freeing more memory is definitely going to help in the case of low work_mem and lots of runs. You're right that there is a limit to the benefit you can get. I wrote a patch in 2005/6 to optimise the memory usage when there were few runs and lots of memory. I still think there's value in that. I'm not sure where the idea of keeping the current bounds of the input tapes comes into it. We only preread when we run out of tuples anyway, and then we don't really have a choice about which tape we want to preread from. You have to decide whether to perform intermediate merges or whether you can do everything at the final merge. Otherwise you can't merge more runs than you have buffers for, since at some point you'd freeze up and not be able to read more input. And it's a good thing too, since maintaining such a list of bounds and finding the lowest or highest would mean maintaining a second heap, which would basically double the CPU cost of sorting. I think you're not understanding me. You only need to record the lowest or highest value when a run completes/starts. When all runs have been written, we then have a table of the highest and lowest values for each run. We then scan that to see whether we can perform the merging in one pass or, if not, what kind of intermediate merging is required. We keep the merge plan in memory and then follow it.
So probably a very small % of total sort cost, though it might save you doing intermediate merges with huge costs. -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com
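Simon's merge-plan table can be made concrete: once every run's lowest and highest key is recorded, a sweep over those bounds tells you the greatest number of runs that are ever "live" at once, and hence whether one final merge pass will do. A rough sketch under my own naming, not anything from an actual patch or tuplesort.c:

```python
# Sketch of deciding, from a table of per-run (lowest, highest) keys,
# whether merging can be done in one pass. The peak number of runs
# whose key ranges overlap is the most tapes that ever need to be open
# simultaneously; if that fits in the buffers available, no
# intermediate merge is required. (My own illustration.)

def max_overlapping_runs(bounds):
    """bounds: list of (lowest_key, highest_key), one pair per run."""
    events = []
    for lo, hi in bounds:
        events.append((lo, 0))   # run becomes live (0 sorts before 1 on ties)
        events.append((hi, 1))   # run is exhausted
    live = peak = 0
    for _, kind in sorted(events):
        if kind == 0:
            live += 1
            peak = max(peak, live)
        else:
            live -= 1
    return peak

def one_pass_merge_possible(bounds, n_buffers):
    return max_overlapping_runs(bounds) <= n_buffers
```

For the clustered-by-city example, most runs' ranges are disjoint, so the peak stays far below the run count and a single pass suffices even with few buffers.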
Re: [HACKERS] Is postgres.gif missing in cvs?
Devrim GÜNDÜZ wrote: Hi, I tried to run make postgres.pdf (PostgreSQL 8.3beta4), and got this error: cp ./../graphics/postgres.gif . cp: cannot stat `./../graphics/postgres.gif': No such file or directory make: *** [postgres.gif] Error 1 No, the reason for that error message is that it is trying to generate postgres.pdf from postgres.gif. Try make postgres-A4.pdf or postgres-US.pdf instead. -- Alvaro Herrera http://www.PlanetPostgreSQL.org/ "There are those who acquire the bad habit of being unhappy" (M. A. Evans)
Re: [HACKERS] Sorting Improvements for 8.4
Simon Riggs [EMAIL PROTECTED] writes: The buffer size at max tapes is an optimum - a trade-off between avoiding intermediate merging and merging efficiently. Freeing more memory is definitely going to help in the case of low work_mem and lots of runs. I can't follow these abstract arguments. That's why I tried to spell out a concrete example. I think you're not understanding me. You only need to record the lowest or highest value when a run completes/starts. When all runs have been written, we then have a table of the highest and lowest values for each run. We then scan that to see whether we can perform the merging in one pass or, if not, what kind of intermediate merging is required. We keep the merge plan in memory and then follow it. So probably a very small % of total sort cost, though it might save you doing intermediate merges with huge costs. OK, that's a very different concept from what I was thinking. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's On-Demand Production Tuning
Re: [HACKERS] Sorting Improvements for 8.4
On Mon, 2007-12-03 at 20:40 +, Gregory Stark wrote: So the question is just how many seeks we are doing during sorting. If we're doing 0.1% seeks and 99.9% sequential I/O, then eliminating that 0.1% entirely (which we can't do) isn't going to speed things up all that much. If we're doing 20% seeks and can get that down to 10%, it might be worthwhile. It's not just about eliminating seeks, it's about being able to merge more runs at one time. If you are merging 10 runs at once, and only two of those runs overlap while the rest contain much greater values, you might be spending 99% of the time in sequential I/O. But the point is, we're wasting the memory holding those other 8 runs (80% of the memory in use), so we really could be merging a lot more than 10 runs at once. This might eliminate stages from the merge process. My point is just that "how many seeks are we doing" is not the only question. We could be doing 99% sequential I/O and still make huge wins. In reality, of course, the runs aren't going to be completely disjoint, but they may be partially disjoint. That's where forecasting comes in: you preread from the tapes you will actually need tuples from soonest. Regards, Jeff Davis
Re: [HACKERS] Is postgres.gif missing in cvs?
Devrim GÜNDÜZ [EMAIL PROTECTED] writes: I got some SGML errors: https://devrim.privatepaste.com/501oMnwCYw Hmph. What version of the SGML tools are you using? It seems more prone to get confused by non-entity-ized '<' and '>' than what the rest of us are using. regards, tom lane
Re: [HACKERS] Is postgres.gif missing in cvs?
Hi, On Mon, 2007-12-03 at 18:33 -0300, Alvaro Herrera wrote: No, the reason for that error message is that it is trying to generate postgres.pdf from postgres.gif. Try make postgres-A4.pdf or postgres-US.pdf instead. Oh, thanks :) I got some SGML errors: https://devrim.privatepaste.com/501oMnwCYw I could fix them, but both the errors and the line numbers seemed quite irrelevant to me. Can someone please take a look at it? Regards, -- Devrim GÜNDÜZ, RHCE PostgreSQL Replication, Consulting, Custom Development, 24x7 support Managed Services, Shared and Dedicated Hosting Co-Authors: plPHP, ODBCng - http://www.commandprompt.com/
Re: [HACKERS] Sorting Improvements for 8.4
On Mon, 2007-12-03 at 20:40 +, Gregory Stark wrote: I'm not sure where the idea of keeping the current bounds of the input tapes comes into it. We only preread when we run out of tuples anyway, and then we don't really have a choice about which tape we want to preread from. And it's a good thing too, since maintaining such a list of bounds and finding the lowest or highest would mean maintaining a second heap, which would basically double the CPU cost of sorting. You're only keeping track of the maximum value for each run, which should be cheap to track. The only time it changes is when you're reading more data from that run, in which case it increases. The tradeoff that's happening right now is: we want to merge many runs at once because it reduces the number of merge phases, but the problem is that it increases the seeking, because we read one block from one run, then one block from another run, etc., especially if the input is random. If we reduce the number of runs, then we can preread more efficiently. See tuplesort.c: * as sorted runs, we can eliminate any repeated I/O at all. In the current * code we determine the number of tapes M on the basis of workMem: we want * workMem/M to be large enough that we read a fair amount of data each time * we preread from a tape, so as to maintain the locality of access described * above. Nonetheless, with large workMem we can have many tapes. So, for workMem/M to be large enough, M has to be small enough. But a small M means we have to do more merge phases, which is expensive. Forecasting improves this tradeoff. Forecasting no longer _blindly_ prereads from each tape; it uses information that it already has (the max value of the last block read from each run) to determine the runs from which we will need tuples the soonest. Then, it prereads the _correct_ data. Regards, Jeff Davis
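The workMem/M trade quoted above can be put in numbers. A back-of-envelope sketch (my own arithmetic, not figures from tuplesort.c): merging M runs at a time turns N runs into ceil(N/M) runs per pass, while each tape's preread chunk is roughly workMem/M:

```python
import math

def merge_passes(n_runs, m):
    """Passes needed to merge n_runs sorted runs, m at a time."""
    passes = 0
    while n_runs > 1:
        n_runs = math.ceil(n_runs / m)  # each pass merges groups of m runs
        passes += 1
    return passes

def preread_chunk_kb(work_mem_kb, m):
    """Rough per-tape preread size when work_mem is split across m tapes."""
    return work_mem_kb // m
```

For example, 1000 runs with M=10 need 3 passes but give each tape only a tenth of work_mem, while M=32 cuts that to 2 passes at the cost of smaller, seekier reads - exactly the tension the comment describes.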
Re: [HACKERS] Is postgres.gif missing in cvs?
Hi, On Mon, 2007-12-03 at 17:49 -0500, Tom Lane wrote: I got some SGML errors: https://devrim.privatepaste.com/501oMnwCYw Hmph. What version of the SGML tools are you using? The ones supplied with Fedora 8. sgml-common: 0.6.3, openjade: 1.3.2 Regards, -- Devrim GÜNDÜZ, RHCE
Re: [HACKERS] Is postgres.gif missing in cvs?
Andrew Dunstan [EMAIL PROTECTED] writes: Tom Lane wrote: Hmph. What version of the SGML tools are you using? It seems more prone to get confused by non-entity-ized '<' and '>' than what the rest of us are using. I'm not totally au fait with the rules of SGML. Does it allow literal '<' in text nodes? In most places I looked in our docs we seem to use '&lt;' as I would have expected. It appears to me that the tools will silently take '<' (and also '>') as literal characters, *if* what follows them happens to not look too much like a tag or entity :-(. Pretty ugly. The particular cases that were biting Devrim seemed to all be occurrences of a tag that perhaps is allowed in his release. I found out that -wxml will cause openjade to warn about these cases. It turns on a boatload of other warnings that we probably don't care about, so I'm not going to recommend using it by default, but it enabled me to find a lot of problem spots just now. Oh, another interesting behavior that was turned up by this --- apparently you can get away with leaving off the ';' in '&lt;', because we had done so in a few places. -wxml catches that too. regards, tom lane
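The two failure modes described here (a bare '<' accepted literally when what follows doesn't look like markup, and an entity reference missing its ';') can be hunted with a crude scan. This is a rough heuristic of my own, far weaker than openjade's -wxml:

```python
import re

def suspicious_sgml(text):
    """Flag likely-unescaped markup characters in SGML source."""
    problems = []
    # A '<' not followed by a name character, '/', or '!' is probably
    # meant to be a literal less-than and should be written &lt;
    for m in re.finditer(r'<(?![A-Za-z/!])', text):
        problems.append(('bare <', m.start()))
    # An entity reference that never reaches its terminating ';'
    for m in re.finditer(r'&[A-Za-z]+(?![A-Za-z;])', text):
        problems.append(('unterminated entity', m.start()))
    return problems
```

suspicious_sgml('a < b') flags the bare '<', and 'x &lt y' flags the unterminated entity, while a clean fragment like '<para>a &lt; b</para>' passes.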
Re: [HACKERS] Is postgres.gif missing in cvs?
Tom Lane wrote: Devrim GÜNDÜZ [EMAIL PROTECTED] writes: I got some SGML errors: https://devrim.privatepaste.com/501oMnwCYw Hmph. What version of the SGML tools are you using? It seems more prone to get confused by non-entity-ized '<' and '>' than what the rest of us are using. I'm not totally au fait with the rules of SGML. Does it allow literal '<' in text nodes? In most places I looked in our docs we seem to use '&lt;' as I would have expected. cheers andrew
Re: [HACKERS] Is postgres.gif missing in cvs?
Devrim GÜNDÜZ [EMAIL PROTECTED] writes: On Mon, 2007-12-03 at 17:49 -0500, Tom Lane wrote: Hmph. What version of the SGML tools are you using? The ones supplied with Fedora 8. sgml-common: 0.6.3, openjade: 1.3.2 Those are the same version numbers I see in Fedora 6, which doesn't behave like that ... Anyway, I've committed some cleanup in HEAD. regards, tom lane
Re: [HACKERS] Is postgres.gif missing in cvs?
Hi, On Mon, 2007-12-03 at 18:51 -0500, Tom Lane wrote: Those are the same version numbers I see in Fedora 6, which doesn't behave like that ... Anyway, I've committed some cleanup in HEAD. Thanks. Now we have some more warnings/errors: http://www.gunduz.org/postgresql/postgres-A4.log.gz I can reproduce this on Fedora 8 and CentOS 5.1 (which is more or less like Fedora 6). Regards, -- Devrim GÜNDÜZ, RHCE
Re: [HACKERS] Is postgres.gif missing in cvs?
Devrim GÜNDÜZ [EMAIL PROTECTED] writes: Thanks. Now we have some more warnings/errors: http://www.gunduz.org/postgresql/postgres-A4.log.gz You need bigger TeX settings. Please read "Building the Documentation". regards, tom lane