[HACKERS] questionable item in HISTORY

2005-09-24 Thread Tatsuo Ishii
Following item in HISTORY:

 * Add support for 3 and 4-byte UTF8 characters (John Hansen)
   Previously only one and two-byte UTF8 characters were supported.
   This is particularly important for support for some Chinese
   characters.

is wrong, since 3-byte UTF-8 characters have been supported ever since
UTF-8 support was added to PostgreSQL. A correct description would be:

 * Add support for 4-byte UTF8 characters (John Hansen)
   Previously only up to three-byte UTF8 characters were supported.
   This is particularly important for support for some Chinese
   characters.

In the meantime, I wonder if we need to update the UTF-8 <--> local
encoding maps. The author of the patches stated that "This is
particularly important for support for some Chinese characters." I
have no idea which encoding he is referring to, but I wonder if the
latest Chinese encoding standard GB18030 needs 4-byte UTF-8 mappings.
If yes, we surely need to update utf8_to_gb18030.map.

Anybody familiar with GB18030/UTF-8?
--
SRA OSS, Inc. Japan
Tatsuo Ishii



Re: [HACKERS] stack depth limit exceeded problem.

2005-09-24 Thread Thomas Hallgren

Oliver Jowett wrote:

 Thomas Hallgren wrote:

  PL/Java runs a JVM. Since a JVM is multi-threaded, PL/Java goes to
  fairly extreme measures to ensure that only one thread at a time can
  access the backend. So far, this has worked well but there is one small
  problem. [...]

 I assume this means you have a single lock serializing requests to the
 backend?

Yes, of course. I also make sure that the main thread cannot return 
until another thread that is servicing a backend request has completed. 
There's absolutely no way two threads can execute backend code 
simultaneously.
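
To make that concrete, the gate amounts to something like this (a
minimal sketch with hypothetical names, not the actual PL/Java source):

#include <pthread.h>

static pthread_mutex_t backend_gate = PTHREAD_MUTEX_INITIALIZER;

/* every thread that wants to run backend code goes through here */
static void *
call_backend(void *(*fn)(void *), void *arg)
{
    void *result;

    pthread_mutex_lock(&backend_gate);      /* one thread at a time */
    result = fn(arg);
    pthread_mutex_unlock(&backend_gate);
    return result;
}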



 If you can't solve the depth checking problem (Tom doesn't seem to like
 the idea of multiple threads calling into the backend..), what about
 turning the original thread (i.e. the main backend thread) into a
 backend interface thread that does nothing but feed callbacks into the
 backend on request? Then run all the user code in a separate thread that
 passes backend requests to the interface thread rather than directly
 executing them. If it starts extra threads which make DB requests, the
 mechanism stays the same..

I thought about that. The drawback is that each and every call must spawn
a new thread, no matter how trivial that call might be. If you do a 
select from a table with 10,000 records and execute a function for each 
record, you get 20,000 context switches. Avoiding that kind of overhead 
is one of the motivating factors for keeping the VM in-process.


I don't rule out such a solution but I'd like to have a discussion with 
Tom and iron out what the problems are when one thread at a time is 
allowed to execute. Perhaps I can solve them.


Regards,
Thomas Hallgren





Re: [HACKERS] stack depth limit exceeded problem.

2005-09-24 Thread Martijn van Oosterhout
On Sat, Sep 24, 2005 at 10:34:42AM +0200, Thomas Hallgren wrote:
 Oliver Jowett wrote:
 I assume this means you have a single lock serializing requests to the
 backend?
  
 Yes, of course. I also make sure that the main thread cannot return 
 until another thread that is servicing a backend request has completed. 
 There's absolutely no way two threads can execute backend code 
 simultaneously.

Ok, I have a question. PostgreSQL uses sigsetjmp/siglongjmp to handle
errors in the backend. If you're changing the stack, how do you avoid
the siglongjmp jumping back to a different stack? Or do you somehow
avoid this problem altogether?

 I thought about that. The drawback is that each and every call must spawn
 a new thread, no matter how trivial that call might be. If you do a 
 select from a table with 10,000 records and execute a function for each 
 record, you get 20,000 context switches. Avoiding that kind of overhead 
 is one of the motivating factors for keeping the VM in-process.

Well, on Linux at least context switches are quite cheap. However, how
does Java handle the possibility that functions never return? Do you
wrap each call in a PG_TRY/PG_CATCH to propagate errors?

Tricky issues...
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
 tool for doing 5% of the work and then sitting around waiting for someone
 else to do the other 95% so you can sue them.




Re: [HACKERS] stack depth limit exceeded problem.

2005-09-24 Thread Thomas Hallgren

Martijn van Oosterhout wrote:

 On Sat, Sep 24, 2005 at 10:34:42AM +0200, Thomas Hallgren wrote:

  Oliver Jowett wrote:

   I assume this means you have a single lock serializing requests to the
   backend?

  Yes, of course. I also make sure that the main thread cannot return
  until another thread that is servicing a backend request has completed.
  There's absolutely no way two threads can execute backend code
  simultaneously.

 Ok, I have a question. PostgreSQL uses sigsetjmp/siglongjmp to handle
 errors in the backend. If you're changing the stack, how do you avoid
 the siglongjmp jumping back to a different stack? Or do you somehow
 avoid this problem altogether?

All calls use a PG_TRY/PG_CATCH. So yes, I think I avoid that problem 
altogether.


  I thought about that. The drawback is that each and every call must spawn
  a new thread, no matter how trivial that call might be. If you do a
  select from a table with 10,000 records and execute a function for each
  record, you get 20,000 context switches. Avoiding that kind of overhead
  is one of the motivating factors for keeping the VM in-process.

 Well, on Linux at least context switches are quite cheap.

I know. And as I said, I don't rule out such a solution. But however 
cheap, there's still a performance penalty and added complexity. I'd
rather avoid both if I can, at least until I know what the real problem
is with the solution that I propose.



 However, how does Java handle the possibility that functions never
 return? Do you wrap each call in a PG_TRY/PG_CATCH to propagate errors?

Yes. All backend exceptions are caught in a PG_CATCH and then propagated
to Java as a ServerException. If there's no catch in the Java code, they
are rethrown by the java_call_handler, this time with the jump buffer
that was set up by the backend when it invoked the call handler.
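
For concreteness, the pattern is roughly this (a minimal sketch; the
helper name and the JNI plumbing are hypothetical, not the actual
PL/Java code):

PG_TRY();
{
    result = SPI_execute(command, read_only, count);
}
PG_CATCH();
{
    ErrorData *edata = CopyErrorData();     /* capture the ereport data */

    FlushErrorState();                      /* let execution continue */
    throw_server_exception(env, edata);     /* hypothetical JNI helper that
                                             * raises a ServerException */
}
PG_END_TRY();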


There's also a barrier that will prevent any further calls from the Java 
code once an exception has been thrown by the backend unless that call 
was wrapped in a savepoint construct. A savepoint rollback will unlock 
the barrier (this is not related to the thread issue of course).


Regards,
Thomas Hallgren






Re: [HACKERS] stack depth limit exceeded problem.

2005-09-24 Thread Martijn van Oosterhout
On Sat, Sep 24, 2005 at 12:26:58PM +0200, Thomas Hallgren wrote:
 Yes. All backend exceptions are caught in a PG_CATCH and then propagated
 to Java as a ServerException. If there's no catch in the Java code, they
 are rethrown by the java_call_handler, this time with the jump buffer
 that was set up by the backend when it invoked the call handler.
 
 There's also a barrier that will prevent any further calls from the Java 
 code once an exception has been thrown by the backend unless that call 
 was wrapped in a savepoint construct. A savepoint rollback will unlock 
 the barrier (this is not related to the thread issue of course).

Well, you seem to have dealt with the obvious issues I can see. I
imagine you also need to worry about things like signal handling. Is
there no way to reserve a stack just for PostgreSQL and switch to that
stack, rather than switching threads (although the stack is really the
only thing that differentiates threads anyway...)?

Linux has sigaltstack so you can catch the stack overflow signal (and
other signals obviously, but that's its main use), but it's not terribly
portable. What you really need to do is set the stack_base_ptr every
time you execute postgres with a new stack; that preserves existing
semantics.
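
A minimal sketch of that idea, assuming the backend's stack_base_ptr
global (the one check_stack_depth() measures against) were accessible
from the loadable module; this is not an existing API:

extern char *stack_base_ptr;            /* backend global */

static void
run_on_this_thread(void (*fn)(void))
{
    char *saved_base = stack_base_ptr;
    char  stack_marker;

    stack_base_ptr = &stack_marker;     /* this thread's stack is now "base" */
    fn();
    stack_base_ptr = saved_base;        /* restore before returning control */
}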

Signals are the only way the kernel can pass control unexpectedly so if
you handle those, postgres would never know it's threaded. I do wonder
if there are any other assumptions made...

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
 tool for doing 5% of the work and then sitting around waiting for someone
 else to do the other 95% so you can sue them.




Re: [HACKERS] stack depth limit exceeded problem.

2005-09-24 Thread Thomas Hallgren

Martijn van Oosterhout wrote:
 Linux has sigaltstack so you can catch the stack overflow signal (and
 other signals obviously, but that's its main use), but it's not terribly
 portable.

I rely on the signal handlers that the JVM uses for page faults (which a
stack overflow generally amounts to) and FPE exceptions, so I know that
they will generate Java exceptions in a controlled way (which I in turn
translate to elog(ERROR) on the main thread).


 What you really need to do is set the stack_base_ptr every
 time you execute postgres with a new stack; that preserves existing
 semantics.

Exactly! What I'd really like to do in threads other than the main one is:

void *currentBase = switchStackBase(stackBaseOfMyThread);

PG_TRY();
{
    /* service the call here */
    switchStackBase(currentBase);
}
PG_CATCH();
{
    switchStackBase(currentBase);
    /* generate Java exception as usual */
}
PG_END_TRY();

 Signals are the only way the kernel can pass control unexpectedly so if
 you handle those, postgres would never know it's threaded. I do wonder
 if there are any other assumptions made...

 Have a nice day,

You too. And thanks for all your input.

Regards,
Thomas Hallgren





Re: [HACKERS] stack depth limit exceeded problem.

2005-09-24 Thread Martijn van Oosterhout
On Sat, Sep 24, 2005 at 02:38:35PM +0200, Thomas Hallgren wrote:
 Martijn van Oosterhout wrote:
  Linux has sigaltstack so you can catch the stack overflow signal (and
  other signals obviously, but that's its main use), but it's not terribly
  portable.
 
 I rely on the signal handlers that the JVM uses for page faults (which a
 stack overflow generally amounts to) and FPE exceptions, so I know that
 they will generate Java exceptions in a controlled way (which I in turn
 translate to elog(ERROR) on the main thread).

Well, actually, what I was thinking is if someone sends a -INT or -TERM
to the backend, which thread will catch it? You have to block it in
every thread except the one you want to catch it in if you want to
control it. This means that for any signal handler that PostgreSQL
installs, you need to intercept it with a wrapper function to make sure
it runs on the right stack.
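
The blocking half of that is straightforward; a minimal sketch, to be
run at startup in every thread except the one designated to receive
signals:

#include <signal.h>
#include <pthread.h>

static void
block_backend_signals(void)
{
    sigset_t set;

    sigemptyset(&set);
    sigaddset(&set, SIGINT);        /* query cancel */
    sigaddset(&set, SIGTERM);       /* backend shutdown */
    pthread_sigmask(SIG_BLOCK, &set, NULL);
}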

Actually, while running backend code, you're probably fine since the
elog stuff will handle it. But if a signal is received while the JVM is
running, the signal handler will get the stack of the JVM. Now,
PostgreSQL's signal handlers tend not to do much, so you may be safe.
They tend not to throw errors, but who knows...

Still, this is all solvable I think...
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
 tool for doing 5% of the work and then sitting around waiting for someone
 else to do the other 95% so you can sue them.




Re: [HACKERS] 64-bit API for large objects

2005-09-24 Thread Tom Lane
Jeremy Drake [EMAIL PROTECTED] writes:
 On Fri, 23 Sep 2005, Tom Lane wrote:
 postgresql-fe.h defines a ton of stuff that has no business being
 visible to libpq's client applications.  It's designed to be used by
 our *own* client-side code (psql and the like), but we have not made
 any attempt to keep it from defining stuff that would likely break
 other peoples' code.

 So does this mean that there is a different, more advanced and more likely
 to break random other code, client library where this call would fit
 better?

I've been thinking more about this and come to these conclusions:

1. libpq-fe.h definitely cannot include postgres_fe.h; in fact, it has
no business even defining a type named int64.  That is way too likely
to collide with symbols coming from elsewhere in a client compilation.
I think what we need is to declare a type named pg_int64 and use that
in the externally visible declarations.  The most reasonable place to
put the typedef is postgres_ext.h.  This will mean making configure
generate postgres_ext.h from a template postgres_ext.h.in, but that's
no big deal.

2. We need a strategy for what to do when configure doesn't find a
working int64 type.  My inclination is to just not export the functions
in that case.  So normally, postgres_ext.h would contain something
like

#define HAVE_PG_INT64 1
typedef long long int pg_int64;

but neither of these would appear if configure couldn't find a working
type.  In libpq-fe.h, we'd have

#ifdef HAVE_PG_INT64
extern pg_int64 lo_lseek64(PGconn *conn, int fd, pg_int64 offset, int whence);
extern pg_int64 lo_tell64(PGconn *conn, int fd);
#endif

and similarly for all the code inside libpq.  The reason this seems like
a good idea is that client code could key off #ifdef HAVE_PG_INT64
to detect whether the lo64 functions are available; which is useful even
if you don't care about machines without int64, because you still need
to think about machines with pre-8.2 PG installations.

3. This is still not 100% bulletproof, as it doesn't address situations
like building PG with gcc and then trying to compile client apps with a
vendor cc that doesn't understand long long int.  The compile would
choke on the typedef even if you weren't trying to use large objects at
all.  I don't see any very nice way around that.  It might be worth
doing this in postgres_ext.h:

#ifndef NO_PG_INT64
#define HAVE_PG_INT64 1
typedef long long int pg_int64;
#endif

which would at least provide an escape hatch for such situations: define
NO_PG_INT64 before trying to build.
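
To make the escape hatch concrete, client code might look like this
(a sketch only; lo_lseek64, pg_int64, and HAVE_PG_INT64 are the names
proposed above, not an existing API):

#include <stdio.h>      /* for SEEK_SET */
#include "libpq-fe.h"

#ifdef HAVE_PG_INT64
    /* 64-bit seek: jump 5GB into the large object */
    pg_int64 pos = lo_lseek64(conn, fd, ((pg_int64) 5) << 30, SEEK_SET);
#else
    /* no working int64 available: stay within the old 2GB limit */
    int pos = lo_lseek(conn, fd, offset, SEEK_SET);
#endif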

regards, tom lane



Re: [HACKERS] 64-bit API for large objects

2005-09-24 Thread Tom Lane
Jeremy Drake [EMAIL PROTECTED] writes:
 In any case, are there any comments on the changes below libpq (the
 functions visible to queries on down)?

Within the backend, I don't see the point in maintaining a distinction
between 32- and 64-bit APIs for inv_api.c: you should just switch 'em
to use int64.  You did it that way for inv_getsize but then proceeded
to add separate inv_seek64/tell64 functions, which is inconsistent.
The submitted version of lo_tell64 isn't even calling the right one :-(

The 32-bit version of lo_tell will need to check and error out if the
value it'd need to return doesn't fit in int32.  (Come to think of it,
so should the 32-bit version of lo_lseek.)

All of the LO code needs to be eyeballed to make sure it still behaves
sanely if int64 is really only 32 bits.

It would probably be a good idea also to introduce some overflow checks
to detect cases where the current LO offset would overflow int64 after a
read, write, or seek.  This is missing from the existing code :-(
It is possible to code overflow checks that work regardless of the size
of int64; see int8.c for some inspiration.  I'd suggest also that the
offset be defined as signed not unsigned (int64 not uint64) as this will
simplify the overflow checks and eliminate the prospect of scenarios
where lo_tell64 cannot return a correct value.
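
For reference, the int8.c pattern amounts to the following (a sketch
adapted here for LO offsets; SAMESIGN is the macro int8.c defines):

#define SAMESIGN(a,b)   (((a) < 0) == ((b) < 0))

static int64
lo_offset_add(int64 cur, int64 delta)
{
    int64 result = cur + delta;

    /* overflow iff the inputs have the same sign and the result does not */
    if (SAMESIGN(cur, delta) && !SAMESIGN(result, cur))
        ereport(ERROR,
                (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
                 errmsg("large object offset out of range")));
    return result;
}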

regards, tom lane



Re: [HACKERS] 64-bit API for large objects

2005-09-24 Thread Jeremy Drake
On Fri, 23 Sep 2005, Tom Lane wrote:

 Jeremy Drake [EMAIL PROTECTED] writes:
  2) The lo_*64, in order to be convenient from the client end, have
  functions added to libpq as the existing lo_* functions.  The client side
  of libpq did not previously know anything about int64 or how to
  send/receive them.  I added an include of postgres-fe.h (which according
  to the comment in that file looks like it should go there) so int64 would
  be defined,

 Unfortunately that's completely unacceptable from a namespace-pollution
 point of view.

I don't quite understand.  Allow me to cite the source, so we both are
referring to the same thing here...

[EMAIL PROTECTED] postgresql-8.0.3 $ head -n17 src/include/postgres_fe.h
/*-------------------------------------------------------------------------
 *
 * postgres_fe.h
 *Primary include file for PostgreSQL client-side .c files
 *
 * This should be the first file included by PostgreSQL client libraries and
 * application programs --- but not by backend modules, which should include
 * postgres.h.
 *
 *
 * Portions Copyright (c) 1996-2005, PostgreSQL Global Development Group
 * Portions Copyright (c) 1995, Regents of the University of California
 *
 * $PostgreSQL: pgsql/src/include/postgres_fe.h,v 1.10 2004/12/31 22:03:19 pgsql Exp $
 *
 *-------------------------------------------------------------------------
 */

Now, I may not completely understand the term "client", but I think libpq
is a client library and anything which may use it would be an
application program.  So it seems it was an oversight on the part of
libpq not to include it.  Does the term "client" not mean what I thought
it did (anything which connects to a PostgreSQL server)?


 The real problem here is that int64 isn't a well-defined portable
 datatype, and so it's going to be very hard to export these functions in
 a way that won't break on different platforms, applications compiled
 with a different compiler than libpq was, etc.

Umm, what wouldn't break if you switched compilers in a way that redefined
sizeof(things)?  I happen to know that even using the same compiler but
just changing a compile flag (-m64) which changes the sizes of integral
types (sizeof(long) from 32 to 64 bits) will make such actions stop working
on one of my tested platforms.  It sucks; I happen not to be fond of this
because I tend not to have every library on my box built for both, but it
is the way life is.  I do not know of a platform where the size of an
integral type can change and programs can still link against existing
libraries.  And if the size of some type is not changing, then things
should already be correctly set for the platform.  But I admit I have not
met every platform in existence.  Do you happen to be able to cite a
platform where this is the case?


 For that matter, we can't even guarantee that they work at all: not all
 platforms even *have* int64 types.  We have so far avoided putting any
 fundamental dependencies on int64 arithmetic into the system, and I'm a
 bit worried that this patch will break LO support entirely on platforms
 that don't have working int64 arithmetic.

They should in fact break gracefully on such platforms, or at least as
gracefully as any other int64-using code might.  I did check a couple of
places for #ifdef INT64_BROKEN (or whatever it was called) to make sure
that on those platforms something at least somewhat sane would happen
(they use 32 bits instead).  Also, on those platforms, you could always
use the non-64 versions if you were concerned about that.  The patches
would allow seeking past the old limit using the 32-bit function in stages
(seek 2G, seek 2G, seek 2G would put you at 6G) if you do not mind weird
return values and tell not working.  And if you use a platform which
does not support 64-bit integral types, then you cannot reasonably expect
those functions to work correctly anyway.  But they should compile at
least.


   regards, tom lane



-- 
I don't wanna argue, and I don't wanna fight,
But there will definitely be a party tonight...



Re: [HACKERS] [PERFORM] Releasing memory during External sorting?

2005-09-24 Thread Ron Peacetree
From: Dann Corbit [EMAIL PROTECTED]
Sent: Sep 23, 2005 5:38 PM
Subject: RE: [HACKERS] [PERFORM] Releasing memory during External sorting?

 _C Unleashed_ also explains how to use a callback function to perform
 arbitrary radix sorts (you simply need a method that returns the
 [bucketsize] most significant bits for a given data type, for the length
 of the key).

 So you can sort fairly arbitrary data in linear time (of course if the
 key is long then O(n*log(n)) will be better anyway.)

 But in any case, if we are talking about external sorting, then disk
 time will be so totally dominant that the choice of algorithm is
 practically irrelevant.

Horsefeathers.  Jim Gray's sorting contest site:
http://research.microsoft.com/barc/SortBenchmark/

proves that the choice of algorithm can have a profound effect on
performance.  After all, the amount of IO done is the most
important of the things that you should be optimizing for in
choosing an external sorting algorithm.

Clearly, if we know or can assume the range of the data in question
the theoretical minimum amount of IO is one pass through all of the
data (otherwise, we are back in O(lg(n!)) land ).  Equally clearly, for
HD's that one pass should involve as few seeks as possible.

In fact, such a principle can be applied to _all_ forms of IO:  HD,
RAM, and CPU cache.  The absolute best that any sort can
possibly do is to make one pass through the data and deduce the
proper ordering of the data during that one pass.

It's usually also important that our algorithm be Stable, preferably
Wholly Stable.

Let's call such a sort Optimal External Sort (OES).  Just how much
faster would it be than current practice?

The short answer is the difference between how long it currently
takes to sort a file vs how long it would take to cat the contents
of the same file to a RAM buffer (_without_ displaying it). IOW, 
there's SIGNIFICANT room for improvement over current
standard practice in terms of sorting performance, particularly
external sorting performance.

Since sorting is a fundamental operation in many parts of a DBMS,
this is a Big Deal.
   
This discussion has gotten my creative juices flowing.  I'll post
some Straw Man algorithm sketches after I've done some more
thought.

Ron

 -Original Message-
 From: Dann Corbit [EMAIL PROTECTED]
 Sent: Friday, September 23, 2005 2:21 PM
 Subject: Re: [HACKERS] [PERFORM] Releasing memory during ...
 
 For the subfiles, load the top element of each subfile into a priority
 queue.  Extract the min element and write it to disk.  If the next
 value is the same, then the queue does not need to be adjusted.
 If the next value in the subfile changes, then adjust it.

 Then, when the lowest element in the priority queue changes, adjust
 the queue.

 Keep doing that until the queue is empty.

 You can create all the subfiles in one pass over the data.

 You can read all the subfiles, merge them, and write them out in a
 second pass (no matter how many of them there are).
 
The Gotcha with Priority Queues is that their performance depends
entirely on implementation.  In naive implementations either Enqueue()
or Dequeue() takes O(n) time, which degrades sorting to O(n^2) time.

The best implementations I know of need O(lg lg n) time for those
operations, allowing sorting to be done in O(n lg lg n) time.
Unfortunately, there's a lot of data manipulation going on in the 
process and two IO passes are required to sort any given file.
Priority Queues do not appear to be very IO friendly.

I know of no sorting performance benchmark contest winner based on
Priority Queues.
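
For concreteness, here is a minimal sketch of the heap-based k-way merge
under discussion (illustrative C over files of ints; not from the book):

#include <stdio.h>
#include <stdlib.h>

typedef struct
{
    int   value;                    /* current element from this run */
    FILE *run;                      /* the subfile it came from */
} HeapItem;

/* restore the min-heap property starting at slot i */
static void
sift_down(HeapItem *h, int n, int i)
{
    for (;;)
    {
        int      smallest = i;
        int      l = 2 * i + 1;
        int      r = 2 * i + 2;
        HeapItem tmp;

        if (l < n && h[l].value < h[smallest].value)
            smallest = l;
        if (r < n && h[r].value < h[smallest].value)
            smallest = r;
        if (smallest == i)
            return;
        tmp = h[i]; h[i] = h[smallest]; h[smallest] = tmp;
        i = smallest;
    }
}

/* merge k sorted runs into 'out' in one pass */
static void
merge_runs(FILE **runs, int k, FILE *out)
{
    HeapItem *h = malloc(k * sizeof(HeapItem));
    int       n = 0;
    int       i;

    for (i = 0; i < k; i++)         /* load the top element of each run */
        if (fread(&h[n].value, sizeof(int), 1, runs[i]) == 1)
            h[n++].run = runs[i];
    for (i = n / 2 - 1; i >= 0; i--)
        sift_down(h, n, i);

    while (n > 0)
    {
        fwrite(&h[0].value, sizeof(int), 1, out);   /* emit the minimum */
        if (fread(&h[0].value, sizeof(int), 1, h[0].run) != 1)
            h[0] = h[--n];                          /* run exhausted */
        sift_down(h, n, 0);                         /* O(lg k) adjustment */
    }
    free(h);
}

With a decent heap, Enqueue()/Dequeue() cost O(lg k) per element rather
than O(n), which is exactly where naive implementations lose.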


 Replacement selection is not a good idea any more, since obvious
 better ideas should take over.  Longer runs are of no value if you do not
 have to do multiple merge passes.
Judging from the literature and the contest winners, Replacement
Selection is still a viable and important technique.  Besides Priority
Queues, what obvious better ideas have you heard of?


 I have explained this general technique in the book C Unleashed,
 chapter 13.

 Sample code is available on the book's home page.

URL, please?



Re: [HACKERS] 64-bit API for large objects

2005-09-24 Thread Tom Lane
David Fetter [EMAIL PROTECTED] writes:
 On Fri, Sep 23, 2005 at 05:40:09PM -0400, Tom Lane wrote:
 For that matter, we can't even guarantee that they work at all: not
 all platforms even *have* int64 types.

 What platforms that PG supports don't have int64 arithmetic?

We claim to build with any ANSI C compiler, and there is no requirement
for a 64-bit type in ANSI C.

The historical project policy is that we should still build without
such a type, and everything should still work except that the effective
bounds of bigint data correspond to int32 instead of int64 limits.
I see no reason to back off that policy.  It's not very much harder
to do it right.

regards, tom lane



Re: [HACKERS] 64-bit API for large objects

2005-09-24 Thread Alvaro Herrera
Hey,

While you guys are hacking at the LO code, it would be nice to consider
the suggestions outlined here:

http://archives.postgresql.org/pgsql-bugs/2004-07/msg00143.php

-- 
Alvaro Herrera http://www.amazon.com/gp/registry/CTMLCN8V17R4
And a voice from the chaos spoke to me and said,
"Smile and be happy; it could be worse."
And I smiled. And I was happy.
And it got worse.



Re: [HACKERS] Proposed patch to clean up signed-ness warnings

2005-09-24 Thread Tom Lane
Tatsuo Ishii [EMAIL PROTECTED] writes:
 Are you proposing that we change all the char * to unsigned char *?

 No, I suggest we change all char * to unsigned char * only where
 it points a string which could hold non ASCII character strings.

Which is pretty nearly all of them...

 To support multiple charsets/collations, I think we need to change
 the way to represent character strings from the unstructured char *
 to a more intelligent structure (I know it's hard to implement that
 without significant performance loss, but I know we should do it in
 the future).

Yeah, it's still not clear where we are going to end up, but in the
meantime we've got a lot of warnings cluttering the code and making
it hard to spot real problems.

 Since unsigned char * is not enough for the goal anyway, I'm not against
 your patches.

OK.  No one else objected, so I'll go ahead and apply before the code
drifts to the point of breaking the patch.

regards, tom lane



Re: [HACKERS] 2 forks for md5?

2005-09-24 Thread Andrew Dunstan



Bruce Momjian wrote:

 Tom Lane wrote:

  I'm coming to agree with Andrew that a documentation patch might be the
  best answer.  But where to put it ... under the description of the
  log_connections GUC var?

 I am thinking we should wait for someone else to notice the double log
 entries before mentioning it in the docs.


If I had a more Machiavellian bent I would make sure that happened ;-)

How about this note under log_connections?

Some clients (notably psql) sometimes try to connect without a password 
before trying with a password. This behaviour will generate two log 
lines if log_connections is turned on, even though to the user it 
appears that only one connection has occurred.


cheers

andrew



Re: [HACKERS] 64-bit API for large objects

2005-09-24 Thread Jeremy Drake
On Sat, 24 Sep 2005, Tom Lane wrote:

 Jeremy Drake [EMAIL PROTECTED] writes:
  In any case, are there any comments on the changes below libpq (the
  functions visible to queries on down)?

 Within the backend, I don't see the point in maintaining a distinction
 between 32- and 64-bit APIs for inv_api.c: you should just switch 'em
 to use int64.  You did it that way for inv_getsize but then proceeded
 to add separate inv_seek64/tell64 functions, which is inconsistent.

Right.  I did it the way you describe in my first cut (I did this several
times and changed my mind and started over).  I was concerned (perhaps
needlessly) about breaking the signatures of the inv_* functions which are
visible outside of the .c file.  This is why I did the getsize differently
- it is static.  But it sounds like there is no concern about changing the
signatures of these functions, so I will change my patch to not maintain
the separate code paths in inv_api.c.

 The submitted version of lo_tell64 isn't even calling the right one :-(

Oops.  That's what I get for lots of copy/paste and doing it multiple
times...  Bonehead mistake, thanks for catching it.

 The 32-bit version of lo_tell will need to check and error out if the
 value it'd need to return doesn't fit in int32.  (Come to think of it,
 so should the 32-bit version of lo_lseek.)

That makes sense.  Or it could return some value (INT_MAX?) which could
mean that it is outside the range, so someone could still get at the data
even if they are using a backwards client box?  I don't know if that makes
sense at all, it sounds pretty wacky since these clients would have no
way of knowing where they are in the file.  Erroring would probably be
best.

 All of the LO code needs to be eyeballed to make sure it still behaves
 sanely if int64 is really only 32 bits.

Of course.

 It would probably be a good idea also to introduce some overflow checks
 to detect cases where the current LO offset would overflow int64 after a
 read, write, or seek.  This is missing from the existing code :-(
 It is possible to code overflow checks that work regardless of the size
 of int64; see int8.c for some inspiration.

Yes.  That would be good.  These would be in the inv_* functions, correct?

 I'd suggest also that the
 offset be defined as signed not unsigned (int64 not uint64) as this will
 simplify the overflow checks and eliminate the prospect of scenarios
 where lo_tell64 cannot return a correct value.

I intended to do that.  I think the only place I used uint64 rather than
int64 was the getsize function, and that could be int64 also.  I will look
at that code and make sure I am not mixing them in ways that are not
necessary and useful.


I will take these suggestions and make another revision of the patch
shortly.  Also, I was considering exposing an lo_getsize or lo_stat
function which would tell you how big a large object is without having to
seek to the end and look at the return value.  The seek approach requires
you to have the large object open at the time, makes you lose your old
seek position (unless you do a tell first), and costs several more server
round-trips (if the object is not open, open/seek/close; if it is open,
tell/seek/seek).




Re: [HACKERS] 64-bit API for large objects

2005-09-24 Thread Jeremy Drake
On Sat, 24 Sep 2005, Alvaro Herrera wrote:

 Hey,

 While you guys are hacking at the LO code, it would be nice to consider
 the suggestions outlined here:

 http://archives.postgresql.org/pgsql-bugs/2004-07/msg00143.php

Included from that message for easier reference:

 0) In Oid lo_creat(PGconn *conn, int mode), why is there a mode on
 lo_create? The mode is determined when the object is lo_open()ed, right?

I think the docs basically said it is a vestigial feature: it used to be
useful, but the code evolved in such a way that it ceased being useful.  It
is probably still there to allow old code to continue to compile against
newer servers without being recompiled.

 1) There is no lo_truncate(PGconn *conn, int fd, off_t len).

Did not notice that one.  That is a good one to add if adding functions is
in the cards.  I bet when the person/people who are attempting to write to
this API get far enough, they would have noticed that too ;)

 2) There is no lo_length(PGconn *conn, int fd).

We did notice this one however.  There is also no lo_stat(PGconn *conn,
Oid lobjId).  I have been thinking about implementing these two.

I think I will make a revision of the patch at some point with these.  The
size ones will be extremely easy, the functionality is already there, just
a matter of exposing it.  The truncate is not too difficult, but actually
requires me to think a little more ;)
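
If these do get added, presumably they would follow the conventions of
the existing lo_* entry points; a hypothetical sketch (names and
signatures are guesses, not an agreed API):

int      lo_truncate(PGconn *conn, int fd, size_t len);  /* cut LO to len */
pg_int64 lo_length64(PGconn *conn, int fd);              /* size, without
                                                          * moving the seek
                                                          * position */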

-- 
When does summertime come to Minnesota, you ask?  Well, last year, I
think it was a Tuesday.



[HACKERS] gcc4's uninitialized-variable warnings

2005-09-24 Thread Tom Lane
I asked some gcc experts at Red Hat about the new variable-may-be-used-
uninitialized warnings that gcc 4.x reports.  These occur in cases
like

int i, j;
...
foo(&i, &j);
// use i and j

I had thought that gcc was being stricter about the possibility that the
called function might not set its output parameters, but the true story
is entirely different.  There's been no change in the strictness of the
check for external function calls.  What is happening is that if foo()
is static and gcc chooses to inline it into the calling function, you
will now see a warning if the transformed code fails the check.  In
essence this means that there is a code path through foo() that doesn't
set the output parameter.

Armed with that knowledge, we can fix these warnings by ensuring the
callee sets the output parameters in all code paths; which is often
cleaner than having the caller initialize the variables before call,
as I was afraid we'd have to do.
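
An illustrative case (hypothetical foo; the flag is just a stand-in):

static int some_rare_case;

static void
foo(int *i, int *j)
{
    if (some_rare_case)
    {
        *i = -1;
        return;         /* this path never sets *j: warned when inlined */
    }
    *i = 0;
    *j = 0;
}

The fix is to set *j before the early return (to -1, or whatever is
sane) in foo itself, rather than pre-initializing i and j in every
caller.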

I'll work on cleaning these up.

regards, tom lane



Re: [HACKERS] 64-bit API for large objects

2005-09-24 Thread Tom Lane
Jeremy Drake [EMAIL PROTECTED] writes:
 0) In Oid lo_creat(PGconn *conn, int mode), why is there a mode on
 lo_create? The mode is determined when the object is lo_open()ed, right?

 I think the docs basically said it is a vestigial feature, it used to be
 useful but the code evolved in such a way that it ceased being useful.  It
 is probably still there to allow old code to continue to compile against
 newer servers without being recompiled.

Yeah.  There were once multiple types of large objects, and I suppose
the mode argument told lo_creat which kind to create.  I have no idea
how the read/write bits got included into that --- it doesn't make any
sense.  As of PG 8.1, lo_creat just ignores the mode argument.  We can't
delete the argument though without causing a lot of compatibility
headaches.

regards, tom lane



[HACKERS] \x output blowing up

2005-09-24 Thread Martijn van Oosterhout
Hi,

On CVS tip, whenever I do \x output, it dies with an internal error in
glibc on free. If you run it under valgrind, it complains about these
lines of code:

700     {
701         char *my_cell = pg_local_malloc(cell_w[i] + 1);
702
703 [Inv write 1 byte]  strcpy(my_cell, *ptr);
704         if (opt_align[i % col_count] == 'r' && opt_numeric_locale)
705             format_numeric_locale(my_cell);
706         if (opt_border < 2)
707             fprintf(fout, "%s\n", my_cell);
708         else
709 [Inv read 1 byte]   fprintf(fout, "%-s%*s |\n", my_cell, dwidth - cell_w[i], "");
710         free(my_cell);
711     }

Now, apart from the fact that the cell width != strlen in multibyte
encodings, there must be something else, because this is just "select *
from pg_proc" and there are no multibyte characters there AFAIK. I can't
see it though.

Hope this helps,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
 tool for doing 5% of the work and then sitting around waiting for someone
 else to do the other 95% so you can sue them.




Re: [HACKERS] Start translating

2005-09-24 Thread Peter Eisentraut
Tom Lane wrote:
 One thing I had wanted to do before issuing the call for translations
 was to make a pass over the recent GiST changes, looking at
 elog-vs-ereport decisions and message wording.

OK, can you tell me where those are so I can look at them.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/



Re: [HACKERS] \x output blowing up

2005-09-24 Thread Martijn van Oosterhout
On Sat, Sep 24, 2005 at 11:45:08PM +0200, Martijn van Oosterhout wrote:
 Hi,
 
 On CVS tip, whenever I do \x output, it dies with an internal error in
 glibc on free. If you run it under valgrind, it complains about these
 lines of code:

snip

Ok, I worked out the direct cause: pg_wcswidth only returns the length
up to the first newline, and the line it breaks on is the multiline
definition in _pg_expandarray.

The quick fix should be to only allocate memory if it's going to call
format_numeric_locale(), since then you know it's a number. It makes
the code slightly more convoluted but it should be slightly more
efficient.
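
A sketch of that quick fix, using the names from the quoted code
(untested, and sized by string length rather than display width, since
the two differ for multibyte data and for cells containing newlines):

if (opt_align[i % col_count] == 'r' && opt_numeric_locale)
{
    char *my_cell = pg_local_malloc(strlen(*ptr) + 1);

    strcpy(my_cell, *ptr);
    format_numeric_locale(my_cell);
    if (opt_border < 2)
        fprintf(fout, "%s\n", my_cell);
    else
        fprintf(fout, "%-s%*s |\n", my_cell, dwidth - cell_w[i], "");
    free(my_cell);
}
else if (opt_border < 2)
    fprintf(fout, "%s\n", *ptr);    /* no copy needed */
else
    fprintf(fout, "%-s%*s |\n", *ptr, dwidth - cell_w[i], "");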

I actually have a working psql that handles and displays newlines
properly, but it's too late for 8.1. It fixes all these issues
properly.

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
 tool for doing 5% of the work and then sitting around waiting for someone
 else to do the other 95% so you can sue them.




Re: [HACKERS] Start translating

2005-09-24 Thread Tom Lane
Peter Eisentraut [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 One thing I had wanted to do before issuing the call for translations
 was to make a pass over the recent GiST changes, looking at
 elog-vs-ereport decisions and message wording.

 OK, can you tell me where those are and I can look at it.

Done already, but thanks for offering.

regards, tom lane



Re: [HACKERS] questionable item in HISTORY

2005-09-24 Thread Bruce Momjian
Tatsuo Ishii wrote:
 Following item in HISTORY:
 
  * Add support for 3 and 4-byte UTF8 characters (John Hansen)
Previously only one and two-byte UTF8 characters were supported.
This is particularly important for support for some Chinese
characters.
 
 is wrong, since 3-byte UTF-8 characters have been supported ever since
 UTF-8 support was added to PostgreSQL. A correct description would be:
 
  * Add support for 4-byte UTF8 characters (John Hansen)
Previously only up to three-byte UTF8 characters were supported.
This is particularly important for support for some Chinese
characters.

Release notes updated.

 
 In the meantime, I wonder if we need to update the UTF-8 <--> local
 encoding maps. The author of the patches stated that "This is
 particularly important for support for some Chinese characters." I
 have no idea which encoding he is referring to, but I wonder if the
 latest Chinese encoding standard GB18030 needs 4-byte UTF-8 mappings.
 If yes, we surely need to update utf8_to_gb18030.map.
 
 Anybody familiar with GB18030/UTF-8?

Good question.  The report we got in the past was that some UTF-8
characters were being rejected even though they were valid UTF-8
characters, mostly Chinese.  I have no idea how they map to GB*
character sets.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073



[HACKERS] Discarding relations from FSM

2005-09-24 Thread Jim C. Nasby
Per http://lnk.nu/developer.postgresql.org/43b.c:

 * The number of distinct relations tracked is limited by a configuration
 * variable (MaxFSMRelations).  When this would be exceeded, we discard the
 * least recently used relation.  A doubly-linked list with move-to-front
 * behavior keeps track of which relation is least recently used.

Rather than keeping track of which relation was last used (presumably
every time something hits the FSM), wouldn't it make more sense to just
drop the relation with the smallest amount of free space in the map?

Of course maybe a better question is why we even limit based on the
number of relations...
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461



Re: [HACKERS] 2 forks for md5?

2005-09-24 Thread Bruce Momjian
Andrew Dunstan wrote:

 Bruce Momjian wrote:

  Tom Lane wrote:

   I'm coming to agree with Andrew that a documentation patch might be the
   best answer.  But where to put it ... under the description of the
   log_connections GUC var?

  I am thinking we should wait for someone else to notice the double log
  entries before mentioning it in the docs.

 If I had a more Machiavellian bent I would make sure that happened ;-)

 How about this note under log_connections?

 Some clients (notably psql) sometimes try to connect without a password
 before trying with a password. This behaviour will generate two log
 lines if log_connections is turned on, even though to the user it
 appears that only one connection has occurred.

OK, you wore me down.  :-)

New text:

This outputs a line to the server log detailing each successful
connection. This is off by default, although it is probably very
useful.  Some client programs, like <application>psql</>,
attempt to connect twice while determining if a password is required,
so duplicate <literal>connection received</> messages are not a sign
of a problem.  This option can only be set at server start or in the
<filename>postgresql.conf</filename> configuration file.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073



Re: [HACKERS] gcc4's uninitialized-variable warnings

2005-09-24 Thread Bruce Momjian
Tom Lane wrote:
 I asked some gcc experts at Red Hat about the new variable-may-be-used-
 uninitialized warnings that gcc 4.x reports.  These occur in cases
 like
 
   int i, j;
   ...
   foo(&i, &j);
   // use i and j
 
 I had thought that gcc was being stricter about the possibility that the
 called function might not set its output parameters, but the true story
 is entirely different.  There's been no change in the strictness of the
 check for external function calls.  What is happening is that if foo()
 is static and gcc chooses to inline it into the calling function, you
 will now see a warning if the transformed code fails the check.  In
 essence this means that there is a code path through foo() that doesn't
 set the output parameter.
 
 Armed with that knowledge, we can fix these warnings by ensuring the
 callee sets the output parameters in all code paths; which is often
 cleaner than having the caller initialize the variables before call,
 as I was afraid we'd have to do.
 
 I'll work on cleaning these up.

Wow, that is a nifty compiler check.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073



Re: [HACKERS] Discarding relations from FSM

2005-09-24 Thread Bruce Momjian
Jim C. Nasby wrote:
 Per http://lnk.nu/developer.postgresql.org/43b.c:
 
  * The number of distinct relations tracked is limited by a configuration
  * variable (MaxFSMRelations).  When this would be exceeded, we discard the
  * least recently used relation.  A doubly-linked list with move-to-front
  * behavior keeps track of which relation is least recently used.
 
 Rather than keeping track of which relation was last used (presumably
 everytime something hits the FSM), wouldn't it make more sense to just
 drop the relation with the smallest amount of free space in the map?

The one with the smallest amount of free space might be the one we want
to vacuum next, so perhaps it will be added to very soon.  The _usage_
of the free space is a much better indicator.

 Of course maybe a better question is why we even limit based on the
 number of relations...

Limited shared memory.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073



Re: [HACKERS] \x output blowing up

2005-09-24 Thread Bruce Momjian

Well, it seems we are going to have to fix it somehow for 8.1.  It is
not crashing here so I can't work up a patch.  Can you submit a minimal
fix for 8.1?  Thanks.

---

Martijn van Oosterhout wrote:
 On Sat, Sep 24, 2005 at 11:45:08PM +0200, Martijn van Oosterhout wrote:
  Hi,
  
  On CVS tip, whenever I do \x output, it dies with an internal error in
  glibc on free. If you run it under valgrind, it complains about these
  lines of code:
 
 snip
 
 Ok, I worked out the direct cause, pg_wcswidth only returns the length
 upto the first newline and the line it breaks on is the multiline
 definition in _pg_expandarray.
 
 The quick fix should be to only allocate memory if it's going to call
 format_numeric_locale(), since then you know it's a number. It makes
 the code slightly more convoluted but it should be slightly more
 efficient.
 
 I actually have a working psql that handles and displays newlines
 properly, but it's too late for 8.1. It fixes all these issues
 properly.
 
 Have a nice day,
 -- 
 Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
  Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
  tool for doing 5% of the work and then sitting around waiting for someone
  else to do the other 95% so you can sue them.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073



Re: [HACKERS] Discarding relations from FSM

2005-09-24 Thread Tom Lane
Jim C. Nasby [EMAIL PROTECTED] writes:
 Rather than keeping track of which relation was last used (presumably
 everytime something hits the FSM), wouldn't it make more sense to just
 drop the relation with the smallest amount of free space in the map?

Why?  That certainly wouldn't be cheaper to track.

 Of course maybe a better question is why we even limit based on the
 number of relations...

Shared memory is fixed-size.

regards, tom lane



[HACKERS] \d on database with a lot of tables is slow

2005-09-24 Thread Jim C. Nasby
I have a client with a database that contains 4000 relations according
to vacuum verbose, and \d in psql is painfully slow. In particular...

   ->  Seq Scan on pg_class c  (cost=0.00..2343.09 rows=6124 width=73)
       (actual time=0.325..22100.840 rows=16856 loops=1)
         Filter: (((relkind = 'r'::"char") OR (relkind = 'v'::"char") OR
         (relkind = 'S'::"char") OR (relkind = ''::"char")) AND
         pg_table_is_visible(oid))

That's off my laptop, but they're seeing similar issues on an 8-way
Opteron as well...

I've messed around with adding indexes to a copy of pg_class to no
avail. Any ideas on how to improve the performance?

Also, not sure if this matters, but they're occasionally getting errors
like 'Did not find relation named "table that exists"' (where "table
that exists" is the name of some table that is in the catalog) from \d.
Does anyone know what causes that?
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461



[HACKERS] Vacuum questions...

2005-09-24 Thread Jim C. Nasby
Would it be difficult to vacuum as part of a dump? The reasoning behind
this is that you have to read the table to do the dump anyway, so it
would be a good time to be able to piggy-back other operations that need
to read the entire table on top. I know vacuuming of indexes complicates
this, so it's probably not as simple as just firing off a vacuum and
copy at the same time (although that idea is probably worth testing,
since it might still be a win).

When dropping a table or index, is its space immediately released in
the FSM?

Also, would it be possible to add some means to check the status of a
running vacuum? Even with vacuum verbose, once it starts in on a large
table you have no way to know how far along it is.

Finally, if vacuum_delay is enabled, does vacuum_cost_page_miss consider
a miss as not in the database buffer, or not in the kernel buffer? I
remember discussions about trying to track IO request times to try and
determine if something came out of kernel buffers or not, but AFAIK
that's all vaporware right now...
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461



Re: [HACKERS] \d on database with a lot of tables is slow

2005-09-24 Thread Rod Taylor
On Sat, 2005-09-24 at 18:59 -0500, Jim C. Nasby wrote:
 I have a client with a database that contains 4000 relations according
 to vacuum verbose, and \d in psql is painfully slow. In particular...
 
 ->  Seq Scan on pg_class c  (cost=0.00..2343.09 rows=6124 width=73)
     (actual time=0.325..22100.840 rows=16856 loops=1)
       Filter: (((relkind = 'r'::"char") OR (relkind = 'v'::"char") OR
       (relkind = 'S'::"char") OR (relkind = ''::"char")) AND
       pg_table_is_visible(oid))
 
 That's off my laptop, but they're seeing similar issues on an 8-way
 Opteron as well...
 
 I've messed around with adding indexes to a copy of pg_class to no
 avail. Any ideas on how to improve the performance?

It is probably the visibility checks. Is a \d fast if you include the
full name (schema.table)?

I brought this up a while ago and Tom has since rearranged some of the
psql queries to move the visibility check to come after the other where
clause segments.


It would be nice if the cost of the function could be added somehow --
even if it was just a low, medium or high setting. This would allow the
planner to shuffle the WHERE clause execution order around in a
reasonable manner.
-- 




Re: [HACKERS] Discarding relations from FSM

2005-09-24 Thread Jim C. Nasby
On Sat, Sep 24, 2005 at 07:21:19PM -0400, Tom Lane wrote:
  Of course maybe a better question is why we even limit based on the
  number of relations...
 
 Shared memory is fixed-size.

True, but can't the fixed memory required per-relation just be shared
with the fixed memory used to store free pages?

Though, the idea mentioned recently of just using one shared memory
segment for everything and allocating dynamically within that probably
makes more sense...
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] \d on database with a lot of tables is slow

2005-09-24 Thread Jim C. Nasby
On Sat, Sep 24, 2005 at 08:20:47PM -0400, Rod Taylor wrote:
 On Sat, 2005-09-24 at 18:59 -0500, Jim C. Nasby wrote:
  I have a client with a database that contains 4000 relations according
  to vacuum verbose, and \d in psql is painfully slow. In particular...
  
  ->  Seq Scan on pg_class c  (cost=0.00..2343.09 rows=6124 width=73)
      (actual time=0.325..22100.840 rows=16856 loops=1)
        Filter: (((relkind = 'r'::"char") OR (relkind = 'v'::"char") OR
        (relkind = 'S'::"char") OR (relkind = ''::"char")) AND
        pg_table_is_visible(oid))
  
  That's off my laptop, but they're seeing similar issues on an 8-way
  Opteron as well...
  
  I've messed around with adding indexes to a copy of pg_class to no
  avail. Any ideas on how to improve the performance?
 
 It is probably the visibility checks. Is a \d fast if you include the
 full name (schema.table)?
 
 I brought this up a while ago and Tom has since rearranged some of the
 psql queries to move the visibility check to come after the other where
 clause segments.
 
 
 It would be nice if the cost of the function could be added somehow --
 even if it was just a low, medium or high setting. This would allow the
 planner to shuffle the WHERE clause execution order around in a
 reasonable manner.

\d tablename is fast, yes.

Maybe instead of re-arranging the query it would make more sense to roll
the visibility check into the query itself (probably using a new system
view).
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461



Re: [HACKERS] Discarding relations from FSM

2005-09-24 Thread Bruce Momjian
Jim C. Nasby wrote:
 On Sat, Sep 24, 2005 at 07:21:19PM -0400, Tom Lane wrote:
   Of course maybe a better question is why we even limit based on the
   number of relations...
  
  Shared memory is fixed-size.
 
 True, but can't the fixed memory required per-relation just be shared
 with the fixed memory used to store free pages?

The assumption is that the admin wants to control the allotment of
memory, and doesn't want it to shift based on (perhaps temporary) load.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073



Re: [HACKERS] Vacuum questions...

2005-09-24 Thread Joshua D. Drake

Jim C. Nasby wrote:

 Would it be difficult to vacuum as part of a dump? The reasoning behind
 this is that you have to read the table to do the dump anyway, so it
 would be a good time to be able to piggy-back other operations that need
 to read the entire table on top. I know vacuuming of indexes complicates
 this, so it's probably not as simple as just firing off a vacuum and
 copy at the same time (although that idea is probably worth testing,
 since it might still be a win).

This would be a nightmare on a large database. Think of how
long it takes to dump 20 gig, now add how long it is going to
take to vacuum a DB of that size, now think about a 500 gig
database.

Actually this also probably would not gain you much in 8.1,
as vacuum in theory is already dealing with itself.

 When dropping a table or index, is its space immediately released in
 the FSM?

I would have to double-check, but I believe you would have to
vacuum to reclaim the space to the FSM because the relationship
is still there, just like when you delete (but not truncate).

 Also, would it be possible to add some means to check the status of a
 running vacuum? Even with vacuum verbose, once it starts in on a large
 table you have no way to know how far along it is.

That is an interesting thought... Perhaps a quick scan of
the table to see how many dead rows there are? Then check
back every n/10? Hmmm... I am not a C guy so I don't know if
that is technically feasible (although probably possible) but it
is interesting from a DBA perspective.

Although that could be an issue on a large table as well, I think.

 Finally, if vacuum_delay is enabled, does vacuum_cost_page_miss consider
 a miss as not in the database buffer, or not in the kernel buffer? I
 remember discussions about trying to track IO request times to try and
 determine if something came out of kernel buffers or not, but AFAIK
 that's all vaporware right now...

Good question, anyone else?

Sincerely,

Joshua D. Drake





--
Your PostgreSQL solutions company - Command Prompt, Inc. 1.800.492.2240
PostgreSQL Replication, Consulting, Custom Programming, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: plPHP, plPerlNG - http://www.commandprompt.com/




Re: [HACKERS] Vacuum questions...

2005-09-24 Thread Alvaro Herrera
On Sat, Sep 24, 2005 at 07:17:38PM -0500, Jim C. Nasby wrote:

 Finally, if vacuum_delay is enabled, does vacuum_cost_page_miss consider
 a miss as not in the database buffer, or not in the kernel buffer?

The database buffer.

 I
 remember discussions about trying to track IO request times to try and
 determine if something came out of kernel buffers or not, but AFAIK
 that's all vaporware right now...

I don't remember the discussion, but it certainly hasn't been
implemented.

-- 
Alvaro Herrerahttp://www.PlanetPostgreSQL.org
Listen and you will forget; see and you will remember; do and you will understand (Confucius)



Re: [HACKERS] Releasing memory during External sorting?

2005-09-24 Thread Dann Corbit

Generally, when you read from a set of subfiles, the OS will cache the
reads to some degree, so the disk-seek jitter is not always that bad. On
a highly fragmented disk drive, you might also jump all over the place
reading serially from a single subfile. Of course, every situation is
different. At any rate, I would recommend benchmarking many different
approaches. It is also rather important how much memory is available to
perform sorts and reads. But with memory becoming larger and cheaper, it
becomes less and less of a worry.

From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Meir Maor
Sent: Friday, September 23, 2005 10:24 PM
To: Simon Riggs
Cc: pgsql-hackers@postgresql.org; pgsql-performance@postgresql.org
Subject: Re: [HACKERS] Releasing memory during External sorting?

Calculating optimal memory for a disk-based sort is based only on
minimizing IO. A previous post stated we can merge as many subfiles as we
want in a single pass; this is not accurate, as we want to eliminate disk
seeks in the merge phase as well. The merging should be done by reading
blocks of data from each subfile: if we have data of size N and memory of
size M, then we will have K = N/M subfiles to merge after sorting each.

In the merge operation, if we want to merge all blocks in one pass, we
read M/K of data from each subfile into memory and begin merging, reading
another M/K block whenever the buffer from a subfile is empty. We would
like disk seek time to be irrelevant compared to sequential IO time. Note
that we are performing IO in blocks of M/K = M^2/N (equivalently N/K^2).

Let us assume that sequential IO is done at 100MB/s and that a random
seek requires ~15ms, and that we want seek time to be irrelevant by one
order of magnitude. In the time of one random seek we can read 1.5MB of
data, so we get optimal performance if we perform IO in blocks of 15MB.
Since the merge algorithm above performs IO in blocks of M/K, we would
like M/K > 15MB, i.e. M > K*15MB, which results in a very large memory
requirement:

  M^2 > N*15MB
  M > sqrt(N*15MB)

For example, for sorting 10GB of data we would like M > 380MB for optimal
performance.

Alternatively, we can choose a different algorithm in which we merge only
a constant number of subfiles together at a time, but then we require
multiple passes to merge the entire file: log(K) passes over the entire
data. This approach obviously improves with an increase of memory.

The first approach requires 2 passes over the entire data and K^2+K
random seeks; the second approach (merging l blocks at a time) requires
log_l(K) passes over the data and K*l+K random seeks.
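
As a sanity check on that 380MB figure:

  M > sqrt(N * 15MB) = sqrt(10^10 bytes * 1.5*10^7 bytes)
    = sqrt(1.5 * 10^17) bytes ~= 3.9 * 10^8 bytes ~= 387MB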





On 9/23/05, Simon Riggs [EMAIL PROTECTED] wrote:

I have concerns about whether we are overallocating memory for use in
external sorts. (All code relating to this is in tuplesort.c)

When we begin a sort we allocate (work_mem | maintenance_work_mem) and
attempt to do the sort in memory. If the sort set is too big to fit in 
memory we then write to disk and begin an external sort. The same memory
allocation is used for both types of sort, AFAICS.

The external sort algorithm benefits from some memory but not much.
Knuth says that the amount of memory required is very low, with a value 
typically less than 1 kB. I/O overheads mean that there is benefit from
having longer sequential writes, so the optimum is much larger than
that. I've not seen any data that indicates that a setting higher than
16 MB adds any value at all to a large external sort. I have some
indications from private tests that very high memory settings may
actually hinder performance of the sorts, though I cannot explain that
and wonder whether it is the performance tests themselves that have 
issues.

Does anyone have any clear data that shows the value of large settings
of work_mem when the data to be sorted is much larger than memory? (I am
well aware of the value of setting work_mem higher for smaller sorts, so 
any performance data needs to reflect only very large sorts).

If not, I would propose that when we move from qsort to tapesort mode we
free the larger work_mem setting (if one exists) and allocate only a
lower, though still optimal setting for the tapesort. That way the
memory can be freed for use by other users or the OS while the tapesort
proceeds (which is usually quite a while...).

Feedback, please.

Best Regards, Simon Riggs

