Re: [HACKERS] windows doesn't notice backend death

2009-05-03 Thread Magnus Hagander
Andrew Dunstan wrote:
 
 
 Tom Lane wrote:
 Andrew Dunstan and...@dunslane.net writes:
  
 Tom Lane wrote:

 Ick.  Is it possible that the postmaster did get a report, but thought
 it was normal session termination?  If so, how could we distinguish?
   

  
 If that were the case then it would not have the dead process still
 listed as a live backend, ISTM, which it does.
 

 The postmaster does not control the content of the pg_stat_activity
 view.


   
 
 Well, I'm not sure I know how to find out the answer to your question. I
 could try attaching a debugger to the postmaster - if I knew where to
 put a breakpoint.

I'd start at pgwin32_deadchild_callback(). That's where the waiting
thread should activate once a child goes away.

That one should post a notice to the win32ChildQueue, which is polled in
win32_waitpid() - that's a second good point for a breakpoint.

This in turn should make things happen up in reaper() - as Alvaro
suggests, a third good place for a breakpoint.


FWIW, this certainly used to work. So we've either broken this recently,
or it's always been broken on Vista (I've never tried it myself on
Vista, only 2000, XP and 2003).

//Magnus

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] windows shared memory error

2009-05-03 Thread Magnus Hagander
Tom Lane wrote:
 Andrew Dunstan and...@dunslane.net writes:
 I am seeing Postgres 8.3.7 running as a service on Windows Server 2003 
 repeatedly fail to restart after a backend crash because of the 
 following code in port/win32_shmem.c:
 
 On further review, I see an entirely different explanation for possible
 failures of that code.
 
 It says here:
 http://msdn.microsoft.com/en-us/library/ms885627.aspx

FWIW, this is the Windows CE documentation. The one for win32 is at:
http://msdn.microsoft.com/en-us/library/ms679360(VS.85).aspx



 that GetLastError() continues to return the same error code until
 someone calls SetLastError() to change it.  It further says that
 only a few operating system functions call SetLastError(0) on success,
 and that it is explicitly documented whenever a function does so.
 I see no such statement for CreateFileMapping:
 http://msdn.microsoft.com/en-us/library/aa366537(VS.85).aspx
 
 This leads me to conclude that after a successful creation,
 GetLastError will return whatever the errno previously was,
 meaning that you cannot reliably distinguish creation from non
 creation unless you do SetLastError(0) beforehand.  Which we don't.
 
 Now this would only explain problems if there were some code path
 through the postmaster that could leave the errno set to
 ERROR_ALREADY_EXISTS (a/k/a EEXIST) when this code is reached.  I'm not
 sure there is one, and I have even less of a theory as to why system
 load might make it more probable to happen.  Still, this looks like a
 bug from here, and repeating the create call won't fix it.

The ref page for CreateFileMapping you linked has:

If the object exists before the function call, the function returns a
handle to the existing object (with its current size, not the specified
size), and GetLastError  returns ERROR_ALREADY_EXISTS. 


I think that qualifies as it documenting that it's setting the return
value, no? That would never work if it isn't set to something other than
ERROR_ALREADY_EXISTS (probably zero) when it *didn't* already exist.

The quick try would be to stick a SetLastError(0) in there, just to be
sure... Could be worth a try?


Andrew, just to confirm: you've found a case where this happens
*repeatably*? That's what we've failed to do before - it's happened now
and then, but never during testing...

//Magnus



Re: [HACKERS] windows shared memory error

2009-05-03 Thread Magnus Hagander
Andrew Dunstan wrote:
 
 
 Tom Lane wrote:

 Now this would only explain problems if there were some code path
 through the postmaster that could leave the errno set to
 ERROR_ALREADY_EXISTS (a/k/a EEXIST) when this code is reached.  I'm not
 sure there is one, and I have even less of a theory as to why system
 load might make it more probable to happen.  Still, this looks like a
 bug from here, and repeating the create call won't fix it.


   
 
 Oh, I think that this code has such a path. We already know that the
 code I showed is entered when that error is set. So the solution would
 be to put SetLastError(0) before the call to CreateFileMapping(), possibly
 before both such calls.
 
 Maybe we need to look at all the places we call GetLastError(). There
 are quite a few of them.

A quick look shows that all of these except the one in
pgwin32_get_dynamic_tokeninfo() (which uses a documented way to check
the return code in the case of success) are only called after an API
function fails, so we should be safe there.

//Magnus



Re: [HACKERS] Gist consistent and compression

2009-05-03 Thread Yeb Havinga

Hi Dimitri, list

 It seems to me that what you're asking for is addressed indirectly in
 the possibility to make your internal data type a full SQL visible
 datatype. Then you store this new datatype directly in the table and
 index that. Instead of converting from external to internal type at
 consistent() time in a query, provide an implicit CAST for external to
 internal for queries to just work without editing. The CAST will get
 called once per literal in the query.

Thanks for your reply. If there were a way that the user never sees
the 'internal' type in query results (with some rules perhaps), it might
work. But somehow it feels 'not right'; the user datatype is the
external one, and how the gist index works internally should not be put
as a burden on the user. The compression is also lossy, so a recheck is
necessary and I think that will not work if only the internal type is
stored in the relation.

 See prefix and the prefix_range datatype as an example of this:
   http://blog.tapoueh.org/prefix.html

I wish I'd seen the slides at that link half a year ago, since I think
they are helpful to anyone who wants to start writing a gist index. I
remember using Google Translate on a Russian page about gist in the
search for information. Besides that, reading Guttman's R-tree paper
was also good, to get an idea of what picksplit, penalty and union are
supposed to do.


regards,
Yeb Havinga




Re: [HACKERS] windows shared memory error

2009-05-03 Thread Andrew Dunstan



Magnus Hagander wrote:
 Andrew, just to confirm: you've found a case where this happens
 *repeatably*? That's what we've failed to do before - it's happened now
 and then, but never during testing...

Well, it happened several times to my client within a matter of hours. I
didn't see any successful restarts in the log.


Unfortunately, I can't use this system for experimentation - it's doing 
extremely urgent production work.


cheers

andrew



Re: [HACKERS] windows shared memory error

2009-05-03 Thread Tom Lane
Magnus Hagander mag...@hagander.net writes:
 Tom Lane wrote:
 It says here:
 http://msdn.microsoft.com/en-us/library/ms885627.aspx

 FWIW, this is the Windows CE documentation. The one for win32 is at:
 http://msdn.microsoft.com/en-us/library/ms679360(VS.85).aspx

Sorry, that was the one that came up first in a Google search ...

 The ref page for CreateFileMapping you linked has:

 If the object exists before the function call, the function returns a
 handle to the existing object (with its current size, not the specified
 size), and GetLastError  returns ERROR_ALREADY_EXISTS. 

 I think that qualifies as it documenting that it's setting the return
 value, no?

The question is what it does when creating a new object.  To be sure
that our existing code isn't misled, it'd be necessary for
CreateFileMapping to do SetLastError(0) in the successful-creation
code path.  What I read the GetLastError page to be saying is that
most functions do *not* do SetLastError(0) on success, and that it
is always documented if they do.

 The quick try would be to stick a SetLastError(0) in there, just to be
 sure... Could be worth a try?

I kinda think we should do that whether or not it can be proven to
have anything to do with Andrew's report.  It's just like errno = 0
for Unix --- sometimes you have to do it to be sure of whether a
particular function has thrown an error.
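
For readers less familiar with the Unix idiom mentioned here, the following is a minimal sketch using strtol(), which reports overflow only through errno and does not reset errno on success. The helper name parse_long_checked is hypothetical and is not PostgreSQL code. On Windows the analogous step would be to call SetLastError(0) immediately before CreateFileMapping(), which this portable sketch does not attempt to show.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse a decimal long.  Returns 0 on success and
 * stores the value in *result; returns -1 if there are no digits or the
 * value is out of range.
 */
int
parse_long_checked(const char *s, long *result)
{
    char   *end;
    long    val;

    assert(s != NULL && result != NULL);

    /*
     * strtol() leaves errno untouched on success, so a stale ERANGE from
     * some earlier call would be indistinguishable from a fresh one.
     * Clearing it first is the only way to trust the check below.
     */
    errno = 0;
    val = strtol(s, &end, 10);

    if (end == s)
        return -1;              /* no digits consumed */
    if (errno == ERANGE)
        return -1;              /* overflow or underflow in this call */

    *result = val;
    return 0;
}
```

Without the errno = 0 line, a caller that happened to arrive with errno already set to ERANGE would see a perfectly good parse rejected, which is the same class of confusion being discussed for GetLastError().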

regards, tom lane



[HACKERS] cleaning up stray references

2009-05-03 Thread Robert Haas
Here's a small patch to replace a couple of references to files that
no longer exist in the source tree with references to the appropriate
URLs.

...Robert
*** a/src/DEVELOPERS
--- b/src/DEVELOPERS
***
*** 1,3 
! Read the Developer's FAQ in pgsql/doc/FAQ_DEV.  All the developer tools
! are located in the pgsql/src/tools directory. 
  
--- 1,3 
! Read the Developer's FAQ at http://wiki.postgresql.org/wiki/Developer_FAQ
  
+ All the developer tools are located in the pgsql/src/tools directory. 
*** a/src/tools/RELEASE_CHANGES
--- b/src/tools/RELEASE_CHANGES
***
*** 31,38  For Major Releases
  	o src/interfaces/*/*/Makefile
  
  * Release notes
! 	o check that dashed items from the TODO list are complete
! 	o remove dashed TODO items
  	o group items into categories
  	o select major features
  	o select incompatibilities
--- 31,39 
  	o src/interfaces/*/*/Makefile
  
  * Release notes
! 	o check completion of items which have been marked as completed at
!   http://wiki.postgresql.org/wiki/Todo
! 	o remove completed TODO items
  	o group items into categories
  	o select major features
  	o select incompatibilities



Re: [HACKERS] windows doesn't notice backend death

2009-05-03 Thread Tom Lane
Magnus Hagander mag...@hagander.net writes:
 FWIW, this certainly used to work. So we've either broken this recently,
 or it's always been broken on Vista (I've never tried it myself on
 Vista, only 2000, XP and 2003).

Maybe a quick check if it still works on non-Vista versions would be
in order, to eliminate one or the other of those theories.

regards, tom lane



Re: [HACKERS] windows doesn't notice backend death

2009-05-03 Thread Andrew Dunstan



Tom Lane wrote:
 Magnus Hagander mag...@hagander.net writes:
 FWIW, this certainly used to work. So we've either broken this recently,
 or it's always been broken on Vista (I've never tried it myself on
 Vista, only 2000, XP and 2003).

 Maybe a quick check if it still works on non-Vista versions would be
 in order, to eliminate one or the other of those theories.

Well, I can tell you that it is getting an exit code of 1, which is why 
the postmaster isn't restarting.


That raises two questions in my mind. First, is that the behaviour we 
expect when we kill the backend this way? And second, why is it still 
showing up in the output of pg_stat_activity?


Meanwhile, I guess I can try to create a loadable function that will 
crash a backend badly so I can try to reproduce my client's problem.


cheers

andrew



Re: [HACKERS] windows doesn't notice backend death

2009-05-03 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 Well, I can tell you that it is getting an exit code of 1, which is why 
 the postmaster isn't restarting.

Blech.  Count on Windows to find a way to break things.

 That raises two questions in my mind. First, is that the behaviour we 
 expect when we kill the backend this way? And second, why is it still 
 showing up in the output of pg_stat_activity?

Well, if the process is being hard-killed without an opportunity to run
through proc_exit(), then yes it is going to still show up in
pg_stat_activity.  It's pgstat_beshutdown_hook that removes that entry.

The problem here is that we need to be able to distinguish a task
manager kill from a voluntary exit(1).  Have M$ really been stupid
enough to make an external kill look just like an exit() call?
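
For contrast, on Unix the wait status itself separates the two cases: a process killed from outside is reported as terminated by a signal, while a voluntary exit(1) is reported as a normal exit with code 1. The following is a minimal POSIX sketch of that distinction, not Windows code and not postmaster code; the function name child_died_by_signal is hypothetical.

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical demo: fork a child that either exits voluntarily with
 * code 1 (kill_it == 0) or is killed from outside with SIGKILL
 * (kill_it != 0).  Returns 1 if the parent observed death by signal,
 * 0 if it observed a voluntary exit, -1 on error.
 */
int
child_died_by_signal(int kill_it)
{
    pid_t   pid;
    int     status;

    pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0)
    {
        if (!kill_it)
            _exit(1);           /* voluntary exit(1) */
        for (;;)
            pause();            /* sit here until killed externally */
    }

    if (kill_it && kill(pid, SIGKILL) != 0)
        return -1;

    if (waitpid(pid, &status, 0) != pid)
        return -1;

    /* With options == 0, waitpid() reports only these two outcomes. */
    assert(WIFSIGNALED(status) || WIFEXITED(status));

    return WIFSIGNALED(status) ? 1 : 0;
}
```

The report in this thread is that Windows offers no such separation for a Task Manager kill: the parent simply sees exit code 1.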

regards, tom lane



Re: [HACKERS] windows doesn't notice backend death

2009-05-03 Thread Tom Lane
I wrote:
 Andrew Dunstan and...@dunslane.net writes:
 Well, I can tell you that it is getting an exit code of 1, which is why 
 the postmaster isn't restarting.

 Blech.  Count on Windows to find a way to break things.

I reflected on this a bit more.  Even if we find a way around this
particular task-manager behavior, it seems to me there is a generic
problem here.  If some bit of clueless code does exit(0) or exit(1)
inside a backend session, the postmaster will think everything is fine,
but actually we have an un-cleaned-up session that's probably still
holding locks etc.  It's fairly easy to demonstrate the issue:

pl_regression=# create language plperlu;
CREATE LANGUAGE
pl_regression=# create or replace function trouble() returns void as
pl_regression-# $$ exit 0; $$ language plperlu;
CREATE FUNCTION
pl_regression=# select trouble();
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
pl_regression=# select * from pg_stat_activity;
 datid |    datname    | procpid | usesysid | usename |          current_query          | waiting |          xact_start           |          query_start          |         backend_start         | client_addr | client_port
-------+---------------+---------+----------+---------+---------------------------------+---------+-------------------------------+-------------------------------+-------------------------------+-------------+-------------
 40179 | pl_regression |   20847 |       10 | tgl     | select trouble();               | f       | 2009-05-03 14:46:10.170604-04 | 2009-05-03 14:46:10.170604-04 | 2009-05-03 14:45:10.911359-04 |             |          -1
 40179 | pl_regression |   20855 |       10 | tgl     | select * from pg_stat_activity; | f       | 2009-05-03 14:46:23.986909-04 | 2009-05-03 14:46:23.986909-04 | 2009-05-03 14:46:17.920486-04 |             |          -1
(2 rows)


Up to now we've always just dismissed the above possibility as
superusers should know better, but I think there's a reasonable case
to be made that this is an obvious failure mode and we should put a bit
more effort into being robust against it.  With more and more external
code being routinely run in the backend, who wants to swear that there
is no exit(1) in the guts of libperl or libxml or whatever?

The first idea that comes to mind is to have some sort of dead man
switch that flags an active backend and is reset by proc_exit() after
it's finished cleaning up everything else.  If the postmaster sees
this flag still set after backend exit, then it treats the backend as
having crashed regardless of what the reported exit code is.
We could implement this via an array of sig_atomic_t in shared memory,
so as to minimize the postmaster's entanglement with shared memory
(it'd be no worse than the old WIN32-specific child pid arrays).
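
The dead man switch idea can be sketched roughly as follows. This is only an illustration under stated assumptions, not the eventual patch: the array is an ordinary static here rather than shared memory, and MAX_BACKENDS and the deadman_* function names are hypothetical. Exit codes 0 and 1 are treated as the "looks normal" codes, as described in this message.

```c
#include <assert.h>
#include <signal.h>

#define MAX_BACKENDS 64         /* hypothetical fixed number of slots */

/* In a real server this array would live in shared memory. */
static volatile sig_atomic_t backend_active[MAX_BACKENDS];

/* Called by a backend when it starts using its slot. */
void
deadman_arm(int slot)
{
    assert(slot >= 0 && slot < MAX_BACKENDS);
    backend_active[slot] = 1;
}

/* Called as the very last step of an orderly shutdown. */
void
deadman_disarm(int slot)
{
    assert(slot >= 0 && slot < MAX_BACKENDS);
    backend_active[slot] = 0;
}

/*
 * Called by the parent after it has reaped the child.  Returns 1 if the
 * child must be treated as crashed, 0 if the exit was clean.  A switch
 * that is still armed wins over an innocent-looking exit code.
 */
int
deadman_child_crashed(int slot, int exit_code)
{
    int     still_armed;

    assert(slot >= 0 && slot < MAX_BACKENDS);

    still_armed = backend_active[slot];
    backend_active[slot] = 0;   /* make the slot reusable */

    return still_armed || (exit_code != 0 && exit_code != 1);
}
```

The parent only reads and clears one sig_atomic_t per child, which keeps its involvement with shared memory as small as the message suggests it should be.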

Or maybe there's a better way.  Thoughts?

regards, tom lane



Re: [HACKERS] windows doesn't notice backend death

2009-05-03 Thread justin




Tom Lane wrote:
 Have M$ really been stupid
 enough to make an external kill look just like an exit() call?

 regards, tom lane

Kind of :-(

Everything I have read seems to indicate that the Task Manager calls
TerminateProcess() in kernel32.dll and passes 1, which sets the
exit code to 1. I have not found anything stating that clearly, but
logic suggests that is what it's doing.

http://msdn.microsoft.com/en-us/library/ms686714(VS.85).aspx

Would it not be easy to set the normal exit code to something other than
1, to see the difference?
ExitProcess()
http://msdn.microsoft.com/en-us/library/ms682658(VS.85).aspx

This article may be helpful:
http://msdn.microsoft.com/en-us/library/ms686722(VS.85).aspx

BTW, I tested on Windows 2003 SP1; this problem shows up in PostgreSQL
8.3.3.





Re: [HACKERS] GEQO: ERX

2009-05-03 Thread Robert Haas
On Sat, May 2, 2009 at 11:37 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Tobias Zahn tobias-z...@arcor.de writes:
 I didn't get any response to my initial message below. Now I am
 wondering if nobody is into the optimizer or if my question was just too
 stupid. Could you please give me some clues? Your help would really be
 appreciated.

 Well, nobody's into GEQO very much.  I took a quick look and didn't
 think that deleting the ERX support would save anything noticeable,
 but you're welcome to try it if you think different.

 The real problem with working on GEQO, in my humble opinion, is that
 it's throwing good effort after bad.  That module doesn't need marginal
 fixing, it needs throwing away and rewriting from scratch.  Bad enough
 that it's convoluted and full of dead (experimental?) code; but I don't
 even believe that it's based on a good analogy.   The planning problem
 is not all that much like traveling salesman problems, so heuristics
 designed for TSP are of pretty questionable usefulness to start with.
 That complaint could have been refuted if the module performed well,
 but in fact if you check the archives you'll find many many complaints
 about it --- its ability to find good plans seems to be mostly dependent
 on luck.

 My knowledge of AI search algorithms is about 20 years obsolete, but
 last I heard simulated annealing had overtaken genetic algorithms for
 many purposes.  It might be interesting to try a rewrite based on SA;
 or maybe there's something better out there now.

There's a 1997 article on this topic that's pretty interesting.

Heuristic and randomized optimization for the join ordering problem
http://reference.kfupm.edu.sa/content/h/e/heuristic_and_randomized_optimization_fo_87585.pdf

Here's the basic conclusion:

If good solutions are of highest importance, Two-Phase Optimization,
the algorithm that performed best in our experiments, is a very good
choice; other Simulated Annealing variants, for instance Toured
Simulated Annealing (TSA, [LVZ93]), that we did not implement, are
likely to achieve quite similar results. The 'pure' Simulated
Annealing algorithm has a much higher running time without yielding
significantly better solutions. If short running time is more
important, Iterative Improvement (IIIO), the genetic algorithm
(BushyGenetic), and, to a lesser extent, Two-Phase Optimization (2PO)
are feasible alternatives.

I'm not sure if there's anything more recent out there.

...Robert



Re: [HACKERS] windows doesn't notice backend death

2009-05-03 Thread Andrew Dunstan



justin wrote:
 Tom Lane wrote:
 Have M$ really been stupid
 enough to make an external kill look just like an exit() call?

 regards, tom lane

 kind of :-(

 Would it not be easy to set the normal exitcode to something other
 than 1 to see the difference

 ExitProcess()

Not really, as Tom showed later this is an example of a more general 
problem. I think his solution of detecting when backends have cleaned up 
nicely and when they have not is the right way to go.


cheers

andrew



Re: [HACKERS] windows doesn't notice backend death

2009-05-03 Thread justin

Andrew Dunstan wrote:
 justin wrote:
 Would it not be easy to set the normal exitcode to something other
 than 1 to see the difference

 ExitProcess()

 Not really, as Tom showed later this is an example of a more general
 problem. I think his solution of detecting when backends have cleaned
 up nicely and when they have not is the right way to go.

 cheers

 andrew

Stupid thought: why can some clueless code set the exit status to
crashed status? Would it not be more prudent to remove that ability?






Re: [HACKERS] Why isn't stats_temp_directory automatically created?

2009-05-03 Thread Tom Lane
Fujii Masao masao.fu...@gmail.com writes:
 Here is the revised patch; if stats_temp_directory points to a symlink,
 we follow the chain of symlinks and create the referenced directory.

I looked at this patch a bit.  I'm still entirely unconvinced that we
should be doing this at all --- if the directory is not there, there's
a significant probability that there's something wrong that is beyond
the backend's ability to understand or correct.  However, even ignoring
that objection, the patch is not ready to commit for a number of
reasons:

* Repeating the operation every time the stats file is written doesn't
seem like a particularly good idea; it eats cycles, and if the directory
disappears during live operation then there is *definitely* something
fishy going on.  Can't we fix it so that the work is only done when the
path setting changes?  (In principle you could do it in
assign_pgstat_temp_directory(), but I think something would be needed to
ensure that only the stats collector process actually tries to create
the directory.  Or maybe it would be simplest to try to run the code only
when we get a failure from trying to create the stats temp file.)

* I don't think the mkdir_p code belongs in fd.c.  It looks like
you copied-and-pasted it from initdb.c, which isn't any good either;
we don't want to maintain multiple copies of this.  Maybe a new
src/port/ file is indicated.

* elog(LOG) is not exactly an adequate response if the final chdir fails
--- you have just broken the process beyond recovery.  That alone may be
sufficient reason to reject the attempt to deal with symlinks.  As far
as pgstat_temp_directory is concerned, I'm not sure of the point of
making the GUC point to a symlink anyway --- if you have a GUC why not
just point it where you want the directory to be?
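
The alternative floated in the first point above, running the directory-creation code only after an attempt to create the temp file has failed, might look roughly like this. It is a sketch under stated assumptions, not the submitted patch: open_stats_tempfile is a hypothetical name, only a single directory level is created, and no attempt is made to follow symlinks.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical helper: open "path" (a file inside "dir") for writing.
 * Only when the open fails with ENOENT do we try to create "dir" and
 * retry once, so the common case costs no extra system calls.
 */
FILE *
open_stats_tempfile(const char *dir, const char *path)
{
    FILE   *fp;

    assert(dir != NULL && path != NULL);

    fp = fopen(path, "w");
    if (fp != NULL || errno != ENOENT)
        return fp;              /* success, or a failure mkdir can't fix */

    /* Single-level mkdir only: no "mkdir -p", no symlink chasing. */
    if (mkdir(dir, 0700) != 0 && errno != EEXIST)
        return NULL;

    return fopen(path, "w");
}
```

If the directory vanishes again during live operation, each write pays for one failed fopen() plus one mkdir(), which at least makes the oddity visible in the failure path rather than hiding it in every write.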

regards, tom lane



Re: [HACKERS] libpq is not thread safe

2009-05-03 Thread Tom Lane
Zdenek Kotala zdenek.kot...@sun.com writes:
 When PostgreSQL is compiled with --thread-safe, libpq should be
 thread safe. But that is not true when somebody calls fork(). The problem
 is that fork() duplicates only the calling thread, so a mutex can stay
 locked by another thread. We use the ssl_config mutex, which is global.

fork() without exec() when there are open libpq connections is
unbelievably dangerous anyway --- you will have multiple processes
that all think they own the same database connection.  I think writing
code to deal with this for the ssl_config mutex is entirely a waste
of time.

regards, tom lane



Re: [HACKERS] windows doesn't notice backend death

2009-05-03 Thread Andrew Dunstan



justin wrote:
 Stupid thought: why can some clueless code set the exit status to
 crashed status? Would it not be more prudent to remove that ability?

You're missing the point. The danger comes from code that we don't control.

cheers

andrew



Re: [HACKERS] windows doesn't notice backend death

2009-05-03 Thread Tom Lane
I wrote:
 The first idea that comes to mind is to have some sort of dead man
 switch that flags an active backend and is reset by proc_exit() after
 it's finished cleaning up everything else.  If the postmaster sees
 this flag still set after backend exit, then it treats the backend as
 having crashed regardless of what the reported exit code is.

Another thought that came to mind: we could set up an atexit hook that
does all the work that proc_exit() currently does, and reduce
proc_exit() itself to just an exit() call.  psql already relies on
having atexit (or on_exit) so this doesn't appear to add any new
portability issues.

This will probably not fix the Vista taskmanager issue, since I'll
bet it's not running atexit hooks anyway.  What it would do is improve
the situation so that a clueless exit() call would be no worse than
elog(FATAL), rather than triggering a DB-wide restart as the dead man
switch would do.
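
A minimal POSIX sketch of the atexit idea follows. It is an illustration only, not backend code: cleanup_hook stands in for the work proc_exit() does today, clueless_library_call stands in for third-party code that calls exit() on its own, and all names are hypothetical. A pipe is used so the parent can observe whether the child's hook ran.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int  report_fd = -1;     /* write end of a pipe back to the parent */

/* Stand-in for the cleanup that proc_exit() performs today. */
static void
cleanup_hook(void)
{
    if (report_fd >= 0)
    {
        ssize_t n = write(report_fd, "c", 1);

        (void) n;
    }
}

/* Stand-in for third-party code that bails out on its own. */
static void
clueless_library_call(void)
{
    exit(0);
}

/*
 * Returns 1 if the child's cleanup hook ran even though the child left
 * through a bare exit(), 0 if it did not, -1 on error.
 */
int
cleanup_runs_on_plain_exit(void)
{
    int     fds[2];
    int     status;
    char    buf = 0;
    ssize_t n;
    pid_t   pid;

    if (pipe(fds) != 0)
        return -1;

    fflush(NULL);               /* don't let the child re-flush our stdio */
    pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0)
    {
        close(fds[0]);
        report_fd = fds[1];
        if (atexit(cleanup_hook) != 0)
            _exit(2);
        clueless_library_call();
        _exit(3);               /* not reached */
    }

    close(fds[1]);
    n = read(fds[0], &buf, 1);  /* yields 0 at EOF if the hook never wrote */
    close(fds[0]);

    if (waitpid(pid, &status, 0) != pid)
        return -1;
    assert(WIFEXITED(status));

    return (n == 1 && buf == 'c') ? 1 : 0;
}
```

Note that the hook runs for exit() but not for _exit() or for a kill from outside, which matches the expectation above that this would not help with the Task Manager case.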

regards, tom lane



Re: [HACKERS] pg_resetxlog bug?

2009-05-03 Thread Tom Lane
Fujii Masao masao.fu...@gmail.com writes:
 Current pg_resetxlog doesn't remove any archive status files. This
 may cause continuous failure of archive command since .ready file
 remains even if a corresponding XLOG segment is removed. And,
 .done file without XLOG segment cannot be removed by checkpoint,
 and would remain forever. These are undesirable behaviors.

 I think that pg_resetxlog should remove existing archive status files
 of XLOG segments. Here is the patch to do so.

Applied with a trivial fix (the ending value of path isn't necessarily
right for a complaint about directory read failure, so use a constant
instead).

I back-patched as far as 8.1.  The issue exists in 8.0 too, but the
patch didn't apply immediately to 8.0 because of the above issue.
Given the lack of field complaints and 8.0's rather legacy status,
it didn't seem worth expending extra effort on.

regards, tom lane



Re: [HACKERS] windows doesn't notice backend death

2009-05-03 Thread Alvaro Herrera
Tom Lane wrote:

 Up to now we've always just dismissed the above possibility as
 superusers should know better, but I think there's a reasonable case
 to be made that this is an obvious failure mode and we should put a bit
 more effort into being robust against it.  With more and more external
 code being routinely run in the backend, who wants to swear that there
 is no exit(1) in the guts of libperl or libxml or whatever?

FWIW there is (or there was, last time I looked) an exit(1) call in the
guts of the PHP library that PL/php uses, which is triggered when the
memory used goes over the configured memory limit.  It was very easily
triggered with some of the test functions we had on our regression
tests, and the only solution was to kludge up the limit.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [HACKERS] windows doesn't notice backend death

2009-05-03 Thread Tom Lane
Alvaro Herrera alvhe...@commandprompt.com writes:
 FWIW there is (or there was, last time I looked) an exit(1) call in the
 guts of the PHP library that PL/php uses, which is triggered when the
 memory used goes over the configured memory limit.  It was very easily
 triggered with some of the test functions we had on our regression
 tests, and the only solution was to kludge up the limit.

I don't think we'll be able to prevent PHP from doing that :-(.  But
it now seems clear that we should try to make the database as a whole
recover with some degree of grace.  I'll go work up a patch.

regards, tom lane



Re: [HACKERS] unchecked out of memory in postmaster.c

2009-05-03 Thread Tom Lane
Alvaro Herrera alvhe...@commandprompt.com writes:
 Tom Lane wrote:
 If you're really intent on doing something about this, my inclination
 would be to get rid of the dependence on DLNewElem altogether.  Add
 a Dlelem field to the Backend struct and use DLInitElem (compare
 the way catcache uses that module).

 Hmm, yeah, I had seen that code.  So it looks like this instead.

Huh, didn't you commit this yet?

regards, tom lane



Re: [HACKERS] windows shared memory error

2009-05-03 Thread Andrew Dunstan



Tom Lane wrote:
 The quick try would be to stick a SetLastError(0) in there, just to be
 sure... Could be worth a try?

 I kinda think we should do that whether or not it can be proven to
 have anything to do with Andrew's report.  It's just like errno = 0
 for Unix --- sometimes you have to do it to be sure of whether a
 particular function has thrown an error.

I suspect it has little or nothing to do with it in fact. On my (very 
lightly loaded) Vista box a crash with exit code 9 seems to result in a 
consistently problem free restart. I did 200 iterations of the test.


Now presumably we sleep for 1 sec between the CloseHandle() call and the 
CreateFileMapping() call in that code for a reason. We have seen in 
other cases that Windows can take some time after a call has returned 
for some operations to actually complete, and I assume we have a similar 
case here. So, my question is: how do we know that 1 second is enough? 
Was that a wild guess?


I confess I don't have much confidence that just repeating it a few 
times without increasing the sleep interval will necessarily solve the 
problem.


cheers

andrew





Re: [HACKERS] windows shared memory error

2009-05-03 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 Now presumably we sleep for 1 sec between the CloseHandle() call and the 
 CreateFileMapping() call in that code for a reason.

I'm not sure.  Magnus never did answer my question about why the sleep
and retry was put in at all; it seems not unlikely from here that it
was mere speculation.

regards, tom lane



Re: [HACKERS] unchecked out of memory in postmaster.c

2009-05-03 Thread Alvaro Herrera
Tom Lane wrote:
 Alvaro Herrera alvhe...@commandprompt.com writes:
  Tom Lane wrote:
  If you're really intent on doing something about this, my inclination
  would be to get rid of the dependence on DLNewElem altogether.  Add
  a Dlelem field to the Backend struct and use DLInitElem (compare
  the way catcache uses that module).
 
  Hmm, yeah, I had seen that code.  So it looks like this instead.
 
 Huh, didn't you commit this yet?

Sorry, I just did.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] ALTER TABLE ... ALTER COLUMN ... SET DISTINCT

2009-05-03 Thread Jaime Casanova
On Sun, May 3, 2009 at 3:13 PM, Robert Haas robertmh...@gmail.com wrote:
 On Mon, Apr 6, 2009 at 11:15 AM, Robert Haas robertmh...@gmail.com wrote:
 So based on this comment and Stephen's remarks, I'm going to assume
 that I'm succumbing to a fit of unjustified paranoia and re-implement
 as you suggest.

 OK, new version of patch, this time with the weird scaling removed and
 the datatype changed to float4.


In this paragraph I think you mean: ALTER TABLE ... ALTER COLUMN ...
SET DISTINCT?

+   <para>
+The analyzer also estimates the number of distinct values that appear
+in each column.  Because only a subset of pages are scanned, this method
+can sometimes be quite inaccurate, especially for large tables.  If this
+inaccuracy leads to bad query plans, the analyzer can be forced to use
+a more correct value with <command>ALTER TABLE ... ALTER COLUMN ... SET
+STATISTICS</command> (see <xref linkend="sql-altertable"
+endterm="sql-altertable-title">).
+   </para>

-- 
Atentamente,
Jaime Casanova
Soporte y capacitación de PostgreSQL
Asesoría y desarrollo de sistemas
Guayaquil - Ecuador
Cel. +59387171157
