Re: [BUGS] BUG #8494: Cannot Backup / Restore

2013-10-01 Thread Mark Kirkwood

On 01/10/13 15:17, h...@canwrx.com wrote:


a)  My version is Postgres Enterprise Manager version 3.0.0, copyright
2002-2012, the pgAdmin Development Team; and Postgres Plus Advanced Server
9.2



Hi - your product is supported by EnterpriseDB 
(http://www.enterprisedb.com/). I think you would be better off raising this 
issue with them!


Cheers

Mark


--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


[BUGS] pgadmin copy/paste bug

2013-09-03 Thread Mark Murawski
Intermittently, the ctrl-c/ctrl-v/ctrl-x shortcut keys stop working in 
the pgAdmin edit window, although the right-click menu still works.


You can make the shortcut keys work again by clicking in another widget 
in the same editor window and then clicking back to the main edit box.


Video link to show a really good example of this happening:

http://www.youtube.com/watch?v=e6WKzaB3J1c




Re: R: Re: [BUGS] postgresql 8.4 on android tablet with debian installed (Debian Kit APK)

2013-08-01 Thread Mark Kirkwood

On 02/08/13 09:47, Mark Kirkwood wrote:


For the archives, looks like that was the issue: users need to be a
member of a certain group (gid 303) to use sockets (which is exactly
what Alvaro suspected).




Make that gid 3003, sorry.
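For readers hitting the same wall, the check is easy to script. This is a sketch, not part of the original thread: on Android, gid 3003 is the AID_INET group that gates network socket creation, so a process not in that group gets "Permission denied" exactly as reported above.

```python
import os

# On Android, membership of gid 3003 (AID_INET) is required to create
# network sockets; a process lacking it gets EACCES ("Permission denied").
AID_INET = 3003

def can_create_sockets(supplementary_groups):
    """Return True if the given group list includes AID_INET."""
    return AID_INET in supplementary_groups

# Check the current process (on a non-Android host this will normally
# print False, since gid 3003 is Android-specific):
print(can_create_sockets(os.getgroups()))
```

The fix discussed in the thread amounts to adding the postgres user to that group before starting the server.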





Re: R: Re: [BUGS] postgresql 8.4 on android tablet with debian installed (Debian Kit APK)

2013-08-01 Thread Mark Kirkwood

On 01/08/13 22:13, f...@libero.it wrote:

Da: mark.kirkw...@catalyst.net.nz

See http://android-dls.com/wiki/index.php?title=Debian_on_G1 - near the
bottom they discuss this issue.

Cheers

Mark


Thanks very much: problem solved. postgresql, apache2 and php5 now run on a
$50 android tablet with a small Debian install - transformed into a very fast,
cheap and portable (200 g) server!

So it should be possible to port the postgresql database to all android tablets
and rooted android phones!





For the archives, looks like that was the issue: users need to be a 
member of a certain group (gid 303) to use sockets (which is exactly 
what Alvaro suspected).


Cheers

Mark





Re: [BUGS] pgBench cannot use prepared statements and reconnections

2013-07-31 Thread Mark Kirkwood

On 01/08/13 11:08, Mark Kirkwood wrote:

On 01/08/13 10:08, Josh Berkus wrote:


To wit:

[jberkus@pgx-test prepare]$ pgbench -c 4 -C -T 180 -l -r -M prepared
bench
starting vacuum...end.
Client 1 aborted in state 7: ERROR:  prepared statement "P0_7" does not
exist
Client 0 aborted in state 7: ERROR:  prepared statement "P0_7" does not
exist
Client 3 aborted in state 7: ERROR:  prepared statement "P0_7" does not
exist
Client 2 aborted in state 7: ERROR:  prepared statement "P0_7" does not
exist
transaction type: TPC-B (sort of)



Strange - works for me (9.4devel pulled just now).

Cheers

Mark


Hmmm, apologies - actually it didn't - I read right past the text

ERROR:  prepared statement "P0_7" does not exist
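A toy model of the likely failure mode (my reading of the report, not verified against the pgbench source; the class names are hypothetical): with -M prepared, pgbench prepares each statement once per client, but with -C every transaction opens a fresh connection, and server-side prepared statements only live as long as the connection that created them.

```python
# Toy model: server-side prepared statements are scoped to a connection;
# a client that remembers "already prepared" across reconnects will hit
# 'prepared statement ... does not exist'.

class Connection:
    def __init__(self):
        self.prepared = set()          # statements live with the connection

    def prepare(self, name):
        self.prepared.add(name)

    def execute_prepared(self, name):
        if name not in self.prepared:
            raise RuntimeError(f'prepared statement "{name}" does not exist')
        return "ok"

class Client:
    """Mimics a pgbench client running with -M prepared."""
    def __init__(self):
        self.conn = Connection()
        self.did_prepare = False       # once-per-client flag, not per-connection

    def reconnect(self):               # what -C does between transactions
        self.conn = Connection()

    def run(self, name="P0_7"):
        if not self.did_prepare:
            self.conn.prepare(name)
            self.did_prepare = True
        return self.conn.execute_prepared(name)

c = Client()
assert c.run() == "ok"                 # first transaction: prepare + execute
c.reconnect()                          # -C: new connection, statements gone
try:
    c.run()
except RuntimeError as e:
    print(e)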






Re: [BUGS] pgBench cannot use prepared statements and reconnections

2013-07-31 Thread Mark Kirkwood

On 01/08/13 10:08, Josh Berkus wrote:


To wit:

[jberkus@pgx-test prepare]$ pgbench -c 4 -C -T 180 -l -r -M prepared bench
starting vacuum...end.
Client 1 aborted in state 7: ERROR:  prepared statement "P0_7" does not
exist
Client 0 aborted in state 7: ERROR:  prepared statement "P0_7" does not
exist
Client 3 aborted in state 7: ERROR:  prepared statement "P0_7" does not
exist
Client 2 aborted in state 7: ERROR:  prepared statement "P0_7" does not
exist
transaction type: TPC-B (sort of)



Strange - works for me (9.4devel pulled just now).

Cheers

Mark




Re: [BUGS] postgresql 8.4 on android tablet with debian installed (Debian Kit APK)

2013-07-31 Thread Mark Kirkwood

On 01/08/13 09:13, f...@libero.it wrote:

problem:





LOG:  could not create IPv6 socket: Permission denied
LOG:  could not create IPv4 socket: Permission denied
WARNING:  could not create listen socket for "localhost"
FATAL:  could not create any TCP/IP sockets





See http://android-dls.com/wiki/index.php?title=Debian_on_G1 - near the 
bottom they discuss this issue.


Cheers

Mark






Re: [BUGS] BUG #8034: pg_buffercache gets invalid memory alloc request size with very large shared memory buffers

2013-04-04 Thread Mark Kirkwood

On 03/04/13 08:44, dben...@whitepages.com wrote:

The following bug has been logged on the website:

Bug reference:  8034
Logged by:  Devin Ben-Hur
Email address:  dben...@whitepages.com
PostgreSQL version: 9.2.3
Operating system:   Ubuntu Precise
Description:

When a very large shared buffer pool (~480GB) is used with postgresql,
the pg_buffercache contrib module gets an allocation error trying to allocate
NBuffers worth of BufferCachePagesRec records:

https://github.com/postgres/postgres/blob/REL9_2_3/contrib/pg_buffercache/pg_buffercache_pages.c#L101-L102

The requested allocation exceeds the 1GB limitation imposed by
AllocSizeIsValid macro:
https://github.com/postgres/postgres/blob/REL9_2_3/src/include/utils/memutils.h#L40-L43

Reproduce:
1) acquire server with half terabyte of memory
2) tweak OS settings to allow large shared memory
3) set postgresql.conf: shared_buffers = 400GB
4) CREATE EXTENSION pg_buffercache;
5) SELECT * FROM pg_buffercache LIMIT 1;





Yes indeed - however I'm not sure this is likely to be encountered in 
any serious configuration. The general rule for sizing shared buffers is:


shared_buffers = min(0.25 * RAM, 8G)

Now there has been some discussion about how settings bigger than 8G 
make sense in some cases...but I'm not aware of any suggestions that 
sizes in the hundreds of G make sense.


However it would be nice if pg_buffercache *could* work with bigger 
sizes, should they ever make sense.  Someone who understands the 
memory allocation system better than I do will need to comment on how 
that might work :-)
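To put rough numbers on the failure (a sketch: BLCKSZ and MaxAllocSize are the stock values, but the 24 bytes per BufferCachePagesRec is an assumption, since the exact size depends on struct padding):

```python
# shared_buffers = 400GB of 8kB pages gives NBuffers, and pg_buffercache
# palloc()s one record per buffer in a single allocation, which must stay
# under MaxAllocSize (1GB - 1 byte).

BLCKSZ = 8192                      # default PostgreSQL page size
MAX_ALLOC = 0x3FFFFFFF             # MaxAllocSize: 1GB - 1
RECORD_SIZE = 24                   # approx sizeof(BufferCachePagesRec), assumed

shared_buffers = 400 * 1024**3     # 400GB, as in the report
nbuffers = shared_buffers // BLCKSZ
request = nbuffers * RECORD_SIZE

print(nbuffers)                    # 52428800 buffers
print(request > MAX_ALLOC)         # True: the palloc request is rejected
```

So anything much past ~350GB of shared_buffers pushes the single palloc over the 1GB cap, regardless of the precise record size.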


Cheers

Mark




Re: [BUGS] New kind of crash?

2013-03-08 Thread Mark Kirkwood
Do you have any non default procedural languages installed? I provoked 
exactly that error with a similar script which used a PL/R procedure 
(see BUGS thread "PL/R Median Busts Commit"...the cause is signal 
hijacking in that case).


Regards

Mark

On 09/03/13 13:27, Josh Berkus wrote:

Folks,

This is one I've never seen before:

=> select generate_master_tables();
WARNING:  AbortTransaction while in COMMIT state
PANIC:  cannot abort transaction 9387287, it was already committed
PANIC:  cannot abort transaction 9387287, it was already committed
The connection to the server was lost. Attempting reset: Failed.
Time: 42259.753 ms
!>

Interestingly, Postgres came back up on its own after about a minute and
seems to be OK.  The stored procedure in question is one which creates
about 600 tables.  This actually exceeds max_locks_per_transaction, but
what happened above is not the normal failure for that.

Note that the tables were, in fact, created, and as far as I can tell
there's no corruption of the database.







Re: [BUGS] PL/R Median Busts Commit (Postgres 9.1.6 + plr 8.3.0.13 on Ubuntu 12.10 64 bit)

2013-01-28 Thread Mark Kirkwood

On 29/01/13 10:29, Mark Kirkwood wrote:

On 25/01/13 13:56, Mark Kirkwood wrote:

On 25/01/13 13:49, Tom Lane wrote:

Mark Kirkwood  writes:

On 25/01/13 13:06, Tom Lane wrote:

Unless libR can be coerced into not screwing up our signal handlers,
I'd say that PL/R is broken beyond repair.  That would be 
unfortunate.



It looks like Joe has run into something similar with libR stealing
SIGINT; he reinstalls it. A simple patch along the same lines for
SIGUSR1 (attached) seems to fix the issue.


This certainly is not good enough, for either signal.  What happens if
the signal arrives while libR still has control?  These things being
asynchronous with respect to the receiving backend, we certainly can't
assume that that won't happen.

Why does libR think it should be messing with these handlers in the
first place?




Agreed - I will see if I can work out why.




Looking at the R source (which reminds me of Postgres, nice, clear 
code...), I saw quite a bit of code involving signal handlers - and 
nothing along the lines of "don't set them up if you are in embedded 
mode". So clearly more investigation is needed, and we really need to 
take this up on the R mailing list I think. Joe - is this something 
you would like to do? I am happy to do it, but my knowledge of R is 
measured in hours...





Like a dog with a bone - sometimes it's hard to let go... given the 
quite readable nature of the R source I managed to find what I *think* 
is the "don't do R signals switch" (which I'd obviously missed before). 
Making use of this seems to fix the original bug - and possibly the 
SIGINT stealing too.


Patch attached to set the variable (R_SignalHandlers = 0), and remove 
the SIGINT workaround.


Cheers

Mark


*** plr.c.orig	2011-08-30 10:42:56.0 +1200
--- plr.c	2013-01-29 12:56:53.995936063 +1300
***
*** 31,37 
   * plr.c - Language handler and support functions
   */
  #include "plr.h"
- #include "libpq/pqsignal.h"
  
  PG_MODULE_MAGIC;
  
--- 31,36 
***
*** 369,374 
--- 368,378 
  	atexit(plr_atexit);
  
  	/*
+ 	 * Try to stop R using its own signal handlers
+ 	 */
+ 	R_SignalHandlers = 0;
+ 
+ 	/*
  	 * When initialization fails, R currently exits. Check the return
  	 * value anyway in case this ever gets fixed
  	 */
***
*** 392,401 
  	R_Interactive = false;
  #endif
  
- 	/*
- 	 * R seems to try to steal SIGINT in recent releases, so steal it back
- 	 */
- 	pqsignal(SIGINT, StatementCancelHandler);		/* cancel current query */
  
  	plr_pm_init_done = true;
  }
--- 396,401 



Re: [BUGS] PL/R Median Busts Commit (Postgres 9.1.6 + plr 8.3.0.13 on Ubuntu 12.10 64 bit)

2013-01-28 Thread Mark Kirkwood

On 25/01/13 13:56, Mark Kirkwood wrote:

On 25/01/13 13:49, Tom Lane wrote:

Mark Kirkwood  writes:

On 25/01/13 13:06, Tom Lane wrote:

Unless libR can be coerced into not screwing up our signal handlers,
I'd say that PL/R is broken beyond repair.  That would be unfortunate.



It looks like Joe has run into something similar with libR stealing
SIGINT; he reinstalls it. A simple patch along the same lines for
SIGUSR1 (attached) seems to fix the issue.


This certainly is not good enough, for either signal.  What happens if
the signal arrives while libR still has control?  These things being
asynchronous with respect to the receiving backend, we certainly can't
assume that that won't happen.

Why does libR think it should be messing with these handlers in the
first place?




Agreed - I will see if I can work out why.




Looking at the R source (which reminds me of Postgres, nice, clear 
code...), I saw quite a bit of code involving signal handlers - and 
nothing along the lines of "don't set them up if you are in embedded 
mode". So clearly more investigation is needed, and we really need to 
take this up on the R mailing list I think. Joe - is this something you 
would like to do? I am happy to do it, but my knowledge of R is measured in 
hours...


Cheers

Mark





Re: [BUGS] PL/R Median Busts Commit (Postgres 9.1.6 + plr 8.3.0.13 on Ubuntu 12.10 64 bit)

2013-01-24 Thread Mark Kirkwood

On 25/01/13 13:49, Tom Lane wrote:

Mark Kirkwood  writes:

On 25/01/13 13:06, Tom Lane wrote:

Unless libR can be coerced into not screwing up our signal handlers,
I'd say that PL/R is broken beyond repair.  That would be unfortunate.



It looks like Joe has run into something similar with libR stealing
SIGINT; he reinstalls it. A simple patch along the same lines for
SIGUSR1 (attached) seems to fix the issue.


This certainly is not good enough, for either signal.  What happens if
the signal arrives while libR still has control?  These things being
asynchronous with respect to the receiving backend, we certainly can't
assume that that won't happen.

Why does libR think it should be messing with these handlers in the
first place?




Agreed - I will see if I can work out why.

Cheers

Mark






Re: [BUGS] PL/R Median Busts Commit (Postgres 9.1.6 + plr 8.3.0.13 on Ubuntu 12.10 64 bit)

2013-01-24 Thread Mark Kirkwood

On 25/01/13 13:06, Tom Lane wrote:

Mark Kirkwood  writes:

If I have done this right, then this is the trace for the 1st message...
from my wandering through the calls here it looks like a normal commit,
and something goes a bit weird as SI messages are being processed...


Seems like the critical bit is here:


#11 0x7f4e2a53d985 in exit () from /lib/x86_64-linux-gnu/libc.so.6
#12 0x7f4e272b951a in ?? () from /usr/lib/libR.so
#13 <signal handler called>
#14 0x7f4e2a538707 in kill () from /lib/x86_64-linux-gnu/libc.so.6
#15 0x006152e5 in SICleanupQueue (
  callerHasWriteLock=callerHasWriteLock@entry=1 '\001',
  minFree=minFree@entry=4) at sinvaladt.c:672


Frame 15 is definitely SICleanupQueue trying to send a catchup SIGUSR1
interrupt to the furthest-behind backend.  The fact that we go directly
into a signal handler from the kill() suggests that the furthest-behind
backend is actually *this* backend, which perhaps is a bit surprising,
but it's supposed to work.  What it looks like, though, is that libR has
commandeered the SIGUSR1 signal handler, and just to be extra special
unfriendly to the surrounding program, it does an exit() when it traps a
SIGUSR1.

Unless libR can be coerced into not screwing up our signal handlers,
I'd say that PL/R is broken beyond repair.  That would be unfortunate.

regards, tom lane


It looks like Joe has run into something similar with libR stealing 
SIGINT; he reinstalls it. A simple patch along the same lines for 
SIGUSR1 (attached) seems to fix the issue.


I wonder if we need to install *all* the remaining signal handlers too?
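The "snapshot everything, restore everything" idea can be sketched like this (in Python for brevity; PL/R itself would do the equivalent with sigaction()/pqsignal() in C, and this says nothing about handlers the library invokes while it still has control, which was Tom's deeper objection):

```python
import signal

def snapshot_handlers():
    """Record the current handler for every signal we can introspect."""
    saved = {}
    for signum in range(1, signal.NSIG):
        try:
            saved[signum] = signal.getsignal(signum)
        except (OSError, ValueError):
            pass               # not a valid signal on this platform
    return saved

def restore_handlers(saved):
    """Reinstall every handler recorded by snapshot_handlers()."""
    for signum, handler in saved.items():
        if handler is None:    # installed by non-Python code; can't restore
            continue
        try:
            signal.signal(signum, handler)
        except (OSError, RuntimeError, ValueError):
            pass               # e.g. SIGKILL/SIGSTOP cannot be changed

# Snapshot before initialising the embedded library, let the library
# clobber whatever it likes, then put everything back afterwards.
saved = snapshot_handlers()
signal.signal(signal.SIGUSR1, lambda s, f: None)   # library "steals" SIGUSR1
restore_handlers(saved)
assert signal.getsignal(signal.SIGUSR1) is saved[signal.SIGUSR1]
```

The R_SignalHandlers = 0 approach in the later patch is cleaner, since it stops R from installing handlers in the first place rather than repairing the damage afterwards.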

Cheers

Mark

*** plr.c.orig	2013-01-25 13:35:15.674086215 +1300
--- plr.c	2013-01-25 13:35:20.770086052 +1300
***
*** 397,402 
--- 397,405 
  	 */
  	pqsignal(SIGINT, StatementCancelHandler);		/* cancel current query */
  
+ 	/* Ditto for SIGUSR1*/
+ 	pqsignal(SIGUSR1, procsignal_sigusr1_handler);	/*  handle usr1 */
+ 
  	plr_pm_init_done = true;
  }
  



Re: [BUGS] PL/R Median Busts Commit (Postgres 9.1.6 + plr 8.3.0.13 on Ubuntu 12.10 64 bit)

2013-01-24 Thread Mark Kirkwood

On 25/01/13 10:36, Tom Lane wrote:

Mark Kirkwood  writes:

Doh! Yes of course, sorry for the noise. I was busy thinking that the
issue could be tied up with sinval and plan caching (if there is any) in
plr and got excited about seeing something in gdb...and didn't think
carefully about why what I was seeing was not a bug at all :-( Thanks
for clarifying!


FWIW, presumably the failure comes from something throwing elog(ERROR)
after RecordTransactionCommit().  So what I'd try is setting a
breakpoint at errfinish and investigating what's causing the first error
call.


If I have done this right, then this is the trace for the 1st message... 
from my wandering through the calls here it looks like a normal commit, 
and something goes a bit weird as SI messages are being processed...


0x7f4e2a538425 in raise () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) bt
#0  0x7f4e2a538425 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x7f4e2a53bb8b in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x006e0a70 in errfinish (dummy=dummy@entry=0) at elog.c:523
#3  0x006e3031 in elog_finish (elevel=elevel@entry=22,
fmt=fmt@entry=0x714d58 "cannot abort transaction %u, it was already 
committed") at elog.c:1202

#4  0x00481d87 in RecordTransactionAbort (
isSubXact=isSubXact@entry=0 '\000') at xact.c:1366
#5  0x00481e89 in AbortTransaction () at xact.c:2272
#6  0x004851c5 in AbortOutOfAnyTransaction () at xact.c:3807
#7  0x006eddc9 in ShutdownPostgres (code=,
arg=) at postinit.c:976
#8  0x00610b7d in shmem_exit (code=code@entry=-1) at ipc.c:221
#9  0x00610bf5 in proc_exit_prepare (code=-1) at ipc.c:181
#10 0x7f4e2a53d901 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#11 0x7f4e2a53d985 in exit () from /lib/x86_64-linux-gnu/libc.so.6
#12 0x7f4e272b951a in ?? () from /usr/lib/libR.so
#13 <signal handler called>
#14 0x7f4e2a538707 in kill () from /lib/x86_64-linux-gnu/libc.so.6
#15 0x006152e5 in SICleanupQueue (
callerHasWriteLock=callerHasWriteLock@entry=1 '\001',
minFree=minFree@entry=4) at sinvaladt.c:672
#16 0x006153bd in SIInsertDataEntries (data=0x3943b70, n=0)
at sinvaladt.c:437
#17 0x006d1371 in ProcessInvalidationMessagesMulti (
hdr=hdr@entry=0x204a280, func=0x6149f0 )
at inval.c:402
#18 0x006d2020 in AtEOXact_Inval (isCommit=isCommit@entry=1 '\001')
at inval.c:940
#19 0x00482bdd in CommitTransaction () at xact.c:1893
#20 0x00484975 in CommitTransactionCommand () at xact.c:2562
#21 0x006276d3 in finish_xact_command () at postgres.c:2452
#22 finish_xact_command () at postgres.c:2441
#23 0x0062a6f5 in exec_simple_query (query_string=0x20e8d70 
"COMMIT;")

at postgres.c:1037
#24 PostgresMain (argc=, argv=argv@entry=0x20476b8,
username=) at postgres.c:3968
#25 0x005ee92f in BackendRun (port=0x2068f40) at postmaster.c:3617
#26 BackendStartup (port=0x2068f40) at postmaster.c:3302
#27 ServerLoop () at postmaster.c:1466
#28 0x005ef4dc in PostmasterMain (argc=argc@entry=1,
argv=argv@entry=0x2044640) at postmaster.c:1127
#29 0x00453edb in main (argc=1, argv=0x2044640) at main.c:199





Re: [BUGS] PL/R Median Busts Commit (Postgres 9.1.6 + plr 8.3.0.13 on Ubuntu 12.10 64 bit)

2013-01-24 Thread Mark Kirkwood

On 25/01/13 10:18, Tom Lane wrote:

Mark Kirkwood  writes:

Sorry - the others are getting a SIGUSR1 too (just was not so obvious).


SIGUSR1 is not a bug, it's expected cross-session signaling behavior.

regards, tom lane




Doh! Yes of course, sorry for the noise. I was busy thinking that the 
issue could be tied up with sinval and plan caching (if there is any) in 
plr and got excited about seeing something in gdb...and didn't think 
carefully about why what I was seeing was not a bug at all :-( Thanks 
for clarifying!


Cheers

Mark




Re: [BUGS] PL/R Median Busts Commit (Postgres 9.1.6 + plr 8.3.0.13 on Ubuntu 12.10 64 bit)

2013-01-24 Thread Mark Kirkwood

On 25/01/13 10:12, Mark Kirkwood wrote:

On 25/01/13 04:14, Joe Conway wrote:

On 01/24/2013 05:21 AM, Mark Kirkwood wrote:

I admit - it sounds unlikely. However a simple scenario (attached) gives
rise to:


This is the wrong place for the bug report on PL/R I think, but I'll
take a look.

Joe




FYI - 8.4 shows the same behaviour as 9.1, but 8.3 gets a SIGUSR1:


Program received signal SIGUSR1, User defined signal 1.
0x005a401f in SIInsertDataEntry (segP=0x7f3b8ea5e2a0,
 data=data@entry=0x4082d10) at sinvaladt.c:255
255segP->buffer[segP->maxMsgNum % MAXNUMMESSAGES] = *data;
(gdb)




Sorry - the others are getting a SIGUSR1 too (just was not so obvious). 
Here's bt from 9.1:


Program received signal SIGUSR1, User defined signal 1.
0x7f2bfe4c1707 in kill () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) bt
#0  0x7f2bfe4c1707 in kill () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00611683 in SICleanupQueue (callerHasWriteLock=1 '\001',
minFree=) at sinvaladt.c:672
#2  0x00611768 in SIInsertDataEntries (data=0x7f2bf68b4cb0, n=0)
at sinvaladt.c:437
#3  0x006cdad1 in ProcessInvalidationMessagesMulti (hdr=0x295b240,
func=0x610de0 ) at inval.c:402
#4  0x006ce780 in AtEOXact_Inval (isCommit=)
at inval.c:940
#5  0x004826cd in CommitTransaction () at xact.c:1893
#6  0x0048467d in CommitTransactionCommand () at xact.c:2562
#7  0x00623a17 in finish_xact_command () at postgres.c:2452
#8  finish_xact_command () at postgres.c:2441
#9  0x00626b5d in exec_simple_query (query_string=0x29f7d60 
"COMMIT;")

at postgres.c:1051
#10 PostgresMain (argc=, argv=,
username=) at postgres.c:3968
#11 0x005eb429 in BackendRun (port=0x2979f00) at postmaster.c:3611
#12 BackendStartup (port=0x2979f00) at postmaster.c:3296
#13 ServerLoop () at postmaster.c:1460
#14 0x005ebd3c in PostmasterMain (argc=, 
argv=0x2955600)

at postmaster.c:1121
#15 0x00453d20 in main (argc=1, argv=0x2955600) at main.c:199
(gdb)





Re: [BUGS] PL/R Median Busts Commit (Postgres 9.1.6 + plr 8.3.0.13 on Ubuntu 12.10 64 bit)

2013-01-24 Thread Mark Kirkwood

On 25/01/13 04:14, Joe Conway wrote:

On 01/24/2013 05:21 AM, Mark Kirkwood wrote:

I admit - it sounds unlikely. However a simple scenario (attached) gives
rise to:


This is the wrong place for the bug report on PL/R I think, but I'll
take a look.

Joe




FYI - 8.4 shows the same behaviour as 9.1, but 8.3 gets a SIGUSR1:


Program received signal SIGUSR1, User defined signal 1.
0x005a401f in SIInsertDataEntry (segP=0x7f3b8ea5e2a0,
data=data@entry=0x4082d10) at sinvaladt.c:255
255 segP->buffer[segP->maxMsgNum % MAXNUMMESSAGES] = *data;
(gdb)






Re: [BUGS] PL/R Median Busts Commit (Postgres 9.1.6 + plr 8.3.0.13 on Ubuntu 12.10 64 bit)

2013-01-24 Thread Mark Kirkwood
Ah right - sorry, I did a quick look for a mailing list on the PL/R web site 
and didn't spot anything.


Thanks

Mark

On 25/01/13 04:14, Joe Conway wrote:

On 01/24/2013 05:21 AM, Mark Kirkwood wrote:

I admit - it sounds unlikely. However a simple scenario (attached) gives
rise to:


This is the wrong place for the bug report on PL/R I think, but I'll
take a look.








[BUGS] PL/R Median Busts Commit (Postgres 9.1.6 + plr 8.3.0.13 on Ubuntu 12.10 64 bit)

2013-01-23 Thread Mark Kirkwood
I admit - it sounds unlikely. However a simple scenario (attached) gives 
rise to:


WARNING:  AbortTransaction while in COMMIT state
PANIC:  cannot abort transaction 880983, it was already committed

Essentially we are doing:

BEGIN;
DROP TABLE IF EXISTS tab0;
CREATE TEMP TABLE tab0 ( id INTEGER PRIMARY KEY, val TEXT);
INSERT INTO tab0  SELECT generate_series(1,1000),'xx';
SELECT median(id) FROM tab0;
DROP TABLE IF EXISTS tab1;
CREATE TEMP TABLE tab1 ( id INTEGER PRIMARY KEY, val TEXT);
INSERT INTO tab1  SELECT generate_series(1,1000),'xx';
DROP TABLE IF EXISTS tab2;
...
DROP TABLE IF EXISTS tab99;
CREATE TEMP TABLE tab99 ( id INTEGER PRIMARY KEY, val TEXT);
INSERT INTO tab99  SELECT generate_series(1,1000),'xx';
SELECT median(id) FROM tab99;
COMMIT;


We crash at commit.

Changing median to something else (e.g. avg, or even a non-PL/R median 
function) makes the error vanish, so it's either PL/R or the specific PL/R 
median function causing the grief.


Regards

Mark


create or replace function r_median(_float8) returns float as '
  median(arg1)
' language 'plr';

CREATE AGGREGATE median (
  sfunc = plr_array_accum,
  basetype = float8,
  stype = _float8,
  finalfunc = r_median
);
BEGIN;
DROP TABLE IF EXISTS tab0;
CREATE TEMP TABLE tab0 ( id INTEGER PRIMARY KEY, val TEXT);
INSERT INTO tab0  SELECT generate_series(1,1000),'xx';
SELECT median(id) FROM tab0;
DROP TABLE IF EXISTS tab1;
CREATE TEMP TABLE tab1 ( id INTEGER PRIMARY KEY, val TEXT);
INSERT INTO tab1  SELECT generate_series(1,1000),'xx';
DROP TABLE IF EXISTS tab2;
CREATE TEMP TABLE tab2 ( id INTEGER PRIMARY KEY, val TEXT);
INSERT INTO tab2  SELECT generate_series(1,1000),'xx';
DROP TABLE IF EXISTS tab3;
CREATE TEMP TABLE tab3 ( id INTEGER PRIMARY KEY, val TEXT);
INSERT INTO tab3  SELECT generate_series(1,1000),'xx';
DROP TABLE IF EXISTS tab4;
CREATE TEMP TABLE tab4 ( id INTEGER PRIMARY KEY, val TEXT);
INSERT INTO tab4  SELECT generate_series(1,1000),'xx';
DROP TABLE IF EXISTS tab5;
CREATE TEMP TABLE tab5 ( id INTEGER PRIMARY KEY, val TEXT);
INSERT INTO tab5  SELECT generate_series(1,1000),'xx';
DROP TABLE IF EXISTS tab6;
CREATE TEMP TABLE tab6 ( id INTEGER PRIMARY KEY, val TEXT);
INSERT INTO tab6  SELECT generate_series(1,1000),'xx';
DROP TABLE IF EXISTS tab7;
CREATE TEMP TABLE tab7 ( id INTEGER PRIMARY KEY, val TEXT);
INSERT INTO tab7  SELECT generate_series(1,1000),'xx';
DROP TABLE IF EXISTS tab8;
CREATE TEMP TABLE tab8 ( id INTEGER PRIMARY KEY, val TEXT);
INSERT INTO tab8  SELECT generate_series(1,1000),'xx';
DROP TABLE IF EXISTS tab9;
CREATE TEMP TABLE tab9 ( id INTEGER PRIMARY KEY, val TEXT);
INSERT INTO tab9  SELECT generate_series(1,1000),'xx';
DROP TABLE IF EXISTS tab10;
CREATE TEMP TABLE tab10 ( id INTEGER PRIMARY KEY, val TEXT);
INSERT INTO tab10  SELECT generate_series(1,1000),'xx';
DROP TABLE IF EXISTS tab11;
CREATE TEMP TABLE tab11 ( id INTEGER PRIMARY KEY, val TEXT);
INSERT INTO tab11  SELECT generate_series(1,1000),'xx';
DROP TABLE IF EXISTS tab12;
CREATE TEMP TABLE tab12 ( id INTEGER PRIMARY KEY, val TEXT);
INSERT INTO tab12  SELECT generate_series(1,1000),'xx';
DROP TABLE IF EXISTS tab13;
CREATE TEMP TABLE tab13 ( id INTEGER PRIMARY KEY, val TEXT);
INSERT INTO tab13  SELECT generate_series(1,1000),'xx';
DROP TABLE IF EXISTS tab14;
CREATE TEMP TABLE tab14 ( id INTEGER PRIMARY KEY, val TEXT);
INSERT INTO tab14  SELECT generate_series(1,1000),'xx';
DROP TABLE IF EXISTS tab15;
CREATE TEMP TABLE tab15 ( id INTEGER PRIMARY KEY, val TEXT);
INSERT INTO tab15  SELECT generate_series(1,1000),'xx';
DROP TABLE IF EXISTS tab16;
CREATE TEMP TABLE tab16 ( id INTEGER PRIMARY KEY, val TEXT);
INSERT INTO tab16  SELECT generate_series(1,1000),'xx';
DROP TABLE IF EXISTS tab17;
CREATE TEMP TABLE tab17 ( id INTEGER PRIMARY KEY, val TEXT);
INSERT INTO tab17  SELECT generate_series(1,1000),'xx';
DROP TABLE IF EXISTS tab18;
CREATE TEMP TABLE tab18 ( id INTEGER PRIMARY KEY, val TEXT);
INSERT INTO tab18  SELECT generate_series(1,1000),'xx';
DROP TABLE IF EXISTS tab19;
CREATE TEMP TABLE tab19 ( id INTEGER PRIMARY KEY, val TEXT);
INSERT INTO tab19  SELECT generate_series(1,1000),'xx';
DROP TABLE IF EXISTS tab20;
CREATE TEMP TABLE tab20 ( id INTEGER PRIMARY KEY, val TEXT);
INSERT INTO tab20  SELECT generate_series(1,1000),'xx';
DROP TABLE IF EXISTS tab21;
CREATE TEMP TABLE tab21 ( id INTEGER PRIMARY KEY, val TEXT);
INSERT INTO tab21  SELECT generate_series(1,1000),'xx';
DROP TABLE IF EXISTS tab22;
CREATE TEMP TABLE tab22 ( id INTEGER PRIMARY KEY, val TEXT);
INSERT INTO tab22  SELECT generate_series(
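The attached script (truncated above) is mechanical enough to regenerate. A sketch that emits the same shape of repro, using the table names and row counts from the original; the original text shows median() being called on at least the first and last tables, so that is what this reproduces:

```python
# Regenerate the body of the repro script: 100 temp tables, each loaded,
# with the PL/R median() aggregate run on the first and last of them.

def repro_sql(ntables=100, nrows=1000):
    lines = ["BEGIN;"]
    for i in range(ntables):
        lines += [
            f"DROP TABLE IF EXISTS tab{i};",
            f"CREATE TEMP TABLE tab{i} ( id INTEGER PRIMARY KEY, val TEXT);",
            f"INSERT INTO tab{i}  SELECT generate_series(1,{nrows}),'xx';",
        ]
        if i in (0, ntables - 1):      # median() on the first and last tables
            lines.append(f"SELECT median(id) FROM tab{i};")
    lines.append("COMMIT;")
    return "\n".join(lines)

print(repro_sql()[:120])
```

The create function / CREATE AGGREGATE preamble from the attachment would still need to be run once beforehand.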

Re: [BUGS] Pg_stat_replication shows sync standby with flush location behind primary in 9.1.5

2012-10-04 Thread Mark Kirkwood

On 04/10/12 19:06, Simon Riggs wrote:

On 4 October 2012 05:32, Mark Kirkwood  wrote:

I am seeing the situation where the reported flush location for the sync
standby (standby1 below) is *behind* the reported current xlog location of
the primary. This is Postgres 9.1.5 , and I was under the impression that
transactions initiated on the master do not commit until the corresponding
wal is flushed on the sync standby.

Now the standby is definitely working in sync mode, because stopping it
halts all write transactions on the primary (sync_standby_names contains
only standby1). So is the reported lag in flush location merely an artifact
of timing in the query, or is there something else going on? [1]


The writing of new WAL is independent of the wait that occurs on
commit, so it is entirely possible, even desirable, that the observed
effect occurs.



Ah right - it did occur to me (after posting, of course) that *other* 
non-commit WAL could be causing the effect... thank you for clarifying!


This could be worth mentioning in the docs for the view, as the context 
in which I've encountered this effect is folks writing scripts to monitor 
replication lag etc.


Cheers

Mark




[BUGS] Pg_stat_replication shows sync standby with flush location behind primary in 9.1.5

2012-10-03 Thread Mark Kirkwood
I am seeing the situation where the reported flush location for the sync 
standby (standby1 below) is *behind* the reported current xlog location 
of the primary. This is Postgres 9.1.5 , and I was under the impression 
that transactions initiated on the master do not commit until the 
corresponding wal is flushed on the sync standby.


Now the standby is definitely working in sync mode, because stopping it 
halts all write transactions on the primary (sync_standby_names contains 
only standby1). So is the reported lag in flush location merely an 
artifact of timing in the query, or is there something else going on? [1]


db=# SELECT application_name, pg_current_xlog_location(), sent_location,
       write_location, flush_location, replay_location, sync_priority, state
     FROM pg_stat_replication where replay_location is not null;
 application_name | pg_current_xlog_location | sent_location | write_location | flush_location | replay_location | sync_priority |   state
------------------+--------------------------+---------------+----------------+----------------+-----------------+---------------+-----------
 standby1         | E/254909E0               | E/2549        | E/2548C3B8     | E/2548C3B8     | E/25476DE0      |             1 | streaming   <===
 standby2         | E/254909E0               | E/2548C3B8    | E/25476DE0     | E/25476DE0     | E/254724C0      |             0 | streaming
 standby3         | E/254909E0               | E/254909E0    | E/25476DE0     | E/25476DE0     | E/254724C0      |             0 | streaming
 standby4         | E/254909E0               | E/2549        | E/2548C3B8     | E/25476DE0     | E/25476DE0      |             0 | streaming
 standby5         | E/254909E0               | E/2549        | E/25476DE0     | E/25476DE0     | E/254724C0      |             0 | streaming
(5 rows)
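For script writers, locations like these can be turned into a byte lag by hand (a sketch; note that before 9.3 each xlogid covers 0xFF000000 bytes rather than 2^32, because the last 16MB segment of each logical xlog file is skipped, and that 9.2 adds pg_xlog_location_diff() to do this server-side):

```python
# Compute replication lag in bytes from two xlog locations such as
# 'E/254909E0'.  Pre-9.3 WAL addressing: each xlogid spans 0xFF000000 bytes.

XLOGID_SPAN = 0xFF000000   # bytes per xlogid before PostgreSQL 9.3

def xlog_to_bytes(location):
    xlogid, offset = location.split("/")
    return int(xlogid, 16) * XLOGID_SPAN + int(offset, 16)

def lag_bytes(primary, standby):
    return xlog_to_bytes(primary) - xlog_to_bytes(standby)

# standby1's flush_location vs the primary's current location above:
print(lag_bytes("E/254909E0", "E/2548C3B8"))   # 17960 bytes
```

So the "lag" in the output above is under 18kB, consistent with it being non-commit WAL written just after the last acknowledged flush.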


Cheers

Mark

[1] Looking at the code for pg_stat_replication, it appears to take the 
sync rep lock while reporting, so in theory should be exactly right...I 
should perhaps check what pg_current_xlog_location does...





Re: [BUGS] BUG #7524: Partitioning

2012-09-09 Thread Mark Kirkwood

On 09/09/12 14:01, Kevin Grittner wrote:

wrote:


The TWO most important factors in hindering us to convert to
Postgres are the following:

Parallel execution of queries.

No Table Partitioning


Not a bug, so off-topic for this list.  If you need help figuring
out how best to use PostgreSQL, or whether it is a good fit for
your use-case, a post to pgsql-nov...@postgresql.org would be more
appropriate.




Or even -hackers - but it would make sense to ask 2 questions along the 
lines of:


1/ What is the current state of table partitioning...is anyone working 
on something a bit more native than abusing inheritance?


2/ Is anyone working on parallel query execution?

Regards


Mark



--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] BUG #6689: socket file name convention doesn't allow different IPs and the same port for different pgclusters

2012-06-15 Thread Mark Schneider

Hello Magnus,

Thanks a lot for your time checking my email.

Am 15.06.2012 07:56, schrieb Magnus Hagander:

On Wed, Jun 13, 2012 at 2:45 AM,  wrote:

The following bug has been logged on the website:

Bug reference:  6689
Logged by:  Mark
Email address:  m...@it-infrastrukturen.org
PostgreSQL version: 9.1.3
Operating system:   unbuntu-server 12.04 LTS
Description:

When using /var/run/postgresql/ as the unix_socket_directory for the "main" and
"second" pgclusters and different IP interfaces (addresses) for every
pgcluster (but the same default port number), there are different pid file
names (like 9.1-main.pid and 9.1-secondpg.pid) *but* the names of the unix_socket
files don't follow such naming conventions.

It results in an error when trying to run the second pgcluster with another IP
but the same default port (5432):
Error: Port conflict: another instance is already running on on
/var/run/postgresql with port 5432

First of all, this is not a bug - this is intended behaviour. Perhaps
the documentation needs to be clarified on the fact that the port
number needs to be unique across all instances though.
Even if it is not an implementation bug, it is a kind of *conception* bug 
from my point of view.
There is *no* real reason why port numbers should have to be different for 
different IP addresses.


There is nothing yet in the current docs about running further PostgreSQL 
instances on different ethernet interfaces (on one box).



One reason is that the port number is used to control several other
things. Just the Unix socket is only one of the things - it also
controls the name of shared memory segments, so it still won't work if
you work around that.
When I want to run two separate PostgreSQL instances, they should also use 
separate shared memory segments.



All instances need to have their own ports. If you want to listen on
the same port on different IPs, you will need to use something like
kernel level ip forwarding to rewrite the access, and actually run
PostgreSQL on different ports still.
Using a different port number for the second ethernet interface works as 
expected.


How complex would it be to implement different unix_socket names in the source 
code?

(when further instances listen on different ethernet interfaces)

regards, Mark

--
m...@it-infrastrukturen.org

http://rsync.it-infrastrukturen.org


--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] BUG #6685: Concurrency fault in create table

2012-06-10 Thread Mark Thornton

On 10/06/12 22:08, Tom Lane wrote:

mthorn...@optrak.com writes:

The following bug has been logged on the website:
Bug reference:  6685
Logged by:  Mark Thornton
Email address:  mthorn...@optrak.com
PostgreSQL version: 9.1.4
Operating system:   Ubuntu 12.04
Description:
Executing
create table child_a () inherits (parent)
create table child_b () inherits (parent)
concurrently (two separate connections) results in an error "concurrent
tuple update".

You sure that's a 9.1 server?  I can reproduce such an error in older
releases, but not 9.1.
Ah, now I come to think of it, I do have more than one version here. 
I'll recheck which one shows the problem.


Mark


--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] log_collector doesn't respond to reloads

2012-04-26 Thread Mark Kirkwood

On 27/04/12 13:11, Josh Berkus wrote:

On 4/26/12 5:50 PM, Tom Lane wrote:

Josh Berkus  writes:

Summary: despite pg_reload(), log directory, filename and destination
don't change

Looking at the code, it's really hard to see how this could possibly
happen, unless maybe the process is blocking receipt of SIGHUP.  Which
it shouldn't be.  Not sure about RHEL5, but on recent Linuxen you can
check the process's signal masks like this:

grep ^Sig /proc//status

where  is the logging collector's PID.  Could we see that?

SigQ:   0/399360
SigPnd: 
SigBlk: 
SigIgn: 01007806
SigCgt: 0201
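Those /proc mask fields are hex bit sets where bit n set means signal number n + 1 is in the set; a small decoding sketch (signal-number-to-name mapping deliberately left out, since numbers vary by platform):

```python
def decode_sigmask(mask_hex):
    """Decode a /proc/<pid>/status signal mask field (SigBlk, SigIgn,
    SigCgt, ...) into the set of signal numbers it contains.
    Bit n corresponds to signal number n + 1."""
    mask = int(mask_hex, 16)
    return {n + 1 for n in range(64) if mask & (1 << n)}

# A SigBlk of all zeroes means nothing is blocked, including SIGHUP.
print(decode_sigmask("0000000000000000"))  # set()
# The SigCgt value above contains signals 1 (SIGHUP on Linux) and 10,
# i.e. the logging collector does have a SIGHUP handler installed.
print(decode_sigmask("0201"))              # {1, 10}
```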



Might be a red herring, but I was able to reproduce this if (and only 
if) I forgot to create the new dest directory before doing the reload. 
Subsequently creating the directory and reloading did not result in a 
file in the new location.


Regards

Mark

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


[BUGS] Subquery with toplevel reference used to work in pg 8.4

2012-03-23 Thread Mark Murawski

I agree the query is a little odd, but I like backwards compatibility!


Postgres 8.4.1
--
CREATE VIEW v_members AS
 SELECT
   1 as member_id,
   100 as tenant_id,
   3732 as conference_id,
   200 as uid
 FROM
   (select 1 as uid_user, 2 as uid_contact) as m;

SELECT
   u.tenant_id,
   u.uid
 FROM
   (select 100 as tenant_id, 200 as uid) u
   LEFT JOIN v_members m ON (m.uid = u.uid AND m.conference_id = 3732)
 WHERE
  (
SELECT 1
FROM   (select 3732 as conference_id) c
WHERE  (c.conference_id = 3732) AND (m.uid IS NOT NULL)
  ) = 1;

 tenant_id | uid
---+-
   100 | 200
(1 row)




postgres 9.1.3
--

CREATE VIEW v_members AS
 SELECT
   1 as member_id,
   100 as tenant_id,
   3732 as conference_id,
   200 as uid
 FROM
   (select 1 as uid_user, 2 as uid_contact) as m;

SELECT
   u.tenant_id,
   u.uid
 FROM
   (select 100 as tenant_id, 200 as uid) u
   LEFT JOIN v_members m ON (m.uid = u.uid AND m.conference_id = 3732)
 WHERE
  (
SELECT 1
FROM   (select 3732 as conference_id) c
WHERE  (c.conference_id = 3732) AND (m.uid IS NOT NULL)
  ) = 1;

ERROR:  Upper-level PlaceHolderVar found where not expected



--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] BUG #6511: calling spi_exec_query from non-main package, results in: couldn't fetch $_TD

2012-03-17 Thread Mark M.

On 03/16/12 13:48, Alex Hunsaker wrote:

On Thu, Mar 15, 2012 at 16:13, Bruce Momjian  wrote:

On Tue, Mar 06, 2012 at 09:08:25PM -0700, Alex Hunsaker wrote:

[ Calling a plperl trigger function from a plperl function ]

Yeah, there were some optimization done for 9.1 to try and make calls
a bit faster. The problem is we are fetching "_TD" not "main::_TD",
which means we try to find and use $_TD from whatever the current
package is. This should only happen from a nested plperl to plperl
trigger where the outer call was in a different package, otherwise the
package is always main.

The attached fixes it for me, It would be great if you could confirm that.

Thanks for the report!

So, should this be applied?

Yes, but I would like to get confirmation that it fixes the issue.
Either way I'll probably just end up submitting it to the next commit
fest. (I know most committers are still busy with the current one.)



Fixes the problem. Thanks.  I always like one-line fixes.

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] BUG #6462: rpmbuild fails during regression tests on SMP ARM machines

2012-02-20 Thread Mark Langsdorf
On 02/18/2012 10:21 AM, Tom Lane wrote:
> mark.langsd...@calxeda.com writes:
>> I'm trying to build 9.0.6 for Fedora 15 on an ARM (Cortex A9) SMP processor
>> . The code compiles but the regression tests fail.
>> What can I do to help resolve this problem?
> 
> Does this patch make it better?
> http://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=bb65cb8cdf864e61bc939d3c4b28bbd43d926700

That solves the problem. Thanks!

--Mark Langsdorf
Calxeda, Inc.


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


[BUGS] BUG #6462: rpmbuild fails during regression tests on SMP ARM machines

2012-02-17 Thread mark . langsdorf
The following bug has been logged on the website:

Bug reference:  6462
Logged by:  Mark Langsdorf
Email address:  mark.langsd...@calxeda.com
PostgreSQL version: 9.0.6
Operating system:   Fedora fc15
Description:

I'm trying to build 9.0.6 for Fedora 15 on an ARM (Cortex A9) SMP processor
. The code compiles but the regression tests fail.

If I reboot the ARM system with "maxcpus=1", the rpmbuild completes. So each
individual test in the regression suite passes, but they fail when run in
parallel.

I have also reproduced the problem and solution with 9.1.2 on ARM systems.

What can I do to help resolve this problem?


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] BUG #6404: postgres account not created during unattended install

2012-02-04 Thread Mark Phillips
Thank you for your reply.

On Feb 3, 2012, at 9:31 AM, Dharmendra Goyal wrote:

> On Fri, Feb 3, 2012 at 10:37 PM, Mark Phillips  
> wrote:
> After considering your remarks and modifying the app install script, the app 
> installation completes normally with a functioning PostgreSQL installation. 
> However, the errors reported previous still appear in the logs. 
> If the errors which you sent last time appear even when you run the installation 
> the first time, it means that there was an existing installation of postgres. You 
> can confirm the same by checking the /etc/postgres-reg.ini file.

Immediately after restoration of the vm image, there is no file 
/etc/postgres-reg.ini. Perhaps it is placed elsewhere by the OS X designers. I 
cannot find it with locate enabled and run as root. 

The file is present after I run the postgres installer. 


> To recap the questions:
> 1. are the errors reported in the postgres install log of a type to cause a 
> malfunction for end users?
> Even if the error (postgres user not found) is there in the log, it will not 
> cause any problem for the end user. The error appears because the installer finds a 
> previous installation of PG and hence tries to stop the existing pg server if it 
> is running. So even if it is not able to stop one, it is not going to cause 
> any issue, as the subsequent installation of pg goes fine. 

This is good news. :-)

> 2. how can I suppress the appearance of the user account "PostgreSQL" in the 
> os x login window?
> Please check the /etc/postgres-reg.ini file for the bundled pg installation.  

Forgive me for being redundant, but I cannot locate a file named 
"postgres-reg.ini" prior to running the installer.

Here is the content from the file /etc/postgres-reg.ini written by the postgres 
installer:

$ cat etc/postgres-reg.ini 
[PostgreSQL/9.1]
Branding=PostgreSQL 9.1
DataDirectory=/Library/PostgreSQL/9.1/data
Description=PostgreSQL 9.1
DisableStackBuilder=0
InstallationDirectory=/Library/PostgreSQL/9.1
Locale=C
Port=5432
Serviceaccount=postgres
ServiceID=postgresql-9.1
Shortcuts=1
Superuser=postgres
Version=9.1.2.1
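Since the registry file is plain INI, it can also be inspected programmatically; a sketch (reusing the keys from the listing above for illustration, not an official installer API):

```python
import configparser

# Parse an installer registry file in the format shown above.
reg = configparser.ConfigParser()
reg.read_string("""\
[PostgreSQL/9.1]
Serviceaccount=postgres
Superuser=postgres
Port=5432
DataDirectory=/Library/PostgreSQL/9.1/data
""")

section = reg["PostgreSQL/9.1"]
print(section["Serviceaccount"])  # postgres
print(section["Port"])            # 5432
```

Option names are matched case-insensitively by configparser, so the mixed-case keys the installer writes are not a problem.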

Your thoughts are most welcome.

 - Mark


> 
> Many thanks,
> 
>  - Mark
> 
> On Jan 30, 2012, at 11:34 PM, Dharmendra Goyal wrote:
> 
>> Hi Mark,
>> 
>> Install log shows that your db installation is successful. The error which 
>> you had sent is coming because your earlier installation failed to create 
>> 'postgres' user and when you ran the installer again, installer read 
>> /etc/postgres-reg.ini file to check any previous installation and found that 
>> previous installation was done. Hence installer tries to stop the db server 
>> in case it is running using postgres user. This command fails as postgres 
>> user was not created in your first installation because of some reason. But 
>> in latest installation (for which you have sent the log), postgres user has 
>> been created successfully and installation went successfully. You need to 
>> check why postgres user was not created in first go by checking logs of that 
>> installation.
>> 
>> You are checking the return status of the installbuilder.sh command, which is 
>> not 0, hence you are getting the error ("install of DBMS failed") in your 
>> script. Can you please check what the return value of installbuilder.sh is in 
>> your script?
>> 
>> Regards,
>> 
>> On Tue, Jan 31, 2012 at 5:37 AM, Mark Phillips  
>> wrote:
>> 
>> On Jan 29, 2012, at 11:07 PM, Dharmendra Goyal wrote:
>> 
>>> On Sat, Jan 21, 2012 at 3:01 AM,  wrote:
>>> The following bug has been logged on the website:
>>> 
>>> Bug reference:  6404
>>> Logged by:  Mark Phillips
>>> Email address:  mark.phill...@mophilly.com
>>> PostgreSQL version: 9.1.2
>>> Operating system:   Mac OS X 10.7
>>> Description:
>>> 
>>> for a stand alone app that uses postgres, the app installer invokes the
>>> shell script installbuilder.sh for the "unattended" install of postgres.
>>> 
>>> The pg installer creates a user "PostgreSQL" that is not hidden, but does
>>> not create a hidden account "postgres". As a result, the invocation of
>>> pg_ctl fails. Error is "unknown login: postgres".
>>> pg installer creates "postgres" user which somehow seems to be failing at 
>>> your machine. Can you please send installation log which can be found under 
>>> /tmp/install-postgresql.log. Also if you can run 
>>> installation_path/installer/server/createuser.sh script manually and check 
>>> the output, it can be helpful.
>>> 
>>> Thank

Re: [BUGS] BUG #6404: postgres account not created during unattended install

2012-02-03 Thread Mark Phillips
Thank you for the reply. I have spent the better part of this week on this and 
related issues. 

My test procedure is as follows: create VM using Fusion 4.x with a fresh 
install of OS X 10.7. The VM is hosted on a Mac also running 10.7. I create a 
snapshot of this clean image. The first step of each test run is to restore the 
snapshot image. 

With this in mind, the comment about the previous installation does not seem to 
apply to my report. 

After considering your remarks and modifying the app install script, the app 
installation completes normally with a functioning PostgreSQL installation. 
However, the errors reported previous still appear in the logs. 

The unattended mode results in the appearance of a user "PostgreSQL" in the 
User Accounts preferences panel as well as the login window. It is interesting 
to note that a user named "postgres" in any variant does not appear in the /Users/ 
directory, nor in the file /etc/passwd, with the exception of the OS X 
standard postgres account.
$ cat /etc/passwd | grep "Postgre"
_postgres:*:216:216:PostgreSQL Server:/var/empty:/usr/bin/false

The "one click" installer does not result in the appearance of the "PostgreSQL" 
user account in the os x GUI.

To recap the questions:
1. are the errors reported in the postgres install log of a type to cause a 
malfunction for end users?
2. how can I suppress the appearance of the user account "PostgreSQL" in the os 
x login window?

Many thanks,

 - Mark

On Jan 30, 2012, at 11:34 PM, Dharmendra Goyal wrote:

> Hi Mark,
> 
> Install log shows that your db installation is successful. The error which 
> you had sent is coming because your earlier installation failed to create 
> 'postgres' user and when you ran the installer again, installer read 
> /etc/postgres-reg.ini file to check any previous installation and found that 
> previous installation was done. Hence installer tries to stop the db server 
> in case it is running using postgres user. This command fails as postgres 
> user was not created in your first installation because of some reason. But 
> in latest installation (for which you have sent the log), postgres user has 
> been created successfully and installation went successfully. You need to 
> check why postgres user was not created in first go by checking logs of that 
> installation.
> 
> You are checking the return status of the installbuilder.sh command, which is 
> not 0, hence you are getting the error ("install of DBMS failed") in your script. 
> Can you please check what the return value of installbuilder.sh is in 
> your script?
> 
> Regards,
> 
> On Tue, Jan 31, 2012 at 5:37 AM, Mark Phillips  
> wrote:
> 
> On Jan 29, 2012, at 11:07 PM, Dharmendra Goyal wrote:
> 
>> On Sat, Jan 21, 2012 at 3:01 AM,  wrote:
>> The following bug has been logged on the website:
>> 
>> Bug reference:  6404
>> Logged by:  Mark Phillips
>> Email address:  mark.phill...@mophilly.com
>> PostgreSQL version: 9.1.2
>> Operating system:   Mac OS X 10.7
>> Description:
>> 
>> for a stand alone app that uses postgres, the app installer invokes the
>> shell script installbuilder.sh for the "unattended" install of postgres.
>> 
>> The pg installer creates a user "PostgreSQL" that is not hidden, but does
>> not create a hidden account "postgres". As a result, the invocation of
>> pg_ctl fails. Error is "unknown login: postgres".
>> pg installer creates "postgres" user which somehow seems to be failing at 
>> your machine. Can you please send installation log which can be found under 
>> /tmp/install-postgresql.log. Also if you can run 
>> installation_path/installer/server/createuser.sh script manually and check 
>> the output, it can be helpful.
>> 
>> Thanks,
>> -- 
>> Dharmendra Goyal
> 
> Thank you for the reply. I apologize for the delay in responding.
> 
> File install-postgresql.log attached.
> 
> Please review the link below to a post in the postgresql forum for a bit more 
> detail about this issue:
> 
>   http://forums.enterprisedb.com/posts/list/3042.page
> 
> I looked into running the createuser.sh script. However, I encountered a bit 
> of new information before I could get that far so I include it here in case 
> it is relevant. 
> 
> Beginning anew with a clean os install, I ran the app installer that invokes 
> the postgresql installer.
> 
> Here is a snippet of the installer script that relates to installing postgres:
> 
>   # adjust current settings to allow the install to execute properly
>   log "adjust current shared memory settings"
>   log &qu

[BUGS] BUG #6405: postgres account not created during unattended install

2012-01-20 Thread mark . phillips
The following bug has been logged on the website:

Bug reference:  6405
Logged by:  Mark Phillips
Email address:  mark.phill...@mophilly.com
PostgreSQL version: 9.1.2
Operating system:   Mac OS X 10.7
Description:

for a stand alone app that uses postgres, the app installer invokes the
shell script installbuilder.sh for the "unattended" install of postgres.

The pg installer creates a user "PostgreSQL" that is not hidden, but does
not create a hidden account "postgres". As a result, the invocation of
pg_ctl fails. Error is "unknown login: postgres". 




-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


[BUGS] BUG #6404: postgres account not created during unattended install

2012-01-20 Thread mark . phillips
The following bug has been logged on the website:

Bug reference:  6404
Logged by:  Mark Phillips
Email address:  mark.phill...@mophilly.com
PostgreSQL version: 9.1.2
Operating system:   Mac OS X 10.7
Description:

for a stand alone app that uses postgres, the app installer invokes the
shell script installbuilder.sh for the "unattended" install of postgres.

The pg installer creates a user "PostgreSQL" that is not hidden, but does
not create a hidden account "postgres". As a result, the invocation of
pg_ctl fails. Error is "unknown login: postgres". 




-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] Add statistics_collector_listen_addresses to fix hard-coding of "localhost"

2011-11-11 Thread Mark Kirkwood

On 11/11/11 20:51, Torsten Zuehlsdorff wrote:



So either your statement is wrong or the manual. ;) If there is the 
possibility of losing data just because of this kind of 
misconfiguration, we should accept Robert's patch.


Greetings,
Torsten


Robert's patch is based on a complete mis-assessment of risk and a 
misunderstanding of sane network configuration. No-one can sensibly 
support a position that redefines 'localhost' to mean what it should never 
mean.


regards

Mark

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] BUG #6275: Horrible performance regression

2011-10-31 Thread Mark Kirkwood

On 28/10/11 15:42, Finlay Thompson wrote:

After upgrading the postgresql*-8.4 packages on ubuntu to version 8.4.9,
the script suddenly stopped working and started consuming all the ram (16GB) on the
computer (i7).




If there is one query exhausting all ram, then it could be tricky to 
catch it in the logs (log duration and friends log the query time on 
completion... which might be never). If you have vm.overcommit_memory 
set to 2 then you may get the offending statement in the log when the 
postgres backend errors out with ENOMEM.


If you know the approx time at which memory starts filling up then 
getting (most of) the query text from pg_stat_activity would enable you 
to track down the offending sql in your load script.


regards

Mark

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] "no relation entry for relid 1"

2011-09-29 Thread Mark Kirkwood

On 30/09/11 10:08, Dickson S. Guedes wrote:

2011/9/29 Peter Geoghegan:

On 29 September 2011 21:59, Merlin Moncure  wrote:

hm -- works for me (9.1.0)

It works for me on REL9_1_STABLE too, unsurprisingly, as I would think
it highly unlikely that such a glaring bug would slip into a stable
release.

Have you done a make distclean before building, then initialized a new $PGDATA?



I can confirm this - checkout from 2 days ago, maintainer-clean + build 
and initdb.


--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] BUG #6191: One click installer fails

2011-09-05 Thread Mark Lamberton
Thanks - worked a treat.  Might be useful to have the installer check 
for the existence of the VC++ runtime - just a suggestion...


Mark Lamberton

On Thu, Sep 1, 2011 at 2:47 AM, Mark Lamberton
  wrote:

The following bug has been logged online:

Bug reference:  6191
Logged by:  Mark Lamberton
Email address:  m...@penguinsystems.com.au
PostgreSQL version: 9.0.4-1
Operating system:   Windows 7
Description:One click installer fails
Details:

Problem signature:
  Problem Event Name:   BEX
  Application Name: install.exe_Microsoft® Visual Studio® 2008
  Application Version:  9.0.30729.4148

Unfortunately that's the Microsoft VC++ 2008 runtime installer that's
failing, not ours (we just embed theirs). If you already have them
installed (which is quite often the case), or install them manually
using the installer Microsoft offer on their download site - which is
a different one from the one we're allowed to embed - you could then
try running our installer from the command line using the option:
"--install_runtimes 0"




-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


[BUGS] BUG #6191: One click installer fails

2011-08-31 Thread Mark Lamberton

The following bug has been logged online:

Bug reference:  6191
Logged by:  Mark Lamberton
Email address:  m...@penguinsystems.com.au
PostgreSQL version: 9.0.4-1
Operating system:   Windows 7
Description:One click installer fails
Details: 

Problem signature:
  Problem Event Name:   BEX
  Application Name: install.exe_Microsoft® Visual Studio® 2008
  Application Version:  9.0.30729.4148
  Application Timestamp:4a597677
  Fault Module Name:install.exe
  Fault Module Version: 9.0.30729.4148
  Fault Module Timestamp:   4a597677
  Exception Offset: 0003f04b
  Exception Code:   c417
  Exception Data:   
  OS Version:   6.1.7601.2.1.0.256.48
  Locale ID:3081
  Additional Information 1: 7e02
  Additional Information 2: 7e02000cc9a98b1fce2b3132e623d86d
  Additional Information 3: 7a95
  Additional Information 4: 7a95b5323f1cf363784ef70b5804c9c1

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


[BUGS] pgbench -T isn't a hard cutoff.

2011-08-26 Thread mark
Using the -T flag in pgbench, I noticed that -T is just a best-effort limit
rather than a hard cutoff. 

With a (misbehaving) pooler and a "large" number of clients+jobs, it's
possible for the pgbench run to extend several seconds or even minutes
past the time allotted by -T (or to hang indefinitely if the transactions
never complete due to pooler issues). 

Expected behavior would be for -T to be a hard cutoff. 
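In the meantime, one workaround is to enforce the cutoff externally; a sketch of a supervisor wrapper (using `sleep` as a stand-in for a hung pgbench invocation, since the behavior being demonstrated is the kill-on-deadline, not pgbench itself):

```python
import subprocess

def run_with_hard_cutoff(cmd, seconds):
    """Run cmd, forcibly killing it once the wall-clock budget is
    spent.  Returns the CompletedProcess, or None if it was killed."""
    try:
        return subprocess.run(cmd, timeout=seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run() kills and reaps the child before re-raising.
        return None

# Stand-in for a pgbench run that hangs: a sleep that would run 60s.
result = run_with_hard_cutoff(["sleep", "60"], 1)
print(result)  # None -- killed after ~1 second
```

In real use the cmd would be something like `["pgbench", "-T", "60", ...]` with the external budget set slightly above the -T value.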


Thoughts ?

-Mark


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


[BUGS] BUG #6029: packaged installer fails to configure ldap

2011-05-17 Thread mark

The following bug has been logged online:

Bug reference:  6029
Logged by:  mark
Email address:  m...@remidata.com
PostgreSQL version: 9.0.2
Operating system:   RHEL 6.0 x86_64
Description:packaged installer fails to configure ldap
Details: 

The source code installation can be configured to work with LDAP, but the
one-click installer cannot by default.

The one-click installer fails to detect openldap and does not attempt to
configure or link with openldap.  We first noticed pg_hba.conf disallowed
specifying ldap authentication.  We verified openldap and openldap-devel are
installed and up-to-date.  We then verified using "pg_config --configure"
and "ldd postgres" that the PGHOME was not built with ldap support.  For
some reason the one-click installer did not detect openldap was installed
and so did not include it in the configure options.

I have tested this on RHEL 5.5 and RHEL 6.0, both x86_64.  On all servers we
installed openldap and openldap-devel for x86_64 prior to installing
postgres.  On the RHEL 6.0 server we additionally have ldap installed via
Centify.  We can use ldapsearch to connect to the ldap server and query it,
so we know our RHEL servers are functional in that respect.

To make the one-click installer make use of the openldap software we had to
install the i686 edition of it (yum install openldap.i686).  This "trick" is
not required when installing via source code.

We wiped and rebuilt the servers multiple times to verify this behavior.

To install postgresql 9.0 from source we first installed openldap and
openldap-devel (both were x86_64 only - we made sure the i686 was not
installed), and then we ran "./configure --with-ldap" followed by make &&
make install.  The postgres executable appears to support ldap (verified
using "pg_config --configure" which shows the --with-ldap option, and
"ldd postgres" which shows /usr/lib64/libldap-2.3.so.0).


Server Specs:
--
1. The RHEL 5.5 server is a vm with 4 cores and 8 GB RAM running kernel
2.6.18-194.el5 #1 SMP x86_64 GNU/Linux.

2. The RHEL 6.0 server is a HP DL580G7 with 32 cores and 256 GB RAM running
kernel 2.6.32-71.el6.x86_64 #1 SMP GNU/Linux.

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


[BUGS] BUG #5998: CLUSTER and "ERROR: missing chunk number 0 for toast value"

2011-04-29 Thread Mark Reid

The following bug has been logged online:

Bug reference:  5998
Logged by:  Mark Reid
Email address:  m...@markreid.org
PostgreSQL version: 8.3.5
Operating system:   Debian Etch
Description:CLUSTER and "ERROR:  missing chunk number 0 for toast
value"
Details: 

Hello,

This morning I noticed that a normally small table (18 wide rows) was
bloated to 6GB in size.  This has happened with older postgres versions
in the past, where the main table got vacuumed but the pg_toastXX
table did not.  This is the first time I've seen the problem with this version
of postgres (8.3.5).  So I decided to manually vacuum the table to find out
what was going on.  I saw a lot of "dead row versions cannot be removed
yet." (on the order of 60k in the main table, and 3.8M in the toast table). 
There was a day-old "idle in transaction" query for that same user, so I
killed that and tried again.  Same deal.  So I tried a "CLUSTER
properties_pkey ON properties;" which gave the following error message:
ERROR:  missing chunk number 0 for toast value 396950697 in
pg_toast_373928870

I jumped on IRC and got some excellent help from andres, who suggested
running the following query with index scans both enabled and disabled:
SELECT chunk_id, chunk_seq, length(chunk_data) FROM
pg_toast.pg_toast_373928870 WHERE chunk_id = 396950697;

Both queries returned zero rows.

I checked for other long-running transactions, and found one that had been
running since April 15th (it's now April 29th), but for a different database
user, which would definitely *not* have been accessing this table in any
way.  andres said that might cause such behaviour, so I killed the offending
process and re-tried the vacuum.  This time it was able to remove the dead
rows as expected.  After the vacuum, I also ran the CLUSTER command again,
which completed successfully and reduced the table size to 576kB.

Andres concluded "The bug is that CLUSTER seems to use the wrong xid horizon
when determining visibility, Or autovacuum. Not sure"

I have the output of the following queries, but would prefer to send it
off-list for confidentiality:
SELECT datname, datfrozenxid, age(datfrozenxid) FROM pg_database;
SELECT relname, relfrozenxid, age(relfrozenxid) FROM pg_class;
SELECT * FROM pg_stat_activity;

These queries were run shortly after the long-running transaction was
killed.

Thanks!

Mark Reid

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] Massive memory use for star query

2011-04-16 Thread Mark Kirkwood

On 17/04/11 02:58, Tom Lane wrote:

Greg Stark  writes:

The planner uses various heuristics to avoid combinatoric growth
wherever it can but there's no way to completely avoid it.

Yeah.  The collapse_limit variables can be seen as another heuristic to
deal with this type of problem: they artificially limit the number of
combinations considered by forcing the join search to be broken down
into subproblems.  The trouble of course is that this breakdown is
pretty stupid and can easily prevent the best join order from ever being
considered.

If you've got a small number of such query types that you can afford to
spend some manual effort on, here's what I'd do:

1. With those three planner variables cranked up to more than the number
of relations in the query (if possible), run an EXPLAIN, or better
EXPLAIN ANALYZE so you can confirm you get a good plan.

2. Observe the join order selected in the good plan.

3. Rearrange your query so that the tables are explicitly JOINed in that
order.  Don't use the FROM-comma-list style.

4. Now, in your production app, *reduce* join_collapse_limit to a small
value, maybe even 1, to force the syntactic JOIN order to be followed.
(Obviously, don't keep it there when running queries you haven't
hand-optimized this way.)

This will force the planner to consider only small subproblems, which
will make it both much faster and much less memory-hungry than when it's
trying to solve a large join problem from scratch.
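
A sketch of that recipe, borrowing the tables from the test case later in this thread (the chosen join order here is illustrative only, not the order a real plan would pick):

```sql
-- Step 1: raise the limits and look for a good plan.
SET from_collapse_limit = 20;
SET join_collapse_limit = 20;
EXPLAIN ANALYZE
SELECT 1
FROM node n, nodekeyword kwn0, keyword kw0
WHERE n.nodeid = kwn0.nodeid
  AND kwn0.keywordid = kw0.keywordid
  AND kw0.keyword = 'sscghryv';

-- Steps 3-4: rewrite with explicit JOINs in the order the good plan
-- chose, then pin that order for the hand-optimized query.
SET join_collapse_limit = 1;
SELECT 1
FROM keyword kw0
JOIN nodekeyword kwn0 ON kwn0.keywordid = kw0.keywordid
JOIN node n ON n.nodeid = kwn0.nodeid
WHERE kw0.keyword = 'sscghryv';
```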



We've sort of done an equivalent thing as a temporary fix - restricted 
the page generating these queries to one or two keywords to tame the 
number of tables joined in.


We are only seeing this type of query being generated in a very specific 
part of the application (keyword search), and I've been encouraging a 
redesign in that area anyway as I don't believe it is necessary to 
require so many joins to achieve what they wish to do - so this is 
really the clincher for a redesign.


I will get 'em to reduce the *collapse limits too.

Thanks to all of you for your help, regards

Mark



Re: [BUGS] Massive memory use for star query

2011-04-16 Thread Mark Kirkwood

On 15/04/11 16:35, Mark Kirkwood wrote:
 Here's a simplified example using synthetic data (see attached to 
generate if desired):




For anyone else who might be want to play with this:

Patch with correction to make the directory reassignment work correctly, 
plus an additional comment in the README mentioning the need to set this 
in the generator and loading scripts.


Thanks

Mark
diff -c starjoin.orig/gendata.pl starjoin/gendata.pl
*** starjoin.orig/gendata.pl	2011-04-17 10:55:28.252595067 +1200
--- starjoin/gendata.pl	2011-04-17 10:55:48.684595066 +1200
***
*** 14,20 
  my %prevcatids;
  print "generate cat\n";
  
! open(FH, "> /data0/dump/cat.dat") || die ("cannot open cat.dat: $!");
  for ($i = 0; $i < $catrows; $i++) {
  	$catid = $i;
  
--- 14,20 
  my %prevcatids;
  print "generate cat\n";
  
! open(FH, "> $dir/cat.dat") || die ("cannot open cat.dat: $!");
  for ($i = 0; $i < $catrows; $i++) {
  	$catid = $i;
  
diff -c starjoin.orig/README starjoin/README
*** starjoin.orig/README	2011-04-17 10:55:28.252595067 +1200
--- starjoin/README	2011-04-17 10:56:29.292595065 +1200
***
*** 1,5 
--- 1,6 
  Setup for (hopefully) excessive join memory use.
  
+ $ vi gendata.pl load.sql    # set dump/load directory
  $ ./gendata.pl
  $ psql test < schema.sql
  $ psql test < load.sql



Re: [BUGS] Massive memory use for star query

2011-04-16 Thread Mark Kirkwood

On 16/04/11 04:43, Tom Lane wrote:

Mark Kirkwood  writes:

I've recently seen examples of star-like queries using vast amounts of
memory in one of our production systems. Here's a simplified example
using synthetic data (see attached to generate if desired):
SET geqo_threshold = 14;
SET from_collapse_limit = 14;
SET join_collapse_limit = 14;

Well, if you're going to do the above, you should be expecting the
planner to eat a lot of memory.  There is a reason why the default
values of those parameters are significantly lower than that ...



Ok - so with the settings at their defaults geqo chooses a semi-random 
plan, and at least one of those (for the production variant of this 
query anyway) eats massive (>100G) amounts of temp space - not really a 
suitable outcome either.


I guess you have answered my first question - i.e. yes, this should eat a 
massive amount of RAM as written - however, are you sure there is no 
memory leaking going on here?


regards

Mark



Re: [BUGS] Massive memory use for star query

2011-04-16 Thread Mark Kirkwood

On 16/04/11 01:59, Kevin Grittner wrote:

Mark Kirkwood  wrote:


Here's a simplified example using synthetic data (see attached to
generate if desired):


Doesn't work for me:

kgrittn@kgrittn-desktop:~/work/starjoin$ ./gendata.pl
generate cat
cannot open cat.dat: No such file or directory at ./gendata.pl line
17.


Apologies Kevin, I stuffed up the edit to supposedly make it easier for 
you all to choose your own place to write the files (left one hard coded 
for the table 'cat').


Cheers

Mark



[BUGS] Massive memory use for star query

2011-04-14 Thread Mark Kirkwood
I've recently seen examples of star-like queries using vast amounts of 
memory in one of our production systems. Here's a simplified example 
using synthetic data (see attached to generate if desired):


SET geqo_threshold = 14;
SET from_collapse_limit = 14;
SET join_collapse_limit = 14;

EXPLAIN
SELECT
1
FROM node n
JOIN nodekeyword kwn0 ON (n.nodeid = kwn0.nodeid)
JOIN keyword kw0 ON (kwn0.keywordid = kw0.keywordid)
JOIN nodekeyword kwn1 ON (n.nodeid = kwn1.nodeid)
JOIN keyword kw1 ON (kwn1.keywordid = kw1.keywordid)
JOIN nodekeyword kwn2 ON (n.nodeid = kwn2.nodeid)
JOIN keyword kw2 ON (kwn2.keywordid = kw2.keywordid)
JOIN nodekeyword kwn3 ON (n.nodeid = kwn3.nodeid)
JOIN keyword kw3 ON (kwn3.keywordid = kw3.keywordid)
JOIN nodekeyword kwn4 ON (n.nodeid = kwn4.nodeid)
JOIN keyword kw4 ON (kwn4.keywordid = kw4.keywordid)
JOIN nodekeyword kwn5 ON (n.nodeid = kwn5.nodeid)
JOIN keyword kw5 ON (kwn5.keywordid = kw5.keywordid)
WHERE   kw0.keyword = 'sscghryv'
AND   kw1.keyword = 'sscghryv'
AND   kw2.keyword = 'sscghryv'
AND   kw3.keyword = 'sscghryv'
AND   kw4.keyword = 'sscghryv'
AND   kw5.keyword = 'sscghryv'
;

Here's what a ps listing looks like:

  VSZ     RSS      SZ     CMD
1849524 1793680 1791144 postgres: postgres test [local] EXPLAIN

So we are using 1.7G doing an *EXPLAIN* - so presumably this is gonna be 
the join search planning getting expensive for 13 tables. Is it expected 
that this much memory could/would be used? Could this be evidence of a 
leak?


Note this is a default 9.1 (2011-04-07) build w/o asserts, with a 
default postgresql.conf.


Clearly this particular query is a bit dumb - making the keyword 
predicates have different values results in much better behaved planning 
memory usage... and also allowing geqo to do the join search for us 
prevents the high memory use (however geqo has its own problems: in 
the production variant of this query, *one* of the plans it would pick 
liked to use >100G of temp space to execute... and there are only 100G 
available... sigh). However, for these semi ad-hoc systems it is hard to 
prevent dumb queries altogether!
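
For such semi ad-hoc systems, one blunt per-session safeguard is to cap runtime and temp usage. Note these values are illustrative, and temp_file_limit only exists from 9.2 on, so it would not have helped the 9.1 build discussed here:

```sql
SET statement_timeout = '5min';   -- cancel runaway queries
SET temp_file_limit = '10GB';     -- 9.2+: cap temp file usage per session
```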


regards

Mark




starjoin.tar.gz
Description: GNU Zip compressed data



Re: [BUGS] Hung Vacuum in 8.3

2011-03-07 Thread Mark Kirkwood

On 08/03/11 13:03, Mark Kirkwood wrote:

On 08/03/11 12:55, Mark Kirkwood wrote:

On 23/02/11 10:18, Mark Kirkwood wrote:

On 23/02/11 00:26, Greg Stark wrote:


It's also possible there's a bug of course. If someone was using that
buffer and somehow failed to notify the vacuum that they were done it
would wait for a very long time (forever?). However if vacuum
eventually continued when the query was canceled then it seems likely
it was working as intended.



Greg, thanks for clarifying this.

Unfortunately this time around I canceled the vacuum and then the 
query. However *next* time I'll get rid of the query 1st and see 
what happens.




I have another one here:

postgres=# select datname,procpid, query_start, current_query
  from pg_stat_activity
  where now() - query_start > '2 hours'::interval 
order by query_start;

 datname | procpid |          query_start          |  current_query
---------+---------+-------------------------------+-----------------
 stuff   |   23957 | 2011-03-08 06:02:02.659159+13 | VACUUM ANALYZE;

postgres=# select datname,procpid, query_start, current_query
  from pg_stat_activity
  where now() - backend_start > '2 hours'::interval 
order by query_start;

 datname | procpid |          query_start          |  current_query
---------+---------+-------------------------------+-----------------
 stuff   |   23957 | 2011-03-08 06:02:02.659159+13 | VACUUM ANALYZE;
   :
 stuff   |   30018 | 2011-03-08 11:57:03.214177+13 | 
 stuff   |   11390 | 2011-03-08 12:27:20.390122+13 | 
 stuff   |   10178 | 2011-03-08 12:28:59.328265+13 | 
 stuff   |5100 | 2011-03-08 12:30:10.318204+13 | 
 stuff   |   10255 | 2011-03-08 12:32:21.520814+13 | 
 stuff   |   29993 | 2011-03-08 12:32:26.562835+13 | 
 stuff   |   10204 | 2011-03-08 12:40:57.059958+13 | 
 stuff   |   10206 | 2011-03-08 12:41:11.708924+13 | 
(9 rows)


It looks to me like whatever query buffer pins the vacuum *was* 
waiting on should be well and truly over by now. The strace for the 
vacuum looks the same as before:


$ strace -p 23957
Process 23957 attached - interrupt to quit
semop(39747613, 0x7fff53d3dec0, 1


It certainly looks like the vacuum is not able to resume after 
setting itself to sleep for some reason.






Immediately after I sent this, I saw the vacuum unlock itself and 
continue... so there *are* some queries still blocking it. I'm seeing 
some highish (1 -2 minute) lock waits for a certain table:


postgres=# select datname,procpid, query_start, current_query
   from pg_stat_activity where now() - query_start > '1 minute'
   and procpid in (select distinct pid from pg_locks where not 
granted) order by query_start;

 datname | procpid |          query_start          |        current_query
---------+---------+-------------------------------+-------------------------------
 stuff   |     390 | 2011-03-08 12:57:01.752234+13 | UPDATE content.twitter_queue
                                                   :   SET error = $1
                                                   :   WHERE twitter_queue_id = $2

(30 or so)

which are probably not helping.




Make that definitely not helping:

stuff=# select relname, last_vacuum
from pg_stat_all_tables
where last_vacuum is not null
and last_vacuum>  '2011-03-08 06:03:06'::timestamp
order by last_vacuum limit 3;
    relname     |          last_vacuum
----------------+-------------------------------
 headline_alert | 2011-03-08 06:03:06.182866+13
 stock_static   | 2011-03-08 06:03:06.610739+13
 twitter_queue  | 2011-03-08 12:58:03.02451+13
(3 rows)



So we were trying to vacuum 'twitter_queue' (a small 20MB table) for 
approx 6 hrs.


Note that the vacuum itself was not waiting on any locks according to 
pg_locks, so it looks to be the pinning issue. Clearly we need to amend the 
application design that results in this quite long (1-2 minute)! wait 
for the lock on this table.


regards

Mark

P.S: for the interested, the process(es) that the above UPDATEs are 
waiting for are sitting '<IDLE> in transaction' - i.e. the application end is 
doing some *non database* blocking call whilst holding a lock on the 
'twitter_queue' table :-/
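
Such sessions are easy to list; on an 8.3-era pg_stat_activity they show up with a literal '<IDLE> in transaction' query text (a sketch only):

```sql
-- Sessions holding an open transaction while idle at the client end.
SELECT procpid, xact_start, now() - xact_start AS idle_for
FROM pg_stat_activity
WHERE current_query = '<IDLE> in transaction'
ORDER BY xact_start;
```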




Re: [BUGS] Hung Vacuum in 8.3

2011-03-07 Thread Mark Kirkwood

On 08/03/11 12:55, Mark Kirkwood wrote:

On 23/02/11 10:18, Mark Kirkwood wrote:

On 23/02/11 00:26, Greg Stark wrote:


It's also possible there's a bug of course. If someone was using that
buffer and somehow failed to notify the vacuum that they were done it
would wait for a very long time (forever?). However if vacuum
eventually continued when the query was canceled then it seems likely
it was working as intended.



Greg, thanks for clarifying this.

Unfortunately this time around I canceled the vacuum and then the 
query. However *next* time I'll get rid of the query 1st and see what 
happens.




I have another one here:

postgres=# select datname,procpid, query_start, current_query
  from pg_stat_activity
  where now() - query_start > '2 hours'::interval 
order by query_start;

 datname | procpid |          query_start          |  current_query
---------+---------+-------------------------------+-----------------
 stuff   |   23957 | 2011-03-08 06:02:02.659159+13 | VACUUM ANALYZE;

postgres=# select datname,procpid, query_start, current_query
  from pg_stat_activity
  where now() - backend_start > '2 hours'::interval 
order by query_start;

 datname | procpid |          query_start          |  current_query
---------+---------+-------------------------------+-----------------
 stuff   |   23957 | 2011-03-08 06:02:02.659159+13 | VACUUM ANALYZE;
   :
 stuff   |   30018 | 2011-03-08 11:57:03.214177+13 | 
 stuff   |   11390 | 2011-03-08 12:27:20.390122+13 | 
 stuff   |   10178 | 2011-03-08 12:28:59.328265+13 | 
 stuff   |5100 | 2011-03-08 12:30:10.318204+13 | 
 stuff   |   10255 | 2011-03-08 12:32:21.520814+13 | 
 stuff   |   29993 | 2011-03-08 12:32:26.562835+13 | 
 stuff   |   10204 | 2011-03-08 12:40:57.059958+13 | 
 stuff   |   10206 | 2011-03-08 12:41:11.708924+13 | 
(9 rows)


It looks to me like whatever query buffer pins the vacuum *was* 
waiting on should be well and truly over by now. The strace for the 
vacuum looks the same as before:


$ strace -p 23957
Process 23957 attached - interrupt to quit
semop(39747613, 0x7fff53d3dec0, 1


It certainly looks like the vacuum is not able to resume after setting 
itself to sleep for some reason.






Immediately after I sent this, I saw the vacuum unlock itself and 
continue... so there *are* some queries still blocking it. I'm seeing 
some highish (1 -2 minute) lock waits for a certain table:


postgres=# select datname,procpid, query_start, current_query
   from pg_stat_activity where now() - query_start > '1 minute'
   and procpid in (select distinct pid from pg_locks where not 
granted) order by query_start;

 datname | procpid |          query_start          |        current_query
---------+---------+-------------------------------+-------------------------------
 stuff   |     390 | 2011-03-08 12:57:01.752234+13 | UPDATE content.twitter_queue
                                                   :   SET error = $1
                                                   :   WHERE twitter_queue_id = $2

(30 or so)

which are probably not helping.

So false alarm, no bug here it would seem, sorry.

Mark









Re: [BUGS] Hung Vacuum in 8.3

2011-03-07 Thread Mark Kirkwood

On 23/02/11 10:18, Mark Kirkwood wrote:

On 23/02/11 00:26, Greg Stark wrote:


It's also possible there's a bug of course. If someone was using that
buffer and somehow failed to notify the vacuum that they were done it
would wait for a very long time (forever?). However if vacuum
eventually continued when the query was canceled then it seems likely
it was working as intended.



Greg, thanks for clarifying this.

Unfortunately this time around I canceled the vacuum and then the 
query. However *next* time I'll get rid of the query 1st and see what 
happens.




I have another one here:

postgres=# select datname,procpid, query_start, current_query
  from pg_stat_activity
  where now() - query_start > '2 hours'::interval order 
by query_start;

 datname | procpid |          query_start          |  current_query
---------+---------+-------------------------------+-----------------
 stuff   |   23957 | 2011-03-08 06:02:02.659159+13 | VACUUM ANALYZE;

postgres=# select datname,procpid, query_start, current_query
  from pg_stat_activity
  where now() - backend_start > '2 hours'::interval 
order by query_start;

 datname | procpid |          query_start          |  current_query
---------+---------+-------------------------------+-----------------
 stuff   |   23957 | 2011-03-08 06:02:02.659159+13 | VACUUM ANALYZE;
   :
 stuff   |   30018 | 2011-03-08 11:57:03.214177+13 | 
 stuff   |   11390 | 2011-03-08 12:27:20.390122+13 | 
 stuff   |   10178 | 2011-03-08 12:28:59.328265+13 | 
 stuff   |5100 | 2011-03-08 12:30:10.318204+13 | 
 stuff   |   10255 | 2011-03-08 12:32:21.520814+13 | 
 stuff   |   29993 | 2011-03-08 12:32:26.562835+13 | 
 stuff   |   10204 | 2011-03-08 12:40:57.059958+13 | 
 stuff   |   10206 | 2011-03-08 12:41:11.708924+13 | 
(9 rows)


It looks to me like whatever query buffer pins the vacuum *was* waiting 
on should be well and truly over by now. The strace for the vacuum looks 
the same as before:


$ strace -p 23957
Process 23957 attached - interrupt to quit
semop(39747613, 0x7fff53d3dec0, 1


It certainly looks like the vacuum is not able to resume after setting 
itself to sleep for some reason.






Re: [BUGS] BUG #5851: ROHS (read only hot standby) needs to be restarted manually in somecases.

2011-03-03 Thread mark


> -Original Message-
> From: Robert Haas [mailto:robertmh...@gmail.com]
> Sent: Thursday, March 03, 2011 9:04 AM
> To: mark
> Cc: Fujii Masao; pgsql-bugs@postgresql.org
> Subject: Re: [BUGS] BUG #5851: ROHS (read only hot standby) needs to be
> restarted manually in somecases.
> 
> On Tue, Feb 8, 2011 at 7:23 PM, mark  wrote:
> > (~two weeks and it dies)
> > keepalives_idle=30
> > keepalives_interval=30
> > keepalives_count=30
> 
> Maybe something like this:
> 
> keepalives_idle=60
> keepalives_interval=5
> keepalives_count=10
> 
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company


Thank you !


Things have been more stable with regards to our WAN links. I am sure that
won't last for very long and expect to find out sooner than later if these
changes mitigate the issue I had been seeing. 


Again, thank you

~Mark




Re: [BUGS] Hung Vacuum in 8.3

2011-02-22 Thread Mark Kirkwood

On 23/02/11 03:27, Robert Haas wrote:

On Tue, Feb 22, 2011 at 6:26 AM, Greg Stark  wrote:

Actually it's not waiting for the LockBuffer LWLock. it's waiting
until your query unpins the buffer it wants. Vacuum tries to get an
exclusive lock on the buffer, if it gets it then it checks if anyone
is using that buffer. If someone is then it unlocks the buffer and
waits until nobody has it pinned.

How bad it would be if we made LockBufferForCleanup() not wait?  If we
can't obtain the buffer cleanup lock immediately, we just skip that
page and continue on.  That would prevent us from updating
relfrozenxid, I guess, but we already can't do that if there are any
bits set in the visibility map.   It could also leave some bloat in
the table, but probably not much (he says hopefully).



Seems like a good suggestion, and may leave less bloat than having the 
vacuum hung for potentially quite some time.


Mark



Re: [BUGS] Hung Vacuum in 8.3

2011-02-22 Thread Mark Kirkwood

On 23/02/11 00:26, Greg Stark wrote:


It's also possible there's a bug of course. If someone was using that
buffer and somehow failed to notify the vacuum that they were done it
would wait for a very long time (forever?). However if vacuum
eventually continued when the query was canceled then it seems likely
it was working as intended.



Greg, thanks for clarifying this.

Unfortunately this time around I canceled the vacuum and then the query. 
However *next* time I'll get rid of the query 1st and see what happens.


Cheers

Mark




Re: [BUGS] Hung Vacuum in 8.3

2011-02-21 Thread Mark Kirkwood

On 22/02/11 19:47, Heikki Linnakangas wrote:


A long query on the same table can block vacuum. Vacuum needs to take 
a so-called "cleanup lock" on each page, which means that it has to 
wait until no other backend holds a pin on the page. A long-running 
query can keep a page pinned for a long time.




Ah - this is LockBufferForCleanup? So we are waiting for an exclusive 
LockBuffer operation, i.e. a LWLock, not a (heavyweight) lock, urg... no 
wonder.


To mitigate this I'm encouraging the use of a cleanup script for long 
running queries - since the app concerned is a web site, there is no 
call for queries that run this long (i.e. way longer than the timeout for 
the respective page rendering).


Thanks for the clarification (assuming I've understood correctly of 
course...).
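
A minimal sketch of such a cleanup script, assuming the 8.3 column names used elsewhere in this thread and an arbitrary 5-minute cutoff:

```sql
-- Cancel any non-idle query that has been running longer than 5 minutes.
SELECT procpid, pg_cancel_backend(procpid)
FROM pg_stat_activity
WHERE current_query NOT LIKE '<IDLE>%'
  AND now() - query_start > interval '5 minutes';
```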


Cheers

Mark



[BUGS] Hung Vacuum in 8.3

2011-02-21 Thread Mark Kirkwood

This is 8.3.14 on Debian Lenny x86-64.

I'm seeing a hung vacuum:

postgres=# select procpid, query_start, waiting, current_query from 
pg_stat_activity where current_query like '%VACUUM%';

 procpid |          query_start          | waiting |  current_query
---------+-------------------------------+---------+-----------------
    7347 | 2011-02-22 06:02:02.400247+13 | f       | VACUUM ANALYZE;

at approx 1300 (so it has been sitting there for approx 7 hours, normal 
database vacuum time is 10 minutes). Now according to pg_stat_activity 
and pg_locks it is *not* waiting for a lock, but no vacuuming appears to 
be going on. strace says:


$ strace -p 7347

Process 7347 attached - interrupt to quit
semop(33456157, 0x7512bad0, 1

Ok, so we are waiting on a semaphore - hmm, why is it not showing up as 
waiting on a lock of some kind?


Of interest is this:

postgres=# select procpid, query_start, current_query from 
pg_stat_activity order by query_start limit 1;
 procpid |          query_start          |   current_query
---------+-------------------------------+--------------------------
   25953 | 2011-02-22 04:24:07.417138+13 | SELECT n.node_id, n.node_
-- long query, joining several large tables - text snipped


So this guy had been running from before the vacuum started, so probably 
vacuum is wanting to do lazy_truncate_heap() on one of the tables in the 
join (no gdb on this box unfortunately). I am however still puzzled 
about why no locks are being waited on.


I have canceled the vacuum, but any suggestions for getting more diag 
info for next time?
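
One option, though only on much newer releases (pg_buffercache gained a pinning_backends column in 9.4, so this would not work on the 8.3 system here), is to look for pinned buffers directly:

```sql
CREATE EXTENSION IF NOT EXISTS pg_buffercache;

-- Which relations/blocks currently have their buffers pinned, and by
-- how many backends (9.4+ only).
SELECT c.relname, b.relblocknumber, b.pinning_backends
FROM pg_buffercache b
JOIN pg_class c ON b.relfilenode = pg_relation_filenode(c.oid)
WHERE b.pinning_backends > 0;
```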


regards

Mark




Re: [BUGS] BUG #5862: Postgres dumps core upon a connection attempt

2011-02-08 Thread Mark Kirkwood

On 04/02/11 15:11, Craig Ringer wrote:

On 02/03/2011 11:15 PM, Matt Zinicola wrote:


I re-compiled with '--enable-debug' and got the symbols.  The 
pastebin is at

http://pastebin.com/xMhEHFdT


That's really interesting. It's getting a NULL path pointer when - I 
think - it tries to determine the location of the executables.


Presumably this is something bizarre in your environment - but I have 
no idea what it might be. Maybe someone else reading will have an idea.




(Coming in too late, but...)

I'd be interested to see what happens if you do:

$ export PATH=/usr/local/pgsql/bin:$PATH
$ export LD_LIBRARY_PATH=/usr/local/pgsql/lib
$ initdb -D /data/postgres
$ pg_ctl -D /data/postgres start;
$ psql

I'm guessing that there are older libraries or binaries earlier in your 
various env paths, and these are tripping up postgres.


Cheers

Mark





Re: [BUGS] BUG #5851: ROHS (read only hot standby) needs to be restarted manually in somecases.

2011-02-08 Thread mark

> -Original Message-
> From: Fujii Masao [mailto:masao.fu...@gmail.com]
> Sent: Tuesday, February 08, 2011 4:00 PM
> To: mark
> Cc: Robert Haas; pgsql-bugs@postgresql.org
> Subject: Re: [BUGS] BUG #5851: ROHS (read only hot standby) needs to be
> restarted manually in somecases.
> 
> On Wed, Feb 9, 2011 at 6:36 AM, mark  wrote:
> > this is the recovery.conf file, see any problems with it? maybe I
> > didn't do some syntax right?
> >
> > [postgres@ data9.0]$ cat recovery.conf
> > standby_mode = 'on'
> > primary_conninfo = 'host= port=5432 user=postgres
> > keepalives_idle=30 keepalives_interval=30 keepalives_count=30'
> 
> This setting would lead TCP keepalive to take about 930 seconds
> (= 30 + 30 * 30) to detect the network outage. If you want to stop
> replication as soon as the outage happens, you need to decrease
> the keepalive setting values.

What numbers would you suggest? I have been guessing and probably doing a
very poor job of it. 

I am turning knobs and not getting any meaningful changes with respect to
my problem. So either I am not turning them correctly, or they are not the
right knobs for my problem.

Trying to fix my own ignorance here. (should I move this off the bugs list,
since maybe it's not a bug?)

The settings have been unspecified in the recovery file,  it's been
specified in the recovery file, and I have tried the following in the
recovery file:

(~two weeks and it died)
keepalives_idle=0
keepalives_interval=0
keepalives_count=0

(~two weeks and it dies)
keepalives_idle=30
keepalives_interval=30
keepalives_count=30

(this didn't work either, don't recall how long this lasted, maybe a month)

keepalives_idle=2100
keepalives_interval=0
keepalives_count=0






Background is basically this: trying to do streaming replication over a WAN,
probably ship about 5GB of changes per day, hardware on both ends can easily
keep up with that. Running over a shared metro line and have about 3-5MBytes
per second depending on the time of day that I can count on. I have
wal_keep_segments at 250 (I don't care about the disk overhead for this, since I
wanted to not have to use wal archiving). The link is being severed more
often than usually lately while some network changes are being made so while
I would expect that improve in the future this isn't exactly the most
reliable connection. so getting whatever as right as I can is of value to
me. 

Typically I see the streaming replication break down for good completely a
few hours after something that causes an interruption in networking. Nagios
notifications lag some but not hours, and have to go through a few people
before I find out about it. When checking the nagios pages on their logs I
don't see pages about the distance between the master and the standby
getting bigger during this time, and then once I see the first unexpected
EOF then the distance between the master and standby gets further and
further until it gets fixed or we have to re-sync the whole base over. 


Again I can't seem to duplicate this problem on demand with virtual
machines, I startup a master and standby, setup streaming rep, kickoff a
multi hour or day pg bench run and start messing with networking. Every time
I try and duplicate this synthetically the standby picks right back up where
it left off and catches back up.

I am at a loss so I do appreciate everyone's help. 

Thanks in advance

-Mark

> 
> Regards,
> 
> --
> Fujii Masao
> NIPPON TELEGRAPH AND TELEPHONE CORPORATION
> NTT Open Source Software Center




Re: [BUGS] BUG #5851: ROHS (read only hot standby) needs to be restarted manually in somecases.

2011-02-08 Thread mark
On Sun, Jan 30, 2011 at 12:45 PM, mark  wrote:
>
>
>> -Original Message-
>> From: Robert Haas [mailto:robertmh...@gmail.com]
>> Sent: Sunday, January 30, 2011 12:19 PM
>> To: mark
>> Cc: pgsql-bugs@postgresql.org
>> Subject: Re: [BUGS] BUG #5851: ROHS (read only hot standby) needs to be
>> restarted manually in somecases.
>>
>> On Fri, Jan 28, 2011 at 1:03 PM, mark  wrote:
>> > When showing the setting on the slave or master all tcp_keepalive
>> settings
>> > (idle, interval and count) are showing 0;
>> >
>> > The config file shows interval and count commented out, but idle in
>> the
>> > config file is set to 2100.
>> >
>> > Possible that "show tcp_keepalive_idle;" isn't reporting accurately ?
>> (or a
>> > value that high isn't being accepted?)
>> >
>> > I have reloaded configs and still seeing 0's
>> >
>> >
>> >
>> > I assume you would suggest I turn that number down... a lot.
>>
>> Yeah, the defaults are way too long for our purposes.  The way to get
>> this set correctly, I think, is to set it in the primary_conninfo
>> stream on the slave.  You end up with something like this:
>>
>> primary_conninfo='host=blahblah user=bob keepalives_idle=XX
>> keepalives_interval=XX keepalives_count=XX'
>>
> Thanks I will try this on Monday and will report back if it fixes the
> problem. (however since I can't reproduce the issue on demand it might be a
> waiting game. Might not know for a month or so tho)
>
> -Mark
>
>
>> I'm of the opinion that we really need an application-level keepalive
>> here, but the above is certainly a lot better than nothing.

my streaming replication woes continue.


I made those changes in the recovery.conf file but I am still having
streaming replication stay broken after any sort of network
interruption until someone manually comes along and fixes things by
restarting the standby or, if it's been too long, resynchronizing the
base.

I think it's a network interruption that is triggering the break down,
but I don't have anything to prove it.

wal_keep_segments is set to 250, which was supposed to give us a few
hours to fix the issue, but it seems we blew through that many last
night, and as such when someone got around to fixing it the standby was
too far behind.


my #1 problem with this right now is I can't seem to reproduce on
demand with virtual machines in our development area.

this is the recovery.conf file, see any problems with it? maybe I
didn't do some syntax right?

[postgres@ data9.0]$ cat recovery.conf
standby_mode = 'on'
primary_conninfo = 'host= port=5432 user=postgres
keepalives_idle=30 keepalives_interval=30 keepalives_count=30'



thanks
..: Mark

p.s. looking forward to 9.1 where a standby can be started with
streaming from scratch. that sounds nice.

>>
>> --
>> Robert Haas
>> EnterpriseDB: http://www.enterprisedb.com
>> The Enterprise PostgreSQL Company
>
>



Re: [BUGS] BUG #5851: ROHS (read only hot standby) needs to be restarted manually in somecases.

2011-01-30 Thread mark


> -Original Message-
> From: Robert Haas [mailto:robertmh...@gmail.com]
> Sent: Sunday, January 30, 2011 12:19 PM
> To: mark
> Cc: pgsql-bugs@postgresql.org
> Subject: Re: [BUGS] BUG #5851: ROHS (read only hot standby) needs to be
> restarted manually in somecases.
> 
> On Fri, Jan 28, 2011 at 1:03 PM, mark  wrote:
> > When showing the setting on the slave or master all tcp_keepalive
> settings
> > (idle, interval and count) are showing 0;
> >
> > The config file shows interval and count commented out, but idle in
> the
> > config file is set to 2100.
> >
> > Possible that "show tcp_keepalive_idle;" isn't reporting accurately ?
> (or a
> > value that high isn't being accepted?)
> >
> > I have reloaded configs and still seeing 0's
> >
> >
> >
> > I assume you would suggest I turn that number down... a lot.
> 
> Yeah, the defaults are way too long for our purposes.  The way to get
> this set correctly, I think, is to set it in the primary_conninfo
> stream on the slave.  You end up with something like this:
> 
> primary_conninfo='host=blahblah user=bob keepalives_idle=XX
> keepalives_interval=XX keepalives_count=XX'
> 
Thanks I will try this on Monday and will report back if it fixes the
problem. (however since I can't reproduce the issue on demand it might be a
waiting game. Might not know for a month or so tho)

-Mark


> I'm of the opinion that we really need an application-level keepalive
> here, but the above is certainly a lot better than nothing.
> 
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company




Re: [BUGS] BUG #5851: ROHS (read only hot standby) needs to be restarted manually in somecases.

2011-01-28 Thread mark
When showing the setting on the slave or master all tcp_keepalive settings
(idle, interval and count) are showing 0;

The config file shows interval and count commented out, but idle in the
config file is set to 2100. 

Possible that "show tcp_keepalive_idle;" isn't reporting accurately ? (or a
value that high isn't being accepted?)

I have reloaded configs and still seeing 0's



I assume you would suggest I turn that number down... a lot. 
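(An archival note, not part of the original message: for the tcp_keepalives_* GUCs a reported value of 0 normally means "use the operating system default", and the settings only apply to TCP connections -- over a Unix-domain socket SHOW reports 0 as well. A quick session-level sanity check, using the value from this report:)

```sql
SHOW tcp_keepalives_idle;        -- 0 = system default (or a non-TCP connection)
SET tcp_keepalives_idle = 2100;  -- the value from the config file above
SHOW tcp_keepalives_idle;        -- on a TCP connection this should now report 2100
```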



..: Mark




> -Original Message-
> From: Robert Haas [mailto:robertmh...@gmail.com]
> Sent: Friday, January 28, 2011 6:48 AM
> To: Mark
> Cc: pgsql-bugs@postgresql.org
> Subject: Re: [BUGS] BUG #5851: ROHS (read only hot standby) needs to be
> restarted manually in somecases.
> 
> On Wed, Jan 26, 2011 at 8:24 PM, Mark  wrote:
> > getting a break down in streaming rep. my current work around is to
> restart
> > the PG instance on the ROHS. doesn't seem to affect the master any.
> doesn't
> > require a re-rsync of the base to get replication going again. has
> happened
> > with 9.0.2 twice now in a month.
> >
> >
> >
> > 2011-01-26 08:35:42 MST :: (postgres@10.80.2.89) LOG:  could not
> receive
> > data
> > from client: Connection reset by peer
> > 2011-01-26 08:35:42 MST :: (postgres@10.80.2.89) LOG:  unexpected EOF
> on
> > standby connection
> >
> > this was all I have in the master's log with the level set to debug
> 1, I
> > have reset it to debug 5 and will just wait till it dies again and
> hopefully
> > get a better idea of what is going on. nothing is being logged to the
> > standby.
> 
> Maybe a break in network connectivity is leading the master to think
> that the slave is dead, while the slave still thinks it's connected.
> You might need to adjust the TCP keepalive parameters the slave uses
> to connect to the master.
> 
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company




[BUGS] BUG #5851: ROHS (read only hot standby) needs to be restarted manually in somecases.

2011-01-26 Thread Mark

The following bug has been logged online:

Bug reference:  5851
Logged by:  Mark
Email address:  dvlh...@gmail.com
PostgreSQL version: 9.0.2 x86_64
Operating system:   CentOS release 5.5 (Final) |  2.6.18-194.17.1.el5 #1 SMP
X86_64
Description:ROHS (read only hot standby) needs to be restarted
manually in somecases.
Details: 

Getting a breakdown in streaming replication. My current workaround is to
restart the PG instance on the ROHS. It doesn't seem to affect the master
any, and doesn't require a re-rsync of the base to get replication going
again. This has happened with 9.0.2 twice now in a month.



2011-01-26 08:35:42 MST :: (postgres@10.80.2.89) LOG:  could not receive
data
from client: Connection reset by peer
2011-01-26 08:35:42 MST :: (postgres@10.80.2.89) LOG:  unexpected EOF on
standby connection

This was all I have in the master's log with the level set to debug1; I
have reset it to debug5 and will just wait till it dies again and hopefully
get a better idea of what is going on. Nothing is being logged on the
standby. I can't find anything else to capture that shows this breakdown in
streaming replication that won't start back up.

This is a somewhat *long* distance replication setup over a 100mbit metro
line. We have had routing issues in the past and seen replication fall
behind, but once connectivity is restored we see it catch up without a
restart of the standby.

It probably only ships a few gigs of changes a day.

These are production machines, so I can't do too much playing around to
try and induce "issues".


 PostgreSQL 9.0.2 on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC)
4.1.2 20080704 (Red Hat 4.1.2-48), 64-bit
(1 row)




Is this a known issue? I didn't see anything that looked like this when I
gave the mailing list archive a quick glance/search.

Is there somewhere else I should be looking for more details on why this
is happening?


I can post the configs if you all want them, but nothing special is
happening with regard to them.

thank you,

Mark



[BUGS] BUG #5734: autovacuum_enabled input should be validated, standardized.

2010-10-30 Thread Mark Stosberg

The following bug has been logged online:

Bug reference:  5734
Logged by:  Mark Stosberg
Email address:  m...@summersault.com
PostgreSQL version: 9.0.1
Operating system:   FreeBSD
Description:autovacuum_enabled input should be validated,
standardized.
Details: 

The "autovacuum_enabled" storage parameter claims to be a boolean type:
http://www.postgresql.org/docs/9.0/static/sql-createtable.html#SQL-CREATETAB
LE-STORAGE-PARAMETERS
... but it fails to behave like a normal boolean.

Normally, you could set a boolean with a value of false, 'off' or 'f',
but you would always get back a value of 'f'. With this storage parameter,
there is no such translation.

I think this kind of boolean should be handled like a standard
PostgreSQL boolean.

I noticed because the Slony code base has a hardcoded check for
"autovacuum_enabled=off", when a false value could also be stored as
autovacuum_enabled=f

We should be able to rely on this value always being returned as
"autovacuum_enabled='f'", just as a normal boolean would be.
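A minimal sketch of how to observe the stored, un-normalized form (table name is illustrative, and the output comment reflects the behaviour described in this report rather than a tested result):

```sql
CREATE TABLE demo (id int) WITH (autovacuum_enabled = false);
-- The reloption keeps whatever spelling was supplied, e.g.
-- {autovacuum_enabled=false}, rather than normalizing to 'f' or 'off':
SELECT reloptions FROM pg_class WHERE relname = 'demo';
```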

Mark



Re: [BUGS] Btree index left link changed unexpectedly after bringing up 8.3.11 warm standby

2010-10-28 Thread Mark Kirkwood

On 29/10/10 10:27, Tom Lane wrote:


Were there similar warnings on the master?  Uninitialized-page warnings
are expected in certain error-recovery scenarios, but I'd be a little
worried if the slave appeared to be out of sync with the master.


   


I don't see any in the logs - however, the relevant table may not have
been vacuumed recently enough to spit out the warning. I do see lots of these:


2010-09-30 17:31:20 NZDT ERROR:  could not open relation with OID 1836671
2010-09-30 17:31:20 NZDT ERROR:  could not open relation with OID 1836671

and also the following has recently started appearing on the console:

kernel: [ 7341.689322] ECC/ChipKill ECC error.

So... lots of brokenness to examine here (we have promoted one of our
slaves to be the new master). I'll see if we still have the old master db
around; it would be interesting to see what happens if we start it up and
try a VACUUM - however, the DBAs may have set the box up as a slave again
before we noticed the memory errors (so possibly deleted the old master).


Cheers

Mark





Re: [BUGS] Btree index left link changed unexpectedly after bringing up 8.3.11 warm standby

2010-10-28 Thread Mark Kirkwood

On 29/10/10 04:32, Alvaro Herrera wrote:

Excerpts from Mark Kirkwood's message of jue oct 28 02:20:56 -0300 2010:
   

I'm guessing the index error is due to the uninitialized table pages
(the index "content_node_node_type_id_inserted_idx" is on the "node"
table).
 

Not necessarily ... You still have the index in that state, right?
We could try some diagnostics on it.

   



Doing some more digging - there was an out of memory incident on the 
master the previous day, so I guess both of these observations could 
well be caused by leftover partially completed operations.


Mark



[BUGS] Btree index left link changed unexpectedly after bringing up 8.3.11 warm standby

2010-10-27 Thread Mark Kirkwood
I'm seeing this on a Pitrtools managed warm standby box that we 
periodically bring the db fully up on in order to test if the standby is 
good.


After the standby is up, then a db wide VACUUM produces:

2010-10-28 17:20:51 NZDT WARNING:  relation "node" page 248500 is 
uninitialized --- fixing
2010-10-28 17:20:51 NZDT WARNING:  relation "node" page 248502 is 
uninitialized --- fixing
2010-10-28 17:20:51 NZDT WARNING:  relation "node" page 248504 is 
uninitialized --- fixing
2010-10-28 17:20:54 NZDT ERROR:  left link changed unexpectedly in block 
21042 of index "content_node_node_type_id_inserted_idx"


I'm guessing the index error is due to the uninitialized table pages 
(the index "content_node_node_type_id_inserted_idx" is on the "node" 
table). Are the uninitialized pages cause for concern, or merely an 
artifact of log shipping?


This is 8.3.11 on Debian Lenny x86-64.

Thanks

Mark



[BUGS] src/tools/fsync/test_fsync.c does not compile

2010-09-30 Thread Mark Kirkwood
The discussion on -performance about disk caching reminded me that the 
useful fsync test utility does not seem to compile (git master on Ubuntu 
10.04):


$ make
gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith 
-Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing 
-fwrapv -g -I../../../src/interfaces/libpq -I../../../src/include 
-D_GNU_SOURCE -c -o test_fsync.o test_fsync.c

In file included from /usr/include/fcntl.h:205,
from ../../../src/include/access/xlogdefs.h:15,
from ../../../src/include/access/xlog.h:15,
from ../../../src/include/access/xlog_internal.h:19,
from test_fsync.c:11:
In function ‘open’,
inlined from ‘main’ at test_fsync.c:66:
/usr/include/bits/fcntl2.h:45: error: call to ‘__open_too_many_args’ 
declared with attribute error: open can be called either with 2 or 3 
arguments, not more

make: *** [test_fsync.o] Error 1

The tiny change (attached) seems to fix it for me.

regards

Mark
diff --git a/src/tools/fsync/test_fsync.c b/src/tools/fsync/test_fsync.c
index 3c9c6b6..28c2119 100644
--- a/src/tools/fsync/test_fsync.c
+++ b/src/tools/fsync/test_fsync.c
@@ -63,7 +63,7 @@ main(int argc, char *argv[])
 	for (i = 0; i < XLOG_SEG_SIZE; i++)
 		full_buf[i] = random();
 
-	if ((tmpfile = open(filename, O_RDWR | O_CREAT, S_IRUSR | S_IWUSR, 0)) == -1)
+	if ((tmpfile = open(filename, O_RDWR | O_CREAT, S_IRUSR | S_IWUSR)) == -1)
 		die("Cannot open output file.");
 	if (write(tmpfile, full_buf, XLOG_SEG_SIZE) != XLOG_SEG_SIZE)
 		die("write failed");



Re: [BUGS] BUG #5650: Postgres service showing as stopped when in fact it is running

2010-09-27 Thread Llewellyn, Mark (ADP DSI)
Hi Ashesh

Yes, this appears to be the issue that the password in the pgpass.conf file for 
the postgres user had not been changed when the windows account password was 
reset.

Problem now resolved.

Many thanks
Mark Llewellyn


From: Ashesh Vashi [mailto:ashesh.va...@enterprisedb.com]
Sent: 17 September 2010 04:51
To: Llewellyn, Mark (ADP DSI)
Cc: pgsql-bugs@postgresql.org; Sujeet Rajguru
Subject: Re: [BUGS] BUG #5650: Postgres service showing as stopped when in fact 
it is running

Hi Mark,

One of my colleagues (Sujeet) has found a way to reproduce the same behaviour.
1. Installed PG 9.0 on Win XP SP3
2. Stop the Postgresql-9.0 service from service manager console
3. Create pgpass.conf in postgres (service account) user's profile with an 
incorrect password deliberately.
(Refer: http://www.postgresql.org/docs/8.4/interactive/libpq-pgpass.html)
4. Now start the postgresql-9.0 service; it will return an error and the
   status shows stopped.
5. However, I could connect via the psql shell and get a prompt, which means
   the server is running.

So, please check whether a pgpass.conf file with a wrong password exists in
the service account (postgres) user's %APPDATA%\postgresql directory.
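For reference, pgpass.conf entries are one colon-separated line each, per the libpq documentation linked above (values illustrative):

```
# hostname:port:database:username:password
localhost:5432:*:postgres:some-wrong-password
```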

--
Thanks & Regards,

Ashesh Vashi
EnterpriseDB INDIA: Enterprise Postgres Company<http://www.enterprisedb.com>

On Thu, Sep 9, 2010 at 9:52 PM, Mark Llewellyn 
mailto:mark_llewel...@adp.com>> wrote:

The following bug has been logged online:

Bug reference:  5650
Logged by:  Mark Llewellyn
Email address:  mark_llewel...@adp.com<mailto:mark_llewel...@adp.com>
PostgreSQL version: 9.0 RC1
Operating system:   Windows XP
Description:Postgres service showing as stopped when in fact it is
running
Details:

The postgresql-9.0 service is showing as *not running* in pgAdmin 3 1.12 RC1
and as *stopped* in the Windows services list, although the database is
actually running and I can connect to it and create databases, tables etc.
It also shows multiple postgresql tasks in Windows task manager.
However, because Windows XP is not aware that the postgresql-9.0 service is
running it is not possible to stop the service.






Re: [BUGS] BUG #5665: VPATH catalog/schemapg.h

2010-09-20 Thread Mark Kirkwood
That is not what is being said (or perhaps I should say not what is 
being meant)! Essentially you need to help us help you. Given that VPATH 
builds seem to work for the rest of us, you need to help us see what 
(possibly unusual) thing(s) you did that have got such a build confused.


The guys here like to fix stuff - but cannot fix your bug unless you 
help by supplying what has been asked for.


Best wishes

Mark

On 21/09/10 08:37, Graham Swallow wrote:

No one else has missing files in the wrong places,
It's not their problem.

All of the files on all of your machines are in the right places.
It's not your problem.

All of the files on my machine are (NOW) in the right places.
It's not my problem.

Everybody's happy ;-)

   





[BUGS] BUG #5650: Postgres service showing as stopped when in fact it is running

2010-09-09 Thread Mark Llewellyn

The following bug has been logged online:

Bug reference:  5650
Logged by:  Mark Llewellyn
Email address:  mark_llewel...@adp.com
PostgreSQL version: 9.0 RC1
Operating system:   Windows XP
Description:Postgres service showing as stopped when in fact it is
running
Details: 

The postgresql-9.0 service is showing as *not running* in pgAdmin 3 1.12 RC1
and as *stopped* in the Windows services list, although the database is
actually running and I can connect to it and create databases, tables etc.
It also shows multiple postgresql tasks in Windows task manager.
However, because Windows XP is not aware that the postgresql-9.0 service is
running it is not possible to stop the service.



Re: [BUGS] BUG #5595: Documentation is not installs from VPATH build.

2010-08-04 Thread Mark Kirkwood

On 04/08/10 16:55, Tom Lane wrote:


You're right, I did.  Perhaps the presence of prebuilt docs in the
source tree confuses something --- anybody wanna test?

   


The files that seem to be causing the confusion are:

/doc/src/sgml/html-stamp
/doc/src/sgm/man-stamp

A src tree 'maintainer-clean' removes them, but any lesser level of 
clean doesn't. Hmm - shouldn't a VPATH build look at its *own* 
doc/src/sgml/*-stamp files to see if it needs to build the docs? Note 
that it does *create* them in there after a successful build...


Cheers

Mark



Re: [BUGS] BUG #5595: Documentation is not installs from VPATH build.

2010-08-03 Thread Mark Kirkwood

On 04/08/10 03:35, Tom Lane wrote:

"Dmtiriy Igrishin"  writes:
   

Description:Documentation is not installs from VPATH build.
 
   

When 'configure' executed in a directory outside the source tree the
documentation is not installs later nevertheless the "gmake install-docs" or
"gmake install-world" (I tried to build with "gmake world" also) typed to
install PostgreSQL.
When 'configure' executed inside the source tree - all is okay.
 

Hmm ... works for me, on a Fedora 13 box.  Maybe a bug in your copy of
gmake?


   


Hmm, I'm on Ubuntu 10.04 - works for me only if I do the following in 
the src tree first:


$ cd pgsql-9.0beta3
$ ./configure
$ make maintainer-clean

... and then do a VPATH configure and build somewhere else. I'm guessing 
that if I used a checked-out src tree instead of a downloaded one then
it would work fine (I suspect Tom used a checked-out tree). Tom?


Cheers

Mark



Re: [BUGS] DETAIL: Failed on request of size 8

2010-07-22 Thread Mark Kirkwood

On 23/07/10 14:34, vamsi krishna wrote:

Hi

I am running a query on postgres and got the following error:

ERROR: out of memory
DETAIL:  Failed on request of size 8


   


Hmm - looks like your system needs more memory to complete the query 
(ahem - would help to see the query, plus EXPLAIN output thereof). To be 
of any more help, you need to supply more detail - see:


http://wiki.postgresql.org/wiki/Guide_to_reporting_problems






Re: [BUGS] Bad optimizer data for xml (WAS: xml data type implications of no =)

2010-06-09 Thread Mark Kirkwood

On 10/06/10 02:17, Tom Lane wrote:

Mark Kirkwood  writes:
   

It seems that the nub of this issue is that there are conceptually two
types of =, one for datatype specific comparison, and one for optimizer
statistical information calculation. However the system allows only the
first, so if you don't (or can't) have one then you lose some possibly
important optimization data.
 

Nonsense.  ANALYZE and the optimizer work with the datatype's usual
notion of '=', whatever it is.

   


Slow down the reading, Tom... and read what I was actually saying - note
the "conceptually". Of course the code uses the datatype's defined "=".



It's possible that we should install a simplified code path in analyze.c
that can collect width data for a column even in the absence of any '='
operator.
   


Yeah I was thinking along the same lines.


Do you have an actual example where such data would have affected a
plan choice?


   


Not at the moment; I was thinking that anywhere that used such datatypes
in a subquery or similar might be a likely case. I guess I was looking at
this as a case of "this is an area where we have less accurate optimizer
data than we could have", and thinking of ways to improve it.


regards

Mark





Re: [BUGS] Bad optimizer data for xml (WAS: xml data type implications of no =)

2010-06-08 Thread Mark Kirkwood

On 09/06/10 17:14, Tom Lane wrote:

Robert Haas  writes:
   

It's possible.  I don't really see a reason not to add an = operator
for XML - does anyone else?
 

Yes, that was considered and rejected, IIRC.  What is your definition
of equality for xml?
   


Yes - but in that previous discussion the optimizer (lack of) 
information was not considered (or known I suspect), so maybe a rethink 
is worthwhile?


It seems that the nub of this issue is that there are conceptually two 
types of =, one for datatype specific comparison, and one for optimizer 
statistical information calculation. However the system allows only the 
first, so if you don't (or can't) have one then you lose some possibly 
important optimization data.


regards

Mark



Re: [BUGS] Bad optimizer data for xml (WAS: xml data type implications of no =)

2010-06-08 Thread Mark Kirkwood

On 09/06/10 15:22, Robert Haas wrote:

On Thu, Jun 3, 2010 at 7:16 PM, Mark Kirkwood
  wrote:
   

Maybe I gave this guy a bad title - is it a concern that the 'width'
estimate is so far off for xml datatypes (because of no = op)? It seemed to
me that this could result in some bad plan choices (e.g in subqueries etc).
 

It's possible.  I don't really see a reason not to add an = operator
for XML - does anyone else?

It would need to be done by updating src/include/catalog/pg_*.h,
rather than via SQL, of course.

   


Heh, sure should - I merely included the SQL stuff in case anyone else 
wanted to reproduce what I was seeing!


Cheers

Mark



Re: [BUGS] Bad optimizer data for xml (WAS: xml data type implications of no =)

2010-06-03 Thread Mark Kirkwood

On 27/05/10 13:37, Mark Kirkwood wrote:

On 25/05/10 16:43, Mark Kirkwood wrote:
Today I ran into some interesting consequences of the xml data type 
being without an "=" operator. One I thought I'd post here because it 
has a *possible* planner impact. I'm not sure it is actually a bug as 
such, but this seemed the best forum to post in initially:


test=# \d bug
      Table "public.bug"
 Column |  Type   | Modifiers
--------+---------+-----------
 id     | integer | 
 val    | xml     | 

test=# explain select val::text from bug;
                          QUERY PLAN
---------------------------------------------------------------
 Seq Scan on bug  (cost=0.00..58127.78 rows=1000278 width=32)


Note the width estimate. However a more realistic estimate for width is:

test=# select 8192/(reltuples/relpages) as width from pg_class where 
relname='bug';

      width
------------------
 394.130431739976

So we are going to massively underestimate the "size" of such a 
dataset. Now this appears to be a consequence of no "=" operator 
(std_typanalyze in analyze.c bails if there isn't one), so the 
planner has no idea about how wide 'val' actually is. I'm wondering 
if it is worth having at least an "=" operator to enable some minimal 
stats to be available for xml columns.




Adding a minimal = op (see attached) and an analyze results in:

test=# explain select val::text from bug;
                          QUERY PLAN
---------------------------------------------------------------
 Seq Scan on bug  (cost=0.00..62632.08 rows=108 width=385)

which gives a much better indication of dataset size.


   


Maybe I gave this guy a bad title - is it a concern that the 'width' 
estimate is so far off for xml datatypes (because of no = op)? It seemed 
to me that this could result in some bad plan choices (e.g in subqueries 
etc).


regards

Mark



Re: [BUGS] xml data type implications of no =

2010-05-26 Thread Mark Kirkwood

On 25/05/10 16:43, Mark Kirkwood wrote:
Today I ran into some interesting consequences of the xml data type 
being without an "=" operator. One I thought I'd post here because it 
has a *possible* planner impact. I'm not sure it is actually a bug as 
such, but this seemed the best forum to post in initially:


test=# \d bug
      Table "public.bug"
 Column |  Type   | Modifiers
--------+---------+-----------
 id     | integer | 
 val    | xml     | 

test=# explain select val::text from bug;
                          QUERY PLAN
---------------------------------------------------------------
 Seq Scan on bug  (cost=0.00..58127.78 rows=1000278 width=32)


Note the width estimate. However a more realistic estimate for width is:

test=# select 8192/(reltuples/relpages) as width from pg_class where 
relname='bug';

      width
------------------
 394.130431739976

So we are going to massively underestimate the "size" of such a 
dataset. Now this appears to be a consequence of no "=" operator 
(std_typanalyze in analyze.c bails if there isn't one), so the planner 
has no idea about how wide 'val' actually is. I'm wondering if it is 
worth having at least an "=" operator to enable some minimal stats to 
be available for xml columns.




Adding a minimal = op (see attached) and an analyze results in:

test=# explain select val::text from bug;
                          QUERY PLAN
---------------------------------------------------------------
 Seq Scan on bug  (cost=0.00..62632.08 rows=108 width=385)

which gives a much better indication of dataset size.

-- Add support for a minimal = operator for xml type.
CREATE FUNCTION xmleq(xml, xml)
RETURNS bool
AS  'texteq'
LANGUAGE INTERNAL IMMUTABLE STRICT;

CREATE OPERATOR = (
leftarg = xml,
rightarg = xml,
procedure = xmleq,
commutator = =
);

-- Add class so analyze populates pg_statistic.
CREATE OPERATOR CLASS xml_ops
	DEFAULT FOR TYPE xml USING hash AS
		OPERATOR 1 =;



Re: [BUGS] BUG #5469: regexp_matches() has poor behaviour and more poor documentation

2010-05-25 Thread Mark Kirkwood

On 26/05/10 15:51, Robert Haas wrote:


I'm not sure that it's very productive to refer to the behavior of our
code as insane.
   


Not meaning to single you out Robert, but typically folk are honest with 
their impression of the code without worrying about feather ruffling too 
much e.g: searching for "brain dead" in the pg-hackers archives returns 
a sizeable collection of reading material.


Personally I think it is good to be blunt about code we consider not 
well thought out or well behaved. Obviously in some cases such comments 
may turn out to be incorrect or misleading (e.g user error or not 
reading the docs), but I don't think we should try (too hard anyway) to 
smother any strong criticism.


regards

Mark




[BUGS] xml data type implications of no =

2010-05-24 Thread Mark Kirkwood
Today I ran into some interesting consequences of the xml data type 
being without an "=" operator. One I thought I'd post here because it 
has a *possible* planner impact. I'm not sure it is actually a bug as 
such, but this seemed the best forum to post in initially:


test=# \d bug
      Table "public.bug"
 Column |  Type   | Modifiers
--------+---------+-----------
 id     | integer | 
 val    | xml     | 

test=# explain select val::text from bug;
                          QUERY PLAN
---------------------------------------------------------------
 Seq Scan on bug  (cost=0.00..58127.78 rows=1000278 width=32)


Note the width estimate. However a more realistic estimate for width is:

test=# select 8192/(reltuples/relpages) as width from pg_class where 
relname='bug';

      width
------------------
 394.130431739976

So we are going to massively underestimate the "size" of such a dataset. 
Now this appears to be a consequence of no "=" operator (std_typanalyze 
in analyze.c bails if there isn't one), so the planner has no idea about 
how wide 'val' actually is. I'm wondering if it is worth having at least 
an "=" operator to enable some minimal stats to be available for xml 
columns.


regards

Mark




Re: [BUGS] BUG #5312: I NEED HELP

2010-02-04 Thread Mark Kirkwood

Alvaro Herrera wrote:

Daniel J. Baldev wrote:
  

All I want to do is to delete a database, but I don't know how to
actually input the dropdb command and what other stuff I need to
open...can you help? I think my problem will be very simple for
someone who understands this



Are you using pgAdmin?  If so, just right-click on the database and
select "delete/drop".

  
Probably too late to be mentioning this... but Daniel, are you sure that 
the database is not needed by anything?


regards

Mark



[BUGS] Foreign key constaint can be broken

2010-01-26 Thread Mark Kazemier
Dear,

I found a way to break a foreign key constraint in PostgreSQL. I discussed
this bug on a Dutch discussion board and multiple people tried this on
multiple versions.

When I create the following tables:


CREATE TABLE a
(
id SERIAL NOT NULL,
foo CHAR(100) NOT NULL,
PRIMARY KEY(id)
);

CREATE TABLE b
(
id INT NOT NULL,
bar CHAR(20) NOT NULL,
PRIMARY KEY(id),
FOREIGN KEY(id) REFERENCES a(id) ON DELETE CASCADE
);

and add the following rule to table b:

CREATE OR REPLACE RULE delete_b AS ON DELETE
TO b
DO INSTEAD
DELETE FROM a WHERE OLD.id = id;

When you try to delete a row on table b:
DELETE FROM b WHERE id = 1;
The record from table a disappears, but the record in table b is still
there. Of course this is a very stupid construction, but I would expect some
kind of error / warning message instead. Now it is possible to corrupt your
data.
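For the archives, a quick (illustrative, not from the original mail) way to confirm the resulting orphaned row:

```sql
-- After the DELETE above, rows in b whose parent in a has vanished:
SELECT b.id FROM b LEFT JOIN a ON a.id = b.id WHERE a.id IS NULL;
```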

Best regards,
Mark Kazemier


Re: [BUGS] BUG #5268: PQgetvalue incorrectly returns 0

2010-01-07 Thread Mark Kirkwood

Mike Landis wrote:

At 09:09 PM 1/7/2010, you wrote:



I suspect they do not. It's all in the permissions.


There's no user account control enabled on this Vista machine, 
therefore effectively wide open, hence different platform behavior or 
at least a difference between the behavior in pgAdmin and client program.




The lack of OS-level permissions is not relevant to this issue - I was
referring to database-level users and their permissions on tables. It does
look like you are somehow running your C program as a different (db) user
from the one you are using in pgAdmin. As Robert suggested, try doing
'SELECT user' in both.


Also note that pgAdmin uses PQexec and PQgetvalue...

Cheers

Mark




Re: [BUGS] BUG #5268: PQgetvalue incorrectly returns 0 (corrected)

2010-01-07 Thread Mark Kirkwood

(I forgot to cc -bugs...)

Mike Landis wrote:


Two things strike me as odd about that...
1) What's the logic behind the owner of a table not automatically 
getting a readonly privilege like SELECT?


The owner always has SELECT on a table they have created.

2) I think it would be more logical to refuse to return a NULL result 
from PQexec() than to return a result with the proper number of tuples 
and columns, but inaccurate contents.


COUNT always returns a number (never null). Recall you are selecting from
the information_schema table.



It's also strange that Linux and Vista would produce different answers.


I suspect they do not. It's all in the permissions.

Cheers

Mark




Re: [BUGS] BUG #5268: PQgetvalue incorrectly returns 0

2010-01-07 Thread Mark Kirkwood

Mike Landis wrote:
Pick a database and table that exist, configure the string
constants, compile and run the attached cpp, and you get 0 instead of 1
(which you do get in pgAdmin)...


Where can I download the libpq source?  Maybe I can find and/or fix 
the problem myself.


Your program works fine for me (apart from minor datatype changes to 
port to Linux). I used user=postgres and database=regression and 
pgtable='tenk1' (schema from the regression test suite). As Tom 
mentioned, your user might not have access to the table you are using - 
try using the superuser account - typically 'postgres' - to eliminate this 
possibility.


With respect to the libpq source, it is in the source tarball from the 
Postgresql website (directory src/interfaces/libpq ).


regards

Mark




[BUGS] remove

2009-12-30 Thread Mark Williamson



Re: [BUGS] BUG #5244: Attempting to rollback to a savepoint after receiving an error with state 55000 the process hangs

2009-12-16 Thread Mark Kirkwood

Philip Graham wrote:

The following bug has been logged online:

Bug reference:  5244
Logged by:  Philip Graham
Email address:  phi...@lightbox.org
PostgreSQL version: 8.3.8
Operating system:   Linux
Description:Attempting to rollback to a savepoint after receiving an
error with state 55000 the process hangs
Details: 


This may be a PHP issue, so please excuse me if it is.

$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

echo 'Creating test objects'."\n";
$pdo->query('CREATE SEQUENCE test_seq');

echo 'Setup complete'."\n";
$pdo->beginTransaction();

try {
echo 'Setting savepoint'."\n";
$pdo->query('SAVEPOINT pre_id_fetch');
echo 'Fetching value'."\n";
$stmt = $pdo->query('SELECT currval(\'test_seq\');');
$curId = $stmt->fetchColumn();
echo 'Releasing savepoint'."\n";
$pdo->query('RELEASE SAVEPOINT pre_id_fetch');
} catch (PDOException $e) {
echo 'Rolling back'."\n";
$pdo->query('ROLLBACK TO pre_id_fetch');
$curId = 0;
}

echo 'Cur Id: ',$curId,"\n";


Running this code, it hangs after echoing 'Rolling back', but only hangs
every other execution (assuming the sequence was deleted first).

  


I think you need to be using $pdo->exec instead of $pdo->query for 
everything *except* the SELECT operation.  The query method is really 
only intended for statements returning rows. Making the indicated 
changes stops the hang for me (PHP 5.3.2).
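Concretely, the script from the report could be adjusted along these lines (a sketch only; the connection setup is a placeholder, not the reporter's real DSN):

```php
<?php
// $dsn is a placeholder - substitute real connection details.
$pdo = new PDO($dsn);
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->beginTransaction();
try {
    $pdo->exec('SAVEPOINT pre_id_fetch');               // no rows: exec()
    $stmt = $pdo->query("SELECT currval('test_seq')");  // rows: query()
    $curId = $stmt->fetchColumn();
    $pdo->exec('RELEASE SAVEPOINT pre_id_fetch');
} catch (PDOException $e) {
    $pdo->exec('ROLLBACK TO pre_id_fetch');
    $curId = 0;
}
echo 'Cur Id: ', $curId, "\n";
```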


regards

Mark




[BUGS] Optimal platform for pg?

2009-12-15 Thread Mark Williamson
What is the most ideal/optimal platform for postgresql?  Linux  
(distro?), freebsd, windows, etc.


Consider memory management, file system performance, threading model, 
etc.




[BUGS] statement_timeout is not cancelling query

2009-12-14 Thread Mark Williamson
I have a few things to report so I'm not sure if one email is good or
several but here goes.

We are using Postgresql 8.3.8

We were having a blocking query problem that should have been fixed by
statement_timeout = 9; however, this seems to have had zero effect.

The query we have was like so:

update articles set views=views+1 where id=7223

That's it.  Fairly simple, right?  Well, we also had a trigger function that
updates a full text index on that record whenever any value is updated.  We
have since changed this function to only update the gist index for inserts
or updates when one of the indexed columns is updated.  However, let's stick
with the original for a moment.  There should have really been no problem
updating the GIST index for a single row in an insert/update trigger.
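For the record, the "only update when an indexed column changes" variant can be sketched like this (the table, column, and function names are invented for illustration; the real schema differs):

```sql
-- Hypothetical example: maintain the full text column only when the
-- text itself changes, so counter-only updates skip the index work.
CREATE OR REPLACE FUNCTION articles_fts_update() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        NEW.fts := to_tsvector('english', coalesce(NEW.body, ''));
    ELSIF NEW.body IS DISTINCT FROM OLD.body THEN
        NEW.fts := to_tsvector('english', coalesce(NEW.body, ''));
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER articles_fts_trig BEFORE INSERT OR UPDATE ON articles
    FOR EACH ROW EXECUTE PROCEDURE articles_fts_update();
```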

So what happened is, the above update never completed and the Postgresql
service consumed all available memory.  We had to forcefully reboot the
machine, we turned on track activity, and watched it do it again and again.
Luckily we were able to kill the process with the offending query before
losing the machine.

The postgresql configuration has a max of 255 connections.  The machine has
16 gigabytes of RAM and 2 quad core xeons.  We have several instances of
Postgresql running on different ports.  Our reason for doing this was to
prevent one customer's database 'instance' from impacting another customer.
A couple of years ago we had a run away query that brought the whole system
down.  So I implemented this separate instance concept and it has been
purring along great ever since, until now.

So we contacted a PG expert who was able to determine we had a corrupt full
text index and recommended rebuilding it and fixing the trigger function.
Once we rebuilt the index things worked (or are working) so far.

So we have a couple of questions:

Why is it that statement_timeout was ignored and the update statement was
allowed to run for excessive time?
Why does Postgresql NOT have a maximum memory allowed setting?  We want to
allocate resources efficiently and cannot allow one customer to impact
others.
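For reference, the timeout itself can be exercised in isolation from psql; a quick sketch (the two-second value is arbitrary):

```sql
SET statement_timeout = '2s';
SELECT pg_sleep(10);
-- ERROR:  canceling statement due to statement timeout
```

Note that statement_timeout only cancels statements by elapsed time; it places no bound on memory use.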

That's it for now.

Hope someone can provide helpful answers.

Thanks,
Mark W.


Re: [BUGS] BUG #5150: math bug

2009-11-01 Thread Mark Kirkwood

David Fetter wrote:

On Fri, Oct 30, 2009 at 08:51:57PM -0700, John R Pierce wrote:
  

Tom Lane wrote:


There is special-purpose software out there that can compute
exactly with rational numbers, but you aren't likely to find it
embedded in any general-purpose tools like databases --- the
use-case just isn't wide enough.  One reason why not is that it'll
still fall down on irrational numbers.
  
  



1/3 is a rational number.  however,  it is a repeating fraction when
expressed in decimal.






The set of algebraic numbers, of which rational numbers are a proper
subset, is countable and hence has Lebesgue measure zero on the real
line.
 ;)

  


LOL - fortunately (going by the bug) he is not trying to compute a 
measure (i.e. integrate) from a set of 'em.


Mark



Re: [BUGS] BUG #5096: Error installing edb_apachephp.bin

2009-10-04 Thread Mark Kirkwood

Federico wrote:

The following bug has been logged online:

Bug reference:  5096
Logged by:  Federico
Email address:  federicoaagui...@gmail.com
PostgreSQL version: 8.4
Operating system:   OpenSuSE 11.1
Description:Error installing edb_apachephp.bin
Details: 


Hello, when i'm installing edb_apachephp.bin (./edb_apachephp.bin) it show
an error: Error running getenforce  : /bin/sh: getenforce: command not
found.

Please, help me.


  


This looks like an Enterprisedb installation problem - I suspect you 
will get more help raising it with them (hopefully there is a link off 
http://www.enterprisedb.com for support/bugs).


The error you are seeing is because your OpenSUSE install does not have, 
or cannot find, 'getenforce' - an SELinux command. Your operating system 
may not be configured correctly for EnterpriseDB, or else your PATH does 
not include the directory where getenforce lives.
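Whether getenforce is reachable can be checked quickly from a shell (a generic sketch, not specific to the installer):

```shell
# Print the SELinux mode if getenforce exists; otherwise explain why not.
if command -v getenforce >/dev/null 2>&1; then
    getenforce
else
    echo "getenforce not in PATH - SELinux tools not installed?"
fi
```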


regards

Mark






Re: [BUGS] BUG #5054: PDO -> Query returns "" from Boolean type field, if it has false value.

2009-09-15 Thread Mark Kirkwood

I wrote:



Trying out some code with Php 5.3.1-dev:

$sql = "SELECT false";
$stmt = $dbh->query($sql);
$result = $stmt->fetch(PDO::FETCH_NUM);
print("  " . $result[0] . "\n");

reproduces what Yujin is seeing...


After a bit of digging through the PDO code, I see what is happening: 
the ->fetch operation is returning a Php boolean correctly from PDO (you 
can use var_dump($result[0]) to check this), but when you use print as 
above, Php casts the boolean to string - and for whatever reason Php 
reckons turning (boolean)false into (string)"" is the way to go. So to 
get a sensible result you need to do something like:


$result = $stmt->fetch(PDO::FETCH_NUM);
print("  " . (integer) $result[0] . "\n");

i.e: explicit cast on the result value.

This is confusing and seemingly not consistent - for instance the Mysql 
version of this example returns a string "0" from PDO, so gives a 0 for 
false in a more expected/intuitive way...


regards

Mark



Re: [BUGS] BUG #5054: PDO -> Query returns "" from Boolean type field, if it has false value.

2009-09-14 Thread Mark Kirkwood

Mark Kirkwood wrote:
I guess it must be something funny with how PDO represents the bool 
type...(will have a look at the PDO code). But this needs to be raised 
on bugs.php.net.





FYI - a related bug is : http://bugs.php.net/bug.php?id=33876




Re: [BUGS] BUG #5054: PDO -> Query returns "" from Boolean type field, if it has false value.

2009-09-14 Thread Mark Kirkwood

Tom Lane wrote:

"Yujin"  writes:
  

When i get query from table with bolean type fields, that have false value ,
function PDO -> fetch  return that fields with not "0" value , but empty
string.



Are you sure the field is actually false, and not null?

If so, this is a PDO bug, not a Postgres bug.

regards, tom lane

  

Does seem to be a PDO bug of some sort:

Trying out some code with Php 5.3.1-dev:

$sql = "SELECT false";
$stmt = $dbh->query($sql);
$result = $stmt->fetch(PDO::FETCH_NUM);
print("  " . $result[0] . "\n");

reproduces what Yujin is seeing, whereas replacing $sql with:

$sql = "SELECT false::int4";

gives a 0 in the result array. I guess it must be something funny with 
how PDO represents the bool type... (will have a look at the PDO code). 
But this needs to be raised on bugs.php.net.


Cheers

Mark



Re: [BUGS] BUG #5031: DATE_TRUNC returns the wrong value when specifying MONTH

2009-09-03 Thread Mark Douglas
You're correct. When I run this from psql it returns the correct result. When I 
run it from DBVisualizer, which I normally use, it adjusts the result to my 
local time zone. Thanks for looking into it. Sorry about bugging you with that.

Thanks,

Mark


On 9/2/09 10:24 PM, "Tom Lane"  wrote:

Mark Douglas  writes:
> I have my timezone set to GMT so there really shouldn't be any time zone 
> adjustments.

Okay ...

postgres=# set timezone = GMT;
SET
postgres=# SELECT DATE_TRUNC('MONTH', CURRENT_DATE);
       date_trunc
------------------------
 2009-09-01 00:00:00+00
(1 row)

I suspect there's something you're not telling us, like you're
using a client-side library that is doing timezone adjustments
behind your back.

regards, tom lane



Re: [BUGS] BUG #5031: DATE_TRUNC returns the wrong value when specifying MONTH

2009-09-02 Thread Mark Douglas
I have my timezone set to GMT so there really shouldn't be any time zone 
adjustments.

Mark


On 9/2/09 10:01 PM, "Tom Lane"  wrote:

"Mark Douglas"  writes:
> The following use of DATE_TRUNC returns the wrong value. I called the
> function on 2009-09-02. It should return '2009-09-01 00:00:00' for the
> following usage:

> SELECT DATE_TRUNC('MONTH', CURRENT_DATE);

> It instead returns '2009-08-31 17:00:00'.

Really?  What timezone setting are you using?  I get

postgres=# SELECT DATE_TRUNC('MONTH', CURRENT_DATE);
       date_trunc
------------------------
 2009-09-01 00:00:00-04
(1 row)

with timezone set to 'America/New_York' or equivalent.
This test might also be instructive:

postgres=# select CURRENT_DATE::timestamptz;
      timestamptz
------------------------
 2009-09-03 00:00:00-04
(1 row)


regards, tom lane



[BUGS] BUG #5031: DATE_TRUNC returns the wrong value when specifying MONTH

2009-09-02 Thread Mark Douglas

The following bug has been logged online:

Bug reference:  5031
Logged by:  Mark Douglas
Email address:  m...@steelhousemedia.com
PostgreSQL version: 8.4.0
Operating system:   Ubuntu Linux
Description:DATE_TRUNC returns the wrong value when specifying MONTH
Details: 

The following use of DATE_TRUNC returns the wrong value. I called the
function on 2009-09-02. It should return '2009-09-01 00:00:00' for the
following usage:

SELECT DATE_TRUNC('MONTH', CURRENT_DATE);

It instead returns '2009-08-31 17:00:00'.

Casting CURRENT_DATE to a TIMESTAMP causes it to return the correct value
but that shouldn't be required. Cast example:

SELECT DATE_TRUNC('MONTH', CAST(CURRENT_DATE AS TIMESTAMP));



Re: [HACKERS] [BUGS] Cursor with hold emits the same row more than once across commits in 8.3.7

2009-06-10 Thread Mark Kirkwood

Tom Lane wrote:

Given that RC freeze is nearly upon us for 8.4, and that we need a
reasonably non-invasive fix for 8.3 anyway, I propose that for now
we just deal with the syncscan issue by tweaking heap_rescan so that
rs_startblock doesn't get changed.  It looks like that's about a
three-line patch.  The question of how cursors should behave with
respect to volatile functions can be documented and left for another
time.
  


Sounds like a good approach.

Mark



[BUGS] Cursor with hold emits the same row more than once across commits in 8.3.7

2009-06-08 Thread Mark Kirkwood
Short Desc: Cursor with hold emits the same row more than once across 
commits in 8.3.7

Os : Debian Etch amd64 / Ubuntu Jaunty amd64
Pg : 8.3.7
Build options: Official package and also compiled from source with 
--enable-integer-datetimes

Detailed Desc:

A construction of the form

DECLARE cur CURSOR WITH HOLD FOR SELECT * FROM obj

loop
   FETCH 1000 FROM cur
   process 'em
   COMMIT

results in some of the same rows being emitted more than once, although the 
final rowcount is correct (i.e. some rows end up never being seen).
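Spelled out as SQL, the reproduction has roughly this shape (a sketch of the pattern only, not the exact attached program):

```sql
BEGIN;
DECLARE cur CURSOR WITH HOLD FOR SELECT * FROM obj;
COMMIT;                 -- a WITH HOLD cursor survives the commit

FETCH 1000 FROM cur;    -- repeat, committing between batches,
COMMIT;                 -- until FETCH returns no rows

CLOSE cur;
```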


Originally discovered using a perl DBI program, and we wondered if the 
version of DBD::Pg might be an issue, so a c library program was written 
to test this - and it exhibits the problem too (see attached for schema 
and program). The table rows are reasonably wide:



select attname,n_distinct,avg_width from pg_stats where tablename='obj';
   attname   | n_distinct | avg_width
-------------+------------+-----------
 obj_id      |         -1 |         4
 obj_link_id |          5 |         4
 obj_fil     |      13035 |      1188

which may be a factor (tuplestore issues?)... The table is reasonably 
sizable (1000 rows). I can attach the generation program for this 
dataset if required.


regards

Mark



cursor-bug.tar.gz
Description: GNU Zip compressed data



Re: [BUGS] BUG #4787: Hardlink (ln) causes startup failure with bizarre "timezone_abbreviations" error

2009-05-02 Thread Mark
On Sat, 02 May 2009 14:47:48 GMT, Tom Lane wrote

> Mark  writes:

> > I understand the rationale for relocatable packages. So,
> > I guess hardlinks are out. But, barring hardlinks,
> > perhaps, in the existence of a symlink, a simple 'readlink'
> > function could be done to auto-correct PostgreSQL's
> > base-location? Ala:
> 
> That's exactly what it already does, and why it would've worked
> if you'd used symlinks not hardlinks.

Interesting. Yet, as I reported earlier, whilst a symlink does seem to start 
the server, pg_ctl takes a long time to do so, and then reports: "could not 
start server" anyway. But it actually *does* get started. So I figured maybe 
something was not entirely right with the symlink, either.

- Mark




Re: [BUGS] BUG #4787: Hardlink (ln) causes startup failure with bizarre "timezone_abbreviations" error

2009-05-02 Thread Mark
-Original Message-
From: pgsql-bugs-ow...@postgresql.org
[mailto:pgsql-bugs-ow...@postgresql.org] On Behalf Of Tom Lane
Sent: vrijdag 1 mei 2009 23:57
To: Mark; pgsql-bugs@postgresql.org
Subject: Re: [BUGS] BUG #4787: Hardlink (ln) causes startup failure with
bizarre "timezone_abbreviations" error 

> What I'm inclined to do is modify PostmasterMain so that immediately
> after find_my_exec, it checks that get_share_path returns the name of
> a readable directory.

I understand the rationale for relocatable packages. So, I guess hardlinks
are out. But, barring hardlinks, perhaps, in the existence of a symlink, a
simple 'readlink' function could be done to auto-correct PostgreSQL's
base-location? Ala:

char buf[1024];
ssize_t len;

if ((len = readlink("/usr/local/bin/pg_ctl", buf, sizeof(buf)-1)) != -1)
    buf[len] = '\0';

Symlinks are used quite often, *especially* when dealing with relocatable
packages (read: that will likely not reside in /usr/local/, etc.). And it
would only require two or three extra lines of code, no?

At any rate, I appreciate you looking into this.

- Mark




Re: [BUGS] BUG #4787: Hardlink (ln) causes startup failure with bizarre "timezone_abbreviations" error

2009-05-01 Thread Mark
-Original Message-
From: pgsql-bugs-ow...@postgresql.org
[mailto:pgsql-bugs-ow...@postgresql.org] On Behalf Of Tom Lane
Sent: vrijdag 1 mei 2009 17:46
To: Mark Kramer
Cc: pgsql-bugs@postgresql.org
Subject: Re: [BUGS] BUG #4787: Hardlink (ln) causes startup failure with
bizarre "timezone_abbreviations" error

"Mark Kramer"  writes:

> > I have my PostgreSQL installed in /usr/local/PostgreSQL/ (cleaner for
> > updates, instead of just /usr/local) As a result, I made hard-links
> > like this,

> > cd /usr/local/bin/
> > ln /usr/local/PostgreSQL/bin/pg_ctl pg_ctl

> This isn't going to work because pg_ctl assumes it can find postgres in
> the same directory it is in. Try using a symlink instead.  (It'll be
> less likely to fail miserably after an upgrade, too.)

I tried a symlink as well. Then pg_ctl *can* start the server (which is
kinda odd, by itself, that it can do so now, whereas not with a hardlink;
unless pg_ctl actually reads the symlink content, which is very unlikely),
but it reports a spurious error nonetheless: "could not start server"
(whilst it DOES start the server just fine).

As for pg_ctl assuming it can find postgres in the same directory it is
in, it SHOULD. :) Basically, I hard-linked all files in
/usr/local/PostgreSQL/bin/ to /usr/local/bin/. So, even when pg_ctl got
started from /usr/local/bin/, it should have found /usr/local/bin/postgres
right under its very nose! Also, the error message actually DOES seem to
come from postgres (postgres[9742]: [6-1] FATAL), but that may well be an
optical illusion on my end (as pg_ctl could log as 'postgres' too: haven't
examined that yet).

Clearly, seems PostgreSQL just really wants to be started from its
original install-location.

> > I get this error, though:
> > May  1 04:40:26 asarian-host postgres[9742]: [6-1] FATAL: invalid
> > value for parameter "timezone_abbreviations": "Default"

> I agree this is an odd error message though.  Perhaps you hardlinked a
> few other things you didn't tell us about?  I'm not sure what it would
> take to make this be the first complaint. What is probably happening is
> that postgres is trying to find /usr/local/PostgreSQL/share/ relative
> to itself, but I'd have thought it would notice the problem sooner.

The /share/ thingy is what I strongly suspected too; but since the bug
report FAQ strongly discourages one from writing your assumptions about
what you *think* might be the issue, I refrained from mentioning it. :)
But yes, that seems like a logical place to look.

- Mark



