[BUGS] 8.2b2: update.bad in windows release points to wrong .msi

2006-10-25 Thread Thomas H.



as the subject says - the upgrade.bat in the b2 release that's currently
being mirrored points to the installation files of 8.1 instead of the 8.2
ones.

best regards,
thomas


Re: [BUGS] 8.2b2: update.bad in windows release points to wrong .msi

2006-10-25 Thread Dave Page



Thanks, fixed in CVS.

Regards, Dave.

  
  
  From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Thomas H.
  Sent: 25 October 2006 12:16
  To: pgsql-bugs@postgresql.org
  Subject: [BUGS] 8.2b2: update.bad in windows release points to wrong .msi
  
  as the subject says - the upgrade.bat in the b2 release that's currently
  being mirrored points to the installation files of 8.1 instead of the 8.2
  ones.
  
  best regards,
  thomas


Re: [BUGS] BUG #2712: could not fsync segment: Permission

2006-10-25 Thread Magnus Hagander
  The same problem exists in 8.1 too.  See this thread
  http://archives.postgresql.org/pgsql-bugs/2006-04/msg00177.php
  Tom and Magnus tracked down a cause, but I don't think a fix was ever
  implemented.

 Thomas seems to have two different issues there: the "could not rename
 file" problem on the pg_xlog file is probably explained by the mechanism
 we identified back then (and I'm not sure why no fix has been installed),
 but there is no known reason other than antivirus software for the
 "could not fsync" problem.
 
 As for fixing the problem we do understand: ISTM it's just an 
 awful idea for pgrename and pgunlink to be willing to loop 
 forever.  I think they should time out and report the failure 
 after some reasonable period (say between 10 sec and a minute).
 
 If we simply made that change, then the behavior when there's 
 an idle backend sitting on a filehandle for an old xlog 
 segment would be that checkpoints would fail at the 
 MoveOfflineLogs stage, which would not be fatal, but it'd be 
 annoying.  We'd probably want to further tweak 
 InstallXLogFileSegment so that rename failure isn't an ERROR, 
 at least not on Windows.  (I think we could just make it 
 return false, which'd cause the caller to try to delete the 
 xlog segment, which should work even though rename doesn't.)
 
 I'm not in a position to test this though.  Magnus or Bruce?

I haven't reproduced this on my box. But if you can give me a patch to
try I can build binaries for Thomas to test, if he can do testing but
not building.

//Magnus
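
For illustration, a minimal sketch of the bounded-retry rename Tom suggests
above. The shape loosely follows pgrename() in src/port/dirmod.c, but the
function name, the 100 ms sleep and the 30-second cap are assumptions made
for this example, not the contents of the patch posted later in the thread.

#include <stdio.h>      /* rename() */
#include <errno.h>

#ifdef WIN32
#include <windows.h>    /* MoveFileEx(), GetLastError(), Sleep() */
#else
#include <unistd.h>     /* usleep() */
#endif

/*
 * Rename "from" to "to", retrying while the file appears to be held open
 * by another process, but give up after roughly 30 seconds (300 retries
 * of 100 ms) instead of looping forever.
 */
int
pgrename_with_timeout(const char *from, const char *to)
{
    int     loops;

    for (loops = 0; loops < 300; loops++)
    {
#ifdef WIN32
        if (MoveFileEx(from, to, MOVEFILE_REPLACE_EXISTING))
            return 0;
        if (GetLastError() != ERROR_ACCESS_DENIED &&
            GetLastError() != ERROR_SHARING_VIOLATION)
            return -1;          /* unexpected failure: report immediately */
        Sleep(100);             /* some process still holds a handle; retry */
#else
        if (rename(from, to) == 0)
            return 0;
        if (errno != EACCES)
            return -1;
        usleep(100000);         /* retry after 100 ms */
#endif
    }

    return -1;                  /* timed out: let the caller report failure */
}

As Tom notes above, once rename can report failure after a timeout, callers
such as InstallXLogFileSegment have to tolerate that failure instead of
raising an ERROR, at least on Windows.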



Re: [BUGS] BUG #2712: could not fsync segment: Permission

2006-10-25 Thread Thomas H.



I'm not in a position to test this though.  Magnus or Bruce?


I haven't reproduced this on my box. But if you can give me a patch to
try I can build binaries for Thomas to test, if he can do testing but
not building.


a binary would be marvelous. if it's too much hassle, i can set up msvc++
2005 here and try to build it on my own, but would obviously prefer not
having to...


b2 is installed here, but i'm seeing the same problems, so yes, i'm ready 
for testing ;-)


thanks,
thomas 






Re: [BUGS] BUG #2712: could not fsync segment: Permission

2006-10-25 Thread Tom Lane
Magnus Hagander <[EMAIL PROTECTED]> writes:
 I haven't reproduced this on my box. But if you can give me a patch to
 try I can build binaries for Thomas to test, if he can do testing but
 not building.

Utterly untested ... BTW, why does pgrename have an #if to check
either GetLastError() or errno, but pgunlink doesn't?

regards, tom lane



[Attachment: xlog-rename.patch.gz]
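
For readers without the source at hand, the asymmetry Tom is asking about
looks roughly like this; the fragments below are reconstructed from the
general shape of src/port/dirmod.c of that era and are approximate, not
verbatim:

/* pgrename(): the retry check distinguishes native Win32 (GetLastError)
 * from the errno case */
#if defined(WIN32) && !defined(__CYGWIN__)
    if (GetLastError() != ERROR_ACCESS_DENIED)
#else
    if (errno != EACCES)
#endif
        return -1;              /* give up only on unexpected errors */

/* pgunlink(): only errno is consulted, with no equivalent #if */
    if (errno != EACCES)
        return -1;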



Re: [BUGS] BUG #2712: could not fsync segment: Permission

2006-10-25 Thread Bruce Momjian
Tom Lane wrote:
 Magnus Hagander <[EMAIL PROTECTED]> writes:
  I haven't reproduced this on my box. But if you can give me a patch to
  try I can build binaries for Thomas to test, if he can do testing but
  not building.
 
 Utterly untested ... BTW, why does pgrename have an #if to check
 either GetLastError() or errno, but pgunlink doesn't?

Seems like a bug ---  they both should have it.

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [BUGS] 8.2beta1 (w32): server process crash (tsvector)

2006-10-25 Thread Thomas H.

just a small update: this problem is also present in beta 2.
not a big problem for the moment, as we have currently disabled fulltext
search capabilities on the website.


regards,
thomas

----- Original Message -----
From: [EMAIL PROTECTED]
To: Tom Lane <[EMAIL PROTECTED]>
Cc: pgsql-bugs@postgresql.org
Sent: Tuesday, October 17, 2006 10:19 PM
Subject: Re: [BUGS] 8.2beta1 (w32): server process crash (tsvector)



the following query will crash the server process:
INSERT INTO news.news
SELECT * FROM news.news2;


This is undoubtedly data-dependent.  Can you supply some sample data
that makes it happen?


it's not only happening with INSERTs, but also with UPDATEs. as that's
easier to test, here's how i can reproduce the error:

1. create a new database (encoding: UTF8) with tsearch2 on 8.2b1 win32
(system locale: de_CH.1252)
2. insert the data from the zip file
[http://alternize.com/pgsql/tsearch2test.zip] (be sure to also update
pg_ts_cfg / pg_ts_cfgmap as we have a WIN1252 locale)
3. execute UPDATE test SET idxFTI = to_tsvector('default', sometext); or
similar queries
4. hopefully see the process crashing as i do ;-)


2006-10-17 17:23:44 LOG:  server process (PID 4584) exited with exit
code -1073741819
2006-10-17 17:23:44 LOG:  terminating any other active server processes
2006-10-17 17:23:44 WARNING:  terminating connection because of crash of
another server process
2006-10-17 17:23:44 DETAIL:  The postmaster has commanded this server
process to roll back the current transaction and exit, because another
server process exited abnormally and possibly corrupted shared memory.
{snipp}
2006-10-17 17:23:44 LOG:  all server processes terminated; reinitializing
2006-10-17 17:23:44 LOG:  database system was interrupted at 2006-10-17
17:23:41 W. Europe Daylight Time
2006-10-17 17:23:44 LOG:  Windows fopen(recovery.conf,r) failed: code 2, errno 2
2006-10-17 17:23:44 LOG:  Windows fopen(pg_xlog/0001.history,r) failed: code 2, errno 2
2006-10-17 17:23:44 LOG:  Windows fopen(backup_label,r) failed: code 2, errno 2
2006-10-17 17:23:44 LOG:  checkpoint record is at 0/E2ECA728
2006-10-17 17:23:44 LOG:  redo record is at 0/E2ECA728; undo record is at
0/0; shutdown FALSE
2006-10-17 17:23:44 LOG:  next transaction ID: 0/514299; next OID: 6276957
2006-10-17 17:23:44 LOG:  next MultiXactId: 1; next MultiXactOffset: 0
2006-10-17 17:23:44 LOG:  database system was not properly shut down;
automatic recovery in progress
2006-10-17 17:23:44 LOG:  redo starts at 0/E2ECA778
2006-10-17 17:23:44 LOG:  unexpected pageaddr 0/DB0CC000 in log file 0,
segment 227, offset 835584
2006-10-17 17:23:44 LOG:  redo done at 0/E30CBE78
2006-10-17 17:23:45 LOG:  database system is ready
2006-10-17 17:23:45 LOG:  Windows fopen(global/pg_fsm.cache,rb) failed: code 2, errno 2
2006-10-17 17:23:45 LOG:  transaction ID wrap limit is 2147484172, limited
by database postgres
2006-10-17 17:23:45 LOG:  Windows fopen(global/pgstat.stat,rb) failed: code 2, errno 2


i've also tried to update each record on its own in a for-loop. here the
crash happens as well, sometimes after 10 updates, sometimes after 100
updates, sometimes even after 1 update. but eventually every record can be
updated. so i do not think it's entirely content-related...

for what it's worth, here's the output of pg_controldata:

pg_control version number:            822
Catalog version number:               200609181
Database system identifier:           4986650172201464825
Database cluster state:               in production
pg_control last modified:             17.10.2006 17:44:29
Current log file ID:                  0
Next log file segment:                230
Latest checkpoint location:           0/E4E0F978
Prior checkpoint location:            0/E46BF420
Latest checkpoint's REDO location:    0/E4E03098
Latest checkpoint's UNDO location:    0/0
Latest checkpoint's TimeLineID:       1
Latest checkpoint's NextXID:          0/531333
Latest checkpoint's NextOID:          6285149
Latest checkpoint's NextMultiXactId:  1
Latest checkpoint's NextMultiOffset:  0
Time of latest checkpoint:            17.10.2006 17:43:45
Minimum recovery ending location:     0/0
Maximum data alignment:               8
Database block size:                  8192
Blocks per segment of large relation: 131072
WAL block size:                       8192
Bytes per WAL segment:                16777216
Maximum length of identifiers:        64
Maximum columns in an index:          32
Date/time type storage:               floating-point numbers
Maximum length of locale name:        128
LC_COLLATE:                           German_Switzerland.1252
LC_CTYPE:                             German_Switzerland.1252

let me know if more information / data is needed.

on a side note: are those fopen() errors debug-code leftovers or something
one should worry about? i can't find those files on the file system.

- thomas



[BUGS] Out of memory error causes Abort, Abort tries to allocate memory

2006-10-25 Thread Jeff Davis
I found the root cause of the bug I reported at:

http://archives.postgresql.org/pgsql-bugs/2006-10/msg00211.php

What happens is this:
* Out of memory condition causes an ERROR
* ERROR triggers an AbortTransaction()
* AbortTransaction() calls RecordTransactionAbort()
* RecordTransactionAbort calls smgrGetPendingDeletes()
* smgrGetPendingDeletes() calls palloc()
* palloc() fails, resulting in ERROR, causing infinite recursion
* elog.c detects infinite recursion, and elevates it to PANIC

I'm not sure how easy this is to fix, but I asked on IRC and got some
agreement that this is a bug.

It seems to me, in order to fix it, we would have to avoid allocating
memory on the AbortTransaction path. All smgrGetPendingDeletes() needs
to allocate is a few dozen bytes (depending on the number of relations
to be deleted). Perhaps it could allocate those bytes as the list of pending
deletes fills up. Or maybe we can somehow avoid needing to record the
relnodes to be deleted in order for the abort to succeed.

I'm still not sure why foreign keys on large insert statements don't eat
memory on 7.4, but do on 8.0+.

Regards,
Jeff Davis
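
A toy illustration of the "allocate as the list fills up" idea, using
entirely hypothetical names rather than the real smgr.c structures: the
array is grown at the moment a deletion is scheduled, when memory is
normally still available, so the abort path can hand it back without
calling palloc() at all.

#include <stdlib.h>

typedef struct RelFileNodeStub { unsigned spc, db, rel; } RelFileNodeStub;

static RelFileNodeStub *pending_array = NULL;   /* grown eagerly */
static int pending_used = 0;
static int pending_alloc = 0;

/* Called while scheduling a delete, i.e. before any out-of-memory ERROR. */
static int
schedule_pending_delete(RelFileNodeStub rnode)
{
    if (pending_used >= pending_alloc)
    {
        int newalloc = (pending_alloc == 0) ? 16 : pending_alloc * 2;
        RelFileNodeStub *p = realloc(pending_array, newalloc * sizeof(*p));

        if (p == NULL)
            return -1;          /* fail here, not on the abort path */
        pending_array = p;
        pending_alloc = newalloc;
    }
    pending_array[pending_used++] = rnode;
    return 0;
}

/* Abort-time counterpart: no allocation, so it cannot recurse into ERROR. */
static int
get_pending_deletes_noalloc(RelFileNodeStub **ptr)
{
    *ptr = pending_array;
    return pending_used;
}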




Re: [BUGS] Out of memory error causes Abort, Abort tries to allocate memory

2006-10-25 Thread Alvaro Herrera
Jeff Davis wrote:
 I found the root cause of the bug I reported at:
 
 http://archives.postgresql.org/pgsql-bugs/2006-10/msg00211.php
 
 What happens is this:
 * Out of memory condition causes an ERROR
 * ERROR triggers an AbortTransaction()
 * AbortTransaction() calls RecordTransactionAbort()
 * RecordTransactionAbort calls smgrGetPendingDeletes()
 * smgrGetPendingDeletes() calls palloc()
 * palloc() fails, resulting in ERROR, causing infinite recursion
 * elog.c detects infinite recursion, and elevates it to PANIC
 
 I'm not sure how easy this is to fix, but I asked on IRC and got some
 agreement that this is a bug.

Hmm, maybe we could have AbortTransaction switch to ErrorContext, which
has some preallocated space, before calling RecordTransactionAbort (or
maybe have RecordTransactionAbort itself do it).

Problem is, what happens if ErrorContext is filled up by doing this?  At
that point we will be severely fscked up, and you probably won't get the
PANIC either.  (Maybe it doesn't happen in this particular case, but
seems a real risk.)

-- 
Alvaro Herrera                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
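
A minimal sketch of what Alvaro is proposing, assuming the backend's
MemoryContextSwitchTo()/ErrorContext machinery; this is the shape of the
idea, not an actual patch:

/* conceptually inside xact.c */
static void
AbortTransaction_sketch(void)
{
    /*
     * ErrorContext keeps some memory preallocated, so pallocs made while
     * recording the abort have a chance of succeeding even though ordinary
     * allocations are currently failing.
     */
    MemoryContext oldcontext = MemoryContextSwitchTo(ErrorContext);

    RecordTransactionAbort();

    MemoryContextSwitchTo(oldcontext);

    /* ... the rest of the normal AbortTransaction() processing ... */
}

As Tom points out further down the thread, ErrorContext's preallocated
space is limited, so this only helps if everything allocated on the abort
path stays correspondingly small.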



Re: [BUGS] Out of memory error causes Abort, Abort tries to

2006-10-25 Thread Jeff Davis
On Wed, 2006-10-25 at 16:20 -0300, Alvaro Herrera wrote:
 Jeff Davis wrote:
  I found the root cause of the bug I reported at:
  
  http://archives.postgresql.org/pgsql-bugs/2006-10/msg00211.php
  
  What happens is this:
  * Out of memory condition causes an ERROR
  * ERROR triggers an AbortTransaction()
  * AbortTransaction() calls RecordTransactionAbort()
  * RecordTransactionAbort calls smgrGetPendingDeletes()
  * smgrGetPendingDeletes() calls palloc()
  * palloc() fails, resulting in ERROR, causing infinite recursion
  * elog.c detects infinite recursion, and elevates it to PANIC
  
  I'm not sure how easy this is to fix, but I asked on IRC and got some
  agreement that this is a bug.
 
 Hmm, maybe we could have AbortTransaction switch to ErrorContext, which
 has some preallocated space, before calling RecordTransactionAbort (or
 maybe have RecordTransactionAbort itself do it).
 
 Problem is, what happens if ErrorContext is filled up by doing this?  At
 that point we will be severely fscked up, and you probably won't get the
 PANIC either.  (Maybe it doesn't happen in this particular case, but
 seems a real risk.)
 

If we have a way to allocate memory and recover if it fails, perhaps
RecordTransactionAbort() could set the "rels to delete" part of the log
record to some special value that means "there might be relations to
delete, but I don't know which ones". Then, if necessary, it could
determine the relations that should be deleted at recovery time.

This idea assumes that we can figure out which relations are abandoned,
and also assumes that smgrGetPendingDeletes() is the only routine that
allocates memory on the path to abort a transaction due to an out of
memory error.

Regards,
Jeff Davis





Re: [BUGS] Out of memory error causes Abort, Abort tries to allocate memory

2006-10-25 Thread Tom Lane
Alvaro Herrera <[EMAIL PROTECTED]> writes:
 Jeff Davis wrote:
 * smgrGetPendingDeletes() calls palloc()
 * palloc() fails, resulting in ERROR, causing infinite recursion

 Hmm, maybe we could have AbortTransaction switch to ErrorContext, which
 has some preallocated space, before calling RecordTransactionAbort (or
 maybe have RecordTransactionAbort itself do it).

Seems like it'd be smarter to try to free some memory before we push
forward with transaction abort.  ErrorContext has only a limited amount
of space ...

regards, tom lane



Re: [BUGS] BUG #2712: could not fsync segment: Permission

2006-10-25 Thread Thomas H.

As for fixing the problem we do understand: ISTM it's just an
awful idea for pgrename and pgunlink to be willing to loop
forever.  I think they should time out and report the failure
after some reasonable period (say between 10 sec and a minute).


is the main problem really in the rename/delete function? while i'm in no
position to actually know what's going on under the hood, my observations
in 10+ cases during this afternoon/evening revealed some patterns:


it is definitely the writer process that blocks the db. but in every case
the writer process seems to fail to rename the file due to another postgres
process still holding a filehandle to the very xlog file that should be
renamed. ProcessExplorer lets you force a close of the file handle - as soon
as you do this [which is a bad thing to do, i assume], the rename succeeds
and processing continues normally.


i actually can reproduce the error at will now - i just need to pump enough
data into the db (~200mb of data seems sufficient) to have it lock up.


- thomas 






Re: [BUGS] BUG #2712: could not fsync segment: Permission

2006-10-25 Thread Tom Lane
Thomas H. <[EMAIL PROTECTED]> writes:
 it is definitely the writer process that blocks the db. but in every case
 the writer process seems to fail to rename the file due to another postgres
 process still holding a filehandle to the very xlog file that should be
 renamed.

Right, all you need is a backend process that's made at least one xlog
write and then is left to sit idle for long enough that that xlog file
is due for recycling.

However, the fact that the writer process is stuck should not in itself
cause the DB to lock up.  I think what's really happening is that after
the writer process gets stuck, the remaining backends chew all the
available WAL, and then they need to create more WAL segments for
themselves, and the writer process is holding the ControlFileLock so
they can't.

It might be interesting to think about not requiring the ControlFileLock
to be held while making new WAL segments.  I think the only reason it
does that is to ensure that link/rename failure can be treated as a hard
error (because no one could have beat us to the filename), but we're
already having to back off that stance for Windows ...

regards, tom lane
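
To make the lock-up mechanism concrete, a rough sketch of the two locking
patterns being discussed; the LWLock calls are real PostgreSQL primitives,
but the helper functions are invented for the example and do not reflect
the actual xlog.c control flow:

/* hypothetical helpers standing in for the real xlog.c internals */
static bool install_segment_under_final_name(void);
static void remove_temporary_segment(void);

/*
 * Today, as described above: the install/rename of a WAL segment happens
 * with ControlFileLock held, so a writer stuck retrying a rename blocks
 * every backend that needs a fresh segment.
 */
static void
install_wal_segment_locked(void)
{
    LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
    if (!install_segment_under_final_name())    /* may retry the rename */
        elog(ERROR, "could not install WAL segment");
    LWLockRelease(ControlFileLock);
}

/*
 * The direction Tom floats: install without the lock, and treat a lost
 * race for the target filename as "someone else got there first" rather
 * than as a hard error.
 */
static void
install_wal_segment_unlocked(void)
{
    if (!install_segment_under_final_name())
        remove_temporary_segment();             /* clean up and move on */
}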
