[HACKERS] Re: [BUGS] BUG #3242: FATAL: could not unlock semaphore: error code 298

2007-04-21 Thread Marcin Waldowski

Magnus Hagander wrote:

Tom Lane wrote:
  

Magnus Hagander [EMAIL PROTECTED] writes:


No, it's definitely the right primitive. But we're creating it with a max
count of 1.
  

That's definitely wrong.  There are at least three reasons for a PG
process's semaphore to be signaled (heavyweight lock release, LWLock
release, pin count waiter), and at least two of them can occur
concurrently (eg, if deadlock checker fires, it will need to take
LWLocks, but there's nothing saying that the original lock won't be
released while it waits for an LWLock).

The effective max count on Unixen is typically in the thousands,
and I'd suggest the same on Windows unless there's some efficiency
reason to keep it small (in which case, maybe ten would do).
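The failure Tom describes can be shown with a toy Python model of a semaphore that has a maximum count. This is not the Win32 API and not PostgreSQL's code; the class and helper names are hypothetical. The model follows the documented ReleaseSemaphore rule: a release that would push the count past the maximum set at creation fails, leaves the count unchanged, and reports ERROR_TOO_MANY_POSTS (298). In the scenario, two independent wakeups reach the same process semaphore before that process runs, as in the lock-release/LWLock-release example above.

```python
import threading

# Win32 error code ERROR_TOO_MANY_POSTS, the "error code 298" in the bug report.
ERROR_TOO_MANY_POSTS = 298


class BoundedCountSemaphore:
    """Hypothetical toy model of a Win32-style semaphore with a max count.

    Mimics only the rule that a release which would push the count above
    max_count fails and leaves the count unchanged.
    """

    def __init__(self, initial: int, max_count: int) -> None:
        self._count = initial
        self._max = max_count
        self._cond = threading.Condition()

    def release(self) -> int:
        """Add one to the count; return 0 on success or an error code."""
        with self._cond:
            if self._count + 1 > self._max:
                return ERROR_TOO_MANY_POSTS  # the wakeup is simply lost
            self._count += 1
            self._cond.notify()
            return 0

    def acquire(self) -> None:
        """Block until the count is positive, then decrement it."""
        with self._cond:
            while self._count == 0:
                self._cond.wait()
            self._count -= 1

    @property
    def count(self) -> int:
        with self._cond:
            return self._count


def two_wakeups(max_count: int) -> list[int]:
    """Two independent signals (e.g. a heavyweight lock release and an
    LWLock release) hit one process's semaphore before it gets to run."""
    sema = BoundedCountSemaphore(initial=0, max_count=max_count)
    return [sema.release(), sema.release()]
```

With a maximum of 1, `two_wakeups(1)` returns `[0, 298]` and the second wakeup is lost. With a larger maximum, such as the "maybe ten" suggested above, `two_wakeups(10)` returns `[0, 0]` and both posts are recorded.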



AFAIK there's no problem with huge numbers (it takes an int32, and the
documentation says nothing about a limit - I'm sure it's just a 32-bit
counter in the kernel). I'll give that a shot.
  


Magnus, Tom, thank you for finding what causes the problem :) I hope 
that was also the reason other transactions hung (since I think it 
happened first).



Marcin - can you test a source patch? Or should I try to build you a
binary for testing? It'd be good if you can confirm that it works before
we commit anything, I think.
  


Of course I will check the fix :) I will be able to run tests on Monday. I 
think a source patch should be enough, even though I've never built PostgreSQL 
on Windows (I definitely should try). If I have problems, I will ask 
you for a binary.


Regards, Marcin

---(end of broadcast)---
TIP 6: explain analyze is your friend


[HACKERS] Re: [BUGS] BUG #3242: FATAL: could not unlock semaphore: error code 298

2007-04-20 Thread Marcin Waldowski

Magnus Hagander wrote:

I've looked at the code there, and can't find a clear problem. One way it
could happen is if the actual PGSemaphoreUnlock() is called once more than
needed. 


CC:ing to hackers for this question:

Any chance that's happening? If this happens with SysV semaphores, will
they error out, or just say it was done and do nothing? (Meaning, should we
actually be ignoring this error on Windows?)
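The SysV rule behind this question can be sketched with a hypothetical Python model. It is not a binding to the real system call. semop() with sem_op = +1 simply increments the semaphore value. It only fails, with ERANGE, if the result would exceed SEMVMX, which is 32767 on Linux. Under that rule a single extra unlock is never reported; it just leaves one surplus count behind.

```python
import errno

SEMVMX = 32767  # Linux's maximum value for a SysV semaphore


class SysVStyleSemaphore:
    """Hypothetical model of one SysV semaphore (not a real semop binding)."""

    def __init__(self, value: int = 0) -> None:
        self.value = value

    def unlock(self) -> int:
        """Like semop() with sem_op = +1: 0 on success, else an errno value."""
        if self.value + 1 > SEMVMX:
            return errno.ERANGE
        self.value += 1
        return 0

    def try_lock(self) -> bool:
        """Like semop() with sem_op = -1 and IPC_NOWAIT: True if no blocking."""
        if self.value == 0:
            return False  # the real call would fail with EAGAIN (or sleep without IPC_NOWAIT)
        self.value -= 1
        return True
```

On a fresh semaphore, calling `unlock()` twice returns 0 both times. The surplus count lets one later `try_lock()` succeed that would otherwise have blocked. Only a semaphore already at `SEMVMX` reports `ERANGE`.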
  


Hmm, PGSemaphoreUnlock() actually ignores this error and only logs that it 
happened. As I mentioned previously, after it happens other connections 
hang on update operations. What is strange is that we cannot reproduce this 
problem on Linux, only on Windows. What other 
information should we provide?


Regards, Marcin



[HACKERS] Re: [BUGS] BUG #3242: FATAL: could not unlock semaphore: error code 298

2007-04-20 Thread Marcin Waldowski

Magnus Hagander wrote:
Hmm, PGSemaphoreUnlock() actually ignores this error and only logs that it 
happened.



No. It does ereport(FATAL) which terminates the backend.

  


Oh, now I see, sorry :) Indeed, on this one connection we received the 
exception "FATAL: could not unlock semaphore". After that, the rollback 
failed with an I/O error while writing to the connection, caused by 
"Connection reset by peer: socket write error".


As I mentioned previously, after it happens other connections 
hang on update operations. What is strange is that we cannot reproduce this 
problem on Linux, only on Windows. What other 
information should we provide?



Doesn't the postmaster restart all other backends due to the FATAL error?
Are you saying that you can no longer make new connections to the server,
or is the problem that the application doesn't like the
server kicking out all connections?
  


No, we are sure that it didn't do that. As I mentioned above, one 
connection was terminated, but the other ones hung on update 
operations. In this state it was possible to create a new connection from 
pgAdmin and do some select and update operations. In addition, I can say 
that we use only read-committed transactions and all operations are based 
on prepared statements which are reused.



If you can produce a self-contained test-case, that would certainly make
debugging a lot easier. So if it's possible - but I realise that might not
be easy for a problem like this :-)
  


Our test case is our application, but unfortunately I cannot send it to 
you. I will think about a test case, but I need to find time to write 
it :( I can reproduce the error and provide any information you need from 
PostgreSQL. Please tell me what to do :)


Regards, Marcin





[HACKERS] Re: [BUGS] BUG #3242: FATAL: could not unlock semaphore: error code 298

2007-04-20 Thread Marcin Waldowski
Marcin Waldowski wrote: 


Doesn't the postmaster restart all other backends due to the FATAL 
error? Are you saying that you can no longer make new connections to the 
server, or is the problem that the application doesn't like the
server kicking out all connections?
  


No, we are sure that it didn't do that. As I mentioned above, one 
connection was terminated, but the other ones hung on update 
operations. In this state it was possible to create a new connection 
from pgAdmin and do some select and update operations. In addition, I 
can say that we use only read-committed transactions and all operations 
are based on prepared statements which are reused.


It may mean that PGSemaphoreUnlock(PGSemaphore sema) was called on an 
unintended sema object. That would be why PGSemaphoreUnlock() failed for 
the unintended sema object, while the PGSemaphoreUnlock() for the intended 
sema object *never* happened. That would explain why other connections 
hung on update operations.
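This hypothesis can be illustrated with a toy Python sketch. The names are hypothetical, and it assumes the max count of 1 mentioned in the thread. In the scenario, the wakeup meant for backend B is posted to backend A's semaphore, which already holds a pending post. The release fails with 298, and B's semaphore is never signalled, so B would sleep indefinitely. This only models the scenario described here; it does not show that this is what actually happened.

```python
ERROR_TOO_MANY_POSTS = 298  # Win32 "too many posts" error

MAX_COUNT = 1  # assumption: each per-backend semaphore has a max count of 1


def unlock(counts: dict[str, int], backend: str) -> int:
    """Post one wakeup to a backend's semaphore; 0 on success, else 298."""
    if counts[backend] + 1 > MAX_COUNT:
        return ERROR_TOO_MANY_POSTS  # count unchanged
    counts[backend] += 1
    return 0


def misdirected_wakeup() -> tuple[int, dict[str, int]]:
    # Backend A already has one pending wakeup; backend B is waiting (count 0).
    counts = {"A": 1, "B": 0}
    # The wakeup intended for B is posted to A instead.
    rc = unlock(counts, "A")
    return rc, counts
```

`misdirected_wakeup()` returns `(298, {"A": 1, "B": 0})`. The error surfaces in the waker, while B's count stays at 0, which would match connections hanging rather than failing.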


I think it is quite a reasonable possibility ;)

Regards, Marcin
