On Oct 6, 2013, at 12:51 , Norbert Hartl <norb...@hartl.name> wrote:

> I took some time to analyze my current problem with external semaphores. I 
> was just reluctant to raise the limit in my image because I want the problem 
> solved. I logged the management of external semaphores and discovered that 
> the table fills if a connection times out. 
> 
> The problem turns out to be in 
> 
> Socket>>#connectTo: hostAddress port: port waitForConnectionFor: timeout 
>       "Initiate a connection to the given port at the given host 
>       address. Waits until the connection is established or time outs."
>       self connectNonBlockingTo: hostAddress port: port.
>       self
>               waitForConnectionFor: timeout
>               ifTimedOut: [ConnectionTimedOut signal: 'Cannot connect to '
>                                       , (NetNameResolver stringFromAddress: 
> hostAddress) , ':' , port asString]
> 
> When a socket is created three external semaphores are registered in the 
> ExternalSemaphoreTable. If a connection times out the exception is thrown but 
> the Socket still has his resources attached. 
> 
> So e.g. in 
> 
> SocketStream class>>#openConnectionToHost: hostIP port: portNumber timeout: 
> timeout
>       | socket |
>       socket := Socket new.
>       socket connectTo: hostIP port: portNumber waitForConnectionFor: timeout.
>       ^self on: socket
> 
> it holds locally a socket (with semaphores registered) but on exception time 
> the reference to the socket gets lost and the semaphores stay registered. The 
> only way to unregister is on finalization time but I think it should work 
> better. So I would add a destroy before the exception is raised.
> 
> Socket>>#connectTo: hostAddress port: port waitForConnectionFor: timeout 
>       "Initiate a connection to the given port at the given host 
>       address. Waits until the connection is established or time outs."
>       self connectNonBlockingTo: hostAddress port: port.
>       self
>               waitForConnectionFor: timeout
>               ifTimedOut: [
>                       self destroy.
>                       ConnectionTimedOut signal: 'Cannot connect to '
>                                       , (NetNameResolver stringFromAddress: 
> hostAddress) , ':' , port asString]
> 
> I opened a ticket at https://pharo.fogbugz.com/f/cases/11797 but I'm not sure 
> how I am supposed to provide fixes made against a pharo2.0 image. Probably I 
> should fix this againt 3.0 but then I'm still a 2.0 user :) 
> 
> Norbert
Destroying the socket manually there would only be an optimization, right? 
It's still cleaned eventually by a finalization action?
Finalization all of a sudden stopping to work is an unlikely culprit of running 
out of external object space, load above what the size can handle, or actual 
leak(s) are much more probable. 

Thankfully, there are a limited number of places leaks could occur, for Socket 
you have improper uses of acceptFrom:/initialize: (multiple sends to the same 
socket).

In a stock image, there's an obscure way that might occur afaict, if class 
methods acceptFrom:/newXXX are called, and the primitive calls in the new 
instance's acceptFrom:/initialize: returns a handle with status InvalidSocket 
instead of nil.
Then, the external objects will not be unregistered before repeatWithGCIf: 
clears them by calling them again.
(No idea if this can ever really happen though...)

To flush out if such leaks are the cause, it might be useful to adjust 
initialize:/acceptFrom: with helpful logging/errors, as per
https://pharo.fogbugz.com/f/cases/5465/Prevent-accidental-semaphore-leaks-when-using-Socket-acceptFrom

There's a few other users of registerExternalObject:,
- InputEventFetcher >> #startUp (Used to leak when saving but not quitting due 
to missing symmetric unregister in #shutDown, should be fixed in 2.0 )
-  AsynchFile (none uses this anyways right, so didn't bother checking)
- NetNameResolver >> initializeNetwork - (This potentially leaked if network 
could ever be in an uninitialized state 2x in a session, but should be fixed in 
2.0)

As long as both the above are fixed in your image, checking if there are 
non-stock users of registerExternalObject:/initialize:/acceptFrom: is probably 
the best bet to find trouble spots without waiting for another crash.

Cheers,
Henry

Attachment: signature.asc
Description: Message signed with OpenPGP using GPGMail

Reply via email to