The wait appears to be the culprit. I've opened a new case for the
quick fix (to error out immediately) and attached a patch. Could you
or Fred test this patch with the test case and let us know the result?

The JIRA issue/patch is available at
https://issues.apache.org/jira/browse/FTPSERVER-360.
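
For illustration only, here is a rough sketch of the fail-fast idea. It
is not the attached patch: PassivePortPool, tryReserve and pasvReply are
hypothetical names rather than FtpServer classes, and the 127,0,0,1
address in the 227 reply is a placeholder. The point is that a request
for a passive port returns immediately instead of calling wait().

```java
import java.util.OptionalInt;
import java.util.TreeSet;

// Hypothetical helper (not part of the FtpServer API): hands out passive
// ports without ever blocking the calling thread.
final class PassivePortPool {
    private final TreeSet<Integer> free = new TreeSet<>();

    PassivePortPool(int firstPort, int lastPort) {
        for (int p = firstPort; p <= lastPort; p++) {
            free.add(p);
        }
    }

    // Returns the lowest free port, or empty if none is available right now.
    synchronized OptionalInt tryReserve() {
        Integer port = free.pollFirst();
        return port == null ? OptionalInt.empty() : OptionalInt.of(port);
    }

    // Gives a port back once the data connection has been closed.
    synchronized void release(int port) {
        free.add(port);
    }

    // Builds a PASV reply: 227 on success, 425 when no port is free.
    String pasvReply() {
        OptionalInt port = tryReserve();
        if (!port.isPresent()) {
            return "425 Can't open data connection.";
        }
        int p = port.getAsInt();
        // Placeholder host; the port is encoded as two bytes (p1,p2).
        return "227 Entering Passive Mode (127,0,0,1," + (p >> 8) + "," + (p & 0xFF) + ")";
    }
}
```

With a single configured port, the first PASV gets a 227 and every
further PASV gets a 425 until release() is called, so no worker thread
is ever parked waiting for a port.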

Regards,
Sai Pullabhotla

On Fri, Mar 26, 2010 at 7:11 AM, David Latorre <dvl...@gmail.com> wrote:
> 2010/3/26 Niklas Gustavsson <nik...@protocol7.com>:
>> On Fri, Mar 26, 2010 at 9:50 AM, Fred Moore <fred.moor...@gmail.com> wrote:
>>> 1\ Priority of passive port sharing enhancement: Niklas's survey shows that we
>>> are indeed in good company here, but it's probably worth having a better
>>> look at this anyway, there might be good technical reasons that led all the
>>> other teams not to support this or it may turn out that it's "simply" because
>>> it's somewhat hard to develop and test.
>>
>> After this discussion I'm significantly less thrilled at implementing
>> shared passive ports :-)
>
> Shared passive ports would be a nice feature if they aren't too hard
> to implement. Among the open-source servers, I think ColoradoFTP (a
> NIO-based Java FTP server under the LGPL license) offered this. Since
> their data connections also use async sockets, this shouldn't be too
> hard for them, but I don't know if they solved the use case described
> by Sai: several sessions open from the same IP. It seems that
> commercial solutions offer this and more...
>
>
>
>>> 2\ Quick fix for 1.0.x codebase: pushing a 40x to the client when no
>>> passive port is available (or probably better: no passive port is available
>>> within X seconds) is probably something we need to do anyway.
>>
>> Thinking some more about this, I'm personally now convinced that we
>> should simply return an error (not wait). I'm not sure what the
>> best reply code should be, but "425 Can't open data connection" seems
>> fitting, although it is not specified as a valid reply to the PASV command.
>>
>>> 3\ Suspected race condition: the problem description for the originally
>>> reported http://issues.apache.org/jira/browse/FTPSERVER-359 (see also repro
>>> code attached to the jira) actually hints at something different as
>>> well: we state that a few (say 20) parallel threads issuing LISTs in
>>> passive mode are able to "lock up" the server forever. Questions:
>>>
>>> 3.1\ Is this entirely explained by this thread discussion? (I don't think
>>> so: the server should *always* be able to recover)
>>
>> Agreed, the server should always recover from a situation like this.
>> After looking into how to fix item 2, we need to rerun your tests and
>> make sure we always survive.
>
> Thinking about this issue my understanding of the problem is as follows:
>
> 1. The number of connections to FtpServer sending the PASV command
> exceeds the Executor thread pool's max size (I think it is 16).
>
> 2. The first one of them requests the only available port and gets it.
> Now the port is in use by a server socket and any subsequent call to
> requestPassivePort will end up invoking wait().
>
> 3. The thread that processed this PASV command is now available and a
> new PASV request is assigned to it.
>
> 4. Now all threads are trying to request a passive port, but since
> there are no ports available  all the threads in the OrderedThreadPool
> get blocked by the wait() method.
>
> I wonder if we are suffering from a similar problem in any other cases; if
> so, we might need to delay the opening of the ServerSocket
> until the LIST (or GET or PUT...) commands are executed.
>
> I hope I made myself clear and that my understanding was right.
>
>
>>> 3.2\ Would this be fixed by a quick fix as per 2\? (likely, but it's sort of
>>> like using nukes for mowing the lawn)
>>
>> I really have no idea, but I think we should fix 2 first and then make
>> sure we handle your test case.
>>
>>> In short my current position can be stated as follows: I think that
>>> FTPSERVER-359 has a different root cause from what we discussed, the problem
>>> impact is not completely known at the moment but it appears to *severely*
>>> affect the server availability... having just one port is an easy way of
>>> reproducing it (but not the cause of it).
>>
>> Agreed.
>>
>> /niklas
>>
>
