[ https://issues.apache.org/jira/browse/THRIFT-2918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791733#comment-14791733 ]
ASF GitHub Bot commented on THRIFT-2918:
----------------------------------------

Github user bufferoverflow commented on the pull request:

    https://github.com/apache/thrift/pull/606#issuecomment-140994555

    committed

> Race condition in Python TProcessPoolServer test
> ------------------------------------------------
>
>                 Key: THRIFT-2918
>                 URL: https://issues.apache.org/jira/browse/THRIFT-2918
>             Project: Thrift
>          Issue Type: Bug
>          Components: Python - Library
>         Environment: openSUSE 13.2
> Python 2.7.8 (default, Sep 30 2014, 15:34:38) [GCC] on linux2
>            Reporter: Jens Geyer
>            Assignee: Jens Geyer
>             Fix For: 0.9.3
>
>         Attachments: THRIFT-2918-python-test-timeouts.patch
>
>
> {{make check}} gets stuck very reproducibly in my VM at Test run #218:
> {code}
> Test run #217: (includes gen-py-default) Server=TProcessPoolServer,
> Proto=accel, zlib=False, SSL=False
> Testing server TProcessPoolServer: /usr/bin/python ./TestServer.py
> --genpydir=gen-py-default --protocol=accel --port=9090 TProcessPoolServer
> Testing client: /usr/bin/python ./TestClient.py --genpydir=gen-py-default
> --protocol=accel --port=9090 --transport=buffered
> ...testException(Safe)
> testException(Xception)
> testException(throw_undeclared)
> ...............
> ----------------------------------------------------------------------
> Ran 18 tests in 0.563s
> OK
> Giving TProcessPoolServer (proto=accel,zlib=False,ssl=False) an extra 3
> seconds for childprocesses to terminate via alarm
> Terminating worker: <Process(Process-1, started daemon)>
> Terminating worker: <Process(Process-2, started daemon)>
> Terminating worker: <Process(Process-3, started daemon)>
> Terminating worker: <Process(Process-4, started daemon)>
> Terminating worker: <Process(Process-5, started daemon)>
> Requesting server to stop()
> OK: Finished (includes gen-py-default) TProcessPoolServer / accel proto /
> zlib=False / SSL=False. 217 combinations tested.
> Test run #218: (includes gen-py-default) Server=TProcessPoolServer,
> Proto=accel, zlib=False, SSL=True
> Testing server TProcessPoolServer: /usr/bin/python ./TestServer.py
> --genpydir=gen-py-default --protocol=accel --port=9090 --ssl
> TProcessPoolServer
> Testing client: /usr/bin/python ./TestClient.py --genpydir=gen-py-default
> --protocol=accel --port=9090 --ssl --transport=buffered
> ...testException(Safe)
> testException(Xception)
> testException(throw_undeclared)
> ..........Terminating worker: <Process(Process-1, started daemon)>
> Terminating worker: <Process(Process-2, started daemon)>
> Terminating worker: <Process(Process-3, started daemon)>
> Terminating worker: <Process(Process-4, started daemon)>
> Terminating worker: <Process(Process-5, started daemon)>
> Requesting server to stop()
> {code}
> After fiddling around with it a bit, I got it to work by increasing the
> alarm() timeout from 2 seconds to 4 seconds.
> I'm not a Python expert, but the code looks somewhat interesting to me:
> - The server code starts the workers, but some piece of code outside of the
> server is responsible for terminating them. Is that really idiomatic in
> Python or just bad design?
> - The Condition() object used in TProcessPoolServer.py should probably be
> replaced by an Event() object, especially as Condition.wait() [seems to have
> its own
> perils|http://stackoverflow.com/questions/24137480/threading-condition-waittimeout-ignores-threading-condition-notify]
> and Event is much easier to use.
> - Calling Condition.acquire() without a matching release() within a {{while
> True:}} loop also does not look very convincing to me. AFAIK the second call
> to acquire() will block, if that ever happens (it did not in my tests).
> The bad news is that none of the proposed changes above had any effect on
> the race condition. Only increasing the timeout helped, but that is merely a
> workaround, not a solution.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
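
As an illustration of the Event() suggestion in the quoted report, here is a minimal sketch of a stop signal built on an Event rather than a Condition. It is not the TProcessPoolServer code: the class `StoppableLoop` and the function `run_demo` are hypothetical names, and `threading.Event` is used for brevity (`multiprocessing.Event` offers the same `set()`/`wait()` interface). Note that, per the report, this change alone did not cure the race in the test suite; the sketch only shows why an Event is the simpler primitive.

```python
import threading


class StoppableLoop(object):
    """Hypothetical stand-in for a server's blocking serve() call."""

    def __init__(self):
        # An Event remembers that it was set, so a set() issued before
        # wait() is not lost. A Condition.notify() with no waiter is.
        self._stop_event = threading.Event()
        self.stopped_cleanly = False

    def serve(self, timeout=5.0):
        # wait() returns True once the event is set, False on timeout.
        # No acquire()/release() pairing is needed around it.
        self.stopped_cleanly = self._stop_event.wait(timeout)

    def stop(self):
        self._stop_event.set()


def run_demo():
    loop = StoppableLoop()
    # Request the stop *before* the serving thread starts waiting,
    # the ordering that would lose a Condition.notify().
    loop.stop()
    worker = threading.Thread(target=loop.serve)
    worker.start()
    worker.join()
    return loop.stopped_cleanly
```

Calling `run_demo()` exercises the "stop arrives first" ordering; with an Event, `serve()` still returns immediately instead of waiting for the timeout.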