[ https://issues.apache.org/jira/browse/THRIFT-4042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15840532#comment-15840532 ]
Chandler May commented on THRIFT-4042:
--------------------------------------

Clarified that the bug occurs when thrift is installed via setuptools, as a dependency of our project---but not when thrift is installed beforehand via pip.

> ExtractionError when using accelerated thrift in a multiprocess test
> --------------------------------------------------------------------
>
>                 Key: THRIFT-4042
>                 URL: https://issues.apache.org/jira/browse/THRIFT-4042
>             Project: Thrift
>          Issue Type: Bug
>          Components: Python - Library
>    Affects Versions: 0.10.0
>            Reporter: Chandler May
>
> We recently switched to thrift 0.10.0 with accelerated protocols and started
> getting sporadic errors in tests that use the multiprocessing module of the
> form:
> {code}
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> /usr/lib64/python2.7/multiprocessing/pool.py:250: in map
>     return self.map_async(func, iterable, chunksize).get()
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> self = <multiprocessing.pool.MapResult object at 0x2e06950>, timeout = None
>     def get(self, timeout=None):
>         self.wait(timeout)
>         if not self._ready:
>             raise TimeoutError
>         if self._success:
>             return self._value
>         else:
> >           raise self._value
> E           ExtractionError: Can't extract file(s) to egg cache
> E
> E           The following error occurred while trying to extract file(s) to the Python egg
> E           cache:
> E
> E             [Errno 17] File exists: '/home/concrete/.cache/Python-Eggs'
> E
> E           The Python egg cache directory is currently set to:
> E
> E             /home/concrete/.cache/Python-Eggs
> E
> E           Perhaps your account does not have write access to this directory? You can
> E           change the cache directory by setting the PYTHON_EGG_CACHE environment
> E           variable to point to an accessible directory.
> /usr/lib64/python2.7/multiprocessing/pool.py:554: ExtractionError
> {code}
> This particular error arose from a test we wrote to isolate the issue. It is
> of the form:
> {code}
> from multiprocessing import Pool
> input_path = '/path/to/thrift_serialized_data'
> num_trials = 100
> num_procs = 2
> num_tasks = 4
>
> for i in xrange(num_trials):
>     pool = Pool(num_procs)
>     results = pool.map(_deserialize, [input_path] * num_tasks)
>     for result in results:
>         assert result is True
> {code}
> where {{_deserialize}} is a function that reads thrift serialized objects
> from a file and returns {{True}} on success. I can provide MWE if necessary
> but it would take some time on my part.
> I want to stress that this only happens when using the new accelerated
> protocol in thrift 0.10.0 and only happens in {{python setup.py test}} in our
> project when thrift has not been installed via *pip* (but has been installed
> by {{python setup.py install}} in our project, which depends on thrift). We
> are using pytest but I'm not sure whether that's important. At test time
> thrift gets installed/unpacked as an egg in a local directory and gets a
> locking error. I believe this is the same error as:
> http://dev.list.galaxyproject.org/python-egg-cache-exists-error-td4656276.html
> http://www.georgevreilly.com/blog/2015/01/28/PythonEggCache.html
> I believe the documentation indicates this problem can be worked around by
> setting {{zip_safe}} to {{False}} in {{setup.py}}:
> http://setuptools.readthedocs.io/en/latest/setuptools.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
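
For illustration, the `zip_safe` workaround mentioned at the end of the report could look roughly like the sketch below. It is not thrift's actual `setup.py`: the package name, version, package list and the `build_setup_kwargs` helper are all hypothetical. It also assumes the flag is set in the `setup.py` of the package that ships the compiled extension, since a project's own `zip_safe` setting applies to that project's egg and not to its dependencies. With `zip_safe=False`, setuptools installs the egg as an unpacked directory, so the extension module should not need to be extracted to the egg cache at import time.

```python
# setup.py (sketch): every name below is a placeholder, not thrift's real metadata.


def build_setup_kwargs():
    """Return the keyword arguments that would be passed to setuptools.setup()."""
    return dict(
        name='example_pkg',        # hypothetical package name
        version='0.0.1',           # hypothetical version
        packages=['example_pkg'],  # hypothetical package list
        # Install as an unpacked directory instead of a zipped egg, so that
        # compiled extensions do not have to be extracted to PYTHON_EGG_CACHE
        # by several worker processes at once.
        zip_safe=False,
    )


def main():
    # Imported lazily so the kwargs can be inspected without setuptools installed.
    from setuptools import setup
    setup(**build_setup_kwargs())

# A real setup.py would end by calling main().
```

If the package is configured this way, `python setup.py install` or `python setup.py test` should leave an unpacked `.egg` directory rather than a zip file, which would sidestep the concurrent extraction seen in the traceback.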