Chandler May created THRIFT-4042:
------------------------------------

             Summary: ExtractionError when using accelerated thrift in a multiprocess test
                 Key: THRIFT-4042
                 URL: https://issues.apache.org/jira/browse/THRIFT-4042
             Project: Thrift
          Issue Type: Bug
          Components: Python - Library
    Affects Versions: 0.10.0
            Reporter: Chandler May

We recently switched to thrift 0.10.0 with accelerated protocols and started getting sporadic errors in tests that use the multiprocessing module of the form:

{code}
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/usr/lib64/python2.7/multiprocessing/pool.py:250: in map
    return self.map_async(func, iterable, chunksize).get()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <multiprocessing.pool.MapResult object at 0x2e06950>, timeout = None

    def get(self, timeout=None):
        self.wait(timeout)
        if not self._ready:
            raise TimeoutError
        if self._success:
            return self._value
        else:
>           raise self._value
E           ExtractionError: Can't extract file(s) to egg cache
E
E           The following error occurred while trying to extract file(s) to the Python egg
E           cache:
E
E             [Errno 17] File exists: '/home/concrete/.cache/Python-Eggs'
E
E           The Python egg cache directory is currently set to:
E
E             /home/concrete/.cache/Python-Eggs
E
E           Perhaps your account does not have write access to this directory?  You can
E           change the cache directory by setting the PYTHON_EGG_CACHE environment
E           variable to point to an accessible directory.

/usr/lib64/python2.7/multiprocessing/pool.py:554: ExtractionError
{code}

This particular error arose from a test we wrote to isolate the issue. It is of the form:

{code}
from multiprocessing import Pool

input_path = '/path/to/thrift_serialized_data'
num_trials = 100
num_procs = 2
num_tasks = 4

for i in xrange(num_trials):
    pool = Pool(num_procs)
    results = pool.map(_deserialize, [input_path] * num_tasks)
    for result in results:
        assert result is True
{code}

where {{_deserialize}} is a function that reads thrift serialized objects from a file and returns {{True}} on success. I can provide MWE if necessary but it would take some time on my part.

I want to stress that this only happens when using the new accelerated protocol in thrift 0.10.0 and only happens in {{python setup.py test}} in our project when thrift is *not* installed already on the system. We are using pytest but I'm not sure whether that's important. At test time thrift gets installed/unpacked as an egg in a local directory and gets a locking error.

I believe this is the same error as:

http://dev.list.galaxyproject.org/python-egg-cache-exists-error-td4656276.html
http://www.georgevreilly.com/blog/2015/01/28/PythonEggCache.html

I believe the documentation indicates this problem can be worked around by setting {{zip_safe}} to {{False}} in {{setup.py}}:

http://setuptools.readthedocs.io/en/latest/setuptools.html

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)