New issue 2530: segfault with ThreadPool, pandas
https://bitbucket.org/pypy/pypy/issues/2530/segfault-with-threadpool-pandas
mattip:
I can install pandas with pip (it takes a while to build):
```
#!shell
pip install pandas
```
Using the PyPy 5.7.1 release on Ubuntu 16.04, I get a segfault when running
this code (distilled from a pandas test). If I remove the
``from __future__ ...`` line, the segfault disappears (???)
```
#!python
from __future__ import division
from multiprocessing.pool import ThreadPool
from pandas import read_csv, read_table
from pandas.compat import BytesIO, range
import pandas.util.testing as tm
class TestMultithreadTests(object):
    engine = 'python'

    def read_csv(self, *args):
        ret = read_csv(*args, engine='python')
        return ret

    def test_multithread_stringio_read_csv(self):
        # see gh-11786
        max_row_range = 10000
        num_files = 100
        bytes_to_df = [
            '\n'.join(
                ['%d,%d,%d' % (i, i, i) for i in range(max_row_range)]
            ).encode() for j in range(num_files)]
        files = [BytesIO(b) for b in bytes_to_df]
        # read all files in many threads
        pool = ThreadPool(8)
        results = pool.map(self.read_csv, files)
        first_result = results[0]
        for result in results:
            tm.assert_frame_equal(first_result, result)


if __name__ == '__main__':
    t = TestMultithreadTests()
    t.test_multithread_stringio_read_csv()
```
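To help narrow this down, here is a rough pandas-free sketch of the same pattern: many in-memory CSV buffers parsed concurrently through ``ThreadPool.map``. The names ``parse_rows`` and ``run`` are hypothetical, and the plain-Python parser only stands in for ``read_csv``. I have not checked whether this variant segfaults, so treat it as a starting point for bisecting and not as a confirmed reproducer.
```
#!python
from __future__ import division

import io
from multiprocessing.pool import ThreadPool


def parse_rows(buf):
    # Hypothetical stand-in for pandas.read_csv: decode the buffer and
    # turn every "a,b,c" line into a tuple of ints.
    lines = buf.getvalue().decode().splitlines()
    return [tuple(int(x) for x in line.split(',')) for line in lines]


def run(num_files=100, max_row_range=10000, workers=8):
    # Same payload shape as the pandas test: identical three-column rows.
    payload = '\n'.join(
        '%d,%d,%d' % (i, i, i) for i in range(max_row_range)).encode()
    files = [io.BytesIO(payload) for _ in range(num_files)]
    # read all buffers in many threads
    pool = ThreadPool(workers)
    try:
        results = pool.map(parse_rows, files)
    finally:
        pool.close()
        pool.join()
    return results


if __name__ == '__main__':
    results = run()
    assert all(r == results[0] for r in results)
```
If this runs cleanly under the same interpreter, the threading pattern alone is probably not enough to trigger the crash, and the pandas side would be the next place to look.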