Thanks for your answer! I read the link [0] and ran those benchmarks, and found that the slow performance on allocate_tuple still exists today. On my CPU it took 0.65 s on CPython 3.8 and 7.6 s on PyPy 3.9. I am wondering whether this could be the cause of the poor pandas performance.
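For reference, here is a minimal pure-Python analogue of the kind of tuple-allocation loop I timed. Note this is only a sketch: the benchmark from [0] allocates tuples through the C API (e.g. `Py_BuildValue`), so on PyPy it also pays the cpyext overhead, which this interpreter-level version does not measure, and its numbers will differ.

```python
import timeit

# Build a small tuple from local variables 10 million times.
# Using locals in the statement (rather than a literal tuple of
# constants) prevents the compiler from constant-folding the tuple,
# so an allocation really happens on every iteration.
elapsed = timeit.timeit("(x, y, z)",
                        setup="x = 1; y = 2.0; z = 'abc'",
                        number=10_000_000)
print(f"10M tuple allocations: {elapsed:.2f}s")
```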
As for my code pattern: I am trying to add PyPy support to my simulation library, whose workflow is roughly:

1. Read data from Excel files.
2. Perform CPU-intensive computations in an object-oriented program involving many objects.
3. Write the simulation data to a SQLite database with pandas, simply using `pd.to_sql`.

When running on PyPy, step 2 was more than 8x faster than on CPython; however, step 3 was 3~5x slower. I have confirmed that this is not a problem in PyPy's sqlite3 module itself, because on a pure-Python SQLite program PyPy was 1.2x~2x faster than CPython. So the slowdown is probably due to the C-API performance problem when calling pandas.

As far as I can see, there are two options: (1) the easiest is to replace pandas with a small pure-Python table I/O library, since my project only imports a few pandas functions; (2) but if one day the performance of pandas on PyPy becomes better (about 0.5x~0.8x of CPython's), the better idea would be to keep using pandas, because most Python programmers know it. Could you please give me some advice on how to solve this problem? Should I choose way (1) and implement a pure-Python table library, or had I better wait for (2)? I am also interested in the PyPy project itself, and wonder whether improving the performance of `Py_BuildValue` is feasible.

Thanks!
Hou
_______________________________________________
pypy-dev mailing list -- pypy-dev@python.org
To unsubscribe send an email to pypy-dev-le...@python.org
https://mail.python.org/mailman3/lists/pypy-dev.python.org/
Member address: arch...@mail-archive.com