Thanks for your answer! 

I read the link [0] and ran those benchmarks, and found that the slow 
performance of allocate_tuple still exists. On my CPU it takes 0.65 s on 
CPython 3.8 and 7.6 s on PyPy 3.9. I am wondering whether this is what causes 
the poor pandas performance I am seeing.

As for my code pattern: I am trying to add PyPy support to my simulation 
library, whose workflow looks like this (sketched in code right after the 
list):

1. Read input data from Excel files.
2. Perform CPU-intensive computations in an object-oriented program involving 
a large number of objects.
3. Write the simulation results to a SQLite database with pandas, using 
`DataFrame.to_sql`.
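For illustration, here is a minimal sketch of that workflow (the file names, 
the dummy loop in step 2, and the table name are placeholders, not my real 
code):

    import sqlite3
    import pandas as pd

    # Step 1: read input data from an Excel file (placeholder path).
    params = pd.read_excel("inputs.xlsx")  # consumed by the real model

    # Step 2: CPU-intensive, object-oriented simulation; this dummy loop
    # just stands in for the real computation.
    results = [{"step": i, "value": i * 0.5} for i in range(1_000_000)]

    # Step 3: write the results to SQLite through pandas.
    df = pd.DataFrame(results)
    with sqlite3.connect("results.db") as conn:
        df.to_sql("results", conn, if_exists="replace", index=False)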

When running on PyPy, step 2 was more than 8 times faster than on CPython. 
However, step 3 was 3~5 times slower.
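To see where step 3 spends its time, one thing I can do is profile the 
`to_sql` call in isolation (the data frame below is a toy stand-in for my 
real output):

    import cProfile
    import pstats
    import sqlite3
    import pandas as pd

    # Toy data frame standing in for the real simulation output.
    df = pd.DataFrame({"step": range(100_000), "value": range(100_000)})

    profiler = cProfile.Profile()
    with sqlite3.connect(":memory:") as conn:
        profiler.enable()
        df.to_sql("results", conn, if_exists="replace", index=False)
        profiler.disable()

    # Show the 15 most expensive calls by cumulative time.
    pstats.Stats(profiler).sort_stats("cumulative").print_stats(15)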

I have found that it is not a problem in the sqlite3 module shipped with 
PyPy, because on a pure-Python SQLite program PyPy was 1.2x~2x faster than 
CPython. So the slowdown is probably due to the C-API (cpyext) overhead when 
calling into pandas.
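A representative version of such a pure-Python check (the row count and 
schema here are arbitrary, not my exact program):

    import sqlite3
    import time

    N = 1_000_000
    rows = [(i, i * 0.5) for i in range(N)]

    start = time.perf_counter()
    with sqlite3.connect(":memory:") as conn:
        conn.execute("CREATE TABLE results (step INTEGER, value REAL)")
        conn.executemany("INSERT INTO results VALUES (?, ?)", rows)
    print(f"inserted {N} rows in {time.perf_counter() - start:.2f}s")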

As far as I can tell, there are two options: (1) the easiest is to replace 
pandas with a small pure-Python table IO layer, since my project only imports 
a handful of pandas functions; a sketch of what that could look like follows 
below. (2) But if one day pandas on PyPy reaches, say, 0.5x~0.8x of its 
CPython speed, it would be better to keep using pandas, because most Python 
programmers know it.

Could you please give me some suggestions on how to proceed? Should I take 
route (1) and implement a pure-Python table library, or is it better to wait 
for (2)? I am also interested in the PyPy project itself, and am wondering 
whether improving the performance of `Py_BuildValue` is feasible. Thanks!

Hou