Re: [I] [Python] Massive performance deterioration with pandas 2.1.1 vs. 1.5.3 when calling pa.Table.from_pandas() [arrow]

via GitHub Sat, 14 Oct 2023 06:07:12 -0700


jorisvandenbossche commented on issue #38260:
URL: https://github.com/apache/arrow/issues/38260#issuecomment-1762889496


   I was going to comment yesterday that this is quite likely an issue on the 
pandas side, which has since been confirmed by the comments above. And 
coincidentally, I was just looking at a performance regression report in pandas 
(https://github.com/pandas-dev/pandas/issues/55245) that shows the same culprit 
as the py-spy image from @amoeba above: `Manager.iget`, which is what is 
used under the hood to access a column.

   So it is indeed repeated column lookup in wide DataFrames that has become 
significantly slower. It is a regression introduced between 2.1.0 and 2.1.1, 
caused by https://github.com/pandas-dev/pandas/pull/55008#issuecomment-1759810383


