jorisvandenbossche commented on issue #38260: URL: https://github.com/apache/arrow/issues/38260#issuecomment-1762889496
I was going to comment yesterday that this is quite likely an issue on the pandas side, which the comments above have since confirmed. Coincidentally, I was just looking at a performance regression report in pandas (https://github.com/pandas-dev/pandas/issues/55245) that shows the same culprit as the py-spy image from @amoeba above: `Manager.iget`, which is what is used under the hood to access a column. So it is indeed repeated column lookup in wide dataframes that has become significantly slower. It's a regression from 2.1.0 to 2.1.1, caused by https://github.com/pandas-dev/pandas/pull/55008#issuecomment-1759810383
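
For anyone who wants to check this locally, here is a minimal sketch of the access pattern described above: looking up every column of a wide dataframe, one at a time. The helper name `time_column_lookups` and the default shape are assumptions chosen for illustration; they are not taken from either issue. Running it under pandas 2.1.0 and 2.1.1 and comparing the timings should give a rough idea of whether the slowdown reproduces in your environment.

```python
import time

import numpy as np
import pandas as pd


def time_column_lookups(n_rows: int = 10, n_cols: int = 2000) -> float:
    """Return the seconds spent accessing every column of a wide DataFrame once.

    Hypothetical helper for illustration only; the shape is an arbitrary choice.
    """
    columns = [f"c{i}" for i in range(n_cols)]
    df = pd.DataFrame(np.zeros((n_rows, n_cols)), columns=columns)

    start = time.perf_counter()
    for name in columns:
        # Each lookup is a separate column access on the wide frame.
        df[name]
    return time.perf_counter() - start


if __name__ == "__main__":
    # Print the pandas version next to the timing so runs can be compared.
    print(pd.__version__, time_column_lookups())
```

The absolute numbers depend on the machine, so only the comparison between pandas versions is meaningful.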
