[ https://issues.apache.org/jira/browse/ARROW-2121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16358990#comment-16358990 ]
ASF GitHub Bot commented on ARROW-2121:
---------------------------------------

robertnishihara commented on issue #1581: ARROW-2121: [Python] Handle object arrays directly in pandas serializer.
URL: https://github.com/apache/arrow/pull/1581#issuecomment-364573786

Some performance numbers. The numbers are somewhat variable if you run the benchmarks multiple times.

```python
import pyarrow as pa
import pandas as pd

df = pd.DataFrame(data={str(i): [i, str(i)] for i in range(10 ** 6)})
```

Before this PR

```python
context = pa.pandas_serialization_context()

%time s = pa.serialize(df, context=context).to_buffer()  # 570ms
%time d = pa.deserialize(s, context=context)  # 485ms

%timeit s = pa.serialize(df, context=context).to_buffer()  # 482ms
%timeit d = pa.deserialize(s, context=context)  # 376ms
```

After this PR

```python
%time s = pa.serialize(df).to_buffer()  # 577ms
%time d = pa.deserialize(s)  # 672ms

%timeit s = pa.serialize(df).to_buffer()  # 467ms
%timeit d = pa.deserialize(s)  # 349ms
```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

> Consider special casing object arrays in pandas serializers.
> ------------------------------------------------------------
>
>                 Key: ARROW-2121
>                 URL: https://issues.apache.org/jira/browse/ARROW-2121
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python
>            Reporter: Robert Nishihara
>            Priority: Major
>              Labels: pull-request-available
>

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
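
Since the comment notes that the timings vary from run to run, one way to make such numbers easier to compare outside IPython is to repeat each measurement and report both the best and the median run. The following is only a minimal sketch under stated assumptions: `bench` is a hypothetical helper, `pickle` is a stand-in for the serializer under test (it is not what the PR measures), and the payload is a plain dict with 10 ** 4 keys rather than the 10 ** 6-column DataFrame used above.

```python
import pickle
import statistics
import timeit


def bench(fn, repeat=5, number=1):
    """Hypothetical helper: return (best, median) seconds per call over `repeat` runs."""
    runs = [t / number for t in timeit.repeat(fn, repeat=repeat, number=number)]
    return min(runs), statistics.median(runs)


# Smaller stand-in payload with the same key/value shape as the DataFrame data
# in the comment (10 ** 4 keys here instead of 10 ** 6).
data = {str(i): [i, str(i)] for i in range(10 ** 4)}

# Assumption: pickle is only a placeholder for the serializer being benchmarked.
blob = pickle.dumps(data)

ser_best, ser_median = bench(lambda: pickle.dumps(data))
de_best, de_median = bench(lambda: pickle.loads(blob))

print(f"serialize:   best {ser_best * 1e3:.2f} ms, median {ser_median * 1e3:.2f} ms")
print(f"deserialize: best {de_best * 1e3:.2f} ms, median {de_median * 1e3:.2f} ms")
```

Reporting the minimum alongside the median gives a rough sense of how much of a difference between two runs is noise; swapping the two lambdas for the real serialize and deserialize calls would apply the same procedure to the measurements quoted above.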