[
https://issues.apache.org/jira/browse/ARROW-3909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17660933#comment-17660933
]
Rok Mihevc commented on ARROW-3909:
-----------------------------------
This issue has been migrated to [issue
#20522|https://github.com/apache/arrow/issues/20522] on GitHub. Please see the
[migration documentation|https://github.com/apache/arrow/issues/14542] for
further details.
> [Python] Table.from_pandas call that seemingly should zero copy does not
> ------------------------------------------------------------------------
>
> Key: ARROW-3909
> URL: https://issues.apache.org/jira/browse/ARROW-3909
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Reporter: Wes McKinney
> Priority: Major
> Fix For: 0.12.0
>
>
> While doing some performance testing, I noticed that a {{Table.from_pandas}}
> call that ought to be zero-copy (essentially free) was taking about 50 ms:
> {code}
> import pandas as pd
> import pyarrow as pa
> import numpy as np
> K = 1000
> N = 50000000
> df = pd.DataFrame({'ints': np.tile(np.arange(K), N // K)})
> table = pa.Table.from_pandas(df)
> {code}
> I see
> {code}
> In [14]: timeit table = pa.Table.from_pandas(df)
> 51.9 ms ± 751 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
> {code}
> I haven't determined what's going on (is it counting nulls?), and initial
> attempts to get a flame graph produced a bunch of "unknown" entries.