Not sure if this is actually a bug or expected behavior - I filed https://github.com/apache/arrow/issues/34210

On Wed, Feb 15, 2023 at 4:15 PM Li Jin <ice.xell...@gmail.com> wrote:

> Hmm..something feels off here - I did the following experiment on Arrow 11
> and casting timestamp-naive to int64 is much faster than casting
> timestamp-naive to timestamp-utc:
>
> In [16]: %time table.cast(schema_int)
> CPU times: user 114 µs, sys: 30 µs, total: 144 µs
> *Wall time: 231 µs*
> Out[16]:
> pyarrow.Table
> time: int64
> ----
> time: [[0,1,2,3,4,...,99999995,99999996,99999997,99999998,99999999]]
>
> In [17]: %time table.cast(schema_tz)
> CPU times: user 119 ms, sys: 140 ms, total: 260 ms
> *Wall time: 259 ms*
> Out[17]:
> pyarrow.Table
> time: timestamp[ns, tz=UTC]
> ----
> time: [[1970-01-01 00:00:00.000000000,1970-01-01
> 00:00:00.000000001,1970-01-01 00:00:00.000000002,1970-01-01
> 00:00:00.000000003,1970-01-01 00:00:00.000000004,...,1970-01-01
> 00:00:00.099999995,1970-01-01 00:00:00.099999996,1970-01-01
> 00:00:00.099999997,1970-01-01 00:00:00.099999998,1970-01-01
> 00:00:00.099999999]]
>
> In [18]: table
> Out[18]:
> pyarrow.Table
> time: timestamp[ns]
> ----
> time: [[1970-01-01 00:00:00.000000000,1970-01-01
> 00:00:00.000000001,1970-01-01 00:00:00.000000002,1970-01-01
> 00:00:00.000000003,1970-01-01 00:00:00.000000004,...,1970-01-01
> 00:00:00.099999995,1970-01-01 00:00:00.099999996,1970-01-01
> 00:00:00.099999997,1970-01-01 00:00:00.099999998,1970-01-01
> 00:00:00.099999999]]
>
> On Wed, Feb 15, 2023 at 2:52 PM Rok Mihevc <rok.mih...@gmail.com> wrote:
>
>> I'm not sure about (1) but I'm pretty sure for (2) doing a cast of
>> tz-aware timestamp to tz-naive should be a metadata-only change.
>>
>> On Wed, Feb 15, 2023 at 4:19 PM Li Jin <ice.xell...@gmail.com> wrote:
>>
>> > Asking (2) because IIUC this is a metadata operation that could be zero
>> > copy but I am not sure if this is actually the case.
>> >
>> > On Wed, Feb 15, 2023 at 10:17 AM Li Jin <ice.xell...@gmail.com> wrote:
>> >
>> > > Hello!
>> > >
>> > > I have some questions about type casting memory usage with pyarrow
>> > > Table. Let's say I have a pyarrow Table with 100 columns.
>> > >
>> > > (1) if I want to cast n columns to a different type (e.g., float to
>> > > int). What is the smallest memory overhead that I can do? (memory
>> > > overhead of 1 column, n columns or 100 columns?)
>> > >
>> > > (2) if I want to cast n timestamp columns from tz-naive to tz-UTC.
>> > > What is the smallest memory overhead that I can do? (0, 1 column, n
>> > > columns or 100 columns?)
>> > >
>> > > Thanks!
>> > > Li