Oh, thanks, that could be a workaround! I thought pa tables were supposed to
be immutable; is there a safe way to just change the metadata?

On Wed, Feb 15, 2023 at 5:44 PM Rok Mihevc <rok.mih...@gmail.com> wrote:

> Well, that's suboptimal. As a workaround, I suppose you could just change the
> metadata if the array is timezone-aware.
>
> On Wed, Feb 15, 2023 at 10:37 PM Li Jin <ice.xell...@gmail.com> wrote:
>
> > Oh, found this comment:
> >
> > https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/scalar_cast_temporal.cc#L156
> >
> >
> >
> > On Wed, Feb 15, 2023 at 4:23 PM Li Jin <ice.xell...@gmail.com> wrote:
> >
> > > Not sure if this is actually a bug or expected behavior - I filed
> > > https://github.com/apache/arrow/issues/34210
> > >
> > > On Wed, Feb 15, 2023 at 4:15 PM Li Jin <ice.xell...@gmail.com> wrote:
> > >
> > >> Hmm... something feels off here. I did the following experiment on Arrow
> > >> 11, and casting timestamp-naive to int64 is much faster than casting
> > >> timestamp-naive to timestamp-utc:
> > >>
> > >> In [16]: %time table.cast(schema_int)
> > >> CPU times: user 114 µs, sys: 30 µs, total: 144 µs
> > >> *Wall time: 231 µs*
> > >> Out[16]:
> > >> pyarrow.Table
> > >> time: int64
> > >> ----
> > >> time: [[0,1,2,3,4,...,99999995,99999996,99999997,99999998,99999999]]
> > >>
> > >> In [17]: %time table.cast(schema_tz)
> > >> CPU times: user 119 ms, sys: 140 ms, total: 260 ms
> > >> *Wall time: 259 ms*
> > >> Out[17]:
> > >> pyarrow.Table
> > >> time: timestamp[ns, tz=UTC]
> > >> ----
> > >> time: [[1970-01-01 00:00:00.000000000,1970-01-01
> > >> 00:00:00.000000001,1970-01-01 00:00:00.000000002,1970-01-01
> > >> 00:00:00.000000003,1970-01-01 00:00:00.000000004,...,1970-01-01
> > >> 00:00:00.099999995,1970-01-01 00:00:00.099999996,1970-01-01
> > >> 00:00:00.099999997,1970-01-01 00:00:00.099999998,1970-01-01
> > >> 00:00:00.099999999]]
> > >>
> > >> In [18]: table
> > >> Out[18]:
> > >> pyarrow.Table
> > >> time: timestamp[ns]
> > >> ----
> > >> time: [[1970-01-01 00:00:00.000000000,1970-01-01
> > >> 00:00:00.000000001,1970-01-01 00:00:00.000000002,1970-01-01
> > >> 00:00:00.000000003,1970-01-01 00:00:00.000000004,...,1970-01-01
> > >> 00:00:00.099999995,1970-01-01 00:00:00.099999996,1970-01-01
> > >> 00:00:00.099999997,1970-01-01 00:00:00.099999998,1970-01-01
> > >> 00:00:00.099999999]]
> > >>
> > >> On Wed, Feb 15, 2023 at 2:52 PM Rok Mihevc <rok.mih...@gmail.com> wrote:
> > >>
> > >>> I'm not sure about (1), but I'm pretty sure that for (2), casting a
> > >>> tz-aware timestamp to tz-naive should be a metadata-only change.
> > >>>
> > >>> On Wed, Feb 15, 2023 at 4:19 PM Li Jin <ice.xell...@gmail.com> wrote:
> > >>>
> > >>> > Asking about (2) because, IIUC, this is a metadata operation that could
> > >>> > be zero-copy, but I am not sure if this is actually the case.
> > >>> >
> > >>> > On Wed, Feb 15, 2023 at 10:17 AM Li Jin <ice.xell...@gmail.com> wrote:
> > >>> >
> > >>> > > Hello!
> > >>> > >
> > >>> > > I have some questions about the memory usage of type casting with a
> > >>> > > pyarrow Table. Let's say I have a pyarrow Table with 100 columns.
> > >>> > >
> > >>> > > (1) If I want to cast n columns to a different type (e.g., float to
> > >>> > > int), what is the smallest memory overhead I can achieve? (The
> > >>> > > memory of 1 column, n columns, or all 100 columns?)
> > >>> > >
> > >>> > > (2) If I want to cast n timestamp columns from tz-naive to tz-UTC,
> > >>> > > what is the smallest memory overhead I can achieve? (0, 1 column,
> > >>> > > n columns, or 100 columns?)
> > >>> > >
> > >>> > > Thanks!
> > >>> > > Li
> > >>> > >
> > >>> >
> > >>>
> > >>
> >
>
