From what you've written I am not sure where the problem is. If you can
point us to some unit tests or some other code that is not working, we can
help with the "pandas" way of doing things. If changes are needed in
PySpark this would be good motivation.

On Tue, Apr 25, 2017 at 6:40 PM Bryan Cutler <cutl...@gmail.com> wrote:

> Thanks Wes.  I think I've managed to confuse myself pretty thoroughly over
> this; I'm not sure where the fix should be.  Spark, by default, will store a
> timestamp internally using Python's "time.mktime", which works in local time
> and not UTC, I believe.  If there is a tzinfo object, Spark will use
> "calendar.timegm" instead, and then I get the correct values.  Maybe this is
> a Spark issue?
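>
> For reference, here is a minimal sketch of the difference between those two
> stdlib calls (this is not Spark's actual conversion code, just the two
> functions side by side):
>
> import calendar
> import time
> from datetime import datetime
>
> dt = datetime(2011, 1, 1, 1, 1, 1)
> # time.mktime() interprets the struct_time as *local* time
> local_secs = time.mktime(dt.timetuple())
> # calendar.timegm() interprets the same struct_time as UTC
> utc_secs = calendar.timegm(dt.timetuple())
> # the two results differ by the local UTC offset, in seconds
> print(local_secs - utc_secs)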
>
> On Tue, Apr 25, 2017 at 11:52 AM, Wes McKinney <wesmck...@gmail.com>
> wrote:
>
> > hi Bryan,
> >
> > You will want to create DataFrame objects having datetime64[ns] columns.
> > There are some examples in the pyarrow test suite:
> >
> > https://github.com/apache/arrow/blob/master/python/pyarrow/tests/test_convert_pandas.py#L324
> >
> > You can convert an array of datetime.datetime objects to datetime64[ns]
> > dtype with pandas.to_datetime
> >
> > In [15]: df = pd.DataFrame(data)
> >
> > In [16]: df['timestamp_t'] = pd.to_datetime(df.timestamp_t)
> >
> > In [17]: df.dtypes
> > Out[17]:
> > timestamp_t    datetime64[ns]
> > dtype: object
> >
> > pd.to_datetime does not seem to work with the NaiveTZ object here (if
> > Jeff Reback is reading, maybe he can explain why); why do you need that
> > for tz-naive data? If that's something we absolutely need fixed in
> > pandas, we should try to do it right away since the 0.20 rc is pending
> > right now.
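> >
> > To make the tz-naive case concrete, here is a quick sketch (plain naive
> > datetimes, no custom tzinfo; the replace() call is just one way to strip
> > an existing tzinfo):
> >
> > from datetime import datetime
> > import pandas as pd
> >
> > naive = [datetime(2011, 1, 1, 1, 1, 1)]
> > converted = pd.to_datetime(naive)
> > print(converted.dtype)  # datetime64[ns]
> >
> > # an already tz-aware value can be made naive first:
> > # dt.replace(tzinfo=None)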
> >
> > - Wes
> >
> > On Tue, Apr 25, 2017 at 1:38 PM, Bryan Cutler <cutl...@gmail.com> wrote:
> >
> > > I am writing a unit test to compare that a Pandas DataFrame made by
> > > Arrow is equal to one constructed directly from the data.  The
> > > timestamp values are Python datetime objects with a timezone tzinfo
> > > object.  When I compare the results, the values are equal but the
> > > schema is not.  Using Arrow, the type is "datetime64[ns]" and without
> > > it is "object".  Without a tzinfo the types match, but I do need it
> > > there for the conversion with Arrow data.  I could just replace the
> > > tzinfo for the Pandas DataFrame; it is a naive timezone with
> > > utcoffset=None.  Does anyone know another way to produce compatible
> > > types?  I do need the data to be compatible with Spark too.
> > > Hopefully this makes sense; I could attach some code if that would
> > > help, thanks!  Here is a sample of the data:
> > >
> > > from datetime import datetime, tzinfo
> > >
> > > import pandas as pd
> > >
> > > class NaiveTZ(tzinfo):
> > >     def utcoffset(self, date_time):
> > >         return None
> > >
> > >     def dst(self, date_time):
> > >         return None
> > >
> > > data = {"timestamp_t": [datetime(2011, 1, 1, 1, 1, 1, tzinfo=NaiveTZ())]}
> > >
> > > pd.DataFrame(data)
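> > >
> > > To be concrete about "replace the tzinfo", this is roughly what I have
> > > in mind (a sketch; it strips the naive tzinfo and converts with
> > > pd.to_datetime):
> > >
> > > df = pd.DataFrame(data)
> > > print(df.dtypes)  # timestamp_t comes back as "object" here
> > >
> > > df["timestamp_t"] = pd.to_datetime(
> > >     df["timestamp_t"].apply(lambda d: d.replace(tzinfo=None)))
> > > print(df.dtypes)  # timestamp_t is now datetime64[ns]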
> > >
> >
>
