[ https://issues.apache.org/jira/browse/ARROW-6429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16933714#comment-16933714 ]
Bryan Cutler commented on ARROW-6429:
-------------------------------------

[~wesm] the issue with the timestamp test failures appears to be that calling {{to_pandas}} on a pyarrow ChunkedArray with a timezone-aware timestamp type drops the timezone from the resulting dtype. Previously, a pyarrow Column kept the timezone, while a pyarrow Array removed it when converting to a NumPy array.

With Arrow 0.14.1:

{code}
In [4]: import pyarrow as pa
   ...: a = pa.array([1], type=pa.timestamp('us', tz='America/Los_Angeles'))
   ...: c = pa.Column.from_array('ts', a)

In [5]: c.to_pandas()
Out[5]:
0   1969-12-31 16:00:00.000001-08:00
Name: ts, dtype: datetime64[ns, America/Los_Angeles]

In [6]: a.to_pandas()
Out[6]: array(['1970-01-01T00:00:00.000001'], dtype='datetime64[us]')
{code}

With current master:

{code}
>>> import pyarrow as pa
>>> a = pa.array([1], type=pa.timestamp('us', tz='America/Los_Angeles'))
>>> a.to_pandas()
0   1970-01-01 00:00:00.000001
dtype: datetime64[ns]
{code}

After manually adding the timezone back to the series dtype (and fixing the Java compilation), all tests pass and the Spark integration run finished. I wasn't able to look into why the timezone is being removed, though. Should I open a JIRA for this?

> [CI][Crossbow] Nightly spark integration job fails
> --------------------------------------------------
>
>                 Key: ARROW-6429
>                 URL: https://issues.apache.org/jira/browse/ARROW-6429
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Continuous Integration
>            Reporter: Neal Richardson
>            Assignee: Wes McKinney
>            Priority: Blocker
>              Labels: nightly, pull-request-available
>             Fix For: 0.15.0
>
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> See https://circleci.com/gh/ursa-labs/crossbow/2310. Either fix, skip job and
> create followup Jira to unskip, or delete job.
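The "manually adding the timezone back" step from the comment above can be sketched without pyarrow at all, using only pandas. This is a hedged illustration of the workaround, not the actual patch: it assumes the tz-naive series returned by {{to_pandas}} holds UTC-based values (as Arrow stores timestamps internally) and that the target zone is known from the Arrow type's {{tz}} field.

{code}
import pandas as pd

# A tz-naive series, similar to what ChunkedArray.to_pandas currently
# returns on master: UTC-based values with the timezone dropped.
naive = pd.Series(pd.to_datetime([1], unit='us'))
print(naive.dtype)  # datetime64[ns]

# Re-attach the timezone: localize as UTC first (the values are already
# UTC-based), then convert to the zone from the Arrow type.
aware = naive.dt.tz_localize('UTC').dt.tz_convert('America/Los_Angeles')
print(aware.dtype)  # datetime64[ns, America/Los_Angeles]
print(aware[0])     # 1969-12-31 16:00:00.000001-08:00
{code}

Note the order matters: {{tz_localize}} asserts what zone the naive values are in, while {{tz_convert}} only changes the display zone; localizing directly to America/Los_Angeles would shift the underlying instant.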