Well, the problem is that time zones are really finicky when comparing
Spark (which uses a local-time interpretation of timestamps without a
time zone) and Arrow (which has both naive timestamps -- a concept
similar to, but different from, the SQL concept TIMESTAMP WITHOUT TIME
ZONE -- and tz-aware timestamps). So somewhere a time zone is being
stripped or applied/localized, which may cause the data transferred
to/from Spark to be shifted by the time zone offset. I think it's
important that we determine what the problem is -- if it's a problem
that has to be fixed in Arrow (and it's not clear to me that it is),
it's worth spending some time to understand what's going on to avoid
the possibility of a patch release on account of this.
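To make the two interpretations concrete, here is a stdlib-only sketch
(the America/Los_Angeles zone and the 2018-03-10 value are just
illustrations, not taken from any particular code path) of how the same
naive wall-clock timestamp lands on two different instants depending on
whether it is assumed to be UTC or localized to a session time zone:

```python
# Illustrative sketch: where an off-by-offset shift can creep in when a
# naive timestamp crosses a boundary between two interpretations.
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

naive = datetime(2018, 3, 10, 0, 0)  # no tzinfo attached

# Interpretation 1: treat the wall-clock value as already being UTC
as_utc = naive.replace(tzinfo=timezone.utc)

# Interpretation 2: localize the wall-clock value to a session zone
as_local = naive.replace(tzinfo=ZoneInfo("America/Los_Angeles"))

# The two interpretations name instants that differ by the zone offset
offset_seconds = as_local.timestamp() - as_utc.timestamp()
print(offset_seconds / 3600)  # 8.0 (PST is UTC-8 on this date)
```

If one side of a Spark/Arrow round trip picks interpretation 1 and the
other picks interpretation 2, the data comes back shifted by exactly
this offset.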

On Sun, Jul 19, 2020 at 6:12 PM Neal Richardson
<neal.p.richard...@gmail.com> wrote:
>
> If it’s a display problem, should it block the release?
>
> Sent from my iPhone
>
> > On Jul 19, 2020, at 3:57 PM, Wes McKinney <wesmck...@gmail.com> wrote:
> >
> > I opened https://issues.apache.org/jira/browse/ARROW-9525 about the
> > display problem. My guess is that there are other problems lurking
> > here
> >
> >> On Sun, Jul 19, 2020 at 5:54 PM Wes McKinney <wesmck...@gmail.com> wrote:
> >>
> >> hi Bryan,
> >>
> >> This is a display bug
> >>
> >> In [6]: arr = pa.array([0, 1, 2], type=pa.timestamp('ns',
> >> 'America/Los_Angeles'))
> >>
> >> In [7]: arr.view('int64')
> >> Out[7]:
> >> <pyarrow.lib.Int64Array object at 0x7fd1b8aaef30>
> >> [
> >>  0,
> >>  1,
> >>  2
> >> ]
> >>
> >> In [8]: arr
> >> Out[8]:
> >> <pyarrow.lib.TimestampArray object at 0x7fd1b8aae6e0>
> >> [
> >>  1970-01-01 00:00:00.000000000,
> >>  1970-01-01 00:00:00.000000001,
> >>  1970-01-01 00:00:00.000000002
> >> ]
> >>
> >> In [9]: arr.to_pandas()
> >> Out[9]:
> >> 0             1969-12-31 16:00:00-08:00
> >> 1   1969-12-31 16:00:00.000000001-08:00
> >> 2   1969-12-31 16:00:00.000000002-08:00
> >> dtype: datetime64[ns, America/Los_Angeles]
> >>
> >> the repr of TimestampArray doesn't take into account the timezone
> >>
> >> In [10]: arr[0]
> >> Out[10]: <pyarrow.TimestampScalar: Timestamp('1969-12-31
> >> 16:00:00-0800', tz='America/Los_Angeles')>
> >>
> >> So if it's incorrect, the problem is happening somewhere before or
> >> while the StructArray is being created. If I had to guess, it's caused
> >> by the tzinfo of the datetime.datetime values not being handled in the
> >> way that they were before.
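For illustration, a stdlib-only sketch of the two ways tzinfo can be
handled when reducing datetime.datetime values to stored fields. This is
a hypothetical reconstruction, not the actual pyarrow code path; the
"suspected mishandling" branch happens to reproduce the unshifted
00:00:00 wall-clock values seen in Bryan's StructArray output below:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# A hypothetical tz-aware input value at the conversion boundary
aware = datetime(2018, 3, 10, 0, 0, tzinfo=ZoneInfo("America/Los_Angeles"))

# Correct handling: normalize to UTC before storing epoch-relative fields
utc_fields = aware.astimezone(timezone.utc).replace(tzinfo=None)

# Suspected mishandling: drop tzinfo, storing local wall clock as if UTC
wall_fields = aware.replace(tzinfo=None)

print(utc_fields)   # 2018-03-10 08:00:00
print(wall_fields)  # 2018-03-10 00:00:00
```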
> >>
> >>> On Sun, Jul 19, 2020 at 5:19 PM Wes McKinney <wesmck...@gmail.com> wrote:
> >>>
> >>> Well, this is not good, and pretty disappointing given that we had
> >>> nearly a month to sort through the implications of Micah's patch. We
> >>> should try to resolve this ASAP
> >>>
> >>> On Sun, Jul 19, 2020 at 5:10 PM Bryan Cutler <cutl...@gmail.com> wrote:
> >>>>
> >>>> +0 (non-binding)
> >>>>
> >>>> I ran the verification script for binaries and then source, as below,
> >>>> and both look good:
> >>>> ARROW_TMPDIR=/tmp/arrow-test TEST_DEFAULT=0 TEST_SOURCE=1 TEST_CPP=1
> >>>> TEST_PYTHON=1 TEST_JAVA=1 TEST_INTEGRATION_CPP=1 TEST_INTEGRATION_JAVA=1
> >>>> dev/release/verify-release-candidate.sh source 1.0.0 1
> >>>>
> >>>> I tried to patch Spark locally to verify the recent change in nested
> >>>> timestamps and was not able to get things working quite right, but I'm
> >>>> not sure if the problem is in Spark, Arrow, or my patch - hence my
> >>>> vote of +0.
> >>>>
> >>>> Here is what I'm seeing
> >>>>
> >>>> ```
> >>>> (Input as datetime)
> >>>> datetime.datetime(2018, 3, 10, 0, 0)
> >>>> datetime.datetime(2018, 3, 15, 0, 0)
> >>>>
> >>>> (Struct Array)
> >>>> -- is_valid: all not null
> >>>> -- child 0 type: timestamp[us, tz=America/Los_Angeles]
> >>>>  [
> >>>>    2018-03-10 00:00:00.000000,
> >>>>    2018-03-10 00:00:00.000000
> >>>>  ]
> >>>> -- child 1 type: timestamp[us, tz=America/Los_Angeles]
> >>>>  [
> >>>>    2018-03-15 00:00:00.000000,
> >>>>    2018-03-15 00:00:00.000000
> >>>>  ]
> >>>>
> >>>> (Flattened Arrays)
> >>>> types [TimestampType(timestamp[us, tz=America/Los_Angeles]),
> >>>> TimestampType(timestamp[us, tz=America/Los_Angeles])]
> >>>> [<pyarrow.lib.TimestampArray object at 0x7ffbbd88f520>
> >>>> [
> >>>>  2018-03-10 00:00:00.000000,
> >>>>  2018-03-10 00:00:00.000000
> >>>> ], <pyarrow.lib.TimestampArray object at 0x7ffba958be50>
> >>>> [
> >>>>  2018-03-15 00:00:00.000000,
> >>>>  2018-03-15 00:00:00.000000
> >>>> ]]
> >>>>
> >>>> (Pandas Conversion)
> >>>> [
> >>>> 0   2018-03-09 16:00:00-08:00
> >>>> 1   2018-03-09 16:00:00-08:00
> >>>> dtype: datetime64[ns, America/Los_Angeles],
> >>>>
> >>>> 0   2018-03-14 17:00:00-07:00
> >>>> 1   2018-03-14 17:00:00-07:00
> >>>> dtype: datetime64[ns, America/Los_Angeles]]
> >>>> ```
> >>>>
> >>>> Based on the output of an existing, correct timestamp udf, it looks
> >>>> like the pyarrow StructArray values are wrong, and that's carried
> >>>> through to the flattened arrays, causing the Pandas values to have a
> >>>> negative offset.
> >>>>
> >>>> Here is output from a working udf with a timestamp; the pyarrow Array
> >>>> displays in UTC time, I believe.
> >>>>
> >>>> ```
> >>>> (Timestamp Array)
> >>>> type timestamp[us, tz=America/Los_Angeles]
> >>>> [
> >>>>  [
> >>>>    1969-01-01 09:01:01.000000
> >>>>  ]
> >>>> ]
> >>>>
> >>>> (Pandas Conversion)
> >>>> 0   1969-01-01 01:01:01-08:00
> >>>> Name: _0, dtype: datetime64[ns, America/Los_Angeles]
> >>>>
> >>>> (Timezone Localized)
> >>>> 0   1969-01-01 01:01:01
> >>>> Name: _0, dtype: datetime64[ns]
> >>>> ```
> >>>>
> >>>> I'll have to dig in further at another time and debug where the values go
> >>>> wrong.
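As a sanity check on the working-udf output above, the UTC-displayed
Array value and the tz-aware pandas value do describe the same instant;
a stdlib-only sketch (using the 1969-01-01 value from that output):

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# The Array repr shows the stored value as a UTC instant...
utc_val = datetime(1969, 1, 1, 9, 1, 1, tzinfo=timezone.utc)

# ...which pandas renders in the column's zone (UTC-8 in January)
local_val = utc_val.astimezone(ZoneInfo("America/Los_Angeles"))
print(local_val)  # 1969-01-01 01:01:01-08:00
```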
> >>>>
> >>>> On Sat, Jul 18, 2020 at 9:51 PM Micah Kornfield <emkornfi...@gmail.com>
> >>>> wrote:
> >>>>
> >>>>> +1 (binding)
> >>>>>
> >>>>> Ran wheel and binary tests on Ubuntu 19.04
> >>>>>
> >>>>> On Fri, Jul 17, 2020 at 2:25 PM Neal Richardson <
> >>>>> neal.p.richard...@gmail.com>
> >>>>> wrote:
> >>>>>
> >>>>>> +1 (binding)
> >>>>>>
> >>>>>> In addition to the usual verification on
> >>>>>> https://github.com/apache/arrow/pull/7787, I've successfully staged
> >>>>>> the R binary artifacts on Windows
> >>>>>> (https://github.com/r-windows/rtools-packages/pull/126), macOS
> >>>>>> (https://github.com/autobrew/homebrew-core/pull/12), and Linux
> >>>>>> (https://github.com/ursa-labs/arrow-r-nightly/actions/runs/172977277)
> >>>>>> using the release candidate.
> >>>>>>
> >>>>>> And I agree with the judgment about skipping a JS release artifact. 
> >>>>>> Looks
> >>>>>> like there hasn't been a code change since October so there's no point.
> >>>>>>
> >>>>>> Neal
> >>>>>>
> >>>>>> On Fri, Jul 17, 2020 at 10:37 AM Wes McKinney <wesmck...@gmail.com>
> >>>>> wrote:
> >>>>>>
> >>>>>>> I see the JS failures as well. I think it is a failure localized to
> >>>>>>> newer Node versions since our JavaScript CI works fine. I don't think
> >>>>>>> it should block the release given the lack of development activity in
> >>>>>>> JavaScript [1] -- if any JS devs are concerned about publishing an
> >>>>>>> artifact then we can skip pushing it to NPM
> >>>>>>>
> >>>>>>> @Ryan, it seems it may be something environment-related on your
> >>>>>>> machine; I'm on Ubuntu 18.04 and have not seen this.
> >>>>>>>
> >>>>>>>
> >>>>>>>>  * The Python 3.8 wheel's tests failed; 3.5, 3.6 and 3.7 passed.
> >>>>>>>>    It seems that the Cython tests linking -larrow and
> >>>>>>>>    -larrow_python failed.
> >>>>>>>
> >>>>>>> I suspect this is related to
> >>>>>>> https://github.com/apache/arrow/commit/120c21f4bf66d2901b3a353a1f67bac3c3355924#diff-0f69784b44040448d17d0e4e8a641fe8,
> >>>>>>> but I don't think it's a blocking issue
> >>>>>>>
> >>>>>>> [1]: https://github.com/apache/arrow/commits/master/js
> >>>>>>>
> >>>>>>> On Fri, Jul 17, 2020 at 9:42 AM Ryan Murray <rym...@dremio.com> wrote:
> >>>>>>>>
> >>>>>>>> I've tested Java and it looks good. However, the verify script
> >>>>>>>> keeps bailing with protobuf-related errors:
> >>>>>>>> 'cpp/build/orc_ep-prefix/src/orc_ep-build/c++/src/orc_proto.pb.cc'
> >>>>>>>> and friends can't find protobuf definitions. A bit odd, as cmake
> >>>>>>>> can see the protobuf headers, and builds directly off master work
> >>>>>>>> just fine. Has anyone else experienced this? I am on Ubuntu 18.04.
> >>>>>>>>
> >>>>>>>> On Fri, Jul 17, 2020 at 10:49 AM Antoine Pitrou <anto...@python.org>
> >>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>>
> >>>>>>>>> +1 (binding).  I tested on Ubuntu 18.04.
> >>>>>>>>>
> >>>>>>>>> * Wheels verification went fine.
> >>>>>>>>> * Source verification went fine with CUDA enabled and
> >>>>>>>>> TEST_INTEGRATION_JS=0 TEST_JS=0.
> >>>>>>>>>
> >>>>>>>>> I didn't test the binaries.
> >>>>>>>>>
> >>>>>>>>> Regards
> >>>>>>>>>
> >>>>>>>>> Antoine.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On 17/07/2020 at 03:41, Krisztián Szűcs wrote:
> >>>>>>>>>> Hi,
> >>>>>>>>>>
> >>>>>>>>>> I would like to propose the second release candidate (RC1) of
> >>>>>>>>>> Apache Arrow version 1.0.0.
> >>>>>>>>>> This is a major release consisting of 826 resolved JIRA
> >>>>>>>>>> issues [1].
> >>>>>>>>>>
> >>>>>>>>>> The verification of the first release candidate (RC0) failed [0],
> >>>>>>>>>> and the packaging scripts were unable to produce two wheels.
> >>>>>>>>>> Compared to RC0, this release candidate includes additional
> >>>>>>>>>> patches for the following bugs: ARROW-9506, ARROW-9504,
> >>>>>>>>>> ARROW-9497, ARROW-9500, ARROW-9499.
> >>>>>>>>>>
> >>>>>>>>>> This release candidate is based on commit:
> >>>>>>>>>> bc0649541859095ee77d03a7b891ea8d6e2fd641 [2]
> >>>>>>>>>>
> >>>>>>>>>> The source release rc1 is hosted at [3].
> >>>>>>>>>> The binary artifacts are hosted at [4][5][6][7].
> >>>>>>>>>> The changelog is located at [8].
> >>>>>>>>>>
> >>>>>>>>>> Please download, verify checksums and signatures, run the unit
> >>>>>> tests,
> >>>>>>>>>> and vote on the release. See [9] for how to validate a release
> >>>>>>> candidate.
> >>>>>>>>>>
> >>>>>>>>>> The vote will be open for at least 72 hours.
> >>>>>>>>>>
> >>>>>>>>>> [ ] +1 Release this as Apache Arrow 1.0.0
> >>>>>>>>>> [ ] +0
> >>>>>>>>>> [ ] -1 Do not release this as Apache Arrow 1.0.0 because...
> >>>>>>>>>>
> >>>>>>>>>> [0]: https://github.com/apache/arrow/pull/7778#issuecomment-659065370
> >>>>>>>>>> [1]: https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20in%20%28Resolved%2C%20Closed%29%20AND%20fixVersion%20%3D%201.0.0
> >>>>>>>>>> [2]: https://github.com/apache/arrow/tree/bc0649541859095ee77d03a7b891ea8d6e2fd641
> >>>>>>>>>> [3]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-1.0.0-rc1
> >>>>>>>>>> [4]: https://bintray.com/apache/arrow/centos-rc/1.0.0-rc1
> >>>>>>>>>> [5]: https://bintray.com/apache/arrow/debian-rc/1.0.0-rc1
> >>>>>>>>>> [6]: https://bintray.com/apache/arrow/python-rc/1.0.0-rc1
> >>>>>>>>>> [7]: https://bintray.com/apache/arrow/ubuntu-rc/1.0.0-rc1
> >>>>>>>>>> [8]: https://github.com/apache/arrow/blob/bc0649541859095ee77d03a7b891ea8d6e2fd641/CHANGELOG.md
> >>>>>>>>>> [9]: https://cwiki.apache.org/confluence/display/ARROW/How+to+Verify+Release+Candidates
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
