[ https://issues.apache.org/jira/browse/SPARK-27389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16812704#comment-16812704 ]
shane knapp edited comment on SPARK-27389 at 4/8/19 6:56 PM:
-------------------------------------------------------------

is this even really a valid timezone? plus, i really don't think this is a jenkins issue per se.

i whipped up some java to check for this timezone, which is there:
{code}
$ java DisplayZoneAndOffSet|grep Pacific-New
US/Pacific-New (UTC-07:00)
{code}

but it's definitely not a valid pytz timezone:
{code}
$ python2.7 -c 'import pytz; print "US/Pacific-New" in pytz.all_timezones'
False
{code}

we're also running the latest version of pytz (according to pip at least):
{code}
$ pip2.7 install -U pytz
Requirement already up-to-date: pytz in /home/anaconda/lib/python2.7/site-packages (2018.9)
$ pip2.7 show pytz
Name: pytz
Version: 2018.9
Summary: World timezone definitions, modern and historical
Home-page: http://pythonhosted.org/pytz
Author: Stuart Bishop
Author-email: stu...@stuartbishop.net
License: MIT
Location: /home/anaconda/lib/python2.7/site-packages
Requires:
Required-by: pandas
{code}

as a work-around... i *could* hack {code}/home/anaconda/lib/python2.7/site-packages/pytz/__init__.py{code} to include US/Pacific-New on all of the workers. ;)


was (Author: shaneknapp):
is this even really a valid timezone? plus, i really don't think this is a jenkins issue per se.

i whipped up some java to check for this timezone, which is there:
{code}
$ java DisplayZoneAndOffSet|grep Pacific-New
US/Pacific-New (UTC-07:00)
{code}

but it's definitely not a valid pytz timezone:
{code}
$ python2.7 -c 'import pytz; print "US/Pacific-New" in pytz.all_timezones'
False
{code}

as a work-around... i *could* hack {code}/home/anaconda/lib/python2.7/site-packages/pytz/__init__.py{code} to include US/Pacific-New on all of the workers. ;)

> pyspark test failures w/ "UnknownTimeZoneError: 'US/Pacific-New'"
> -----------------------------------------------------------------
>
>                 Key: SPARK-27389
>                 URL: https://issues.apache.org/jira/browse/SPARK-27389
>             Project: Spark
>          Issue Type: Task
>          Components: jenkins, PySpark
>    Affects Versions: 3.0.0
>            Reporter: Imran Rashid
>            Assignee: shane knapp
>            Priority: Major
>
> I've seen a few odd PR build failures w/ an error in pyspark tests about
> "UnknownTimeZoneError: 'US/Pacific-New'". eg.
> https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4688/consoleFull
> A bit of searching tells me that US/Pacific-New probably isn't really
> supposed to be a timezone at all:
> https://mm.icann.org/pipermail/tz/2009-February/015448.html
> I'm guessing that this is from some misconfiguration of jenkins. That said,
> I can't figure out what is wrong. There does seem to be a timezone entry for
> US/Pacific-New in {{/usr/share/zoneinfo/US/Pacific-New}} -- but it seems to
> be there on every amp-jenkins-worker, so I dunno why that alone would cause
> this failure only sometimes.
> [~shaneknapp] I am tentatively calling this a "jenkins" issue, but I might be
> totally wrong here and it is really a pyspark problem.
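
The mismatch described above (the name exists under /usr/share/zoneinfo but pytz rejects it) could also be sidestepped without patching pytz on every worker, by normalizing the zone name before it reaches pytz. This is only a minimal sketch and not anything pyspark or pytz does today: the LEGACY_ALIASES table and the resolve_zone_name helper are hypothetical names, and the mapping of US/Pacific-New to America/Los_Angeles is an assumption that should be checked against the tz database before relying on it.

```python
# Hypothetical alias table. Assumption: US/Pacific-New is only a legacy
# alias for America/Los_Angeles; verify against the tz database.
LEGACY_ALIASES = {"US/Pacific-New": "America/Los_Angeles"}


def resolve_zone_name(name, known_zones, aliases=LEGACY_ALIASES):
    """Return a zone name that is present in known_zones.

    name        -- zone name reported by the JVM or the OS
    known_zones -- collection of names the Python side accepts
    aliases     -- fallback mapping from legacy names to current ones
    """
    if name in known_zones:
        return name
    # Fall back to the alias table only when the name itself is unknown.
    alias = aliases.get(name)
    if alias is not None and alias in known_zones:
        return alias
    raise LookupError("unknown time zone: %r" % (name,))


# Stand-in for the set of names the Python side knows about.
demo_zones = {"America/Los_Angeles", "UTC"}
print(resolve_zone_name("US/Pacific-New", demo_zones))  # prints America/Los_Angeles
```

With pytz installed, known_zones could be pytz.all_timezones, the same list checked in the comment above. Keeping the alias table explicit means an unmapped name still fails loudly instead of silently picking some other zone.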
> Full Stack trace from the test failure:
> {noformat}
> ======================================================================
> ERROR: test_to_pandas (pyspark.sql.tests.test_dataframe.DataFrameTests)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "/home/jenkins/workspace/NewSparkPullRequestBuilder@2/python/pyspark/sql/tests/test_dataframe.py", line 522, in test_to_pandas
>     pdf = self._to_pandas()
>   File "/home/jenkins/workspace/NewSparkPullRequestBuilder@2/python/pyspark/sql/tests/test_dataframe.py", line 517, in _to_pandas
>     return df.toPandas()
>   File "/home/jenkins/workspace/NewSparkPullRequestBuilder@2/python/pyspark/sql/dataframe.py", line 2189, in toPandas
>     _check_series_convert_timestamps_local_tz(pdf[field.name], timezone)
>   File "/home/jenkins/workspace/NewSparkPullRequestBuilder@2/python/pyspark/sql/types.py", line 1891, in _check_series_convert_timestamps_local_tz
>     return _check_series_convert_timestamps_localize(s, None, timezone)
>   File "/home/jenkins/workspace/NewSparkPullRequestBuilder@2/python/pyspark/sql/types.py", line 1877, in _check_series_convert_timestamps_localize
>     lambda ts: ts.tz_localize(from_tz, ambiguous=False).tz_convert(to_tz).tz_localize(None)
>   File "/home/anaconda/lib/python2.7/site-packages/pandas/core/series.py", line 2294, in apply
>     mapped = lib.map_infer(values, f, convert=convert_dtype)
>   File "pandas/src/inference.pyx", line 1207, in pandas.lib.map_infer (pandas/lib.c:66124)
>   File "/home/jenkins/workspace/NewSparkPullRequestBuilder@2/python/pyspark/sql/types.py", line 1878, in <lambda>
>     if ts is not pd.NaT else pd.NaT)
>   File "pandas/tslib.pyx", line 649, in pandas.tslib.Timestamp.tz_convert (pandas/tslib.c:13923)
>   File "pandas/tslib.pyx", line 407, in pandas.tslib.Timestamp.__new__ (pandas/tslib.c:10447)
>   File "pandas/tslib.pyx", line 1467, in pandas.tslib.convert_to_tsobject (pandas/tslib.c:27504)
>   File "pandas/tslib.pyx", line 1768, in pandas.tslib.maybe_get_tz (pandas/tslib.c:32362)
>   File "/home/anaconda/lib/python2.7/site-packages/pytz/__init__.py", line 178, in timezone
>     raise UnknownTimeZoneError(zone)
> UnknownTimeZoneError: 'US/Pacific-New'
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org