[jira] [Commented] (SPARK-27389) pyspark test failures w/ "UnknownTimeZoneError: 'US/Pacific-New'"
[ https://issues.apache.org/jira/browse/SPARK-27389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16813896#comment-16813896 ]

Bryan Cutler commented on SPARK-27389:
--------------------------------------

Thanks [~shaneknapp] for the fix. I couldn't come up with any idea why this was happening all of a sudden either, but at least we are up and running again!

> pyspark test failures w/ "UnknownTimeZoneError: 'US/Pacific-New'"
> -----------------------------------------------------------------
>
>                 Key: SPARK-27389
>                 URL: https://issues.apache.org/jira/browse/SPARK-27389
>             Project: Spark
>          Issue Type: Task
>          Components: jenkins, PySpark
>    Affects Versions: 3.0.0
>            Reporter: Imran Rashid
>            Assignee: shane knapp
>            Priority: Major
>
> I've seen a few odd PR build failures w/ an error in pyspark tests about
> "UnknownTimeZoneError: 'US/Pacific-New'". eg.
> https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4688/consoleFull
> A bit of searching tells me that US/Pacific-New probably isn't really
> supposed to be a timezone at all:
> https://mm.icann.org/pipermail/tz/2009-February/015448.html
> I'm guessing that this is from some misconfiguration of jenkins. that said,
> I can't figure out what is wrong. There does seem to be a timezone entry for
> US/Pacific-New in {{/usr/share/zoneinfo/US/Pacific-New}} -- but it seems to
> be there on every amp-jenkins-worker, so I dunno why that alone would cause
> this failure sometimes.
> [~shaneknapp] I am tentatively calling this a "jenkins" issue, but I might be
> totally wrong here and it is really a pyspark problem.
> Full Stack trace from the test failure:
> {noformat}
> ======================================================================
> ERROR: test_to_pandas (pyspark.sql.tests.test_dataframe.DataFrameTests)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "/home/jenkins/workspace/NewSparkPullRequestBuilder@2/python/pyspark/sql/tests/test_dataframe.py", line 522, in test_to_pandas
>     pdf = self._to_pandas()
>   File "/home/jenkins/workspace/NewSparkPullRequestBuilder@2/python/pyspark/sql/tests/test_dataframe.py", line 517, in _to_pandas
>     return df.toPandas()
>   File "/home/jenkins/workspace/NewSparkPullRequestBuilder@2/python/pyspark/sql/dataframe.py", line 2189, in toPandas
>     _check_series_convert_timestamps_local_tz(pdf[field.name], timezone)
>   File "/home/jenkins/workspace/NewSparkPullRequestBuilder@2/python/pyspark/sql/types.py", line 1891, in _check_series_convert_timestamps_local_tz
>     return _check_series_convert_timestamps_localize(s, None, timezone)
>   File "/home/jenkins/workspace/NewSparkPullRequestBuilder@2/python/pyspark/sql/types.py", line 1877, in _check_series_convert_timestamps_localize
>     lambda ts: ts.tz_localize(from_tz, ambiguous=False).tz_convert(to_tz).tz_localize(None)
>   File "/home/anaconda/lib/python2.7/site-packages/pandas/core/series.py", line 2294, in apply
>     mapped = lib.map_infer(values, f, convert=convert_dtype)
>   File "pandas/src/inference.pyx", line 1207, in pandas.lib.map_infer (pandas/lib.c:66124)
>   File "/home/jenkins/workspace/NewSparkPullRequestBuilder@2/python/pyspark/sql/types.py", line 1878, in <lambda>
>     if ts is not pd.NaT else pd.NaT)
>   File "pandas/tslib.pyx", line 649, in pandas.tslib.Timestamp.tz_convert (pandas/tslib.c:13923)
>   File "pandas/tslib.pyx", line 407, in pandas.tslib.Timestamp.__new__ (pandas/tslib.c:10447)
>   File "pandas/tslib.pyx", line 1467, in pandas.tslib.convert_to_tsobject (pandas/tslib.c:27504)
>   File "pandas/tslib.pyx", line 1768, in pandas.tslib.maybe_get_tz (pandas/tslib.c:32362)
>   File "/home/anaconda/lib/python2.7/site-packages/pytz/__init__.py", line 178, in timezone
>     raise UnknownTimeZoneError(zone)
> UnknownTimeZoneError: 'US/Pacific-New'
> {noformat}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
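The bottom of the trace is a plain timezone-database lookup failure. As a minimal illustration (not Spark code), the same kind of lookup can be sketched with Python's stdlib zoneinfo module (3.9+), which consults the host tz database much as pytz consults its bundled one; pytz raises UnknownTimeZoneError in the analogous case:

```python
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError

def zone_known(name):
    """Return True if the timezone database on this host knows `name`."""
    try:
        ZoneInfo(name)
        return True
    except ZoneInfoNotFoundError:
        return False

# 'US/Pacific' is a standard backward-compat zone; 'US/Pacific-New' was
# never a real zone, so whether it resolves depends entirely on the local
# tz data -- which is why only some workers failed.
print(zone_known("US/Pacific"))      # True on hosts with normal tz data
print(zone_known("US/Pacific-New"))  # host-dependent
```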
[jira] [Commented] (SPARK-27389) pyspark test failures w/ "UnknownTimeZoneError: 'US/Pacific-New'"
[ https://issues.apache.org/jira/browse/SPARK-27389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16813880#comment-16813880 ]

shane knapp commented on SPARK-27389:
--------------------------------------

btw, the total impact of this problem was "only" 73 failed builds over the past seven days, limited to two workers: amp-jenkins-worker-03 and -05. i still haven't figured out *why* things broke... it wasn't an errant package install by a build, as i have the anaconda dirs locked down and the only way to add/update packages there is to use sudo.
[jira] [Commented] (SPARK-27389) pyspark test failures w/ "UnknownTimeZoneError: 'US/Pacific-New'"
[ https://issues.apache.org/jira/browse/SPARK-27389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16813873#comment-16813873 ]

shane knapp commented on SPARK-27389:
--------------------------------------

we are most definitely good to go... this build is running on amp-jenkins-worker-05 and the python2.7 pyspark.sql.tests.test_dataframe tests successfully passed:
https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.7/5718

this build was previously failing on the same worker w/the TZ issue:
https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.7/5699/console
[jira] [Commented] (SPARK-27389) pyspark test failures w/ "UnknownTimeZoneError: 'US/Pacific-New'"
[ https://issues.apache.org/jira/browse/SPARK-27389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16813812#comment-16813812 ]

shane knapp commented on SPARK-27389:
--------------------------------------

ok, this should be fixed now... i got all the workers to recognize US/Pacific-New w/python2.7 and the python/run-tests script now passes!

{noformat}
-bash-4.1$ python/run-tests --python-executables=python2.7
Running PySpark tests. Output is in /home/jenkins/src/spark/python/unit-tests.log
Will test against the following Python executables: ['python2.7']
Will test the following Python modules: ['pyspark-core', 'pyspark-ml', 'pyspark-mllib', 'pyspark-sql', 'pyspark-streaming']

...trimming a bunch to get to the failing tests...

Finished test(python2.7): pyspark.sql.tests.test_dataframe (32s) ... 2 tests were skipped

...yay! it passed! now skipping more output to get to the end...

Tests passed in 797 seconds

Skipped tests in pyspark.sql.tests.test_dataframe with python2.7:
    test_create_dataframe_required_pandas_not_found (pyspark.sql.tests.test_dataframe.DataFrameTests) ... skipped 'Required Pandas was found.'
    test_to_pandas_required_pandas_not_found (pyspark.sql.tests.test_dataframe.DataFrameTests) ... skipped 'Required Pandas was found.'
{noformat}

turns out that a couple of workers were missing the US/Pacific-New tzinfo file in the pytz libdir. a quick scp + python2.7 -m compileall later and things seem to be happy!

i'll leave this open for now, and if anyone notices other builds failing in this way, please link to them here.
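The scp + compileall repair described above might look roughly like the following. This is a hypothetical reconstruction, not the exact commands used: `good-worker` is an assumed healthy host, and the pytz path is taken from the comments in this thread.

```shell
# Hypothetical sketch of the repair described above. Assumes "good-worker"
# is a worker that still has the US/Pacific-New tzinfo file.
PYTZ_DIR=/home/anaconda/lib/python2.7/site-packages/pytz

# copy the missing compiled tzinfo data file from a healthy worker
scp good-worker:${PYTZ_DIR}/zoneinfo/US/Pacific-New ${PYTZ_DIR}/zoneinfo/US/

# rebuild the .pyc files so python2.7 sees a consistent package
python2.7 -m compileall ${PYTZ_DIR}
```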
[jira] [Commented] (SPARK-27389) pyspark test failures w/ "UnknownTimeZoneError: 'US/Pacific-New'"
[ https://issues.apache.org/jira/browse/SPARK-27389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16813625#comment-16813625 ]

shane knapp commented on SPARK-27389:
--------------------------------------

done.

{noformat}
$ pssh -h jenkins_workers.txt "cp /root/python/__init__.py /home/anaconda/lib/python2.7/site-packages/pytz/__init__.py"
[1] 10:06:19 [SUCCESS] amp-jenkins-worker-02
[2] 10:06:19 [SUCCESS] amp-jenkins-worker-06
[3] 10:06:19 [SUCCESS] amp-jenkins-worker-05
[4] 10:06:19 [SUCCESS] amp-jenkins-worker-03
[5] 10:06:19 [SUCCESS] amp-jenkins-worker-01
[6] 10:06:19 [SUCCESS] amp-jenkins-worker-04

$ ssh amp-jenkins-worker-03 "grep Pacific-New /home/anaconda/lib/python2.7/site-packages/pytz/__init__.py"
 'US/Pacific-New',
 'US/Pacific-New',
{noformat}
[jira] [Commented] (SPARK-27389) pyspark test failures w/ "UnknownTimeZoneError: 'US/Pacific-New'"
[ https://issues.apache.org/jira/browse/SPARK-27389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16813620#comment-16813620 ]

shane knapp commented on SPARK-27389:
--------------------------------------

ok, i am going to go all cowboy on this and manually update:

{noformat}
/home/anaconda/lib/python2.7/site-packages/pytz/__init__.py
{noformat}

and add the US/Pacific-New TZ. this should definitely fix the problem, and if it doesn't, i can very quickly roll back.
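As context for why editing {{__init__.py}} alone might not be sufficient: in the pytz versions involved, a successful lookup needs both the zone name in the registry that {{__init__.py}} defines and the compiled data file on disk under pytz/zoneinfo. A small sketch (assuming pytz is installed, as on the workers) that checks both halves:

```python
import os
import pytz

zone = "US/Pacific-New"

# Half 1: is the name in pytz's registry (the lists built in __init__.py)?
print(zone in pytz.all_timezones_set)

# Half 2: does the compiled tzinfo data file exist in the package?
data_file = os.path.join(os.path.dirname(pytz.__file__),
                         "zoneinfo", *zone.split("/"))
print(os.path.exists(data_file))

# pytz.timezone(zone) only succeeds when both hold, so patching the
# registry without the data file can still leave the lookup broken.
```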
[jira] [Commented] (SPARK-27389) pyspark test failures w/ "UnknownTimeZoneError: 'US/Pacific-New'"
[ https://issues.apache.org/jira/browse/SPARK-27389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16812883#comment-16812883 ]

shane knapp commented on SPARK-27389:
--------------------------------------

no, it appears to be random.

[https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-2.4-test-sbt-hadoop-2.7/365/]
[https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-2.4-test-sbt-hadoop-2.7/364/]

these two identical builds ran w/the same python/java/whathaveyou setup on the *same physical worker*. one passes, one fails w/the date thing.
[jira] [Commented] (SPARK-27389) pyspark test failures w/ "UnknownTimeZoneError: 'US/Pacific-New'"
[ https://issues.apache.org/jira/browse/SPARK-27389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16812877#comment-16812877 ]

Bryan Cutler commented on SPARK-27389:
--------------------------------------

[~shaneknapp], I had a couple of successful tests with worker-4. Do you know if the problem is consistent on certain workers, or just random on all of them?
[ https://issues.apache.org/jira/browse/SPARK-27389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16812841#comment-16812841 ]

shane knapp commented on SPARK-27389:
-------------------------------------

also, java8 appears to believe i'm in the US/Pacific (not Pacific-New) TZ:

{noformat}
[sknapp@amp-jenkins-worker-04 ~]$ cat tz.java
import java.util.TimeZone;

public class tz {
    public static void main(String[] args) {
        TimeZone tz = TimeZone.getDefault();
        System.out.println(tz.getID());
    }
}
[sknapp@amp-jenkins-worker-04 ~]$ javac tz.java
[sknapp@amp-jenkins-worker-04 ~]$ java tz
US/Pacific
[sknapp@amp-jenkins-worker-04 ~]$ java -version
java version "1.8.0_191"
Java(TM) SE Runtime Environment (build 1.8.0_191-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.191-b12, mixed mode)
{noformat}
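(editor's note: for comparison with the java check above, the OS-level view from the Python side can be checked with a stdlib-only sketch like this; the output shown in comments is only illustrative of a Pacific-configured host, not captured from the Jenkins workers)

```python
import os
import time

# Stdlib-only sketch: what the OS timezone configuration looks like from
# Python. time.tzname holds the (standard, DST) abbreviations derived from
# /etc/localtime; a TZ environment variable would override that.
print(time.tzname)           # a 2-tuple, e.g. ('PST', 'PDT') on a Pacific host
print(os.environ.get("TZ"))  # explicit override, or None if unset
```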
[ https://issues.apache.org/jira/browse/SPARK-27389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16812838#comment-16812838 ]

shane knapp commented on SPARK-27389:
-------------------------------------

well, according to [~bryanc]:

"""
From the stacktrace, it looks like it's getting this from "spark.sql.session.timeZone", which defaults to java.util.TimeZone.getDefault().getID()
"""

here are the versions of tzdata* installed on the workers having this problem:

{noformat}
tzdata-2019a-1.el6.noarch
tzdata-java-2019a-1.el6.noarch
{noformat}

looks like we're on the latest, but US/Pacific-New is STILL showing up in /usr/share/zoneinfo/US. when i dig in to the java tzdata package, i find the following:

{noformat}
$ strings /usr/share/javazi/ZoneInfoMappings
...bunch of cruft deleted...
US/Pacific
America/Los_Angeles
US/Pacific-New
America/Los_Angeles
{noformat}

so, it appears to me that:
1) the OS still sees US/Pacific-New via tzdata
2) java still sees US/Pacific-New via tzdata-java
3) python has no idea WTF US/Pacific-New is and (occasionally) barfs during pyspark unit tests

so, should i go ahead and manually hack lib/python2.7/site-packages/pytz/__init__.py to add 'US/Pacific-New'? that would fix the symptom (w/o fixing the cause). other than doing that, i'm actually stumped as to why this literally just started failing.
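(editor's note: a less invasive variant of the pytz hack proposed above would be to normalize the obsolete name before it ever reaches pytz, rather than editing site-packages on every worker. A minimal sketch — the alias map and helper name below are hypothetical, not part of pytz or Spark; the mapping follows the javazi ZoneInfoMappings entry shown in the comment.)

```python
# Hypothetical workaround sketch: map obsolete tzdata aliases to their
# canonical IANA names before handing them to pytz, instead of editing
# site-packages/pytz/__init__.py on every worker.
OBSOLETE_TZ_ALIASES = {
    "US/Pacific-New": "America/Los_Angeles",  # per the javazi mapping above
}

def normalize_timezone(name):
    """Return the canonical IANA zone name for obsolete aliases."""
    return OBSOLETE_TZ_ALIASES.get(name, name)
```

pytz.timezone(normalize_timezone(name)) would then resolve 'US/Pacific-New' to a real zone instead of raising UnknownTimeZoneError, while leaving every valid name untouched.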
[ https://issues.apache.org/jira/browse/SPARK-27389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16812802#comment-16812802 ]

Sean Owen commented on SPARK-27389:
-----------------------------------

I wonder what created /usr/share/zoneinfo/US/Pacific-New? AFAICT it shouldn't be there. The whole zoneinfo tree was updated at about the same time -- not just that one TZ. It doesn't sound like pytz; that's just the Python timezone library. It can't really be PySpark; this isn't something in the Spark code at all.

Here's a complaint about tzdata providing this file from a few years ago: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=815200

Removed in 2018d-1? https://launchpad.net/ubuntu/+source/tzdata/+changelog
[ https://issues.apache.org/jira/browse/SPARK-27389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16812789#comment-16812789 ]

shane knapp commented on SPARK-27389:
-------------------------------------

updating tzdata didn't do anything noticeable:

{noformat}
[sknapp@amp-jenkins-worker-04 ~]$ python2.7 -c 'import pytz; print "US/Pacific-New" in pytz.all_timezones'
False
[sknapp@amp-jenkins-worker-04 ~]$ which python2.7
/home/anaconda/bin/python2.7
{noformat}
[ https://issues.apache.org/jira/browse/SPARK-27389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16812785#comment-16812785 ]

shane knapp commented on SPARK-27389:
-------------------------------------

[~srowen] sure, i can update the tzdata package on the centos workers... let's see if that does anything. this will take ~5 mins.
[ https://issues.apache.org/jira/browse/SPARK-27389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16812784#comment-16812784 ]

shane knapp commented on SPARK-27389:
-------------------------------------

well, this started happening ~6am PST on april 2nd, as best as i can tell. regarding the tzinfo on the centos workers (where this is failing), nothing has changed for a year:

{noformat}
$ ls -l /usr/share/zoneinfo/US
total 52
-rw-r--r--. 2 root root 2354 Apr  3  2017 Alaska
-rw-r--r--. 3 root root 2339 Apr  3  2017 Aleutian
-rw-r--r--. 2 root root  327 Apr  3  2017 Arizona
-rw-r--r--. 2 root root 3543 Apr  3  2017 Central
-rw-r--r--. 3 root root 3519 Apr  3  2017 Eastern
-rw-r--r--. 4 root root 1649 Apr  3  2017 East-Indiana
-rw-r--r--. 3 root root  250 Apr  3  2017 Hawaii
-rw-r--r--. 3 root root 2395 Apr  3  2017 Indiana-Starke
-rw-r--r--. 2 root root 2202 Apr  3  2017 Michigan
-rw-r--r--. 4 root root 2427 Apr  3  2017 Mountain
-rw-r--r--. 3 root root 2819 Apr  3  2017 Pacific
-rw-r--r--. 3 root root 2819 Apr  3  2017 Pacific-New
-rw-r--r--. 4 root root  174 Apr  3  2017 Samoa
{noformat}

anyways: i still believe that this is a pyspark problem, not a jenkins worker configuration problem.
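(editor's note: the OS-vs-pytz disagreement being debugged here can be enumerated mechanically with a small set difference. The helper below is hypothetical; in practice os_zones would come from something like os.listdir('/usr/share/zoneinfo/US') and pytz_zones from pytz.all_timezones.)

```python
def unknown_zones(os_zones, pytz_zones):
    """Return zone names the OS provides but pytz does not recognize."""
    return sorted(set(os_zones) - set(pytz_zones))

# With a listing like the one above, only the bogus alias should fall out:
os_zones = ["US/Alaska", "US/Pacific", "US/Pacific-New", "US/Samoa"]
pytz_zones = ["US/Alaska", "US/Pacific", "US/Samoa"]
print(unknown_zones(os_zones, pytz_zones))  # ['US/Pacific-New']
```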
[ https://issues.apache.org/jira/browse/SPARK-27389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16812767#comment-16812767 ]

Sean Owen commented on SPARK-27389:
-----------------------------------

What about updating tzdata?
[ https://issues.apache.org/jira/browse/SPARK-27389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16812765#comment-16812765 ]

Sean Owen commented on SPARK-27389:
-----------------------------------

On the question of what the heck it is, comically: https://mm.icann.org/pipermail/tz/2009-February/015448.html

So... hm, does this suggest it's the OS with something about this installed somewhere? This bug was reported against pytz over a decade ago.
[jira] [Commented] (SPARK-27389) pyspark test failures w/ "UnknownTimeZoneError: 'US/Pacific-New'"
[ https://issues.apache.org/jira/browse/SPARK-27389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16812704#comment-16812704 ]

shane knapp commented on SPARK-27389:
--------------------------------------

is this even really a valid timezone? plus, i really don't think this is a jenkins issue per se.

i whipped up some java to check for this timezone, which is there:
{code}
$ java DisplayZoneAndOffSet | grep Pacific-New
US/Pacific-New (UTC-07:00)
{code}
but it's definitely not a valid pytz timezone:
{code}
$ python2.7 -c 'import pytz; print "US/Pacific-New" in pytz.all_timezones'
False
{code}
as a work-around... i *could* hack {{/home/anaconda/lib/python2.7/site-packages/pytz/__init__.py}} to include US/Pacific-New on all of the workers. ;)
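(Editorial note: the {{DisplayZoneAndOffSet}} program above is not included in the thread, so its contents are unknown. The same "does Python's timezone data know this name?" check can be sketched with the stdlib {{zoneinfo}} module from Python 3.9+, which reads the same tz database family that pytz snapshots; the zone names below are illustrative:)

```python
# Hedged sketch (not from the thread): check whether a zone name the OS/JVM
# resolves is also known to Python's timezone data.  Uses the stdlib zoneinfo
# module (Python 3.9+) rather than the Python 2.7-era pytz on the workers.
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError

def python_knows_zone(name):
    """Return True if Python's timezone database can resolve the zone name."""
    try:
        ZoneInfo(name)
        return True
    except ZoneInfoNotFoundError:
        return False

print(python_knows_zone("America/Los_Angeles"))  # True where tz data is present
print(python_knows_zone("US/Pacific-New"))       # False on current tzdata
```

This is the same divergence shane observed: the host's {{/usr/share/zoneinfo}} carried a {{US/Pacific-New}} entry, so the JVM resolved it, while pytz's bundled zone list did not.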
[jira] [Commented] (SPARK-27389) pyspark test failures w/ "UnknownTimeZoneError: 'US/Pacific-New'"
[ https://issues.apache.org/jira/browse/SPARK-27389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16812628#comment-16812628 ]

shane knapp commented on SPARK-27389:
--------------------------------------

JDKs haven't changed on the jenkins workers in a while, and neither have the python pytz packages... i'm not really sure what's going on here or why this just started failing.

i'll poke around more (later) today, after i get caught up from the latter half of last week.
[jira] [Commented] (SPARK-27389) pyspark test failures w/ "UnknownTimeZoneError: 'US/Pacific-New'"
[ https://issues.apache.org/jira/browse/SPARK-27389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16811461#comment-16811461 ]

Felix Cheung commented on SPARK-27389:
--------------------------------------

maybe a new JDK changes the TimeZone?
[jira] [Commented] (SPARK-27389) pyspark test failures w/ "UnknownTimeZoneError: 'US/Pacific-New'"
[ https://issues.apache.org/jira/browse/SPARK-27389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16811355#comment-16811355 ]

Bryan Cutler commented on SPARK-27389:
--------------------------------------

From the stack trace, it looks like it's getting this from {{spark.sql.session.timeZone}}, which defaults to {{java.util.TimeZone.getDefault().getID()}}.
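(Editorial note: if that handoff is the cause, the failure mode is a zone ID the JVM resolves but Python-side tz data does not. A purely illustrative guard, not anything PySpark actually does, could validate the JVM's zone ID before the pandas conversion; the function name and fallback choice below are hypothetical:)

```python
# Illustrative sketch: Spark passes the JVM default zone ID to the Python
# side verbatim, so an ID the JVM resolves (like US/Pacific-New on the 2019
# Jenkins hosts) can still be unknown to Python's timezone data.
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError

def resolve_session_timezone(jvm_zone_id, fallback="UTC"):
    """Return jvm_zone_id if Python's tz database knows it, else fallback."""
    try:
        ZoneInfo(jvm_zone_id)
        return jvm_zone_id
    except ZoneInfoNotFoundError:
        return fallback

print(resolve_session_timezone("America/Los_Angeles"))  # America/Los_Angeles
print(resolve_session_timezone("US/Pacific-New"))       # UTC on current tzdata
```

With a guard like this the {{toPandas()}} conversion would degrade to the fallback zone instead of raising {{UnknownTimeZoneError}} mid-test.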