[spark] branch master updated: [SPARK-36742][PYTHON] Fix ps.to_datetime with plurals of keys like years, months, days

2021-10-06 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 218da86  [SPARK-36742][PYTHON] Fix ps.to_datetime with plurals of keys like years, months, days
218da86 is described below

commit 218da86b8d682ddce3208e0c57b6df7055449130
Author: dch nguyen 
AuthorDate: Thu Oct 7 12:21:06 2021 +0900

[SPARK-36742][PYTHON] Fix ps.to_datetime with plurals of keys like years, months, days

### What changes were proposed in this pull request?
Fix ps.to_datetime with plurals of keys like years, months, days.

### Why are the changes needed?
Fix ps.to_datetime with plurals of keys like years, months, days.

Before this PR:
``` python
# pandas
df_test = pd.DataFrame({'years': [2015, 2016], 'months': [2, 3], 'days': [4, 5]})
df_test['date'] = pd.to_datetime(df_test[['years', 'months', 'days']])
df_test

   years  months  days       date
0   2015       2     4 2015-02-04
1   2016       3     5 2016-03-05

# pandas on spark
df_test = ps.DataFrame({'years': [2015, 2016], 'months': [2, 3], 'days': [4, 5]})
df_test['date'] = ps.to_datetime(df_test[['years', 'months', 'days']])

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/u02/spark/python/pyspark/pandas/namespace.py", line 1643, in to_datetime
    psdf = arg[["year", "month", "day"]]
  File "/u02/spark/python/pyspark/pandas/frame.py", line 11888, in __getitem__
    return self.loc[:, list(key)]
  File "/u02/spark/python/pyspark/pandas/indexing.py", line 480, in __getitem__
    ) = self._select_cols(cols_sel)
  File "/u02/spark/python/pyspark/pandas/indexing.py", line 325, in _select_cols
    return self._select_cols_by_iterable(cols_sel, missing_keys)
  File "/u02/spark/python/pyspark/pandas/indexing.py", line 1356, in _select_cols_by_iterable
    raise KeyError("['{}'] not in index".format(name_like_string(key)))
KeyError: "['year'] not in index"
```

### Does this PR introduce _any_ user-facing change?
After this PR:
``` python
df_test = ps.DataFrame({'years': [2015, 2016], 'months': [2, 3], 'days': [4, 5]})
df_test['date'] = ps.to_datetime(df_test[['years', 'months', 'days']])
df_test

   years  months  days       date
0   2015       2     4 2015-02-04
1   2016       3     5 2016-03-05
```
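
The fix follows pandas' own approach of normalizing plural unit names before selecting the assembling columns. A minimal standalone sketch of the technique (plain pandas, outside Spark; `normalize_units` is an invented helper name):

```python
import pandas as pd

# Accepted aliases for datetime assembly, as in pandas.core.tools.datetimes
# (abridged to year/month/day).
_unit_map = {
    "year": "year", "years": "year",
    "month": "month", "months": "month",
    "day": "day", "days": "day",
}

def normalize_units(columns):
    """Map canonical unit -> the user's original column name, case-insensitively."""
    unit = {col: _unit_map.get(col, _unit_map.get(col.lower(), col)) for col in columns}
    return {v: k for k, v in unit.items()}

df = pd.DataFrame({"years": [2015, 2016], "months": [2, 3], "days": [4, 5]})
unit_rev = normalize_units(df.columns)
# Select by the original (possibly plural) names instead of hard-coding
# ["year", "month", "day"], which raised the KeyError shown above.
df["date"] = pd.to_datetime(df[[unit_rev["year"], unit_rev["month"], unit_rev["day"]]])
print(df)
```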

### How was this patch tested?
Unit tests

Closes #34182 from dchvn/SPARK-36742.

Authored-by: dch nguyen 
Signed-off-by: Hyukjin Kwon 
---
 python/pyspark/pandas/namespace.py| 27 +--
 python/pyspark/pandas/tests/test_namespace.py | 21 +
 2 files changed, 46 insertions(+), 2 deletions(-)

diff --git a/python/pyspark/pandas/namespace.py b/python/pyspark/pandas/namespace.py
index 8df5d2c..2d62dea 100644
--- a/python/pyspark/pandas/namespace.py
+++ b/python/pyspark/pandas/namespace.py
@@ -1629,9 +1629,30 @@ def to_datetime(
     DatetimeIndex(['1960-01-02', '1960-01-03', '1960-01-04'], dtype='datetime64[ns]', freq=None)
     """
 
+    # mappings for assembling units
+    # From pandas: pandas.core.tools.datetimes
+    _unit_map = {
+        "year": "year",
+        "years": "year",
+        "month": "month",
+        "months": "month",
+        "day": "day",
+        "days": "day",
+    }
+
+    # replace passed unit with _unit_map
+    def f(value):
+        if value in _unit_map:
+            return _unit_map[value]
+
+        if value.lower() in _unit_map:
+            return _unit_map[value.lower()]
+
+        return value
+
     def pandas_to_datetime(pser_or_pdf: Union[pd.DataFrame, pd.Series]) -> Series[np.datetime64]:
         if isinstance(pser_or_pdf, pd.DataFrame):
-            pser_or_pdf = pser_or_pdf[["year", "month", "day"]]
+            pser_or_pdf = pser_or_pdf[[unit_rev["year"], unit_rev["month"], unit_rev["day"]]]
         return pd.to_datetime(
             pser_or_pdf,
             errors=errors,
@@ -1644,7 +1665,9 @@ def to_datetime(
     if isinstance(arg, Series):
         return arg.pandas_on_spark.transform_batch(pandas_to_datetime)
     if isinstance(arg, DataFrame):
-        psdf = arg[["year", "month", "day"]]
+        unit = {k: f(k) for k in arg.keys()}
+        unit_rev = {v: k for k, v in unit.items()}
+        psdf = arg[[unit_rev["year"], unit_rev["month"], unit_rev["day"]]]
         return psdf.pandas_on_spark.transform_batch(pandas_to_datetime)
     return pd.to_datetime(
         arg,
 return pd.to_datetime(
 arg,
diff --git a/python/pyspark/pandas/tests/test_namespace.py b/python/pyspark/pandas/tests/test_namespace.py
index 29578a9..6d51216 100644
--- a/python/pyspark/pandas/tests/test_namespace.py
+++ b/python/pyspark/pandas/tests/test_namespace.py
@@ -71,6 +71,27 @@ class 

[spark] branch master updated (d99edac -> 1eb6ea3)

2021-10-06 Thread viirya
This is an automated email from the ASF dual-hosted git repository.

viirya pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from d99edac  [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
 add 1eb6ea3  [SPARK-36918][SQL] Ignore types when comparing structs for unionByName

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/analysis/ResolveUnion.scala |  5 +-
 .../org/apache/spark/sql/types/DataType.scala  | 26 +
 .../org/apache/spark/sql/types/DataTypeSuite.scala | 64 ++
 3 files changed, 93 insertions(+), 2 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

2021-10-06 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new d99edac  [SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session
d99edac is described below

commit d99edacb7a5b66581f98b26be6b1a775b794594a
Author: Takuya UESHIN 
AuthorDate: Thu Oct 7 11:22:51 2021 +0900

[SPARK-36884][PYTHON] Inline type hints for pyspark.sql.session

### What changes were proposed in this pull request?

Inline type hints from `python/pyspark/sql/session.pyi` to `python/pyspark/sql/session.py`.

### Why are the changes needed?

Currently, there is a type hint stub file, `python/pyspark/sql/session.pyi`, that shows the expected types of functions, but we can also take advantage of static type checking within the functions by inlining the type hints.
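
For illustration only (simplified shapes, not the PR's exact code), the difference between the two styles looks like this:

```python
# Stub style: session.py holds an unannotated implementation, while a
# separate session.pyi stub declares the signature for type checkers:
#     def config(self, key: Optional[str] = ..., value: Optional[Any] = ...)
#         -> "Builder": ...
# The function body itself is never type-checked.

# Inline style: the annotations live in session.py, so mypy can check the
# body against the declared types as well.
from typing import Any, Dict, Optional

class Builder:
    _options: Dict[str, Any] = {}

    def config(self, key: Optional[str] = None, value: Optional[Any] = None) -> "Builder":
        if key is not None:
            self._options[key] = str(value)
        return self
```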

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing test.

Closes #34136 from ueshin/issues/SPARK-36884/inline_typehints.

Authored-by: Takuya UESHIN 
Signed-off-by: Hyukjin Kwon 
---
 python/pyspark/sql/session.py  | 272 ++---
 python/pyspark/sql/session.pyi | 131 
 2 files changed, 204 insertions(+), 199 deletions(-)

diff --git a/python/pyspark/sql/session.py b/python/pyspark/sql/session.py
index 4a3c6b3..60e2d69 100644
--- a/python/pyspark/sql/session.py
+++ b/python/pyspark/sql/session.py
@@ -19,23 +19,42 @@ import sys
 import warnings
 from functools import reduce
 from threading import RLock
+from types import TracebackType
+from typing import (
+    Any, Dict, Iterable, List, Optional, Tuple, Type, Union,
+    cast, no_type_check, overload, TYPE_CHECKING
+)
 
-from pyspark import since
+from py4j.java_gateway import JavaObject  # type: ignore[import]
+
+from pyspark import SparkConf, SparkContext, since
 from pyspark.rdd import RDD
 from pyspark.sql.conf import RuntimeConfig
 from pyspark.sql.dataframe import DataFrame
 from pyspark.sql.pandas.conversion import SparkConversionMixin
 from pyspark.sql.readwriter import DataFrameReader
 from pyspark.sql.streaming import DataStreamReader
-from pyspark.sql.types import DataType, StructType, \
-    _make_type_verifier, _infer_schema, _has_nulltype, _merge_type, _create_converter, \
+from pyspark.sql.types import (  # type: ignore[attr-defined]
+    AtomicType, DataType, StructType,
+    _make_type_verifier, _infer_schema, _has_nulltype, _merge_type, _create_converter,
     _parse_datatype_string
+)
 from pyspark.sql.utils import install_exception_handler, is_timestamp_ntz_preferred
 
+if TYPE_CHECKING:
+    from pyspark.sql._typing import DateTimeLiteral, LiteralType, DecimalLiteral, RowLike
+    from pyspark.sql.catalog import Catalog
+    from pyspark.sql.pandas._typing import DataFrameLike as PandasDataFrameLike
+    from pyspark.sql.streaming import StreamingQueryManager
+    from pyspark.sql.udf import UDFRegistration
+
+
 __all__ = ["SparkSession"]
 
 
-def _monkey_patch_RDD(sparkSession):
+def _monkey_patch_RDD(sparkSession: "SparkSession") -> None:
+
+    @no_type_check
     def toDF(self, schema=None, sampleRatio=None):
 """
 Converts current :class:`RDD` into a :class:`DataFrame`
@@ -65,7 +84,7 @@ def _monkey_patch_RDD(sparkSession):
         """
         return sparkSession.createDataFrame(self, schema, sampleRatio)
 
-    RDD.toDF = toDF
+    RDD.toDF = toDF  # type: ignore[assignment]
 
 
 class SparkSession(SparkConversionMixin):
@@ -107,10 +126,23 @@ class SparkSession(SparkConversionMixin):
 """
 
     _lock = RLock()
-    _options = {}
+    _options: Dict[str, Any] = {}
     _sc = None
 
-    def config(self, key=None, value=None, conf=None):
+    @overload
+    def config(self, *, conf: SparkConf) -> "SparkSession.Builder":
+        ...
+
+    @overload
+    def config(self, key: str, value: Any) -> "SparkSession.Builder":
+        ...
+
+    def config(
+        self,
+        key: Optional[str] = None,
+        value: Optional[Any] = None,
+        conf: Optional[SparkConf] = None
+    ) -> "SparkSession.Builder":
 """Sets a config option. Options set using this method are 
automatically propagated to
 both :class:`SparkConf` and :class:`SparkSession`'s own 
configuration.
 
@@ -141,13 +173,13 @@ class SparkSession(SparkConversionMixin):
         """
         with self._lock:
             if conf is None:
-                self._options[key] = str(value)
+                self._options[cast(str, key)] = str(value)
             else:
                 for (k, v) in conf.getAll():
                     self._options[k] = v
             return self
 
-    def master(self, 

[spark] branch branch-3.2 updated: [SPARK-36939][PYTHON][DOCS] Add orphan migration page into list in PySpark documentation

2021-10-06 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.2 by this push:
 new 0565d95  [SPARK-36939][PYTHON][DOCS] Add orphan migration page into list in PySpark documentation
0565d95 is described below

commit 0565d95a86e738d24e9c05a4c5c3c3815944b4be
Author: Hyukjin Kwon 
AuthorDate: Thu Oct 7 10:02:45 2021 +0900

[SPARK-36939][PYTHON][DOCS] Add orphan migration page into list in PySpark documentation

### What changes were proposed in this pull request?

This PR fixes the warning below during PySpark documentation build:

```
checking consistency... /.../spark/python/docs/source/migration_guide/pyspark_3.2_to_3.3.rst: WARNING: document isn't included in any toctree
done
```

SPARK-36618 added a new migration guide page, but it was mistakenly not added to `spark/python/docs/source/migration_guide/index.rst`, so the page is not shown.

### Why are the changes needed?

To show the migration guides to end users.

### Does this PR introduce _any_ user-facing change?

It's not yet released, but we had better backport this to branch-3.2. It's a followup of the new page in the PySpark documentation (branch-3.2).

### How was this patch tested?

Manually built the docs

Closes #34195 from HyukjinKwon/SPARK-36939.

Authored-by: Hyukjin Kwon 
Signed-off-by: Hyukjin Kwon 
(cherry picked from commit 0902b47a7529d036afdaf181d96abdb2fbfb8d77)
Signed-off-by: Hyukjin Kwon 
---
 python/docs/source/migration_guide/index.rst | 1 +
 1 file changed, 1 insertion(+)

diff --git a/python/docs/source/migration_guide/index.rst b/python/docs/source/migration_guide/index.rst
index b25ac31..2e61653 100644
--- a/python/docs/source/migration_guide/index.rst
+++ b/python/docs/source/migration_guide/index.rst
@@ -25,6 +25,7 @@ This page describes the migration guide specific to PySpark.
 .. toctree::
:maxdepth: 2
 
+   pyspark_3.2_to_3.3
pyspark_3.1_to_3.2
pyspark_2.4_to_3.0
pyspark_2.3_to_2.4

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-36939][PYTHON][DOCS] Add orphan migration page into list in PySpark documentation

2021-10-06 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 0902b47a [SPARK-36939][PYTHON][DOCS] Add orphan migration page into list in PySpark documentation
0902b47a is described below

commit 0902b47a7529d036afdaf181d96abdb2fbfb8d77
Author: Hyukjin Kwon 
AuthorDate: Thu Oct 7 10:02:45 2021 +0900

[SPARK-36939][PYTHON][DOCS] Add orphan migration page into list in PySpark documentation

### What changes were proposed in this pull request?

This PR fixes the warning below during PySpark documentation build:

```
checking consistency... /.../spark/python/docs/source/migration_guide/pyspark_3.2_to_3.3.rst: WARNING: document isn't included in any toctree
done
```

SPARK-36618 added a new migration guide page, but it was mistakenly not added to `spark/python/docs/source/migration_guide/index.rst`, so the page is not shown.

### Why are the changes needed?

To show the migration guides to end users.

### Does this PR introduce _any_ user-facing change?

It's not yet released, but we had better backport this to branch-3.2. It's a followup of the new page in the PySpark documentation (branch-3.2).

### How was this patch tested?

Manually built the docs

Closes #34195 from HyukjinKwon/SPARK-36939.

Authored-by: Hyukjin Kwon 
Signed-off-by: Hyukjin Kwon 
---
 python/docs/source/migration_guide/index.rst | 1 +
 1 file changed, 1 insertion(+)

diff --git a/python/docs/source/migration_guide/index.rst b/python/docs/source/migration_guide/index.rst
index b25ac31..2e61653 100644
--- a/python/docs/source/migration_guide/index.rst
+++ b/python/docs/source/migration_guide/index.rst
@@ -25,6 +25,7 @@ This page describes the migration guide specific to PySpark.
 .. toctree::
:maxdepth: 2
 
+   pyspark_3.2_to_3.3
pyspark_3.1_to_3.2
pyspark_2.4_to_3.0
pyspark_2.3_to_2.4

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (2953d4f -> 090c9bf)

2021-10-06 Thread maxgekk
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 2953d4f  [SPARK-36751][PYTHON][DOCS][FOLLOW-UP] Fix unexpected section title for Examples in docstring
 add 090c9bf  [SPARK-36937][SQL][TESTS] Change OrcSourceSuite to test both V1 and V2 sources

No new revisions were added by this update.

Summary of changes:
 .../sql/execution/datasources/orc/OrcSourceSuite.scala | 18 --
 1 file changed, 16 insertions(+), 2 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (aed977c -> 2953d4f)

2021-10-06 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from aed977c  [SPARK-36919][SQL] Make BadRecordException fields transient
 add 2953d4f  [SPARK-36751][PYTHON][DOCS][FOLLOW-UP] Fix unexpected section title for Examples in docstring

No new revisions were added by this update.

Summary of changes:
 python/pyspark/sql/functions.py | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



svn commit: r50291 - in /dev/spark/v3.2.0-rc7-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _site/api/java/org/apache/parqu

2021-10-06 Thread gengliang
Author: gengliang
Date: Wed Oct  6 14:17:02 2021
New Revision: 50291

Log:
Apache Spark v3.2.0-rc7 docs


[This commit notification would consist of 2345 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.]

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



svn commit: r50290 - /dev/spark/v3.2.0-rc7-bin/

2021-10-06 Thread gengliang
Author: gengliang
Date: Wed Oct  6 13:50:32 2021
New Revision: 50290

Log:
Apache Spark v3.2.0-rc7

Added:
dev/spark/v3.2.0-rc7-bin/
dev/spark/v3.2.0-rc7-bin/SparkR_3.2.0.tar.gz   (with props)
dev/spark/v3.2.0-rc7-bin/SparkR_3.2.0.tar.gz.asc
dev/spark/v3.2.0-rc7-bin/SparkR_3.2.0.tar.gz.sha512
dev/spark/v3.2.0-rc7-bin/pyspark-3.2.0.tar.gz   (with props)
dev/spark/v3.2.0-rc7-bin/pyspark-3.2.0.tar.gz.asc
dev/spark/v3.2.0-rc7-bin/pyspark-3.2.0.tar.gz.sha512
dev/spark/v3.2.0-rc7-bin/spark-3.2.0-bin-hadoop2.7.tgz   (with props)
dev/spark/v3.2.0-rc7-bin/spark-3.2.0-bin-hadoop2.7.tgz.asc
dev/spark/v3.2.0-rc7-bin/spark-3.2.0-bin-hadoop2.7.tgz.sha512
    dev/spark/v3.2.0-rc7-bin/spark-3.2.0-bin-hadoop3.2-scala2.13.tgz   (with props)
dev/spark/v3.2.0-rc7-bin/spark-3.2.0-bin-hadoop3.2-scala2.13.tgz.asc
dev/spark/v3.2.0-rc7-bin/spark-3.2.0-bin-hadoop3.2-scala2.13.tgz.sha512
dev/spark/v3.2.0-rc7-bin/spark-3.2.0-bin-hadoop3.2.tgz   (with props)
dev/spark/v3.2.0-rc7-bin/spark-3.2.0-bin-hadoop3.2.tgz.asc
dev/spark/v3.2.0-rc7-bin/spark-3.2.0-bin-hadoop3.2.tgz.sha512
dev/spark/v3.2.0-rc7-bin/spark-3.2.0-bin-without-hadoop.tgz   (with props)
dev/spark/v3.2.0-rc7-bin/spark-3.2.0-bin-without-hadoop.tgz.asc
dev/spark/v3.2.0-rc7-bin/spark-3.2.0-bin-without-hadoop.tgz.sha512
dev/spark/v3.2.0-rc7-bin/spark-3.2.0.tgz   (with props)
dev/spark/v3.2.0-rc7-bin/spark-3.2.0.tgz.asc
dev/spark/v3.2.0-rc7-bin/spark-3.2.0.tgz.sha512

Added: dev/spark/v3.2.0-rc7-bin/SparkR_3.2.0.tar.gz
==
Binary file - no diff available.

Propchange: dev/spark/v3.2.0-rc7-bin/SparkR_3.2.0.tar.gz
--
svn:mime-type = application/octet-stream

Added: dev/spark/v3.2.0-rc7-bin/SparkR_3.2.0.tar.gz.asc
==
--- dev/spark/v3.2.0-rc7-bin/SparkR_3.2.0.tar.gz.asc (added)
+++ dev/spark/v3.2.0-rc7-bin/SparkR_3.2.0.tar.gz.asc Wed Oct  6 13:50:32 2021
@@ -0,0 +1,17 @@
+-BEGIN PGP SIGNATURE-
+
+iQJJBAABCgAzFiEEVNI/8r5LtJnK7c4wCYJ7fOVh7E4FAmFdokcVHGdlbmdsaWFu
+Z0BhcGFjaGUub3JnAAoJEAmCe3zlYexOvOQQAJSxL821pPT75E9BY1+6TyJtntzI
+tN4So1nsbwAjOvs6MhKjA6poy0isckBeYzqa8in9qaznfD9kfgLB/GvHMgzEnmv+
+ZMT0tznboX2IaxNhqb50OPf3g3Fu+b3c/jrxbD/cFj3HQB0Jxgn5kDCCYbiCXpn2
+sI/7JFWH5WXEu8obXZlr4N23bP/yMclCnlzngBE9MgokcR60NgbQ/q/yCbbLPABJ
+SY4LHlFuu54gBzMazOhtICw/u7z7mNxKOqMndvq5iY9e8uYmb0m/BWCUOsg6sEPb
+pvB+2Srfs7XpS/wYz4YdSzQtHGIHqGMBZqr+W0gCE/jYPrJHIWN4S+eeZKhBsHBw
+4cJx95F+0Hnlm6MVJNy4S+P3SPwclvPaGEo6t+bNf46xGO5Wfmu2ppwSpsPiaCaz
+1/NTD1HdP/ull59IizdsOyIBzLzHGM1585F93ploqfJTsuouPalfAfXZBFcvaO3+
+2cgE8nuYhbKHopKLq0My9FfaqnVCRBnWsXvnDFweEpdpwdZweVq+GczCDcNwbT1+
+BFREZAki4B0rQNle7VBdHFK5VChUXrHLXXdlmVoblmtzYO0IbiqYzh2Ku4Ewz2/E
+mLDSl0G4HHHe47lDgd6V02o/9/qoedoyaKs6uSsEea1znA5F3K18kjEzRtCNOwTp
+VSBseQrX8t73jL8U
+=UeF3
+-END PGP SIGNATURE-

Added: dev/spark/v3.2.0-rc7-bin/SparkR_3.2.0.tar.gz.sha512
==
--- dev/spark/v3.2.0-rc7-bin/SparkR_3.2.0.tar.gz.sha512 (added)
+++ dev/spark/v3.2.0-rc7-bin/SparkR_3.2.0.tar.gz.sha512 Wed Oct  6 13:50:32 2021
@@ -0,0 +1,3 @@
+SparkR_3.2.0.tar.gz: 326B0E33 88DB8060 5B239008 BF6FE438 ED818397 6C658797
+ EB0CEB20 F0F5310D A6787417 9FB09EE2 6DCCADD6 DF47DBDB
+ 3C9B79B7 4F61D5B4 D5EEB465 29EA7D10

Added: dev/spark/v3.2.0-rc7-bin/pyspark-3.2.0.tar.gz
==
Binary file - no diff available.

Propchange: dev/spark/v3.2.0-rc7-bin/pyspark-3.2.0.tar.gz
--
svn:mime-type = application/octet-stream

Added: dev/spark/v3.2.0-rc7-bin/pyspark-3.2.0.tar.gz.asc
==
--- dev/spark/v3.2.0-rc7-bin/pyspark-3.2.0.tar.gz.asc (added)
+++ dev/spark/v3.2.0-rc7-bin/pyspark-3.2.0.tar.gz.asc Wed Oct  6 13:50:32 2021
@@ -0,0 +1,17 @@
+-BEGIN PGP SIGNATURE-
+
+iQJJBAABCgAzFiEEVNI/8r5LtJnK7c4wCYJ7fOVh7E4FAmFdokkVHGdlbmdsaWFu
+Z0BhcGFjaGUub3JnAAoJEAmCe3zlYexOvnMP/01CeoigQjIo+CNvYIUYbQ8OkT5p
+PAafmu1FVb8Rt+urUT9cQgkeV6YWT5CJGZoQWSJMIEFQBirAWK4EcTEE6lNlnO9j
+NytusKlz9xpNv/Hunv7c8ddz/IwtRgQUZUck6pdZlO8UOt3HVIVp9prddzxLXjd7
+O1/eCIvzgVQm/ey5c662I0zdR+J20S4ZQuEaOcZg3stReGdSX20H97+QyF2rjBFm
+60a6JaCWSSju9u2oNiopIQHLpVR3eTXl1OC1O4qKSaSYlV7TS7dgOgT1VqPTo6hv
+HAUKzGOdRnR1wp9MSpHvDggInv0NsZd1Coz3ypu/p6W2IkAoKmSHnh58BEyPWqpi
+oFDkK5GsElnJsBhIPiSijlW20zr3EL1ez7MjHouNRx1F1LRBmMxdyjdDNWYJBz3r
+uW/r3kvNxMX1+iDiZ/0Wht1jkVvS0w/RCaW6fmJr5KF3vSch1AfKxK46xhnL5pw1
+hYQIdNRj4f3aKF6KQ5Kss7KeHWhtbf+s3Jdq5BwdZ4E4VGY6ONo5d8HUV8rXRwFC
+WFZe2y4BkT5kAuifK+XXb3L7KwMr+KQKNYB0J4xKlvk3a4SGzJRH+J3LMaXp+xQf

[spark] 01/01: Preparing development version 3.2.1-SNAPSHOT

2021-10-06 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git

commit 5931f218349fea9f6a61e678b9ba304f8049ebb1
Author: Gengliang Wang 
AuthorDate: Wed Oct 6 11:45:32 2021 +

Preparing development version 3.2.1-SNAPSHOT
---
 R/pkg/DESCRIPTION  | 2 +-
 assembly/pom.xml   | 2 +-
 common/kvstore/pom.xml | 2 +-
 common/network-common/pom.xml  | 2 +-
 common/network-shuffle/pom.xml | 2 +-
 common/network-yarn/pom.xml| 2 +-
 common/sketch/pom.xml  | 2 +-
 common/tags/pom.xml| 2 +-
 common/unsafe/pom.xml  | 2 +-
 core/pom.xml   | 2 +-
 docs/_config.yml   | 4 ++--
 examples/pom.xml   | 2 +-
 external/avro/pom.xml  | 2 +-
 external/docker-integration-tests/pom.xml  | 2 +-
 external/kafka-0-10-assembly/pom.xml   | 2 +-
 external/kafka-0-10-sql/pom.xml| 2 +-
 external/kafka-0-10-token-provider/pom.xml | 2 +-
 external/kafka-0-10/pom.xml| 2 +-
 external/kinesis-asl-assembly/pom.xml  | 2 +-
 external/kinesis-asl/pom.xml   | 2 +-
 external/spark-ganglia-lgpl/pom.xml| 2 +-
 graphx/pom.xml | 2 +-
 hadoop-cloud/pom.xml   | 2 +-
 launcher/pom.xml   | 2 +-
 mllib-local/pom.xml| 2 +-
 mllib/pom.xml  | 2 +-
 pom.xml| 2 +-
 python/pyspark/version.py  | 2 +-
 repl/pom.xml   | 2 +-
 resource-managers/kubernetes/core/pom.xml  | 2 +-
 resource-managers/kubernetes/integration-tests/pom.xml | 2 +-
 resource-managers/mesos/pom.xml| 2 +-
 resource-managers/yarn/pom.xml | 2 +-
 sql/catalyst/pom.xml   | 2 +-
 sql/core/pom.xml   | 2 +-
 sql/hive-thriftserver/pom.xml  | 2 +-
 sql/hive/pom.xml   | 2 +-
 streaming/pom.xml  | 2 +-
 tools/pom.xml  | 2 +-
 39 files changed, 40 insertions(+), 40 deletions(-)

diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION
index 4a310e6..2abad61 100644
--- a/R/pkg/DESCRIPTION
+++ b/R/pkg/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: SparkR
 Type: Package
-Version: 3.2.0
+Version: 3.2.1
 Title: R Front End for 'Apache Spark'
 Description: Provides an R Front end for 'Apache Spark' <https://spark.apache.org>.
 Authors@R: c(person("Shivaram", "Venkataraman", role = "aut",
diff --git a/assembly/pom.xml b/assembly/pom.xml
index 83a1f46..5b2f449 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.12</artifactId>
-    <version>3.2.0</version>
+    <version>3.2.1-SNAPSHOT</version>
     <relativePath>../pom.xml</relativePath>
   </parent>
 
diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml
index f371fe3..ff66ac6 100644
--- a/common/kvstore/pom.xml
+++ b/common/kvstore/pom.xml
@@ -22,7 +22,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
    <artifactId>spark-parent_2.12</artifactId>
-    <version>3.2.0</version>
+    <version>3.2.1-SNAPSHOT</version>
     <relativePath>../../pom.xml</relativePath>
   </parent>
 
diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml
index 0499fe1..9db4231 100644
--- a/common/network-common/pom.xml
+++ b/common/network-common/pom.xml
@@ -22,7 +22,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.12</artifactId>
-    <version>3.2.0</version>
+    <version>3.2.1-SNAPSHOT</version>
     <relativePath>../../pom.xml</relativePath>
   </parent>
 
diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml
index 6c55007..1788b0f 100644
--- a/common/network-shuffle/pom.xml
+++ b/common/network-shuffle/pom.xml
@@ -22,7 +22,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.12</artifactId>
-    <version>3.2.0</version>
+    <version>3.2.1-SNAPSHOT</version>
     <relativePath>../../pom.xml</relativePath>
   </parent>
 
diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml
index 59cbebf..1528026 100644
--- a/common/network-yarn/pom.xml
+++ b/common/network-yarn/pom.xml
@@ -22,7 +22,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.12</artifactId>
-    <version>3.2.0</version>
+    <version>3.2.1-SNAPSHOT</version>
     <relativePath>../../pom.xml</relativePath>
   </parent>
 
diff --git a/common/sketch/pom.xml b/common/sketch/pom.xml
index ace7354..be18e9b 100644
--- a/common/sketch/pom.xml
+++ b/common/sketch/pom.xml
@@ -22,7 +22,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.12</artifactId>
-    <version>3.2.0</version>
+    <version>3.2.1-SNAPSHOT</version>
     <relativePath>../../pom.xml</relativePath>
   </parent>
 
diff --git a/common/tags/pom.xml b/common/tags/pom.xml
index 

[spark] branch branch-3.2 updated (9760c8a -> 5931f21)

2021-10-06 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 9760c8a  [SPARK-36919][SQL] Make BadRecordException fields transient
 add 5d45a41  Preparing Spark release v3.2.0-rc7
 new 5931f21  Preparing development version 3.2.1-SNAPSHOT

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] tag v3.2.0-rc7 created (now 5d45a41)

2021-10-06 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to tag v3.2.0-rc7
in repository https://gitbox.apache.org/repos/asf/spark.git.


  at 5d45a41  (commit)
This tag includes the following new commits:

 new 5d45a41  Preparing Spark release v3.2.0-rc7

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] 01/01: Preparing Spark release v3.2.0-rc7

2021-10-06 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to tag v3.2.0-rc7
in repository https://gitbox.apache.org/repos/asf/spark.git

commit 5d45a415f3a29898d92380380cfd82bfc7f579ea
Author: Gengliang Wang 
AuthorDate: Wed Oct 6 11:45:26 2021 +

Preparing Spark release v3.2.0-rc7
---
 R/pkg/DESCRIPTION  | 2 +-
 assembly/pom.xml   | 2 +-
 common/kvstore/pom.xml | 2 +-
 common/network-common/pom.xml  | 2 +-
 common/network-shuffle/pom.xml | 2 +-
 common/network-yarn/pom.xml| 2 +-
 common/sketch/pom.xml  | 2 +-
 common/tags/pom.xml| 2 +-
 common/unsafe/pom.xml  | 2 +-
 core/pom.xml   | 2 +-
 docs/_config.yml   | 4 ++--
 examples/pom.xml   | 2 +-
 external/avro/pom.xml  | 2 +-
 external/docker-integration-tests/pom.xml  | 2 +-
 external/kafka-0-10-assembly/pom.xml   | 2 +-
 external/kafka-0-10-sql/pom.xml| 2 +-
 external/kafka-0-10-token-provider/pom.xml | 2 +-
 external/kafka-0-10/pom.xml| 2 +-
 external/kinesis-asl-assembly/pom.xml  | 2 +-
 external/kinesis-asl/pom.xml   | 2 +-
 external/spark-ganglia-lgpl/pom.xml| 2 +-
 graphx/pom.xml | 2 +-
 hadoop-cloud/pom.xml   | 2 +-
 launcher/pom.xml   | 2 +-
 mllib-local/pom.xml| 2 +-
 mllib/pom.xml  | 2 +-
 pom.xml| 2 +-
 python/pyspark/version.py  | 2 +-
 repl/pom.xml   | 2 +-
 resource-managers/kubernetes/core/pom.xml  | 2 +-
 resource-managers/kubernetes/integration-tests/pom.xml | 2 +-
 resource-managers/mesos/pom.xml| 2 +-
 resource-managers/yarn/pom.xml | 2 +-
 sql/catalyst/pom.xml   | 2 +-
 sql/core/pom.xml   | 2 +-
 sql/hive-thriftserver/pom.xml  | 2 +-
 sql/hive/pom.xml   | 2 +-
 streaming/pom.xml  | 2 +-
 tools/pom.xml  | 2 +-
 39 files changed, 40 insertions(+), 40 deletions(-)

diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION
index 2abad61..4a310e6 100644
--- a/R/pkg/DESCRIPTION
+++ b/R/pkg/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: SparkR
 Type: Package
-Version: 3.2.1
+Version: 3.2.0
 Title: R Front End for 'Apache Spark'
 Description: Provides an R Front end for 'Apache Spark' <https://spark.apache.org>.
 Authors@R: c(person("Shivaram", "Venkataraman", role = "aut",
diff --git a/assembly/pom.xml b/assembly/pom.xml
index 5b2f449..83a1f46 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.12</artifactId>
-    <version>3.2.1-SNAPSHOT</version>
+    <version>3.2.0</version>
     <relativePath>../pom.xml</relativePath>
   </parent>
 
diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml
index ff66ac6..f371fe3 100644
--- a/common/kvstore/pom.xml
+++ b/common/kvstore/pom.xml
@@ -22,7 +22,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.12</artifactId>
-    <version>3.2.1-SNAPSHOT</version>
+    <version>3.2.0</version>
     <relativePath>../../pom.xml</relativePath>
   </parent>
 
diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml
index 9db4231..0499fe1 100644
--- a/common/network-common/pom.xml
+++ b/common/network-common/pom.xml
@@ -22,7 +22,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.12</artifactId>
-    <version>3.2.1-SNAPSHOT</version>
+    <version>3.2.0</version>
     <relativePath>../../pom.xml</relativePath>
   </parent>
 
diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml
index 1788b0f..6c55007 100644
--- a/common/network-shuffle/pom.xml
+++ b/common/network-shuffle/pom.xml
@@ -22,7 +22,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.12</artifactId>
-    <version>3.2.1-SNAPSHOT</version>
+    <version>3.2.0</version>
     <relativePath>../../pom.xml</relativePath>
   </parent>
 
diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml
index 1528026..59cbebf 100644
--- a/common/network-yarn/pom.xml
+++ b/common/network-yarn/pom.xml
@@ -22,7 +22,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.12</artifactId>
-    <version>3.2.1-SNAPSHOT</version>
+    <version>3.2.0</version>
     <relativePath>../../pom.xml</relativePath>
   </parent>
 
diff --git a/common/sketch/pom.xml b/common/sketch/pom.xml
index be18e9b..ace7354 100644
--- a/common/sketch/pom.xml
+++ b/common/sketch/pom.xml
@@ -22,7 +22,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.12</artifactId>
-    <version>3.2.1-SNAPSHOT</version>
+    <version>3.2.0</version>
     <relativePath>../../pom.xml</relativePath>
   </parent>
 
diff --git a/common/tags/pom.xml b/common/tags/pom.xml
index 

[spark] branch branch-3.0 updated: [SPARK-36919][SQL] Make BadRecordException fields transient

2021-10-06 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new d454d4e  [SPARK-36919][SQL] Make BadRecordException fields transient
d454d4e is described below

commit d454d4efac75823eb07809d913adb1db5ee440f7
Author: tianhanhu 
AuthorDate: Wed Oct 6 19:06:09 2021 +0900

[SPARK-36919][SQL] Make BadRecordException fields transient

### What changes were proposed in this pull request?
While migrating a Spark application from 2.4.x to 3.1.x, we found a difference in exception chaining behavior. When parsing a malformed CSV, the root cause should be `Caused by: java.lang.RuntimeException: Malformed CSV record`, but only the top-level exception is kept, and all lower-level exceptions and the root cause are lost. Thus, when we call ExceptionUtils.getRootCause on the exception, we still get the exception itself.
The reason for the difference is that the RuntimeException is wrapped in BadRecordException, which has unserializable fields. When we serialize the exception in a task and deserialize it in the scheduler, the cause is lost.
This PR makes the unserializable fields of BadRecordException transient, so the rest of the exception can be serialized and deserialized properly.
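
As a rough Python analogy (invented names, not Spark's Scala code): an exception carrying an unserializable field loses its whole chain on serialization, and excluding that field from the serialized form, much like `@transient` on the JVM, preserves the cause:

```python
import pickle

class BadRecord(Exception):
    """Toy stand-in for BadRecordException: holds a lazy record thunk."""
    def __init__(self, record_thunk, cause):
        super().__init__(str(cause))
        self.record_thunk = record_thunk  # a lambda: not picklable
        self.cause = cause

    def __reduce__(self):
        # Analogous to @transient: omit the unpicklable thunk when
        # serializing, keeping the cause intact.
        return (BadRecord, (None, self.cause))

err = BadRecord(lambda: "2015,hello,badrow", RuntimeError("Malformed CSV record"))
restored = pickle.loads(pickle.dumps(err))  # would fail without __reduce__
assert isinstance(restored.cause, RuntimeError)
```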

### Why are the changes needed?
Make BadRecordException serializable

### Does this PR introduce _any_ user-facing change?
Users can now get the root cause of a BadRecordException.

### How was this patch tested?
Unit testing

Closes #34167 from tianhanhu/master.

Authored-by: tianhanhu 
Signed-off-by: Hyukjin Kwon 
(cherry picked from commit aed977c4682b6f378a26050ffab51b9b2075cae4)
Signed-off-by: Hyukjin Kwon 
---
 .../scala/org/apache/spark/sql/catalyst/util/BadRecordException.scala | 4 ++--
 .../org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala | 2 ++
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/BadRecordException.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/BadRecordException.scala
index d719a33..67defe7 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/BadRecordException.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/BadRecordException.scala
@@ -38,6 +38,6 @@ case class PartialResultException(
  * @param cause the actual exception about why the record is bad and can't be parsed.
  */
 case class BadRecordException(
-    record: () => UTF8String,
-    partialResult: () => Option[InternalRow],
+    @transient record: () => UTF8String,
+    @transient partialResult: () => Option[InternalRow],
     cause: Throwable) extends Exception(cause)
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala
index 3b564b6..ab6fe27 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala
@@ -29,6 +29,7 @@ import scala.collection.JavaConverters._
 import scala.util.Properties
 
 import com.univocity.parsers.common.TextParsingException
+import org.apache.commons.lang3.exception.ExceptionUtils
 import org.apache.commons.lang3.time.FastDateFormat
 import org.apache.hadoop.io.SequenceFile.CompressionType
 import org.apache.hadoop.io.compress.GzipCodec
@@ -357,6 +358,7 @@ abstract class CSVSuite extends QueryTest with SharedSparkSession with TestCsvDa
       }
 
       assert(exception.getMessage.contains("Malformed CSV record"))
+      assert(ExceptionUtils.getRootCause(exception).isInstanceOf[RuntimeException])
     }
   }
 

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.1 updated: [SPARK-36919][SQL] Make BadRecordException fields transient

2021-10-06 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.1 by this push:
 new 90d1932  [SPARK-36919][SQL] Make BadRecordException fields transient
90d1932 is described below

commit 90d193273ed5387984396940db1753bff5687931
Author: tianhanhu 
AuthorDate: Wed Oct 6 19:06:09 2021 +0900

[SPARK-36919][SQL] Make BadRecordException fields transient

### What changes were proposed in this pull request?
While migrating a Spark application from 2.4.x to 3.1.x, we found a difference in exception chaining behavior. When parsing a malformed CSV, the root cause should be `Caused by: java.lang.RuntimeException: Malformed CSV record`, but only the top-level exception is kept, and all lower-level exceptions and the root cause are lost. Thus, when we call ExceptionUtils.getRootCause on the exception, we still get the exception itself.
The reason for the difference is that the RuntimeException is wrapped in BadRecordException, which has unserializable fields. When we serialize the exception in a task and deserialize it in the scheduler, the cause is lost.
This PR makes the unserializable fields of BadRecordException transient, so the rest of the exception can be serialized and deserialized properly.

### Why are the changes needed?
Make BadRecordException serializable

### Does this PR introduce _any_ user-facing change?
Users can now get the root cause of a BadRecordException.

### How was this patch tested?
Unit testing

Closes #34167 from tianhanhu/master.

Authored-by: tianhanhu 
Signed-off-by: Hyukjin Kwon 
(cherry picked from commit aed977c4682b6f378a26050ffab51b9b2075cae4)
Signed-off-by: Hyukjin Kwon 
---
 .../scala/org/apache/spark/sql/catalyst/util/BadRecordException.scala | 4 ++--
 .../org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala | 2 ++
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/BadRecordException.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/BadRecordException.scala
index d719a33..67defe7 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/BadRecordException.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/BadRecordException.scala
@@ -38,6 +38,6 @@ case class PartialResultException(
  * @param cause the actual exception about why the record is bad and can't be parsed.
  */
 case class BadRecordException(
-    record: () => UTF8String,
-    partialResult: () => Option[InternalRow],
+    @transient record: () => UTF8String,
+    @transient partialResult: () => Option[InternalRow],
     cause: Throwable) extends Exception(cause)
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala
index 3fe6ce7..5476e02 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala
@@ -29,6 +29,7 @@ import scala.collection.JavaConverters._
 import scala.util.Properties
 
 import com.univocity.parsers.common.TextParsingException
+import org.apache.commons.lang3.exception.ExceptionUtils
 import org.apache.commons.lang3.time.FastDateFormat
 import org.apache.hadoop.io.SequenceFile.CompressionType
 import org.apache.hadoop.io.compress.GzipCodec
@@ -365,6 +366,7 @@ abstract class CSVSuite
       }
 
      assert(exception.getMessage.contains("Malformed CSV record"))
+      assert(ExceptionUtils.getRootCause(exception).isInstanceOf[RuntimeException])
     }
   }
 

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.2 updated: [SPARK-36919][SQL] Make BadRecordException fields transient

2021-10-06 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.2 by this push:
 new 9760c8a  [SPARK-36919][SQL] Make BadRecordException fields transient
9760c8a is described below

commit 9760c8ab6028bf525ea482d9e944bc8e731e0715
Author: tianhanhu 
AuthorDate: Wed Oct 6 19:06:09 2021 +0900

[SPARK-36919][SQL] Make BadRecordException fields transient

### What changes were proposed in this pull request?
While migrating a Spark application from 2.4.x to 3.1.x, we found a difference in exception chaining behavior. When parsing a malformed CSV, the root cause should be `Caused by: java.lang.RuntimeException: Malformed CSV record`, but only the top-level exception is kept, and all lower-level exceptions and the root cause are lost. Thus, when we call ExceptionUtils.getRootCause on the exception, we still get the exception itself.
The reason for the difference is that the RuntimeException is wrapped in BadRecordException, which has unserializable fields. When we serialize the exception in a task and deserialize it in the scheduler, the cause is lost.
This PR makes the unserializable fields of BadRecordException transient, so the rest of the exception can be serialized and deserialized properly.

### Why are the changes needed?
Make BadRecordException serializable

### Does this PR introduce _any_ user-facing change?
Users can now get the root cause of a BadRecordException.

### How was this patch tested?
Unit testing

Closes #34167 from tianhanhu/master.

Authored-by: tianhanhu 
Signed-off-by: Hyukjin Kwon 
(cherry picked from commit aed977c4682b6f378a26050ffab51b9b2075cae4)
Signed-off-by: Hyukjin Kwon 
---
 .../scala/org/apache/spark/sql/catalyst/util/BadRecordException.scala | 4 ++--
 .../org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala | 2 ++
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/BadRecordException.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/BadRecordException.scala
index d719a33..67defe7 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/BadRecordException.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/BadRecordException.scala
@@ -38,6 +38,6 @@ case class PartialResultException(
  * @param cause the actual exception about why the record is bad and can't be parsed.
  */
 case class BadRecordException(
-    record: () => UTF8String,
-    partialResult: () => Option[InternalRow],
+    @transient record: () => UTF8String,
+    @transient partialResult: () => Option[InternalRow],
     cause: Throwable) extends Exception(cause)
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala
index fd25a79..3fc86fe 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala
@@ -29,6 +29,7 @@ import scala.collection.JavaConverters._
 import scala.util.Properties
 
 import com.univocity.parsers.common.TextParsingException
+import org.apache.commons.lang3.exception.ExceptionUtils
 import org.apache.commons.lang3.time.FastDateFormat
 import org.apache.hadoop.io.SequenceFile.CompressionType
 import org.apache.hadoop.io.compress.GzipCodec
@@ -365,6 +366,7 @@ abstract class CSVSuite
       }
 
       assert(exception.getMessage.contains("Malformed CSV record"))
+      assert(ExceptionUtils.getRootCause(exception).isInstanceOf[RuntimeException])
     }
   }
 

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (f6013c8 -> aed977c)

2021-10-06 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from f6013c8  [SPARK-36927][PYTHON] Inline type hints for python/pyspark/sql/window.py
 add aed977c  [SPARK-36919][SQL] Make BadRecordException fields transient

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/sql/catalyst/util/BadRecordException.scala | 4 ++--
 .../org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala | 2 ++
 2 files changed, 4 insertions(+), 2 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (31b6f61 -> f6013c8)

2021-10-06 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 31b6f61  [SPARK-36892][CORE] Disable batch fetch for a shuffle when push based shuffle is enabled
 add f6013c8  [SPARK-36927][PYTHON] Inline type hints for python/pyspark/sql/window.py

No new revisions were added by this update.

Summary of changes:
 python/pyspark/sql/window.py  | 50 +--
 python/pyspark/sql/window.pyi | 41 ---
 2 files changed, 29 insertions(+), 62 deletions(-)
 delete mode 100644 python/pyspark/sql/window.pyi

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.2 updated: [SPARK-36892][CORE] Disable batch fetch for a shuffle when push based shuffle is enabled

2021-10-06 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.2 by this push:
 new 88f4809  [SPARK-36892][CORE] Disable batch fetch for a shuffle when push based shuffle is enabled
88f4809 is described below

commit 88f480914226281dd19e149ff243ec91e9ccf932
Author: Ye Zhou 
AuthorDate: Wed Oct 6 15:42:25 2021 +0800

[SPARK-36892][CORE] Disable batch fetch for a shuffle when push based shuffle is enabled

We found an issue where a user configured both AQE and push-based shuffle, and the job started to hang after running some stages. A thread dump taken from the executors showed that the tasks were still waiting to fetch shuffle blocks. This PR proposes changes to fix the issue.

### What changes were proposed in this pull request?
Disabled batch fetch when push-based shuffle is enabled.

### Why are the changes needed?
Without this patch, enabling AQE together with push-based shuffle can cause tasks to hang.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Tested the PR in the Spark shell with the following queries:

sql("""SELECT CASE WHEN rand() < 0.8 THEN 100 ELSE CAST(rand() * 3000 
AS INT) END AS s_item_id, CAST(rand() * 100 AS INT) AS s_quantity, 
DATE_ADD(current_date(), - CAST(rand() * 360 AS INT)) AS s_date FROM 
RANGE(10)""").createOrReplaceTempView("sales")
// Dynamically coalesce partitions
sql("""SELECT s_date, sum(s_quantity) AS q FROM sales GROUP BY s_date ORDER 
BY q DESC""").collect

Unit tests to be added.
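
For context, a minimal PySpark sketch of the configuration combination that triggered the hang (config names assumed from the Spark docs; push-based shuffle additionally requires the external shuffle service and, as of this release, YARN):

```python
from pyspark.sql import SparkSession

# AQE plus push-based shuffle: the combination this patch makes safe.
spark = (
    SparkSession.builder
    .appName("push-shuffle-repro")
    .config("spark.sql.adaptive.enabled", "true")     # AQE
    .config("spark.shuffle.push.enabled", "true")     # push-based shuffle
    .config("spark.shuffle.service.enabled", "true")  # required by push-based shuffle
    .getOrCreate()
)
```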

Closes #34156 from zhouyejoe/SPARK-36892.

Authored-by: Ye Zhou 
Signed-off-by: Gengliang Wang 
(cherry picked from commit 31b6f614d3173c8a5852243bf7d0b6200788432d)
Signed-off-by: Gengliang Wang 
---
 .../scala/org/apache/spark/MapOutputTracker.scala  |  62 ---
 .../spark/shuffle/sort/SortShuffleManager.scala|  16 ++-
 .../org/apache/spark/MapOutputTrackerSuite.scala   | 120 -
 docs/configuration.md  |   2 +-
 4 files changed, 183 insertions(+), 17 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/MapOutputTracker.scala b/core/src/main/scala/org/apache/spark/MapOutputTracker.scala
index ca1229a..588f7d2 100644
--- a/core/src/main/scala/org/apache/spark/MapOutputTracker.scala
+++ b/core/src/main/scala/org/apache/spark/MapOutputTracker.scala
@@ -436,6 +436,8 @@ private[spark] case class GetMapOutputMessage(shuffleId: Int,
     context: RpcCallContext) extends MapOutputTrackerMasterMessage
 private[spark] case class GetMapAndMergeOutputMessage(shuffleId: Int,
     context: RpcCallContext) extends MapOutputTrackerMasterMessage
+private[spark] case class MapSizesByExecutorId(
+    iter: Iterator[(BlockManagerId, Seq[(BlockId, Long, Int)])], enableBatchFetch: Boolean)
 
 /** RpcEndpoint class for MapOutputTrackerMaster */
 private[spark] class MapOutputTrackerMasterEndpoint(
@@ -512,12 +514,19 @@ private[spark] abstract class MapOutputTracker(conf: 
SparkConf) extends Logging
 getMapSizesByExecutorId(shuffleId, 0, Int.MaxValue, reduceId, reduceId + 1)
   }
 
+  // For testing
+  def getPushBasedShuffleMapSizesByExecutorId(shuffleId: Int, reduceId: Int)
+  : MapSizesByExecutorId = {
+getPushBasedShuffleMapSizesByExecutorId(shuffleId, 0, Int.MaxValue, 
reduceId, reduceId + 1)
+  }
+
   /**
    * Called from executors to get the server URIs and output sizes for each shuffle block that
    * needs to be read from a given range of map output partitions (startPartition is included but
    * endPartition is excluded from the range) within a range of mappers (startMapIndex is included
-   * but endMapIndex is excluded). If endMapIndex=Int.MaxValue, the actual endMapIndex will be
-   * changed to the length of total map outputs.
+   * but endMapIndex is excluded) when push based shuffle is not enabled for the specific shuffle
+   * dependency. If endMapIndex=Int.MaxValue, the actual endMapIndex will be changed to the length
+   * of total map outputs.
    *
    * @return A sequence of 2-item tuples, where the first item in the tuple is a BlockManagerId,
    *         and the second item is a sequence of (shuffle block id, shuffle block size, map index)
@@ -529,7 +538,34 @@ private[spark] abstract class MapOutputTracker(conf: SparkConf) extends Logging
       startMapIndex: Int,
       endMapIndex: Int,
       startPartition: Int,
-      endPartition: Int): Iterator[(BlockManagerId, Seq[(BlockId, Long, Int)])]
+      endPartition: Int): Iterator[(BlockManagerId, Seq[(BlockId, Long, Int)])] = {
+    val mapSizesByExecutorId = getPushBasedShuffleMapSizesByExecutorId(
+      shuffleId, startMapIndex, endMapIndex, startPartition, endPartition)
+

[spark] branch master updated (5c6f0b9 -> 31b6f61)

2021-10-06 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 5c6f0b9  [SPARK-36930][PYTHON] Support ps.MultiIndex.dtypes
 add 31b6f61  [SPARK-36892][CORE] Disable batch fetch for a shuffle when push based shuffle is enabled

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/MapOutputTracker.scala  |  62 ---
 .../spark/shuffle/sort/SortShuffleManager.scala|  16 ++-
 .../org/apache/spark/MapOutputTrackerSuite.scala   | 120 -
 docs/configuration.md  |   2 +-
 4 files changed, 183 insertions(+), 17 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-36930][PYTHON] Support ps.MultiIndex.dtypes

2021-10-06 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 5c6f0b9  [SPARK-36930][PYTHON] Support ps.MultiIndex.dtypes
5c6f0b9 is described below

commit 5c6f0b9263f29f805f386237448f671dea3ad6c5
Author: dchvn nguyen 
AuthorDate: Wed Oct 6 15:35:32 2021 +0900

[SPARK-36930][PYTHON] Support ps.MultiIndex.dtypes

### What changes were proposed in this pull request?
Add dtypes for MultiIndex

### Why are the changes needed?
Add dtypes for MultiIndex

Before this PR:

```python
>>> idx = pd.MultiIndex.from_arrays([[0, 1, 2, 3, 4, 5, 6, 7, 8], [1, 2, 3, 
4, 5, 6, 7, 8, 9]], names=("zero", "one"))
>>> pdf = pd.DataFrame(
... {"a": [1, 2, 3, 4, 5, 6, 7, 8, 9], "b": [4, 5, 6, 3, 2, 1, 0, 0, 
0]},
... index=idx,
... )
>>> psdf = ps.from_pandas(pdf)
>>>
>>> ps.DataFrame[psdf.index.dtypes, psdf.dtypes]
Traceback (most recent call last):
  File "", line 1, in 
  File "/u02/spark/python/pyspark/pandas/indexes/multi.py", line 917, in 
__getattr__
raise AttributeError("'MultiIndex' object has no attribute 
'{}'".format(item))
AttributeError: 'MultiIndex' object has no attribute 'dtypes'
>>>
```

### Does this PR introduce _any_ user-facing change?
After this PR, users can use ```MultiIndex.dtypes```:

``` python
>>> ps.DataFrame[psdf.index.dtypes, psdf.dtypes]
typing.Tuple[pyspark.pandas.typedef.typehints.IndexNameType, pyspark.pandas.typedef.typehints.IndexNameType, pyspark.pandas.typedef.typehints.NameType, pyspark.pandas.typedef.typehints.NameType]
```

### How was this patch tested?
Unit tests.
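
A quick runnable check of the new property (assuming pyspark.pandas from this build and a pandas version that also provides `MultiIndex.dtypes`):

```python
import pandas as pd
import pyspark.pandas as ps

# Build a two-level index and compare dtypes between pandas and pandas-on-Spark.
pmidx = pd.MultiIndex.from_arrays(
    [[1, 1, 2, 2], ["red", "blue", "red", "blue"]], names=("number", "color")
)
psmidx = ps.from_pandas(pmidx)

print(psmidx.dtypes)  # number: int64, color: object, returned as a pd.Series
assert (psmidx.dtypes == pmidx.dtypes).all()
```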

Closes #34179 from dchvn/add_multiindex_dtypes.

Lead-authored-by: dchvn nguyen 
Co-authored-by: dch nguyen 
Signed-off-by: Hyukjin Kwon 
---
 .../source/reference/pyspark.pandas/indexing.rst   |  1 +
 python/pyspark/pandas/indexes/multi.py | 29 ++
 python/pyspark/pandas/tests/test_dataframe.py  | 14 +++
 3 files changed, 44 insertions(+)

diff --git a/python/docs/source/reference/pyspark.pandas/indexing.rst b/python/docs/source/reference/pyspark.pandas/indexing.rst
index 7e796c6..4168b67 100644
--- a/python/docs/source/reference/pyspark.pandas/indexing.rst
+++ b/python/docs/source/reference/pyspark.pandas/indexing.rst
@@ -240,6 +240,7 @@ MultiIndex Properties
MultiIndex.nlevels
MultiIndex.levshape
MultiIndex.values
+   MultiIndex.dtypes
 
 MultiIndex components
 ~
diff --git a/python/pyspark/pandas/indexes/multi.py b/python/pyspark/pandas/indexes/multi.py
index cff3e26..896ea2a 100644
--- a/python/pyspark/pandas/indexes/multi.py
+++ b/python/pyspark/pandas/indexes/multi.py
@@ -375,6 +375,35 @@ class MultiIndex(Index):
     def name(self, name: Name) -> None:
         raise PandasNotImplementedError(class_name="pd.MultiIndex", property_name="name")
 
+    @property
+    def dtypes(self) -> pd.Series:
+        """Return the dtypes as a Series for the underlying MultiIndex.
+
+        .. versionadded:: 3.3.0
+
+        Returns
+        -------
+        pd.Series
+            The data type of each level.
+
+        Examples
+        --------
+        >>> psmidx = ps.MultiIndex.from_arrays(
+        ...     [[0, 1, 2, 3, 4, 5, 6, 7, 8], [1, 2, 3, 4, 5, 6, 7, 8, 9]],
+        ...     names=("zero", "one"),
+        ... )
+        >>> psmidx.dtypes
+        zero    int64
+        one     int64
+        dtype: object
+        """
+        return pd.Series(
+            [field.dtype for field in self._internal.index_fields],
+            index=pd.Index(
+                [name if len(name) > 1 else name[0] for name in self._internal.index_names]
+            ),
+        )
+
     def _verify_for_rename(self, name: List[Name]) -> List[Label]:  # type: ignore[override]
         if is_list_like(name):
             if self._internal.index_level != len(name):
diff --git a/python/pyspark/pandas/tests/test_dataframe.py b/python/pyspark/pandas/tests/test_dataframe.py
index 1ae009c..800fa46 100644
--- a/python/pyspark/pandas/tests/test_dataframe.py
+++ b/python/pyspark/pandas/tests/test_dataframe.py
@@ -6000,6 +6000,20 @@ class DataFrameTest(PandasOnSparkTestCase, SQLTestUtils):
         expected_pdf = pd.DataFrame({"A": [None, 0], "B": [4.0, 1.0], "C": [3, 3]})
         self.assert_eq(expected_pdf, psdf1.combine_first(psdf2))
 
+    def test_multi_index_dtypes(self):
+        # SPARK-36930: Support ps.MultiIndex.dtypes
+        arrays = [[1, 1, 2, 2], ["red", "blue", "red", "blue"]]
+        pmidx = pd.MultiIndex.from_arrays(arrays, names=("number", "color"))
+        psmidx = ps.from_pandas(pmidx)
+
+        self.assert_eq(psmidx.dtypes, pmidx.dtypes)
+
+        # multiple