(spark) branch master updated: [SPARK-49999][PYTHON][CONNECT] Support optional "column" parameter in box, kde and hist plots

2024-10-27 Thread xinrong
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 03e051b5e400 [SPARK-49999][PYTHON][CONNECT] Support optional "column" parameter in box, kde and hist plots
03e051b5e400 is described below

commit 03e051b5e4007b21e12b188ae5e940706c1da7dc
Author: Xinrong Meng 
AuthorDate: Mon Oct 28 10:01:53 2024 +0800

[SPARK-49999][PYTHON][CONNECT] Support optional "column" parameter in box, kde and hist plots

### What changes were proposed in this pull request?
Support for the optional “column” parameter has been added in box, kde, and 
hist plots. Now, when the column is not provided, all columns of valid types 
(NumericType, DateType, TimestampType) will be used to build the plots.
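
For illustration, a minimal sketch of the new default (plotly backend; the data and column names below are examples, not from the patch):

```py
>>> data = [("A", 50, 55), ("B", 55, 60), ("C", 60, 65)]
>>> columns = ["student", "math_score", "english_score"]
>>> df = spark.createDataFrame(data, columns)
>>> df.plot.box()               # no "column": all numeric columns are plotted
>>> df.plot.hist(bins=5)        # the same default applies to hist
>>> df.plot.kde(bw_method=0.3)  # and to kde
```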

### Why are the changes needed?
- Reach parity with Pandas (on Spark) default behavior.
- Simplify usage by reducing the need for explicitly specifying columns.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Unit tests.

### Was this patch authored or co-authored using generative AI tooling?
No.

    Closes #48628 from xinrong-meng/column_param.
    
Authored-by: Xinrong Meng 
Signed-off-by: Xinrong Meng 
---
 python/pyspark/errors/error-conditions.json|  5 ++
 python/pyspark/sql/plot/core.py| 30 +
 python/pyspark/sql/plot/plotly.py  | 54 +++-
 .../sql/tests/plot/test_frame_plot_plotly.py   | 72 +-
 4 files changed, 121 insertions(+), 40 deletions(-)

diff --git a/python/pyspark/errors/error-conditions.json 
b/python/pyspark/errors/error-conditions.json
index ae9fbccceb3e..5aa0313631c0 100644
--- a/python/pyspark/errors/error-conditions.json
+++ b/python/pyspark/errors/error-conditions.json
@@ -816,6 +816,11 @@
 "message": [
   "Pipe function `` exited with error code ."
 ]
+  },
+"PLOT_INVALID_TYPE_COLUMN": {
+"message": [
+  "Column  must be one of  for plotting, got 
."
+]
   },
   "PLOT_NOT_NUMERIC_COLUMN": {
 "message": [
diff --git a/python/pyspark/sql/plot/core.py b/python/pyspark/sql/plot/core.py
index 158d9130560a..328ebe348878 100644
--- a/python/pyspark/sql/plot/core.py
+++ b/python/pyspark/sql/plot/core.py
@@ -360,7 +360,7 @@ class PySparkPlotAccessor:
 )
 return self(kind="pie", x=x, y=y, **kwargs)
 
-def box(self, column: Union[str, List[str]], **kwargs: Any) -> "Figure":
+def box(self, column: Optional[Union[str, List[str]]] = None, **kwargs: Any) -> "Figure":
 """
 Make a box plot of the DataFrame columns.
 
@@ -374,8 +374,9 @@ class PySparkPlotAccessor:
 
 Parameters
 ----------
-column: str or list of str
-Column name or list of names to be used for creating the boxplot.
+column: str or list of str, optional
+Column name or list of names to be used for creating the box plot.
+If None (default), all numeric columns will be used.
 **kwargs
 Extra arguments to `precision`: refer to a float that is used by
 pyspark to compute approximate statistics for building a boxplot.
@@ -399,6 +400,7 @@ class PySparkPlotAccessor:
 ... ]
 >>> columns = ["student", "math_score", "english_score"]
 >>> df = spark.createDataFrame(data, columns)
+>>> df.plot.box()  # doctest: +SKIP
 >>> df.plot.box(column="math_score")  # doctest: +SKIP
>>> df.plot.box(column=["math_score", "english_score"])  # doctest: +SKIP
 """
@@ -406,9 +408,9 @@ class PySparkPlotAccessor:
 
 def kde(
 self,
-column: Union[str, List[str]],
 bw_method: Union[int, float],
-ind: Union["np.ndarray", int, None] = None,
+column: Optional[Union[str, List[str]]] = None,
+ind: Optional[Union["np.ndarray", int]] = None,
 **kwargs: Any,
 ) -> "Figure":
 """
@@ -420,11 +422,12 @@ class PySparkPlotAccessor:
 
 Parameters
 ----------
-column: str or list of str
-Column name or list of names to be used for creating the kde plot.
 bw_method : int or float
 The method used to calculate the estimator bandwidth.
 See KernelDensity in PySpark for more information.
+column: str or list of str, optional
+Column name or list of names to be used for creating the kde plot.
+If None (default), all numeric columns will be used.

(spark) branch master updated: [SPARK-50001][PYTHON][PS][CONNECT] Adjust "precision" to be part of kwargs for box plots

2024-10-20 Thread xinrong
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new d2e322314c78 [SPARK-50001][PYTHON][PS][CONNECT] Adjust "precision" to 
be part of kwargs for box plots
d2e322314c78 is described below

commit d2e322314c786b892f4d8b37f383fae8e8827ca9
Author: Xinrong Meng 
AuthorDate: Mon Oct 21 11:57:30 2024 +0800

[SPARK-50001][PYTHON][PS][CONNECT] Adjust "precision" to be part of kwargs 
for box plots

### What changes were proposed in this pull request?
Adjust "precision" to be kwargs for box plots in both Pandas on Spark and 
PySpark.

### Why are the changes needed?
Per the discussion at https://github.com/apache/spark/pull/48445#discussion_r1804042377, precision is a Spark-specific implementation detail, so we want to keep "precision" as part of kwargs for box plots.
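
For illustration, a sketch of the adjusted call style (df as in the box-plot docstring; precision stays available, just through kwargs):

```py
>>> df.plot.box(column="math_score")                   # default precision of 0.01
>>> df.plot.box(column="math_score", precision=0.001)  # passed through **kwargs
```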

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Existing tests.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #48513 from xinrong-meng/precision.

Authored-by: Xinrong Meng 
Signed-off-by: Xinrong Meng 
---
 python/pyspark/pandas/plot/core.py | 15 +++
 python/pyspark/sql/plot/core.py| 13 +
 2 files changed, 12 insertions(+), 16 deletions(-)

diff --git a/python/pyspark/pandas/plot/core.py 
b/python/pyspark/pandas/plot/core.py
index 12c17a06f153..f5652177fe4a 100644
--- a/python/pyspark/pandas/plot/core.py
+++ b/python/pyspark/pandas/plot/core.py
@@ -841,7 +841,7 @@ class PandasOnSparkPlotAccessor(PandasObject):
 elif isinstance(self.data, DataFrame):
 return self(kind="barh", x=x, y=y, **kwargs)
 
-def box(self, precision=0.01, **kwds):
+def box(self, **kwds):
 """
 Make a box plot of the DataFrame columns.
 
@@ -857,12 +857,11 @@ class PandasOnSparkPlotAccessor(PandasObject):
 
 Parameters
 ----------
-precision: scalar, default = 0.01
-This argument is used by pandas-on-Spark to compute approximate statistics
-for building a boxplot. Use *smaller* values to get more precise
-statistics.
-**kwds : optional
-Additional keyword arguments are documented in
+**kwds : dict, optional
+Extra arguments to `precision`: refer to a float that is used by
+pandas-on-Spark to compute approximate statistics for building a
+boxplot. The default value is 0.01. Use smaller values to get more
+precise statistics. Additional keyword arguments are documented in
 :meth:`pyspark.pandas.Series.plot`.
 
 Returns
@@ -901,7 +900,7 @@ class PandasOnSparkPlotAccessor(PandasObject):
 from pyspark.pandas import DataFrame, Series
 
 if isinstance(self.data, (Series, DataFrame)):
-return self(kind="box", precision=precision, **kwds)
+return self(kind="box", **kwds)
 
 def hist(self, bins=10, **kwds):
 """
diff --git a/python/pyspark/sql/plot/core.py b/python/pyspark/sql/plot/core.py
index f44c0768d433..178411e5c5ef 100644
--- a/python/pyspark/sql/plot/core.py
+++ b/python/pyspark/sql/plot/core.py
@@ -359,9 +359,7 @@ class PySparkPlotAccessor:
 )
 return self(kind="pie", x=x, y=y, **kwargs)
 
-def box(
-self, column: Union[str, List[str]], precision: float = 0.01, **kwargs: Any
-) -> "Figure":
+def box(self, column: Union[str, List[str]], **kwargs: Any) -> "Figure":
 """
 Make a box plot of the DataFrame columns.
 
@@ -377,11 +375,10 @@ class PySparkPlotAccessor:
 ----------
 column: str or list of str
 Column name or list of names to be used for creating the boxplot.
-precision: float, default = 0.01
-This argument is used by pyspark to compute approximate statistics
-for building a boxplot.
 **kwargs
-Additional keyword arguments.
+Extra arguments to `precision`: refer to a float that is used by
+pyspark to compute approximate statistics for building a boxplot.
+The default value is 0.01. Use smaller values to get more precise statistics.
 
 Returns
 -------
@@ -404,7 +401,7 @@ class PySparkPlotAccessor:
 >>> df.plot.box(column="math_score")  # doctest: +SKIP
 >>> df.plot.box(column=["math_score", "english_score"])  # doctest: +SKIP
 """
-return self(kind="box", column=column, precision=precision, **kwargs)
+return self(kind="box", column=column, **kwargs)

(spark) branch master updated: [SPARK-49948][PS][CONNECT] Add parameter "precision" to pandas on Spark box plot

2024-10-15 Thread xinrong
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 861b5e98e6e4 [SPARK-49948][PS][CONNECT] Add parameter "precision" to 
pandas on Spark box plot
861b5e98e6e4 is described below

commit 861b5e98e6e4f61e376d756f085e0290e01fc8f4
Author: Xinrong Meng 
AuthorDate: Wed Oct 16 08:49:10 2024 +0800

[SPARK-49948][PS][CONNECT] Add parameter "precision" to pandas on Spark box 
plot

### What changes were proposed in this pull request?
Add parameter "precision" to pandas on Spark box plot.

### Why are the changes needed?
Previously, the box method used **kwds, allowing precision to be passed 
implicitly. Now, adding precision directly to the signature ensures clarity and 
explicit control, improving usability.
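
A usage sketch of the explicit parameter (hypothetical data; any pandas-on-Spark DataFrame works the same way):

```py
>>> import pyspark.pandas as ps
>>> psdf = ps.DataFrame({"math_score": [50, 55, 60, 10, 85]})
>>> psdf.plot.box()                 # precision defaults to 0.01
>>> psdf.plot.box(precision=0.001)  # smaller values give more precise statistics
```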

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Existing tests.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #48445 from xinrong-meng/ps_box.

Authored-by: Xinrong Meng 
Signed-off-by: Xinrong Meng 
---
 python/pyspark/pandas/plot/core.py | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/python/pyspark/pandas/plot/core.py 
b/python/pyspark/pandas/plot/core.py
index 7333fae1ad43..12c17a06f153 100644
--- a/python/pyspark/pandas/plot/core.py
+++ b/python/pyspark/pandas/plot/core.py
@@ -841,7 +841,7 @@ class PandasOnSparkPlotAccessor(PandasObject):
 elif isinstance(self.data, DataFrame):
 return self(kind="barh", x=x, y=y, **kwargs)
 
-def box(self, **kwds):
+def box(self, precision=0.01, **kwds):
 """
 Make a box plot of the DataFrame columns.
 
@@ -857,14 +857,13 @@ class PandasOnSparkPlotAccessor(PandasObject):
 
 Parameters
 ----------
-**kwds : optional
-Additional keyword arguments are documented in
-:meth:`pyspark.pandas.Series.plot`.
-
 precision: scalar, default = 0.01
 This argument is used by pandas-on-Spark to compute approximate statistics
 for building a boxplot. Use *smaller* values to get more precise
-statistics (matplotlib-only).
+statistics.
+**kwds : optional
+Additional keyword arguments are documented in
+:meth:`pyspark.pandas.Series.plot`.
 
 Returns
 -------
@@ -902,7 +901,7 @@ class PandasOnSparkPlotAccessor(PandasObject):
 from pyspark.pandas import DataFrame, Series
 
 if isinstance(self.data, (Series, DataFrame)):
-return self(kind="box", **kwds)
+return self(kind="box", precision=precision, **kwds)
 
 def hist(self, bins=10, **kwds):
 """





(spark) branch master updated: [SPARK-49929][PYTHON][CONNECT] Support box plots

2024-10-14 Thread xinrong
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 488f68090b22 [SPARK-49929][PYTHON][CONNECT] Support box plots
488f68090b22 is described below

commit 488f68090b228b30ba4a3b75596c9904eef1f584
Author: Xinrong Meng 
AuthorDate: Tue Oct 15 08:31:33 2024 +0800

[SPARK-49929][PYTHON][CONNECT] Support box plots

### What changes were proposed in this pull request?
Support box plots with plotly backend on both Spark Connect and Spark 
classic.

### Why are the changes needed?
While Pandas on Spark supports plotting, PySpark currently lacks this 
feature. The proposed API will enable users to generate visualizations. This 
will provide users with an intuitive, interactive way to explore and understand 
large datasets directly from PySpark DataFrames, streamlining the data analysis 
workflow in distributed environments.

See more at [PySpark Plotting API 
Specification](https://docs.google.com/document/d/1IjOEzC8zcetG86WDvqkereQPj_NGLNW7Bdu910g30Dg/edit?usp=sharing)
 in progress.

Part of https://issues.apache.org/jira/browse/SPARK-49530.

### Does this PR introduce _any_ user-facing change?
Yes. Box plots are supported as shown below.

```py
>>> data = [
... ("A", 50, 55),
... ("B", 55, 60),
... ("C", 60, 65),
... ("D", 65, 70),
... ("E", 70, 75),
... # outliers
... ("F", 10, 15),
... ("G", 85, 90),
... ("H", 5, 150),
... ]
>>> columns = ["student", "math_score", "english_score"]
>>> sdf = spark.createDataFrame(data, columns)
>>> fig1 = sdf.plot.box(column=["math_score", "english_score"])
>>> fig1.show()  # see below
>>> fig2 = sdf.plot(kind="box", column="math_score")
>>> fig2.show()  # see below
```

fig1:
![newplot (17)](https://github.com/user-attachments/assets/8c36c344-f6de-47e3-bd63-c0f3b57efc43)

fig2:
![newplot (18)](https://github.com/user-attachments/assets/9b7b60f6-58ec-4eff-9544-d5ab88a88631)

    ### How was this patch tested?
Unit tests.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #48447 from xinrong-meng/box.

Authored-by: Xinrong Meng 
Signed-off-by: Xinrong Meng 
---
 python/pyspark/errors/error-conditions.json|   5 +
 python/pyspark/sql/plot/core.py| 153 -
 python/pyspark/sql/plot/plotly.py  |  77 ++-
 .../sql/tests/plot/test_frame_plot_plotly.py   |  77 ++-
 4 files changed, 307 insertions(+), 5 deletions(-)

diff --git a/python/pyspark/errors/error-conditions.json 
b/python/pyspark/errors/error-conditions.json
index 6ca21d5d..ab01d386645b 100644
--- a/python/pyspark/errors/error-conditions.json
+++ b/python/pyspark/errors/error-conditions.json
@@ -1103,6 +1103,11 @@
   "`` is not supported, it should be one of the values from 
"
 ]
   },
+  "UNSUPPORTED_PLOT_BACKEND_PARAM": {
+"message": [
+  "`` does not support `` set to , it should be one 
of the values from "
+]
+  },
   "UNSUPPORTED_SIGNATURE": {
 "message": [
   "Unsupported signature: ."
diff --git a/python/pyspark/sql/plot/core.py b/python/pyspark/sql/plot/core.py
index f9667ee2c0d6..4bf75474d92c 100644
--- a/python/pyspark/sql/plot/core.py
+++ b/python/pyspark/sql/plot/core.py
@@ -15,15 +15,17 @@
 # limitations under the License.
 #
 
-from typing import Any, TYPE_CHECKING, Optional, Union
+from typing import Any, TYPE_CHECKING, List, Optional, Union
 from types import ModuleType
 from pyspark.errors import PySparkRuntimeError, PySparkTypeError, PySparkValueError
+from pyspark.sql import Column, functions as F
 from pyspark.sql.types import NumericType
-from pyspark.sql.utils import require_minimum_plotly_version
+from pyspark.sql.utils import is_remote, require_minimum_plotly_version
 
 
 if TYPE_CHECKING:
-from pyspark.sql import DataFrame
+from pyspark.sql import DataFrame, Row
+from pyspark.sql._typing import ColumnOrName
 import pandas as pd
 from plotly.graph_objs import Figure
 
@@ -338,3 +340,148 @@ class PySparkPlotAccessor:
 },
 )
 return self(kind="pie", x=x, y=y, **kwargs)
+
+def box(
+self, column: Union[str, List[str]], precision: float = 0.01, **kwargs: Any
+) -> "Figure":
+

(spark) branch master updated (1abfd490d072 -> 1aae16089601)

2024-10-13 Thread xinrong
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 1abfd490d072 [SPARK-49943][PS] Remove `timestamp_ntz_to_long` from 
`PythonSQLUtils`
 add 1aae16089601 [SPARK-49928][PYTHON][TESTS] Refactor plot-related unit 
tests

No new revisions were added by this update.

Summary of changes:
 .../sql/tests/plot/test_frame_plot_plotly.py   | 242 -
 1 file changed, 192 insertions(+), 50 deletions(-)





(spark) branch master updated: [SPARK-49776][PYTHON][CONNECT] Support pie plots

2024-09-26 Thread xinrong
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 488c3f604490 [SPARK-49776][PYTHON][CONNECT] Support pie plots
488c3f604490 is described below

commit 488c3f604490c8632dde67a00118d49ccfcbf578
Author: Xinrong Meng 
AuthorDate: Fri Sep 27 08:35:10 2024 +0800

[SPARK-49776][PYTHON][CONNECT] Support pie plots

### What changes were proposed in this pull request?
Support pie plots with plotly backend on both Spark Connect and Spark classic.

### Why are the changes needed?
While Pandas on Spark supports plotting, PySpark currently lacks this 
feature. The proposed API will enable users to generate visualizations. This 
will provide users with an intuitive, interactive way to explore and understand 
large datasets directly from PySpark DataFrames, streamlining the data analysis 
workflow in distributed environments.

See more at [PySpark Plotting API 
Specification](https://docs.google.com/document/d/1IjOEzC8zcetG86WDvqkereQPj_NGLNW7Bdu910g30Dg/edit?usp=sharing)
 in progress.

Part of https://issues.apache.org/jira/browse/SPARK-49530.

### Does this PR introduce _any_ user-facing change?
Yes. Pie plots are supported as shown below.

```py
>>> from datetime import datetime
>>> data = [
... (3, 5, 20, datetime(2018, 1, 31)),
... (2, 5, 42, datetime(2018, 2, 28)),
... (3, 6, 28, datetime(2018, 3, 31)),
... (9, 12, 62, datetime(2018, 4, 30))]
>>> columns = ["sales", "signups", "visits", "date"]
>>> df = spark.createDataFrame(data, columns)
>>> fig = df.plot(kind="pie", x="date", y="sales")  # df.plot(kind="pie", 
x="date", y="sales")
>>> fig.show()
```
![newplot (8)](https://github.com/user-attachments/assets/c4078bb7-4d84-4607-bcd7-bdd6fbbf8e28)

### How was this patch tested?
Unit tests.

    ### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #48256 from xinrong-meng/plot_pie.

Authored-by: Xinrong Meng 
Signed-off-by: Xinrong Meng 
---
 python/pyspark/errors/error-conditions.json|  5 +++
 python/pyspark/sql/plot/core.py| 41 +-
 python/pyspark/sql/plot/plotly.py  | 15 
 .../sql/tests/plot/test_frame_plot_plotly.py   | 25 +
 4 files changed, 85 insertions(+), 1 deletion(-)

diff --git a/python/pyspark/errors/error-conditions.json 
b/python/pyspark/errors/error-conditions.json
index 115ad658e32f..ed62ea117d36 100644
--- a/python/pyspark/errors/error-conditions.json
+++ b/python/pyspark/errors/error-conditions.json
@@ -812,6 +812,11 @@
   "Pipe function `` exited with error code ."
 ]
   },
+  "PLOT_NOT_NUMERIC_COLUMN": {
+"message": [
+  "Argument  must be a numerical column for plotting, got 
."
+]
+  },
   "PYTHON_HASH_SEED_NOT_SET": {
 "message": [
   "Randomness of hash of string should be disabled via PYTHONHASHSEED."
diff --git a/python/pyspark/sql/plot/core.py b/python/pyspark/sql/plot/core.py
index 9f83d0069652..f9667ee2c0d6 100644
--- a/python/pyspark/sql/plot/core.py
+++ b/python/pyspark/sql/plot/core.py
@@ -17,7 +17,8 @@
 
 from typing import Any, TYPE_CHECKING, Optional, Union
 from types import ModuleType
-from pyspark.errors import PySparkRuntimeError, PySparkValueError
+from pyspark.errors import PySparkRuntimeError, PySparkTypeError, PySparkValueError
+from pyspark.sql.types import NumericType
 from pyspark.sql.utils import require_minimum_plotly_version
 
 
@@ -97,6 +98,7 @@ class PySparkPlotAccessor:
 "bar": PySparkTopNPlotBase().get_top_n,
 "barh": PySparkTopNPlotBase().get_top_n,
 "line": PySparkSampledPlotBase().get_sampled,
+"pie": PySparkTopNPlotBase().get_top_n,
 "scatter": PySparkSampledPlotBase().get_sampled,
 }
 _backends = {}  # type: ignore[var-annotated]
@@ -299,3 +301,40 @@ class PySparkPlotAccessor:
 >>> df.plot.area(x='date', y=['sales', 'signups', 'visits'])  # doctest: +SKIP
 """
 return self(kind="area", x=x, y=y, **kwargs)
+
+def pie(self, x: str, y: str, **kwargs: Any) -> "Figure":
+"""
+Generate a pie plot.
+
+A pie plot is a proportional representation of the numerical data in a
+column.
+
+Parameters
+----------
+x : str
+Name of column to be u

(spark) branch master updated: [SPARK-49694][PYTHON][CONNECT] Support scatter plots

2024-09-24 Thread xinrong
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 6bdd151d5775 [SPARK-49694][PYTHON][CONNECT] Support scatter plots
6bdd151d5775 is described below

commit 6bdd151d57759d73870f20780fc54ab2aa250409
Author: Xinrong Meng 
AuthorDate: Tue Sep 24 15:40:38 2024 +0800

[SPARK-49694][PYTHON][CONNECT] Support scatter plots

### What changes were proposed in this pull request?
Support scatter plots with plotly backend on both Spark Connect and Spark 
classic.

### Why are the changes needed?
While Pandas on Spark supports plotting, PySpark currently lacks this 
feature. The proposed API will enable users to generate visualizations. This 
will provide users with an intuitive, interactive way to explore and understand 
large datasets directly from PySpark DataFrames, streamlining the data analysis 
workflow in distributed environments.

See more at [PySpark Plotting API 
Specification](https://docs.google.com/document/d/1IjOEzC8zcetG86WDvqkereQPj_NGLNW7Bdu910g30Dg/edit?usp=sharing)
 in progress.

Part of https://issues.apache.org/jira/browse/SPARK-49530.

### Does this PR introduce _any_ user-facing change?
Yes. Scatter plots are supported as shown below.

```py
>>> data = [(5.1, 3.5, 0), (4.9, 3.0, 0), (7.0, 3.2, 1), (6.4, 3.2, 1), (5.9, 3.0, 2)]
>>> columns = ["length", "width", "species"]
>>> sdf = spark.createDataFrame(data, columns)
>>> fig = sdf.plot(kind="scatter", x="length", y="width")  # or fig = 
sdf.plot.scatter(x="length", y="width")
>>> fig.show()
```
![newplot (6)](https://github.com/user-attachments/assets/deef452b-74d1-4f6d-b1ae-60722f3c2b17)

### How was this patch tested?
Unit tests.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #48219 from xinrong-meng/plot_scatter.

Authored-by: Xinrong Meng 
Signed-off-by: Xinrong Meng 
---
 python/pyspark/sql/plot/core.py| 34 ++
 .../sql/tests/plot/test_frame_plot_plotly.py   | 19 
 2 files changed, 53 insertions(+)

diff --git a/python/pyspark/sql/plot/core.py b/python/pyspark/sql/plot/core.py
index eb00b8a04f97..0a3a0101e189 100644
--- a/python/pyspark/sql/plot/core.py
+++ b/python/pyspark/sql/plot/core.py
@@ -96,6 +96,7 @@ class PySparkPlotAccessor:
 "bar": PySparkTopNPlotBase().get_top_n,
 "barh": PySparkTopNPlotBase().get_top_n,
 "line": PySparkSampledPlotBase().get_sampled,
+"scatter": PySparkSampledPlotBase().get_sampled,
 }
 _backends = {}  # type: ignore[var-annotated]
 
@@ -230,3 +231,36 @@ class PySparkPlotAccessor:
 ... )  # doctest: +SKIP
 """
 return self(kind="barh", x=x, y=y, **kwargs)
+
+def scatter(self, x: str, y: str, **kwargs: Any) -> "Figure":
+"""
+Create a scatter plot with varying marker point size and color.
+
+The coordinates of each point are defined by two dataframe columns and
+filled circles are used to represent each point. This kind of plot is
+useful to see complex correlations between two variables. Points could
+be for instance natural 2D coordinates like longitude and latitude in
+a map or, in general, any pair of metrics that can be plotted against
+each other.
+
+Parameters
+----------
+x : str
+Name of column to use as horizontal coordinates for each point.
+y : str or list of str
+Name of column to use as vertical coordinates for each point.
+**kwargs: Optional
+Additional keyword arguments.
+
+Returns
+-------
+:class:`plotly.graph_objs.Figure`
+
+Examples
+--------
+>>> data = [(5.1, 3.5, 0), (4.9, 3.0, 0), (7.0, 3.2, 1), (6.4, 3.2, 1), (5.9, 3.0, 2)]
+>>> columns = ['length', 'width', 'species']
+>>> df = spark.createDataFrame(data, columns)
+>>> df.plot.scatter(x='length', y='width')  # doctest: +SKIP
+"""
+return self(kind="scatter", x=x, y=y, **kwargs)
diff --git a/python/pyspark/sql/tests/plot/test_frame_plot_plotly.py 
b/python/pyspark/sql/tests/plot/test_frame_plot_plotly.py
index 1c52c93a23d3..ccfe1a75424e 100644
--- a/python/pyspark/sql/tests/plot/test_frame_plot_plotly.py
+++ b/python/pyspark/sql/tests/plot/test_frame_plot_plotly.py
@@ -28,6 +28,12 @@ class DataFramePlo

(spark) branch master updated: [SPARK-49626][PYTHON][CONNECT] Support horizontal and vertical bar plots

2024-09-23 Thread xinrong
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 44ec70f5103f [SPARK-49626][PYTHON][CONNECT] Support horizontal and 
vertical bar plots
44ec70f5103f is described below

commit 44ec70f5103fc5674497373ac5c23e8145ae5660
Author: Xinrong Meng 
AuthorDate: Mon Sep 23 18:28:19 2024 +0800

[SPARK-49626][PYTHON][CONNECT] Support horizontal and vertical bar plots

### What changes were proposed in this pull request?
Support horizontal and vertical bar plots with plotly backend on both Spark 
Connect and Spark classic.

### Why are the changes needed?
While Pandas on Spark supports plotting, PySpark currently lacks this 
feature. The proposed API will enable users to generate visualizations. This 
will provide users with an intuitive, interactive way to explore and understand 
large datasets directly from PySpark DataFrames, streamlining the data analysis 
workflow in distributed environments.

See more at [PySpark Plotting API 
Specification](https://docs.google.com/document/d/1IjOEzC8zcetG86WDvqkereQPj_NGLNW7Bdu910g30Dg/edit?usp=sharing)
 in progress.

Part of https://issues.apache.org/jira/browse/SPARK-49530.

### Does this PR introduce _any_ user-facing change?
Yes.

```python
>>> data = [("A", 10, 1.5), ("B", 30, 2.5), ("C", 20, 3.5)]
>>> columns = ["category", "int_val", "float_val"]
>>> sdf = spark.createDataFrame(data, columns)
>>> sdf.show()
+--------+-------+---------+
|category|int_val|float_val|
+--------+-------+---------+
|       A|     10|      1.5|
|       B|     30|      2.5|
|       C|     20|      3.5|
+--------+-------+---------+

>>> f = sdf.plot(kind="bar", x="category", y=["int_val", "float_val"])
>>> f.show()  # see below
>>> g = sdf.plot.barh(x=["int_val", "float_val"], y="category")
>>> g.show()  # see below
```
`f.show()`:
![newplot (4)](https://github.com/user-attachments/assets/0df9ee86-fb48-4796-b6c3-aaf2879217aa)

`g.show()`:
![newplot (3)](https://github.com/user-attachments/assets/f39b01c3-66e6-464b-b2e8-badebb39bc67)

### How was this patch tested?
Unit tests.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #48100 from xinrong-meng/plot_bar.

Authored-by: Xinrong Meng 
Signed-off-by: Xinrong Meng 
---
 python/pyspark/sql/plot/core.py| 79 ++
 .../sql/tests/plot/test_frame_plot_plotly.py   | 44 ++--
 2 files changed, 117 insertions(+), 6 deletions(-)

diff --git a/python/pyspark/sql/plot/core.py b/python/pyspark/sql/plot/core.py
index 392ef73b3884..ed22d02370ca 100644
--- a/python/pyspark/sql/plot/core.py
+++ b/python/pyspark/sql/plot/core.py
@@ -75,6 +75,8 @@ class PySparkSampledPlotBase:
 
 class PySparkPlotAccessor:
 plot_data_map = {
+"bar": PySparkTopNPlotBase().get_top_n,
+"barh": PySparkTopNPlotBase().get_top_n,
 "line": PySparkSampledPlotBase().get_sampled,
 }
 _backends = {}  # type: ignore[var-annotated]
@@ -133,3 +135,80 @@ class PySparkPlotAccessor:
 >>> df.plot.line(x="category", y=["int_val", "float_val"])  # doctest: +SKIP
 """
 return self(kind="line", x=x, y=y, **kwargs)
+
+def bar(self, x: str, y: Union[str, list[str]], **kwargs: Any) -> "Figure":
+"""
+Vertical bar plot.
+
+A bar plot is a plot that presents categorical data with rectangular bars with lengths
+proportional to the values that they represent. A bar plot shows comparisons among
+discrete categories. One axis of the plot shows the specific categories being compared,
+and the other axis represents a measured value.
+
+Parameters
+----------
+x : str
+Name of column to use for the horizontal axis.
+y : str or list of str
+Name(s) of the column(s) to use for the vertical axis.
+Multiple columns can be plotted.
+**kwargs : optional
+Additional keyword arguments.
+
+Returns
+-------
+:class:`plotly.graph_objs.Figure`
+
+Examples
+--------
+>>> data = [("A", 10, 1.5), ("B", 30, 2.5), ("C", 20, 3.5)]
+>>> columns = ["category", "int_val", "float_val"]
+>>>

(spark) branch master updated: [SPARK-47864][FOLLOWUP][PYTHON][DOCS] Fix minor typo: "MLLib" -> "MLlib"

2024-04-23 Thread xinrong
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new e50737be366a [SPARK-47864][FOLLOWUP][PYTHON][DOCS] Fix minor typo: 
"MLLib" -> "MLlib"
e50737be366a is described below

commit e50737be366ac0e8d5466b714f7d41991d0b05a8
Author: Haejoon Lee 
AuthorDate: Tue Apr 23 10:10:20 2024 -0700

[SPARK-47864][FOLLOWUP][PYTHON][DOCS] Fix minor typo: "MLLib" -> "MLlib"

### What changes were proposed in this pull request?

This PR followups for https://github.com/apache/spark/pull/46096 to fix 
minor typo.

### Why are the changes needed?

To use official naming from documentation for `MLlib` instead of `MLLib`. 
See https://spark.apache.org/mllib/.

### Does this PR introduce _any_ user-facing change?

No API change, but the user-facing documentation will be updated.

### How was this patch tested?

Manually built the doc from local test envs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #46174 from itholic/minor_typo_installation.

    Authored-by: Haejoon Lee 
Signed-off-by: Xinrong Meng 
---
 python/docs/source/getting_started/install.rst | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/python/docs/source/getting_started/install.rst 
b/python/docs/source/getting_started/install.rst
index 33a0560764df..ee894981387a 100644
--- a/python/docs/source/getting_started/install.rst
+++ b/python/docs/source/getting_started/install.rst
@@ -244,7 +244,7 @@ Additional libraries that enhance functionality but are not 
included in the inst
 - **matplotlib**: Provide plotting for visualization. The default is **plotly**.
 
 
-MLLib DataFrame-based API
+MLlib DataFrame-based API
 ^
 
 Installable with ``pip install "pyspark[ml]"``.
@@ -252,7 +252,7 @@ Installable with ``pip install "pyspark[ml]"``.
 =========  =================  ======================================
 Package    Supported version  Note
 =========  =================  ======================================
-`numpy`    >=1.21             Required for MLLib DataFrame-based API
+`numpy`    >=1.21             Required for MLlib DataFrame-based API
 =========  =================  ======================================
 
 Additional libraries that enhance functionality but are not included in the 
installation packages:
@@ -272,5 +272,5 @@ Installable with ``pip install "pyspark[mllib]"``.
 =========  =================  ==================
 Package    Supported version  Note
 =========  =================  ==================
-`numpy`    >=1.21             Required for MLLib
+`numpy`    >=1.21             Required for MLlib
 =========  =================  ==================





(spark) branch master updated (f9ebe1b3d24b -> 6c827c10dc15)

2024-04-16 Thread xinrong
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from f9ebe1b3d24b [SPARK-46375][DOCS] Add user guide for Python data source 
API
 add 6c827c10dc15 [SPARK-47876][PYTHON][DOCS] Improve docstring of 
mapInArrow

No new revisions were added by this update.

Summary of changes:
 python/pyspark/sql/pandas/map_ops.py | 19 +--
 1 file changed, 9 insertions(+), 10 deletions(-)
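
For context, a minimal mapInArrow sketch (not from the patch; the pass-through function is illustrative):

```py
>>> df = spark.range(5)
>>> def keep_all(batches):
...     for batch in batches:  # each batch is a pyarrow.RecordBatch
...         yield batch        # a real function would transform the batch
...
>>> df.mapInArrow(keep_all, df.schema).show()
```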





(spark) branch master updated: [SPARK-47276][PYTHON][CONNECT] Introduce `spark.profile.clear` for SparkSession-based profiling

2024-03-07 Thread xinrong
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 501999a834ea [SPARK-47276][PYTHON][CONNECT] Introduce 
`spark.profile.clear` for SparkSession-based profiling
501999a834ea is described below

commit 501999a834ea7761a792b823c543e40fba84231d
Author: Xinrong Meng 
AuthorDate: Thu Mar 7 13:20:39 2024 -0800

[SPARK-47276][PYTHON][CONNECT] Introduce `spark.profile.clear` for 
SparkSession-based profiling

### What changes were proposed in this pull request?
Introduce `spark.profile.clear` for SparkSession-based profiling.

### Why are the changes needed?
A straightforward and unified interface for managing and resetting 
profiling results for SparkSession-based profilers.

### Does this PR introduce _any_ user-facing change?
Yes. `spark.profile.clear` is supported as shown below.

Preparation:
```py
>>> from pyspark.sql.functions import pandas_udf
>>> df = spark.range(3)
>>> pandas_udf("long")
... def add1(x):
...   return x + 1
...
>>> added = df.select(add1("id"))
>>> spark.conf.set("spark.sql.pyspark.udf.profiler", "perf")
>>> added.show()
+--------+
|add1(id)|
+--------+
...
+--------+
>>> spark.profile.show()

Profile of UDF<2>

 1410 function calls (1374 primitive calls) in 0.004 seconds
...
```

Example usage:
```py
>>> spark.profile.profiler_collector._profile_results
{2: (<pstats.Stats object at 0x...>, None)}

>>> spark.profile.clear(1)  # id mismatch
>>> spark.profile.profiler_collector._profile_results
{2: (<pstats.Stats object at 0x...>, None)}

>>> spark.profile.clear(type="memory")  # type mismatch
>>> spark.profile.profiler_collector._profile_results
{2: (<pstats.Stats object at 0x...>, None)}

>>> spark.profile.clear()  # clear all
>>> spark.profile.profiler_collector._profile_results
{}
>>> spark.profile.show()
>>>
```

### How was this patch tested?
Unit tests.

    ### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #45378 from xinrong-meng/profile_clear.

Authored-by: Xinrong Meng 
Signed-off-by: Xinrong Meng 
---
 python/pyspark/sql/profiler.py| 79 +++
 python/pyspark/sql/tests/test_session.py  | 27 +
 python/pyspark/sql/tests/test_udf_profiler.py | 26 +
 python/pyspark/tests/test_memory_profiler.py  | 59 
 4 files changed, 191 insertions(+)

diff --git a/python/pyspark/sql/profiler.py b/python/pyspark/sql/profiler.py
index 5ab27bce2582..711e39de4723 100644
--- a/python/pyspark/sql/profiler.py
+++ b/python/pyspark/sql/profiler.py
@@ -224,6 +224,56 @@ class ProfilerCollector(ABC):
 for id in sorted(code_map.keys()):
 dump(id)
 
+def clear_perf_profiles(self, id: Optional[int] = None) -> None:
+"""
+Clear the perf profile results.
+
+.. versionadded:: 4.0.0
+
+Parameters
+----------
+id : int, optional
+The UDF ID whose profiling results should be cleared.
+If not specified, all the results will be cleared.
+"""
+with self._lock:
+if id is not None:
+if id in self._profile_results:
+perf, mem, *_ = self._profile_results[id]
+self._profile_results[id] = (None, mem, *_)
+if mem is None:
+self._profile_results.pop(id, None)
+else:
+for id, (perf, mem, *_) in list(self._profile_results.items()):
+self._profile_results[id] = (None, mem, *_)
+if mem is None:
+self._profile_results.pop(id, None)
+
+def clear_memory_profiles(self, id: Optional[int] = None) -> None:
+"""
+Clear the memory profile results.
+
+.. versionadded:: 4.0.0
+
+Parameters
+----------
+id : int, optional
+The UDF ID whose profiling results should be cleared.
+If not specified, all the results will be cleared.
+"""
+with self._lock:
+if id is not None:
+if id in self._profile_results:
+perf, mem, *_ = self._profile_results[id]
+self._profile_results[id] = (perf, None, *_)
+if perf is None:

(spark) branch master updated (06c741a0061b -> d20650bc8cf2)

2024-02-23 Thread xinrong
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 06c741a0061b [SPARK-47129][CONNECT][SQL] Make `ResolveRelations` cache 
connect plan properly
 add d20650bc8cf2 [SPARK-46975][PS] Support dedicated fallback methods

No new revisions were added by this update.

Summary of changes:
 python/pyspark/pandas/frame.py | 49 +++---
 1 file changed, 36 insertions(+), 13 deletions(-)





(spark) branch master updated (6de527e9ee94 -> 6185e5cad7be)

2024-02-22 Thread xinrong
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 6de527e9ee94 [SPARK-43259][SQL] Assign a name to the error class 
_LEGACY_ERROR_TEMP_2024
 add 6185e5cad7be [SPARK-47132][DOCS][PYTHON] Correct docstring for 
pyspark's dataframe.head

No new revisions were added by this update.

Summary of changes:
 python/pyspark/sql/dataframe.py | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)
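
For reference, the corrected semantics of the existing API (a sketch, not taken from the patch):

```py
>>> df = spark.range(3)
>>> df.head()   # n omitted: a single Row, or None for an empty DataFrame
Row(id=0)
>>> df.head(2)  # n given: a list of at most n Rows
[Row(id=0), Row(id=1)]
```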





(spark) branch master updated: [SPARK-47014][PYTHON][CONNECT] Implement methods dumpPerfProfiles and dumpMemoryProfiles of SparkSession

2024-02-14 Thread xinrong
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 4b9e9d7a9b7c [SPARK-47014][PYTHON][CONNECT] Implement methods 
dumpPerfProfiles and dumpMemoryProfiles of SparkSession
4b9e9d7a9b7c is described below

commit 4b9e9d7a9b7c1b21c7d04cdf0095cc069a35b757
Author: Xinrong Meng 
AuthorDate: Wed Feb 14 10:37:33 2024 -0800

[SPARK-47014][PYTHON][CONNECT] Implement methods dumpPerfProfiles and 
dumpMemoryProfiles of SparkSession

### What changes were proposed in this pull request?
Implement methods dumpPerfProfiles and dumpMemoryProfiles of SparkSession

### Why are the changes needed?
Complete support of (v2) SparkSession-based profiling.

### Does this PR introduce _any_ user-facing change?
Yes. dumpPerfProfiles and dumpMemoryProfiles of SparkSession are supported.

An example of dumpPerfProfiles is shown below.

```py
>>> udf("long")
... def add(x):
...   return x + 1
...
>>> spark.conf.set("spark.sql.pyspark.udf.profiler", "perf")
>>> spark.range(10).select(add("id")).collect()
...
>>> spark.dumpPerfProfiles("dummy_dir")
>>> os.listdir("dummy_dir")
['udf_2.pstats']
```

### How was this patch tested?
Unit tests.
    
### Was this patch authored or co-authored using generative AI tooling?
    No.
    
Closes #45073 from xinrong-meng/dump_profile.

Authored-by: Xinrong Meng 
Signed-off-by: Xinrong Meng 
---
 python/pyspark/sql/connect/session.py | 10 +
 python/pyspark/sql/profiler.py| 65 +++
 python/pyspark/sql/session.py | 10 +
 python/pyspark/sql/tests/test_udf_profiler.py | 20 +
 python/pyspark/tests/test_memory_profiler.py  | 22 +
 5 files changed, 110 insertions(+), 17 deletions(-)

diff --git a/python/pyspark/sql/connect/session.py 
b/python/pyspark/sql/connect/session.py
index 9a678c28a6cc..764f71ccc415 100644
--- a/python/pyspark/sql/connect/session.py
+++ b/python/pyspark/sql/connect/session.py
@@ -958,6 +958,16 @@ class SparkSession:
 
 showMemoryProfiles.__doc__ = PySparkSession.showMemoryProfiles.__doc__
 
+def dumpPerfProfiles(self, path: str, id: Optional[int] = None) -> None:
+self._profiler_collector.dump_perf_profiles(path, id)
+
+dumpPerfProfiles.__doc__ = PySparkSession.dumpPerfProfiles.__doc__
+
+def dumpMemoryProfiles(self, path: str, id: Optional[int] = None) -> None:
+self._profiler_collector.dump_memory_profiles(path, id)
+
+dumpMemoryProfiles.__doc__ = PySparkSession.dumpMemoryProfiles.__doc__
+
 
 SparkSession.__doc__ = PySparkSession.__doc__
 
diff --git a/python/pyspark/sql/profiler.py b/python/pyspark/sql/profiler.py
index 565752197238..0db9d9b8b9b4 100644
--- a/python/pyspark/sql/profiler.py
+++ b/python/pyspark/sql/profiler.py
@@ -15,6 +15,7 @@
 # limitations under the License.
 #
 from abc import ABC, abstractmethod
+import os
 import pstats
 from threading import RLock
 from typing import Dict, Optional, TYPE_CHECKING
@@ -158,6 +159,70 @@ class ProfilerCollector(ABC):
 """
 ...
 
+def dump_perf_profiles(self, path: str, id: Optional[int] = None) -> None:
+"""
+Dump the perf profile results into directory `path`.
+
+.. versionadded:: 4.0.0
+
+Parameters
+----------
+path: str
+A directory in which to dump the perf profile.
+id : int, optional
+A UDF ID to be shown. If not specified, all the results will be 
shown.
+"""
+with self._lock:
+stats = self._perf_profile_results
+
+def dump(id: int) -> None:
+s = stats.get(id)
+
+if s is not None:
+if not os.path.exists(path):
+os.makedirs(path)
+p = os.path.join(path, f"udf_{id}_perf.pstats")
+s.dump_stats(p)
+
+if id is not None:
+dump(id)
+else:
+for id in sorted(stats.keys()):
+dump(id)
+
+def dump_memory_profiles(self, path: str, id: Optional[int] = None) -> None:
+"""
+Dump the memory profile results into directory `path`.
+
+.. versionadded:: 4.0.0
+
+Parameters
+----------
+path: str
+A directory in which to dump the memory profile.
+id : int, optional
+A UDF ID to be shown. If not specified, all the results will be shown.
+"""
+with self._lock:
+code_map = self._memory_profile_results

(spark) branch master updated: [SPARK-46689][SPARK-46690][PYTHON][CONNECT] Support v2 profiling in group/cogroup applyInPandas/applyInArrow

2024-02-08 Thread xinrong
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 1a66c8c78a46 [SPARK-46689][SPARK-46690][PYTHON][CONNECT] Support v2 
profiling in group/cogroup applyInPandas/applyInArrow
1a66c8c78a46 is described below

commit 1a66c8c78a468a5bdc6c033e8c7a26693e4bf62e
Author: Xinrong Meng 
AuthorDate: Thu Feb 8 10:56:28 2024 -0800

[SPARK-46689][SPARK-46690][PYTHON][CONNECT] Support v2 profiling in 
group/cogroup applyInPandas/applyInArrow

### What changes were proposed in this pull request?
Support v2 (perf, memory) profiling in group/cogroup 
applyInPandas/applyInArrow, which rely on physical plan nodes 
FlatMapGroupsInBatchExec and FlatMapCoGroupsInBatchExec.

### Why are the changes needed?
Complete v2 profiling support.

### Does this PR introduce _any_ user-facing change?
Yes. V2 profiling in group/cogroup applyInPandas/applyInArrow is supported.
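
A usage sketch (the conf name comes from the patch; the data and UDF are illustrative):

```py
>>> import pandas as pd
>>> df = spark.createDataFrame([(1, 1.0), (1, 2.0), (2, 3.0)], ("id", "v"))
>>> def normalize(pdf: pd.DataFrame) -> pd.DataFrame:
...     v = pdf.v
...     return pdf.assign(v=(v - v.mean()) / v.std())
...
>>> spark.conf.set("spark.sql.pyspark.udf.profiler", "perf")
>>> df.groupby("id").applyInPandas(normalize, schema="id long, v double").show()
>>> spark.showPerfProfiles()  # now also covers group/cogroup applyInPandas/applyInArrow
```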

### How was this patch tested?
Unit tests.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #45050 from xinrong-meng/other_p2.

Authored-by: Xinrong Meng 
Signed-off-by: Xinrong Meng 
---
 python/pyspark/sql/tests/test_udf_profiler.py  | 123 +
 python/pyspark/tests/test_memory_profiler.py   | 123 +
 .../python/FlatMapCoGroupsInBatchExec.scala|   2 +-
 .../python/FlatMapGroupsInBatchExec.scala  |   2 +-
 4 files changed, 248 insertions(+), 2 deletions(-)

diff --git a/python/pyspark/sql/tests/test_udf_profiler.py 
b/python/pyspark/sql/tests/test_udf_profiler.py
index 99719b5475c1..4f767d274414 100644
--- a/python/pyspark/sql/tests/test_udf_profiler.py
+++ b/python/pyspark/sql/tests/test_udf_profiler.py
@@ -394,6 +394,129 @@ class UDFProfiler2TestsMixin:
 io.getvalue(), f"2.*{os.path.basename(inspect.getfile(_do_computation))}"
 )
 
+@unittest.skipIf(
+not have_pandas or not have_pyarrow,
+cast(str, pandas_requirement_message or pyarrow_requirement_message),
+)
+def test_perf_profiler_group_apply_in_pandas(self):
+# FlatMapGroupsInBatchExec
+df = self.spark.createDataFrame(
+[(1, 1.0), (1, 2.0), (2, 3.0), (2, 5.0), (2, 10.0)], ("id", "v")
+)
+
+def normalize(pdf):
+v = pdf.v
+return pdf.assign(v=(v - v.mean()) / v.std())
+
+with self.sql_conf({"spark.sql.pyspark.udf.profiler": "perf"}):
+df.groupby("id").applyInPandas(normalize, schema="id long, v 
double").show()
+
+self.assertEqual(1, len(self.profile_results), str(self.profile_results.keys()))
+
+for id in self.profile_results:
+with self.trap_stdout() as io:
+self.spark.showPerfProfiles(id)
+
+self.assertIn(f"Profile of UDF<{id}>", io.getvalue())
+self.assertRegex(
+io.getvalue(), f"2.*{os.path.basename(inspect.getfile(_do_computation))}"
+)
+
+@unittest.skipIf(
+not have_pandas or not have_pyarrow,
+cast(str, pandas_requirement_message or pyarrow_requirement_message),
+)
+def test_perf_profiler_cogroup_apply_in_pandas(self):
+# FlatMapCoGroupsInBatchExec
+import pandas as pd
+
+df1 = self.spark.createDataFrame(
+[(20000101, 1, 1.0), (20000101, 2, 2.0), (20000102, 1, 3.0), (20000102, 2, 4.0)],
+("time", "id", "v1"),
+)
+df2 = self.spark.createDataFrame(
+[(2101, 1, "x"), (2101, 2, "y")], ("time", "id", "v2")
+)
+
+def asof_join(left, right):
+return pd.merge_asof(left, right, on="time", by="id")
+
+with self.sql_conf({"spark.sql.pyspark.udf.profiler": "perf"}):
+df1.groupby("id").cogroup(df2.groupby("id")).applyInPandas(
+asof_join, schema="time int, id int, v1 double, v2 string"
+).show()
+
+self.assertEqual(1, len(self.profile_results), str(self.profile_results.keys()))
+
+for id in self.profile_results:
+with self.trap_stdout() as io:
+self.spark.showPerfProfiles(id)
+
+self.assertIn(f"Profile of UDF<{id}>", io.getvalue())
+self.assertRegex(
+io.getvalue(), f"2.*{os.path.basename(inspect.getfile(_do_computation))}"
+)
+
+@unittest.skipIf(
+not have_pandas or not have_pyarrow,
+cast(str, pandas_requirement_message or pyarrow_requirement_message),

(spark) branch master updated: [SPARK-46867][PYTHON][CONNECT][TESTS] Remove unnecessary dependency from test_mixed_udf_and_sql.py

2024-01-25 Thread xinrong
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 79918028b142 [SPARK-46867][PYTHON][CONNECT][TESTS] Remove unnecessary 
dependency from test_mixed_udf_and_sql.py
79918028b142 is described below

commit 79918028b142685fe1c3871a3593e91100ab6bbf
Author: Xinrong Meng 
AuthorDate: Thu Jan 25 14:16:12 2024 -0800

[SPARK-46867][PYTHON][CONNECT][TESTS] Remove unnecessary dependency from 
test_mixed_udf_and_sql.py

### What changes were proposed in this pull request?
Remove unnecessary dependency from test_mixed_udf_and_sql.py.

### Why are the changes needed?
Otherwise, test_mixed_udf_and_sql.py depends on Spark Connect's dependency 
"grpc", possibly leading to conflicts or compatibility issues.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Test change only.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #44886 from xinrong-meng/fix_dep.

    Authored-by: Xinrong Meng 
Signed-off-by: Xinrong Meng 
---
 python/pyspark/sql/tests/connect/test_parity_pandas_udf_scalar.py | 4 
 python/pyspark/sql/tests/pandas/test_pandas_udf_scalar.py | 5 +++--
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/python/pyspark/sql/tests/connect/test_parity_pandas_udf_scalar.py 
b/python/pyspark/sql/tests/connect/test_parity_pandas_udf_scalar.py
index c950ca2e17c3..6a3d03246549 100644
--- a/python/pyspark/sql/tests/connect/test_parity_pandas_udf_scalar.py
+++ b/python/pyspark/sql/tests/connect/test_parity_pandas_udf_scalar.py
@@ -15,6 +15,7 @@
 # limitations under the License.
 #
 import unittest
+from pyspark.sql.connect.column import Column
 from pyspark.sql.tests.pandas.test_pandas_udf_scalar import 
ScalarPandasUDFTestsMixin
 from pyspark.testing.connectutils import ReusedConnectTestCase
 
@@ -51,6 +52,9 @@ class PandasUDFScalarParityTests(ScalarPandasUDFTestsMixin, 
ReusedConnectTestCas
 def test_vectorized_udf_invalid_length(self):
 self.check_vectorized_udf_invalid_length()
 
+def test_mixed_udf_and_sql(self):
+self._test_mixed_udf_and_sql(Column)
+
 
 if __name__ == "__main__":
 from pyspark.sql.tests.connect.test_parity_pandas_udf_scalar import *  # 
noqa: F401
diff --git a/python/pyspark/sql/tests/pandas/test_pandas_udf_scalar.py 
b/python/pyspark/sql/tests/pandas/test_pandas_udf_scalar.py
index dfbab5c8b3cd..9f6bdb83caf7 100644
--- a/python/pyspark/sql/tests/pandas/test_pandas_udf_scalar.py
+++ b/python/pyspark/sql/tests/pandas/test_pandas_udf_scalar.py
@@ -1321,8 +1321,9 @@ class ScalarPandasUDFTestsMixin:
 self.assertEqual(expected_multi, df_multi_2.collect())
 
 def test_mixed_udf_and_sql(self):
-from pyspark.sql.connect.column import Column as ConnectColumn
+self._test_mixed_udf_and_sql(Column)
 
+def _test_mixed_udf_and_sql(self, col_type):
 df = self.spark.range(0, 1).toDF("v")
 
 # Test mixture of UDFs, Pandas UDFs and SQL expression.
@@ -1333,7 +1334,7 @@ class ScalarPandasUDFTestsMixin:
 return x + 1
 
 def f2(x):
-assert type(x) in (Column, ConnectColumn)
+assert type(x) == col_type
 return x + 10
 
 @pandas_udf("int")





(spark) branch master updated: [SPARK-46663][PYTHON] Disable memory profiler for pandas UDFs with iterators

2024-01-16 Thread xinrong
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 48152b1779a5 [SPARK-46663][PYTHON] Disable memory profiler for pandas 
UDFs with iterators
48152b1779a5 is described below

commit 48152b1779a5b8191dd0e09424fdb552cac55d49
Author: Xinrong Meng 
AuthorDate: Tue Jan 16 11:20:40 2024 -0800

[SPARK-46663][PYTHON] Disable memory profiler for pandas UDFs with iterators

### What changes were proposed in this pull request?
When using pandas UDFs with iterators, if users enable the profiling spark 
conf, a warning indicating non-support should be raised, and profiling should 
be disabled.

However, currently, after raising the not-supported warning, the memory 
profiler is still being enabled.

The PR proposed to fix that.

### Why are the changes needed?
A bug fix to eliminate misleading behavior.

### Does this PR introduce _any_ user-facing change?
The noticeable changes will affect only those using the PySpark shell. This 
is because, in the PySpark shell, the memory profiler will raise an error, 
which in turn blocks the execution of the UDF.

### How was this patch tested?
Manual test.

Setup:
```py
$ ./bin/pyspark --conf spark.python.profile=true

>>> from typing import Iterator
>>> from pyspark.sql.functions import *
>>> import pandas as pd
>>> pandas_udf("long")
... def plus_one(iterator: Iterator[pd.Series]) -> Iterator[pd.Series]:
... for s in iterator:
... yield s + 1
...
>>> df = spark.createDataFrame(pd.DataFrame([1, 2, 3], columns=["v"]))
```

Before:
```
>>> df.select(plus_one(df.v)).show()
UserWarning: Profiling UDFs with iterators input/output is not supported.
Traceback (most recent call last):
...
OSError: could not get source code
```

After:
```
>>> df.select(plus_one(df.v)).show()
/Users/xinrong.meng/spark/python/pyspark/sql/udf.py:417: UserWarning: 
Profiling UDFs with iterators input/output is not supported.
+-----------+
|plus_one(v)|
+-----------+
|          2|
|          3|
|          4|
+-----------+
```

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #44668 from xinrong-meng/fix_mp.

Authored-by: Xinrong Meng 
Signed-off-by: Xinrong Meng 
---
 python/pyspark/sql/tests/test_udf_profiler.py | 45 ++-
 python/pyspark/sql/udf.py | 33 ++--
 2 files changed, 60 insertions(+), 18 deletions(-)

diff --git a/python/pyspark/sql/tests/test_udf_profiler.py 
b/python/pyspark/sql/tests/test_udf_profiler.py
index 136f423d0a35..776d5da88bb2 100644
--- a/python/pyspark/sql/tests/test_udf_profiler.py
+++ b/python/pyspark/sql/tests/test_udf_profiler.py
@@ -19,11 +19,13 @@ import tempfile
 import unittest
 import os
 import sys
+import warnings
 from io import StringIO
+from typing import Iterator
 
 from pyspark import SparkConf
 from pyspark.sql import SparkSession
-from pyspark.sql.functions import udf
+from pyspark.sql.functions import udf, pandas_udf
 from pyspark.profiler import UDFBasicProfiler
 
 
@@ -101,6 +103,47 @@ class UDFProfilerTests(unittest.TestCase):
 df = self.spark.range(10)
 df.select(add1("id"), add2("id"), add1("id")).collect()
 
+# Unsupported
+def exec_pandas_udf_iter_to_iter(self):
+import pandas as pd
+
+@pandas_udf("int")
+def iter_to_iter(batch_ser: Iterator[pd.Series]) -> 
Iterator[pd.Series]:
+for ser in batch_ser:
+yield ser + 1
+
+self.spark.range(10).select(iter_to_iter("id")).collect()
+
+# Unsupported
+def exec_map(self):
+import pandas as pd
+
+def map(pdfs: Iterator[pd.DataFrame]) -> Iterator[pd.DataFrame]:
+for pdf in pdfs:
+yield pdf[pdf.id == 1]
+
+df = self.spark.createDataFrame([(1, 1.0), (1, 2.0), (2, 3.0), (2, 
5.0)], ("id", "v"))
+df.mapInPandas(map, schema=df.schema).collect()
+
+def test_unsupported(self):
+with warnings.catch_warnings(record=True) as warns:
+warnings.simplefilter("always")
+self.exec_pandas_udf_iter_to_iter()
+user_warns = [warn.message for warn in warns if 
isinstance(warn.message, UserWarning)]
+self.assertTrue(len(user_warns) > 0)
+self.assertTrue(
+"Profiling UDFs with iterators input/output is not supported" 
in str(user_warns[0])
+)
+
+with warnings.catch_warnin

(spark) branch master updated: [SPARK-46277][PYTHON] Validate startup urls with the config being set

2023-12-07 Thread xinrong
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 027aeb1764a8 [SPARK-46277][PYTHON] Validate startup urls with the 
config being set
027aeb1764a8 is described below

commit 027aeb1764a816858b7ea071cd2b620f02a6a525
Author: Xinrong Meng 
AuthorDate: Thu Dec 7 13:45:31 2023 -0800

[SPARK-46277][PYTHON] Validate startup urls with the config being set

### What changes were proposed in this pull request?
Validate startup urls with the config being set; see the example in the "Does this PR introduce _any_ user-facing change?" section below.

### Why are the changes needed?
Clear and user-friendly error messages.

### Does this PR introduce _any_ user-facing change?
Yes.

FROM
```py
>>> SparkSession.builder.config(map={"spark.master": "x", "spark.remote": "y"})
<pyspark.sql.session.SparkSession.Builder object at 0x...>
>>> SparkSession.builder.config(map={"spark.master": "x", "spark.remote": "y"}).config("x", "z")  # Only raises the error when adding new configs
Traceback (most recent call last):
...
RuntimeError: Spark master cannot be configured with Spark Connect server; however, found URL for Spark Connect [y]
```

TO
```py
>>> SparkSession.builder.config(map={"spark.master": "x", "spark.remote": "y"})
Traceback (most recent call last):
...
RuntimeError: Spark master cannot be configured with Spark Connect server; however, found URL for Spark Connect [y]
```

    ### How was this patch tested?
Unit tests.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #44194 from xinrong-meng/fix_session.

Authored-by: Xinrong Meng 
Signed-off-by: Xinrong Meng 
---
 python/pyspark/errors/error_classes.py   |  6 +++---
 python/pyspark/sql/session.py| 28 +++-
 python/pyspark/sql/tests/test_session.py | 30 --
 3 files changed, 42 insertions(+), 22 deletions(-)

diff --git a/python/pyspark/errors/error_classes.py 
b/python/pyspark/errors/error_classes.py
index 965fd04a9135..cc8400270967 100644
--- a/python/pyspark/errors/error_classes.py
+++ b/python/pyspark/errors/error_classes.py
@@ -86,12 +86,12 @@ ERROR_CLASSES_JSON = """
   },
   "CANNOT_CONFIGURE_SPARK_CONNECT": {
 "message": [
-  "Spark Connect server cannot be configured with Spark master; however, 
found URL for Spark master []."
+  "Spark Connect server cannot be configured: Existing [], 
New []."
 ]
   },
-  "CANNOT_CONFIGURE_SPARK_MASTER": {
+  "CANNOT_CONFIGURE_SPARK_CONNECT_MASTER": {
 "message": [
-  "Spark master cannot be configured with Spark Connect server; however, 
found URL for Spark Connect []."
+  "Spark Connect server and Spark master cannot be configured together: 
Spark master [], Spark Connect []."
 ]
   },
   "CANNOT_CONVERT_COLUMN_INTO_BOOL": {
diff --git a/python/pyspark/sql/session.py b/python/pyspark/sql/session.py
index 7f4589557cd2..86aacfa54c6e 100644
--- a/python/pyspark/sql/session.py
+++ b/python/pyspark/sql/session.py
@@ -286,17 +286,17 @@ class SparkSession(SparkConversionMixin):
 with self._lock:
 if conf is not None:
 for k, v in conf.getAll():
-self._validate_startup_urls()
 self._options[k] = v
+self._validate_startup_urls()
 elif map is not None:
 for k, v in map.items():  # type: ignore[assignment]
 v = to_str(v)  # type: ignore[assignment]
-self._validate_startup_urls()
 self._options[k] = v
+self._validate_startup_urls()
 else:
 value = to_str(value)
-self._validate_startup_urls()
 self._options[cast(str, key)] = value
+self._validate_startup_urls()
 return self
 
 def _validate_startup_urls(
@@ -306,22 +306,16 @@ class SparkSession(SparkConversionMixin):
 Helper function that validates the combination of startup URLs and 
raises an exception
 if incompatible options are selected.
 """
-if "spark.master" in self._options and (
+if ("spark.master" in self._options or "MASTER" in os.environ) and 
(
 "spark.remote" in self._options or "SP

[spark] branch branch-3.5 updated: [SPARK-44560][PYTHON][CONNECT] Improve tests and documentation for Arrow Python UDF

2023-07-27 Thread xinrong
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new 36b93d07eb9 [SPARK-44560][PYTHON][CONNECT] Improve tests and 
documentation for Arrow Python UDF
36b93d07eb9 is described below

commit 36b93d07eb961905647c42fac80e22efdfb15f4f
Author: Xinrong Meng 
AuthorDate: Thu Jul 27 13:45:05 2023 -0700

[SPARK-44560][PYTHON][CONNECT] Improve tests and documentation for Arrow 
Python UDF

### What changes were proposed in this pull request?

- Test on complex return type
- Remove complex return type constraints for Arrow Python UDF on Spark 
Connect
- Update documentation of the related Spark conf

The change targets both Spark 3.5 and 4.0.

### Why are the changes needed?
Testability and parity with vanilla PySpark.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Unit tests.

Closes #42178 from xinrong-meng/conf.

Authored-by: Xinrong Meng 
Signed-off-by: Xinrong Meng 
(cherry picked from commit 5f6537409383e2dbdd699108f708567c37db8151)
Signed-off-by: Xinrong Meng 
---
 python/pyspark/sql/connect/udf.py| 10 ++
 python/pyspark/sql/tests/test_arrow_python_udf.py|  5 -
 python/pyspark/sql/tests/test_udf.py | 16 
 .../scala/org/apache/spark/sql/internal/SQLConf.scala|  3 +--
 4 files changed, 19 insertions(+), 15 deletions(-)

diff --git a/python/pyspark/sql/connect/udf.py 
b/python/pyspark/sql/connect/udf.py
index 0a5d06618b3..2d7e423d3d5 100644
--- a/python/pyspark/sql/connect/udf.py
+++ b/python/pyspark/sql/connect/udf.py
@@ -35,7 +35,7 @@ from pyspark.sql.connect.expressions import (
 )
 from pyspark.sql.connect.column import Column
 from pyspark.sql.connect.types import UnparsedDataType
-from pyspark.sql.types import ArrayType, DataType, MapType, StringType, 
StructType
+from pyspark.sql.types import DataType, StringType
 from pyspark.sql.udf import UDFRegistration as PySparkUDFRegistration
 from pyspark.errors import PySparkTypeError
 
@@ -70,18 +70,12 @@ def _create_py_udf(
 is_arrow_enabled = useArrow
 
 regular_udf = _create_udf(f, returnType, PythonEvalType.SQL_BATCHED_UDF)
-return_type = regular_udf.returnType
 try:
 is_func_with_args = len(getfullargspec(f).args) > 0
 except TypeError:
 is_func_with_args = False
-is_output_atomic_type = (
-not isinstance(return_type, StructType)
-and not isinstance(return_type, MapType)
-and not isinstance(return_type, ArrayType)
-)
 if is_arrow_enabled:
-if is_output_atomic_type and is_func_with_args:
+if is_func_with_args:
 return _create_arrow_py_udf(regular_udf)
 else:
 warnings.warn(
diff --git a/python/pyspark/sql/tests/test_arrow_python_udf.py 
b/python/pyspark/sql/tests/test_arrow_python_udf.py
index 264ea0b901f..f48f07666e1 100644
--- a/python/pyspark/sql/tests/test_arrow_python_udf.py
+++ b/python/pyspark/sql/tests/test_arrow_python_udf.py
@@ -47,11 +47,6 @@ class PythonUDFArrowTestsMixin(BaseUDFTestsMixin):
 def test_register_java_udaf(self):
 super(PythonUDFArrowTests, self).test_register_java_udaf()
 
-# TODO(SPARK-43903): Standardize ArrayType conversion for Python UDF
-@unittest.skip("Inconsistent ArrayType conversion with/without Arrow.")
-def test_nested_array(self):
-super(PythonUDFArrowTests, self).test_nested_array()
-
 def test_complex_input_types(self):
 row = (
 self.spark.range(1)
diff --git a/python/pyspark/sql/tests/test_udf.py 
b/python/pyspark/sql/tests/test_udf.py
index 8ffcb5e05a2..239ff27813b 100644
--- a/python/pyspark/sql/tests/test_udf.py
+++ b/python/pyspark/sql/tests/test_udf.py
@@ -882,6 +882,22 @@ class BaseUDFTestsMixin(object):
 row = df.select(f("nested_array")).first()
 self.assertEquals(row[0], [[1, 2], [3, 4], [4, 5]])
 
+def test_complex_return_types(self):
+row = (
+self.spark.range(1)
+.selectExpr("array(1, 2, 3) as array", "map('a', 'b') as map", 
"struct(1, 2) as struct")
+.select(
+udf(lambda x: x, "array")("array"),
+udf(lambda x: x, "map")("map"),
+udf(lambda x: x, "struct")("struct"),
+)
+.first()
+)
+
+self.assertEquals(row[0], [1, 2, 3])
+self.assertEquals(row[1], {"a": "b"})
+self.assertEquals(row[2], Row(col1=1, col2=2))
+
 
 class UDFTests(BaseUDFTestsMixin, ReusedSQLTestCase):
 @classmethod

[spark] branch master updated: [SPARK-44560][PYTHON][CONNECT] Improve tests and documentation for Arrow Python UDF

2023-07-27 Thread xinrong
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 5f653740938 [SPARK-44560][PYTHON][CONNECT] Improve tests and 
documentation for Arrow Python UDF
5f653740938 is described below

commit 5f6537409383e2dbdd699108f708567c37db8151
Author: Xinrong Meng 
AuthorDate: Thu Jul 27 13:45:05 2023 -0700

[SPARK-44560][PYTHON][CONNECT] Improve tests and documentation for Arrow 
Python UDF

### What changes were proposed in this pull request?

- Test on complex return type
- Remove complex return type constraints for Arrow Python UDF on Spark 
Connect
- Update documentation of the related Spark conf

The change targets both Spark 3.5 and 4.0.

### Why are the changes needed?
Testability and parity with vanilla PySpark.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Unit tests.

Closes #42178 from xinrong-meng/conf.

Authored-by: Xinrong Meng 
Signed-off-by: Xinrong Meng 
---
 python/pyspark/sql/connect/udf.py| 10 ++
 python/pyspark/sql/tests/test_arrow_python_udf.py|  5 -
 python/pyspark/sql/tests/test_udf.py | 16 
 .../scala/org/apache/spark/sql/internal/SQLConf.scala|  3 +--
 4 files changed, 19 insertions(+), 15 deletions(-)

diff --git a/python/pyspark/sql/connect/udf.py 
b/python/pyspark/sql/connect/udf.py
index 0a5d06618b3..2d7e423d3d5 100644
--- a/python/pyspark/sql/connect/udf.py
+++ b/python/pyspark/sql/connect/udf.py
@@ -35,7 +35,7 @@ from pyspark.sql.connect.expressions import (
 )
 from pyspark.sql.connect.column import Column
 from pyspark.sql.connect.types import UnparsedDataType
-from pyspark.sql.types import ArrayType, DataType, MapType, StringType, 
StructType
+from pyspark.sql.types import DataType, StringType
 from pyspark.sql.udf import UDFRegistration as PySparkUDFRegistration
 from pyspark.errors import PySparkTypeError
 
@@ -70,18 +70,12 @@ def _create_py_udf(
 is_arrow_enabled = useArrow
 
 regular_udf = _create_udf(f, returnType, PythonEvalType.SQL_BATCHED_UDF)
-return_type = regular_udf.returnType
 try:
 is_func_with_args = len(getfullargspec(f).args) > 0
 except TypeError:
 is_func_with_args = False
-is_output_atomic_type = (
-not isinstance(return_type, StructType)
-and not isinstance(return_type, MapType)
-and not isinstance(return_type, ArrayType)
-)
 if is_arrow_enabled:
-if is_output_atomic_type and is_func_with_args:
+if is_func_with_args:
 return _create_arrow_py_udf(regular_udf)
 else:
 warnings.warn(
diff --git a/python/pyspark/sql/tests/test_arrow_python_udf.py 
b/python/pyspark/sql/tests/test_arrow_python_udf.py
index 264ea0b901f..f48f07666e1 100644
--- a/python/pyspark/sql/tests/test_arrow_python_udf.py
+++ b/python/pyspark/sql/tests/test_arrow_python_udf.py
@@ -47,11 +47,6 @@ class PythonUDFArrowTestsMixin(BaseUDFTestsMixin):
 def test_register_java_udaf(self):
 super(PythonUDFArrowTests, self).test_register_java_udaf()
 
-# TODO(SPARK-43903): Standardize ArrayType conversion for Python UDF
-@unittest.skip("Inconsistent ArrayType conversion with/without Arrow.")
-def test_nested_array(self):
-super(PythonUDFArrowTests, self).test_nested_array()
-
 def test_complex_input_types(self):
 row = (
 self.spark.range(1)
diff --git a/python/pyspark/sql/tests/test_udf.py 
b/python/pyspark/sql/tests/test_udf.py
index 8ffcb5e05a2..239ff27813b 100644
--- a/python/pyspark/sql/tests/test_udf.py
+++ b/python/pyspark/sql/tests/test_udf.py
@@ -882,6 +882,22 @@ class BaseUDFTestsMixin(object):
 row = df.select(f("nested_array")).first()
 self.assertEquals(row[0], [[1, 2], [3, 4], [4, 5]])
 
+def test_complex_return_types(self):
+row = (
+self.spark.range(1)
+.selectExpr("array(1, 2, 3) as array", "map('a', 'b') as map", 
"struct(1, 2) as struct")
+.select(
+udf(lambda x: x, "array")("array"),
+udf(lambda x: x, "map")("map"),
+udf(lambda x: x, "struct")("struct"),
+)
+.first()
+)
+
+self.assertEquals(row[0], [1, 2, 3])
+self.assertEquals(row[1], {"a": "b"})
+self.assertEquals(row[2], Row(col1=1, col2=2))
+
 
 class UDFTests(BaseUDFTestsMixin, ReusedSQLTestCase):
 @classmethod
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala 
b/sql/catalyst/src/mai
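
As a hedged usage sketch of what the relaxed constraint enables (assuming a
running SparkSession named `spark` on PySpark 3.5+), an Arrow-optimized Python
UDF can now declare complex return types directly:

```py
from pyspark.sql.functions import udf

# Identity UDFs with non-atomic return types, Arrow-optimized.
to_array = udf(lambda x: x, "array<int>", useArrow=True)
to_map = udf(lambda x: x, "map<string,string>", useArrow=True)

df = spark.sql("SELECT array(1, 2, 3) AS arr, map('a', 'b') AS m")
df.select(to_array("arr"), to_map("m")).show()
```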

[spark] branch master updated: [SPARK-40770][PYTHON][FOLLOW-UP] Improved error messages for mapInPandas for schema mismatch

2023-07-17 Thread xinrong
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new a367fde24de [SPARK-40770][PYTHON][FOLLOW-UP] Improved error messages 
for mapInPandas for schema mismatch
a367fde24de is described below

commit a367fde24de0abab93eac97350fb4ae0b687286c
Author: Enrico Minack 
AuthorDate: Mon Jul 17 17:08:36 2023 -0700

[SPARK-40770][PYTHON][FOLLOW-UP] Improved error messages for mapInPandas 
for schema mismatch

### What changes were proposed in this pull request?
Similar to #38223, improve the error messages when a Python method provided 
to `DataFrame.mapInPandas` returns a Pandas DataFrame that does not match the 
expected schema.

With
```Python
df = spark.range(2).withColumn("v", col("id"))
```

**Mismatching column names:**
```Python
df.mapInPandas(lambda it: it, "id long, val long").show()
# was: KeyError: 'val'
# now: RuntimeError: Column names of the returned pandas.DataFrame do not 
match specified schema.
#  Missing: val  Unexpected: v
```

**Python function not returning iterator:**
```Python
df.mapInPandas(lambda it: 1, "id long").show()
# was: TypeError: 'int' object is not iterable
# now: TypeError: Return type of the user-defined function should be 
iterator of pandas.DataFrame, but is 
```

**Python function not returning iterator of pandas.DataFrame:**
```Python
df.mapInPandas(lambda it: [1], "id long").show()
# was: TypeError: Return type of the user-defined function should be 
Pandas.DataFrame, but is 
# now: TypeError: Return type of the user-defined function should be 
iterator of pandas.DataFrame, but is iterator of 
# sometimes: ValueError: A field of type StructType expects a 
pandas.DataFrame, but got: 
# now: TypeError: Return type of the user-defined function should be 
iterator of pandas.DataFrame, but is iterator of 
```

**Mismatching types (ValueError and TypeError):**
```Python
df.mapInPandas(lambda it: it, "id int, v string").show()
# was: pyarrow.lib.ArrowTypeError: Expected a string or bytes dtype, got 
int64
# now: pyarrow.lib.ArrowTypeError: Expected a string or bytes dtype, got 
int64
#  The above exception was the direct cause of the following exception:
#  TypeError: Exception thrown when converting pandas.Series (int64) 
with name 'v' to Arrow Array (string).

df.mapInPandas(lambda it: [pdf.assign(v=pdf["v"].apply(str)) for pdf in 
it], "id int, v double").show()
# was: pyarrow.lib.ArrowInvalid: Could not convert '0' with type str: tried 
to convert to double
# now: pyarrow.lib.ArrowInvalid: Could not convert '0' with type str: tried 
to convert to double
#  The above exception was the direct cause of the following exception:
#  ValueError: Exception thrown when converting pandas.Series (object) 
with name 'v' to Arrow Array (double).

with self.sql_conf({"spark.sql.execution.pandas.convertToArrowArraySafely": 
True}):
  df.mapInPandas(lambda it: [pdf.assign(v=pdf["v"].apply(str)) for pdf in 
it], "id int, v double").show()
# was: ValueError: Exception thrown when converting pandas.Series (object) 
to Arrow Array (double).
#  It can be caused by overflows or other unsafe conversions warned by 
Arrow. Arrow safe type check can be disabled
#  by using SQL config 
`spark.sql.execution.pandas.convertToArrowArraySafely`.
# now: ValueError: Exception thrown when converting pandas.Series (object) 
with name 'v' to Arrow Array (double).
#  It can be caused by overflows or other unsafe conversions warned by 
Arrow. Arrow safe type check can be disabled
#  by using SQL config 
`spark.sql.execution.pandas.convertToArrowArraySafely`.
```

### Why are the changes needed?
Existing errors are generic (`KeyError`) or meaningless (`'int' object is 
not iterable`). The errors should help users spot the mismatching columns 
by naming them.

The schema of the returned Pandas DataFrames can only be checked while 
processing the DataFrame, so such errors are expensive to hit. Therefore, 
they should be expressive.

### Does this PR introduce _any_ user-facing change?
This only changes error messages, not behaviour.

### How was this patch tested?
Tests all cases of schema mismatch for `DataFrame.mapInPandas`.

Closes #39952 from EnricoMi/branch-pyspark-map-in-pandas-schema-mismatch.

Authored-by: Enrico Minack 
Signed-off-by: Xinrong Meng 
---
 python/pyspark/errors/error_classes.py  
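
For contrast with the failing calls above, a minimal passing sketch (assuming
a SparkSession named `spark`): the function is a generator over
pandas.DataFrame batches whose columns match the declared schema.

```py
from pyspark.sql.functions import col

df = spark.range(2).withColumn("v", col("id"))

def keep_all(batches):
    for pdf in batches:  # each element is a pandas.DataFrame
        yield pdf        # columns "id" and "v" match the schema below

df.mapInPandas(keep_all, "id long, v long").show()
```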

[spark] branch master updated: [SPARK-44446][PYTHON] Add checks for expected list type special cases

2023-07-17 Thread xinrong
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new e578d466d4e [SPARK-44446][PYTHON] Add checks for expected list type 
special cases
e578d466d4e is described below

commit e578d466d4eae808a8ad5e42681b9e3e87fe6ca7
Author: Amanda Liu 
AuthorDate: Mon Jul 17 11:43:05 2023 -0700

[SPARK-44446][PYTHON] Add checks for expected list type special cases

### What changes were proposed in this pull request?
This PR adds handling for special cases when `expected` is of type list.

### Why are the changes needed?
The change is needed to handle all cases when `expected` is of type list.

### Does this PR introduce _any_ user-facing change?
Yes, the PR makes modifications to the user-facing function 
`assertDataFrameEqual`.

### How was this patch tested?
Added tests to `runtime/python/pyspark/sql/tests/test_utils.py` and 
`runtime/python/pyspark/sql/tests/connect/test_utils.py`

Closes #42023 from asl3/fix-list-support.

Authored-by: Amanda Liu 
Signed-off-by: Xinrong Meng 
---
 python/pyspark/sql/tests/test_utils.py | 24 
 python/pyspark/testing/utils.py| 15 +--
 2 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/python/pyspark/sql/tests/test_utils.py 
b/python/pyspark/sql/tests/test_utils.py
index 5b859ad15a5..eae3f528504 100644
--- a/python/pyspark/sql/tests/test_utils.py
+++ b/python/pyspark/sql/tests/test_utils.py
@@ -1119,6 +1119,30 @@ class UtilsTestsMixin:
 assertDataFrameEqual(df1, df2, checkRowOrder=False)
 assertDataFrameEqual(df1, df2, checkRowOrder=True)
 
+def test_empty_expected_list(self):
+df1 = self.spark.range(0, 10).drop("id")
+
+df2 = []
+
+assertDataFrameEqual(df1, df2, checkRowOrder=False)
+assertDataFrameEqual(df1, df2, checkRowOrder=True)
+
+def test_no_column_expected_list(self):
+df1 = self.spark.range(0, 10).limit(0)
+
+df2 = []
+
+assertDataFrameEqual(df1, df2, checkRowOrder=False)
+assertDataFrameEqual(df1, df2, checkRowOrder=True)
+
+def test_empty_no_column_expected_list(self):
+df1 = self.spark.range(0, 10).drop("id").limit(0)
+
+df2 = []
+
+assertDataFrameEqual(df1, df2, checkRowOrder=False)
+assertDataFrameEqual(df1, df2, checkRowOrder=True)
+
 def test_special_vals(self):
 df1 = self.spark.createDataFrame(
 data=[
diff --git a/python/pyspark/testing/utils.py b/python/pyspark/testing/utils.py
index 21c7b7e4dcd..14db9264209 100644
--- a/python/pyspark/testing/utils.py
+++ b/python/pyspark/testing/utils.py
@@ -349,6 +349,8 @@ def assertDataFrameEqual(
 For checkRowOrder, note that PySpark DataFrame ordering is 
non-deterministic, unless
 explicitly sorted.
 
+Note that schema equality is checked only when `expected` is a DataFrame 
(not a list of Rows).
+
 For DataFrames with float values, assertDataFrame asserts approximate 
equality.
 Two float values a and b are approximately equal if the following equation 
is True:
 
@@ -362,6 +364,9 @@ def assertDataFrameEqual(
 >>> df1 = spark.createDataFrame(data=[("1", 0.1), ("2", 3.23)], 
schema=["id", "amount"])
 >>> df2 = spark.createDataFrame(data=[("1", 0.109), ("2", 3.23)], 
schema=["id", "amount"])
 >>> assertDataFrameEqual(df1, df2, rtol=1e-1)  # pass, DataFrames are 
approx equal by rtol
+>>> df1 = spark.createDataFrame(data=[(1, 1000), (2, 3000)], schema=["id", 
"amount"])
+>>> list_of_rows = [Row(1, 1000), Row(2, 3000)]
+>>> assertDataFrameEqual(df1, list_of_rows)  # pass, actual and expected 
are equal
 >>> df1 = spark.createDataFrame(
 ... data=[("1", 1000.00), ("2", 3000.00), ("3", 2000.00)], 
schema=["id", "amount"])
 >>> df2 = spark.createDataFrame(
@@ -415,8 +420,14 @@ def assertDataFrameEqual(
 )
 
 # special cases: empty datasets, datasets with 0 columns
-if (actual.first() is None and expected.first() is None) or (
-len(actual.columns) == 0 and len(expected.columns) == 0
+if (
+isinstance(expected, DataFrame)
+and (
+(actual.first() is None and expected.first() is None)
+or (len(actual.columns) == 0 and len(expected.columns) == 0)
+)
+or isinstance(expected, list)
+and ((actual.first() is None or len(actual.columns) == 0) and 
len(expected) == 0)
 ):
 return True
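
A short usage sketch of the special cases added above (assuming a
SparkSession named `spark`): comparing an empty or zero-column DataFrame
against an empty expected list now passes.

```py
from pyspark.testing.utils import assertDataFrameEqual

empty_rows = spark.range(0, 10).limit(0)    # schema, but no rows
no_columns = spark.range(0, 10).drop("id")  # rows, but no columns

assertDataFrameEqual(empty_rows, [])  # passes: actual.first() is None and expected is empty
assertDataFrameEqual(no_columns, [])  # passes: actual has no columns and expected is empty
```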
 





[spark] branch master updated: [SPARK-44264][PYTHON][ML] FunctionPickler Class

2023-07-14 Thread xinrong
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new e000cb868cc [SPARK-44264][PYTHON][ML] FunctionPickler Class
e000cb868cc is described below

commit e000cb868ccb1a4f48a8356ccfc736e16ed1c1b5
Author: Mathew Jacob 
AuthorDate: Fri Jul 14 14:12:08 2023 -0700

[SPARK-44264][PYTHON][ML] FunctionPickler Class

### What changes were proposed in this pull request?
This PR introduces the FunctionPickler utility class, which is responsible 
for pickling functions and their arguments, creating scripts that run those 
functions and pickle their output, and extracting objects from a pickle file.

### Why are the changes needed?
This abstracts the responsibility of pickling away from the TorchDistributor, 
as it is tangential to the actual distributed training. Additionally, future 
distributors, or anything else that uses pickling to transmit objects, can 
benefit from this class's built-in functionality.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Wrote unit tests.

Checklist:

- [x] Pickles a function and its arguments to a file.
- [x] Creates a script that, given a path to a pickled function and 
arguments, runs the function with those arguments and pickles the output to 
another location.
- [x] Extracts output given a pickle file.
- [x] Unit tests for first feature.
- [x] Unit tests for second feature.
- [x] Unit tests for third feature.

Closes #41946 from mathewjacob1002/function_pickler.

Lead-authored-by: Mathew Jacob 
Co-authored-by: Mathew Jacob 
<134338709+mathewjacob1...@users.noreply.github.com>
Signed-off-by: Xinrong Meng 
---
 python/pyspark/ml/dl_util.py| 150 ++
 python/pyspark/ml/tests/test_dl_util.py | 186 
 2 files changed, 336 insertions(+)

diff --git a/python/pyspark/ml/dl_util.py b/python/pyspark/ml/dl_util.py
new file mode 100644
index 000..8ead529d7b7
--- /dev/null
+++ b/python/pyspark/ml/dl_util.py
@@ -0,0 +1,150 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+import os
+import tempfile
+import textwrap
+from typing import Any, Callable
+
+from pyspark import cloudpickle
+
+
+class FunctionPickler:
+"""
+This class provides a way to pickle a function and its arguments.
+It also provides a way to create a script that can run a
+function with arguments if they have them pickled to a file.
+It also provides a way of extracting the contents of a pickle file.
+"""
+
+@staticmethod
+def pickle_fn_and_save(
+fn: Callable, file_path: str, save_dir: str, *args: Any, **kwargs: Any
+) -> str:
+"""
+Given a function and args, this function will pickle them to a file.
+
+Parameters
+--
+fn: Callable
+The picklable function that will be pickled to a file.
+file_path: str
+The path where to save the pickled function, args, and kwargs. If 
it's the
+empty string, the function will decide on a random name.
+save_dir: str
+The directory in which to save the file with the pickled function 
and arguments.
+Does nothing if the path is specified. If both file_path and 
save_dir are empty,
+the function will write the file to the current working directory 
with a random
+name.
+*args: Any
+Arguments of fn that will be pickled.
+**kwargs: Any
+Key word arguments to fn that will be pickled.
+
+Returns
+---
+str
+The path to the file where the function and arguments are pickled.
+"""
+if file_path != "":
+with open(file_path, "wb") as f:
+cloudpickle.dump((fn, args, kwargs), f
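
A hedged usage sketch of `pickle_fn_and_save` as documented above; the manual
`cloudpickle.load` round trip is illustrative, not part of the shown API.

```py
import os

from pyspark import cloudpickle
from pyspark.ml.dl_util import FunctionPickler

def add(a, b):
    return a + b

# Pickle the function plus its args/kwargs; an empty file_path lets the
# helper pick a random name under save_dir.
path = FunctionPickler.pickle_fn_and_save(add, "", os.getcwd(), 1, b=2)

with open(path, "rb") as f:
    fn, args, kwargs = cloudpickle.load(f)
print(fn(*args, **kwargs))  # 3
```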

[spark] branch master updated: [SPARK-44398][CONNECT] Scala foreachBatch API

2023-07-13 Thread xinrong
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 4771853c9bc [SPARK-44398][CONNECT] Scala foreachBatch API
4771853c9bc is described below

commit 4771853c9bc26b8741091d63d77c4b6487e74189
Author: Raghu Angadi 
AuthorDate: Thu Jul 13 10:47:49 2023 -0700

[SPARK-44398][CONNECT] Scala foreachBatch API

This implements Scala foreachBatch(). The implementation is basic and needs 
some more enhancements. The server side will be shared by the Python 
implementation as well.

One notable hack in this PR is that it runs the user's `foreachBatch()` with 
a regular (legacy) DataFrame, rather than setting up a remote Spark Connect 
session and a Connect DataFrame.

### Why are the changes needed?
Adds foreachBatch() support in Scala Spark Connect.

### Does this PR introduce _any_ user-facing change?
Yes. Adds foreachBatch() API

### How was this patch tested?
- A simple unit test.

Closes #41969 from rangadi/feb-scala.

Authored-by: Raghu Angadi 
Signed-off-by: Xinrong Meng 
---
 .../spark/sql/streaming/DataStreamWriter.scala | 28 ++-
 .../spark/sql/streaming/StreamingQuerySuite.scala  | 52 -
 .../src/main/protobuf/spark/connect/commands.proto | 11 +--
 .../sql/connect/planner/SparkConnectPlanner.scala  | 25 +-
 .../planner/StreamingForeachBatchHelper.scala  | 69 +
 python/pyspark/sql/connect/proto/commands_pb2.py   | 88 +++---
 python/pyspark/sql/connect/proto/commands_pb2.pyi  | 46 +++
 python/pyspark/sql/connect/streaming/readwriter.py |  4 +-
 8 files changed, 251 insertions(+), 72 deletions(-)

diff --git 
a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala
 
b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala
index 9f63f68a000..ad76ab4a1bc 100644
--- 
a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala
+++ 
b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala
@@ -30,12 +30,15 @@ import org.apache.spark.connect.proto.Command
 import org.apache.spark.connect.proto.WriteStreamOperationStart
 import org.apache.spark.internal.Logging
 import org.apache.spark.sql.{Dataset, ForeachWriter}
+import org.apache.spark.sql.connect.common.DataTypeProtoConverter
 import org.apache.spark.sql.connect.common.ForeachWriterPacket
 import org.apache.spark.sql.execution.streaming.AvailableNowTrigger
 import org.apache.spark.sql.execution.streaming.ContinuousTrigger
 import org.apache.spark.sql.execution.streaming.OneTimeTrigger
 import org.apache.spark.sql.execution.streaming.ProcessingTimeTrigger
+import org.apache.spark.sql.types.NullType
 import org.apache.spark.util.SparkSerDeUtils
+import org.apache.spark.util.Utils
 
 /**
  * Interface used to write a streaming `Dataset` to external storage systems 
(e.g. file systems,
@@ -218,7 +221,30 @@ final class DataStreamWriter[T] private[sql] (ds: 
Dataset[T]) extends Logging {
 val scalaWriterBuilder = proto.ScalarScalaUDF
   .newBuilder()
   .setPayload(ByteString.copyFrom(serialized))
-sinkBuilder.getForeachWriterBuilder.setScalaWriter(scalaWriterBuilder)
+sinkBuilder.getForeachWriterBuilder.setScalaFunction(scalaWriterBuilder)
+this
+  }
+
+  /**
+   * :: Experimental ::
+   *
+   * (Scala-specific) Sets the output of the streaming query to be processed 
using the provided
+   * function. This is supported only in the micro-batch execution modes (that 
is, when the
+   * trigger is not continuous). The provided function will be called in
+   * every micro-batch with (i) the output rows as a Dataset and (ii) the 
batch identifier. The
+   * batchId can be used to deduplicate and transactionally write the output 
(that is, the
+   * provided Dataset) to external systems. The output Dataset is guaranteed 
to be exactly the
+   * same for the same batchId (assuming all operations are deterministic in 
the query).
+   *
+   * @since 3.5.0
+   */
+  @Evolving
+  def foreachBatch(function: (Dataset[T], Long) => Unit): DataStreamWriter[T] 
= {
+val serializedFn = Utils.serialize(function)
+sinkBuilder.getForeachBatchBuilder.getScalaFunctionBuilder
+  .setPayload(ByteString.copyFrom(serializedFn))
+  .setOutputType(DataTypeProtoConverter.toConnectProtoType(NullType)) // 
Unused.
+  .setNullable(true) // Unused.
 this
   }
 
diff --git 
a/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/streaming/StreamingQuerySuite.scala
 
b/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/streaming/StreamingQuerySuite.scala
index 6ddcedf19cb..438e6e0c2fe 100644
--- 
a/connector/connect/cl
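
For readers more familiar with PySpark, a minimal Python counterpart sketch
of the API added above (Python's `foreachBatch` already existed; this assumes
a SparkSession named `spark` in a streaming-capable environment).

```py
def process_batch(batch_df, batch_id):
    # batch_id can be used to deduplicate or write transactionally.
    batch_df.write.mode("append").format("noop").save()

query = (
    spark.readStream.format("rate").load()
    .writeStream.foreachBatch(process_batch)
    .start()
)
query.stop()
```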

[spark] branch master updated: [SPARK-44150][PYTHON][FOLLOW-UP] Revert commits

2023-06-29 Thread xinrong
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new e505244460b [SPARK-44150][PYTHON][FOLLOW-UP] Revert commits
e505244460b is described below

commit e505244460baa49f862d36333792c9d924cb4dde
Author: Xinrong Meng 
AuthorDate: Thu Jun 29 14:55:03 2023 -0700

[SPARK-44150][PYTHON][FOLLOW-UP] Revert commits

### What changes were proposed in this pull request?
Revert two commits of [SPARK-44150] that block master CI.

### Why are the changes needed?
N/A

### Does this PR introduce _any_ user-facing change?
N/A

### How was this patch tested?
N/A

Closes #41799 from xinrong-meng/revert.

Authored-by: Xinrong Meng 
Signed-off-by: Xinrong Meng 
---
 python/pyspark/sql/pandas/serializers.py  | 32 +++
 python/pyspark/sql/tests/test_arrow_python_udf.py | 39 ---
 python/pyspark/worker.py  |  3 --
 3 files changed, 5 insertions(+), 69 deletions(-)

diff --git a/python/pyspark/sql/pandas/serializers.py 
b/python/pyspark/sql/pandas/serializers.py
index 12d4c3077fe..307fcc33752 100644
--- a/python/pyspark/sql/pandas/serializers.py
+++ b/python/pyspark/sql/pandas/serializers.py
@@ -190,7 +190,7 @@ class ArrowStreamPandasSerializer(ArrowStreamSerializer):
 )
 return converter(s)
 
-def _create_array(self, series, arrow_type, spark_type=None, 
arrow_cast=False):
+def _create_array(self, series, arrow_type, spark_type=None):
 """
 Create an Arrow Array from the given pandas.Series and optional type.
 
@@ -202,9 +202,6 @@ class ArrowStreamPandasSerializer(ArrowStreamSerializer):
 If None, pyarrow's inferred type will be used
 spark_type : DataType, optional
 If None, spark type converted from arrow_type will be used
-arrow_cast: bool, optional
-Whether to apply Arrow casting when the user-specified return type 
mismatches the
-actual return values.
 
 Returns
 ---
@@ -229,14 +226,7 @@ class ArrowStreamPandasSerializer(ArrowStreamSerializer):
 else:
 mask = series.isnull()
 try:
-if arrow_cast:
-return pa.Array.from_pandas(series, mask=mask).cast(
-target_type=arrow_type, safe=self._safecheck
-)
-else:
-return pa.Array.from_pandas(
-series, mask=mask, type=arrow_type, safe=self._safecheck
-)
+return pa.Array.from_pandas(series, mask=mask, type=arrow_type, 
safe=self._safecheck)
 except TypeError as e:
 error_msg = (
 "Exception thrown when converting pandas.Series (%s) "
@@ -329,14 +319,12 @@ class 
ArrowStreamPandasUDFSerializer(ArrowStreamPandasSerializer):
 df_for_struct=False,
 struct_in_pandas="dict",
 ndarray_as_list=False,
-arrow_cast=False,
 ):
 super(ArrowStreamPandasUDFSerializer, self).__init__(timezone, 
safecheck)
 self._assign_cols_by_name = assign_cols_by_name
 self._df_for_struct = df_for_struct
 self._struct_in_pandas = struct_in_pandas
 self._ndarray_as_list = ndarray_as_list
-self._arrow_cast = arrow_cast
 
 def arrow_to_pandas(self, arrow_column):
 import pyarrow.types as types
@@ -398,13 +386,7 @@ class 
ArrowStreamPandasUDFSerializer(ArrowStreamPandasSerializer):
 # Assign result columns by schema name if user labeled with 
strings
 elif self._assign_cols_by_name and any(isinstance(name, str) 
for name in s.columns):
 arrs_names = [
-(
-self._create_array(
-s[field.name], field.type, 
arrow_cast=self._arrow_cast
-),
-field.name,
-)
-for field in t
+(self._create_array(s[field.name], field.type), 
field.name) for field in t
 ]
 # Assign result columns by  position
 else:
@@ -412,11 +394,7 @@ class 
ArrowStreamPandasUDFSerializer(ArrowStreamPandasSerializer):
 # the selected series has name '1', so we rename it to 
field.name
 # as the name is used by _create_array to provide a 
meaningful error message
 (
-self._create_array(
-s[s.columns[i]].rename(field.name),
-field.type,
- 

[spark] branch master updated (6e56cfeaca8 -> 414bc75ac5b)

2023-06-29 Thread xinrong
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 6e56cfeaca8 [SPARK-44150][PYTHON][CONNECT] Explicit Arrow casting for 
mismatched return type in Arrow Python UDF
 add 414bc75ac5b [SPARK-44150][PYTHON][FOLLOW-UP] Fix 
ArrowStreamPandasSerializer to set arguments properly

No new revisions were added by this update.

Summary of changes:
 python/pyspark/sql/pandas/serializers.py | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)





[spark] branch master updated: [SPARK-44150][PYTHON][CONNECT] Explicit Arrow casting for mismatched return type in Arrow Python UDF

2023-06-29 Thread xinrong
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 6e56cfeaca8 [SPARK-44150][PYTHON][CONNECT] Explicit Arrow casting for 
mismatched return type in Arrow Python UDF
6e56cfeaca8 is described below

commit 6e56cfeaca884b1ccfaa8524c70f12f118bc840c
Author: Xinrong Meng 
AuthorDate: Thu Jun 29 11:46:06 2023 -0700

[SPARK-44150][PYTHON][CONNECT] Explicit Arrow casting for mismatched return 
type in Arrow Python UDF

### What changes were proposed in this pull request?
Explicit Arrow casting for the mismatched return type of Arrow Python UDF.

### Why are the changes needed?
A more standardized and coherent type coercion.

Please refer to https://github.com/apache/spark/pull/41706 for a 
comprehensive comparison between the type coercion rules of Arrow and of 
Pickle (used by the default Python UDF).

See more at [[Design] Type-coercion in Arrow Python 
UDFs](https://docs.google.com/document/d/e/2PACX-1vTEGElOZfhl9NfgbBw4CTrlm-8F_xQCAKNOXouz-7mg5vYobS7lCGUsGkDZxPY0wV5YkgoZmkYlxccU/pub).

### Does this PR introduce _any_ user-facing change?
Yes.

FROM
```py
>>> df = spark.createDataFrame(['1', '2'], schema='string')
>>> df.select(pandas_udf(lambda x: x, 'int')('value')).show()
...
org.apache.spark.api.python.PythonException: Traceback (most recent call 
last):
...
pyarrow.lib.ArrowInvalid: Could not convert '1' with type str: tried to 
convert to int32
```

TO
```py
>>> df = spark.createDataFrame(['1', '2'], schema='string')
>>> df.select(pandas_udf(lambda x: x, 'int')('value')).show()
+---+
|(value)|
+---+
|  1|
|  2|
+---+
```
    ### How was this patch tested?
Unit tests.

Closes #41503 from xinrong-meng/type_coersion.

Authored-by: Xinrong Meng 
Signed-off-by: Xinrong Meng 
---
 python/pyspark/sql/pandas/serializers.py  | 30 ++---
 python/pyspark/sql/tests/test_arrow_python_udf.py | 39 +++
 python/pyspark/worker.py  |  3 ++
 3 files changed, 67 insertions(+), 5 deletions(-)

diff --git a/python/pyspark/sql/pandas/serializers.py 
b/python/pyspark/sql/pandas/serializers.py
index 307fcc33752..a99eda9cbea 100644
--- a/python/pyspark/sql/pandas/serializers.py
+++ b/python/pyspark/sql/pandas/serializers.py
@@ -190,7 +190,7 @@ class ArrowStreamPandasSerializer(ArrowStreamSerializer):
 )
 return converter(s)
 
-def _create_array(self, series, arrow_type, spark_type=None):
+def _create_array(self, series, arrow_type, spark_type=None, 
arrow_cast=False):
 """
 Create an Arrow Array from the given pandas.Series and optional type.
 
@@ -202,6 +202,9 @@ class ArrowStreamPandasSerializer(ArrowStreamSerializer):
 If None, pyarrow's inferred type will be used
 spark_type : DataType, optional
 If None, spark type converted from arrow_type will be used
+arrow_cast: bool, optional
+Whether to apply Arrow casting when the user-specified return type 
mismatches the
+actual return values.
 
 Returns
 ---
@@ -226,7 +229,12 @@ class ArrowStreamPandasSerializer(ArrowStreamSerializer):
 else:
 mask = series.isnull()
 try:
-return pa.Array.from_pandas(series, mask=mask, type=arrow_type, 
safe=self._safecheck)
+if arrow_cast:
+return pa.Array.from_pandas(series, mask=mask, 
type=arrow_type).cast(
+target_type=arrow_type, safe=self._safecheck
+)
+else:
+return pa.Array.from_pandas(series, mask=mask, 
safe=self._safecheck)
 except TypeError as e:
 error_msg = (
 "Exception thrown when converting pandas.Series (%s) "
@@ -319,12 +327,14 @@ class 
ArrowStreamPandasUDFSerializer(ArrowStreamPandasSerializer):
 df_for_struct=False,
 struct_in_pandas="dict",
 ndarray_as_list=False,
+arrow_cast=False,
 ):
 super(ArrowStreamPandasUDFSerializer, self).__init__(timezone, 
safecheck)
 self._assign_cols_by_name = assign_cols_by_name
 self._df_for_struct = df_for_struct
 self._struct_in_pandas = struct_in_pandas
 self._ndarray_as_list = ndarray_as_list
+self._arrow_cast = arrow_cast
 
 def arrow_to_pandas(self, arrow_column)
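
A minimal PyArrow sketch of the mechanism introduced above: build the array
with the inferred type, then explicitly cast it to the declared Arrow type.
The names and values mirror the FROM/TO example.

```py
import pandas as pd
import pyarrow as pa

series = pd.Series(["1", "2"])      # str values returned by the UDF
arr = pa.Array.from_pandas(series)  # inferred Arrow type: string
casted = arr.cast(target_type=pa.int32(), safe=True)
print(casted.type, casted.to_pylist())  # int32 [1, 2]
```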

[spark] branch master updated: [SPARK-43893][PYTHON][CONNECT] Non-atomic data type support in Arrow-optimized Python UDF

2023-06-06 Thread xinrong
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 94098853592 [SPARK-43893][PYTHON][CONNECT] Non-atomic data type 
support in Arrow-optimized Python UDF
94098853592 is described below

commit 94098853592b524f52e9a340166b96ddeda4e898
Author: Xinrong Meng 
AuthorDate: Tue Jun 6 15:48:14 2023 -0700

[SPARK-43893][PYTHON][CONNECT] Non-atomic data type support in 
Arrow-optimized Python UDF

### What changes were proposed in this pull request?
Support non-atomic data types in input and output of Arrow-optimized Python 
UDF.

Non-atomic data types refer to: ArrayType, MapType, and StructType.

### Why are the changes needed?
Parity with pickled Python UDFs.

### Does this PR introduce _any_ user-facing change?
Non-atomic data types are accepted as both input and output of 
Arrow-optimized Python UDF.

For example,
```py
>>> df = spark.range(1).selectExpr("struct(1, struct('John', 30, ('value', 
10))) as nested_struct")
>>> df.select(udf(lambda x: str(x))("nested_struct")).first()
Row(<lambda>(nested_struct)="Row(col1=1, col2=Row(col1='John', col2=30, 
col3=Row(col1='value', col2=10)))")
```
    
### How was this patch tested?
Unit tests.

Closes #41321 from xinrong-meng/arrow_udf_struct.

Authored-by: Xinrong Meng 
Signed-off-by: Xinrong Meng 
---
 python/pyspark/sql/pandas/serializers.py  | 22 ---
 python/pyspark/sql/tests/test_arrow_python_udf.py | 17 -
 python/pyspark/sql/tests/test_udf.py  | 45 +++
 python/pyspark/sql/udf.py | 15 +---
 python/pyspark/worker.py  | 13 +--
 5 files changed, 79 insertions(+), 33 deletions(-)

diff --git a/python/pyspark/sql/pandas/serializers.py 
b/python/pyspark/sql/pandas/serializers.py
index 84471143367..12d0bee88ad 100644
--- a/python/pyspark/sql/pandas/serializers.py
+++ b/python/pyspark/sql/pandas/serializers.py
@@ -172,7 +172,7 @@ class ArrowStreamPandasSerializer(ArrowStreamSerializer):
 self._timezone = timezone
 self._safecheck = safecheck
 
-def arrow_to_pandas(self, arrow_column):
+def arrow_to_pandas(self, arrow_column, struct_in_pandas="dict"):
 # If the given column is a date type column, creates a series of 
datetime.date directly
 # instead of creating datetime64[ns] as intermediate data to avoid 
overflow caused by
 # datetime64[ns] type handling.
@@ -184,7 +184,7 @@ class ArrowStreamPandasSerializer(ArrowStreamSerializer):
 data_type=from_arrow_type(arrow_column.type, 
prefer_timestamp_ntz=True),
 nullable=True,
 timezone=self._timezone,
-struct_in_pandas="dict",
+struct_in_pandas=struct_in_pandas,
 error_on_duplicated_field_names=True,
 )
 return converter(s)
@@ -310,10 +310,18 @@ class 
ArrowStreamPandasUDFSerializer(ArrowStreamPandasSerializer):
 Serializer used by Python worker to evaluate Pandas UDFs
 """
 
-def __init__(self, timezone, safecheck, assign_cols_by_name, 
df_for_struct=False):
+def __init__(
+self,
+timezone,
+safecheck,
+assign_cols_by_name,
+df_for_struct=False,
+struct_in_pandas="dict",
+):
 super(ArrowStreamPandasUDFSerializer, self).__init__(timezone, 
safecheck)
 self._assign_cols_by_name = assign_cols_by_name
 self._df_for_struct = df_for_struct
+self._struct_in_pandas = struct_in_pandas
 
 def arrow_to_pandas(self, arrow_column):
 import pyarrow.types as types
@@ -323,13 +331,15 @@ class 
ArrowStreamPandasUDFSerializer(ArrowStreamPandasSerializer):
 
 series = [
 super(ArrowStreamPandasUDFSerializer, self)
-.arrow_to_pandas(column)
+.arrow_to_pandas(column, self._struct_in_pandas)
 .rename(field.name)
 for column, field in zip(arrow_column.flatten(), 
arrow_column.type)
 ]
 s = pd.concat(series, axis=1)
 else:
-s = super(ArrowStreamPandasUDFSerializer, 
self).arrow_to_pandas(arrow_column)
+s = super(ArrowStreamPandasUDFSerializer, self).arrow_to_pandas(
+arrow_column, self._struct_in_pandas
+)
 return s
 
 def _create_batch(self, series):
@@ -360,7 +370,7 @@ class 
ArrowStreamPandasUDFSerializer(ArrowStreamPandasSerializer):
 
 arrs = []
 for s, t in series:
-if t is not None and pa.types.is_struct(t):
+if self._struct_in_pandas ==

[spark] branch master updated: [SPARK-41532][CONNECT][FOLLOWUP] add error class `SESSION_NOT_SAME` into error_classes.py

2023-05-22 Thread xinrong
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 40dd5235373 [SPARK-41532][CONNECT][FOLLOWUP] add error class 
`SESSION_NOT_SAME` into error_classes.py
40dd5235373 is described below

commit 40dd5235373891bdcc536e25082597aca24e6507
Author: Jia Fan 
AuthorDate: Mon May 22 10:51:25 2023 -0700

[SPARK-41532][CONNECT][FOLLOWUP] add error class `SESSION_NOT_SAME` into 
error_classes.py

### What changes were proposed in this pull request?
This is a follow-up PR for #40684. It adds the error class `SESSION_NOT_SAME` 
to `error_classes.py` with a template error message.

### Why are the changes needed?
Unified error messages.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Added a new test.

Closes #41259 from Hisoka-X/follow_up_session_not_same.

Authored-by: Jia Fan 
Signed-off-by: Xinrong Meng 
---
 python/pyspark/errors/error_classes.py  |  5 +
 python/pyspark/sql/connect/dataframe.py |  5 -
 .../pyspark/sql/tests/connect/test_connect_basic.py | 21 ++---
 3 files changed, 27 insertions(+), 4 deletions(-)

diff --git a/python/pyspark/errors/error_classes.py 
b/python/pyspark/errors/error_classes.py
index c7b00e0736d..817b8ce60db 100644
--- a/python/pyspark/errors/error_classes.py
+++ b/python/pyspark/errors/error_classes.py
@@ -576,6 +576,11 @@ ERROR_CLASSES_JSON = """
   "Result vector from pandas_udf was not the required length: expected 
, got ."
 ]
   },
+  "SESSION_NOT_SAME" : {
+"message" : [
+  "Both Datasets must belong to the same SparkSession."
+]
+  },
   "SESSION_OR_CONTEXT_EXISTS" : {
 "message" : [
   "There should not be an existing Spark Session or Spark Context."
diff --git a/python/pyspark/sql/connect/dataframe.py 
b/python/pyspark/sql/connect/dataframe.py
index 7a5ba50b3c6..4563366ef0f 100644
--- a/python/pyspark/sql/connect/dataframe.py
+++ b/python/pyspark/sql/connect/dataframe.py
@@ -265,7 +265,10 @@ class DataFrame:
 
 def checkSameSparkSession(self, other: "DataFrame") -> None:
 if self._session.session_id != other._session.session_id:
-raise SessionNotSameException("Both Datasets must belong to the 
same SparkSession")
+raise SessionNotSameException(
+error_class="SESSION_NOT_SAME",
+message_parameters={},
+)
 
 def coalesce(self, numPartitions: int) -> "DataFrame":
 if not numPartitions > 0:
diff --git a/python/pyspark/sql/tests/connect/test_connect_basic.py 
b/python/pyspark/sql/tests/connect/test_connect_basic.py
index dd5e52894c9..7225b6aa8d0 100644
--- a/python/pyspark/sql/tests/connect/test_connect_basic.py
+++ b/python/pyspark/sql/tests/connect/test_connect_basic.py
@@ -1815,14 +1815,29 @@ class SparkConnectBasicTests(SparkConnectSQLTestCase):
 spark2 = RemoteSparkSession(connection="sc://localhost")
 df2 = spark2.range(10).limit(3)
 
-with self.assertRaises(SessionNotSameException):
+with self.assertRaises(SessionNotSameException) as e1:
 df.union(df2).collect()
+self.check_error(
+exception=e1.exception,
+error_class="SESSION_NOT_SAME",
+message_parameters={},
+)
 
-with self.assertRaises(SessionNotSameException):
+with self.assertRaises(SessionNotSameException) as e2:
 df.unionByName(df2).collect()
+self.check_error(
+exception=e2.exception,
+error_class="SESSION_NOT_SAME",
+message_parameters={},
+)
 
-with self.assertRaises(SessionNotSameException):
+with self.assertRaises(SessionNotSameException) as e3:
 df.join(df2).collect()
+self.check_error(
+exception=e3.exception,
+error_class="SESSION_NOT_SAME",
+message_parameters={},
+)
 
 def test_extended_hint_types(self):
 cdf = self.connect.range(100).toDF("id")
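
A minimal sketch of the unified error-class pattern the diff adopts: raise
the exception with an `error_class` and `message_parameters` rather than a
hard-coded string (assuming a PySpark build that ships `pyspark.errors`).

```py
from pyspark.errors import SessionNotSameException

raise SessionNotSameException(
    error_class="SESSION_NOT_SAME",
    message_parameters={},  # the template message takes no parameters
)
```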





[spark] branch master updated: [SPARK-43543][PYTHON] Fix nested MapType behavior in Pandas UDF

2023-05-19 Thread xinrong
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new bc6f69a988f [SPARK-43543][PYTHON] Fix nested MapType behavior in 
Pandas UDF
bc6f69a988f is described below

commit bc6f69a988f13e5e22cb055e60693a545f0cbadb
Author: Xinrong Meng 
AuthorDate: Fri May 19 14:54:59 2023 -0700

[SPARK-43543][PYTHON] Fix nested MapType behavior in Pandas UDF

### What changes were proposed in this pull request?
Fix nested MapType behavior in Pandas UDF (and Arrow-optimized Python UDF).

Previously, during Arrow-to-pandas conversion, only the outermost layer was 
converted to a dictionary; now nested MapType is converted to nested 
dictionaries.

That applies to Spark Connect as well.

### Why are the changes needed?
Correctness and consistency (with `createDataFrame` and `toPandas` when 
Arrow is enabled).

### Does this PR introduce _any_ user-facing change?
Yes.

Nested MapType support is corrected in Pandas UDF

```py
>>> schema = StructType([
...  StructField("id", StringType(), True),
...  StructField("attributes", MapType(StringType(), 
MapType(StringType(), StringType())), True)
... ])
>>>
>>> data = [
...("1", {"personal": {"name": "John", "city": "New York"}}),
... ]
>>> df = spark.createDataFrame(data, schema)
>>> pandas_udf(StringType())
... def f(s: pd.Series) -> pd.Series:
...return s.astype(str)
...
>>> df.select(f(df.attributes)).show(truncate=False)
```

The output of `df.select(f(df.attributes)).show(truncate=False)` is 
corrected

**FROM**
```py

+------------------------------------------------------+
|f(attributes)                                         |
+------------------------------------------------------+
|{'personal': [('name', 'John'), ('city', 'New York')]}|
+------------------------------------------------------+
```

**TO**
```py
>>> df.select(f(df.attributes)).show(truncate=False)
+--------------------------------------------------+
|f(attributes)                                     |
+--------------------------------------------------+
|{'personal': {'name': 'John', 'city': 'New York'}}|
+--------------------------------------------------+

```

**Another more obvious example:**
```py
>>> pandas_udf(StringType())
... def extract_name(s:pd.Series) -> pd.Series:
... return s.apply(lambda x: x['personal']['name'])
...
>>> df.select(extract_name(df.attributes)).show(truncate=False)
```

`df.select(extract_name(df.attributes)).show(truncate=False)` is corrected

**FROM**
```py
org.apache.spark.api.python.PythonException: Traceback (most recent call 
last):
...
TypeError: list indices must be integers or slices, not str
    ```

**TO**
```py
+------------------------+
|extract_name(attributes)|
+------------------------+
|                    John|
+------------------------+
```

### How was this patch tested?
Unit tests.

Closes #41147 from xinrong-meng/nestedType.

Authored-by: Xinrong Meng 
Signed-off-by: Xinrong Meng 
---
 python/pyspark/sql/pandas/serializers.py   | 91 --
 .../sql/tests/pandas/test_pandas_udf_scalar.py | 30 +++
 2 files changed, 47 insertions(+), 74 deletions(-)

diff --git a/python/pyspark/sql/pandas/serializers.py 
b/python/pyspark/sql/pandas/serializers.py
index 9b5db2d000d..e81d90fc23e 100644
--- a/python/pyspark/sql/pandas/serializers.py
+++ b/python/pyspark/sql/pandas/serializers.py
@@ -21,7 +21,12 @@ Serializers for PyArrow and pandas conversions. See 
`pyspark.serializers` for mo
 
 from pyspark.errors import PySparkTypeError, PySparkValueError
 from pyspark.serializers import Serializer, read_int, write_int, 
UTF8Deserializer, CPickleSerializer
-from pyspark.sql.pandas.types import from_arrow_type, to_arrow_type, 
_create_converter_from_pandas
+from pyspark.sql.pandas.types import (
+from_arrow_type,
+to_arrow_type,
+_create_converter_from_pandas,
+_create_converter_to_pandas,
+)
 from pyspark.sql.types import StringType, StructType, BinaryType, StructField, 
LongType
 
 
@@ -168,23 +173,21 @@ class ArrowStreamPandasSerializer(ArrowStreamSerializer):
 self._safecheck = safecheck
 
 def arrow_to_pandas(self, arrow_column):
-from pyspark.sql.

[spark-website] branch asf-site updated: Update Apache Spark 3.5 Release Window

2023-05-11 Thread xinrong
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/spark-website.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new 18ca078b23 Update Apache Spark 3.5 Release Window
18ca078b23 is described below

commit 18ca078b23f826c24bed32df1dc89854a91cb580
Author: Xinrong Meng 
AuthorDate: Thu May 11 17:42:37 2023 -0700

Update Apache Spark 3.5 Release Window

Update Apache Spark 3.5 Release Window, with proposed dates:

```
| July 16th 2023 | Code freeze. Release branch cut.|
| Late July 2023 | QA period. Focus on bug fixes, tests, stability and 
docs. Generally, no new features merged.|
| August 2023| Release candidates (RC), voting, etc. until final 
release passes|
```

Author: Xinrong Meng 

Closes #461 from xinrong-meng/3.5release_window.
---
 site/versioning-policy.html | 8 
 versioning-policy.md| 8 
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/site/versioning-policy.html b/site/versioning-policy.html
index d25bd676c7..74b559d5e8 100644
--- a/site/versioning-policy.html
+++ b/site/versioning-policy.html
@@ -250,7 +250,7 @@ available APIs.
 Hence, Spark 2.3.0 would generally be released about 6 months after 2.2.0. 
Maintenance releases happen as needed
 in between feature releases. Major releases do not happen according to a fixed 
schedule.
 
-Spark 3.4 release window
+Spark 3.5 release window
 
 
   
@@ -261,15 +261,15 @@ in between feature releases. Major releases do not happen 
according to a fixed s
   
   
 
-  January 16th 2023
+  July 16th 2023
   Code freeze. Release branch cut.
 
 
-  Late January 2023
+  Late July 2023
   QA period. Focus on bug fixes, tests, stability and docs. Generally, 
no new features merged.
 
 
-  February 2023
+  August 2023
   Release candidates (RC), voting, etc. until final release passes
 
   
diff --git a/versioning-policy.md b/versioning-policy.md
index 153085259f..0f3892e8a2 100644
--- a/versioning-policy.md
+++ b/versioning-policy.md
@@ -103,13 +103,13 @@ The branch is cut every January and July, so feature 
("minor") releases occur ab
 Hence, Spark 2.3.0 would generally be released about 6 months after 2.2.0. 
Maintenance releases happen as needed
 in between feature releases. Major releases do not happen according to a fixed 
schedule.
 
-Spark 3.4 release window
+Spark 3.5 release window
 
 | Date  | Event |
 | - | - |
-| January 16th 2023 | Code freeze. Release branch cut.|
-| Late January 2023 | QA period. Focus on bug fixes, tests, stability and 
docs. Generally, no new features merged.|
-| February 2023 | Release candidates (RC), voting, etc. until final release 
passes|
+| July 16th 2023 | Code freeze. Release branch cut.|
+| Late July 2023 | QA period. Focus on bug fixes, tests, stability and docs. 
Generally, no new features merged.|
+| August 2023 | Release candidates (RC), voting, etc. until final release 
passes|
 
 Maintenance releases and EOL
 





[spark] branch master updated: [SPARK-43412][PYTHON][CONNECT] Introduce `SQL_ARROW_BATCHED_UDF` EvalType for Arrow-optimized Python UDFs

2023-05-10 Thread xinrong
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 32ab341071a [SPARK-43412][PYTHON][CONNECT] Introduce 
`SQL_ARROW_BATCHED_UDF` EvalType for Arrow-optimized Python UDFs
32ab341071a is described below

commit 32ab341071aa69917f820baf5f61668c2455f1db
Author: Xinrong Meng 
AuthorDate: Wed May 10 13:09:15 2023 -0700

[SPARK-43412][PYTHON][CONNECT] Introduce `SQL_ARROW_BATCHED_UDF` EvalType 
for Arrow-optimized Python UDFs

### What changes were proposed in this pull request?
Introduce `SQL_ARROW_BATCHED_UDF` EvalType for Arrow-optimized Python UDFs.

An EvalType is used to uniquely identify a UDF type in PySpark.

### Why are the changes needed?
We are about to improve nested non-atomic input/output support of an 
Arrow-optimized Python UDF.

However, it currently shares the same EvalType as a pickled Python UDF, 
but the same implementation as a Pandas UDF.

Introducing an EvalType enables isolating the changes to Arrow-optimized 
Python UDFs.

The PR is also a pre-requisite for registering an Arrow-optimized Python 
UDF.

### Does this PR introduce _any_ user-facing change?
No user-facing behavior/result changes for Arrow-optimized Python UDFs.

An `evalType`, as an attribute mainly designed for internal use, is changed 
as shown below:

```py
>>> udf(lambda x: str(x), useArrow=True).evalType == 
PythonEvalType.SQL_ARROW_BATCHED_UDF
True

# whereas

>>> udf(lambda x: str(x), useArrow=False).evalType == 
PythonEvalType.SQL_BATCHED_UDF
True
```

### How was this patch tested?
A new unit test `test_eval_type` and existing tests.

Closes #41053 from xinrong-meng/evalTypeArrowPyUDF.

Authored-by: Xinrong Meng 
    Signed-off-by: Xinrong Meng 
---
 .../main/scala/org/apache/spark/api/python/PythonRunner.scala| 2 ++
 python/pyspark/rdd.py| 3 ++-
 python/pyspark/sql/_typing.pyi   | 1 +
 python/pyspark/sql/connect/functions.py  | 7 +--
 python/pyspark/sql/connect/udf.py| 3 +--
 python/pyspark/sql/functions.py  | 6 +-
 python/pyspark/sql/pandas/functions.py   | 3 +++
 python/pyspark/sql/tests/test_arrow_python_udf.py| 9 +
 python/pyspark/sql/udf.py| 8 +++-
 python/pyspark/worker.py | 9 ++---
 .../org/apache/spark/sql/catalyst/expressions/PythonUDF.scala| 1 +
 .../apache/spark/sql/execution/python/ExtractPythonUDFs.scala| 3 ++-
 12 files changed, 32 insertions(+), 23 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala 
b/core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala
index 0b420f268ee..912e76005f0 100644
--- a/core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala
+++ b/core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala
@@ -44,6 +44,7 @@ private[spark] object PythonEvalType {
   val NON_UDF = 0
 
   val SQL_BATCHED_UDF = 100
+  val SQL_ARROW_BATCHED_UDF = 101
 
   val SQL_SCALAR_PANDAS_UDF = 200
   val SQL_GROUPED_MAP_PANDAS_UDF = 201
@@ -58,6 +59,7 @@ private[spark] object PythonEvalType {
   def toString(pythonEvalType: Int): String = pythonEvalType match {
 case NON_UDF => "NON_UDF"
 case SQL_BATCHED_UDF => "SQL_BATCHED_UDF"
+case SQL_ARROW_BATCHED_UDF => "SQL_ARROW_BATCHED_UDF"
 case SQL_SCALAR_PANDAS_UDF => "SQL_SCALAR_PANDAS_UDF"
 case SQL_GROUPED_MAP_PANDAS_UDF => "SQL_GROUPED_MAP_PANDAS_UDF"
 case SQL_GROUPED_AGG_PANDAS_UDF => "SQL_GROUPED_AGG_PANDAS_UDF"
diff --git a/python/pyspark/rdd.py b/python/pyspark/rdd.py
index 13f93fbdad6..e6ef7f6108e 100644
--- a/python/pyspark/rdd.py
+++ b/python/pyspark/rdd.py
@@ -110,7 +110,7 @@ if TYPE_CHECKING:
 )
 from pyspark.sql.dataframe import DataFrame
 from pyspark.sql.types import AtomicType, StructType
-from pyspark.sql._typing import AtomicValue, RowLike, SQLBatchedUDFType
+from pyspark.sql._typing import AtomicValue, RowLike, SQLArrowBatchedUDFType, SQLBatchedUDFType
 
 from py4j.java_gateway import JavaObject
 from py4j.java_collections import JavaArray
@@ -140,6 +140,7 @@ class PythonEvalType:
 NON_UDF: "NonUDFType" = 0
 
 SQL_BATCHED_UDF: "SQLBatchedUDFType" = 100
+SQL_ARROW_BATCHED_UDF: "SQLArrowBatchedUDFType" = 101
 
 SQL_SCALAR_PANDAS_UDF: "PandasScalarUDFType" = 200
 SQL_GROUPED_MAP_PANDAS_UDF: "Pand

[spark] branch master updated: [SPARK-41971][SQL][PYTHON] Add a config for pandas conversion how to handle struct types

2023-05-04 Thread xinrong
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 305aa4a89ef [SPARK-41971][SQL][PYTHON] Add a config for pandas 
conversion how to handle struct types
305aa4a89ef is described below

commit 305aa4a89efe02f517f82039225a99b31b20146f
Author: Takuya UESHIN 
AuthorDate: Thu May 4 11:01:28 2023 -0700

[SPARK-41971][SQL][PYTHON] Add a config for pandas conversion how to handle 
struct types

### What changes were proposed in this pull request?

Adds a config that controls how struct types are handled in pandas conversion.

- `spark.sql.execution.pandas.structHandlingMode` (default: `"legacy"`)

It sets the conversion mode for struct types when creating a pandas DataFrame.

 When `"legacy"`, the behavior is the same as before, except that with 
Arrow and Spark Connect will raise a more readable exception when there are 
duplicated nested field names.

```py
>>> spark.sql("values (1, struct(1 as a, 2 as a)) as t(x, y)").toPandas()
Traceback (most recent call last):
...
pyspark.errors.exceptions.connect.UnsupportedOperationException: 
[DUPLICATED_FIELD_NAME_IN_ARROW_STRUCT] Duplicated field names in Arrow Struct 
are not allowed, got [a, a].
```

 When `"row"`, convert to Row object regardless of Arrow optimization.

```py
>>> spark.conf.set('spark.sql.execution.pandas.structHandlingMode', 'row')
>>> spark.conf.set('spark.sql.execution.arrow.pyspark.enabled', False)
>>> spark.sql("values (1, struct(1 as a, 2 as b)) as t(x, y)").toPandas()
   x   y
0  1  (1, 2)
>>> spark.sql("values (1, struct(1 as a, 2 as a)) as t(x, y)").toPandas()
   x   y
0  1  (1, 2)
>>> spark.conf.set('spark.sql.execution.arrow.pyspark.enabled', True)
>>> spark.sql("values (1, struct(1 as a, 2 as b)) as t(x, y)").toPandas()
   x   y
0  1  (1, 2)
>>> spark.sql("values (1, struct(1 as a, 2 as a)) as t(x, y)").toPandas()
   x   y
0  1  (1, 2)
```

 When `"dict"`, convert to dict and use suffixed key names, e.g., 
`a_0`, `a_1`, if there are duplicated nested field names, regardless of Arrow 
optimization.

```py
>>> spark.conf.set('spark.sql.execution.pandas.structHandlingMode', 'dict')
>>> spark.conf.set('spark.sql.execution.arrow.pyspark.enabled', False)
>>> spark.sql("values (1, struct(1 as a, 2 as b)) as t(x, y)").toPandas()
   x y
0  1  {'a': 1, 'b': 2}
>>> spark.sql("values (1, struct(1 as a, 2 as a)) as t(x, y)").toPandas()
   x y
0  1  {'a_0': 1, 'a_1': 2}
>>> spark.conf.set('spark.sql.execution.arrow.pyspark.enabled', True)
>>> spark.sql("values (1, struct(1 as a, 2 as b)) as t(x, y)").toPandas()
   x y
0  1  {'a': 1, 'b': 2}
>>> spark.sql("values (1, struct(1 as a, 2 as a)) as t(x, y)").toPandas()
   x y
0  1  {'a_0': 1, 'a_1': 2}
```
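
For completeness, a minimal sketch of pinning the mode when the session is 
created, instead of toggling it afterwards with `spark.conf.set` (it reuses 
only the config key and query from the examples above; the builder pattern is 
just one way to set it):

```py
from pyspark.sql import SparkSession

# Fix the struct-handling mode for the whole session up front.
spark = (
    SparkSession.builder
    .config("spark.sql.execution.pandas.structHandlingMode", "dict")
    .getOrCreate()
)

# Duplicated nested field names become suffixed keys: a_0, a_1.
pdf = spark.sql("values (1, struct(1 as a, 2 as a)) as t(x, y)").toPandas()
print(pdf)
```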

### Why are the changes needed?

Currently there are three behaviors when calling `df.toPandas()` on nested 
struct types:

- vanilla PySpark with Arrow optimization disabled

```py
>>> spark.conf.set('spark.sql.execution.arrow.pyspark.enabled', False)
>>> spark.sql("values (1, struct(1 as a, 2 as b)) as t(x, y)").toPandas()
   x   y
0  1  (1, 2)
```

This uses `Row` objects for struct types.

It handles duplicated field names as well:

```py
>>> spark.sql("values (1, struct(1 as a, 2 as a)) as t(x, y)").toPandas()
   x   y
0  1  (1, 2)
```

- vanilla PySpark with Arrow optimization enabled

```py
>>> spark.conf.set('spark.sql.execution.arrow.pyspark.enabled', True)
>>> spark.sql("values (1, struct(1 as a, 2 as b)) as t(x, y)").toPandas()
   x y
0  1  {'a': 1, 'b': 2}
```

This uses `dict` for struct types.

It raises an exception when there are duplicated nested field names:

```py
>>> spark.sql("values (1, struct(1 as a, 2 as a)) as t(x, y)").toPandas()
Traceback (most recent call last):
...
pyarrow.lib.ArrowInvalid: Ran out of field metadata, likely malformed
```

- Spark C

[spark] branch master updated (8711c1a6ad9 -> 5cb7e6ffd91)

2023-05-02 Thread xinrong
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 8711c1a6ad9 [SPARK-42945][CONNECT][FOLLOW-UP] Add user_id and 
session_id when logging errors
 add 5cb7e6ffd91 [SPARK-43032][CONNECT][SS] Add Streaming query manager

No new revisions were added by this update.

Summary of changes:
 .../src/main/protobuf/spark/connect/base.proto |   3 +
 .../src/main/protobuf/spark/connect/commands.proto |  50 -
 .../sql/connect/planner/SparkConnectPlanner.scala  |  76 ++-
 .../sql/connect/service/SparkConnectService.scala  |   8 +-
 python/pyspark/sql/connect/client.py   |   3 +
 python/pyspark/sql/connect/proto/base_pb2.py   | 124 ++--
 python/pyspark/sql/connect/proto/base_pb2.pyi  |  13 ++
 python/pyspark/sql/connect/proto/commands_pb2.py   | 210 +--
 python/pyspark/sql/connect/proto/commands_pb2.pyi  | 225 +
 python/pyspark/sql/connect/session.py  |   9 +-
 python/pyspark/sql/connect/streaming/__init__.py   |   1 +
 python/pyspark/sql/connect/streaming/query.py  |  87 +++-
 python/pyspark/sql/connect/streaming/readwriter.py |   3 +-
 .../connect/streaming/test_parity_streaming.py |  27 +--
 .../sql/tests/connect/test_connect_basic.py|   1 -
 .../test_parity_pandas_grouped_map_with_state.py   |   6 +-
 16 files changed, 673 insertions(+), 173 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



svn commit: r61288 - in /dev/spark: v3.2.4-rc1-docs/ v3.4.0-rc7-docs/

2023-04-14 Thread xinrong
Author: xinrong
Date: Fri Apr 14 20:31:17 2023
New Revision: 61288

Log:
Removing RC artifacts.

Removed:
dev/spark/v3.2.4-rc1-docs/
dev/spark/v3.4.0-rc7-docs/


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark-website] branch asf-site updated: Fix the download page of Spark 3.4.0

2023-04-14 Thread xinrong
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/spark-website.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new 624de69568 Fix the download page of Spark 3.4.0
624de69568 is described below

commit 624de69568e5c743206a63cfc49d8647e41e1167
Author: Gengliang Wang 
AuthorDate: Fri Apr 14 13:03:59 2023 -0700

Fix the download page of Spark 3.4.0


Currently it shows 3.3.2 on top
https://user-images.githubusercontent.com/1097932/232143660-9d97a7c0-5eb0-44af-9f06-41cb6386a2dd.png

After fix:
https://user-images.githubusercontent.com/1097932/232143685-ee5b06e6-3af9-43ea-8690-209f4d8cd25f.png

Author: Gengliang Wang 

Closes #451 from gengliangwang/fixDownload.
---
 js/downloads.js  | 2 +-
 site/js/downloads.js | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/js/downloads.js b/js/downloads.js
index 915b9c8809..9781273310 100644
--- a/js/downloads.js
+++ b/js/downloads.js
@@ -25,9 +25,9 @@ var packagesV12 = [hadoop3p3, hadoop3p3scala213, hadoop2p7, 
hadoopFree, sources]
 // 3.3.0+
 var packagesV13 = [hadoop3p, hadoop3pscala213, hadoop2p, hadoopFree, sources];
 
+addRelease("3.4.0", new Date("04/13/2023"), packagesV13, true);
 addRelease("3.3.2", new Date("02/17/2023"), packagesV13, true);
 addRelease("3.2.4", new Date("04/13/2023"), packagesV12, true);
-addRelease("3.4.0", new Date("04/13/2023"), packagesV13, true);
 
 function append(el, contents) {
   el.innerHTML += contents;
diff --git a/site/js/downloads.js b/site/js/downloads.js
index 915b9c8809..9781273310 100644
--- a/site/js/downloads.js
+++ b/site/js/downloads.js
@@ -25,9 +25,9 @@ var packagesV12 = [hadoop3p3, hadoop3p3scala213, hadoop2p7, 
hadoopFree, sources]
 // 3.3.0+
 var packagesV13 = [hadoop3p, hadoop3pscala213, hadoop2p, hadoopFree, sources];
 
+addRelease("3.4.0", new Date("04/13/2023"), packagesV13, true);
 addRelease("3.3.2", new Date("02/17/2023"), packagesV13, true);
 addRelease("3.2.4", new Date("04/13/2023"), packagesV12, true);
-addRelease("3.4.0", new Date("04/13/2023"), packagesV13, true);
 
 function append(el, contents) {
   el.innerHTML += contents;


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



svn commit: r61281 - in /dev/spark/v3.4.0-rc7-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/R/articles/ _site/api/R/deps/ _site/api/R/deps/bootstrap-5.2.2/ _site/api/R/deps/jquery-3.6.0/ _site/api

2023-04-14 Thread xinrong
Author: xinrong
Date: Fri Apr 14 18:58:10 2023
New Revision: 61281

Log:
Apache Spark v3.4.0-rc7 docs


[This commit notification would consist of 2789 parts, 
which exceeds the limit of 50 ones, so it was shortened to the summary.]

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



svn commit: r61236 - in /dev/spark: v3.4.0-rc1-bin/ v3.4.0-rc1-docs/ v3.4.0-rc2-bin/ v3.4.0-rc2-docs/ v3.4.0-rc3-bin/ v3.4.0-rc3-docs/ v3.4.0-rc4-bin/ v3.4.0-rc4-docs/ v3.4.0-rc5-bin/ v3.4.0-rc5-docs/

2023-04-13 Thread xinrong
Author: xinrong
Date: Thu Apr 13 19:33:23 2023
New Revision: 61236

Log:
Removing RC artifacts.

Removed:
dev/spark/v3.4.0-rc1-bin/
dev/spark/v3.4.0-rc1-docs/
dev/spark/v3.4.0-rc2-bin/
dev/spark/v3.4.0-rc2-docs/
dev/spark/v3.4.0-rc3-bin/
dev/spark/v3.4.0-rc3-docs/
dev/spark/v3.4.0-rc4-bin/
dev/spark/v3.4.0-rc4-docs/
dev/spark/v3.4.0-rc5-bin/
dev/spark/v3.4.0-rc5-docs/
dev/spark/v3.4.0-rc6-bin/
dev/spark/v3.4.0-rc6-docs/
dev/spark/v3.4.0-rc7-docs/


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



svn commit: r61125 - in /dev/spark/v3.4.0-rc7-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/R/articles/ _site/api/R/deps/ _site/api/R/deps/bootstrap-5.2.2/ _site/api/R/deps/jquery-3.6.0/ _site/api

2023-04-07 Thread xinrong
Author: xinrong
Date: Fri Apr  7 19:11:49 2023
New Revision: 61125

Log:
Apache Spark v3.4.0-rc7 docs


[This commit notification would consist of 2789 parts, 
which exceeds the limit of 50 ones, so it was shortened to the summary.]

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



svn commit: r61114 - /dev/spark/v3.4.0-rc7-bin/

2023-04-06 Thread xinrong
Author: xinrong
Date: Fri Apr  7 02:45:25 2023
New Revision: 61114

Log:
Apache Spark v3.4.0-rc7

Added:
dev/spark/v3.4.0-rc7-bin/
dev/spark/v3.4.0-rc7-bin/SparkR_3.4.0.tar.gz   (with props)
dev/spark/v3.4.0-rc7-bin/SparkR_3.4.0.tar.gz.asc
dev/spark/v3.4.0-rc7-bin/SparkR_3.4.0.tar.gz.sha512
dev/spark/v3.4.0-rc7-bin/pyspark-3.4.0.tar.gz   (with props)
dev/spark/v3.4.0-rc7-bin/pyspark-3.4.0.tar.gz.asc
dev/spark/v3.4.0-rc7-bin/pyspark-3.4.0.tar.gz.sha512
dev/spark/v3.4.0-rc7-bin/spark-3.4.0-bin-hadoop3-scala2.13.tgz   (with 
props)
dev/spark/v3.4.0-rc7-bin/spark-3.4.0-bin-hadoop3-scala2.13.tgz.asc
dev/spark/v3.4.0-rc7-bin/spark-3.4.0-bin-hadoop3-scala2.13.tgz.sha512
dev/spark/v3.4.0-rc7-bin/spark-3.4.0-bin-hadoop3.tgz   (with props)
dev/spark/v3.4.0-rc7-bin/spark-3.4.0-bin-hadoop3.tgz.asc
dev/spark/v3.4.0-rc7-bin/spark-3.4.0-bin-hadoop3.tgz.sha512
dev/spark/v3.4.0-rc7-bin/spark-3.4.0-bin-without-hadoop.tgz   (with props)
dev/spark/v3.4.0-rc7-bin/spark-3.4.0-bin-without-hadoop.tgz.asc
dev/spark/v3.4.0-rc7-bin/spark-3.4.0-bin-without-hadoop.tgz.sha512
dev/spark/v3.4.0-rc7-bin/spark-3.4.0.tgz   (with props)
dev/spark/v3.4.0-rc7-bin/spark-3.4.0.tgz.asc
dev/spark/v3.4.0-rc7-bin/spark-3.4.0.tgz.sha512

Added: dev/spark/v3.4.0-rc7-bin/SparkR_3.4.0.tar.gz
==
Binary file - no diff available.

Propchange: dev/spark/v3.4.0-rc7-bin/SparkR_3.4.0.tar.gz
--
svn:mime-type = application/octet-stream

Added: dev/spark/v3.4.0-rc7-bin/SparkR_3.4.0.tar.gz.asc
==
--- dev/spark/v3.4.0-rc7-bin/SparkR_3.4.0.tar.gz.asc (added)
+++ dev/spark/v3.4.0-rc7-bin/SparkR_3.4.0.tar.gz.asc Fri Apr  7 02:45:25 2023
@@ -0,0 +1,17 @@
+-BEGIN PGP SIGNATURE-
+
+iQJHBAABCgAxFiEEzGiz0W/jOnZnBRYLp+V5CMek4bEFAmQvg1kTHHhpbnJvbmdA
+YXBhY2hlLm9yZwAKCRCn5XkIx6ThsXF3D/9DJKcP/8+T/T2cddS049hOxspKDbm2
+Q1oIy04RZ1KllpeZtZVxpUCy7vE7F2srNjFrZ3OMY76/DeyBdwUBLGbrpA51FBRy
+RmVM2x9Z9zj2rhfWK02IqC9a7RueMif15UwIGQSCEsS3H5ep3eHR2O4Vqof42rpj
+Qf8hTqRC3y6OPxKS/kyhwof3CtzSe5TzmGeQ8GLlsr1cOQ1K8V6tRv4L4xtqYKlx
+NA0ekUWKMylVzNj7AxdoWUpRCJyy+GbzT8PKp53imwaUjVp3FU8F3yZTd3kj9rxY
+aNY5pWVTj2930gqDKHnJcGs3jq39GfjKu1hKMN+XAwmJEi//I2W96xvbEjoBxEh3
+SES5oyPLGCUHhWPFB+wsw3hD3JelJKI7X7KLdOl5KTccECbTIxm141zv/tB3RNRE
+07DmCYiVrvsi5+CTngbXCcJVG0PZJ59vlSE58bYLe0cafKjRXMHWX1YT+YeeES4m
+jWhU9PClnAnS4Z7uCrmcI9/nXFiavNkSdp2yRLfS4Eew1Mtavd49exk68NrVhKBs
+VY1h3Sl1NY7UfcaWtUrCng8bCyHbWNIwoZ8yNJaDXKbvKyxTX88T+x4ulysyB6Xo
+7bAnx1KlrZBaVRG/iE6dnLglokW7dbBoE09QcBDslPjbfTSX8ldaPSGKK1Bwe/D3
+1nb+LTsY6sZNQg==
+=s5Gu
+-END PGP SIGNATURE-

Added: dev/spark/v3.4.0-rc7-bin/SparkR_3.4.0.tar.gz.sha512
==
--- dev/spark/v3.4.0-rc7-bin/SparkR_3.4.0.tar.gz.sha512 (added)
+++ dev/spark/v3.4.0-rc7-bin/SparkR_3.4.0.tar.gz.sha512 Fri Apr  7 02:45:25 2023
@@ -0,0 +1 @@
+4c7c4fa6018cb000dca30a34b5d963b30334be7230161a4f01b5c3192a3f87b0a54030b08f9bbfd58a3a796fb4bb7c607c1ba757f303f4da21e0f50dbce77b94
  SparkR_3.4.0.tar.gz

Added: dev/spark/v3.4.0-rc7-bin/pyspark-3.4.0.tar.gz
==
Binary file - no diff available.

Propchange: dev/spark/v3.4.0-rc7-bin/pyspark-3.4.0.tar.gz
--
svn:mime-type = application/octet-stream

Added: dev/spark/v3.4.0-rc7-bin/pyspark-3.4.0.tar.gz.asc
==
--- dev/spark/v3.4.0-rc7-bin/pyspark-3.4.0.tar.gz.asc (added)
+++ dev/spark/v3.4.0-rc7-bin/pyspark-3.4.0.tar.gz.asc Fri Apr  7 02:45:25 2023
@@ -0,0 +1,17 @@
+-BEGIN PGP SIGNATURE-
+
+iQJHBAABCgAxFiEEzGiz0W/jOnZnBRYLp+V5CMek4bEFAmQvg1sTHHhpbnJvbmdA
+YXBhY2hlLm9yZwAKCRCn5XkIx6ThsWuEEACDllGUxRb6YQEI2/pjfmRjtRo3WFMy
+SZzNTkBaGmJii8wB9I9OuUtJ5k1MCHSjWW6cwMZ4JJb8BDZ2ZVDy/66tZoKAIK7u
+aUvbjcF8IpudwqTTn0VBfQVsLzE9G4clEoGFJpeCvg61+CqY1sxkhtg6FFMTgyhb
+aMZOlz9uvEnYoYXoM0ZU4aLNAxclnhmE42+5j1MF3aiSR3Q/WaZEx/ECcEF4XhE1
+Q+53AmvnPm6yFFcqRQd023xWMnP6Y1zBBLnp2GZ2/SzCUkJrfvdueCDiOaiFrdnO
+Jrf45ZBMaOcloy/tGSKl/ykjjYKEUVk980Y6guC63Nym+sf19Da8eD2AqQSxxLiQ
+4tLH8owFHP4tr4C4MmfVD3R1HyNFk97scRDjCrCA0wMGLy9B3oSbE0yoRDRxZyei
+dT7y2OsGYQ7bSV1+sV6uQB59QarxBINOrl5nH/L8qz+H7tWA/UMCHCmlSyuYc/m4
+D0IMj4cDrpbahVN1dQelDOwO+pmMrlXMXkA4HAwJPQd5V0wcGWJWYlEz4FeoGr+0
+BkuNdngw21NnwH8ebbW2KbdNe235yfNfXK+pVQq5NerUKBuBpzM73AqI3idjFzTd
+pgeYrmbUMQxgPKZgZ/Fm025fwxW6e1z9aJdPJOa1baXT1gaiUtalzok7/En2t/Wj
+48RFugofvd1TGA==
+=s7ET
+-END PGP SIGNATURE-

Added: dev/spark/v3.4.0-rc7-bin/pyspark-3.4.0.tar.gz.sha512
==
--- dev/spark/v3.4.0-rc7-bin/pyspark-3.4.0.tar.gz.sha512 (added)
+++ dev/spark

[spark] 01/01: Preparing development version 3.4.1-SNAPSHOT

2023-04-06 Thread xinrong
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git

commit e4eea55d0a2ef7a8b8a44994750fdfd383517944
Author: Xinrong Meng 
AuthorDate: Fri Apr 7 01:28:49 2023 +

Preparing development version 3.4.1-SNAPSHOT
---
 R/pkg/DESCRIPTION  | 2 +-
 assembly/pom.xml   | 2 +-
 common/kvstore/pom.xml | 2 +-
 common/network-common/pom.xml  | 2 +-
 common/network-shuffle/pom.xml | 2 +-
 common/network-yarn/pom.xml| 2 +-
 common/sketch/pom.xml  | 2 +-
 common/tags/pom.xml| 2 +-
 common/unsafe/pom.xml  | 2 +-
 connector/avro/pom.xml | 2 +-
 connector/connect/client/jvm/pom.xml   | 2 +-
 connector/connect/common/pom.xml   | 2 +-
 connector/connect/server/pom.xml   | 2 +-
 connector/docker-integration-tests/pom.xml | 2 +-
 connector/kafka-0-10-assembly/pom.xml  | 2 +-
 connector/kafka-0-10-sql/pom.xml   | 2 +-
 connector/kafka-0-10-token-provider/pom.xml| 2 +-
 connector/kafka-0-10/pom.xml   | 2 +-
 connector/kinesis-asl-assembly/pom.xml | 2 +-
 connector/kinesis-asl/pom.xml  | 2 +-
 connector/protobuf/pom.xml | 2 +-
 connector/spark-ganglia-lgpl/pom.xml   | 2 +-
 core/pom.xml   | 2 +-
 docs/_config.yml   | 6 +++---
 examples/pom.xml   | 2 +-
 graphx/pom.xml | 2 +-
 hadoop-cloud/pom.xml   | 2 +-
 launcher/pom.xml   | 2 +-
 mllib-local/pom.xml| 2 +-
 mllib/pom.xml  | 2 +-
 pom.xml| 2 +-
 python/pyspark/version.py  | 2 +-
 repl/pom.xml   | 2 +-
 resource-managers/kubernetes/core/pom.xml  | 2 +-
 resource-managers/kubernetes/integration-tests/pom.xml | 2 +-
 resource-managers/mesos/pom.xml| 2 +-
 resource-managers/yarn/pom.xml | 2 +-
 sql/catalyst/pom.xml   | 2 +-
 sql/core/pom.xml   | 2 +-
 sql/hive-thriftserver/pom.xml  | 2 +-
 sql/hive/pom.xml   | 2 +-
 streaming/pom.xml  | 2 +-
 tools/pom.xml  | 2 +-
 43 files changed, 45 insertions(+), 45 deletions(-)

diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION
index 4a32762b34c..fa7028630a8 100644
--- a/R/pkg/DESCRIPTION
+++ b/R/pkg/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: SparkR
 Type: Package
-Version: 3.4.0
+Version: 3.4.1
 Title: R Front End for 'Apache Spark'
 Description: Provides an R Front end for 'Apache Spark' 
<https://spark.apache.org>.
 Authors@R:
diff --git a/assembly/pom.xml b/assembly/pom.xml
index c58da7aa112..b86fee4bceb 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.0
+3.4.1-SNAPSHOT
 ../pom.xml
   
 
diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml
index 95ea15552da..f9ecfb3d692 100644
--- a/common/kvstore/pom.xml
+++ b/common/kvstore/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.0
+3.4.1-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml
index e4d98471bf9..22ee65b7d25 100644
--- a/common/network-common/pom.xml
+++ b/common/network-common/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.0
+3.4.1-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml
index 7a6d5aedf65..2c67da81ca4 100644
--- a/common/network-shuffle/pom.xml
+++ b/common/network-shuffle/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.0
+3.4.1-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml
index 1c421754083..219682e047d 100644
--- a/common/network-yarn/pom.xml
+++ b/common/network-yarn/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.0
+3.4.1-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/sketch/pom.xml b/common/sketch

[spark] branch branch-3.4 updated (b2ff4c4f7ec -> e4eea55d0a2)

2023-04-06 Thread xinrong
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a change to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


from b2ff4c4f7ec [SPARK-39696][CORE] Fix data race in access to 
TaskMetrics.externalAccums
 add 87a5442f7ed Preparing Spark release v3.4.0-rc7
 new e4eea55d0a2 Preparing development version 3.4.1-SNAPSHOT

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] 01/01: Preparing Spark release v3.4.0-rc7

2023-04-06 Thread xinrong
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a commit to tag v3.4.0-rc7
in repository https://gitbox.apache.org/repos/asf/spark.git

commit 87a5442f7ed96b11051d8a9333476d080054e5a0
Author: Xinrong Meng 
AuthorDate: Fri Apr 7 01:28:44 2023 +

Preparing Spark release v3.4.0-rc7
---
 R/pkg/DESCRIPTION  | 2 +-
 assembly/pom.xml   | 2 +-
 common/kvstore/pom.xml | 2 +-
 common/network-common/pom.xml  | 2 +-
 common/network-shuffle/pom.xml | 2 +-
 common/network-yarn/pom.xml| 2 +-
 common/sketch/pom.xml  | 2 +-
 common/tags/pom.xml| 2 +-
 common/unsafe/pom.xml  | 2 +-
 connector/avro/pom.xml | 2 +-
 connector/connect/client/jvm/pom.xml   | 2 +-
 connector/connect/common/pom.xml   | 2 +-
 connector/connect/server/pom.xml   | 2 +-
 connector/docker-integration-tests/pom.xml | 2 +-
 connector/kafka-0-10-assembly/pom.xml  | 2 +-
 connector/kafka-0-10-sql/pom.xml   | 2 +-
 connector/kafka-0-10-token-provider/pom.xml| 2 +-
 connector/kafka-0-10/pom.xml   | 2 +-
 connector/kinesis-asl-assembly/pom.xml | 2 +-
 connector/kinesis-asl/pom.xml  | 2 +-
 connector/protobuf/pom.xml | 2 +-
 connector/spark-ganglia-lgpl/pom.xml   | 2 +-
 core/pom.xml   | 2 +-
 docs/_config.yml   | 6 +++---
 examples/pom.xml   | 2 +-
 graphx/pom.xml | 2 +-
 hadoop-cloud/pom.xml   | 2 +-
 launcher/pom.xml   | 2 +-
 mllib-local/pom.xml| 2 +-
 mllib/pom.xml  | 2 +-
 pom.xml| 2 +-
 python/pyspark/version.py  | 2 +-
 repl/pom.xml   | 2 +-
 resource-managers/kubernetes/core/pom.xml  | 2 +-
 resource-managers/kubernetes/integration-tests/pom.xml | 2 +-
 resource-managers/mesos/pom.xml| 2 +-
 resource-managers/yarn/pom.xml | 2 +-
 sql/catalyst/pom.xml   | 2 +-
 sql/core/pom.xml   | 2 +-
 sql/hive-thriftserver/pom.xml  | 2 +-
 sql/hive/pom.xml   | 2 +-
 streaming/pom.xml  | 2 +-
 tools/pom.xml  | 2 +-
 43 files changed, 45 insertions(+), 45 deletions(-)

diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION
index fa7028630a8..4a32762b34c 100644
--- a/R/pkg/DESCRIPTION
+++ b/R/pkg/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: SparkR
 Type: Package
-Version: 3.4.1
+Version: 3.4.0
 Title: R Front End for 'Apache Spark'
 Description: Provides an R Front end for 'Apache Spark' 
<https://spark.apache.org>.
 Authors@R:
diff --git a/assembly/pom.xml b/assembly/pom.xml
index b86fee4bceb..c58da7aa112 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.1-SNAPSHOT
+3.4.0
 ../pom.xml
   
 
diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml
index f9ecfb3d692..95ea15552da 100644
--- a/common/kvstore/pom.xml
+++ b/common/kvstore/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.1-SNAPSHOT
+3.4.0
 ../../pom.xml
   
 
diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml
index 22ee65b7d25..e4d98471bf9 100644
--- a/common/network-common/pom.xml
+++ b/common/network-common/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.1-SNAPSHOT
+3.4.0
 ../../pom.xml
   
 
diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml
index 2c67da81ca4..7a6d5aedf65 100644
--- a/common/network-shuffle/pom.xml
+++ b/common/network-shuffle/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.1-SNAPSHOT
+3.4.0
 ../../pom.xml
   
 
diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml
index 219682e047d..1c421754083 100644
--- a/common/network-yarn/pom.xml
+++ b/common/network-yarn/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.1-SNAPSHOT
+3.4.0
 ../../pom.xml
   
 
diff --git a/common/sketch/pom.xml b/common/sketch/pom.xml
index 22ce7

[spark] tag v3.4.0-rc7 created (now 87a5442f7ed)

2023-04-06 Thread xinrong
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a change to tag v3.4.0-rc7
in repository https://gitbox.apache.org/repos/asf/spark.git


  at 87a5442f7ed (commit)
This tag includes the following new commits:

 new 87a5442f7ed Preparing Spark release v3.4.0-rc7

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.



-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



svn commit: r61110 - in /dev/spark/v3.4.0-rc6-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/R/articles/ _site/api/R/deps/ _site/api/R/deps/bootstrap-5.2.2/ _site/api/R/deps/jquery-3.6.0/ _site/api

2023-04-06 Thread xinrong
Author: xinrong
Date: Thu Apr  6 19:21:26 2023
New Revision: 61110

Log:
Apache Spark v3.4.0-rc6 docs


[This commit notification would consist of 2789 parts, 
which exceeds the limit of 50 ones, so it was shortened to the summary.]

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



svn commit: r61108 - /dev/spark/v3.4.0-rc6-bin/

2023-04-06 Thread xinrong
Author: xinrong
Date: Thu Apr  6 17:58:16 2023
New Revision: 61108

Log:
Apache Spark v3.4.0-rc6

Added:
dev/spark/v3.4.0-rc6-bin/
dev/spark/v3.4.0-rc6-bin/SparkR_3.4.0.tar.gz   (with props)
dev/spark/v3.4.0-rc6-bin/SparkR_3.4.0.tar.gz.asc
dev/spark/v3.4.0-rc6-bin/SparkR_3.4.0.tar.gz.sha512
dev/spark/v3.4.0-rc6-bin/pyspark-3.4.0.tar.gz   (with props)
dev/spark/v3.4.0-rc6-bin/pyspark-3.4.0.tar.gz.asc
dev/spark/v3.4.0-rc6-bin/pyspark-3.4.0.tar.gz.sha512
dev/spark/v3.4.0-rc6-bin/spark-3.4.0-bin-hadoop3-scala2.13.tgz   (with 
props)
dev/spark/v3.4.0-rc6-bin/spark-3.4.0-bin-hadoop3-scala2.13.tgz.asc
dev/spark/v3.4.0-rc6-bin/spark-3.4.0-bin-hadoop3-scala2.13.tgz.sha512
dev/spark/v3.4.0-rc6-bin/spark-3.4.0-bin-hadoop3.tgz   (with props)
dev/spark/v3.4.0-rc6-bin/spark-3.4.0-bin-hadoop3.tgz.asc
dev/spark/v3.4.0-rc6-bin/spark-3.4.0-bin-hadoop3.tgz.sha512
dev/spark/v3.4.0-rc6-bin/spark-3.4.0-bin-without-hadoop.tgz   (with props)
dev/spark/v3.4.0-rc6-bin/spark-3.4.0-bin-without-hadoop.tgz.asc
dev/spark/v3.4.0-rc6-bin/spark-3.4.0-bin-without-hadoop.tgz.sha512
dev/spark/v3.4.0-rc6-bin/spark-3.4.0.tgz   (with props)
dev/spark/v3.4.0-rc6-bin/spark-3.4.0.tgz.asc
dev/spark/v3.4.0-rc6-bin/spark-3.4.0.tgz.sha512

Added: dev/spark/v3.4.0-rc6-bin/SparkR_3.4.0.tar.gz
==
Binary file - no diff available.

Propchange: dev/spark/v3.4.0-rc6-bin/SparkR_3.4.0.tar.gz
--
svn:mime-type = application/octet-stream

Added: dev/spark/v3.4.0-rc6-bin/SparkR_3.4.0.tar.gz.asc
==
--- dev/spark/v3.4.0-rc6-bin/SparkR_3.4.0.tar.gz.asc (added)
+++ dev/spark/v3.4.0-rc6-bin/SparkR_3.4.0.tar.gz.asc Thu Apr  6 17:58:16 2023
@@ -0,0 +1,17 @@
+-BEGIN PGP SIGNATURE-
+
+iQJHBAABCgAxFiEEzGiz0W/jOnZnBRYLp+V5CMek4bEFAmQvB8kTHHhpbnJvbmdA
+YXBhY2hlLm9yZwAKCRCn5XkIx6ThsToQEACqdYF76eiLZgfskKs8ZpVroBu6FV0l
+kT6CPB72l1l1vrfSDa9BbYW/Ty0QB0/t2ZV74p1avk5w/qyM6Otg7Gtkx3qFBMZw
+YIcMUFdeeXYc8hiOLFqoTHfdQVzvJNaoXofbfZAOcEOR4cRhofXPsgRYGQK8ZJwQ
+2Ek9a6KKUzn8bWfS2v+Z/bjLfArZ0QP2/qs9qdghsJqfhS6vGvFz9H45vfzpJyGw
+WdRQIRdmGvsxX9cyOG6QJv9Aq7MuT+hDBM0H/yip3wppEKSjIByj0MqapnuUrkML
+06SeK3fVx/sy9UzEHKWZKGDDiqlx5TCCaGC44N/+yiytmtrB3RxKhSiFy4G2s41+
+fqkMVgA3tbR2zIea/FJHYo7iO4YZMKN9YmXYFFZzARcwZgUVbyDvoLg07Rsww921
+FcoPYiUsFmA7Eb1vyp0HWmXYqwqSkuRujLkf4LkpX1JiRh0I2EEThPQ042nN+trN
+2iW35q9WCOJVbcdLcMv6KVP3Ipa6A9BGc4bvd+cmi7P9Fv8zgboDbIV8XiC45dRb
+v1C8NZ9Zca8V3XAdy+nds8fJW1Bvc6O12ch8MtMauV4TH22rTfmWBuVABsglQQlG
+c8sCLSOdRo1k80pBFZFg4ZFMFs/NjNa0PDtD8hZIhJEk24AaxCLQT/YlyUu9flqp
+37JM51CLEIL+xA==
+=2jm9
+-END PGP SIGNATURE-

Added: dev/spark/v3.4.0-rc6-bin/SparkR_3.4.0.tar.gz.sha512
==
--- dev/spark/v3.4.0-rc6-bin/SparkR_3.4.0.tar.gz.sha512 (added)
+++ dev/spark/v3.4.0-rc6-bin/SparkR_3.4.0.tar.gz.sha512 Thu Apr  6 17:58:16 2023
@@ -0,0 +1 @@
+2101b54ecaf72f6175a7e83651bee5cd2ca292ecb54f59461d7edb535d03130c9019eaa502912562d5aa1d5cec54f0eed3f862b51b6467b5beb6fcf09945b7b4
  SparkR_3.4.0.tar.gz

Added: dev/spark/v3.4.0-rc6-bin/pyspark-3.4.0.tar.gz
==
Binary file - no diff available.

Propchange: dev/spark/v3.4.0-rc6-bin/pyspark-3.4.0.tar.gz
--
svn:mime-type = application/octet-stream

Added: dev/spark/v3.4.0-rc6-bin/pyspark-3.4.0.tar.gz.asc
==
--- dev/spark/v3.4.0-rc6-bin/pyspark-3.4.0.tar.gz.asc (added)
+++ dev/spark/v3.4.0-rc6-bin/pyspark-3.4.0.tar.gz.asc Thu Apr  6 17:58:16 2023
@@ -0,0 +1,17 @@
+-BEGIN PGP SIGNATURE-
+
+iQJHBAABCgAxFiEEzGiz0W/jOnZnBRYLp+V5CMek4bEFAmQvB8sTHHhpbnJvbmdA
+YXBhY2hlLm9yZwAKCRCn5XkIx6ThsQb0D/4072E3g9OGkQjzd1KuzEoY3bk79eRC
+l6qohjnUlhP4j1l/sqNdtiS1hz1g5PgDpJlxlj20st3A9P7A0bOZN3r4BsMd38nA
++D0xIjdgqhax3XZVHhETudPwKWyboWM+Id32cuiJifGYPz9MnJBkTFQMxlZWz7Ny
+hbwNC2H435anO1BGiuyiUaFztfoOJ5aMZZaQHfXTAszwm7KJhkpZP1NC0YVdklhI
+71id0OYNziIIkYLJpSAlzQk2RLvR8Ok9NyELSOc6AzQ5tmLIVLWFVb9tfH69cYo8
+DHOEQqD4KdwDsb030lvXbQ4n6blns1b+i7gOdWzr6a/sQd1TwGq2SDkYlcQ++8/W
+HU7+9C9Oula/RpzYcvPiWnneoAjN7zZgJfYm8aCEP62mCH/eQVJePDBnRLQUTtWD
+gbBIId/qFXDYi1DOmFzz6Awh/EGA04TrnBbKqVSPC9g6p3VQCoUTNKwVKLCyvQx5
+QxbtpP7FjSxdB4TQAiDyo0U/o6b6AEx+wz43G14sv9gD3wNK8wtIBbh2PMrQuL0M
+7QSgFwVkp6vLmRjsrSslrxW8zqbfc0HkrTSNnV2odtRcv0ZsAEikWMki68cnkjbC
+GPFiUxjlNz1yMRrG/3dnmfHOvnlt84HtzUzxObxVO2xXgjSV5mEG94hywTEHeTA1
+dceim2kd4JjGSg==
+=0nLo
+-END PGP SIGNATURE-

Added: dev/spark/v3.4.0-rc6-bin/pyspark-3.4.0.tar.gz.sha512
==
--- dev/spark/v3.4.0-rc6-bin/pyspark-3.4.0.tar.gz.sha512 (added)
+++ dev/spark

[spark] 01/01: Preparing development version 3.4.1-SNAPSHOT

2023-04-06 Thread xinrong
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git

commit 1d974a7a78fa2a4d688d5e8606dcd084ab08b220
Author: Xinrong Meng 
AuthorDate: Thu Apr 6 16:38:33 2023 +

Preparing development version 3.4.1-SNAPSHOT
---
 R/pkg/DESCRIPTION  | 2 +-
 assembly/pom.xml   | 2 +-
 common/kvstore/pom.xml | 2 +-
 common/network-common/pom.xml  | 2 +-
 common/network-shuffle/pom.xml | 2 +-
 common/network-yarn/pom.xml| 2 +-
 common/sketch/pom.xml  | 2 +-
 common/tags/pom.xml| 2 +-
 common/unsafe/pom.xml  | 2 +-
 connector/avro/pom.xml | 2 +-
 connector/connect/client/jvm/pom.xml   | 2 +-
 connector/connect/common/pom.xml   | 2 +-
 connector/connect/server/pom.xml   | 2 +-
 connector/docker-integration-tests/pom.xml | 2 +-
 connector/kafka-0-10-assembly/pom.xml  | 2 +-
 connector/kafka-0-10-sql/pom.xml   | 2 +-
 connector/kafka-0-10-token-provider/pom.xml| 2 +-
 connector/kafka-0-10/pom.xml   | 2 +-
 connector/kinesis-asl-assembly/pom.xml | 2 +-
 connector/kinesis-asl/pom.xml  | 2 +-
 connector/protobuf/pom.xml | 2 +-
 connector/spark-ganglia-lgpl/pom.xml   | 2 +-
 core/pom.xml   | 2 +-
 docs/_config.yml   | 6 +++---
 examples/pom.xml   | 2 +-
 graphx/pom.xml | 2 +-
 hadoop-cloud/pom.xml   | 2 +-
 launcher/pom.xml   | 2 +-
 mllib-local/pom.xml| 2 +-
 mllib/pom.xml  | 2 +-
 pom.xml| 2 +-
 python/pyspark/version.py  | 2 +-
 repl/pom.xml   | 2 +-
 resource-managers/kubernetes/core/pom.xml  | 2 +-
 resource-managers/kubernetes/integration-tests/pom.xml | 2 +-
 resource-managers/mesos/pom.xml| 2 +-
 resource-managers/yarn/pom.xml | 2 +-
 sql/catalyst/pom.xml   | 2 +-
 sql/core/pom.xml   | 2 +-
 sql/hive-thriftserver/pom.xml  | 2 +-
 sql/hive/pom.xml   | 2 +-
 streaming/pom.xml  | 2 +-
 tools/pom.xml  | 2 +-
 43 files changed, 45 insertions(+), 45 deletions(-)

diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION
index 4a32762b34c..fa7028630a8 100644
--- a/R/pkg/DESCRIPTION
+++ b/R/pkg/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: SparkR
 Type: Package
-Version: 3.4.0
+Version: 3.4.1
 Title: R Front End for 'Apache Spark'
 Description: Provides an R Front end for 'Apache Spark' 
<https://spark.apache.org>.
 Authors@R:
diff --git a/assembly/pom.xml b/assembly/pom.xml
index c58da7aa112..b86fee4bceb 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.0
+3.4.1-SNAPSHOT
 ../pom.xml
   
 
diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml
index 95ea15552da..f9ecfb3d692 100644
--- a/common/kvstore/pom.xml
+++ b/common/kvstore/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.0
+3.4.1-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml
index e4d98471bf9..22ee65b7d25 100644
--- a/common/network-common/pom.xml
+++ b/common/network-common/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.0
+3.4.1-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml
index 7a6d5aedf65..2c67da81ca4 100644
--- a/common/network-shuffle/pom.xml
+++ b/common/network-shuffle/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.0
+3.4.1-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml
index 1c421754083..219682e047d 100644
--- a/common/network-yarn/pom.xml
+++ b/common/network-yarn/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.0
+3.4.1-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/sketch/pom.xml b/common/sketch

[spark] branch branch-3.4 updated (90376424779 -> 1d974a7a78f)

2023-04-06 Thread xinrong
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a change to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


from 90376424779 [SPARK-43041][SQL] Restore constructors of exceptions for 
compatibility in connector API
 add 28d0723beb3 Preparing Spark release v3.4.0-rc6
 new 1d974a7a78f Preparing development version 3.4.1-SNAPSHOT

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] 01/01: Preparing Spark release v3.4.0-rc6

2023-04-06 Thread xinrong
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a commit to tag v3.4.0-rc6
in repository https://gitbox.apache.org/repos/asf/spark.git

commit 28d0723beb3579c17df84bb22c98a487d7a72023
Author: Xinrong Meng 
AuthorDate: Thu Apr 6 16:38:28 2023 +

Preparing Spark release v3.4.0-rc6
---
 R/pkg/DESCRIPTION  | 2 +-
 assembly/pom.xml   | 2 +-
 common/kvstore/pom.xml | 2 +-
 common/network-common/pom.xml  | 2 +-
 common/network-shuffle/pom.xml | 2 +-
 common/network-yarn/pom.xml| 2 +-
 common/sketch/pom.xml  | 2 +-
 common/tags/pom.xml| 2 +-
 common/unsafe/pom.xml  | 2 +-
 connector/avro/pom.xml | 2 +-
 connector/connect/client/jvm/pom.xml   | 2 +-
 connector/connect/common/pom.xml   | 2 +-
 connector/connect/server/pom.xml   | 2 +-
 connector/docker-integration-tests/pom.xml | 2 +-
 connector/kafka-0-10-assembly/pom.xml  | 2 +-
 connector/kafka-0-10-sql/pom.xml   | 2 +-
 connector/kafka-0-10-token-provider/pom.xml| 2 +-
 connector/kafka-0-10/pom.xml   | 2 +-
 connector/kinesis-asl-assembly/pom.xml | 2 +-
 connector/kinesis-asl/pom.xml  | 2 +-
 connector/protobuf/pom.xml | 2 +-
 connector/spark-ganglia-lgpl/pom.xml   | 2 +-
 core/pom.xml   | 2 +-
 docs/_config.yml   | 6 +++---
 examples/pom.xml   | 2 +-
 graphx/pom.xml | 2 +-
 hadoop-cloud/pom.xml   | 2 +-
 launcher/pom.xml   | 2 +-
 mllib-local/pom.xml| 2 +-
 mllib/pom.xml  | 2 +-
 pom.xml| 2 +-
 python/pyspark/version.py  | 2 +-
 repl/pom.xml   | 2 +-
 resource-managers/kubernetes/core/pom.xml  | 2 +-
 resource-managers/kubernetes/integration-tests/pom.xml | 2 +-
 resource-managers/mesos/pom.xml| 2 +-
 resource-managers/yarn/pom.xml | 2 +-
 sql/catalyst/pom.xml   | 2 +-
 sql/core/pom.xml   | 2 +-
 sql/hive-thriftserver/pom.xml  | 2 +-
 sql/hive/pom.xml   | 2 +-
 streaming/pom.xml  | 2 +-
 tools/pom.xml  | 2 +-
 43 files changed, 45 insertions(+), 45 deletions(-)

diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION
index fa7028630a8..4a32762b34c 100644
--- a/R/pkg/DESCRIPTION
+++ b/R/pkg/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: SparkR
 Type: Package
-Version: 3.4.1
+Version: 3.4.0
 Title: R Front End for 'Apache Spark'
 Description: Provides an R Front end for 'Apache Spark' 
<https://spark.apache.org>.
 Authors@R:
diff --git a/assembly/pom.xml b/assembly/pom.xml
index b86fee4bceb..c58da7aa112 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.1-SNAPSHOT
+3.4.0
 ../pom.xml
   
 
diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml
index f9ecfb3d692..95ea15552da 100644
--- a/common/kvstore/pom.xml
+++ b/common/kvstore/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.1-SNAPSHOT
+3.4.0
 ../../pom.xml
   
 
diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml
index 22ee65b7d25..e4d98471bf9 100644
--- a/common/network-common/pom.xml
+++ b/common/network-common/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.1-SNAPSHOT
+3.4.0
 ../../pom.xml
   
 
diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml
index 2c67da81ca4..7a6d5aedf65 100644
--- a/common/network-shuffle/pom.xml
+++ b/common/network-shuffle/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.1-SNAPSHOT
+3.4.0
 ../../pom.xml
   
 
diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml
index 219682e047d..1c421754083 100644
--- a/common/network-yarn/pom.xml
+++ b/common/network-yarn/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.1-SNAPSHOT
+3.4.0
 ../../pom.xml
   
 
diff --git a/common/sketch/pom.xml b/common/sketch/pom.xml
index 22ce7

[spark] tag v3.4.0-rc6 created (now 28d0723beb3)

2023-04-06 Thread xinrong
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a change to tag v3.4.0-rc6
in repository https://gitbox.apache.org/repos/asf/spark.git


  at 28d0723beb3 (commit)
This tag includes the following new commits:

 new 28d0723beb3 Preparing Spark release v3.4.0-rc6

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.



-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



svn commit: r60926 - in /dev/spark/v3.4.0-rc5-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/R/articles/ _site/api/R/deps/ _site/api/R/deps/bootstrap-5.2.2/ _site/api/R/deps/jquery-3.6.0/ _site/api

2023-03-29 Thread xinrong
Author: xinrong
Date: Thu Mar 30 04:53:57 2023
New Revision: 60926

Log:
Apache Spark v3.4.0-rc5 docs


[This commit notification would consist of 2789 parts, 
which exceeds the limit of 50 ones, so it was shortened to the summary.]

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



svn commit: r60925 - /dev/spark/v3.4.0-rc5-bin/

2023-03-29 Thread xinrong
Author: xinrong
Date: Thu Mar 30 03:39:03 2023
New Revision: 60925

Log:
Apache Spark v3.4.0-rc5

Added:
dev/spark/v3.4.0-rc5-bin/
dev/spark/v3.4.0-rc5-bin/SparkR_3.4.0.tar.gz   (with props)
dev/spark/v3.4.0-rc5-bin/SparkR_3.4.0.tar.gz.asc
dev/spark/v3.4.0-rc5-bin/SparkR_3.4.0.tar.gz.sha512
dev/spark/v3.4.0-rc5-bin/pyspark-3.4.0.tar.gz   (with props)
dev/spark/v3.4.0-rc5-bin/pyspark-3.4.0.tar.gz.asc
dev/spark/v3.4.0-rc5-bin/pyspark-3.4.0.tar.gz.sha512
dev/spark/v3.4.0-rc5-bin/spark-3.4.0-bin-hadoop3-scala2.13.tgz   (with 
props)
dev/spark/v3.4.0-rc5-bin/spark-3.4.0-bin-hadoop3-scala2.13.tgz.asc
dev/spark/v3.4.0-rc5-bin/spark-3.4.0-bin-hadoop3-scala2.13.tgz.sha512
dev/spark/v3.4.0-rc5-bin/spark-3.4.0-bin-hadoop3.tgz   (with props)
dev/spark/v3.4.0-rc5-bin/spark-3.4.0-bin-hadoop3.tgz.asc
dev/spark/v3.4.0-rc5-bin/spark-3.4.0-bin-hadoop3.tgz.sha512
dev/spark/v3.4.0-rc5-bin/spark-3.4.0-bin-without-hadoop.tgz   (with props)
dev/spark/v3.4.0-rc5-bin/spark-3.4.0-bin-without-hadoop.tgz.asc
dev/spark/v3.4.0-rc5-bin/spark-3.4.0-bin-without-hadoop.tgz.sha512
dev/spark/v3.4.0-rc5-bin/spark-3.4.0.tgz   (with props)
dev/spark/v3.4.0-rc5-bin/spark-3.4.0.tgz.asc
dev/spark/v3.4.0-rc5-bin/spark-3.4.0.tgz.sha512

Added: dev/spark/v3.4.0-rc5-bin/SparkR_3.4.0.tar.gz
==
Binary file - no diff available.

Propchange: dev/spark/v3.4.0-rc5-bin/SparkR_3.4.0.tar.gz
--
svn:mime-type = application/octet-stream

Added: dev/spark/v3.4.0-rc5-bin/SparkR_3.4.0.tar.gz.asc
==
--- dev/spark/v3.4.0-rc5-bin/SparkR_3.4.0.tar.gz.asc (added)
+++ dev/spark/v3.4.0-rc5-bin/SparkR_3.4.0.tar.gz.asc Thu Mar 30 03:39:03 2023
@@ -0,0 +1,17 @@
+-BEGIN PGP SIGNATURE-
+
+iQJHBAABCgAxFiEEzGiz0W/jOnZnBRYLp+V5CMek4bEFAmQlA+wTHHhpbnJvbmdA
+YXBhY2hlLm9yZwAKCRCn5XkIx6Thsdo/D/9CLT5v+RVNTX0mmZq501F205cDUan+
+tiC/G2ddtGfSLcRAWeWqoDFWOkeupwEqtKMoqQGnElXM7qVF2miBfcohBxm3151l
+UBJD6paLgSrI2omxxqBNTB265BbojbmQcZx5UjHzO/opVahllET/7RXI6I8k/gsC
+hpoSJe77SHPXsLQpSFPaxct7Qy6IwwLq8yvVZIFlrYgjqvWBa3zsnqb4T6W859lb
+uiAAWJTJ0xQPF/u9TmXM8a9vFRfo3rXuttW8W7wKlHQjZgDJpNSJyQCaVmWYUssM
+2nzrfiwy7/E5wGzFsdxzO8lOlyeA6Cdmhwo8G5xcZnjNt9032DrAYFdo5rIoim9v
+irsqWyOJ5XclUOWpxKpXdYPcQGpEW74vUBymAW5P6jt0Yi2/3qvZSiwh1qceJ8Fo
+nut0HUWIFkohDoattkCjoA1yconcJd4+FuoDxrCX+QWAlchgR4eijMWfYCyH/7LX
+SucOJOK80psdGnZGuecuRjCzhvnbPjjNjS3dYMrudLlgxHyb2ahjeHXpVyDjI/O6
+AwUmJtUEGHk0Ypa8OHlgzB8UUaZRQDIiwL8j8tlIHYMt+VdQLUtvyK+hqe45It6F
+OAlocOnign7Ej/9EGyJfKXX0gZr6NmkuANWggPRIrIs1NSnqz4bDWQRGwVOkpb7x
+NOdLdMoi6QMC0A==
+=H+Kf
+-END PGP SIGNATURE-

Added: dev/spark/v3.4.0-rc5-bin/SparkR_3.4.0.tar.gz.sha512
==
--- dev/spark/v3.4.0-rc5-bin/SparkR_3.4.0.tar.gz.sha512 (added)
+++ dev/spark/v3.4.0-rc5-bin/SparkR_3.4.0.tar.gz.sha512 Thu Mar 30 03:39:03 2023
@@ -0,0 +1 @@
+c3086edefab6656535e234fd11d0a2a4d4c6ede97b85f94801d06064bd89c6f58196714e335e92ffd2ac83c82714ad8a9a51165621ecff194af290c1eb537ef2
  SparkR_3.4.0.tar.gz

Added: dev/spark/v3.4.0-rc5-bin/pyspark-3.4.0.tar.gz
==
Binary file - no diff available.

Propchange: dev/spark/v3.4.0-rc5-bin/pyspark-3.4.0.tar.gz
--
svn:mime-type = application/octet-stream

Added: dev/spark/v3.4.0-rc5-bin/pyspark-3.4.0.tar.gz.asc
==
--- dev/spark/v3.4.0-rc5-bin/pyspark-3.4.0.tar.gz.asc (added)
+++ dev/spark/v3.4.0-rc5-bin/pyspark-3.4.0.tar.gz.asc Thu Mar 30 03:39:03 2023
@@ -0,0 +1,17 @@
+-BEGIN PGP SIGNATURE-
+
+iQJHBAABCgAxFiEEzGiz0W/jOnZnBRYLp+V5CMek4bEFAmQlA+4THHhpbnJvbmdA
+YXBhY2hlLm9yZwAKCRCn5XkIx6Thsb0pEACXvrvU/8Xh7ns7J8RtV/Wmf4oMu9Mk
+i6G8JwBUTS1kqRe9Xb1g3GJxNil8HTta1yNKgjvkTDc6EXIYrtQD4PpL6cuumckW
+0+itx9dih22OcvfN6sJNizAtRoTcpXx7UHq00dAjzHHbOv0dwGqnjKRU3UUQ/XnY
+RjT3kM4isf95TzAmEFwsXNSzkUY0+EzDgfhnDAwb60nzTzZ2bEiZnLP1JC2iScDI
+jSXMoWtZTaJz51bssKzzXpVmrwBxLDgSPlDM5KVmeD+WQMqS7Hk51bSikSEW1X39
+CO7hEXw+SYLQB5yKaqu03diErTOWmP6aJ8tbHCPWNrs3JMJkm4/Cj6Sc2JOktixO
+Ns8Pc82kpnvG0eWCMXwihZa7pxnq59ByZsxYAfmcIdf4q02VJNetFjplgXAs2jjy
+n9UZ6l8ZrCjUW2/AB3TSSibXLXMvuI6PLSYnKY9IP0t0dqxnBIKkACTx8qBA/o+I
+0n02LBJCD8ZPJvHpI2MGlaFGftbQx4LUXX4CFlAz+RI9iizCbpjrDYFzvXBEY7ri
+46i5uL+sHkP6Uj/8fNJ3QRhggb19i0NajzofSs5vNsVk2qHjHokIjG/kOkpCfBzC
+6rM5zd/OyQNZmbHThlOjAdEvTSgasXb/5uHpwWDHbTlPGJYMZOWzuBdDSfBlHW/t
+56VKCDfYO11shA==
+=a3bs
+-END PGP SIGNATURE-

Added: dev/spark/v3.4.0-rc5-bin/pyspark-3.4.0.tar.gz.sha512
==
--- dev/spark/v3.4.0-rc5-bin/pyspark-3.4.0.tar.gz.sha512 (added)
+++ dev/spark

[spark] 01/01: Preparing development version 3.4.1-SNAPSHOT

2023-03-29 Thread xinrong
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git

commit 6a6f50444d43af24773ecc158aa127027f088288
Author: Xinrong Meng 
AuthorDate: Thu Mar 30 02:18:32 2023 +

Preparing development version 3.4.1-SNAPSHOT
---
 R/pkg/DESCRIPTION  | 2 +-
 assembly/pom.xml   | 2 +-
 common/kvstore/pom.xml | 2 +-
 common/network-common/pom.xml  | 2 +-
 common/network-shuffle/pom.xml | 2 +-
 common/network-yarn/pom.xml| 2 +-
 common/sketch/pom.xml  | 2 +-
 common/tags/pom.xml| 2 +-
 common/unsafe/pom.xml  | 2 +-
 connector/avro/pom.xml | 2 +-
 connector/connect/client/jvm/pom.xml   | 2 +-
 connector/connect/common/pom.xml   | 2 +-
 connector/connect/server/pom.xml   | 2 +-
 connector/docker-integration-tests/pom.xml | 2 +-
 connector/kafka-0-10-assembly/pom.xml  | 2 +-
 connector/kafka-0-10-sql/pom.xml   | 2 +-
 connector/kafka-0-10-token-provider/pom.xml| 2 +-
 connector/kafka-0-10/pom.xml   | 2 +-
 connector/kinesis-asl-assembly/pom.xml | 2 +-
 connector/kinesis-asl/pom.xml  | 2 +-
 connector/protobuf/pom.xml | 2 +-
 connector/spark-ganglia-lgpl/pom.xml   | 2 +-
 core/pom.xml   | 2 +-
 docs/_config.yml   | 6 +++---
 examples/pom.xml   | 2 +-
 graphx/pom.xml | 2 +-
 hadoop-cloud/pom.xml   | 2 +-
 launcher/pom.xml   | 2 +-
 mllib-local/pom.xml| 2 +-
 mllib/pom.xml  | 2 +-
 pom.xml| 2 +-
 python/pyspark/version.py  | 2 +-
 repl/pom.xml   | 2 +-
 resource-managers/kubernetes/core/pom.xml  | 2 +-
 resource-managers/kubernetes/integration-tests/pom.xml | 2 +-
 resource-managers/mesos/pom.xml| 2 +-
 resource-managers/yarn/pom.xml | 2 +-
 sql/catalyst/pom.xml   | 2 +-
 sql/core/pom.xml   | 2 +-
 sql/hive-thriftserver/pom.xml  | 2 +-
 sql/hive/pom.xml   | 2 +-
 streaming/pom.xml  | 2 +-
 tools/pom.xml  | 2 +-
 43 files changed, 45 insertions(+), 45 deletions(-)

diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION
index 4a32762b34c..fa7028630a8 100644
--- a/R/pkg/DESCRIPTION
+++ b/R/pkg/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: SparkR
 Type: Package
-Version: 3.4.0
+Version: 3.4.1
 Title: R Front End for 'Apache Spark'
 Description: Provides an R Front end for 'Apache Spark' 
<https://spark.apache.org>.
 Authors@R:
diff --git a/assembly/pom.xml b/assembly/pom.xml
index c58da7aa112..b86fee4bceb 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.0
+3.4.1-SNAPSHOT
 ../pom.xml
   
 
diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml
index 95ea15552da..f9ecfb3d692 100644
--- a/common/kvstore/pom.xml
+++ b/common/kvstore/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.0
+3.4.1-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml
index e4d98471bf9..22ee65b7d25 100644
--- a/common/network-common/pom.xml
+++ b/common/network-common/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.0
+3.4.1-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml
index 7a6d5aedf65..2c67da81ca4 100644
--- a/common/network-shuffle/pom.xml
+++ b/common/network-shuffle/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.0
+3.4.1-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml
index 1c421754083..219682e047d 100644
--- a/common/network-yarn/pom.xml
+++ b/common/network-yarn/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.0
+3.4.1-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/sketch/pom.xml b/common/sketch

[spark] branch branch-3.4 updated (ce36692eeee -> 6a6f50444d4)

2023-03-29 Thread xinrong
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a change to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


from ce36692 [SPARK-42631][CONNECT][FOLLOW-UP] Expose Column.expr to 
extensions
 add f39ad617d32 Preparing Spark release v3.4.0-rc5
 new 6a6f50444d4 Preparing development version 3.4.1-SNAPSHOT

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] tag v3.4.0-rc5 created (now f39ad617d32)

2023-03-29 Thread xinrong
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a change to tag v3.4.0-rc5
in repository https://gitbox.apache.org/repos/asf/spark.git


  at f39ad617d32 (commit)
This tag includes the following new commits:

 new f39ad617d32 Preparing Spark release v3.4.0-rc5

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.



-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] 01/01: Preparing Spark release v3.4.0-rc5

2023-03-29 Thread xinrong
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a commit to tag v3.4.0-rc5
in repository https://gitbox.apache.org/repos/asf/spark.git

commit f39ad617d32a671e120464e4a75986241d72c487
Author: Xinrong Meng 
AuthorDate: Thu Mar 30 02:18:27 2023 +

Preparing Spark release v3.4.0-rc5
---
 R/pkg/DESCRIPTION  | 2 +-
 assembly/pom.xml   | 2 +-
 common/kvstore/pom.xml | 2 +-
 common/network-common/pom.xml  | 2 +-
 common/network-shuffle/pom.xml | 2 +-
 common/network-yarn/pom.xml| 2 +-
 common/sketch/pom.xml  | 2 +-
 common/tags/pom.xml| 2 +-
 common/unsafe/pom.xml  | 2 +-
 connector/avro/pom.xml | 2 +-
 connector/connect/client/jvm/pom.xml   | 2 +-
 connector/connect/common/pom.xml   | 2 +-
 connector/connect/server/pom.xml   | 2 +-
 connector/docker-integration-tests/pom.xml | 2 +-
 connector/kafka-0-10-assembly/pom.xml  | 2 +-
 connector/kafka-0-10-sql/pom.xml   | 2 +-
 connector/kafka-0-10-token-provider/pom.xml| 2 +-
 connector/kafka-0-10/pom.xml   | 2 +-
 connector/kinesis-asl-assembly/pom.xml | 2 +-
 connector/kinesis-asl/pom.xml  | 2 +-
 connector/protobuf/pom.xml | 2 +-
 connector/spark-ganglia-lgpl/pom.xml   | 2 +-
 core/pom.xml   | 2 +-
 docs/_config.yml   | 6 +++---
 examples/pom.xml   | 2 +-
 graphx/pom.xml | 2 +-
 hadoop-cloud/pom.xml   | 2 +-
 launcher/pom.xml   | 2 +-
 mllib-local/pom.xml| 2 +-
 mllib/pom.xml  | 2 +-
 pom.xml| 2 +-
 python/pyspark/version.py  | 2 +-
 repl/pom.xml   | 2 +-
 resource-managers/kubernetes/core/pom.xml  | 2 +-
 resource-managers/kubernetes/integration-tests/pom.xml | 2 +-
 resource-managers/mesos/pom.xml| 2 +-
 resource-managers/yarn/pom.xml | 2 +-
 sql/catalyst/pom.xml   | 2 +-
 sql/core/pom.xml   | 2 +-
 sql/hive-thriftserver/pom.xml  | 2 +-
 sql/hive/pom.xml   | 2 +-
 streaming/pom.xml  | 2 +-
 tools/pom.xml  | 2 +-
 43 files changed, 45 insertions(+), 45 deletions(-)

diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION
index fa7028630a8..4a32762b34c 100644
--- a/R/pkg/DESCRIPTION
+++ b/R/pkg/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: SparkR
 Type: Package
-Version: 3.4.1
+Version: 3.4.0
 Title: R Front End for 'Apache Spark'
 Description: Provides an R Front end for 'Apache Spark' 
<https://spark.apache.org>.
 Authors@R:
diff --git a/assembly/pom.xml b/assembly/pom.xml
index b86fee4bceb..c58da7aa112 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.1-SNAPSHOT
+3.4.0
 ../pom.xml
   
 
diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml
index f9ecfb3d692..95ea15552da 100644
--- a/common/kvstore/pom.xml
+++ b/common/kvstore/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.1-SNAPSHOT
+3.4.0
 ../../pom.xml
   
 
diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml
index 22ee65b7d25..e4d98471bf9 100644
--- a/common/network-common/pom.xml
+++ b/common/network-common/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.1-SNAPSHOT
+3.4.0
 ../../pom.xml
   
 
diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml
index 2c67da81ca4..7a6d5aedf65 100644
--- a/common/network-shuffle/pom.xml
+++ b/common/network-shuffle/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.1-SNAPSHOT
+3.4.0
 ../../pom.xml
   
 
diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml
index 219682e047d..1c421754083 100644
--- a/common/network-yarn/pom.xml
+++ b/common/network-yarn/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.1-SNAPSHOT
+3.4.0
 ../../pom.xml
   
 
diff --git a/common/sketch/pom.xml b/common/sketch/pom.xml
inde

[spark] branch branch-3.4 updated (b74f7922577 -> 3122d4f4c76)

2023-03-24 Thread xinrong
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a change to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


from b74f7922577 [SPARK-42861][SQL] Use private[sql] instead of 
protected[sql] to avoid generating API doc
 add 3122d4f4c76 [SPARK-42891][CONNECT][PYTHON][3.4] Implement CoGrouped 
Map API

No new revisions were added by this update.

Summary of changes:
 .../main/protobuf/spark/connect/relations.proto|  18 ++
 .../sql/connect/planner/SparkConnectPlanner.scala  |  22 ++
 dev/sparktestsupport/modules.py|   1 +
 python/pyspark/sql/connect/_typing.py  |   2 +
 python/pyspark/sql/connect/group.py|  49 +++-
 python/pyspark/sql/connect/plan.py |  40 
 python/pyspark/sql/connect/proto/relations_pb2.py  | 250 +++--
 python/pyspark/sql/connect/proto/relations_pb2.pyi |  80 +++
 python/pyspark/sql/pandas/group_ops.py |   9 +
 .../sql/tests/connect/test_connect_basic.py|   5 +-
 ..._map.py => test_parity_pandas_cogrouped_map.py} |  54 ++---
 .../sql/tests/pandas/test_pandas_cogrouped_map.py  |   6 +-
 12 files changed, 374 insertions(+), 162 deletions(-)
 copy python/pyspark/sql/tests/connect/{test_parity_pandas_grouped_map.py => 
test_parity_pandas_cogrouped_map.py} (61%)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.4 updated: [SPARK-42864][ML][3.4] Make `IsotonicRegression.PointsAccumulator` private

2023-03-20 Thread xinrong
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.4 by this push:
 new 5222cfd58a7 [SPARK-42864][ML][3.4] Make 
`IsotonicRegression.PointsAccumulator` private
5222cfd58a7 is described below

commit 5222cfd58a717fec7a025fdf4dfcde0bb4daf80c
Author: Ruifeng Zheng 
AuthorDate: Tue Mar 21 12:55:44 2023 +0800

[SPARK-42864][ML][3.4] Make `IsotonicRegression.PointsAccumulator` private

### What changes were proposed in this pull request?
Make `IsotonicRegression.PointsAccumulator` private; it was introduced in 
https://github.com/apache/spark/commit/3d05c7e037eff79de8ef9f6231aca8340bcc65ef

### Why are the changes needed?
`PointsAccumulator` is an implementation detail and should not be exposed.

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
existing UT

Closes #40500 from zhengruifeng/isotonicRegression_private.

Authored-by: Ruifeng Zheng 
Signed-off-by: Xinrong Meng 
---
 .../org/apache/spark/mllib/regression/IsotonicRegression.scala  | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git 
a/mllib/src/main/scala/org/apache/spark/mllib/regression/IsotonicRegression.scala
 
b/mllib/src/main/scala/org/apache/spark/mllib/regression/IsotonicRegression.scala
index fbf0dc9c357..12a78ef4ec1 100644
--- 
a/mllib/src/main/scala/org/apache/spark/mllib/regression/IsotonicRegression.scala
+++ 
b/mllib/src/main/scala/org/apache/spark/mllib/regression/IsotonicRegression.scala
@@ -331,7 +331,7 @@ class IsotonicRegression private (private var isotonic: 
Boolean) extends Seriali
 if (cleanInput.length <= 1) {
   cleanInput
 } else {
-  val pointsAccumulator = new IsotonicRegression.PointsAccumulator
+  val pointsAccumulator = new PointsAccumulator
 
   // Go through input points, merging all points with equal feature values 
into a single point.
   // Equality of features is defined by shouldAccumulate method. The label 
of the accumulated
@@ -490,15 +490,13 @@ class IsotonicRegression private (private var isotonic: 
Boolean) extends Seriali
   .sortBy(_._2)
 poolAdjacentViolators(parallelStepResult)
   }
-}
 
-object IsotonicRegression {
   /**
* Utility class, holds a buffer of all points with unique features so far, 
and performs
* weighted sum accumulation of points. Hides these details for better 
readability of the
* main algorithm.
*/
-  class PointsAccumulator {
+  private class PointsAccumulator {
 private val output = ArrayBuffer[(Double, Double, Double)]()
private var (currentLabel: Double, currentFeature: Double, currentWeight: Double) =
   (0d, 0d, 0d)
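
As an aside for readers: the merge rule the comments above describe — collapsing consecutive points that share a feature value into a single weighted point — can be sketched in a few lines of Python. This is an illustrative reimplementation, assuming merged points carry the weighted-mean label and the summed weight; it is not the Scala code being patched.

```python
from itertools import groupby

def merge_equal_features(points):
    """Collapse consecutive (label, feature, weight) points sharing a feature
    value into one point with the weighted-mean label and summed weight.
    Assumes input is already sorted by feature, as in the PAV preprocessing."""
    merged = []
    for feature, group in groupby(points, key=lambda p: p[1]):
        group = list(group)
        weight_sum = sum(w for _, _, w in group)
        label_sum = sum(lbl * w for lbl, _, w in group)
        merged.append((label_sum / weight_sum, feature, weight_sum))
    return merged

# Two points at feature 1.0 collapse into one point of weight 2.0:
print(merge_equal_features([(1.0, 1.0, 1.0), (3.0, 1.0, 1.0), (2.0, 2.0, 1.0)]))
# -> [(2.0, 1.0, 2.0), (2.0, 2.0, 1.0)]
```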





svn commit: r60509 - in /dev/spark/v3.4.0-rc4-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/R/articles/ _site/api/R/deps/ _site/api/R/deps/bootstrap-5.2.2/ _site/api/R/deps/jquery-3.6.0/ _site/api

2023-03-09 Thread xinrong
Author: xinrong
Date: Fri Mar 10 06:12:05 2023
New Revision: 60509

Log:
Apache Spark v3.4.0-rc4 docs


[This commit notification would consist of 2807 parts, 
which exceeds the limit of 50 ones, so it was shortened to the summary.]




svn commit: r60507 - /dev/spark/v3.4.0-rc4-bin/

2023-03-09 Thread xinrong
Author: xinrong
Date: Fri Mar 10 04:47:20 2023
New Revision: 60507

Log:
Apache Spark v3.4.0-rc4

Added:
dev/spark/v3.4.0-rc4-bin/
dev/spark/v3.4.0-rc4-bin/SparkR_3.4.0.tar.gz   (with props)
dev/spark/v3.4.0-rc4-bin/SparkR_3.4.0.tar.gz.asc
dev/spark/v3.4.0-rc4-bin/SparkR_3.4.0.tar.gz.sha512
dev/spark/v3.4.0-rc4-bin/pyspark-3.4.0.tar.gz   (with props)
dev/spark/v3.4.0-rc4-bin/pyspark-3.4.0.tar.gz.asc
dev/spark/v3.4.0-rc4-bin/pyspark-3.4.0.tar.gz.sha512
dev/spark/v3.4.0-rc4-bin/spark-3.4.0-bin-hadoop3-scala2.13.tgz   (with props)
dev/spark/v3.4.0-rc4-bin/spark-3.4.0-bin-hadoop3-scala2.13.tgz.asc
dev/spark/v3.4.0-rc4-bin/spark-3.4.0-bin-hadoop3-scala2.13.tgz.sha512
dev/spark/v3.4.0-rc4-bin/spark-3.4.0-bin-hadoop3.tgz   (with props)
dev/spark/v3.4.0-rc4-bin/spark-3.4.0-bin-hadoop3.tgz.asc
dev/spark/v3.4.0-rc4-bin/spark-3.4.0-bin-hadoop3.tgz.sha512
dev/spark/v3.4.0-rc4-bin/spark-3.4.0-bin-without-hadoop.tgz   (with props)
dev/spark/v3.4.0-rc4-bin/spark-3.4.0-bin-without-hadoop.tgz.asc
dev/spark/v3.4.0-rc4-bin/spark-3.4.0-bin-without-hadoop.tgz.sha512
dev/spark/v3.4.0-rc4-bin/spark-3.4.0.tgz   (with props)
dev/spark/v3.4.0-rc4-bin/spark-3.4.0.tgz.asc
dev/spark/v3.4.0-rc4-bin/spark-3.4.0.tgz.sha512

Added: dev/spark/v3.4.0-rc4-bin/SparkR_3.4.0.tar.gz
==
Binary file - no diff available.

Propchange: dev/spark/v3.4.0-rc4-bin/SparkR_3.4.0.tar.gz
--
svn:mime-type = application/octet-stream

Added: dev/spark/v3.4.0-rc4-bin/SparkR_3.4.0.tar.gz.asc
==
--- dev/spark/v3.4.0-rc4-bin/SparkR_3.4.0.tar.gz.asc (added)
+++ dev/spark/v3.4.0-rc4-bin/SparkR_3.4.0.tar.gz.asc Fri Mar 10 04:47:20 2023
@@ -0,0 +1,17 @@
+-BEGIN PGP SIGNATURE-
+
+iQJHBAABCgAxFiEEzGiz0W/jOnZnBRYLp+V5CMek4bEFAmQKtesTHHhpbnJvbmdA
+YXBhY2hlLm9yZwAKCRCn5XkIx6Thsed/D/9ECWrN2Ra7rPZt1lvSh9H/DON0HzZ0
+UXLPKZpCXkdFM7TXMksVF0qE/iqPwfgfxv9uY0Ura71+to/+6L1l9U+svKwNl7ze
+0vby8tZMLwiqpVlIihLObrLXLSfUF9hBOo1Xuh60DZjiNaACZ/5Pi0vIhIQiiLJb
+TOG5bFejim9/8pbK9l54M2eP9e1fxYDLAwZCGCvtzN0Ddf1hhZQomG4QJeCJV9YZ
+/rSF6cmyale+0U/UIE/ci9Jj7gzzxAxa5CBFVYyjsNLRksM9LzbYGck2VuC6UZT4
+TdcF1Ia834BnSCOEgesyPrM7FD6ljNr7ks7UMI3PG4yVtAdeNzDCyZhX6OXU+zCY
+olbqHl1RzAgrvA+rUoQH6vRaKVKTFQTSkohrQSg3tmSqPYfxNxac75K7I3F9A5qM
+DXHkXrSAdCOV+T88yw75zjr2xLiLLGIuBrYc/5lk3JxS9Rw6aDrfxLgZMpfdnsuL
+PxAMai2xnZhvQrAAIPUKRN+TR72fpVFIAJB9nEReDF6m9cmhdhQt+xKR6xCDs9fb
+Cx+G8ZBPvJeheGFiKmjeAT4zh+C3B7BxhlvvCP5Q6GOtWv+8CBardAVV2OSP2T/t
+SxFEjBZwqNrwtBFY0txYnDTGnv6vK3dG86FnaE6R57p2W5vAKrmSmp3ZL+YhUKe7
+HGk4OdoEG93bww==
+=FJdV
+-END PGP SIGNATURE-

Added: dev/spark/v3.4.0-rc4-bin/SparkR_3.4.0.tar.gz.sha512
==
--- dev/spark/v3.4.0-rc4-bin/SparkR_3.4.0.tar.gz.sha512 (added)
+++ dev/spark/v3.4.0-rc4-bin/SparkR_3.4.0.tar.gz.sha512 Fri Mar 10 04:47:20 2023
@@ -0,0 +1 @@
+6e7bc60a43243e9026e92af81386fc9d57a6231c1a59a6fb4e39cf16cd150a3b5e1e9237b377d3e5a74d16f804f9a5d13a897e8455f640eacec1c79ef3d10407  SparkR_3.4.0.tar.gz

Added: dev/spark/v3.4.0-rc4-bin/pyspark-3.4.0.tar.gz
==
Binary file - no diff available.

Propchange: dev/spark/v3.4.0-rc4-bin/pyspark-3.4.0.tar.gz
--
svn:mime-type = application/octet-stream

Added: dev/spark/v3.4.0-rc4-bin/pyspark-3.4.0.tar.gz.asc
==
--- dev/spark/v3.4.0-rc4-bin/pyspark-3.4.0.tar.gz.asc (added)
+++ dev/spark/v3.4.0-rc4-bin/pyspark-3.4.0.tar.gz.asc Fri Mar 10 04:47:20 2023
@@ -0,0 +1,17 @@
+-BEGIN PGP SIGNATURE-
+
+iQJHBAABCgAxFiEEzGiz0W/jOnZnBRYLp+V5CMek4bEFAmQKte0THHhpbnJvbmdA
+YXBhY2hlLm9yZwAKCRCn5XkIx6ThsWYaD/9zcUOMr+07jy7Ok2xKkEq4hlsSH1xu
+4Y61P1CTFhtOc9DG/O2jfX8Tsnp/b6gY3nJHGhrtdY0LCMPiMG+5uHO3/wO53pE0
+6DEtZH1I38rbILpb9kDCftCQS6keZR79Zl8N0G5D+P56grNdI4aqDo1Ntxvs366r
+0rAWGIpVbvr5w5MBqvyn96Sk2ac/SbZVeE5NHCVwPWCQz6povLTDDESWETQIW5TZ
+VTQsErI4joWplWWlI8D8x8XABVaD0BaKFwuJpPploKVkhSyOECUDM5W0xhuGNArn
+h5GofcXXvCBKqoI3ngXg72G6fVamDJ0b/DCsmpLflwEaInhlDYj9BVbTUAgvYHwa
+eDgLEbvZ4at/5OVf+A/VxnXLfL1DJLiGgfk7J4QqNMTdqfCtyEs4yxQ4t6OZ93mN
+g6VcNYzayKEZffmC29QDtce5wpl530C543cSW7QFMgIg0ly0pfDF1J63hsQ86TZV
+D/Nu41KiQXFq4CMD08mxu1gSTllTIED+5VUcbJpmep2Pa28tIvleVCxXQBXpx5Bw
+pz3AJIU/Og4y8xZfspeUON9qvSHAwLGO6T9QAslaciJA/mK2vNzHLgaTSZtXRSzv
+MIsmpfEHoE8HsgUk/YLCheSNTZRkKgCWySMBnNaY0HFF86R/HvA+rL97CoFTKX9C
+Gpsg/vHReYkRFw==
+=4f38
+-END PGP SIGNATURE-

Added: dev/spark/v3.4.0-rc4-bin/pyspark-3.4.0.tar.gz.sha512
==
--- dev/spark/v3.4.0-rc4-bin/pyspark-3.4.0.tar.gz.sha512 (added)
+++ dev/spark

[spark] branch branch-3.4 updated (49cf58e30c7 -> bc1671023c3)

2023-03-09 Thread xinrong
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a change to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


from 49cf58e30c7 [SPARK-42739][BUILD] Ensure release tag to be pushed to 
release branch
 add 4000d6884ce Preparing Spark release v3.4.0-rc4
 new bc1671023c3 Preparing development version 3.4.1-SNAPSHOT

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:





[spark] 01/01: Preparing development version 3.4.1-SNAPSHOT

2023-03-09 Thread xinrong
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git

commit bc1671023c3360380bbb67ae8fec959efb072996
Author: Xinrong Meng 
AuthorDate: Fri Mar 10 03:26:54 2023 +

Preparing development version 3.4.1-SNAPSHOT
---
 R/pkg/DESCRIPTION  | 2 +-
 assembly/pom.xml   | 2 +-
 common/kvstore/pom.xml | 2 +-
 common/network-common/pom.xml  | 2 +-
 common/network-shuffle/pom.xml | 2 +-
 common/network-yarn/pom.xml| 2 +-
 common/sketch/pom.xml  | 2 +-
 common/tags/pom.xml| 2 +-
 common/unsafe/pom.xml  | 2 +-
 connector/avro/pom.xml | 2 +-
 connector/connect/client/jvm/pom.xml   | 2 +-
 connector/connect/common/pom.xml   | 2 +-
 connector/connect/server/pom.xml   | 2 +-
 connector/docker-integration-tests/pom.xml | 2 +-
 connector/kafka-0-10-assembly/pom.xml  | 2 +-
 connector/kafka-0-10-sql/pom.xml   | 2 +-
 connector/kafka-0-10-token-provider/pom.xml| 2 +-
 connector/kafka-0-10/pom.xml   | 2 +-
 connector/kinesis-asl-assembly/pom.xml | 2 +-
 connector/kinesis-asl/pom.xml  | 2 +-
 connector/protobuf/pom.xml | 2 +-
 connector/spark-ganglia-lgpl/pom.xml   | 2 +-
 core/pom.xml   | 2 +-
 docs/_config.yml   | 6 +++---
 examples/pom.xml   | 2 +-
 graphx/pom.xml | 2 +-
 hadoop-cloud/pom.xml   | 2 +-
 launcher/pom.xml   | 2 +-
 mllib-local/pom.xml| 2 +-
 mllib/pom.xml  | 2 +-
 pom.xml| 2 +-
 python/pyspark/version.py  | 2 +-
 repl/pom.xml   | 2 +-
 resource-managers/kubernetes/core/pom.xml  | 2 +-
 resource-managers/kubernetes/integration-tests/pom.xml | 2 +-
 resource-managers/mesos/pom.xml| 2 +-
 resource-managers/yarn/pom.xml | 2 +-
 sql/catalyst/pom.xml   | 2 +-
 sql/core/pom.xml   | 2 +-
 sql/hive-thriftserver/pom.xml  | 2 +-
 sql/hive/pom.xml   | 2 +-
 streaming/pom.xml  | 2 +-
 tools/pom.xml  | 2 +-
 43 files changed, 45 insertions(+), 45 deletions(-)

diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION
index 4a32762b34c..fa7028630a8 100644
--- a/R/pkg/DESCRIPTION
+++ b/R/pkg/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: SparkR
 Type: Package
-Version: 3.4.0
+Version: 3.4.1
 Title: R Front End for 'Apache Spark'
 Description: Provides an R Front end for 'Apache Spark' 
<https://spark.apache.org>.
 Authors@R:
diff --git a/assembly/pom.xml b/assembly/pom.xml
index c58da7aa112..b86fee4bceb 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.0
+3.4.1-SNAPSHOT
 ../pom.xml
   
 
diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml
index 95ea15552da..f9ecfb3d692 100644
--- a/common/kvstore/pom.xml
+++ b/common/kvstore/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.0
+3.4.1-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml
index e4d98471bf9..22ee65b7d25 100644
--- a/common/network-common/pom.xml
+++ b/common/network-common/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.0
+3.4.1-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml
index 7a6d5aedf65..2c67da81ca4 100644
--- a/common/network-shuffle/pom.xml
+++ b/common/network-shuffle/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.0
+3.4.1-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml
index 1c421754083..219682e047d 100644
--- a/common/network-yarn/pom.xml
+++ b/common/network-yarn/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.0
+3.4.1-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/sketch/pom.xml b/common/sketch

[spark] 01/01: Preparing Spark release v3.4.0-rc4

2023-03-09 Thread xinrong
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a commit to tag v3.4.0-rc4
in repository https://gitbox.apache.org/repos/asf/spark.git

commit 4000d6884ce973eb420e871c8d333431490be763
Author: Xinrong Meng 
AuthorDate: Fri Mar 10 03:26:48 2023 +

Preparing Spark release v3.4.0-rc4
---
 R/pkg/DESCRIPTION  | 2 +-
 assembly/pom.xml   | 2 +-
 common/kvstore/pom.xml | 2 +-
 common/network-common/pom.xml  | 2 +-
 common/network-shuffle/pom.xml | 2 +-
 common/network-yarn/pom.xml| 2 +-
 common/sketch/pom.xml  | 2 +-
 common/tags/pom.xml| 2 +-
 common/unsafe/pom.xml  | 2 +-
 connector/avro/pom.xml | 2 +-
 connector/connect/client/jvm/pom.xml   | 2 +-
 connector/connect/common/pom.xml   | 2 +-
 connector/connect/server/pom.xml   | 2 +-
 connector/docker-integration-tests/pom.xml | 2 +-
 connector/kafka-0-10-assembly/pom.xml  | 2 +-
 connector/kafka-0-10-sql/pom.xml   | 2 +-
 connector/kafka-0-10-token-provider/pom.xml| 2 +-
 connector/kafka-0-10/pom.xml   | 2 +-
 connector/kinesis-asl-assembly/pom.xml | 2 +-
 connector/kinesis-asl/pom.xml  | 2 +-
 connector/protobuf/pom.xml | 2 +-
 connector/spark-ganglia-lgpl/pom.xml   | 2 +-
 core/pom.xml   | 2 +-
 docs/_config.yml   | 6 +++---
 examples/pom.xml   | 2 +-
 graphx/pom.xml | 2 +-
 hadoop-cloud/pom.xml   | 2 +-
 launcher/pom.xml   | 2 +-
 mllib-local/pom.xml| 2 +-
 mllib/pom.xml  | 2 +-
 pom.xml| 2 +-
 python/pyspark/version.py  | 2 +-
 repl/pom.xml   | 2 +-
 resource-managers/kubernetes/core/pom.xml  | 2 +-
 resource-managers/kubernetes/integration-tests/pom.xml | 2 +-
 resource-managers/mesos/pom.xml| 2 +-
 resource-managers/yarn/pom.xml | 2 +-
 sql/catalyst/pom.xml   | 2 +-
 sql/core/pom.xml   | 2 +-
 sql/hive-thriftserver/pom.xml  | 2 +-
 sql/hive/pom.xml   | 2 +-
 streaming/pom.xml  | 2 +-
 tools/pom.xml  | 2 +-
 43 files changed, 45 insertions(+), 45 deletions(-)

diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION
index fa7028630a8..4a32762b34c 100644
--- a/R/pkg/DESCRIPTION
+++ b/R/pkg/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: SparkR
 Type: Package
-Version: 3.4.1
+Version: 3.4.0
 Title: R Front End for 'Apache Spark'
 Description: Provides an R Front end for 'Apache Spark' 
<https://spark.apache.org>.
 Authors@R:
diff --git a/assembly/pom.xml b/assembly/pom.xml
index b86fee4bceb..c58da7aa112 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.1-SNAPSHOT
+3.4.0
 ../pom.xml
   
 
diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml
index f9ecfb3d692..95ea15552da 100644
--- a/common/kvstore/pom.xml
+++ b/common/kvstore/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.1-SNAPSHOT
+3.4.0
 ../../pom.xml
   
 
diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml
index 22ee65b7d25..e4d98471bf9 100644
--- a/common/network-common/pom.xml
+++ b/common/network-common/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.1-SNAPSHOT
+3.4.0
 ../../pom.xml
   
 
diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml
index 2c67da81ca4..7a6d5aedf65 100644
--- a/common/network-shuffle/pom.xml
+++ b/common/network-shuffle/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.1-SNAPSHOT
+3.4.0
 ../../pom.xml
   
 
diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml
index 219682e047d..1c421754083 100644
--- a/common/network-yarn/pom.xml
+++ b/common/network-yarn/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.1-SNAPSHOT
+3.4.0
 ../../pom.xml
   
 
diff --git a/common/sketch/pom.xml b/common/sketch/pom.xml
inde

[spark] tag v3.4.0-rc4 created (now 4000d6884ce)

2023-03-09 Thread xinrong
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a change to tag v3.4.0-rc4
in repository https://gitbox.apache.org/repos/asf/spark.git


  at 4000d6884ce (commit)
This tag includes the following new commits:

 new 4000d6884ce Preparing Spark release v3.4.0-rc4

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.






[spark] branch branch-3.4 updated: [SPARK-42739][BUILD] Ensure release tag to be pushed to release branch

2023-03-09 Thread xinrong
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.4 by this push:
 new 49cf58e30c7 [SPARK-42739][BUILD] Ensure release tag to be pushed to 
release branch
49cf58e30c7 is described below

commit 49cf58e30c79734af4a30787a0220aeba69839c5
Author: Xinrong Meng 
AuthorDate: Fri Mar 10 11:04:34 2023 +0800

[SPARK-42739][BUILD] Ensure release tag to be pushed to release branch

### What changes were proposed in this pull request?
In the release script, add a check to ensure the release tag is pushed to the release branch.

### Why are the changes needed?
To ensure the success of an RC cut; otherwise, release conductors have to check that manually.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Manual test.

```
~/spark [_d_branch] $ git commit -am '_d_commmit'
...
~/spark [_d_branch] $ git tag '_d_tag'
~/spark [_d_branch] $ git push origin _d_tag
~/spark [_d_branch] $ git branch -r --contains tags/_d_tag | grep origin
~/spark [_d_branch] $ echo $?
1
~/spark [_d_branch] $ git push origin HEAD:_d_branch
...
~/spark [_d_branch] $ git branch -r --contains tags/_d_tag | grep origin
  origin/_d_branch
~/spark [_d_branch] $ echo $?
0

```

    Closes #40357 from xinrong-meng/chk_release.
    
Authored-by: Xinrong Meng 
Signed-off-by: Xinrong Meng 
(cherry picked from commit 785188dd8b5e74510c29edbff5b9991d88855e43)
Signed-off-by: Xinrong Meng 
---
 dev/create-release/release-tag.sh | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/dev/create-release/release-tag.sh b/dev/create-release/release-tag.sh
index 255bda37ad8..fa701dd74b2 100755
--- a/dev/create-release/release-tag.sh
+++ b/dev/create-release/release-tag.sh
@@ -122,6 +122,12 @@ if ! is_dry_run; then
   git push origin $RELEASE_TAG
   if [[ $RELEASE_VERSION != *"preview"* ]]; then
 git push origin HEAD:$GIT_BRANCH
+if git branch -r --contains tags/$RELEASE_TAG | grep origin; then
+  echo "Pushed $RELEASE_TAG to $GIT_BRANCH."
+else
+  echo "Failed to push $RELEASE_TAG to $GIT_BRANCH. Please start over."
+  exit 1
+fi
   else
 echo "It's preview release. We only push $RELEASE_TAG to remote."
   fi
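
For reference, the same verification can be run outside the release script. Below is a close Python equivalent of the `git branch -r --contains ... | grep origin` check above; the function name and the usage shown are my own sketch, not part of Spark's tooling:

```python
import subprocess

def tag_reaches_remote_branch(tag: str) -> bool:
    """True when some remote-tracking branch on origin contains the tag,
    mirroring `git branch -r --contains tags/$TAG | grep origin`."""
    out = subprocess.run(
        ["git", "branch", "-r", "--contains", f"tags/{tag}"],
        capture_output=True, text=True, check=True,
    ).stdout
    return any(line.strip().startswith("origin/") for line in out.splitlines())

# Hypothetical usage mirroring the script's failure path:
# if not tag_reaches_remote_branch("v3.4.0-rc4"):
#     raise SystemExit("Tag not on any origin branch. Please start over.")
```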





[spark] branch master updated: [SPARK-42739][BUILD] Ensure release tag to be pushed to release branch

2023-03-09 Thread xinrong
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 785188dd8b5 [SPARK-42739][BUILD] Ensure release tag to be pushed to 
release branch
785188dd8b5 is described below

commit 785188dd8b5e74510c29edbff5b9991d88855e43
Author: Xinrong Meng 
AuthorDate: Fri Mar 10 11:04:34 2023 +0800

[SPARK-42739][BUILD] Ensure release tag to be pushed to release branch

### What changes were proposed in this pull request?
In the release script, add a check to ensure the release tag is pushed to the release branch.

### Why are the changes needed?
To ensure the success of an RC cut; otherwise, release conductors have to check that manually.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Manual test.

```
~/spark [_d_branch] $ git commit -am '_d_commmit'
...
~/spark [_d_branch] $ git tag '_d_tag'
~/spark [_d_branch] $ git push origin _d_tag
~/spark [_d_branch] $ git branch -r --contains tags/_d_tag | grep origin
~/spark [_d_branch] $ echo $?
1
~/spark [_d_branch] $ git push origin HEAD:_d_branch
...
~/spark [_d_branch] $ git branch -r --contains tags/_d_tag | grep origin
  origin/_d_branch
~/spark [_d_branch] $ echo $?
0

```

    Closes #40357 from xinrong-meng/chk_release.
    
Authored-by: Xinrong Meng 
Signed-off-by: Xinrong Meng 
---
 dev/create-release/release-tag.sh | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/dev/create-release/release-tag.sh b/dev/create-release/release-tag.sh
index 255bda37ad8..fa701dd74b2 100755
--- a/dev/create-release/release-tag.sh
+++ b/dev/create-release/release-tag.sh
@@ -122,6 +122,12 @@ if ! is_dry_run; then
   git push origin $RELEASE_TAG
   if [[ $RELEASE_VERSION != *"preview"* ]]; then
 git push origin HEAD:$GIT_BRANCH
+if git branch -r --contains tags/$RELEASE_TAG | grep origin; then
+  echo "Pushed $RELEASE_TAG to $GIT_BRANCH."
+else
+  echo "Failed to push $RELEASE_TAG to $GIT_BRANCH. Please start over."
+  exit 1
+fi
   else
 echo "It's preview release. We only push $RELEASE_TAG to remote."
   fi





svn commit: r60500 - in /dev/spark/v3.4.0-rc3-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/R/articles/ _site/api/R/deps/ _site/api/R/deps/bootstrap-5.2.2/ _site/api/R/deps/jquery-3.6.0/ _site/api

2023-03-08 Thread xinrong
Author: xinrong
Date: Thu Mar  9 07:54:14 2023
New Revision: 60500

Log:
Apache Spark v3.4.0-rc3 docs


[This commit notification would consist of 2807 parts, 
which exceeds the limit of 50 ones, so it was shortened to the summary.]




svn commit: r60498 - /dev/spark/v3.4.0-rc3-bin/

2023-03-08 Thread xinrong
Author: xinrong
Date: Thu Mar  9 07:11:38 2023
New Revision: 60498

Log:
Apache Spark v3.4.0-rc3

Added:
dev/spark/v3.4.0-rc3-bin/
dev/spark/v3.4.0-rc3-bin/SparkR_3.4.0.tar.gz   (with props)
dev/spark/v3.4.0-rc3-bin/SparkR_3.4.0.tar.gz.asc
dev/spark/v3.4.0-rc3-bin/SparkR_3.4.0.tar.gz.sha512
dev/spark/v3.4.0-rc3-bin/pyspark-3.4.0.tar.gz   (with props)
dev/spark/v3.4.0-rc3-bin/pyspark-3.4.0.tar.gz.asc
dev/spark/v3.4.0-rc3-bin/pyspark-3.4.0.tar.gz.sha512
dev/spark/v3.4.0-rc3-bin/spark-3.4.0-bin-hadoop3-scala2.13.tgz   (with props)
dev/spark/v3.4.0-rc3-bin/spark-3.4.0-bin-hadoop3-scala2.13.tgz.asc
dev/spark/v3.4.0-rc3-bin/spark-3.4.0-bin-hadoop3-scala2.13.tgz.sha512
dev/spark/v3.4.0-rc3-bin/spark-3.4.0-bin-hadoop3.tgz   (with props)
dev/spark/v3.4.0-rc3-bin/spark-3.4.0-bin-hadoop3.tgz.asc
dev/spark/v3.4.0-rc3-bin/spark-3.4.0-bin-hadoop3.tgz.sha512
dev/spark/v3.4.0-rc3-bin/spark-3.4.0-bin-without-hadoop.tgz   (with props)
dev/spark/v3.4.0-rc3-bin/spark-3.4.0-bin-without-hadoop.tgz.asc
dev/spark/v3.4.0-rc3-bin/spark-3.4.0-bin-without-hadoop.tgz.sha512
dev/spark/v3.4.0-rc3-bin/spark-3.4.0.tgz   (with props)
dev/spark/v3.4.0-rc3-bin/spark-3.4.0.tgz.asc
dev/spark/v3.4.0-rc3-bin/spark-3.4.0.tgz.sha512

Added: dev/spark/v3.4.0-rc3-bin/SparkR_3.4.0.tar.gz
==
Binary file - no diff available.

Propchange: dev/spark/v3.4.0-rc3-bin/SparkR_3.4.0.tar.gz
--
svn:mime-type = application/octet-stream

Added: dev/spark/v3.4.0-rc3-bin/SparkR_3.4.0.tar.gz.asc
==
--- dev/spark/v3.4.0-rc3-bin/SparkR_3.4.0.tar.gz.asc (added)
+++ dev/spark/v3.4.0-rc3-bin/SparkR_3.4.0.tar.gz.asc Thu Mar  9 07:11:38 2023
@@ -0,0 +1,17 @@
+-BEGIN PGP SIGNATURE-
+
+iQJHBAABCgAxFiEEzGiz0W/jOnZnBRYLp+V5CMek4bEFAmQJhjwTHHhpbnJvbmdA
+YXBhY2hlLm9yZwAKCRCn5XkIx6ThsRdDEACd98Pk0bSFtKVHER3hjis2R2cg1pgG
+gWiqBZArn1GiB6ck0KHglMklJTFFsw2q9/mro42uVhj0b0hJYcTb2hBO+7vyEYeU
+a+YGhik6FXaQQBL1+oB5aTn2FcnNi7no1Qa+x4opkG7d1giapzQe/oZK1D7RNiYZ
+FAdoDhsUTYCeWDVXbRAcEMca49ltsZDPe45XRHwSgXT45hi6s9oRd78G6v2srbMb
++g7ce4KzAhupZrb5wCnP1MmiWWG1gnfcG0n11LDsiAhYPzzDgW/S4urcqIhWu0+4
+uUSrL6es4mprt1SMybBbmyGrHLuXjdmbBy5XHWy576GoCANdJRffImtmbXFFqp5q
+uau5MDCMFcQwp8pOGjTIDYL4q0p9Kpx3mQ2ykQxWiWg/TgVBQ2leadya8yUV9zZ9
+Y6vuRf9R3iYcXTp3B5XlOWtzjYBICa2XQlizOV3U35xybhSFQHLdUSdBBPMLFsDS
+YxYw1+dm8SjGfHhtsTOsk0ZhgSNgpDC8PBP6UUlz/8qRy4UdjQRrVgkqFmIFcLZs
+CPdX5XlH32PQYtN55qGc6AZECoUpbpigGZetvKqdD5SWyf8maRZZsD+XdR7BT9rk
+LLQTJKak3VQRAn80ONx+JxgzH3B5uV1ldN22vr5nLECpJZDbGjC6etystZDujEYh
+szr47LujCxLTNw==
+=l4pQ
+-END PGP SIGNATURE-

Added: dev/spark/v3.4.0-rc3-bin/SparkR_3.4.0.tar.gz.sha512
==
--- dev/spark/v3.4.0-rc3-bin/SparkR_3.4.0.tar.gz.sha512 (added)
+++ dev/spark/v3.4.0-rc3-bin/SparkR_3.4.0.tar.gz.sha512 Thu Mar  9 07:11:38 2023
@@ -0,0 +1 @@
+4703ffdbf82aaf5b30b6afe680a2b21ca15c957863c3648e7e5f120663506fc9e633727a6b7809f7cff7763a9f6227902f6d83fac7c87d3791234afef147cfc3  SparkR_3.4.0.tar.gz

Added: dev/spark/v3.4.0-rc3-bin/pyspark-3.4.0.tar.gz
==
Binary file - no diff available.

Propchange: dev/spark/v3.4.0-rc3-bin/pyspark-3.4.0.tar.gz
--
svn:mime-type = application/octet-stream

Added: dev/spark/v3.4.0-rc3-bin/pyspark-3.4.0.tar.gz.asc
==
--- dev/spark/v3.4.0-rc3-bin/pyspark-3.4.0.tar.gz.asc (added)
+++ dev/spark/v3.4.0-rc3-bin/pyspark-3.4.0.tar.gz.asc Thu Mar  9 07:11:38 2023
@@ -0,0 +1,17 @@
+-BEGIN PGP SIGNATURE-
+
+iQJHBAABCgAxFiEEzGiz0W/jOnZnBRYLp+V5CMek4bEFAmQJhj4THHhpbnJvbmdA
+YXBhY2hlLm9yZwAKCRCn5XkIx6ThsaMFD/0VbikHk10VpDiRp7RVhquRXR/qHkiK
+ioI02DrZJsZiRElV69Bfxvb1HQSKJhE9xXC+GkS7N+s0neNMXBpYsSxigRICG+Vi
+nPJifZVCNzpckkD5t8t+07X5eTRR7VoRPsHkaYSNKxXiMfXYbOpBOLcP/cvrdPSi
+nXsOnLm3dhxU7kMS+Qy4jbCzQN1fb4XPagxdvPji/aKo6LBw/YiqWHPhHcHlW89h
+cGRAQpN1VjfNkO1zfGxV/h5kD8L/my0zsVMOxtF/r6Qc7FZGBilfMuw8d+8WSVAr
+kRx+s2kB8vuH/undWoRSwpItqv0/gcyFCCvMmLQlbEA0Ku/ldE88XESIuI25uTcC
+tVJFC01Gauh7KlkI4hzsuwlhcDH/geLE1DS59fKC5UMqEYvaKQyQZFzyX0/eFIIS
+8KRZo3B5NUfEXE3fMDOGE8FgJ76QPQ3HO2tB9f+ICeu1/1RioqgucZ7jcKfFIx/J
+FzZ7FkNuLSl3CEnH5BlqdoaCCdmOsZVqcPgaZaGUncgK6ygBWEIEK/I6pE9Sye+Y
+ncBM76ZJf3NsE4Kzdw/v0NCrLaTdIMIK3W3fvVY94IPdk2EY6MuEnGDqG1bn88u4
+zYfP118WS4KtN6fSkczHGf+7+LQIiWrovIb+cQP+TXKeCinRbK1/I6pBWnn4/0u1
+DApXYisgegSYPg==
+=ykwM
+-END PGP SIGNATURE-

Added: dev/spark/v3.4.0-rc3-bin/pyspark-3.4.0.tar.gz.sha512
==
--- dev/spark/v3.4.0-rc3-bin/pyspark-3.4.0.tar.gz.sha512 (added)
+++ dev/spark

[spark] 01/01: Preparing Spark release v3.4.0-rc3

2023-03-08 Thread xinrong
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a commit to tag v3.4.0-rc3
in repository https://gitbox.apache.org/repos/asf/spark.git

commit b9be9ce15a82b18cca080ee365d308c0820a29a9
Author: Xinrong Meng 
AuthorDate: Thu Mar 9 05:34:00 2023 +

Preparing Spark release v3.4.0-rc3
---
 R/pkg/DESCRIPTION  | 2 +-
 assembly/pom.xml   | 2 +-
 common/kvstore/pom.xml | 2 +-
 common/network-common/pom.xml  | 2 +-
 common/network-shuffle/pom.xml | 2 +-
 common/network-yarn/pom.xml| 2 +-
 common/sketch/pom.xml  | 2 +-
 common/tags/pom.xml| 2 +-
 common/unsafe/pom.xml  | 2 +-
 connector/avro/pom.xml | 2 +-
 connector/connect/client/jvm/pom.xml   | 2 +-
 connector/connect/common/pom.xml   | 2 +-
 connector/connect/server/pom.xml   | 2 +-
 connector/docker-integration-tests/pom.xml | 2 +-
 connector/kafka-0-10-assembly/pom.xml  | 2 +-
 connector/kafka-0-10-sql/pom.xml   | 2 +-
 connector/kafka-0-10-token-provider/pom.xml| 2 +-
 connector/kafka-0-10/pom.xml   | 2 +-
 connector/kinesis-asl-assembly/pom.xml | 2 +-
 connector/kinesis-asl/pom.xml  | 2 +-
 connector/protobuf/pom.xml | 2 +-
 connector/spark-ganglia-lgpl/pom.xml   | 2 +-
 core/pom.xml   | 2 +-
 docs/_config.yml   | 6 +++---
 examples/pom.xml   | 2 +-
 graphx/pom.xml | 2 +-
 hadoop-cloud/pom.xml   | 2 +-
 launcher/pom.xml   | 2 +-
 mllib-local/pom.xml| 2 +-
 mllib/pom.xml  | 2 +-
 pom.xml| 2 +-
 python/pyspark/version.py  | 2 +-
 repl/pom.xml   | 2 +-
 resource-managers/kubernetes/core/pom.xml  | 2 +-
 resource-managers/kubernetes/integration-tests/pom.xml | 2 +-
 resource-managers/mesos/pom.xml| 2 +-
 resource-managers/yarn/pom.xml | 2 +-
 sql/catalyst/pom.xml   | 2 +-
 sql/core/pom.xml   | 2 +-
 sql/hive-thriftserver/pom.xml  | 2 +-
 sql/hive/pom.xml   | 2 +-
 streaming/pom.xml  | 2 +-
 tools/pom.xml  | 2 +-
 43 files changed, 45 insertions(+), 45 deletions(-)

diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION
index fa7028630a8..4a32762b34c 100644
--- a/R/pkg/DESCRIPTION
+++ b/R/pkg/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: SparkR
 Type: Package
-Version: 3.4.1
+Version: 3.4.0
 Title: R Front End for 'Apache Spark'
 Description: Provides an R Front end for 'Apache Spark' 
<https://spark.apache.org>.
 Authors@R:
diff --git a/assembly/pom.xml b/assembly/pom.xml
index b86fee4bceb..c58da7aa112 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.1-SNAPSHOT
+3.4.0
 ../pom.xml
   
 
diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml
index f9ecfb3d692..95ea15552da 100644
--- a/common/kvstore/pom.xml
+++ b/common/kvstore/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.1-SNAPSHOT
+3.4.0
 ../../pom.xml
   
 
diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml
index 22ee65b7d25..e4d98471bf9 100644
--- a/common/network-common/pom.xml
+++ b/common/network-common/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.1-SNAPSHOT
+3.4.0
 ../../pom.xml
   
 
diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml
index 2c67da81ca4..7a6d5aedf65 100644
--- a/common/network-shuffle/pom.xml
+++ b/common/network-shuffle/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.1-SNAPSHOT
+3.4.0
 ../../pom.xml
   
 
diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml
index 219682e047d..1c421754083 100644
--- a/common/network-yarn/pom.xml
+++ b/common/network-yarn/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.1-SNAPSHOT
+3.4.0
 ../../pom.xml
   
 
diff --git a/common/sketch/pom.xml b/common/sketch/pom.xml
index 22ce7

[spark] tag v3.4.0-rc3 created (now b9be9ce15a8)

2023-03-08 Thread xinrong
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a change to tag v3.4.0-rc3
in repository https://gitbox.apache.org/repos/asf/spark.git


  at b9be9ce15a8 (commit)
This tag includes the following new commits:

 new b9be9ce15a8 Preparing Spark release v3.4.0-rc3

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.






[spark] branch branch-3.4 updated: [SPARK-42643][CONNECT][PYTHON] Register Java (aggregate) user-defined functions

2023-03-07 Thread xinrong
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.4 by this push:
 new 0e959a53908 [SPARK-42643][CONNECT][PYTHON] Register Java (aggregate) 
user-defined functions
0e959a53908 is described below

commit 0e959a539086cda5dd911477ee5568ab540a2249
Author: Xinrong Meng 
AuthorDate: Wed Mar 8 14:23:18 2023 +0800

[SPARK-42643][CONNECT][PYTHON] Register Java (aggregate) user-defined 
functions

### What changes were proposed in this pull request?
Implement `spark.udf.registerJavaFunction` and `spark.udf.registerJavaUDAF`.
 A new proto `JavaUDF` is introduced.

### Why are the changes needed?
Parity with vanilla PySpark.

### Does this PR introduce _any_ user-facing change?
Yes. `spark.udf.registerJavaFunction` and `spark.udf.registerJavaUDAF` are 
supported now.

### How was this patch tested?
Parity unit tests.

Closes #40244 from xinrong-meng/registerJava.

Authored-by: Xinrong Meng 
Signed-off-by: Xinrong Meng 
(cherry picked from commit 92aa08786feaf473330a863d19b0c902b721789e)
Signed-off-by: Xinrong Meng 
---
 .../main/protobuf/spark/connect/expressions.proto  | 13 -
 .../sql/connect/planner/SparkConnectPlanner.scala  | 21 
 python/pyspark/sql/connect/client.py   | 39 ++-
 python/pyspark/sql/connect/expressions.py  | 44 +++--
 .../pyspark/sql/connect/proto/expressions_pb2.py   | 26 +++---
 .../pyspark/sql/connect/proto/expressions_pb2.pyi  | 56 +-
 python/pyspark/sql/connect/udf.py  | 17 ++-
 .../pyspark/sql/tests/connect/test_parity_udf.py   | 30 +++-
 python/pyspark/sql/udf.py  |  6 +++
 9 files changed, 212 insertions(+), 40 deletions(-)

diff --git a/connector/connect/common/src/main/protobuf/spark/connect/expressions.proto b/connector/connect/common/src/main/protobuf/spark/connect/expressions.proto
index 6eb769ad27e..0aee3ca13b9 100644
--- a/connector/connect/common/src/main/protobuf/spark/connect/expressions.proto
+++ b/connector/connect/common/src/main/protobuf/spark/connect/expressions.proto
@@ -312,7 +312,7 @@ message Expression {
 message CommonInlineUserDefinedFunction {
   // (Required) Name of the user-defined function.
   string function_name = 1;
-  // (Required) Indicate if the user-defined function is deterministic.
+  // (Optional) Indicate if the user-defined function is deterministic.
   bool deterministic = 2;
   // (Optional) Function arguments. Empty arguments are allowed.
   repeated Expression arguments = 3;
@@ -320,6 +320,7 @@ message CommonInlineUserDefinedFunction {
   oneof function {
 PythonUDF python_udf = 4;
 ScalarScalaUDF scalar_scala_udf = 5;
+JavaUDF java_udf = 6;
   }
 }
 
@@ -345,3 +346,13 @@ message ScalarScalaUDF {
   bool nullable = 4;
 }
 
+message JavaUDF {
+  // (Required) Fully qualified name of Java class
+  string class_name = 1;
+
+  // (Optional) Output type of the Java UDF
+  optional string output_type = 2;
+
+  // (Required) Indicate if the Java user-defined function is an aggregate 
function
+  bool aggregate = 3;
+}
diff --git a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala
index d7b3c057d92..3b9443f4e3c 100644
--- a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala
+++ b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala
@@ -1552,6 +1552,8 @@ class SparkConnectPlanner(val session: SparkSession) {
 fun.getFunctionCase match {
   case proto.CommonInlineUserDefinedFunction.FunctionCase.PYTHON_UDF =>
 handleRegisterPythonUDF(fun)
+  case proto.CommonInlineUserDefinedFunction.FunctionCase.JAVA_UDF =>
+handleRegisterJavaUDF(fun)
   case _ =>
 throw InvalidPlanInput(
   s"Function with ID: ${fun.getFunctionCase.getNumber} is not 
supported")
@@ -1577,6 +1579,25 @@ class SparkConnectPlanner(val session: SparkSession) {
 session.udf.registerPython(fun.getFunctionName, udpf)
   }
 
+  private def handleRegisterJavaUDF(fun: 
proto.CommonInlineUserDefinedFunction): Unit = {
+val udf = fun.getJavaUdf
+val dataType =
+  if (udf.hasOutputType) {
+DataType.parseTypeWithFallback(
+  schema = udf.getOutputType,
+  parser = DataType.fromDDL,
+  fallbackParser = DataType.fromJson) match {
+  case s: DataType => s
+  case other => throw InvalidPlanInput(s"Invalid return type $other")
+}
+  } else null
+if (udf.getAggregate) {

[spark] branch master updated: [SPARK-42643][CONNECT][PYTHON] Register Java (aggregate) user-defined functions

2023-03-07 Thread xinrong
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 92aa08786fe [SPARK-42643][CONNECT][PYTHON] Register Java (aggregate) 
user-defined functions
92aa08786fe is described below

commit 92aa08786feaf473330a863d19b0c902b721789e
Author: Xinrong Meng 
AuthorDate: Wed Mar 8 14:23:18 2023 +0800

[SPARK-42643][CONNECT][PYTHON] Register Java (aggregate) user-defined 
functions

### What changes were proposed in this pull request?
Implement `spark.udf.registerJavaFunction` and `spark.udf.registerJavaUDAF`.
 A new proto `JavaUDF` is introduced.

### Why are the changes needed?
Parity with vanilla PySpark.

### Does this PR introduce _any_ user-facing change?
Yes. `spark.udf.registerJavaFunction` and `spark.udf.registerJavaUDAF` are 
supported now.

### How was this patch tested?
Parity unit tests.

Closes #40244 from xinrong-meng/registerJava.

Authored-by: Xinrong Meng 
Signed-off-by: Xinrong Meng 
---
 .../main/protobuf/spark/connect/expressions.proto  | 13 -
 .../sql/connect/planner/SparkConnectPlanner.scala  | 21 
 python/pyspark/sql/connect/client.py   | 39 ++-
 python/pyspark/sql/connect/expressions.py  | 44 +++--
 .../pyspark/sql/connect/proto/expressions_pb2.py   | 26 +++---
 .../pyspark/sql/connect/proto/expressions_pb2.pyi  | 56 +-
 python/pyspark/sql/connect/udf.py  | 17 ++-
 .../pyspark/sql/tests/connect/test_parity_udf.py   | 30 +++-
 python/pyspark/sql/udf.py  |  6 +++
 9 files changed, 212 insertions(+), 40 deletions(-)

diff --git a/connector/connect/common/src/main/protobuf/spark/connect/expressions.proto b/connector/connect/common/src/main/protobuf/spark/connect/expressions.proto
index 6eb769ad27e..0aee3ca13b9 100644
--- a/connector/connect/common/src/main/protobuf/spark/connect/expressions.proto
+++ b/connector/connect/common/src/main/protobuf/spark/connect/expressions.proto
@@ -312,7 +312,7 @@ message Expression {
 message CommonInlineUserDefinedFunction {
   // (Required) Name of the user-defined function.
   string function_name = 1;
-  // (Required) Indicate if the user-defined function is deterministic.
+  // (Optional) Indicate if the user-defined function is deterministic.
   bool deterministic = 2;
   // (Optional) Function arguments. Empty arguments are allowed.
   repeated Expression arguments = 3;
@@ -320,6 +320,7 @@ message CommonInlineUserDefinedFunction {
   oneof function {
 PythonUDF python_udf = 4;
 ScalarScalaUDF scalar_scala_udf = 5;
+JavaUDF java_udf = 6;
   }
 }
 
@@ -345,3 +346,13 @@ message ScalarScalaUDF {
   bool nullable = 4;
 }
 
+message JavaUDF {
+  // (Required) Fully qualified name of Java class
+  string class_name = 1;
+
+  // (Optional) Output type of the Java UDF
+  optional string output_type = 2;
+
+  // (Required) Indicate if the Java user-defined function is an aggregate 
function
+  bool aggregate = 3;
+}
diff --git a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala
index d7b3c057d92..3b9443f4e3c 100644
--- a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala
+++ b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala
@@ -1552,6 +1552,8 @@ class SparkConnectPlanner(val session: SparkSession) {
 fun.getFunctionCase match {
   case proto.CommonInlineUserDefinedFunction.FunctionCase.PYTHON_UDF =>
 handleRegisterPythonUDF(fun)
+  case proto.CommonInlineUserDefinedFunction.FunctionCase.JAVA_UDF =>
+handleRegisterJavaUDF(fun)
   case _ =>
 throw InvalidPlanInput(
   s"Function with ID: ${fun.getFunctionCase.getNumber} is not 
supported")
@@ -1577,6 +1579,25 @@ class SparkConnectPlanner(val session: SparkSession) {
 session.udf.registerPython(fun.getFunctionName, udpf)
   }
 
+  private def handleRegisterJavaUDF(fun: 
proto.CommonInlineUserDefinedFunction): Unit = {
+val udf = fun.getJavaUdf
+val dataType =
+  if (udf.hasOutputType) {
+DataType.parseTypeWithFallback(
+  schema = udf.getOutputType,
+  parser = DataType.fromDDL,
+  fallbackParser = DataType.fromJson) match {
+  case s: DataType => s
+  case other => throw InvalidPlanInput(s"Invalid return type $other")
+}
+  } else null
+if (udf.getAggregate) {
+  session.udf.registerJavaUDAF(fun.getFunctionName, udf.getClassName)
+} else {
+  session.udf.registerJava(fu
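
The diff is truncated in this digest, but the user-facing shape of the feature is the two registration calls named in the commit message. A hedged usage sketch against a Spark Connect session follows — the endpoint and the fully qualified Java class names are hypothetical placeholders, and the classes must already be on the server's classpath:

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import IntegerType

# The Connect endpoint here is an assumption for this sketch.
spark = SparkSession.builder.remote("sc://localhost:15002").getOrCreate()

# Plain Java UDF: SQL name, fully qualified class name, optional return type.
spark.udf.registerJavaFunction("javaStrLen", "com.example.StrLenUDF", IntegerType())
spark.sql("SELECT javaStrLen('hello')").show()

# Java UDAF: the `aggregate` flag in the JavaUDF proto above is what routes
# this call to registerJavaUDAF on the server side.
spark.udf.registerJavaUDAF("javaAvg", "com.example.MyAverageUDAF")
spark.sql("SELECT javaAvg(id) FROM range(10)").show()
```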

svn commit: r60407 - in /dev/spark/v3.4.0-rc2-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/R/articles/ _site/api/R/deps/ _site/api/R/deps/bootstrap-5.2.2/ _site/api/R/deps/jquery-3.6.0/ _site/api

2023-03-02 Thread xinrong
Author: xinrong
Date: Thu Mar  2 09:35:10 2023
New Revision: 60407

Log:
Apache Spark v3.4.0-rc2 docs


[This commit notification would consist of 2806 parts, 
which exceeds the limit of 50 ones, so it was shortened to the summary.]




svn commit: r60406 - /dev/spark/v3.4.0-rc2-bin/

2023-03-01 Thread xinrong
Author: xinrong
Date: Thu Mar  2 07:42:27 2023
New Revision: 60406

Log:
Apache Spark v3.4.0-rc2

Added:
dev/spark/v3.4.0-rc2-bin/
dev/spark/v3.4.0-rc2-bin/SparkR_3.4.0.tar.gz   (with props)
dev/spark/v3.4.0-rc2-bin/SparkR_3.4.0.tar.gz.asc
dev/spark/v3.4.0-rc2-bin/SparkR_3.4.0.tar.gz.sha512
dev/spark/v3.4.0-rc2-bin/pyspark-3.4.0.tar.gz   (with props)
dev/spark/v3.4.0-rc2-bin/pyspark-3.4.0.tar.gz.asc
dev/spark/v3.4.0-rc2-bin/pyspark-3.4.0.tar.gz.sha512
dev/spark/v3.4.0-rc2-bin/spark-3.4.0-bin-hadoop3-scala2.13.tgz   (with props)
dev/spark/v3.4.0-rc2-bin/spark-3.4.0-bin-hadoop3-scala2.13.tgz.asc
dev/spark/v3.4.0-rc2-bin/spark-3.4.0-bin-hadoop3-scala2.13.tgz.sha512
dev/spark/v3.4.0-rc2-bin/spark-3.4.0-bin-hadoop3.tgz   (with props)
dev/spark/v3.4.0-rc2-bin/spark-3.4.0-bin-hadoop3.tgz.asc
dev/spark/v3.4.0-rc2-bin/spark-3.4.0-bin-hadoop3.tgz.sha512
dev/spark/v3.4.0-rc2-bin/spark-3.4.0-bin-without-hadoop.tgz   (with props)
dev/spark/v3.4.0-rc2-bin/spark-3.4.0-bin-without-hadoop.tgz.asc
dev/spark/v3.4.0-rc2-bin/spark-3.4.0-bin-without-hadoop.tgz.sha512
dev/spark/v3.4.0-rc2-bin/spark-3.4.0.tgz   (with props)
dev/spark/v3.4.0-rc2-bin/spark-3.4.0.tgz.asc
dev/spark/v3.4.0-rc2-bin/spark-3.4.0.tgz.sha512

Added: dev/spark/v3.4.0-rc2-bin/SparkR_3.4.0.tar.gz
==
Binary file - no diff available.

Propchange: dev/spark/v3.4.0-rc2-bin/SparkR_3.4.0.tar.gz
--
svn:mime-type = application/octet-stream

Added: dev/spark/v3.4.0-rc2-bin/SparkR_3.4.0.tar.gz.asc
==
--- dev/spark/v3.4.0-rc2-bin/SparkR_3.4.0.tar.gz.asc (added)
+++ dev/spark/v3.4.0-rc2-bin/SparkR_3.4.0.tar.gz.asc Thu Mar  2 07:42:27 2023
@@ -0,0 +1,17 @@
+-BEGIN PGP SIGNATURE-
+
+iQJHBAABCgAxFiEEzGiz0W/jOnZnBRYLp+V5CMek4bEFAmQAUvITHHhpbnJvbmdA
+YXBhY2hlLm9yZwAKCRCn5XkIx6ThsRPdEACTNy0qOWmOXidbPjyZzJCdr8zkAIZX
+UGhVWPrlF0sQR1FzTtPPwwI4sywSC+DNAetcYVXEzOVfNf3I/UgE02183xCQJVfx
+EaE+lpCmIwFjY+AcPGwz7fZ+aTxFa2f9wu04G+q9Uaw40Ys/WMmvck/Wg4Ih0nj3
+PbBuftQIy5K1YHJOx6PvkzCpZsmP4njNGrJ+IJU8vpYh35zp8E3jkfbECCvKkTWE
+ABWGxpAKjN5npkarbNpZp8Emd6EtrRYaJzDPApjW6GFSQAmZwE0WJj2nKJu4Aszu
+fstx27dZ4bvx3bgbfSEmRgTc5VD7glzvWKIWqt0PdkDq1AQdwdFodZfJFqXUccuk
+G3yL+RTrggtvDBEjcMh+ym6kOrHmUBgy7SqPfOI5UPO8PQ+KdhE94tqXfhHAl5QS
+Okw1XWc2EQzDyeu/j+Kp4yc0tbZRnuqkAzS5yLJVix0z4GBOyRyvTsDLykwEM9h+
+jniFAkWfu+su9JRMfIdaXqak1DgyVZ9bxZOfLIo7lA5U4vYxCZM5TU8ToNDnnOWd
+O0pbweQ/W4UdXP6AYEJt2J8wItDiv+xry4jI9JqTEPV5IbrAZjZmJ/RoMzjeh+eA
+WwqSEXuWXrUStb9bPfhFnryYmbKGYGG7dRP6HnnaFlevBc6qrNlMPL3xedZsk12b
+opcLL5skNQoHuA==
+=6ENL
+-END PGP SIGNATURE-

Added: dev/spark/v3.4.0-rc2-bin/SparkR_3.4.0.tar.gz.sha512
==
--- dev/spark/v3.4.0-rc2-bin/SparkR_3.4.0.tar.gz.sha512 (added)
+++ dev/spark/v3.4.0-rc2-bin/SparkR_3.4.0.tar.gz.sha512 Thu Mar  2 07:42:27 2023
@@ -0,0 +1 @@
+9f719616f1547d449488957cc74d6dd9080e32096e1178deb0c339be47bb06e158d8b0c7a80f1e53595b34467d5b5b7f23d66643cca1fa9f5e8c7b9687893b59  SparkR_3.4.0.tar.gz

Added: dev/spark/v3.4.0-rc2-bin/pyspark-3.4.0.tar.gz
==
Binary file - no diff available.

Propchange: dev/spark/v3.4.0-rc2-bin/pyspark-3.4.0.tar.gz
--
svn:mime-type = application/octet-stream

Added: dev/spark/v3.4.0-rc2-bin/pyspark-3.4.0.tar.gz.asc
==
--- dev/spark/v3.4.0-rc2-bin/pyspark-3.4.0.tar.gz.asc (added)
+++ dev/spark/v3.4.0-rc2-bin/pyspark-3.4.0.tar.gz.asc Thu Mar  2 07:42:27 2023
@@ -0,0 +1,17 @@
+-BEGIN PGP SIGNATURE-
+
+iQJHBAABCgAxFiEEzGiz0W/jOnZnBRYLp+V5CMek4bEFAmQAUvQTHHhpbnJvbmdA
+YXBhY2hlLm9yZwAKCRCn5XkIx6ThsSCWD/9LhMRWQlceBu5wuEwGdwCTSEvQvpCl
+zDn1HKCwqXQvzj5YPOllSBolHCQgy1U3S08CeGF8kB+hT/MSozif/+qzMNTFWfz8
+EEyB02XxWjOXO38muJ51/r3WXseoB0L/yMqdipgZAQRT5A5i9xBZqH718a7k6pow
+m+/8qD4oMYmnWE9X2TwW47uSCMpKOgZRSALBwx5HAQ6HADHfW3q6Rwdm6yL6vv0J
+n/FTMjeKAKwetSYhwDwPCXaTTKaw8h90IWHOykZdv8IoynUO4egKfoeHeOKQ8Dyl
+8mlqIWsQi0wdcrfAlKp2HjD001j0iUV8ZfDkZsmReTRNf8Y7yKdFF6BBAW+zPwAw
+ILsb0HeP50s36WiON7Ywjy8pXJdOBN+6QiM9CIP7c5D45RNAbPe8ARhDZwuHZTMy
+7jzAYnrjDIXlrFGmpFS2I+xk0/ZoI2H6BC8V7t5ZvhJ8Qm7SifAgfOt5G9rlUwu0
+BnCE3INQghRq5mv9aH40aHZPhVUN8woTxUussNXeqds4cAVXdvj7BQJMqZtplj1N
+k4bFKvjjtO/GbrbTcNTClqk7CtII4GRQCJWmV7ksvDejavRfDMJn6Bt/ZhHYfDPw
+rOXXuMX/HdVgH1E+RhntqnejilGuKNsWf08dZPgQ1kwMd2fnygDMoaUbG769nJqW
+JLAkWKLvu+YXFA==
+=R11G
+-END PGP SIGNATURE-

Added: dev/spark/v3.4.0-rc2-bin/pyspark-3.4.0.tar.gz.sha512
==
--- dev/spark/v3.4.0-rc2-bin/pyspark-3.4.0.tar.gz.sha512 (added)
+++ dev/spark

[spark] branch branch-3.4 updated (4fa4d2fd54c -> aeacf0d0f24)

2023-03-01 Thread xinrong
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a change to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


from 4fa4d2fd54c [SPARK-41823][CONNECT][FOLLOW-UP][TESTS] Disable ANSI mode 
in ProtoToParsedPlanTestSuite
 add 759511bb59b Preparing Spark release v3.4.0-rc2
 new aeacf0d0f24 Preparing development version 3.4.1-SNAPSHOT

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:





[spark] 01/01: Preparing development version 3.4.1-SNAPSHOT

2023-03-01 Thread xinrong
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git

commit aeacf0d0f24ec509b7bbf318bb71edb1cba8bc36
Author: Xinrong Meng 
AuthorDate: Thu Mar 2 06:25:37 2023 +

Preparing development version 3.4.1-SNAPSHOT
---
 R/pkg/DESCRIPTION  | 2 +-
 assembly/pom.xml   | 2 +-
 common/kvstore/pom.xml | 2 +-
 common/network-common/pom.xml  | 2 +-
 common/network-shuffle/pom.xml | 2 +-
 common/network-yarn/pom.xml| 2 +-
 common/sketch/pom.xml  | 2 +-
 common/tags/pom.xml| 2 +-
 common/unsafe/pom.xml  | 2 +-
 connector/avro/pom.xml | 2 +-
 connector/connect/client/jvm/pom.xml   | 2 +-
 connector/connect/common/pom.xml   | 2 +-
 connector/connect/server/pom.xml   | 2 +-
 connector/docker-integration-tests/pom.xml | 2 +-
 connector/kafka-0-10-assembly/pom.xml  | 2 +-
 connector/kafka-0-10-sql/pom.xml   | 2 +-
 connector/kafka-0-10-token-provider/pom.xml| 2 +-
 connector/kafka-0-10/pom.xml   | 2 +-
 connector/kinesis-asl-assembly/pom.xml | 2 +-
 connector/kinesis-asl/pom.xml  | 2 +-
 connector/protobuf/pom.xml | 2 +-
 connector/spark-ganglia-lgpl/pom.xml   | 2 +-
 core/pom.xml   | 2 +-
 docs/_config.yml   | 6 +++---
 examples/pom.xml   | 2 +-
 graphx/pom.xml | 2 +-
 hadoop-cloud/pom.xml   | 2 +-
 launcher/pom.xml   | 2 +-
 mllib-local/pom.xml| 2 +-
 mllib/pom.xml  | 2 +-
 pom.xml| 2 +-
 python/pyspark/version.py  | 2 +-
 repl/pom.xml   | 2 +-
 resource-managers/kubernetes/core/pom.xml  | 2 +-
 resource-managers/kubernetes/integration-tests/pom.xml | 2 +-
 resource-managers/mesos/pom.xml| 2 +-
 resource-managers/yarn/pom.xml | 2 +-
 sql/catalyst/pom.xml   | 2 +-
 sql/core/pom.xml   | 2 +-
 sql/hive-thriftserver/pom.xml  | 2 +-
 sql/hive/pom.xml   | 2 +-
 streaming/pom.xml  | 2 +-
 tools/pom.xml  | 2 +-
 43 files changed, 45 insertions(+), 45 deletions(-)

diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION
index 4a32762b34c..fa7028630a8 100644
--- a/R/pkg/DESCRIPTION
+++ b/R/pkg/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: SparkR
 Type: Package
-Version: 3.4.0
+Version: 3.4.1
 Title: R Front End for 'Apache Spark'
 Description: Provides an R Front end for 'Apache Spark' 
<https://spark.apache.org>.
 Authors@R:
diff --git a/assembly/pom.xml b/assembly/pom.xml
index 58dd9ef46e0..a4111eb64d9 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.0
+3.4.1-SNAPSHOT
 ../pom.xml
   
 
diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml
index 95ea15552da..f9ecfb3d692 100644
--- a/common/kvstore/pom.xml
+++ b/common/kvstore/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.0
+3.4.1-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml
index e4d98471bf9..22ee65b7d25 100644
--- a/common/network-common/pom.xml
+++ b/common/network-common/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.0
+3.4.1-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml
index 7a6d5aedf65..2c67da81ca4 100644
--- a/common/network-shuffle/pom.xml
+++ b/common/network-shuffle/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.0
+3.4.1-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml
index 1c421754083..219682e047d 100644
--- a/common/network-yarn/pom.xml
+++ b/common/network-yarn/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.0
+3.4.1-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/sketch/pom.xml b/common/sketch

[spark] tag v3.4.0-rc2 created (now 759511bb59b)

2023-03-01 Thread xinrong
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a change to tag v3.4.0-rc2
in repository https://gitbox.apache.org/repos/asf/spark.git


  at 759511bb59b (commit)
This tag includes the following new commits:

 new 759511bb59b Preparing Spark release v3.4.0-rc2

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.






[spark] 01/01: Preparing Spark release v3.4.0-rc2

2023-03-01 Thread xinrong
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a commit to tag v3.4.0-rc2
in repository https://gitbox.apache.org/repos/asf/spark.git

commit 759511bb59b206ac5ff18f377c239a2f38bf5db6
Author: Xinrong Meng 
AuthorDate: Thu Mar 2 06:25:32 2023 +

Preparing Spark release v3.4.0-rc2
---
 R/pkg/DESCRIPTION  | 2 +-
 assembly/pom.xml   | 2 +-
 common/kvstore/pom.xml | 2 +-
 common/network-common/pom.xml  | 2 +-
 common/network-shuffle/pom.xml | 2 +-
 common/network-yarn/pom.xml| 2 +-
 common/sketch/pom.xml  | 2 +-
 common/tags/pom.xml| 2 +-
 common/unsafe/pom.xml  | 2 +-
 connector/avro/pom.xml | 2 +-
 connector/connect/client/jvm/pom.xml   | 2 +-
 connector/connect/common/pom.xml   | 2 +-
 connector/connect/server/pom.xml   | 2 +-
 connector/docker-integration-tests/pom.xml | 2 +-
 connector/kafka-0-10-assembly/pom.xml  | 2 +-
 connector/kafka-0-10-sql/pom.xml   | 2 +-
 connector/kafka-0-10-token-provider/pom.xml| 2 +-
 connector/kafka-0-10/pom.xml   | 2 +-
 connector/kinesis-asl-assembly/pom.xml | 2 +-
 connector/kinesis-asl/pom.xml  | 2 +-
 connector/protobuf/pom.xml | 2 +-
 connector/spark-ganglia-lgpl/pom.xml   | 2 +-
 core/pom.xml   | 2 +-
 docs/_config.yml   | 6 +++---
 examples/pom.xml   | 2 +-
 graphx/pom.xml | 2 +-
 hadoop-cloud/pom.xml   | 2 +-
 launcher/pom.xml   | 2 +-
 mllib-local/pom.xml| 2 +-
 mllib/pom.xml  | 2 +-
 pom.xml| 2 +-
 python/pyspark/version.py  | 2 +-
 repl/pom.xml   | 2 +-
 resource-managers/kubernetes/core/pom.xml  | 2 +-
 resource-managers/kubernetes/integration-tests/pom.xml | 2 +-
 resource-managers/mesos/pom.xml| 2 +-
 resource-managers/yarn/pom.xml | 2 +-
 sql/catalyst/pom.xml   | 2 +-
 sql/core/pom.xml   | 2 +-
 sql/hive-thriftserver/pom.xml  | 2 +-
 sql/hive/pom.xml   | 2 +-
 streaming/pom.xml  | 2 +-
 tools/pom.xml  | 2 +-
 43 files changed, 45 insertions(+), 45 deletions(-)

diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION
index fa7028630a8..4a32762b34c 100644
--- a/R/pkg/DESCRIPTION
+++ b/R/pkg/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: SparkR
 Type: Package
-Version: 3.4.1
+Version: 3.4.0
 Title: R Front End for 'Apache Spark'
 Description: Provides an R Front end for 'Apache Spark' 
<https://spark.apache.org>.
 Authors@R:
diff --git a/assembly/pom.xml b/assembly/pom.xml
index a4111eb64d9..58dd9ef46e0 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.1-SNAPSHOT
+3.4.0
 ../pom.xml
   
 
diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml
index f9ecfb3d692..95ea15552da 100644
--- a/common/kvstore/pom.xml
+++ b/common/kvstore/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.1-SNAPSHOT
+3.4.0
 ../../pom.xml
   
 
diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml
index 22ee65b7d25..e4d98471bf9 100644
--- a/common/network-common/pom.xml
+++ b/common/network-common/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.1-SNAPSHOT
+3.4.0
 ../../pom.xml
   
 
diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml
index 2c67da81ca4..7a6d5aedf65 100644
--- a/common/network-shuffle/pom.xml
+++ b/common/network-shuffle/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.1-SNAPSHOT
+3.4.0
 ../../pom.xml
   
 
diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml
index 219682e047d..1c421754083 100644
--- a/common/network-yarn/pom.xml
+++ b/common/network-yarn/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.1-SNAPSHOT
+3.4.0
 ../../pom.xml
   
 
diff --git a/common/sketch/pom.xml b/common/sketch/pom.xml
index 22ce7

[spark] branch branch-3.4 updated: [SPARK-42510][CONNECT][PYTHON] Implement `DataFrame.mapInPandas`

2023-02-24 Thread xinrong
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.4 by this push:
 new 000895da3f6 [SPARK-42510][CONNECT][PYTHON] Implement 
`DataFrame.mapInPandas`
000895da3f6 is described below

commit 000895da3f6c0d17ccfdfe79c0ca34dfb9fb6e7b
Author: Xinrong Meng 
AuthorDate: Sat Feb 25 07:39:54 2023 +0800

[SPARK-42510][CONNECT][PYTHON] Implement `DataFrame.mapInPandas`

### What changes were proposed in this pull request?
Implement `DataFrame.mapInPandas` and enable parity tests with vanilla PySpark.

A proto message `FrameMap` is introduced for `mapInPandas` and `mapInArrow` (to be implemented next).

### Why are the changes needed?
To reach parity with vanilla PySpark.

### Does this PR introduce _any_ user-facing change?
Yes. `DataFrame.mapInPandas` is supported. An example is as shown below.

```py
>>> df = spark.range(2)
>>> def filter_func(iterator):
...   for pdf in iterator:
... yield pdf[pdf.id == 1]
...
>>> df.mapInPandas(filter_func, df.schema)
DataFrame[id: bigint]
>>> df.mapInPandas(filter_func, df.schema).show()
+---+
| id|
+---+
|  1|
+---+
```

### How was this patch tested?
Unit tests.

Closes #40104 from xinrong-meng/mapInPandas.

Lead-authored-by: Xinrong Meng 
Co-authored-by: Xinrong Meng 
Signed-off-by: Xinrong Meng 
(cherry picked from commit 9abccad1d93a243d7e47e53dcbc85568a460c529)
Signed-off-by: Xinrong Meng 
---
 .../main/protobuf/spark/connect/relations.proto|  10 +
 .../sql/connect/planner/SparkConnectPlanner.scala  |  18 +-
 dev/sparktestsupport/modules.py|   1 +
 python/pyspark/sql/connect/_typing.py  |   8 +-
 python/pyspark/sql/connect/client.py   |   2 +-
 python/pyspark/sql/connect/dataframe.py|  22 +-
 python/pyspark/sql/connect/expressions.py  |   6 +-
 python/pyspark/sql/connect/plan.py |  25 ++-
 python/pyspark/sql/connect/proto/relations_pb2.py  | 222 +++--
 python/pyspark/sql/connect/proto/relations_pb2.pyi |  36 
 python/pyspark/sql/connect/types.py|   4 +-
 python/pyspark/sql/connect/udf.py  |  20 +-
 python/pyspark/sql/pandas/map_ops.py   |   3 +
 .../sql/tests/connect/test_parity_pandas_map.py|  50 +
 python/pyspark/sql/tests/pandas/test_pandas_map.py |  46 +++--
 15 files changed, 331 insertions(+), 142 deletions(-)

diff --git a/connector/connect/common/src/main/protobuf/spark/connect/relations.proto b/connector/connect/common/src/main/protobuf/spark/connect/relations.proto
index 29fffd65c75..4d96b6b0c7e 100644
--- a/connector/connect/common/src/main/protobuf/spark/connect/relations.proto
+++ b/connector/connect/common/src/main/protobuf/spark/connect/relations.proto
@@ -60,6 +60,7 @@ message Relation {
 Unpivot unpivot = 25;
 ToSchema to_schema = 26;
 RepartitionByExpression repartition_by_expression = 27;
+FrameMap frame_map = 28;
 
 // NA functions
 NAFill fill_na = 90;
@@ -768,3 +769,12 @@ message RepartitionByExpression {
   // (Optional) number of partitions, must be positive.
   optional int32 num_partitions = 3;
 }
+
+message FrameMap {
+  // (Required) Input relation for a Frame Map API: mapInPandas, mapInArrow.
+  Relation input = 1;
+
+  // (Required) Input user-defined function of a Frame Map API.
+  CommonInlineUserDefinedFunction func = 2;
+}
+
diff --git a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala
index 268bf02fad9..cc43c1cace3 100644
--- a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala
+++ b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala
@@ -24,7 +24,7 @@ import com.google.common.collect.{Lists, Maps}
 import com.google.protobuf.{Any => ProtoAny}
 
 import org.apache.spark.TaskContext
-import org.apache.spark.api.python.SimplePythonFunction
+import org.apache.spark.api.python.{PythonEvalType, SimplePythonFunction}
 import org.apache.spark.connect.proto
 import org.apache.spark.sql.{Column, Dataset, Encoders, SparkSession}
 import org.apache.spark.sql.catalyst.{expressions, AliasIdentifier, 
FunctionIdentifier}
@@ -106,6 +106,8 @@ class SparkConnectPlanner(val session: SparkSession) {
   case proto.Relation.RelTypeCase.UNPIVOT => 
transformUnpivot(rel.getUnpivot)
   case proto.Relation.RelTypeCase.REPARTITION_BY_EXPRESSION =>
 transformRepartitionByExpression(rel.getRepartitionByExpression)
+  

[spark] branch master updated: [SPARK-42510][CONNECT][PYTHON] Implement `DataFrame.mapInPandas`

2023-02-24 Thread xinrong
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 9abccad1d93 [SPARK-42510][CONNECT][PYTHON] Implement 
`DataFrame.mapInPandas`
9abccad1d93 is described below

commit 9abccad1d93a243d7e47e53dcbc85568a460c529
Author: Xinrong Meng 
AuthorDate: Sat Feb 25 07:39:54 2023 +0800

[SPARK-42510][CONNECT][PYTHON] Implement `DataFrame.mapInPandas`

### What changes were proposed in this pull request?
Implement `DataFrame.mapInPandas` and enable parity tests with vanilla PySpark.

A proto message `FrameMap` is introduced for `mapInPandas` and `mapInArrow` (to be implemented next).

### Why are the changes needed?
To reach parity with vanilla PySpark.

### Does this PR introduce _any_ user-facing change?
Yes. `DataFrame.mapInPandas` is supported. An example is shown below.

```py
>>> df = spark.range(2)
>>> def filter_func(iterator):
...   for pdf in iterator:
... yield pdf[pdf.id == 1]
...
>>> df.mapInPandas(filter_func, df.schema)
DataFrame[id: bigint]
>>> df.mapInPandas(filter_func, df.schema).show()
+---+
| id|
+---+
|  1|
+---+
```
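
For comparison, the Arrow-based sibling API that the same `FrameMap` message is meant to carry next behaves analogously in vanilla PySpark. A minimal sketch, assuming `pyarrow` is installed and `spark` is an active session (illustration only, not part of this patch):

```py
>>> import pyarrow.compute as pc
>>> df = spark.range(2)
>>> def filter_batches(iterator):
...     # Each element is a pyarrow.RecordBatch; keep rows where id == 1
...     # (column 0 is "id").
...     for batch in iterator:
...         yield pc.filter(batch, pc.equal(batch.column(0), 1))
...
>>> df.mapInArrow(filter_batches, df.schema).show()
+---+
| id|
+---+
|  1|
+---+
```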

### How was this patch tested?
Unit tests.

Closes #40104 from xinrong-meng/mapInPandas.

Lead-authored-by: Xinrong Meng 
Co-authored-by: Xinrong Meng 
Signed-off-by: Xinrong Meng 
---
 .../main/protobuf/spark/connect/relations.proto|  10 +
 .../sql/connect/planner/SparkConnectPlanner.scala  |  18 +-
 dev/sparktestsupport/modules.py|   1 +
 python/pyspark/sql/connect/_typing.py  |   8 +-
 python/pyspark/sql/connect/client.py   |   2 +-
 python/pyspark/sql/connect/dataframe.py|  22 +-
 python/pyspark/sql/connect/expressions.py  |   6 +-
 python/pyspark/sql/connect/plan.py |  25 ++-
 python/pyspark/sql/connect/proto/relations_pb2.py  | 222 +++--
 python/pyspark/sql/connect/proto/relations_pb2.pyi |  36 
 python/pyspark/sql/connect/types.py|   4 +-
 python/pyspark/sql/connect/udf.py  |  20 +-
 python/pyspark/sql/pandas/map_ops.py   |   3 +
 .../sql/tests/connect/test_parity_pandas_map.py|  50 +
 python/pyspark/sql/tests/pandas/test_pandas_map.py |  46 +++--
 15 files changed, 331 insertions(+), 142 deletions(-)

diff --git 
a/connector/connect/common/src/main/protobuf/spark/connect/relations.proto 
b/connector/connect/common/src/main/protobuf/spark/connect/relations.proto
index 29fffd65c75..4d96b6b0c7e 100644
--- a/connector/connect/common/src/main/protobuf/spark/connect/relations.proto
+++ b/connector/connect/common/src/main/protobuf/spark/connect/relations.proto
@@ -60,6 +60,7 @@ message Relation {
 Unpivot unpivot = 25;
 ToSchema to_schema = 26;
 RepartitionByExpression repartition_by_expression = 27;
+FrameMap frame_map = 28;
 
 // NA functions
 NAFill fill_na = 90;
@@ -768,3 +769,12 @@ message RepartitionByExpression {
   // (Optional) number of partitions, must be positive.
   optional int32 num_partitions = 3;
 }
+
+message FrameMap {
+  // (Required) Input relation for a Frame Map API: mapInPandas, mapInArrow.
+  Relation input = 1;
+
+  // (Required) Input user-defined function of a Frame Map API.
+  CommonInlineUserDefinedFunction func = 2;
+}
+
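
The generated Python bindings treat `FrameMap` like any other protobuf message. A minimal sketch of populating it by hand, assuming a PySpark build that ships the generated `relations_pb2`/`expressions_pb2` modules (in practice the Connect client's plan layer builds this, not user code):

```py
from pyspark.sql.connect.proto import expressions_pb2, relations_pb2

# Placeholder messages: in the real client, `input` comes from the child
# plan and `func` from the serialized Python user-defined function.
child = relations_pb2.Relation()
udf = expressions_pb2.CommonInlineUserDefinedFunction()

rel = relations_pb2.Relation()
rel.frame_map.input.CopyFrom(child)  # (Required) input relation
rel.frame_map.func.CopyFrom(udf)     # (Required) user-defined function
print(rel.WhichOneof("rel_type"))    # -> 'frame_map'
```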
diff --git 
a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala
 
b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala
index 268bf02fad9..cc43c1cace3 100644
--- 
a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala
+++ 
b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala
@@ -24,7 +24,7 @@ import com.google.common.collect.{Lists, Maps}
 import com.google.protobuf.{Any => ProtoAny}
 
 import org.apache.spark.TaskContext
-import org.apache.spark.api.python.SimplePythonFunction
+import org.apache.spark.api.python.{PythonEvalType, SimplePythonFunction}
 import org.apache.spark.connect.proto
 import org.apache.spark.sql.{Column, Dataset, Encoders, SparkSession}
 import org.apache.spark.sql.catalyst.{expressions, AliasIdentifier, 
FunctionIdentifier}
@@ -106,6 +106,8 @@ class SparkConnectPlanner(val session: SparkSession) {
   case proto.Relation.RelTypeCase.UNPIVOT => 
transformUnpivot(rel.getUnpivot)
   case proto.Relation.RelTypeCase.REPARTITION_BY_EXPRESSION =>
 transformRepartitionByExpression(rel.getRepartitionByExpression)
+  case proto.Relation.RelTypeCase.FRAME_MAP =>
+transformFrameMap(rel.getFrameMap)
   case proto.Relation.R

svn commit: r60251 - /dev/spark/KEYS

2023-02-21 Thread xinrong
Author: xinrong
Date: Wed Feb 22 07:01:27 2023
New Revision: 60251

Log:
Update KEYS

Modified:
dev/spark/KEYS

Modified: dev/spark/KEYS
==
--- dev/spark/KEYS (original)
+++ dev/spark/KEYS Wed Feb 22 07:01:27 2023
@@ -1850,59 +1850,59 @@ JznYPjY83fSKkeCh
 =3Ggj
 -----END PGP PUBLIC KEY BLOCK-----
 
-pub   rsa4096 2022-08-16 [SC]
-  0C33D35E1A9296B32CF31005ACD84F20930B47E8
-uid   [ultimate] Xinrong Meng (CODE SIGNING KEY) 
-sub   rsa4096 2022-08-16 [E]
+pub   rsa4096 2023-02-21 [SC]
+  CC68B3D16FE33A766705160BA7E57908C7A4E1B1
+uid   [ultimate] Xinrong Meng (RELEASE SIGNING KEY) 

+sub   rsa4096 2023-02-21 [E]
 -----BEGIN PGP PUBLIC KEY BLOCK-----
 
-mQINBGL64s8BEADCeefEm9XB63o/xIGpnwurEL24h5LsZdA7k7juZ5C1Fu6m5amT
-0A1n49YncYv6jDQD8xh+eiZ11+mYEAzkmGD+aVEMQA0/Zrp0rMe22Ymq5fQHfRCO
-88sQl4PvmqaElcAswFz7RP+55GWSIfEbZIJhZQdukaVCZuC+Xpb68TAj2OSXZ+Mt
-m8RdJXIJpmD0P6R7bvY4LPZL8tY7wtnxUj1I9wRnXc0AnbPfI6gGyF+b0x54b4Ey
-2+sZ6tNH501I9hgdEOWj+nqQFZTTzZQPI1r3nPIA28T9VDOKi5dmoI6iXFjCWZ2N
-dmsw8GN+45V1udOgylE2Mop7URzOQYlqaFnJvXzO/nZhAqbetrMmZ6jmlbqLEq/D
-C8cgYFuMwER3oAC0OwpSz2HLCya95xHDdPqX+Iag0h0bbFBxSNpgzQiUk1mvSYXa
-+7HGQ3rIfy7+87hA1BIHaN0L1oOw37UWk2IGDvS29JlGJ3SJDX5Ir5uBvW6k9So6
-xG9vT+l+R878rLcjJLJT4Me4pk4z8O4Uo+IY0uptiTYnvYRXBOw9wk9KpSckbr+s
-I2keVwa+0fui4c1ESwNHR8HviALho9skvwaCAP3TUZ43SHeDU840M9LwDWc6VNc1
-x30YbgYeKtyU1deh7pcBhykUJPrZ457OllG8SbnhAncwmf8TaJjUkQARAQAB
-tDRYaW5yb25nIE1lbmcgKENPREUgU0lHTklORyBLRVkpIDx4aW5yb25nQGFwYWNo
-ZS5vcmc+iQJOBBMBCAA4FiEEDDPTXhqSlrMs8xAFrNhPIJMLR+gFAmL64s8CGwMF
-CwkIBwIGFQoJCAsCBBYCAwECHgECF4AACgkQrNhPIJMLR+gNSRAAkhNM7vAFRwaX
-MachhS97+L2ZklerzeZuCP0zeYZ9gZloGUx+eM3MWOglUcKH0f6DjPitMMCr1Qbo
-OsENANTS5ZOp4r4rhbbNhYbA8Wbx8H+ZABmCuUNJMjmeVh3qL1WmHclApegqxiSH
-uc9xXB1RZOJH2pS2v7UXW2c/Y745oT/YxWX9hBeJUPWmg6M6jn1/osnqmUngXSvB
-HNzxzHT1gJJNEcRU3r5bKAJlLWBZzLO4pIgtFqIfpS79ieG54OwedrW3oqOheFKa
-LTYInFAdscmZwIo8jHakqf+UMu3H5dzABBRATDvcci7nBPi+J8F7qLvklzb1zd0L
-Ir/QnAy3zFUYUbwwRXDy0Gi0HsU5xP9QYT3pmtW3I+Xlwpso417XoE+1DYtizjbx
-FuJaSNs7K7VPaELezdvtFL0SGYNkpxz7EiVcW6TxmLsLBoNAeaKhHYtwhblQKznv
-6mEbjmiAo3oB68ghI+3xW2mZ+T+t3sgl5aNWiZ6RQx5v4liYc4vShmewcKGWvN7T
-RC5Ert0GxMJGsx7fIRAgWDOI1aMj5bx9H23d3RKxJWrRCXhSlg1lyzVj+GCrhYAy
-16/JH5ph0m+FCVwAP0GhHsZCQV1AT+YL7lgEZvmGq0ucDShc69lLh7qsxMg7zckk
-l66F14Imuz0EasVCdI3IwkuTFch9Quu5Ag0EYvrizwEQANpINEPd+Vio1D0opPBO
-Sa4keWk5IvvGETt6jUBemQten1gOB89Zba3E8ZgJpPobaThFrpsQJ9wNM8+KBHGm
-U+DTP+JC+65J9Eq6KA8qcH2jn3xKBWipWUACKUCvpFSNq63f3+RVbAyTYdykRhEU
-Ih+7eFtl3X0Q6v92TMZL26euXqt73UoOsoulKEmfSyhiQBQX7WNCtq3JR/mZ4+OA
-/N3J7qw+emvKG3t8h3/5CtpZWEMaJwaGyyENScsw5KEOYjl9o11mMeYRYfZ0n0h7
-DA8BmBl/k71+UvdopdzuwjRib02uZfdCC15tltLpoVeL/pa0GRmTRuCJARwjDD95
-xbrrYYqw2wD6l3Mtv/EooIBdzGpP15VnD4DFC5W9vxnxuEfSnX0DxCObsd6MCzZw
-GOiF4HudfFzB2SiE/OXNaAxdpSD9C8n0Y3ac74dk6uamzCkSnCjzzAOytFZY18fi
-N5ihDA9+2TeEOL0RVrQw0Mdc4X80A1dlCJ6Gh1Py4WOtDxB5UmSY2olvV6p5pRRD
-1HEnM9bivPdEErYpUI72K4L5feXFxt/obQ0rZMmmnYMldAcPcqsTMVgPWZICK/z0
-X/SrOR0YEa28XA+V69o4TwPR77oUK6t3SiFzAi3VmQtAP6NkqL+FNMa0V1ZiEPse
-lZhKVziNh5Jb8bnkQA6+9Md3ABEBAAGJAjYEGAEIACAWIQQMM9NeGpKWsyzzEAWs
-2E8gkwtH6AUCYvrizwIbDAAKCRCs2E8gkwtH6OYIEACtPjMCg+x+vxVU8KhqwxpA
-UyDOuNbzB2TSMmETgGqHDqk/F4eSlMvZTukGlo5yPDYXhd7vUT45mrlRq8ljzBLr
-NkX2mkGgocdjAjSF2rgugMb+APpKNFxZtUPKosyyOPS9z4+4tjxfCpj2u2hZy8PD
-C3/6dz9Yga0kgWu2GWFZFFZiGxPyUCkjnUBWz53dT/1JwWt3W81bihVfhLX9CVgO
-KPEoZ96BaEucAHY0r/yq0zAq/+DCTYRrDLkeuZaDTB1RThWOrW+GCoPcIxbLi4/j
-/YkIGQCaYvpVsuacklwqhSxhucqctRklGHLrjLdxrqcS1pIfraCsRJazUoO1Uu7n
-DQ/aF9fczzX9nKv7t341lGn+Ujv5EEuaA/y38XSffsHxCmpEcvjGAH0NZsjHbYd/
-abeFTAnMV1r2r9/UcyuosEsaRyjW4Ljd51wWyGVv4Ky40HJYRmtefJX+1QDAntPJ
-lVPHQCa2B/YIDrFeokXFxDqONkA+fFm+lDb83lhAAhjxCwfbytZqJFTvYh7TQTLx
-3+ZA1BoFhxIHnR2mrFK+yqny9w6YAeZ8YMG5edH1EKoNVfic7OwwId1eQL6FCKCv
-F3sNZiCC3i7P6THg9hZSF1eNbfiuZuMxUbw3OZgYhyXLB023vEZ1mUQCAcbfsQxU
-sw6Rs2zVSxvPcg5CN8APig==
-=fujW
+mQINBGP0Hf0BEACyHWHb/DyfpkIC64sJQKR7GGLBicFOxsVNYrxxcZJvdnfjFnHC
+ajib6m6dIQ5g+YgH23U/jIpHhZbXLWrQkyuYW4JbaG8uobK5S7crAqpYjtwRJHRe
+R4f8DO6nWUNxZGHYFU46zvt7GuBjN005u+X2Oxq9xau+CVgkS1r/vbykxDwGOcYM
+/vmgITo+Zk2zs2Krea+ul0aVZRvhGB8ZHHSdz83NTDm0DwlzALFodLWIRvSblqtZ
+SPVKntzmN6OYjVjPMK6HgLlVlH2WqOIexuZnbadioM6+Hg/eihXQVLU7wpBBliFA
+KTUnCNRRxEF8M7zPKEpyQbV2KJqMLdGLpE+ZEfzOKUxbCBmzF1MQ5Pxm4mm8RlvA
+DDoOI/I3IstoizsxI6hV7U3w22R4c++qmFtX/lzgDnCKfISBTQaofiVlvMg7fx+f
+7bA1oJxlMJMpjNO9s3qudMAxtrSzHUnIt2ThsxcsL+wfu/HxvR1+PfX6eCCXaVjN
+/ii0EkWbHBq6Jb1IDzKuU02oX0TWQisDqn+IHq8/Q46PH3H2nF6hfg8zJXMkTusc
+T8AmCoQCeVEPMbnVTWW9sVJC2gQPrCQJHEUbu5OHb9REtJ3GqtRw+mogTrpO5ads
+PO61a94fJQcTDgR59hShrXiXxUK07C/rXqexcVnXEZyfn/5ZnqmgdVNt2wARAQAB
+tDdYaW5yb25nIE1lbmcgKFJFTEVBU0UgU0lHTklORyBLRVkpIDx4aW5yb25nQGFw
+YWNoZS5vcmc+iQJOBBMBCgA4FiEEzGiz0W/jOnZnBRYLp+V5CMek4bEFAmP0Hf0C
+GwMFCwkIBwIGFQoJCAsCBBYCAwECHgECF4AACgkQp+V5CMek4bFlWg//YIN9HNQ2
+yj3gW9lXVTWtSzJvlnwZr5V9JBGevpWMNF3U38Dk0nlQUiSvHdpfQjIyITOYR9Iv
+GxuZCp5szVaRc00pfQWFy684zLvwqrjKekLzCpkqTOGXHO2RxeJH2ZBqcI9OSpR5
+B2J94dlQItM/bKsXhMNOwmVtS6kSW36aN/0Nd9ZQF
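
This KEYS update publishes the new RELEASE SIGNING KEY used for the 3.4.0 release candidates. A minimal sketch of checking a downloaded artifact against KEYS with the third-party `python-gnupg` package (file names assumed to be in the current directory; an illustration, not the official verification procedure):

```py
import gnupg  # pip install python-gnupg; wraps a local gpg binary

gpg = gnupg.GPG()
# Import every public key published in the KEYS file.
with open("KEYS") as keys:
    gpg.import_keys(keys.read())

# Check the detached .asc signature against the artifact it covers.
with open("pyspark-3.4.0.tar.gz.asc", "rb") as sig:
    result = gpg.verify_file(sig, "pyspark-3.4.0.tar.gz")
print(result.valid, result.key_id)
```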

svn commit: r60249 - /dev/spark/KEYS

2023-02-21 Thread xinrong
Author: xinrong
Date: Wed Feb 22 03:51:47 2023
New Revision: 60249

Log:
Update KEYS

Modified:
dev/spark/KEYS

Modified: dev/spark/KEYS
==
--- dev/spark/KEYS (original)
+++ dev/spark/KEYS Wed Feb 22 03:51:47 2023
@@ -1848,4 +1848,61 @@ P+3d/bY7eHLaFnkIuQR2dzaJti/nf2b/7VQHLm6H
 Y2wH1LgDJJsoBLPFNxhgTLjMlErwsZlacmXyogrmOS+ZvgQz/LZ1mIryTAkd1Gym
 JznYPjY83fSKkeCh
 =3Ggj
------END PGP PUBLIC KEY BLOCK-----
\ No newline at end of file
+-----END PGP PUBLIC KEY BLOCK-----
+
+pub   rsa4096 2022-08-16 [SC]
+  0C33D35E1A9296B32CF31005ACD84F20930B47E8
+uid   [ultimate] Xinrong Meng (CODE SIGNING KEY) 
+sub   rsa4096 2022-08-16 [E]
+-----BEGIN PGP PUBLIC KEY BLOCK-----
+
+mQINBGL64s8BEADCeefEm9XB63o/xIGpnwurEL24h5LsZdA7k7juZ5C1Fu6m5amT
+0A1n49YncYv6jDQD8xh+eiZ11+mYEAzkmGD+aVEMQA0/Zrp0rMe22Ymq5fQHfRCO
+88sQl4PvmqaElcAswFz7RP+55GWSIfEbZIJhZQdukaVCZuC+Xpb68TAj2OSXZ+Mt
+m8RdJXIJpmD0P6R7bvY4LPZL8tY7wtnxUj1I9wRnXc0AnbPfI6gGyF+b0x54b4Ey
+2+sZ6tNH501I9hgdEOWj+nqQFZTTzZQPI1r3nPIA28T9VDOKi5dmoI6iXFjCWZ2N
+dmsw8GN+45V1udOgylE2Mop7URzOQYlqaFnJvXzO/nZhAqbetrMmZ6jmlbqLEq/D
+C8cgYFuMwER3oAC0OwpSz2HLCya95xHDdPqX+Iag0h0bbFBxSNpgzQiUk1mvSYXa
++7HGQ3rIfy7+87hA1BIHaN0L1oOw37UWk2IGDvS29JlGJ3SJDX5Ir5uBvW6k9So6
+xG9vT+l+R878rLcjJLJT4Me4pk4z8O4Uo+IY0uptiTYnvYRXBOw9wk9KpSckbr+s
+I2keVwa+0fui4c1ESwNHR8HviALho9skvwaCAP3TUZ43SHeDU840M9LwDWc6VNc1
+x30YbgYeKtyU1deh7pcBhykUJPrZ457OllG8SbnhAncwmf8TaJjUkQARAQAB
+tDRYaW5yb25nIE1lbmcgKENPREUgU0lHTklORyBLRVkpIDx4aW5yb25nQGFwYWNo
+ZS5vcmc+iQJOBBMBCAA4FiEEDDPTXhqSlrMs8xAFrNhPIJMLR+gFAmL64s8CGwMF
+CwkIBwIGFQoJCAsCBBYCAwECHgECF4AACgkQrNhPIJMLR+gNSRAAkhNM7vAFRwaX
+MachhS97+L2ZklerzeZuCP0zeYZ9gZloGUx+eM3MWOglUcKH0f6DjPitMMCr1Qbo
+OsENANTS5ZOp4r4rhbbNhYbA8Wbx8H+ZABmCuUNJMjmeVh3qL1WmHclApegqxiSH
+uc9xXB1RZOJH2pS2v7UXW2c/Y745oT/YxWX9hBeJUPWmg6M6jn1/osnqmUngXSvB
+HNzxzHT1gJJNEcRU3r5bKAJlLWBZzLO4pIgtFqIfpS79ieG54OwedrW3oqOheFKa
+LTYInFAdscmZwIo8jHakqf+UMu3H5dzABBRATDvcci7nBPi+J8F7qLvklzb1zd0L
+Ir/QnAy3zFUYUbwwRXDy0Gi0HsU5xP9QYT3pmtW3I+Xlwpso417XoE+1DYtizjbx
+FuJaSNs7K7VPaELezdvtFL0SGYNkpxz7EiVcW6TxmLsLBoNAeaKhHYtwhblQKznv
+6mEbjmiAo3oB68ghI+3xW2mZ+T+t3sgl5aNWiZ6RQx5v4liYc4vShmewcKGWvN7T
+RC5Ert0GxMJGsx7fIRAgWDOI1aMj5bx9H23d3RKxJWrRCXhSlg1lyzVj+GCrhYAy
+16/JH5ph0m+FCVwAP0GhHsZCQV1AT+YL7lgEZvmGq0ucDShc69lLh7qsxMg7zckk
+l66F14Imuz0EasVCdI3IwkuTFch9Quu5Ag0EYvrizwEQANpINEPd+Vio1D0opPBO
+Sa4keWk5IvvGETt6jUBemQten1gOB89Zba3E8ZgJpPobaThFrpsQJ9wNM8+KBHGm
+U+DTP+JC+65J9Eq6KA8qcH2jn3xKBWipWUACKUCvpFSNq63f3+RVbAyTYdykRhEU
+Ih+7eFtl3X0Q6v92TMZL26euXqt73UoOsoulKEmfSyhiQBQX7WNCtq3JR/mZ4+OA
+/N3J7qw+emvKG3t8h3/5CtpZWEMaJwaGyyENScsw5KEOYjl9o11mMeYRYfZ0n0h7
+DA8BmBl/k71+UvdopdzuwjRib02uZfdCC15tltLpoVeL/pa0GRmTRuCJARwjDD95
+xbrrYYqw2wD6l3Mtv/EooIBdzGpP15VnD4DFC5W9vxnxuEfSnX0DxCObsd6MCzZw
+GOiF4HudfFzB2SiE/OXNaAxdpSD9C8n0Y3ac74dk6uamzCkSnCjzzAOytFZY18fi
+N5ihDA9+2TeEOL0RVrQw0Mdc4X80A1dlCJ6Gh1Py4WOtDxB5UmSY2olvV6p5pRRD
+1HEnM9bivPdEErYpUI72K4L5feXFxt/obQ0rZMmmnYMldAcPcqsTMVgPWZICK/z0
+X/SrOR0YEa28XA+V69o4TwPR77oUK6t3SiFzAi3VmQtAP6NkqL+FNMa0V1ZiEPse
+lZhKVziNh5Jb8bnkQA6+9Md3ABEBAAGJAjYEGAEIACAWIQQMM9NeGpKWsyzzEAWs
+2E8gkwtH6AUCYvrizwIbDAAKCRCs2E8gkwtH6OYIEACtPjMCg+x+vxVU8KhqwxpA
+UyDOuNbzB2TSMmETgGqHDqk/F4eSlMvZTukGlo5yPDYXhd7vUT45mrlRq8ljzBLr
+NkX2mkGgocdjAjSF2rgugMb+APpKNFxZtUPKosyyOPS9z4+4tjxfCpj2u2hZy8PD
+C3/6dz9Yga0kgWu2GWFZFFZiGxPyUCkjnUBWz53dT/1JwWt3W81bihVfhLX9CVgO
+KPEoZ96BaEucAHY0r/yq0zAq/+DCTYRrDLkeuZaDTB1RThWOrW+GCoPcIxbLi4/j
+/YkIGQCaYvpVsuacklwqhSxhucqctRklGHLrjLdxrqcS1pIfraCsRJazUoO1Uu7n
+DQ/aF9fczzX9nKv7t341lGn+Ujv5EEuaA/y38XSffsHxCmpEcvjGAH0NZsjHbYd/
+abeFTAnMV1r2r9/UcyuosEsaRyjW4Ljd51wWyGVv4Ky40HJYRmtefJX+1QDAntPJ
+lVPHQCa2B/YIDrFeokXFxDqONkA+fFm+lDb83lhAAhjxCwfbytZqJFTvYh7TQTLx
+3+ZA1BoFhxIHnR2mrFK+yqny9w6YAeZ8YMG5edH1EKoNVfic7OwwId1eQL6FCKCv
+F3sNZiCC3i7P6THg9hZSF1eNbfiuZuMxUbw3OZgYhyXLB023vEZ1mUQCAcbfsQxU
+sw6Rs2zVSxvPcg5CN8APig==
+=fujW
+-----END PGP PUBLIC KEY BLOCK-----






svn commit: r60241 - in /dev/spark/v3.4.0-rc1-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/R/articles/ _site/api/R/deps/ _site/api/R/deps/bootstrap-5.2.2/ _site/api/R/deps/jquery-3.6.0/ _site/api

2023-02-21 Thread xinrong
Author: xinrong
Date: Tue Feb 21 13:34:14 2023
New Revision: 60241

Log:
Apache Spark v3.4.0-rc1 docs


[This commit notification would consist of 2806 parts, 
which exceeds the limit of 50 ones, so it was shortened to the summary.]




svn commit: r60238 - /dev/spark/v3.4.0-rc1-bin/

2023-02-21 Thread xinrong
Author: xinrong
Date: Tue Feb 21 11:57:55 2023
New Revision: 60238

Log:
Apache Spark v3.4.0-rc1

Added:
dev/spark/v3.4.0-rc1-bin/
dev/spark/v3.4.0-rc1-bin/SparkR_3.4.0.tar.gz   (with props)
dev/spark/v3.4.0-rc1-bin/SparkR_3.4.0.tar.gz.asc
dev/spark/v3.4.0-rc1-bin/SparkR_3.4.0.tar.gz.sha512
dev/spark/v3.4.0-rc1-bin/pyspark-3.4.0.tar.gz   (with props)
dev/spark/v3.4.0-rc1-bin/pyspark-3.4.0.tar.gz.asc
dev/spark/v3.4.0-rc1-bin/pyspark-3.4.0.tar.gz.sha512
dev/spark/v3.4.0-rc1-bin/spark-3.4.0-bin-hadoop3-scala2.13.tgz   (with 
props)
dev/spark/v3.4.0-rc1-bin/spark-3.4.0-bin-hadoop3-scala2.13.tgz.asc
dev/spark/v3.4.0-rc1-bin/spark-3.4.0-bin-hadoop3-scala2.13.tgz.sha512
dev/spark/v3.4.0-rc1-bin/spark-3.4.0-bin-hadoop3.tgz   (with props)
dev/spark/v3.4.0-rc1-bin/spark-3.4.0-bin-hadoop3.tgz.asc
dev/spark/v3.4.0-rc1-bin/spark-3.4.0-bin-hadoop3.tgz.sha512
dev/spark/v3.4.0-rc1-bin/spark-3.4.0-bin-without-hadoop.tgz   (with props)
dev/spark/v3.4.0-rc1-bin/spark-3.4.0-bin-without-hadoop.tgz.asc
dev/spark/v3.4.0-rc1-bin/spark-3.4.0-bin-without-hadoop.tgz.sha512
dev/spark/v3.4.0-rc1-bin/spark-3.4.0.tgz   (with props)
dev/spark/v3.4.0-rc1-bin/spark-3.4.0.tgz.asc
dev/spark/v3.4.0-rc1-bin/spark-3.4.0.tgz.sha512

Added: dev/spark/v3.4.0-rc1-bin/SparkR_3.4.0.tar.gz
==
Binary file - no diff available.

Propchange: dev/spark/v3.4.0-rc1-bin/SparkR_3.4.0.tar.gz
--
svn:mime-type = application/octet-stream

Added: dev/spark/v3.4.0-rc1-bin/SparkR_3.4.0.tar.gz.asc
==
--- dev/spark/v3.4.0-rc1-bin/SparkR_3.4.0.tar.gz.asc (added)
+++ dev/spark/v3.4.0-rc1-bin/SparkR_3.4.0.tar.gz.asc Tue Feb 21 11:57:55 2023
@@ -0,0 +1,17 @@
+-----BEGIN PGP SIGNATURE-----
+
+iQJHBAABCgAxFiEEzGiz0W/jOnZnBRYLp+V5CMek4bEFAmP0sVMTHHhpbnJvbmdA
+YXBhY2hlLm9yZwAKCRCn5XkIx6Thsbk1D/4wKDoCUBbr0bOOPpGKbMyWggJQDdvl
+xCDXR5nFFkLdY6vZFerIp32jX1JFQA2Enr24iCBy00ERszFT9LMRP66nOG3OseU1
+6eI4Y4l5ACAD35qdUjFsuPNPy71Q2HqWrY52isMZWfj8TYY9X3T3w9Wox6KgTOon
+rGoOtj+N6tAF5ACvJIX43li8JPesJQNl1epbu2LtrZa+tFyfgQBowuHmhiQ5PQ/v
+EufANZytLWllzX81EfNbiJ9hN9geqIHgXew6b1rtd8IS05PdDimA/uwtP+LqBBqq
+MKfUA6Tf8T9SpN36ZN6/lfOKVKu0OFXc9qfJIj9cdBfhTcoP1vUGVMqNtWEQQFqo
+DZVRnBrnnx5lQOYry3gm4UgdLtHpwqvOZtqpmbvSHV503+JCqBnFnw8jvGzaVfWZ
+OIPa4AuhjAxqMcnCdLHmpg/QcX07/tPXPO0kpEWz7a1QjF6C+gidtbgIghY/HIzs
+lNfI3TdWop3Wwnpa0kHHlwi15jfeaxnPQDtIw/YRWojbztE0wG8rXycoWl2h0o05
+XQ55Rl9qEviW3GPOW52SGAD47+2j3eU6lFEs+xz85E/jxIneYkuweMJ5Vk1iTdEH
+7yfjQqVozR3QeyaYll9W1ax50LUtrMx5vTMdy82L0yzg0NQctqEa+I3HRQjgxVFB
+7gqTLxqG8bpyPA==
+=+Kud
+-----END PGP SIGNATURE-----

Added: dev/spark/v3.4.0-rc1-bin/SparkR_3.4.0.tar.gz.sha512
==
--- dev/spark/v3.4.0-rc1-bin/SparkR_3.4.0.tar.gz.sha512 (added)
+++ dev/spark/v3.4.0-rc1-bin/SparkR_3.4.0.tar.gz.sha512 Tue Feb 21 11:57:55 2023
@@ -0,0 +1 @@
+21574f5fb95f397640c896678002559a10b6e264b3887115128bde380682065e8a3883dd94136c318c78f3047a7cd4a2763b617863686329b47532983f171240
  SparkR_3.4.0.tar.gz
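
Each artifact above ships with a `.sha512` digest. A minimal sketch of checking one locally with only the Python standard library (paths assumed to be in the current directory):

```py
import hashlib

def sha512_of(path, chunk_size=1 << 20):
    # Stream the tarball through SHA-512 so large files fit in memory.
    digest = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

expected = open("SparkR_3.4.0.tar.gz.sha512").read().split()[0]
print("OK" if sha512_of("SparkR_3.4.0.tar.gz") == expected else "MISMATCH")
```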

Added: dev/spark/v3.4.0-rc1-bin/pyspark-3.4.0.tar.gz
==
Binary file - no diff available.

Propchange: dev/spark/v3.4.0-rc1-bin/pyspark-3.4.0.tar.gz
--
svn:mime-type = application/octet-stream

Added: dev/spark/v3.4.0-rc1-bin/pyspark-3.4.0.tar.gz.asc
==
--- dev/spark/v3.4.0-rc1-bin/pyspark-3.4.0.tar.gz.asc (added)
+++ dev/spark/v3.4.0-rc1-bin/pyspark-3.4.0.tar.gz.asc Tue Feb 21 11:57:55 2023
@@ -0,0 +1,17 @@
+-----BEGIN PGP SIGNATURE-----
+
+iQJHBAABCgAxFiEEzGiz0W/jOnZnBRYLp+V5CMek4bEFAmP0sVUTHHhpbnJvbmdA
+YXBhY2hlLm9yZwAKCRCn5XkIx6ThsWbPD/9dWcxjrRR54QccE8zwX5oaiboVFXuI
+0BLahV54IQi4HZjVgRHzbEWD/qaemW5Brcos003nsaGnXT0m0oi656X2967ZuJTk
+zYanrIafACwplVo7uxcq2VBp6IKcDkWEUL42fAcV5GN1/1NpNHqzZqZMGe5ufKLB
+05Np0ac8L6XXMpIG0to6H1LEmAW7/4PBARpzt6/TgZjoEI7a7YHMUlL0OjmHmP/m
+3Ck8slg+Osk2opYJL4AXycFh36Ns43OG3TnhfLYyDG0jtiXpWBZ4Yt2bin55j0f/
+yrDe1lDlRJ14pXay2f/s5eFrz16qHfRluWZzxcEyJjZva1AD5V1XMh/zsRGDfvUZ
+BkEM2GHYn3gZH9uuGfYbqL+pcZgrmVjZMgcZfhjyxLrRW8WBFr9g5lCIQF+4lpU8
+JwM4W3eOLyaC3wpVTfPU8rJfGExeBLhJ7zAyw65+yUx27KMUWatzGuQSA63iE1bg
+FIruQABSDsenFARnLybB8l41t0PTGlWU9+g5E4BlU/+GbnxaQEuOTSnZOenhPOGe
+n2g4Yfr81aYqVX8VKL0wzYXeB39SaXrtGhUaWVjFookNb42SNB1IPG2xQ+qQtcMw
+jv1m+1BIMWXDLZcLlrIViEzoyNhIy83CipDujJpoh4tlXb3OHOJqYuIZjMPhgVcB
+vtJFP8xIOdwRIg==
+=058e
+-----END PGP SIGNATURE-----

Added: dev/spark/v3.4.0-rc1-bin/pyspark-3.4.0.tar.gz.sha512
==
--- dev/spark/v3.4.0-rc1-bin/pyspark-3.4.0.tar.gz.sha512 (added)
+++ dev/spark

[spark] branch branch-3.4 updated (f394322be3b -> 63be7fd7334)

2023-02-21 Thread xinrong
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a change to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


from f394322be3b [SPARK-42507][SQL][TESTS] Simplify ORC schema merging 
conflict error check
 add e2484f626bb Preparing Spark release v3.4.0-rc1
 new 63be7fd7334 Preparing development version 3.4.1-SNAPSHOT

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:





[spark] 01/01: Preparing development version 3.4.1-SNAPSHOT

2023-02-21 Thread xinrong
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git

commit 63be7fd7334111474e79d88c687d376ede30e37f
Author: Xinrong Meng 
AuthorDate: Tue Feb 21 10:39:26 2023 +

Preparing development version 3.4.1-SNAPSHOT
---
 R/pkg/DESCRIPTION  | 2 +-
 assembly/pom.xml   | 2 +-
 common/kvstore/pom.xml | 2 +-
 common/network-common/pom.xml  | 2 +-
 common/network-shuffle/pom.xml | 2 +-
 common/network-yarn/pom.xml| 2 +-
 common/sketch/pom.xml  | 2 +-
 common/tags/pom.xml| 2 +-
 common/unsafe/pom.xml  | 2 +-
 connector/avro/pom.xml | 2 +-
 connector/connect/client/jvm/pom.xml   | 2 +-
 connector/connect/common/pom.xml   | 2 +-
 connector/connect/server/pom.xml   | 2 +-
 connector/docker-integration-tests/pom.xml | 2 +-
 connector/kafka-0-10-assembly/pom.xml  | 2 +-
 connector/kafka-0-10-sql/pom.xml   | 2 +-
 connector/kafka-0-10-token-provider/pom.xml| 2 +-
 connector/kafka-0-10/pom.xml   | 2 +-
 connector/kinesis-asl-assembly/pom.xml | 2 +-
 connector/kinesis-asl/pom.xml  | 2 +-
 connector/protobuf/pom.xml | 2 +-
 connector/spark-ganglia-lgpl/pom.xml   | 2 +-
 core/pom.xml   | 2 +-
 docs/_config.yml   | 6 +++---
 examples/pom.xml   | 2 +-
 graphx/pom.xml | 2 +-
 hadoop-cloud/pom.xml   | 2 +-
 launcher/pom.xml   | 2 +-
 mllib-local/pom.xml| 2 +-
 mllib/pom.xml  | 2 +-
 pom.xml| 2 +-
 python/pyspark/version.py  | 2 +-
 repl/pom.xml   | 2 +-
 resource-managers/kubernetes/core/pom.xml  | 2 +-
 resource-managers/kubernetes/integration-tests/pom.xml | 2 +-
 resource-managers/mesos/pom.xml| 2 +-
 resource-managers/yarn/pom.xml | 2 +-
 sql/catalyst/pom.xml   | 2 +-
 sql/core/pom.xml   | 2 +-
 sql/hive-thriftserver/pom.xml  | 2 +-
 sql/hive/pom.xml   | 2 +-
 streaming/pom.xml  | 2 +-
 tools/pom.xml  | 2 +-
 43 files changed, 45 insertions(+), 45 deletions(-)
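
The bump is mechanical: the same one-line `<version>` element changes in each pom, plus R/pkg/DESCRIPTION, docs/_config.yml, and python/pyspark/version.py. Spark's release tooling under `dev/create-release` drives this; the sketch below only illustrates the shape of the pom rewrite (repo root assumed as the working directory):

```py
import pathlib

OLD, NEW = "3.4.0", "3.4.1-SNAPSHOT"

# Rewrite the spark-parent <version> element in every pom.xml in the tree;
# only the first occurrence is the parent version, as in the diffs below.
for pom in pathlib.Path(".").rglob("pom.xml"):
    text = pom.read_text()
    updated = text.replace(f"<version>{OLD}</version>",
                           f"<version>{NEW}</version>", 1)
    if updated != text:
        pom.write_text(updated)
```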

diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION
index 4a32762b34c..fa7028630a8 100644
--- a/R/pkg/DESCRIPTION
+++ b/R/pkg/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: SparkR
 Type: Package
-Version: 3.4.0
+Version: 3.4.1
 Title: R Front End for 'Apache Spark'
 Description: Provides an R Front end for 'Apache Spark' 
<https://spark.apache.org>.
 Authors@R:
diff --git a/assembly/pom.xml b/assembly/pom.xml
index 58dd9ef46e0..a4111eb64d9 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.0
+3.4.1-SNAPSHOT
 ../pom.xml
   
 
diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml
index 95ea15552da..f9ecfb3d692 100644
--- a/common/kvstore/pom.xml
+++ b/common/kvstore/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.0
+3.4.1-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml
index e4d98471bf9..22ee65b7d25 100644
--- a/common/network-common/pom.xml
+++ b/common/network-common/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.0
+3.4.1-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml
index 7a6d5aedf65..2c67da81ca4 100644
--- a/common/network-shuffle/pom.xml
+++ b/common/network-shuffle/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.0
+3.4.1-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml
index 1c421754083..219682e047d 100644
--- a/common/network-yarn/pom.xml
+++ b/common/network-yarn/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.0
+3.4.1-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/sketch/pom.xml b/common/sketch

[spark] tag v3.4.0-rc1 created (now e2484f626bb)

2023-02-21 Thread xinrong
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a change to tag v3.4.0-rc1
in repository https://gitbox.apache.org/repos/asf/spark.git


  at e2484f626bb (commit)
This tag includes the following new commits:

 new e2484f626bb Preparing Spark release v3.4.0-rc1

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.






[spark] 01/01: Preparing Spark release v3.4.0-rc1

2023-02-21 Thread xinrong
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a commit to tag v3.4.0-rc1
in repository https://gitbox.apache.org/repos/asf/spark.git

commit e2484f626bb338274665a49078b528365ea18c3b
Author: Xinrong Meng 
AuthorDate: Tue Feb 21 10:39:21 2023 +

Preparing Spark release v3.4.0-rc1
---
 R/pkg/DESCRIPTION  | 2 +-
 assembly/pom.xml   | 2 +-
 common/kvstore/pom.xml | 2 +-
 common/network-common/pom.xml  | 2 +-
 common/network-shuffle/pom.xml | 2 +-
 common/network-yarn/pom.xml| 2 +-
 common/sketch/pom.xml  | 2 +-
 common/tags/pom.xml| 2 +-
 common/unsafe/pom.xml  | 2 +-
 connector/avro/pom.xml | 2 +-
 connector/connect/client/jvm/pom.xml   | 2 +-
 connector/connect/common/pom.xml   | 2 +-
 connector/connect/server/pom.xml   | 2 +-
 connector/docker-integration-tests/pom.xml | 2 +-
 connector/kafka-0-10-assembly/pom.xml  | 2 +-
 connector/kafka-0-10-sql/pom.xml   | 2 +-
 connector/kafka-0-10-token-provider/pom.xml| 2 +-
 connector/kafka-0-10/pom.xml   | 2 +-
 connector/kinesis-asl-assembly/pom.xml | 2 +-
 connector/kinesis-asl/pom.xml  | 2 +-
 connector/protobuf/pom.xml | 2 +-
 connector/spark-ganglia-lgpl/pom.xml   | 2 +-
 core/pom.xml   | 2 +-
 docs/_config.yml   | 6 +++---
 examples/pom.xml   | 2 +-
 graphx/pom.xml | 2 +-
 hadoop-cloud/pom.xml   | 2 +-
 launcher/pom.xml   | 2 +-
 mllib-local/pom.xml| 2 +-
 mllib/pom.xml  | 2 +-
 pom.xml| 2 +-
 python/pyspark/version.py  | 2 +-
 repl/pom.xml   | 2 +-
 resource-managers/kubernetes/core/pom.xml  | 2 +-
 resource-managers/kubernetes/integration-tests/pom.xml | 2 +-
 resource-managers/mesos/pom.xml| 2 +-
 resource-managers/yarn/pom.xml | 2 +-
 sql/catalyst/pom.xml   | 2 +-
 sql/core/pom.xml   | 2 +-
 sql/hive-thriftserver/pom.xml  | 2 +-
 sql/hive/pom.xml   | 2 +-
 streaming/pom.xml  | 2 +-
 tools/pom.xml  | 2 +-
 43 files changed, 45 insertions(+), 45 deletions(-)

diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION
index fa7028630a8..4a32762b34c 100644
--- a/R/pkg/DESCRIPTION
+++ b/R/pkg/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: SparkR
 Type: Package
-Version: 3.4.1
+Version: 3.4.0
 Title: R Front End for 'Apache Spark'
 Description: Provides an R Front end for 'Apache Spark' 
<https://spark.apache.org>.
 Authors@R:
diff --git a/assembly/pom.xml b/assembly/pom.xml
index a4111eb64d9..58dd9ef46e0 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.1-SNAPSHOT
+3.4.0
 ../pom.xml
   
 
diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml
index f9ecfb3d692..95ea15552da 100644
--- a/common/kvstore/pom.xml
+++ b/common/kvstore/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.1-SNAPSHOT
+3.4.0
 ../../pom.xml
   
 
diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml
index 22ee65b7d25..e4d98471bf9 100644
--- a/common/network-common/pom.xml
+++ b/common/network-common/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.1-SNAPSHOT
+3.4.0
 ../../pom.xml
   
 
diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml
index 2c67da81ca4..7a6d5aedf65 100644
--- a/common/network-shuffle/pom.xml
+++ b/common/network-shuffle/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.1-SNAPSHOT
+3.4.0
 ../../pom.xml
   
 
diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml
index 219682e047d..1c421754083 100644
--- a/common/network-yarn/pom.xml
+++ b/common/network-yarn/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.1-SNAPSHOT
+3.4.0
 ../../pom.xml
   
 
diff --git a/common/sketch/pom.xml b/common/sketch/pom.xml
inde

[spark] branch branch-3.4 updated: [SPARK-42507][SQL][TESTS] Simplify ORC schema merging conflict error check

2023-02-21 Thread xinrong
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.4 by this push:
 new f394322be3b [SPARK-42507][SQL][TESTS] Simplify ORC schema merging 
conflict error check
f394322be3b is described below

commit f394322be3b9a0451e0dff158129b607549b9160
Author: Dongjoon Hyun 
AuthorDate: Tue Feb 21 17:48:09 2023 +0800

[SPARK-42507][SQL][TESTS] Simplify ORC schema merging conflict error check

### What changes were proposed in this pull request?

This PR aims to simplify the ORC schema merging conflict error check.

### Why are the changes needed?

Currently, `branch-3.4` CI is broken because of the order of partitions.
- https://github.com/apache/spark/runs/11463120795
- https://github.com/apache/spark/runs/11463886897
- https://github.com/apache/spark/runs/11467827738
- https://github.com/apache/spark/runs/11471484144
- https://github.com/apache/spark/runs/11471507531
- https://github.com/apache/spark/runs/11474764316

![Screenshot 2023-02-20 at 12 30 19 PM](https://user-images.githubusercontent.com/9700541/220193503-6d6ce2ce-3fd6-4b01-b91c-bc1ec1f41c03.png)

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Pass the CIs.

Closes #40101 from dongjoon-hyun/SPARK-42507.

Authored-by: Dongjoon Hyun 
Signed-off-by: Xinrong Meng 
(cherry picked from commit 0c20263dcd0c394f8bfd6fa2bfc62031135de06a)
Signed-off-by: Xinrong Meng 
---
 .../spark/sql/execution/datasources/orc/OrcSourceSuite.scala   | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcSourceSuite.scala
 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcSourceSuite.scala
index c821276431e..024f5f6b67e 100644
--- 
a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcSourceSuite.scala
+++ 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcSourceSuite.scala
@@ -455,11 +455,8 @@ abstract class OrcSuite
 throw new UnsupportedOperationException(s"Unknown ORC 
implementation: $impl")
 }
 
-checkError(
-  exception = innerException.asInstanceOf[SparkException],
-  errorClass = "CANNOT_MERGE_INCOMPATIBLE_DATA_TYPE",
-  parameters = Map("left" -> "\"BIGINT\"", "right" -> "\"STRING\"")
-)
+assert(innerException.asInstanceOf[SparkException].getErrorClass ===
+  "CANNOT_MERGE_INCOMPATIBLE_DATA_TYPE")
   }
 
   // it is ok if no schema merging





[spark] branch master updated (0e8a20e6da1 -> 0c20263dcd0)

2023-02-21 Thread xinrong
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 0e8a20e6da1 [SPARK-37099][SQL] Introduce the group limit of Window for 
rank-based filter to optimize top-k computation
 add 0c20263dcd0 [SPARK-42507][SQL][TESTS] Simplify ORC schema merging 
conflict error check

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/execution/datasources/orc/OrcSourceSuite.scala   | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)





[spark] 01/01: Preparing development version 3.4.1-SNAPSHOT

2023-02-20 Thread xinrong
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git

commit 1dfa58d78eba7080a244945c23f7b35b62dde12b
Author: Xinrong Meng 
AuthorDate: Tue Feb 21 02:43:10 2023 +

Preparing development version 3.4.1-SNAPSHOT
---
 R/pkg/DESCRIPTION  | 2 +-
 assembly/pom.xml   | 2 +-
 common/kvstore/pom.xml | 2 +-
 common/network-common/pom.xml  | 2 +-
 common/network-shuffle/pom.xml | 2 +-
 common/network-yarn/pom.xml| 2 +-
 common/sketch/pom.xml  | 2 +-
 common/tags/pom.xml| 2 +-
 common/unsafe/pom.xml  | 2 +-
 connector/avro/pom.xml | 2 +-
 connector/connect/client/jvm/pom.xml   | 2 +-
 connector/connect/common/pom.xml   | 2 +-
 connector/connect/server/pom.xml   | 2 +-
 connector/docker-integration-tests/pom.xml | 2 +-
 connector/kafka-0-10-assembly/pom.xml  | 2 +-
 connector/kafka-0-10-sql/pom.xml   | 2 +-
 connector/kafka-0-10-token-provider/pom.xml| 2 +-
 connector/kafka-0-10/pom.xml   | 2 +-
 connector/kinesis-asl-assembly/pom.xml | 2 +-
 connector/kinesis-asl/pom.xml  | 2 +-
 connector/protobuf/pom.xml | 2 +-
 connector/spark-ganglia-lgpl/pom.xml   | 2 +-
 core/pom.xml   | 2 +-
 docs/_config.yml   | 6 +++---
 examples/pom.xml   | 2 +-
 graphx/pom.xml | 2 +-
 hadoop-cloud/pom.xml   | 2 +-
 launcher/pom.xml   | 2 +-
 mllib-local/pom.xml| 2 +-
 mllib/pom.xml  | 2 +-
 pom.xml| 2 +-
 python/pyspark/version.py  | 2 +-
 repl/pom.xml   | 2 +-
 resource-managers/kubernetes/core/pom.xml  | 2 +-
 resource-managers/kubernetes/integration-tests/pom.xml | 2 +-
 resource-managers/mesos/pom.xml| 2 +-
 resource-managers/yarn/pom.xml | 2 +-
 sql/catalyst/pom.xml   | 2 +-
 sql/core/pom.xml   | 2 +-
 sql/hive-thriftserver/pom.xml  | 2 +-
 sql/hive/pom.xml   | 2 +-
 streaming/pom.xml  | 2 +-
 tools/pom.xml  | 2 +-
 43 files changed, 45 insertions(+), 45 deletions(-)

diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION
index 4a32762b34c..fa7028630a8 100644
--- a/R/pkg/DESCRIPTION
+++ b/R/pkg/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: SparkR
 Type: Package
-Version: 3.4.0
+Version: 3.4.1
 Title: R Front End for 'Apache Spark'
 Description: Provides an R Front end for 'Apache Spark' 
<https://spark.apache.org>.
 Authors@R:
diff --git a/assembly/pom.xml b/assembly/pom.xml
index 58dd9ef46e0..a4111eb64d9 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.0
+3.4.1-SNAPSHOT
 ../pom.xml
   
 
diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml
index 95ea15552da..f9ecfb3d692 100644
--- a/common/kvstore/pom.xml
+++ b/common/kvstore/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.0
+3.4.1-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml
index e4d98471bf9..22ee65b7d25 100644
--- a/common/network-common/pom.xml
+++ b/common/network-common/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.0
+3.4.1-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml
index 7a6d5aedf65..2c67da81ca4 100644
--- a/common/network-shuffle/pom.xml
+++ b/common/network-shuffle/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.0
+3.4.1-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml
index 1c421754083..219682e047d 100644
--- a/common/network-yarn/pom.xml
+++ b/common/network-yarn/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.0
+3.4.1-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/sketch/pom.xml b/common/sketch

[spark] branch branch-3.4 updated (4560d4c4f75 -> 1dfa58d78eb)

2023-02-20 Thread xinrong
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a change to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


from 4560d4c4f75 [SPARK-41952][SQL] Fix Parquet zstd off-heap memory leak 
as a workaround for PARQUET-2160
 add 81d39dcf742 Preparing Spark release v3.4.0-rc1
 new 1dfa58d78eb Preparing development version 3.4.1-SNAPSHOT

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:





[spark] tag v3.4.0-rc1 created (now 81d39dcf742)

2023-02-20 Thread xinrong
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a change to tag v3.4.0-rc1
in repository https://gitbox.apache.org/repos/asf/spark.git


  at 81d39dcf742 (commit)
This tag includes the following new commits:

 new 81d39dcf742 Preparing Spark release v3.4.0-rc1

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.






[spark] 01/01: Preparing Spark release v3.4.0-rc1

2023-02-20 Thread xinrong
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a commit to tag v3.4.0-rc1
in repository https://gitbox.apache.org/repos/asf/spark.git

commit 81d39dcf742ed7114d6e01ecc2487825651e30cb
Author: Xinrong Meng 
AuthorDate: Tue Feb 21 02:43:05 2023 +

Preparing Spark release v3.4.0-rc1
---
 R/pkg/DESCRIPTION  | 2 +-
 assembly/pom.xml   | 2 +-
 common/kvstore/pom.xml | 2 +-
 common/network-common/pom.xml  | 2 +-
 common/network-shuffle/pom.xml | 2 +-
 common/network-yarn/pom.xml| 2 +-
 common/sketch/pom.xml  | 2 +-
 common/tags/pom.xml| 2 +-
 common/unsafe/pom.xml  | 2 +-
 connector/avro/pom.xml | 2 +-
 connector/connect/client/jvm/pom.xml   | 2 +-
 connector/connect/common/pom.xml   | 2 +-
 connector/connect/server/pom.xml   | 2 +-
 connector/docker-integration-tests/pom.xml | 2 +-
 connector/kafka-0-10-assembly/pom.xml  | 2 +-
 connector/kafka-0-10-sql/pom.xml   | 2 +-
 connector/kafka-0-10-token-provider/pom.xml| 2 +-
 connector/kafka-0-10/pom.xml   | 2 +-
 connector/kinesis-asl-assembly/pom.xml | 2 +-
 connector/kinesis-asl/pom.xml  | 2 +-
 connector/protobuf/pom.xml | 2 +-
 connector/spark-ganglia-lgpl/pom.xml   | 2 +-
 core/pom.xml   | 2 +-
 docs/_config.yml   | 6 +++---
 examples/pom.xml   | 2 +-
 graphx/pom.xml | 2 +-
 hadoop-cloud/pom.xml   | 2 +-
 launcher/pom.xml   | 2 +-
 mllib-local/pom.xml| 2 +-
 mllib/pom.xml  | 2 +-
 pom.xml| 2 +-
 python/pyspark/version.py  | 2 +-
 repl/pom.xml   | 2 +-
 resource-managers/kubernetes/core/pom.xml  | 2 +-
 resource-managers/kubernetes/integration-tests/pom.xml | 2 +-
 resource-managers/mesos/pom.xml| 2 +-
 resource-managers/yarn/pom.xml | 2 +-
 sql/catalyst/pom.xml   | 2 +-
 sql/core/pom.xml   | 2 +-
 sql/hive-thriftserver/pom.xml  | 2 +-
 sql/hive/pom.xml   | 2 +-
 streaming/pom.xml  | 2 +-
 tools/pom.xml  | 2 +-
 43 files changed, 45 insertions(+), 45 deletions(-)

diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION
index fa7028630a8..4a32762b34c 100644
--- a/R/pkg/DESCRIPTION
+++ b/R/pkg/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: SparkR
 Type: Package
-Version: 3.4.1
+Version: 3.4.0
 Title: R Front End for 'Apache Spark'
 Description: Provides an R Front end for 'Apache Spark' 
<https://spark.apache.org>.
 Authors@R:
diff --git a/assembly/pom.xml b/assembly/pom.xml
index a4111eb64d9..58dd9ef46e0 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.1-SNAPSHOT
+3.4.0
 ../pom.xml
   
 
diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml
index f9ecfb3d692..95ea15552da 100644
--- a/common/kvstore/pom.xml
+++ b/common/kvstore/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.1-SNAPSHOT
+3.4.0
 ../../pom.xml
   
 
diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml
index 22ee65b7d25..e4d98471bf9 100644
--- a/common/network-common/pom.xml
+++ b/common/network-common/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.1-SNAPSHOT
+3.4.0
 ../../pom.xml
   
 
diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml
index 2c67da81ca4..7a6d5aedf65 100644
--- a/common/network-shuffle/pom.xml
+++ b/common/network-shuffle/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.1-SNAPSHOT
+3.4.0
 ../../pom.xml
   
 
diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml
index 219682e047d..1c421754083 100644
--- a/common/network-yarn/pom.xml
+++ b/common/network-yarn/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.1-SNAPSHOT
+3.4.0
 ../../pom.xml
   
 
diff --git a/common/sketch/pom.xml b/common/sketch/pom.xml
inde

svn commit: r60229 - /dev/spark/v3.4.0-rc1-bin/

2023-02-20 Thread xinrong
Author: xinrong
Date: Tue Feb 21 00:44:12 2023
New Revision: 60229

Log:
Removing RC artifacts.

Removed:
dev/spark/v3.4.0-rc1-bin/





svn commit: r60203 - /dev/spark/v3.4.0-rc1-bin/

2023-02-19 Thread xinrong
Author: xinrong
Date: Mon Feb 20 01:01:42 2023
New Revision: 60203

Log:
Apache Spark v3.4.0-rc1

Added:
dev/spark/v3.4.0-rc1-bin/
dev/spark/v3.4.0-rc1-bin/SparkR_3.4.0.tar.gz   (with props)
dev/spark/v3.4.0-rc1-bin/SparkR_3.4.0.tar.gz.asc
dev/spark/v3.4.0-rc1-bin/SparkR_3.4.0.tar.gz.sha512
dev/spark/v3.4.0-rc1-bin/pyspark-3.4.0.tar.gz   (with props)
dev/spark/v3.4.0-rc1-bin/pyspark-3.4.0.tar.gz.asc
dev/spark/v3.4.0-rc1-bin/pyspark-3.4.0.tar.gz.sha512
dev/spark/v3.4.0-rc1-bin/spark-3.4.0-bin-hadoop3-scala2.13.tgz   (with 
props)
dev/spark/v3.4.0-rc1-bin/spark-3.4.0-bin-hadoop3-scala2.13.tgz.asc
dev/spark/v3.4.0-rc1-bin/spark-3.4.0-bin-hadoop3-scala2.13.tgz.sha512
dev/spark/v3.4.0-rc1-bin/spark-3.4.0-bin-hadoop3.tgz   (with props)
dev/spark/v3.4.0-rc1-bin/spark-3.4.0-bin-hadoop3.tgz.asc
dev/spark/v3.4.0-rc1-bin/spark-3.4.0-bin-hadoop3.tgz.sha512
dev/spark/v3.4.0-rc1-bin/spark-3.4.0-bin-without-hadoop.tgz   (with props)
dev/spark/v3.4.0-rc1-bin/spark-3.4.0-bin-without-hadoop.tgz.asc
dev/spark/v3.4.0-rc1-bin/spark-3.4.0-bin-without-hadoop.tgz.sha512
dev/spark/v3.4.0-rc1-bin/spark-3.4.0.tgz   (with props)
dev/spark/v3.4.0-rc1-bin/spark-3.4.0.tgz.asc
dev/spark/v3.4.0-rc1-bin/spark-3.4.0.tgz.sha512

Added: dev/spark/v3.4.0-rc1-bin/SparkR_3.4.0.tar.gz
==
Binary file - no diff available.

Propchange: dev/spark/v3.4.0-rc1-bin/SparkR_3.4.0.tar.gz
--
svn:mime-type = application/octet-stream

Added: dev/spark/v3.4.0-rc1-bin/SparkR_3.4.0.tar.gz.asc
==
--- dev/spark/v3.4.0-rc1-bin/SparkR_3.4.0.tar.gz.asc (added)
+++ dev/spark/v3.4.0-rc1-bin/SparkR_3.4.0.tar.gz.asc Mon Feb 20 01:01:42 2023
@@ -0,0 +1,17 @@
+-----BEGIN PGP SIGNATURE-----
+
+iQJHBAABCgAxFiEEDDPTXhqSlrMs8xAFrNhPIJMLR+gFAmPxf5cTHHhpbnJvbmdA
+YXBhY2hlLm9yZwAKCRCs2E8gkwtH6I6gEACmdKxXlIrG6Nzi7Hv8Xie11LIRzVUP
+59kSQ/bOYEdloW5gx5nLg+Cpcwh+yvgEvT0clvTNznXD4NEDRuS9XyPsRoXos+Ct
+YL/xJo23+3zX1/OGE4P/fi7NXrgC3GmX3KKzpn3RkKuC6QRh6U1R1jlkl896LcHK
+fOcLDuLCAKA6fy+EmlkX6H4sZGGLM5b2gYJcukvbA8bH5kdyWF2mPgprYwVUtryE
+UfciZ9O5BSaawA5fo2MTmaI/9JAN9j1Vnxg+CQVnDN9arnQMp/0PegblyEa7ZRjt
+ww8r/Ylq5F9Yi1wFLLhkgyF7KzLQtO8Bl/ar1UoDhWnTnNaAEUbEtVCN2Wy1E1y/
+BK2nKYzNM3cqXnLXMyXxSVVl6Cx4NXVpDxt94VlvO5S+ijFmyd2DyN2G/MCF9yJg
+IQcad+vVtt6BdXbmFW+lD4eVFtXbX+eKrDPVKLMYCaWyTZkw3aCachSprjJabX0l
+ph4ogML8iOVQiODobKzI+S4EXRMe5KDD9VXAVbN+1jOyTdnU7WYqSWI3rh7BGBwO
+ihwBOHOjI+dkr0awBTmDKMXWaLeUYiDfXqeoVxNtXJ7SptPJcfkd47XpR9Tgw6yU
+oxYMHLMrYYAC6qFMxjWbJz029FJxBvRJCmynQPCd7p0tmPL0qteqGymckjGUv8ko
+TdJcHjdc2+UyeQ==
+=TUhq
+-----END PGP SIGNATURE-----

Added: dev/spark/v3.4.0-rc1-bin/SparkR_3.4.0.tar.gz.sha512
==
--- dev/spark/v3.4.0-rc1-bin/SparkR_3.4.0.tar.gz.sha512 (added)
+++ dev/spark/v3.4.0-rc1-bin/SparkR_3.4.0.tar.gz.sha512 Mon Feb 20 01:01:42 2023
@@ -0,0 +1 @@
+38b2b86698d182620785b8f34d6f9a35e0a7f2ae2208e999cece2928ff66d50e75c621ce35189610d830f2475c2c134c3be5d4460050da65da23523d88707ceb
  SparkR_3.4.0.tar.gz

Added: dev/spark/v3.4.0-rc1-bin/pyspark-3.4.0.tar.gz
==
Binary file - no diff available.

Propchange: dev/spark/v3.4.0-rc1-bin/pyspark-3.4.0.tar.gz
--
svn:mime-type = application/octet-stream

Added: dev/spark/v3.4.0-rc1-bin/pyspark-3.4.0.tar.gz.asc
==
--- dev/spark/v3.4.0-rc1-bin/pyspark-3.4.0.tar.gz.asc (added)
+++ dev/spark/v3.4.0-rc1-bin/pyspark-3.4.0.tar.gz.asc Mon Feb 20 01:01:42 2023
@@ -0,0 +1,17 @@
+-----BEGIN PGP SIGNATURE-----
+
+iQJHBAABCgAxFiEEDDPTXhqSlrMs8xAFrNhPIJMLR+gFAmPxf5wTHHhpbnJvbmdA
+YXBhY2hlLm9yZwAKCRCs2E8gkwtH6F2ZEACP0qBBbAv0z3lbq2Hvn3jeWyZVWbBy
+BVWvfadOOKqKeC9VAgdfY6t6WT8yti0g5Ax+WqmgWHHLgjOKRECTWdlaSqD5m9bh
+ALNphiKafoQjneqkwegNuN4uWNikGQzmCGqJLQG7bGy+9NoO2ib/pN6an4bmIxtb
+uqdglfB7bC+MXB4YKdqyW5LfE1gi3diSXngBdU0p0nBqsDiUcC+gCZPIt8z5AN8i
+c9rNoFrEEZ3jb14335AtkIufP6ebK2YT/1NF/FdirNB1hgtAfIRREi7jzptAuHYt
+jDvuNxo6O2+G80ExbK0z7Ab3Qv3seSzLJYaIalRSAIn+NqH60g9PRv1/80FYLVUv
+VYKKf4Y+KqGn4/rwaxWiUL1ggkbcbay1cpbJWxMc1ARKO1uUaTwjgEPoNEIXg0uU
+VYsQwfS61Tp+wkRLFQ/2yXp5S4kOgI+gyOpe2QVXioJvtgUc3CWCWBOsRvPUOLQt
+wv91pnqu+m7YcUfOmosJvtQudBCT/STz1fnMCug0YygWMj6u5QhTXpbj+UycOVkq
+Q0TvFe+kDsptQWKX2uHlYOvBA8CfzVDeauoDTvEOwx4lxPB1C6GZ1LrD/RTk5SEh
+5r8Wotul5JdbCxHpynqcDruGXBZv2SOa7ChF8q8S6CdrSxLdWWPekt0Q0zzg63cJ
+n4x/dQdcXBDaXA==
+=O8hd
+-----END PGP SIGNATURE-----

Added: dev/spark/v3.4.0-rc1-bin/pyspark-3.4.0.tar.gz.sha512
==
--- dev/spark/v3.4.0-rc1-bin/pyspark-3.4.0.tar.gz.sha512 (added)
+++ dev/spark

svn commit: r60202 - /dev/spark/v3.4.0-rc1-bin/

2023-02-19 Thread xinrong
Author: xinrong
Date: Mon Feb 20 00:49:21 2023
New Revision: 60202

Log:
Removing RC artifacts.

Removed:
dev/spark/v3.4.0-rc1-bin/





[spark] branch branch-3.4 updated (2b54f076794 -> fdbc57aaf43)

2023-02-18 Thread xinrong
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a change to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


from 2b54f076794 [SPARK-42430][DOC][FOLLOW-UP] Revise the java doc for 
TimestampNTZ & ANSI interval types
 add 96cff939031 Preparing Spark release v3.4.0-rc1
 new fdbc57aaf43 Preparing development version 3.4.1-SNAPSHOT

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:





[spark] 01/01: Preparing development version 3.4.1-SNAPSHOT

2023-02-18 Thread xinrong
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git

commit fdbc57aaf431745ced4a1bea4057553e0c939d32
Author: Xinrong Meng 
AuthorDate: Sat Feb 18 12:12:49 2023 +

Preparing development version 3.4.1-SNAPSHOT
---
 R/pkg/DESCRIPTION  | 2 +-
 assembly/pom.xml   | 2 +-
 common/kvstore/pom.xml | 2 +-
 common/network-common/pom.xml  | 2 +-
 common/network-shuffle/pom.xml | 2 +-
 common/network-yarn/pom.xml| 2 +-
 common/sketch/pom.xml  | 2 +-
 common/tags/pom.xml| 2 +-
 common/unsafe/pom.xml  | 2 +-
 connector/avro/pom.xml | 2 +-
 connector/connect/client/jvm/pom.xml   | 2 +-
 connector/connect/common/pom.xml   | 2 +-
 connector/connect/server/pom.xml   | 2 +-
 connector/docker-integration-tests/pom.xml | 2 +-
 connector/kafka-0-10-assembly/pom.xml  | 2 +-
 connector/kafka-0-10-sql/pom.xml   | 2 +-
 connector/kafka-0-10-token-provider/pom.xml| 2 +-
 connector/kafka-0-10/pom.xml   | 2 +-
 connector/kinesis-asl-assembly/pom.xml | 2 +-
 connector/kinesis-asl/pom.xml  | 2 +-
 connector/protobuf/pom.xml | 2 +-
 connector/spark-ganglia-lgpl/pom.xml   | 2 +-
 core/pom.xml   | 2 +-
 docs/_config.yml   | 6 +++---
 examples/pom.xml   | 2 +-
 graphx/pom.xml | 2 +-
 hadoop-cloud/pom.xml   | 2 +-
 launcher/pom.xml   | 2 +-
 mllib-local/pom.xml| 2 +-
 mllib/pom.xml  | 2 +-
 pom.xml| 2 +-
 python/pyspark/version.py  | 2 +-
 repl/pom.xml   | 2 +-
 resource-managers/kubernetes/core/pom.xml  | 2 +-
 resource-managers/kubernetes/integration-tests/pom.xml | 2 +-
 resource-managers/mesos/pom.xml| 2 +-
 resource-managers/yarn/pom.xml | 2 +-
 sql/catalyst/pom.xml   | 2 +-
 sql/core/pom.xml   | 2 +-
 sql/hive-thriftserver/pom.xml  | 2 +-
 sql/hive/pom.xml   | 2 +-
 streaming/pom.xml  | 2 +-
 tools/pom.xml  | 2 +-
 43 files changed, 45 insertions(+), 45 deletions(-)

diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION
index 4a32762b34c..fa7028630a8 100644
--- a/R/pkg/DESCRIPTION
+++ b/R/pkg/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: SparkR
 Type: Package
-Version: 3.4.0
+Version: 3.4.1
 Title: R Front End for 'Apache Spark'
 Description: Provides an R Front end for 'Apache Spark' 
<https://spark.apache.org>.
 Authors@R:
diff --git a/assembly/pom.xml b/assembly/pom.xml
index 58dd9ef46e0..a4111eb64d9 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.0
+3.4.1-SNAPSHOT
 ../pom.xml
   
 
diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml
index 95ea15552da..f9ecfb3d692 100644
--- a/common/kvstore/pom.xml
+++ b/common/kvstore/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.0
+3.4.1-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml
index e4d98471bf9..22ee65b7d25 100644
--- a/common/network-common/pom.xml
+++ b/common/network-common/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.0
+3.4.1-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml
index 7a6d5aedf65..2c67da81ca4 100644
--- a/common/network-shuffle/pom.xml
+++ b/common/network-shuffle/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.0
+3.4.1-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml
index 1c421754083..219682e047d 100644
--- a/common/network-yarn/pom.xml
+++ b/common/network-yarn/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.0
+3.4.1-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/sketch/pom.xml b/common/sketch

[spark] 01/01: Preparing Spark release v3.4.0-rc1

2023-02-18 Thread xinrong
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a commit to tag v3.4.0-rc1
in repository https://gitbox.apache.org/repos/asf/spark.git

commit 96cff93903153a3bcdca02d346daa9d65614d00a
Author: Xinrong Meng 
AuthorDate: Sat Feb 18 12:11:25 2023 +

Preparing Spark release v3.4.0-rc1
---
 R/pkg/DESCRIPTION  | 2 +-
 assembly/pom.xml   | 2 +-
 common/kvstore/pom.xml | 2 +-
 common/network-common/pom.xml  | 2 +-
 common/network-shuffle/pom.xml | 2 +-
 common/network-yarn/pom.xml| 2 +-
 common/sketch/pom.xml  | 2 +-
 common/tags/pom.xml| 2 +-
 common/unsafe/pom.xml  | 2 +-
 connector/avro/pom.xml | 2 +-
 connector/connect/client/jvm/pom.xml   | 2 +-
 connector/connect/common/pom.xml   | 2 +-
 connector/connect/server/pom.xml   | 2 +-
 connector/docker-integration-tests/pom.xml | 2 +-
 connector/kafka-0-10-assembly/pom.xml  | 2 +-
 connector/kafka-0-10-sql/pom.xml   | 2 +-
 connector/kafka-0-10-token-provider/pom.xml| 2 +-
 connector/kafka-0-10/pom.xml   | 2 +-
 connector/kinesis-asl-assembly/pom.xml | 2 +-
 connector/kinesis-asl/pom.xml  | 2 +-
 connector/protobuf/pom.xml | 2 +-
 connector/spark-ganglia-lgpl/pom.xml   | 2 +-
 core/pom.xml   | 2 +-
 docs/_config.yml   | 6 +++---
 examples/pom.xml   | 2 +-
 graphx/pom.xml | 2 +-
 hadoop-cloud/pom.xml   | 2 +-
 launcher/pom.xml   | 2 +-
 mllib-local/pom.xml| 2 +-
 mllib/pom.xml  | 2 +-
 pom.xml| 2 +-
 python/pyspark/version.py  | 2 +-
 repl/pom.xml   | 2 +-
 resource-managers/kubernetes/core/pom.xml  | 2 +-
 resource-managers/kubernetes/integration-tests/pom.xml | 2 +-
 resource-managers/mesos/pom.xml| 2 +-
 resource-managers/yarn/pom.xml | 2 +-
 sql/catalyst/pom.xml   | 2 +-
 sql/core/pom.xml   | 2 +-
 sql/hive-thriftserver/pom.xml  | 2 +-
 sql/hive/pom.xml   | 2 +-
 streaming/pom.xml  | 2 +-
 tools/pom.xml  | 2 +-
 43 files changed, 45 insertions(+), 45 deletions(-)

diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION
index fa7028630a8..4a32762b34c 100644
--- a/R/pkg/DESCRIPTION
+++ b/R/pkg/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: SparkR
 Type: Package
-Version: 3.4.1
+Version: 3.4.0
 Title: R Front End for 'Apache Spark'
 Description: Provides an R Front end for 'Apache Spark' 
<https://spark.apache.org>.
 Authors@R:
diff --git a/assembly/pom.xml b/assembly/pom.xml
index a4111eb64d9..58dd9ef46e0 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.1-SNAPSHOT
+3.4.0
 ../pom.xml
   
 
diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml
index f9ecfb3d692..95ea15552da 100644
--- a/common/kvstore/pom.xml
+++ b/common/kvstore/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.1-SNAPSHOT
+3.4.0
 ../../pom.xml
   
 
diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml
index 22ee65b7d25..e4d98471bf9 100644
--- a/common/network-common/pom.xml
+++ b/common/network-common/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.1-SNAPSHOT
+3.4.0
 ../../pom.xml
   
 
diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml
index 2c67da81ca4..7a6d5aedf65 100644
--- a/common/network-shuffle/pom.xml
+++ b/common/network-shuffle/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.1-SNAPSHOT
+3.4.0
 ../../pom.xml
   
 
diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml
index 219682e047d..1c421754083 100644
--- a/common/network-yarn/pom.xml
+++ b/common/network-yarn/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.1-SNAPSHOT
+3.4.0
 ../../pom.xml
   
 
diff --git a/common/sketch/pom.xml b/common/sketch/pom.xml
inde
