(spark) branch master updated: [SPARK-49999][PYTHON][CONNECT] Support optional "column" parameter in box, kde and hist plots
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 03e051b5e400 [SPARK-49999][PYTHON][CONNECT] Support optional "column" parameter in box, kde and hist plots
03e051b5e400 is described below

commit 03e051b5e4007b21e12b188ae5e940706c1da7dc
Author: Xinrong Meng
AuthorDate: Mon Oct 28 10:01:53 2024 +0800

    [SPARK-49999][PYTHON][CONNECT] Support optional "column" parameter in box, kde and hist plots

    ### What changes were proposed in this pull request?

    Support for the optional "column" parameter has been added to box, kde, and hist plots. When no column is provided, all columns of valid types (NumericType, DateType, TimestampType) are used to build the plots.

    ### Why are the changes needed?

    - Reach parity with the default behavior of Pandas (on Spark).
    - Simplify usage by reducing the need to specify columns explicitly.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Unit tests.

    ### Was this patch authored or co-authored using generative AI tooling?

    No.

    Closes #48628 from xinrong-meng/column_param.

    Authored-by: Xinrong Meng
    Signed-off-by: Xinrong Meng
---
 python/pyspark/errors/error-conditions.json  |  5 ++
 python/pyspark/sql/plot/core.py              | 30 +
 python/pyspark/sql/plot/plotly.py            | 54 +++-
 .../sql/tests/plot/test_frame_plot_plotly.py | 72 +-
 4 files changed, 121 insertions(+), 40 deletions(-)

diff --git a/python/pyspark/errors/error-conditions.json b/python/pyspark/errors/error-conditions.json
index ae9fbccceb3e..5aa0313631c0 100644
--- a/python/pyspark/errors/error-conditions.json
+++ b/python/pyspark/errors/error-conditions.json
@@ -816,6 +816,11 @@
     "message": [
       "Pipe function `` exited with error code ."
     ]
+  },
+  "PLOT_INVALID_TYPE_COLUMN": {
+    "message": [
+      "Column must be one of for plotting, got ."
+    ]
   },
   "PLOT_NOT_NUMERIC_COLUMN": {
     "message": [
diff --git a/python/pyspark/sql/plot/core.py b/python/pyspark/sql/plot/core.py
index 158d9130560a..328ebe348878 100644
--- a/python/pyspark/sql/plot/core.py
+++ b/python/pyspark/sql/plot/core.py
@@ -360,7 +360,7 @@ class PySparkPlotAccessor:
         )
         return self(kind="pie", x=x, y=y, **kwargs)
 
-    def box(self, column: Union[str, List[str]], **kwargs: Any) -> "Figure":
+    def box(self, column: Optional[Union[str, List[str]]] = None, **kwargs: Any) -> "Figure":
         """
         Make a box plot of the DataFrame columns.
@@ -374,8 +374,9 @@ class PySparkPlotAccessor:
 
         Parameters
         ----------
-        column: str or list of str
-            Column name or list of names to be used for creating the boxplot.
+        column: str or list of str, optional
+            Column name or list of names to be used for creating the box plot.
+            If None (default), all numeric columns will be used.
         **kwargs
             Extra arguments to `precision`: refer to a float that is used by
             pyspark to compute approximate statistics for building a boxplot.
@@ -399,6 +400,7 @@ class PySparkPlotAccessor:
         ...
         ]
         >>> columns = ["student", "math_score", "english_score"]
         >>> df = spark.createDataFrame(data, columns)
+        >>> df.plot.box()  # doctest: +SKIP
         >>> df.plot.box(column="math_score")  # doctest: +SKIP
         >>> df.plot.box(column=["math_score", "english_score"])  # doctest: +SKIP
         """
@@ -406,9 +408,9 @@ class PySparkPlotAccessor:
 
     def kde(
         self,
-        column: Union[str, List[str]],
         bw_method: Union[int, float],
-        ind: Union["np.ndarray", int, None] = None,
+        column: Optional[Union[str, List[str]]] = None,
+        ind: Optional[Union["np.ndarray", int]] = None,
         **kwargs: Any,
     ) -> "Figure":
         """
@@ -420,11 +422,12 @@ class PySparkPlotAccessor:
 
         Parameters
         ----------
-        column: str or list of str
-            Column name or list of names to be used for creating the kde plot.
         bw_method : int or float
             The method used to calculate the estimator bandwidth.
             See KernelDensity in PySpark for more information.
+        column: str or list of str, optional
+            Column name or list of names to be used for creating the kde plot.
+            If None (default), all numeric columns will be used.
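The "all columns of valid types" default described above can be pictured with public APIs; a minimal sketch (assuming an active `spark` session; the helper name `_default_plot_columns` is illustrative, not part of the commit):

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import DateType, NumericType, TimestampType

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [("A", 50, 55.0)], ["student", "math_score", "english_score"]
)

def _default_plot_columns(sdf):
    # Keep every column whose type is valid for plotting when the user
    # passes no explicit "column" argument.
    return [
        field.name
        for field in sdf.schema.fields
        if isinstance(field.dataType, (NumericType, DateType, TimestampType))
    ]

print(_default_plot_columns(df))  # ['math_score', 'english_score']
```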
(spark) branch master updated: [SPARK-50001][PYTHON][PS][CONNECT] Adjust "precision" to be part of kwargs for box plots
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new d2e322314c78 [SPARK-50001][PYTHON][PS][CONNECT] Adjust "precision" to be part of kwargs for box plots
d2e322314c78 is described below

commit d2e322314c786b892f4d8b37f383fae8e8827ca9
Author: Xinrong Meng
AuthorDate: Mon Oct 21 11:57:30 2024 +0800

    [SPARK-50001][PYTHON][PS][CONNECT] Adjust "precision" to be part of kwargs for box plots

    ### What changes were proposed in this pull request?

    Adjust "precision" to be part of kwargs for box plots in both Pandas on Spark and PySpark.

    ### Why are the changes needed?

    Per the discussion at https://github.com/apache/spark/pull/48445#discussion_r1804042377, precision is a Spark-specific implementation detail, so we want to keep "precision" as part of kwargs for box plots.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Existing tests.

    ### Was this patch authored or co-authored using generative AI tooling?

    No.

    Closes #48513 from xinrong-meng/precision.

    Authored-by: Xinrong Meng
    Signed-off-by: Xinrong Meng
---
 python/pyspark/pandas/plot/core.py | 15 +++
 python/pyspark/sql/plot/core.py    | 13 +
 2 files changed, 12 insertions(+), 16 deletions(-)

diff --git a/python/pyspark/pandas/plot/core.py b/python/pyspark/pandas/plot/core.py
index 12c17a06f153..f5652177fe4a 100644
--- a/python/pyspark/pandas/plot/core.py
+++ b/python/pyspark/pandas/plot/core.py
@@ -841,7 +841,7 @@ class PandasOnSparkPlotAccessor(PandasObject):
         elif isinstance(self.data, DataFrame):
             return self(kind="barh", x=x, y=y, **kwargs)
 
-    def box(self, precision=0.01, **kwds):
+    def box(self, **kwds):
         """
         Make a box plot of the DataFrame columns.
@@ -857,12 +857,11 @@ class PandasOnSparkPlotAccessor(PandasObject):
 
         Parameters
         ----------
-        precision: scalar, default = 0.01
-            This argument is used by pandas-on-Spark to compute approximate statistics
-            for building a boxplot. Use *smaller* values to get more precise
-            statistics.
-        **kwds : optional
-            Additional keyword arguments are documented in
+        **kwds : dict, optional
+            Extra arguments to `precision`: refer to a float that is used by
+            pandas-on-Spark to compute approximate statistics for building a
+            boxplot. The default value is 0.01. Use smaller values to get more
+            precise statistics. Additional keyword arguments are documented in
             :meth:`pyspark.pandas.Series.plot`.
 
         Returns
@@ -901,7 +900,7 @@ class PandasOnSparkPlotAccessor(PandasObject):
         from pyspark.pandas import DataFrame, Series
 
         if isinstance(self.data, (Series, DataFrame)):
-            return self(kind="box", precision=precision, **kwds)
+            return self(kind="box", **kwds)
 
     def hist(self, bins=10, **kwds):
         """
diff --git a/python/pyspark/sql/plot/core.py b/python/pyspark/sql/plot/core.py
index f44c0768d433..178411e5c5ef 100644
--- a/python/pyspark/sql/plot/core.py
+++ b/python/pyspark/sql/plot/core.py
@@ -359,9 +359,7 @@ class PySparkPlotAccessor:
         )
         return self(kind="pie", x=x, y=y, **kwargs)
 
-    def box(
-        self, column: Union[str, List[str]], precision: float = 0.01, **kwargs: Any
-    ) -> "Figure":
+    def box(self, column: Union[str, List[str]], **kwargs: Any) -> "Figure":
         """
         Make a box plot of the DataFrame columns.
@@ -377,11 +375,10 @@ class PySparkPlotAccessor:
         ----------
         column: str or list of str
             Column name or list of names to be used for creating the boxplot.
-        precision: float, default = 0.01
-            This argument is used by pyspark to compute approximate statistics
-            for building a boxplot.
         **kwargs
-            Additional keyword arguments.
+            Extra arguments to `precision`: refer to a float that is used by
+            pyspark to compute approximate statistics for building a boxplot.
+            The default value is 0.01. Use smaller values to get more precise statistics.
 
         Returns
         -------
@@ -404,7 +401,7 @@ class PySparkPlotAccessor:
         >>> df.plot.box(column="math_score")  # doctest: +SKIP
         >>> df.plot.box(column=["math_score", "english_score"])  # doctest: +SKIP
         """
-        return self(kind="box", column=column, precision=precision, **kwargs)
+        return self(kind="box", column=column, **kwargs)
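After this change `precision` still works, it simply travels through `**kwargs`; a quick usage sketch (assumes a `df` like the docstring example and an installed plotly backend):

```python
# `precision` is no longer a named parameter; pass it as a keyword and it
# is forwarded to the approximate-statistics computation (default 0.01).
fig = df.plot.box(column="math_score", precision=0.001)
fig.show()
```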
(spark) branch master updated: [SPARK-49948][PS][CONNECT] Add parameter "precision" to pandas on Spark box plot
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 861b5e98e6e4 [SPARK-49948][PS][CONNECT] Add parameter "precision" to pandas on Spark box plot
861b5e98e6e4 is described below

commit 861b5e98e6e4f61e376d756f085e0290e01fc8f4
Author: Xinrong Meng
AuthorDate: Wed Oct 16 08:49:10 2024 +0800

    [SPARK-49948][PS][CONNECT] Add parameter "precision" to pandas on Spark box plot

    ### What changes were proposed in this pull request?

    Add parameter "precision" to the pandas on Spark box plot.

    ### Why are the changes needed?

    Previously, the box method used **kwds, allowing precision to be passed implicitly. Now, adding precision directly to the signature ensures clarity and explicit control, improving usability.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Existing tests.

    ### Was this patch authored or co-authored using generative AI tooling?

    No.

    Closes #48445 from xinrong-meng/ps_box.

    Authored-by: Xinrong Meng
    Signed-off-by: Xinrong Meng
---
 python/pyspark/pandas/plot/core.py | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/python/pyspark/pandas/plot/core.py b/python/pyspark/pandas/plot/core.py
index 7333fae1ad43..12c17a06f153 100644
--- a/python/pyspark/pandas/plot/core.py
+++ b/python/pyspark/pandas/plot/core.py
@@ -841,7 +841,7 @@ class PandasOnSparkPlotAccessor(PandasObject):
         elif isinstance(self.data, DataFrame):
             return self(kind="barh", x=x, y=y, **kwargs)
 
-    def box(self, **kwds):
+    def box(self, precision=0.01, **kwds):
         """
         Make a box plot of the DataFrame columns.
@@ -857,14 +857,13 @@ class PandasOnSparkPlotAccessor(PandasObject):
 
         Parameters
         ----------
-        **kwds : optional
-            Additional keyword arguments are documented in
-            :meth:`pyspark.pandas.Series.plot`.
-
         precision: scalar, default = 0.01
             This argument is used by pandas-on-Spark to compute approximate statistics
             for building a boxplot. Use *smaller* values to get more precise
-            statistics (matplotlib-only).
+            statistics.
+        **kwds : optional
+            Additional keyword arguments are documented in
+            :meth:`pyspark.pandas.Series.plot`.
 
         Returns
         -------
@@ -902,7 +901,7 @@ class PandasOnSparkPlotAccessor(PandasObject):
         from pyspark.pandas import DataFrame, Series
 
         if isinstance(self.data, (Series, DataFrame)):
-            return self(kind="box", **kwds)
+            return self(kind="box", precision=precision, **kwds)
 
     def hist(self, bins=10, **kwds):
         """
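For intuition about what `precision` tunes: box-plot statistics come from approximate quantiles, where the same knob is the relative error. An illustration with the public `DataFrame.approxQuantile` API (the plot internals may differ):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
scores = spark.createDataFrame(
    [(v,) for v in [5, 10, 50, 55, 60, 65, 70, 85]], ["math_score"]
)

# The third argument is the relative error; smaller values give more
# precise quartiles at a higher computation cost.
q1, median, q3 = scores.approxQuantile("math_score", [0.25, 0.5, 0.75], 0.01)
```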
(spark) branch master updated: [SPARK-49929][PYTHON][CONNECT] Support box plots
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 488f68090b22 [SPARK-49929][PYTHON][CONNECT] Support box plots
488f68090b22 is described below

commit 488f68090b228b30ba4a3b75596c9904eef1f584
Author: Xinrong Meng
AuthorDate: Tue Oct 15 08:31:33 2024 +0800

    [SPARK-49929][PYTHON][CONNECT] Support box plots

    ### What changes were proposed in this pull request?

    Support box plots with the plotly backend on both Spark Connect and Spark classic.

    ### Why are the changes needed?

    While Pandas on Spark supports plotting, PySpark currently lacks this feature. The proposed API will enable users to generate visualizations. This will provide users with an intuitive, interactive way to explore and understand large datasets directly from PySpark DataFrames, streamlining the data analysis workflow in distributed environments. See more at the [PySpark Plotting API Specification](https://docs.google.com/document/d/1IjOEzC8zcetG86WDvqkereQPj_NGLNW7Bdu910g30Dg/edit?usp=sharing) in progress.

    Part of https://issues.apache.org/jira/browse/SPARK-49530.

    ### Does this PR introduce _any_ user-facing change?

    Yes. Box plots are supported as shown below.

    ```py
    >>> data = [
    ...     ("A", 50, 55),
    ...     ("B", 55, 60),
    ...     ("C", 60, 65),
    ...     ("D", 65, 70),
    ...     ("E", 70, 75),
    ...     # outliers
    ...     ("F", 10, 15),
    ...     ("G", 85, 90),
    ...     ("H", 5, 150),
    ... ]
    >>> columns = ["student", "math_score", "english_score"]
    >>> sdf = spark.createDataFrame(data, columns)
    >>> fig1 = sdf.plot.box(column=["math_score", "english_score"])
    >>> fig1.show()  # see below
    >>> fig2 = sdf.plot(kind="box", column="math_score")
    >>> fig2.show()  # see below
    ```

    fig1:
    ![newplot (17)](https://github.com/user-attachments/assets/8c36c344-f6de-47e3-bd63-c0f3b57efc43)

    fig2:
    ![newplot (18)](https://github.com/user-attachments/assets/9b7b60f6-58ec-4eff-9544-d5ab88a88631)

    ### How was this patch tested?

    Unit tests.

    ### Was this patch authored or co-authored using generative AI tooling?

    No.

    Closes #48447 from xinrong-meng/box.

    Authored-by: Xinrong Meng
    Signed-off-by: Xinrong Meng
---
 python/pyspark/errors/error-conditions.json  |   5 +
 python/pyspark/sql/plot/core.py              | 153 -
 python/pyspark/sql/plot/plotly.py            |  77 ++-
 .../sql/tests/plot/test_frame_plot_plotly.py |  77 ++-
 4 files changed, 307 insertions(+), 5 deletions(-)

diff --git a/python/pyspark/errors/error-conditions.json b/python/pyspark/errors/error-conditions.json
index 6ca21d5d..ab01d386645b 100644
--- a/python/pyspark/errors/error-conditions.json
+++ b/python/pyspark/errors/error-conditions.json
@@ -1103,6 +1103,11 @@
       "`` is not supported, it should be one of the values from "
     ]
   },
+  "UNSUPPORTED_PLOT_BACKEND_PARAM": {
+    "message": [
+      "`` does not support `` set to , it should be one of the values from "
+    ]
+  },
   "UNSUPPORTED_SIGNATURE": {
     "message": [
       "Unsupported signature: ."
diff --git a/python/pyspark/sql/plot/core.py b/python/pyspark/sql/plot/core.py
index f9667ee2c0d6..4bf75474d92c 100644
--- a/python/pyspark/sql/plot/core.py
+++ b/python/pyspark/sql/plot/core.py
@@ -15,15 +15,17 @@
 # limitations under the License.
 #
-from typing import Any, TYPE_CHECKING, Optional, Union
+from typing import Any, TYPE_CHECKING, List, Optional, Union
 from types import ModuleType
 from pyspark.errors import PySparkRuntimeError, PySparkTypeError, PySparkValueError
+from pyspark.sql import Column, functions as F
 from pyspark.sql.types import NumericType
-from pyspark.sql.utils import require_minimum_plotly_version
+from pyspark.sql.utils import is_remote, require_minimum_plotly_version
 
 if TYPE_CHECKING:
-    from pyspark.sql import DataFrame
+    from pyspark.sql import DataFrame, Row
+    from pyspark.sql._typing import ColumnOrName
     import pandas as pd
     from plotly.graph_objs import Figure
@@ -338,3 +340,148 @@ class PySparkPlotAccessor:
             },
         )
         return self(kind="pie", x=x, y=y, **kwargs)
+
+    def box(
+        self, column: Union[str, List[str]], precision: float = 0.01, **kwargs: Any
+    ) -> "Figure":
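The box-plot statistics this commit computes (quartiles, whiskers, outliers) follow the conventional 1.5×IQR rule; a hedged sketch with public APIs, not the committed helper code:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
sdf = spark.createDataFrame(
    [("A", 50), ("B", 55), ("C", 60), ("D", 65), ("E", 70),
     ("F", 10), ("G", 85), ("H", 5)],
    ["student", "math_score"],
)

# Tukey fences: values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] are outliers.
q1, q3 = sdf.approxQuantile("math_score", [0.25, 0.75], 0.01)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = sdf.filter(
    (F.col("math_score") < lower) | (F.col("math_score") > upper)
)
outliers.show()
```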
(spark) branch master updated (1abfd490d072 -> 1aae16089601)
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from 1abfd490d072 [SPARK-49943][PS] Remove `timestamp_ntz_to_long` from `PythonSQLUtils`
     add 1aae16089601 [SPARK-49928][PYTHON][TESTS] Refactor plot-related unit tests

No new revisions were added by this update.

Summary of changes:
 .../sql/tests/plot/test_frame_plot_plotly.py | 242 -
 1 file changed, 192 insertions(+), 50 deletions(-)
(spark) branch master updated: [SPARK-49776][PYTHON][CONNECT] Support pie plots
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 488c3f604490 [SPARK-49776][PYTHON][CONNECT] Support pie plots
488c3f604490 is described below

commit 488c3f604490c8632dde67a00118d49ccfcbf578
Author: Xinrong Meng
AuthorDate: Fri Sep 27 08:35:10 2024 +0800

    [SPARK-49776][PYTHON][CONNECT] Support pie plots

    ### What changes were proposed in this pull request?

    Support pie plots with the plotly backend on both Spark Connect and Spark classic.

    ### Why are the changes needed?

    While Pandas on Spark supports plotting, PySpark currently lacks this feature. The proposed API will enable users to generate visualizations. This will provide users with an intuitive, interactive way to explore and understand large datasets directly from PySpark DataFrames, streamlining the data analysis workflow in distributed environments. See more at the [PySpark Plotting API Specification](https://docs.google.com/document/d/1IjOEzC8zcetG86WDvqkereQPj_NGLNW7Bdu910g30Dg/edit?usp=sharing) in progress.

    Part of https://issues.apache.org/jira/browse/SPARK-49530.

    ### Does this PR introduce _any_ user-facing change?

    Yes. Pie plots are supported as shown below.

    ```py
    >>> from datetime import datetime
    >>> data = [
    ...     (3, 5, 20, datetime(2018, 1, 31)),
    ...     (2, 5, 42, datetime(2018, 2, 28)),
    ...     (3, 6, 28, datetime(2018, 3, 31)),
    ...     (9, 12, 62, datetime(2018, 4, 30))]
    >>> columns = ["sales", "signups", "visits", "date"]
    >>> df = spark.createDataFrame(data, columns)
    >>> fig = df.plot(kind="pie", x="date", y="sales")
    >>> fig.show()
    ```

    ![newplot (8)](https://github.com/user-attachments/assets/c4078bb7-4d84-4607-bcd7-bdd6fbbf8e28)

    ### How was this patch tested?

    Unit tests.

    ### Was this patch authored or co-authored using generative AI tooling?

    No.

    Closes #48256 from xinrong-meng/plot_pie.

    Authored-by: Xinrong Meng
    Signed-off-by: Xinrong Meng
---
 python/pyspark/errors/error-conditions.json  |  5 +++
 python/pyspark/sql/plot/core.py              | 41 +-
 python/pyspark/sql/plot/plotly.py            | 15 
 .../sql/tests/plot/test_frame_plot_plotly.py | 25 +
 4 files changed, 85 insertions(+), 1 deletion(-)

diff --git a/python/pyspark/errors/error-conditions.json b/python/pyspark/errors/error-conditions.json
index 115ad658e32f..ed62ea117d36 100644
--- a/python/pyspark/errors/error-conditions.json
+++ b/python/pyspark/errors/error-conditions.json
@@ -812,6 +812,11 @@
       "Pipe function `` exited with error code ."
     ]
   },
+  "PLOT_NOT_NUMERIC_COLUMN": {
+    "message": [
+      "Argument must be a numerical column for plotting, got ."
+    ]
+  },
   "PYTHON_HASH_SEED_NOT_SET": {
     "message": [
       "Randomness of hash of string should be disabled via PYTHONHASHSEED."
diff --git a/python/pyspark/sql/plot/core.py b/python/pyspark/sql/plot/core.py
index 9f83d0069652..f9667ee2c0d6 100644
--- a/python/pyspark/sql/plot/core.py
+++ b/python/pyspark/sql/plot/core.py
@@ -17,7 +17,8 @@
 from typing import Any, TYPE_CHECKING, Optional, Union
 from types import ModuleType
-from pyspark.errors import PySparkRuntimeError, PySparkValueError
+from pyspark.errors import PySparkRuntimeError, PySparkTypeError, PySparkValueError
+from pyspark.sql.types import NumericType
 from pyspark.sql.utils import require_minimum_plotly_version
 
@@ -97,6 +98,7 @@ class PySparkPlotAccessor:
         "bar": PySparkTopNPlotBase().get_top_n,
         "barh": PySparkTopNPlotBase().get_top_n,
         "line": PySparkSampledPlotBase().get_sampled,
+        "pie": PySparkTopNPlotBase().get_top_n,
         "scatter": PySparkSampledPlotBase().get_sampled,
     }
     _backends = {}  # type: ignore[var-annotated]
@@ -299,3 +301,40 @@ class PySparkPlotAccessor:
         >>> df.plot.area(x='date', y=['sales', 'signups', 'visits'])  # doctest: +SKIP
         """
         return self(kind="area", x=x, y=y, **kwargs)
+
+    def pie(self, x: str, y: str, **kwargs: Any) -> "Figure":
+        """
+        Generate a pie plot.
+
+        A pie plot is a proportional representation of the numerical data in a
+        column.
+
+        Parameters
+        ----------
+        x : str
+            Name of column to be u
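For intuition, the figure the plotly backend renders for this example is roughly what plotly.express produces from the collected data; a minimal sketch (not the code path added by this commit), reusing `df` from the example above:

```python
import plotly.express as px

# Collect the (top-N) rows to the driver and draw the same pie locally.
pdf = df.toPandas()
fig = px.pie(pdf, names="date", values="sales")
fig.show()
```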
(spark) branch master updated: [SPARK-49694][PYTHON][CONNECT] Support scatter plots
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 6bdd151d5775 [SPARK-49694][PYTHON][CONNECT] Support scatter plots
6bdd151d5775 is described below

commit 6bdd151d57759d73870f20780fc54ab2aa250409
Author: Xinrong Meng
AuthorDate: Tue Sep 24 15:40:38 2024 +0800

    [SPARK-49694][PYTHON][CONNECT] Support scatter plots

    ### What changes were proposed in this pull request?

    Support scatter plots with the plotly backend on both Spark Connect and Spark classic.

    ### Why are the changes needed?

    While Pandas on Spark supports plotting, PySpark currently lacks this feature. The proposed API will enable users to generate visualizations. This will provide users with an intuitive, interactive way to explore and understand large datasets directly from PySpark DataFrames, streamlining the data analysis workflow in distributed environments. See more at the [PySpark Plotting API Specification](https://docs.google.com/document/d/1IjOEzC8zcetG86WDvqkereQPj_NGLNW7Bdu910g30Dg/edit?usp=sharing) in progress.

    Part of https://issues.apache.org/jira/browse/SPARK-49530.

    ### Does this PR introduce _any_ user-facing change?

    Yes. Scatter plots are supported as shown below.

    ```py
    >>> data = [(5.1, 3.5, 0), (4.9, 3.0, 0), (7.0, 3.2, 1), (6.4, 3.2, 1), (5.9, 3.0, 2)]
    >>> columns = ["length", "width", "species"]
    >>> sdf = spark.createDataFrame(data, columns)
    >>> fig = sdf.plot(kind="scatter", x="length", y="width")  # or fig = sdf.plot.scatter(x="length", y="width")
    >>> fig.show()
    ```

    ![newplot (6)](https://github.com/user-attachments/assets/deef452b-74d1-4f6d-b1ae-60722f3c2b17)

    ### How was this patch tested?

    Unit tests.

    ### Was this patch authored or co-authored using generative AI tooling?

    No.

    Closes #48219 from xinrong-meng/plot_scatter.

    Authored-by: Xinrong Meng
    Signed-off-by: Xinrong Meng
---
 python/pyspark/sql/plot/core.py              | 34 ++++++++++++++++++
 .../sql/tests/plot/test_frame_plot_plotly.py | 19 ++++++++++
 2 files changed, 53 insertions(+)

diff --git a/python/pyspark/sql/plot/core.py b/python/pyspark/sql/plot/core.py
index eb00b8a04f97..0a3a0101e189 100644
--- a/python/pyspark/sql/plot/core.py
+++ b/python/pyspark/sql/plot/core.py
@@ -96,6 +96,7 @@ class PySparkPlotAccessor:
         "bar": PySparkTopNPlotBase().get_top_n,
         "barh": PySparkTopNPlotBase().get_top_n,
         "line": PySparkSampledPlotBase().get_sampled,
+        "scatter": PySparkSampledPlotBase().get_sampled,
     }
     _backends = {}  # type: ignore[var-annotated]
@@ -230,3 +231,36 @@ class PySparkPlotAccessor:
         ... )  # doctest: +SKIP
         """
         return self(kind="barh", x=x, y=y, **kwargs)
+
+    def scatter(self, x: str, y: str, **kwargs: Any) -> "Figure":
+        """
+        Create a scatter plot with varying marker point size and color.
+
+        The coordinates of each point are defined by two dataframe columns and
+        filled circles are used to represent each point. This kind of plot is
+        useful to see complex correlations between two variables. Points could
+        be for instance natural 2D coordinates like longitude and latitude in
+        a map or, in general, any pair of metrics that can be plotted against
+        each other.
+
+        Parameters
+        ----------
+        x : str
+            Name of column to use as horizontal coordinates for each point.
+        y : str or list of str
+            Name of column to use as vertical coordinates for each point.
+        **kwargs: Optional
+            Additional keyword arguments.
+
+        Returns
+        -------
+        :class:`plotly.graph_objs.Figure`
+
+        Examples
+        --------
+        >>> data = [(5.1, 3.5, 0), (4.9, 3.0, 0), (7.0, 3.2, 1), (6.4, 3.2, 1), (5.9, 3.0, 2)]
+        >>> columns = ['length', 'width', 'species']
+        >>> df = spark.createDataFrame(data, columns)
+        >>> df.plot.scatter(x='length', y='width')  # doctest: +SKIP
+        """
+        return self(kind="scatter", x=x, y=y, **kwargs)
diff --git a/python/pyspark/sql/tests/plot/test_frame_plot_plotly.py b/python/pyspark/sql/tests/plot/test_frame_plot_plotly.py
index 1c52c93a23d3..ccfe1a75424e 100644
--- a/python/pyspark/sql/tests/plot/test_frame_plot_plotly.py
+++ b/python/pyspark/sql/tests/plot/test_frame_plot_plotly.py
@@ -28,6 +28,12 @@ class DataFramePlo
(spark) branch master updated: [SPARK-49626][PYTHON][CONNECT] Support horizontal and vertical bar plots
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 44ec70f5103f [SPARK-49626][PYTHON][CONNECT] Support horizontal and vertical bar plots
44ec70f5103f is described below

commit 44ec70f5103fc5674497373ac5c23e8145ae5660
Author: Xinrong Meng
AuthorDate: Mon Sep 23 18:28:19 2024 +0800

    [SPARK-49626][PYTHON][CONNECT] Support horizontal and vertical bar plots

    ### What changes were proposed in this pull request?

    Support horizontal and vertical bar plots with the plotly backend on both Spark Connect and Spark classic.

    ### Why are the changes needed?

    While Pandas on Spark supports plotting, PySpark currently lacks this feature. The proposed API will enable users to generate visualizations. This will provide users with an intuitive, interactive way to explore and understand large datasets directly from PySpark DataFrames, streamlining the data analysis workflow in distributed environments. See more at the [PySpark Plotting API Specification](https://docs.google.com/document/d/1IjOEzC8zcetG86WDvqkereQPj_NGLNW7Bdu910g30Dg/edit?usp=sharing) in progress.

    Part of https://issues.apache.org/jira/browse/SPARK-49530.

    ### Does this PR introduce _any_ user-facing change?

    Yes.

    ```python
    >>> data = [("A", 10, 1.5), ("B", 30, 2.5), ("C", 20, 3.5)]
    >>> columns = ["category", "int_val", "float_val"]
    >>> sdf = spark.createDataFrame(data, columns)
    >>> sdf.show()
    +--------+-------+---------+
    |category|int_val|float_val|
    +--------+-------+---------+
    |       A|     10|      1.5|
    |       B|     30|      2.5|
    |       C|     20|      3.5|
    +--------+-------+---------+
    >>> f = sdf.plot(kind="bar", x="category", y=["int_val", "float_val"])
    >>> f.show()  # see below
    >>> g = sdf.plot.barh(x=["int_val", "float_val"], y="category")
    >>> g.show()  # see below
    ```

    `f.show()`:
    ![newplot (4)](https://github.com/user-attachments/assets/0df9ee86-fb48-4796-b6c3-aaf2879217aa)

    `g.show()`:
    ![newplot (3)](https://github.com/user-attachments/assets/f39b01c3-66e6-464b-b2e8-badebb39bc67)

    ### How was this patch tested?

    Unit tests.

    ### Was this patch authored or co-authored using generative AI tooling?

    No.

    Closes #48100 from xinrong-meng/plot_bar.

    Authored-by: Xinrong Meng
    Signed-off-by: Xinrong Meng
---
 python/pyspark/sql/plot/core.py              |  79 ++++++++++
 .../sql/tests/plot/test_frame_plot_plotly.py |  44 ++--
 2 files changed, 117 insertions(+), 6 deletions(-)

diff --git a/python/pyspark/sql/plot/core.py b/python/pyspark/sql/plot/core.py
index 392ef73b3884..ed22d02370ca 100644
--- a/python/pyspark/sql/plot/core.py
+++ b/python/pyspark/sql/plot/core.py
@@ -75,6 +75,8 @@ class PySparkSampledPlotBase:
 
 class PySparkPlotAccessor:
     plot_data_map = {
+        "bar": PySparkTopNPlotBase().get_top_n,
+        "barh": PySparkTopNPlotBase().get_top_n,
         "line": PySparkSampledPlotBase().get_sampled,
     }
     _backends = {}  # type: ignore[var-annotated]
@@ -133,3 +135,80 @@ class PySparkPlotAccessor:
         >>> df.plot.line(x="category", y=["int_val", "float_val"])  # doctest: +SKIP
         """
         return self(kind="line", x=x, y=y, **kwargs)
+
+    def bar(self, x: str, y: Union[str, list[str]], **kwargs: Any) -> "Figure":
+        """
+        Vertical bar plot.
+
+        A bar plot is a plot that presents categorical data with rectangular bars with lengths
+        proportional to the values that they represent. A bar plot shows comparisons among
+        discrete categories. One axis of the plot shows the specific categories being compared,
+        and the other axis represents a measured value.
+
+        Parameters
+        ----------
+        x : str
+            Name of column to use for the horizontal axis.
+        y : str or list of str
+            Name(s) of the column(s) to use for the vertical axis.
+            Multiple columns can be plotted.
+        **kwargs : optional
+            Additional keyword arguments.
+
+        Returns
+        -------
+        :class:`plotly.graph_objs.Figure`
+
+        Examples
+        --------
+        >>> data = [("A", 10, 1.5), ("B", 30, 2.5), ("C", 20, 3.5)]
+        >>> columns = ["category", "int_val", "float_val"]
(spark) branch master updated: [SPARK-47864][FOLLOWUP][PYTHON][DOCS] Fix minor typo: "MLLib" -> "MLlib"
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new e50737be366a [SPARK-47864][FOLLOWUP][PYTHON][DOCS] Fix minor typo: "MLLib" -> "MLlib"
e50737be366a is described below

commit e50737be366ac0e8d5466b714f7d41991d0b05a8
Author: Haejoon Lee
AuthorDate: Tue Apr 23 10:10:20 2024 -0700

    [SPARK-47864][FOLLOWUP][PYTHON][DOCS] Fix minor typo: "MLLib" -> "MLlib"

    ### What changes were proposed in this pull request?

    This PR follows up on https://github.com/apache/spark/pull/46096 to fix a minor typo.

    ### Why are the changes needed?

    To use the official name `MLlib` from the documentation instead of `MLLib`. See https://spark.apache.org/mllib/.

    ### Does this PR introduce _any_ user-facing change?

    No API change, but the user-facing documentation will be updated.

    ### How was this patch tested?

    Manually built the doc in local test envs.

    ### Was this patch authored or co-authored using generative AI tooling?

    No.

    Closes #46174 from itholic/minor_typo_installation.

    Authored-by: Haejoon Lee
    Signed-off-by: Xinrong Meng
---
 python/docs/source/getting_started/install.rst | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/python/docs/source/getting_started/install.rst b/python/docs/source/getting_started/install.rst
index 33a0560764df..ee894981387a 100644
--- a/python/docs/source/getting_started/install.rst
+++ b/python/docs/source/getting_started/install.rst
@@ -244,7 +244,7 @@ Additional libraries that enhance functionality but are not included in the inst
 
 - **matplotlib**: Provide plotting for visualization. The default is **plotly**.
 
-MLLib DataFrame-based API
+MLlib DataFrame-based API
 ^^^^^^^^^^^^^^^^^^^^^^^^^
 
 Installable with ``pip install "pyspark[ml]"``.
@@ -252,7 +252,7 @@ Installable with ``pip install "pyspark[ml]"``.
 ======= ================= ======================================
 Package Supported version Note
 ======= ================= ======================================
-`numpy` >=1.21            Required for MLLib DataFrame-based API
+`numpy` >=1.21            Required for MLlib DataFrame-based API
 ======= ================= ======================================
 
 Additional libraries that enhance functionality but are not included in the installation packages:
@@ -272,5 +272,5 @@ Installable with ``pip install "pyspark[mllib]"``.
 ======= ================= ==================
 Package Supported version Note
 ======= ================= ==================
-`numpy` >=1.21            Required for MLLib
+`numpy` >=1.21            Required for MLlib
 ======= ================= ==================
(spark) branch master updated (f9ebe1b3d24b -> 6c827c10dc15)
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from f9ebe1b3d24b [SPARK-46375][DOCS] Add user guide for Python data source API
     add 6c827c10dc15 [SPARK-47876][PYTHON][DOCS] Improve docstring of mapInArrow

No new revisions were added by this update.

Summary of changes:
 python/pyspark/sql/pandas/map_ops.py | 19 +--
 1 file changed, 9 insertions(+), 10 deletions(-)
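For context, the docstring being improved documents `DataFrame.mapInArrow`, which maps an iterator of `pyarrow.RecordBatch`es to another iterator of batches; its canonical usage pattern looks like this (assuming an active `spark` session):

```python
import pyarrow as pa
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1, 21), (2, 30)], ("id", "age"))

def filter_func(iterator):
    # The function receives and yields pyarrow.RecordBatch objects.
    for batch in iterator:
        pdf = batch.to_pandas()
        yield pa.RecordBatch.from_pandas(pdf[pdf.id == 1])

df.mapInArrow(filter_func, df.schema).show()
```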
(spark) branch master updated: [SPARK-47276][PYTHON][CONNECT] Introduce `spark.profile.clear` for SparkSession-based profiling
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 501999a834ea [SPARK-47276][PYTHON][CONNECT] Introduce `spark.profile.clear` for SparkSession-based profiling
501999a834ea is described below

commit 501999a834ea7761a792b823c543e40fba84231d
Author: Xinrong Meng
AuthorDate: Thu Mar 7 13:20:39 2024 -0800

    [SPARK-47276][PYTHON][CONNECT] Introduce `spark.profile.clear` for SparkSession-based profiling

    ### What changes were proposed in this pull request?

    Introduce `spark.profile.clear` for SparkSession-based profiling.

    ### Why are the changes needed?

    A straightforward and unified interface for managing and resetting profiling results for SparkSession-based profilers.

    ### Does this PR introduce _any_ user-facing change?

    Yes. `spark.profile.clear` is supported as shown below.

    Preparation:
    ```py
    >>> from pyspark.sql.functions import pandas_udf
    >>> df = spark.range(3)
    >>> @pandas_udf("long")
    ... def add1(x):
    ...     return x + 1
    ...
    >>> added = df.select(add1("id"))
    >>> spark.conf.set("spark.sql.pyspark.udf.profiler", "perf")
    >>> added.show()
    +--------+
    |add1(id)|
    +--------+
    ...
    +--------+
    >>> spark.profile.show()
    Profile of UDF
         1410 function calls (1374 primitive calls) in 0.004 seconds
    ...
    ```

    Example usage:
    ```py
    >>> spark.profile.profiler_collector._profile_results
    {2: (, None)}
    >>> spark.profile.clear(1)  # id mismatch
    >>> spark.profile.profiler_collector._profile_results
    {2: (, None)}
    >>> spark.profile.clear(type="memory")  # type mismatch
    >>> spark.profile.profiler_collector._profile_results
    {2: (, None)}
    >>> spark.profile.clear()  # clear all
    >>> spark.profile.profiler_collector._profile_results
    {}
    >>> spark.profile.show()
    >>>
    ```

    ### How was this patch tested?

    Unit tests.

    ### Was this patch authored or co-authored using generative AI tooling?

    No.

    Closes #45378 from xinrong-meng/profile_clear.

    Authored-by: Xinrong Meng
    Signed-off-by: Xinrong Meng
---
 python/pyspark/sql/profiler.py                |  79 +++
 python/pyspark/sql/tests/test_session.py      |  27 +
 python/pyspark/sql/tests/test_udf_profiler.py |  26 +
 python/pyspark/tests/test_memory_profiler.py  |  59 
 4 files changed, 191 insertions(+)

diff --git a/python/pyspark/sql/profiler.py b/python/pyspark/sql/profiler.py
index 5ab27bce2582..711e39de4723 100644
--- a/python/pyspark/sql/profiler.py
+++ b/python/pyspark/sql/profiler.py
@@ -224,6 +224,56 @@ class ProfilerCollector(ABC):
         for id in sorted(code_map.keys()):
             dump(id)
 
+    def clear_perf_profiles(self, id: Optional[int] = None) -> None:
+        """
+        Clear the perf profile results.
+
+        .. versionadded:: 4.0.0
+
+        Parameters
+        ----------
+        id : int, optional
+            The UDF ID whose profiling results should be cleared.
+            If not specified, all the results will be cleared.
+        """
+        with self._lock:
+            if id is not None:
+                if id in self._profile_results:
+                    perf, mem, *_ = self._profile_results[id]
+                    self._profile_results[id] = (None, mem, *_)
+                    if mem is None:
+                        self._profile_results.pop(id, None)
+            else:
+                for id, (perf, mem, *_) in list(self._profile_results.items()):
+                    self._profile_results[id] = (None, mem, *_)
+                    if mem is None:
+                        self._profile_results.pop(id, None)
+
+    def clear_memory_profiles(self, id: Optional[int] = None) -> None:
+        """
+        Clear the memory profile results.
+
+        .. versionadded:: 4.0.0
+
+        Parameters
+        ----------
+        id : int, optional
+            The UDF ID whose profiling results should be cleared.
+            If not specified, all the results will be cleared.
+        """
+        with self._lock:
+            if id is not None:
+                if id in self._profile_results:
+                    perf, mem, *_ = self._profile_results[id]
+                    self._profile_results[id] = (perf, None, *_)
+                    if perf is None:
(spark) branch master updated (06c741a0061b -> d20650bc8cf2)
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from 06c741a0061b [SPARK-47129][CONNECT][SQL] Make `ResolveRelations` cache connect plan properly
     add d20650bc8cf2 [SPARK-46975][PS] Support dedicated fallback methods

No new revisions were added by this update.

Summary of changes:
 python/pyspark/pandas/frame.py | 49 +++---
 1 file changed, 36 insertions(+), 13 deletions(-)
(spark) branch master updated (6de527e9ee94 -> 6185e5cad7be)
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from 6de527e9ee94 [SPARK-43259][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_2024
     add 6185e5cad7be [SPARK-47132][DOCS][PYTHON] Correct docstring for pyspark's dataframe.head

No new revisions were added by this update.

Summary of changes:
 python/pyspark/sql/dataframe.py | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)
(spark) branch master updated: [SPARK-47014][PYTHON][CONNECT] Implement methods dumpPerfProfiles and dumpMemoryProfiles of SparkSession
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 4b9e9d7a9b7c [SPARK-47014][PYTHON][CONNECT] Implement methods dumpPerfProfiles and dumpMemoryProfiles of SparkSession
4b9e9d7a9b7c is described below

commit 4b9e9d7a9b7c1b21c7d04cdf0095cc069a35b757
Author: Xinrong Meng
AuthorDate: Wed Feb 14 10:37:33 2024 -0800

    [SPARK-47014][PYTHON][CONNECT] Implement methods dumpPerfProfiles and dumpMemoryProfiles of SparkSession

    ### What changes were proposed in this pull request?

    Implement methods dumpPerfProfiles and dumpMemoryProfiles of SparkSession.

    ### Why are the changes needed?

    Complete support of (v2) SparkSession-based profiling.

    ### Does this PR introduce _any_ user-facing change?

    Yes. dumpPerfProfiles and dumpMemoryProfiles of SparkSession are supported. An example of dumpPerfProfiles is shown below.

    ```py
    >>> @udf("long")
    ... def add(x):
    ...     return x + 1
    ...
    >>> spark.conf.set("spark.sql.pyspark.udf.profiler", "perf")
    >>> spark.range(10).select(add("id")).collect()
    ...
    >>> spark.dumpPerfProfiles("dummy_dir")
    >>> os.listdir("dummy_dir")
    ['udf_2.pstats']
    ```

    ### How was this patch tested?

    Unit tests.

    ### Was this patch authored or co-authored using generative AI tooling?

    No.

    Closes #45073 from xinrong-meng/dump_profile.

    Authored-by: Xinrong Meng
    Signed-off-by: Xinrong Meng
---
 python/pyspark/sql/connect/session.py         | 10 +
 python/pyspark/sql/profiler.py                | 65 +++
 python/pyspark/sql/session.py                 | 10 +
 python/pyspark/sql/tests/test_udf_profiler.py | 20 +
 python/pyspark/tests/test_memory_profiler.py  | 22 +
 5 files changed, 110 insertions(+), 17 deletions(-)

diff --git a/python/pyspark/sql/connect/session.py b/python/pyspark/sql/connect/session.py
index 9a678c28a6cc..764f71ccc415 100644
--- a/python/pyspark/sql/connect/session.py
+++ b/python/pyspark/sql/connect/session.py
@@ -958,6 +958,16 @@ class SparkSession:
 
     showMemoryProfiles.__doc__ = PySparkSession.showMemoryProfiles.__doc__
 
+    def dumpPerfProfiles(self, path: str, id: Optional[int] = None) -> None:
+        self._profiler_collector.dump_perf_profiles(path, id)
+
+    dumpPerfProfiles.__doc__ = PySparkSession.dumpPerfProfiles.__doc__
+
+    def dumpMemoryProfiles(self, path: str, id: Optional[int] = None) -> None:
+        self._profiler_collector.dump_memory_profiles(path, id)
+
+    dumpMemoryProfiles.__doc__ = PySparkSession.dumpMemoryProfiles.__doc__
+
 
 SparkSession.__doc__ = PySparkSession.__doc__
 
diff --git a/python/pyspark/sql/profiler.py b/python/pyspark/sql/profiler.py
index 565752197238..0db9d9b8b9b4 100644
--- a/python/pyspark/sql/profiler.py
+++ b/python/pyspark/sql/profiler.py
@@ -15,6 +15,7 @@
 # limitations under the License.
 #
 from abc import ABC, abstractmethod
+import os
 import pstats
 from threading import RLock
 from typing import Dict, Optional, TYPE_CHECKING
@@ -158,6 +159,70 @@ class ProfilerCollector(ABC):
         """
         ...
 
+    def dump_perf_profiles(self, path: str, id: Optional[int] = None) -> None:
+        """
+        Dump the perf profile results into directory `path`.
+
+        .. versionadded:: 4.0.0
+
+        Parameters
+        ----------
+        path: str
+            A directory in which to dump the perf profile.
+        id : int, optional
+            A UDF ID to be shown. If not specified, all the results will be shown.
+        """
+        with self._lock:
+            stats = self._perf_profile_results
+
+        def dump(id: int) -> None:
+            s = stats.get(id)
+
+            if s is not None:
+                if not os.path.exists(path):
+                    os.makedirs(path)
+                p = os.path.join(path, f"udf_{id}_perf.pstats")
+                s.dump_stats(p)
+
+        if id is not None:
+            dump(id)
+        else:
+            for id in sorted(stats.keys()):
+                dump(id)
+
+    def dump_memory_profiles(self, path: str, id: Optional[int] = None) -> None:
+        """
+        Dump the memory profile results into directory `path`.
+
+        .. versionadded:: 4.0.0
+
+        Parameters
+        ----------
+        path: str
+            A directory in which to dump the memory profile.
+        id : int, optional
+            A UDF ID to be shown. If not specified, all the results will be shown.
+        """
+        with self._lock:
+            code_map = self._memory_profile_results
(spark) branch master updated: [SPARK-46689][SPARK-46690][PYTHON][CONNECT] Support v2 profiling in group/cogroup applyInPandas/applyInArrow
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 1a66c8c78a46 [SPARK-46689][SPARK-46690][PYTHON][CONNECT] Support v2 profiling in group/cogroup applyInPandas/applyInArrow
1a66c8c78a46 is described below

commit 1a66c8c78a468a5bdc6c033e8c7a26693e4bf62e
Author: Xinrong Meng
AuthorDate: Thu Feb 8 10:56:28 2024 -0800

    [SPARK-46689][SPARK-46690][PYTHON][CONNECT] Support v2 profiling in group/cogroup applyInPandas/applyInArrow

    ### What changes were proposed in this pull request?

    Support v2 (perf, memory) profiling in group/cogroup applyInPandas/applyInArrow, which rely on the physical plan nodes FlatMapGroupsInBatchExec and FlatMapCoGroupsInBatchExec.

    ### Why are the changes needed?

    Complete v2 profiling support.

    ### Does this PR introduce _any_ user-facing change?

    Yes. V2 profiling in group/cogroup applyInPandas/applyInArrow is supported.

    ### How was this patch tested?

    Unit tests.

    ### Was this patch authored or co-authored using generative AI tooling?

    No.

    Closes #45050 from xinrong-meng/other_p2.

    Authored-by: Xinrong Meng
    Signed-off-by: Xinrong Meng
---
 python/pyspark/sql/tests/test_udf_profiler.py  | 123 +
 python/pyspark/tests/test_memory_profiler.py   | 123 +
 .../python/FlatMapCoGroupsInBatchExec.scala    |   2 +-
 .../python/FlatMapGroupsInBatchExec.scala      |   2 +-
 4 files changed, 248 insertions(+), 2 deletions(-)

diff --git a/python/pyspark/sql/tests/test_udf_profiler.py b/python/pyspark/sql/tests/test_udf_profiler.py
index 99719b5475c1..4f767d274414 100644
--- a/python/pyspark/sql/tests/test_udf_profiler.py
+++ b/python/pyspark/sql/tests/test_udf_profiler.py
@@ -394,6 +394,129 @@ class UDFProfiler2TestsMixin:
             io.getvalue(), f"2.*{os.path.basename(inspect.getfile(_do_computation))}"
         )
 
+    @unittest.skipIf(
+        not have_pandas or not have_pyarrow,
+        cast(str, pandas_requirement_message or pyarrow_requirement_message),
+    )
+    def test_perf_profiler_group_apply_in_pandas(self):
+        # FlatMapGroupsInBatchExec
+        df = self.spark.createDataFrame(
+            [(1, 1.0), (1, 2.0), (2, 3.0), (2, 5.0), (2, 10.0)], ("id", "v")
+        )
+
+        def normalize(pdf):
+            v = pdf.v
+            return pdf.assign(v=(v - v.mean()) / v.std())
+
+        with self.sql_conf({"spark.sql.pyspark.udf.profiler": "perf"}):
+            df.groupby("id").applyInPandas(normalize, schema="id long, v double").show()
+
+        self.assertEqual(1, len(self.profile_results), str(self.profile_results.keys()))
+
+        for id in self.profile_results:
+            with self.trap_stdout() as io:
+                self.spark.showPerfProfiles(id)
+
+            self.assertIn(f"Profile of UDF", io.getvalue())
+            self.assertRegex(
+                io.getvalue(), f"2.*{os.path.basename(inspect.getfile(_do_computation))}"
+            )
+
+    @unittest.skipIf(
+        not have_pandas or not have_pyarrow,
+        cast(str, pandas_requirement_message or pyarrow_requirement_message),
+    )
+    def test_perf_profiler_cogroup_apply_in_pandas(self):
+        # FlatMapCoGroupsInBatchExec
+        import pandas as pd
+
+        df1 = self.spark.createDataFrame(
+            [(2101, 1, 1.0), (2101, 2, 2.0), (2102, 1, 3.0), (2102, 2, 4.0)],
+            ("time", "id", "v1"),
+        )
+        df2 = self.spark.createDataFrame(
+            [(2101, 1, "x"), (2101, 2, "y")], ("time", "id", "v2")
+        )
+
+        def asof_join(left, right):
+            return pd.merge_asof(left, right, on="time", by="id")
+
+        with self.sql_conf({"spark.sql.pyspark.udf.profiler": "perf"}):
+            df1.groupby("id").cogroup(df2.groupby("id")).applyInPandas(
+                asof_join, schema="time int, id int, v1 double, v2 string"
+            ).show()
+
+        self.assertEqual(1, len(self.profile_results), str(self.profile_results.keys()))
+
+        for id in self.profile_results:
+            with self.trap_stdout() as io:
+                self.spark.showPerfProfiles(id)
+
+            self.assertIn(f"Profile of UDF", io.getvalue())
+            self.assertRegex(
+                io.getvalue(), f"2.*{os.path.basename(inspect.getfile(_do_computation))}"
+            )
+
+    @unittest.skipIf(
+        not have_pandas or not have_pyarrow,
+        cast(str, pandas_requirement_message or pyarrow_requirement_message),
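End to end, the newly profiled path mirrors the test above: enable the profiler conf, run a grouped pandas UDF, then render the collected stats. A sketch assuming an active `spark` session; `showPerfProfiles` is the session method exercised by these tests:

```python
import pandas as pd

spark.conf.set("spark.sql.pyspark.udf.profiler", "perf")
df = spark.createDataFrame([(1, 1.0), (1, 2.0), (2, 3.0), (2, 5.0)], ("id", "v"))

def normalize(pdf: pd.DataFrame) -> pd.DataFrame:
    v = pdf.v
    return pdf.assign(v=(v - v.mean()) / v.std())

# FlatMapGroupsInBatchExec is now instrumented, so this call is profiled.
df.groupby("id").applyInPandas(normalize, schema="id long, v double").show()
spark.showPerfProfiles()  # per-UDF cProfile output
```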
(spark) branch master updated: [SPARK-46867][PYTHON][CONNECT][TESTS] Remove unnecessary dependency from test_mixed_udf_and_sql.py
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 79918028b142 [SPARK-46867][PYTHON][CONNECT][TESTS] Remove unnecessary dependency from test_mixed_udf_and_sql.py
79918028b142 is described below

commit 79918028b142685fe1c3871a3593e91100ab6bbf
Author: Xinrong Meng
AuthorDate: Thu Jan 25 14:16:12 2024 -0800

    [SPARK-46867][PYTHON][CONNECT][TESTS] Remove unnecessary dependency from test_mixed_udf_and_sql.py

    ### What changes were proposed in this pull request?

    Remove an unnecessary dependency from test_mixed_udf_and_sql.py.

    ### Why are the changes needed?

    Otherwise, test_mixed_udf_and_sql.py depends on Spark Connect's dependency "grpc", possibly leading to conflicts or compatibility issues.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Test change only.

    ### Was this patch authored or co-authored using generative AI tooling?

    No.

    Closes #44886 from xinrong-meng/fix_dep.

    Authored-by: Xinrong Meng
    Signed-off-by: Xinrong Meng
---
 python/pyspark/sql/tests/connect/test_parity_pandas_udf_scalar.py | 4 ++++
 python/pyspark/sql/tests/pandas/test_pandas_udf_scalar.py         | 5 +++--
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/python/pyspark/sql/tests/connect/test_parity_pandas_udf_scalar.py b/python/pyspark/sql/tests/connect/test_parity_pandas_udf_scalar.py
index c950ca2e17c3..6a3d03246549 100644
--- a/python/pyspark/sql/tests/connect/test_parity_pandas_udf_scalar.py
+++ b/python/pyspark/sql/tests/connect/test_parity_pandas_udf_scalar.py
@@ -15,6 +15,7 @@
 # limitations under the License.
 #
 import unittest
+from pyspark.sql.connect.column import Column
 from pyspark.sql.tests.pandas.test_pandas_udf_scalar import ScalarPandasUDFTestsMixin
 from pyspark.testing.connectutils import ReusedConnectTestCase
@@ -51,6 +52,9 @@ class PandasUDFScalarParityTests(ScalarPandasUDFTestsMixin, ReusedConnectTestCas
     def test_vectorized_udf_invalid_length(self):
         self.check_vectorized_udf_invalid_length()
 
+    def test_mixed_udf_and_sql(self):
+        self._test_mixed_udf_and_sql(Column)
+
 
 if __name__ == "__main__":
     from pyspark.sql.tests.connect.test_parity_pandas_udf_scalar import *  # noqa: F401
diff --git a/python/pyspark/sql/tests/pandas/test_pandas_udf_scalar.py b/python/pyspark/sql/tests/pandas/test_pandas_udf_scalar.py
index dfbab5c8b3cd..9f6bdb83caf7 100644
--- a/python/pyspark/sql/tests/pandas/test_pandas_udf_scalar.py
+++ b/python/pyspark/sql/tests/pandas/test_pandas_udf_scalar.py
@@ -1321,8 +1321,9 @@ class ScalarPandasUDFTestsMixin:
         self.assertEqual(expected_multi, df_multi_2.collect())
 
     def test_mixed_udf_and_sql(self):
-        from pyspark.sql.connect.column import Column as ConnectColumn
+        self._test_mixed_udf_and_sql(Column)
 
+    def _test_mixed_udf_and_sql(self, col_type):
         df = self.spark.range(0, 1).toDF("v")
 
         # Test mixture of UDFs, Pandas UDFs and SQL expression.
@@ -1333,7 +1334,7 @@ class ScalarPandasUDFTestsMixin:
             return x + 1
 
         def f2(x):
-            assert type(x) in (Column, ConnectColumn)
+            assert type(x) == col_type
             return x + 10
 
         @pandas_udf("int")
(spark) branch master updated: [SPARK-46663][PYTHON] Disable memory profiler for pandas UDFs with iterators
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 48152b1779a5 [SPARK-46663][PYTHON] Disable memory profiler for pandas UDFs with iterators
48152b1779a5 is described below

commit 48152b1779a5b8191dd0e09424fdb552cac55d49
Author: Xinrong Meng
AuthorDate: Tue Jan 16 11:20:40 2024 -0800

    [SPARK-46663][PYTHON] Disable memory profiler for pandas UDFs with iterators

    ### What changes were proposed in this pull request?

    When using pandas UDFs with iterators, if users enable the profiling Spark conf, a warning indicating non-support should be raised, and profiling should be disabled. However, currently, after raising the not-supported warning, the memory profiler is still being enabled. This PR proposes to fix that.

    ### Why are the changes needed?

    A bug fix to eliminate misleading behavior.

    ### Does this PR introduce _any_ user-facing change?

    The noticeable changes will affect only those using the PySpark shell. This is because, in the PySpark shell, the memory profiler will raise an error, which in turn blocks the execution of the UDF.

    ### How was this patch tested?

    Manual test.

    Setup:
    ```py
    $ ./bin/pyspark --conf spark.python.profile=true
    >>> from typing import Iterator
    >>> from pyspark.sql.functions import *
    >>> import pandas as pd
    >>> @pandas_udf("long")
    ... def plus_one(iterator: Iterator[pd.Series]) -> Iterator[pd.Series]:
    ...     for s in iterator:
    ...         yield s + 1
    ...
    >>> df = spark.createDataFrame(pd.DataFrame([1, 2, 3], columns=["v"]))
    ```

    Before:
    ```
    >>> df.select(plus_one(df.v)).show()
    UserWarning: Profiling UDFs with iterators input/output is not supported.
    Traceback (most recent call last):
    ...
    OSError: could not get source code
    ```

    After:
    ```
    >>> df.select(plus_one(df.v)).show()
    /Users/xinrong.meng/spark/python/pyspark/sql/udf.py:417: UserWarning: Profiling UDFs with iterators input/output is not supported.
    +-----------+
    |plus_one(v)|
    +-----------+
    |          2|
    |          3|
    |          4|
    +-----------+
    ```

    ### Was this patch authored or co-authored using generative AI tooling?

    No.

    Closes #44668 from xinrong-meng/fix_mp.

    Authored-by: Xinrong Meng
    Signed-off-by: Xinrong Meng
---
 python/pyspark/sql/tests/test_udf_profiler.py | 45 ++-
 python/pyspark/sql/udf.py                     | 33 ++--
 2 files changed, 60 insertions(+), 18 deletions(-)

diff --git a/python/pyspark/sql/tests/test_udf_profiler.py b/python/pyspark/sql/tests/test_udf_profiler.py
index 136f423d0a35..776d5da88bb2 100644
--- a/python/pyspark/sql/tests/test_udf_profiler.py
+++ b/python/pyspark/sql/tests/test_udf_profiler.py
@@ -19,11 +19,13 @@ import tempfile
 import unittest
 import os
 import sys
+import warnings
 from io import StringIO
+from typing import Iterator
 
 from pyspark import SparkConf
 from pyspark.sql import SparkSession
-from pyspark.sql.functions import udf
+from pyspark.sql.functions import udf, pandas_udf
 from pyspark.profiler import UDFBasicProfiler
 
@@ -101,6 +103,47 @@ class UDFProfilerTests(unittest.TestCase):
         df = self.spark.range(10)
         df.select(add1("id"), add2("id"), add1("id")).collect()
 
+    # Unsupported
+    def exec_pandas_udf_iter_to_iter(self):
+        import pandas as pd
+
+        @pandas_udf("int")
+        def iter_to_iter(batch_ser: Iterator[pd.Series]) -> Iterator[pd.Series]:
+            for ser in batch_ser:
+                yield ser + 1
+
+        self.spark.range(10).select(iter_to_iter("id")).collect()
+
+    # Unsupported
+    def exec_map(self):
+        import pandas as pd
+
+        def map(pdfs: Iterator[pd.DataFrame]) -> Iterator[pd.DataFrame]:
+            for pdf in pdfs:
+                yield pdf[pdf.id == 1]
+
+        df = self.spark.createDataFrame([(1, 1.0), (1, 2.0), (2, 3.0), (2, 5.0)], ("id", "v"))
+        df.mapInPandas(map, schema=df.schema).collect()
+
+    def test_unsupported(self):
+        with warnings.catch_warnings(record=True) as warns:
+            warnings.simplefilter("always")
+            self.exec_pandas_udf_iter_to_iter()
+            user_warns = [warn.message for warn in warns if isinstance(warn.message, UserWarning)]
+            self.assertTrue(len(user_warns) > 0)
+            self.assertTrue(
+                "Profiling UDFs with iterators input/output is not supported" in str(user_warns[0])
+            )
+
+        with warnings.catch_warnings(record=True) as warns:
(spark) branch master updated: [SPARK-46277][PYTHON] Validate startup urls with the config being set
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 027aeb1764a8 [SPARK-46277][PYTHON] Validate startup urls with the config being set
027aeb1764a8 is described below

commit 027aeb1764a816858b7ea071cd2b620f02a6a525
Author: Xinrong Meng
AuthorDate: Thu Dec 7 13:45:31 2023 -0800

    [SPARK-46277][PYTHON] Validate startup urls with the config being set

    ### What changes were proposed in this pull request?

    Validate startup urls as soon as the config is set; see the example in "Does this PR introduce _any_ user-facing change?".

    ### Why are the changes needed?

    Clear and user-friendly error messages.

    ### Does this PR introduce _any_ user-facing change?

    Yes.

    FROM
    ```py
    >>> SparkSession.builder.config(map={"spark.master": "x", "spark.remote": "y"})
    >>> SparkSession.builder.config(map={"spark.master": "x", "spark.remote": "y"}).config("x", "z")  # Only raises the error when adding new configs
    Traceback (most recent call last):
    ...
    RuntimeError: Spark master cannot be configured with Spark Connect server; however, found URL for Spark Connect [y]
    ```

    TO
    ```py
    >>> SparkSession.builder.config(map={"spark.master": "x", "spark.remote": "y"})
    Traceback (most recent call last):
    ...
    RuntimeError: Spark master cannot be configured with Spark Connect server; however, found URL for Spark Connect [y]
    ```

    ### How was this patch tested?

    Unit tests.

    ### Was this patch authored or co-authored using generative AI tooling?

    No.

    Closes #44194 from xinrong-meng/fix_session.

    Authored-by: Xinrong Meng
    Signed-off-by: Xinrong Meng
---
 python/pyspark/errors/error_classes.py   |  6 +++---
 python/pyspark/sql/session.py            | 28 +++-
 python/pyspark/sql/tests/test_session.py | 30 --
 3 files changed, 42 insertions(+), 22 deletions(-)

diff --git a/python/pyspark/errors/error_classes.py b/python/pyspark/errors/error_classes.py
index 965fd04a9135..cc8400270967 100644
--- a/python/pyspark/errors/error_classes.py
+++ b/python/pyspark/errors/error_classes.py
@@ -86,12 +86,12 @@ ERROR_CLASSES_JSON = """
   },
   "CANNOT_CONFIGURE_SPARK_CONNECT": {
     "message": [
-      "Spark Connect server cannot be configured with Spark master; however, found URL for Spark master []."
+      "Spark Connect server cannot be configured: Existing [], New []."
     ]
   },
-  "CANNOT_CONFIGURE_SPARK_MASTER": {
+  "CANNOT_CONFIGURE_SPARK_CONNECT_MASTER": {
     "message": [
-      "Spark master cannot be configured with Spark Connect server; however, found URL for Spark Connect []."
+      "Spark Connect server and Spark master cannot be configured together: Spark master [], Spark Connect []."
     ]
   },
   "CANNOT_CONVERT_COLUMN_INTO_BOOL": {
diff --git a/python/pyspark/sql/session.py b/python/pyspark/sql/session.py
index 7f4589557cd2..86aacfa54c6e 100644
--- a/python/pyspark/sql/session.py
+++ b/python/pyspark/sql/session.py
@@ -286,17 +286,17 @@ class SparkSession(SparkConversionMixin):
         with self._lock:
             if conf is not None:
                 for k, v in conf.getAll():
-                    self._validate_startup_urls()
                     self._options[k] = v
+                self._validate_startup_urls()
             elif map is not None:
                 for k, v in map.items():  # type: ignore[assignment]
                     v = to_str(v)  # type: ignore[assignment]
-                    self._validate_startup_urls()
                     self._options[k] = v
+                self._validate_startup_urls()
             else:
                 value = to_str(value)
-                self._validate_startup_urls()
                 self._options[cast(str, key)] = value
+                self._validate_startup_urls()
             return self
 
@@ -306,22 +306,16 @@ class SparkSession(SparkConversionMixin):
         Helper function that validates the combination of startup URLs and raises an exception
         if incompatible options are selected.
         """
-        if "spark.master" in self._options and (
+        if ("spark.master" in self._options or "MASTER" in os.environ) and (
             "spark.remote" in self._options or "SPARK_REMOTE" in os.environ
[spark] branch branch-3.5 updated: [SPARK-44560][PYTHON][CONNECT] Improve tests and documentation for Arrow Python UDF
This is an automated email from the ASF dual-hosted git repository. xinrong pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new 36b93d07eb9 [SPARK-44560][PYTHON][CONNECT] Improve tests and documentation for Arrow Python UDF 36b93d07eb9 is described below commit 36b93d07eb961905647c42fac80e22efdfb15f4f Author: Xinrong Meng AuthorDate: Thu Jul 27 13:45:05 2023 -0700 [SPARK-44560][PYTHON][CONNECT] Improve tests and documentation for Arrow Python UDF ### What changes were proposed in this pull request? - Test on complex return type - Remove complex return type constraints for Arrow Python UDF on Spark Connect - Update documentation of the related Spark conf The change targets both Spark 3.5 and 4.0. ### Why are the changes needed? Testability and parity with vanilla PySpark. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Unit tests. Closes #42178 from xinrong-meng/conf. Authored-by: Xinrong Meng Signed-off-by: Xinrong Meng (cherry picked from commit 5f6537409383e2dbdd699108f708567c37db8151) Signed-off-by: Xinrong Meng --- python/pyspark/sql/connect/udf.py| 10 ++ python/pyspark/sql/tests/test_arrow_python_udf.py| 5 - python/pyspark/sql/tests/test_udf.py | 16 .../scala/org/apache/spark/sql/internal/SQLConf.scala| 3 +-- 4 files changed, 19 insertions(+), 15 deletions(-) diff --git a/python/pyspark/sql/connect/udf.py b/python/pyspark/sql/connect/udf.py index 0a5d06618b3..2d7e423d3d5 100644 --- a/python/pyspark/sql/connect/udf.py +++ b/python/pyspark/sql/connect/udf.py @@ -35,7 +35,7 @@ from pyspark.sql.connect.expressions import ( ) from pyspark.sql.connect.column import Column from pyspark.sql.connect.types import UnparsedDataType -from pyspark.sql.types import ArrayType, DataType, MapType, StringType, StructType +from pyspark.sql.types import DataType, StringType from pyspark.sql.udf import UDFRegistration as PySparkUDFRegistration from pyspark.errors import PySparkTypeError @@ -70,18 +70,12 @@ def _create_py_udf( is_arrow_enabled = useArrow regular_udf = _create_udf(f, returnType, PythonEvalType.SQL_BATCHED_UDF) -return_type = regular_udf.returnType try: is_func_with_args = len(getfullargspec(f).args) > 0 except TypeError: is_func_with_args = False -is_output_atomic_type = ( -not isinstance(return_type, StructType) -and not isinstance(return_type, MapType) -and not isinstance(return_type, ArrayType) -) if is_arrow_enabled: -if is_output_atomic_type and is_func_with_args: +if is_func_with_args: return _create_arrow_py_udf(regular_udf) else: warnings.warn( diff --git a/python/pyspark/sql/tests/test_arrow_python_udf.py b/python/pyspark/sql/tests/test_arrow_python_udf.py index 264ea0b901f..f48f07666e1 100644 --- a/python/pyspark/sql/tests/test_arrow_python_udf.py +++ b/python/pyspark/sql/tests/test_arrow_python_udf.py @@ -47,11 +47,6 @@ class PythonUDFArrowTestsMixin(BaseUDFTestsMixin): def test_register_java_udaf(self): super(PythonUDFArrowTests, self).test_register_java_udaf() -# TODO(SPARK-43903): Standardize ArrayType conversion for Python UDF -@unittest.skip("Inconsistent ArrayType conversion with/without Arrow.") -def test_nested_array(self): -super(PythonUDFArrowTests, self).test_nested_array() - def test_complex_input_types(self): row = ( self.spark.range(1) diff --git a/python/pyspark/sql/tests/test_udf.py b/python/pyspark/sql/tests/test_udf.py index 8ffcb5e05a2..239ff27813b 100644 --- a/python/pyspark/sql/tests/test_udf.py +++ 
b/python/pyspark/sql/tests/test_udf.py @@ -882,6 +882,22 @@ class BaseUDFTestsMixin(object): row = df.select(f("nested_array")).first() self.assertEquals(row[0], [[1, 2], [3, 4], [4, 5]]) +def test_complex_return_types(self): +row = ( +self.spark.range(1) +.selectExpr("array(1, 2, 3) as array", "map('a', 'b') as map", "struct(1, 2) as struct") +.select( +udf(lambda x: x, "array<int>")("array"), +udf(lambda x: x, "map<string,string>")("map"), +udf(lambda x: x, "struct<col1:int,col2:int>")("struct"), +) +.first() +) + +self.assertEquals(row[0], [1, 2, 3]) +self.assertEquals(row[1], {"a": "b"}) +self.assertEquals(row[2], Row(col1=1, col2=2)) + class UDFTests(BaseUDFTestsMixin, ReusedSQLTestCase): @class
[spark] branch master updated: [SPARK-44560][PYTHON][CONNECT] Improve tests and documentation for Arrow Python UDF
This is an automated email from the ASF dual-hosted git repository. xinrong pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 5f653740938 [SPARK-44560][PYTHON][CONNECT] Improve tests and documentation for Arrow Python UDF 5f653740938 is described below commit 5f6537409383e2dbdd699108f708567c37db8151 Author: Xinrong Meng AuthorDate: Thu Jul 27 13:45:05 2023 -0700 [SPARK-44560][PYTHON][CONNECT] Improve tests and documentation for Arrow Python UDF ### What changes were proposed in this pull request? - Test on complex return type - Remove complex return type constraints for Arrow Python UDF on Spark Connect - Update documentation of the related Spark conf The change targets both Spark 3.5 and 4.0. ### Why are the changes needed? Testability and parity with vanilla PySpark. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Unit tests. Closes #42178 from xinrong-meng/conf. Authored-by: Xinrong Meng Signed-off-by: Xinrong Meng --- python/pyspark/sql/connect/udf.py| 10 ++ python/pyspark/sql/tests/test_arrow_python_udf.py| 5 - python/pyspark/sql/tests/test_udf.py | 16 .../scala/org/apache/spark/sql/internal/SQLConf.scala| 3 +-- 4 files changed, 19 insertions(+), 15 deletions(-) diff --git a/python/pyspark/sql/connect/udf.py b/python/pyspark/sql/connect/udf.py index 0a5d06618b3..2d7e423d3d5 100644 --- a/python/pyspark/sql/connect/udf.py +++ b/python/pyspark/sql/connect/udf.py @@ -35,7 +35,7 @@ from pyspark.sql.connect.expressions import ( ) from pyspark.sql.connect.column import Column from pyspark.sql.connect.types import UnparsedDataType -from pyspark.sql.types import ArrayType, DataType, MapType, StringType, StructType +from pyspark.sql.types import DataType, StringType from pyspark.sql.udf import UDFRegistration as PySparkUDFRegistration from pyspark.errors import PySparkTypeError @@ -70,18 +70,12 @@ def _create_py_udf( is_arrow_enabled = useArrow regular_udf = _create_udf(f, returnType, PythonEvalType.SQL_BATCHED_UDF) -return_type = regular_udf.returnType try: is_func_with_args = len(getfullargspec(f).args) > 0 except TypeError: is_func_with_args = False -is_output_atomic_type = ( -not isinstance(return_type, StructType) -and not isinstance(return_type, MapType) -and not isinstance(return_type, ArrayType) -) if is_arrow_enabled: -if is_output_atomic_type and is_func_with_args: +if is_func_with_args: return _create_arrow_py_udf(regular_udf) else: warnings.warn( diff --git a/python/pyspark/sql/tests/test_arrow_python_udf.py b/python/pyspark/sql/tests/test_arrow_python_udf.py index 264ea0b901f..f48f07666e1 100644 --- a/python/pyspark/sql/tests/test_arrow_python_udf.py +++ b/python/pyspark/sql/tests/test_arrow_python_udf.py @@ -47,11 +47,6 @@ class PythonUDFArrowTestsMixin(BaseUDFTestsMixin): def test_register_java_udaf(self): super(PythonUDFArrowTests, self).test_register_java_udaf() -# TODO(SPARK-43903): Standardize ArrayType conversion for Python UDF -@unittest.skip("Inconsistent ArrayType conversion with/without Arrow.") -def test_nested_array(self): -super(PythonUDFArrowTests, self).test_nested_array() - def test_complex_input_types(self): row = ( self.spark.range(1) diff --git a/python/pyspark/sql/tests/test_udf.py b/python/pyspark/sql/tests/test_udf.py index 8ffcb5e05a2..239ff27813b 100644 --- a/python/pyspark/sql/tests/test_udf.py +++ b/python/pyspark/sql/tests/test_udf.py @@ -882,6 +882,22 @@ class BaseUDFTestsMixin(object): row = 
df.select(f("nested_array")).first() self.assertEquals(row[0], [[1, 2], [3, 4], [4, 5]]) +def test_complex_return_types(self): +row = ( +self.spark.range(1) +.selectExpr("array(1, 2, 3) as array", "map('a', 'b') as map", "struct(1, 2) as struct") +.select( +udf(lambda x: x, "array<int>")("array"), +udf(lambda x: x, "map<string,string>")("map"), +udf(lambda x: x, "struct<col1:int,col2:int>")("struct"), +) +.first() +) + +self.assertEquals(row[0], [1, 2, 3]) +self.assertEquals(row[1], {"a": "b"}) +self.assertEquals(row[2], Row(col1=1, col2=2)) + class UDFTests(BaseUDFTestsMixin, ReusedSQLTestCase): @classmethod diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala b/sql/catalyst/src/mai
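To make the lifted constraint concrete, here is a minimal sketch (assuming an active `spark` session and the `useArrow` flag on `udf` shown in the EvalType commit further down; the function name is illustrative):

```py
from pyspark.sql.functions import udf

# With the atomic-only restriction removed, an Arrow-optimized Python UDF
# may return complex types such as arrays.
@udf(returnType="array<int>", useArrow=True)
def repeat_twice(x):
    return [x, x]

spark.range(3).select(repeat_twice("id")).show()
```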
[spark] branch master updated: [SPARK-40770][PYTHON][FOLLOW-UP] Improved error messages for mapInPandas for schema mismatch
This is an automated email from the ASF dual-hosted git repository. xinrong pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new a367fde24de [SPARK-40770][PYTHON][FOLLOW-UP] Improved error messages for mapInPandas for schema mismatch a367fde24de is described below commit a367fde24de0abab93eac97350fb4ae0b687286c Author: Enrico Minack AuthorDate: Mon Jul 17 17:08:36 2023 -0700 [SPARK-40770][PYTHON][FOLLOW-UP] Improved error messages for mapInPandas for schema mismatch ### What changes were proposed in this pull request? Similar to #38223, improve the error messages when a Python method provided to `DataFrame.mapInPandas` returns a Pandas DataFrame that does not match the expected schema. With ```Python df = spark.range(2).withColumn("v", col("id")) ``` **Mismatching column names:** ```Python df.mapInPandas(lambda it: it, "id long, val long").show() # was: KeyError: 'val' # now: RuntimeError: Column names of the returned pandas.DataFrame do not match specified schema. # Missing: val Unexpected: v ``` **Python function not returning iterator:** ```Python df.mapInPandas(lambda it: 1, "id long").show() # was: TypeError: 'int' object is not iterable # now: TypeError: Return type of the user-defined function should be iterator of pandas.DataFrame, but is ``` **Python function not returning iterator of pandas.DataFrame:** ```Python df.mapInPandas(lambda it: [1], "id long").show() # was: TypeError: Return type of the user-defined function should be Pandas.DataFrame, but is # now: TypeError: Return type of the user-defined function should be iterator of pandas.DataFrame, but is iterator of # sometimes: ValueError: A field of type StructType expects a pandas.DataFrame, but got: # now: TypeError: Return type of the user-defined function should be iterator of pandas.DataFrame, but is iterator of ``` **Mismatching types (ValueError and TypeError):** ```Python df.mapInPandas(lambda it: it, "id int, v string").show() # was: pyarrow.lib.ArrowTypeError: Expected a string or bytes dtype, got int64 # now: pyarrow.lib.ArrowTypeError: Expected a string or bytes dtype, got int64 # The above exception was the direct cause of the following exception: # TypeError: Exception thrown when converting pandas.Series (int64) with name 'v' to Arrow Array (string). df.mapInPandas(lambda it: [pdf.assign(v=pdf["v"].apply(str)) for pdf in it], "id int, v double").show() # was: pyarrow.lib.ArrowInvalid: Could not convert '0' with type str: tried to convert to double # now: pyarrow.lib.ArrowInvalid: Could not convert '0' with type str: tried to convert to double # The above exception was the direct cause of the following exception: # ValueError: Exception thrown when converting pandas.Series (object) with name 'v' to Arrow Array (double). with self.sql_conf({"spark.sql.execution.pandas.convertToArrowArraySafely": True}): df.mapInPandas(lambda it: [pdf.assign(v=pdf["v"].apply(str)) for pdf in it], "id int, v double").show() # was: ValueError: Exception thrown when converting pandas.Series (object) to Arrow Array (double). # It can be caused by overflows or other unsafe conversions warned by Arrow. Arrow safe type check can be disabled # by using SQL config `spark.sql.execution.pandas.convertToArrowArraySafely`. # now: ValueError: Exception thrown when converting pandas.Series (object) with name 'v' to Arrow Array (double). # It can be caused by overflows or other unsafe conversions warned by Arrow. 
Arrow safe type check can be disabled # by using SQL config `spark.sql.execution.pandas.convertToArrowArraySafely`. ``` ### Why are the changes needed? Existing errors are generic (`KeyError`) or meaningless (`'int' object is not iterable`). The errors should help users spot the mismatching columns by naming them. The schema of the returned Pandas DataFrames can only be checked while processing the DataFrame, so such errors are expensive to hit. Therefore, they should be expressive. ### Does this PR introduce _any_ user-facing change? This only changes error messages, not behaviour. ### How was this patch tested? Tests all cases of schema mismatch for `DataFrame.mapInPandas`. Closes #39952 from EnricoMi/branch-pyspark-map-in-pandas-schema-mismatch. Authored-by: Enrico Minack Signed-off-by: Xinrong Meng --- python/pyspark/errors/error_classes.py
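As a usage sketch (assuming an active `spark` session; the function name is illustrative and `df` mirrors the one in the commit message), a function whose output matches the declared schema passes cleanly, so the new messages fire only on genuine mismatch:

```py
from pyspark.sql.functions import col

df = spark.range(2).withColumn("v", col("id"))

# Column names and types match the declared schema, so no error is raised.
def double_v(batches):
    for pdf in batches:
        yield pdf.assign(v=pdf["v"] * 2)

df.mapInPandas(double_v, "id long, v long").show()
```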
[spark] branch master updated: [SPARK-44446][PYTHON] Add checks for expected list type special cases
This is an automated email from the ASF dual-hosted git repository. xinrong pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new e578d466d4e [SPARK-44446][PYTHON] Add checks for expected list type special cases e578d466d4e is described below commit e578d466d4eae808a8ad5e42681b9e3e87fe6ca7 Author: Amanda Liu AuthorDate: Mon Jul 17 11:43:05 2023 -0700 [SPARK-44446][PYTHON] Add checks for expected list type special cases ### What changes were proposed in this pull request? This PR adds handling for special cases when `expected` is of type list. ### Why are the changes needed? The change is needed to handle all cases when `expected` is of type list. ### Does this PR introduce _any_ user-facing change? Yes, the PR makes modifications to the user-facing function `assertDataFrameEqual`. ### How was this patch tested? Added tests to `runtime/python/pyspark/sql/tests/test_utils.py` and `runtime/python/pyspark/sql/tests/connect/test_utils.py` Closes #42023 from asl3/fix-list-support. Authored-by: Amanda Liu Signed-off-by: Xinrong Meng --- python/pyspark/sql/tests/test_utils.py | 24 python/pyspark/testing/utils.py| 15 +-- 2 files changed, 37 insertions(+), 2 deletions(-) diff --git a/python/pyspark/sql/tests/test_utils.py b/python/pyspark/sql/tests/test_utils.py index 5b859ad15a5..eae3f528504 100644 --- a/python/pyspark/sql/tests/test_utils.py +++ b/python/pyspark/sql/tests/test_utils.py @@ -1119,6 +1119,30 @@ class UtilsTestsMixin: assertDataFrameEqual(df1, df2, checkRowOrder=False) assertDataFrameEqual(df1, df2, checkRowOrder=True) +def test_empty_expected_list(self): +df1 = self.spark.range(0, 10).drop("id") + +df2 = [] + +assertDataFrameEqual(df1, df2, checkRowOrder=False) +assertDataFrameEqual(df1, df2, checkRowOrder=True) + +def test_no_column_expected_list(self): +df1 = self.spark.range(0, 10).limit(0) + +df2 = [] + +assertDataFrameEqual(df1, df2, checkRowOrder=False) +assertDataFrameEqual(df1, df2, checkRowOrder=True) + +def test_empty_no_column_expected_list(self): +df1 = self.spark.range(0, 10).drop("id").limit(0) + +df2 = [] + +assertDataFrameEqual(df1, df2, checkRowOrder=False) +assertDataFrameEqual(df1, df2, checkRowOrder=True) + def test_special_vals(self): df1 = self.spark.createDataFrame( data=[ diff --git a/python/pyspark/testing/utils.py b/python/pyspark/testing/utils.py index 21c7b7e4dcd..14db9264209 100644 --- a/python/pyspark/testing/utils.py +++ b/python/pyspark/testing/utils.py @@ -349,6 +349,8 @@ def assertDataFrameEqual( For checkRowOrder, note that PySpark DataFrame ordering is non-deterministic, unless explicitly sorted. +Note that schema equality is checked only when `expected` is a DataFrame (not a list of Rows). + For DataFrames with float values, assertDataFrameEqual asserts approximate equality. Two float values a and b are approximately equal if the following equation is True: @@ -362,6 +364,9 @@ def assertDataFrameEqual( >>> df1 = spark.createDataFrame(data=[("1", 0.1), ("2", 3.23)], schema=["id", "amount"]) >>> df2 = spark.createDataFrame(data=[("1", 0.109), ("2", 3.23)], schema=["id", "amount"]) >>> assertDataFrameEqual(df1, df2, rtol=1e-1) # pass, DataFrames are approx equal by rtol +>>> df1 = spark.createDataFrame(data=[(1, 1000), (2, 3000)], schema=["id", "amount"]) +>>> list_of_rows = [Row(1, 1000), Row(2, 3000)] +>>> assertDataFrameEqual(df1, list_of_rows) # pass, actual and expected are equal >>> df1 = spark.createDataFrame( ... 
data=[("1", 1000.00), ("2", 3000.00), ("3", 2000.00)], schema=["id", "amount"]) >>> df2 = spark.createDataFrame( @@ -415,8 +420,14 @@ def assertDataFrameEqual( ) # special cases: empty datasets, datasets with 0 columns -if (actual.first() is None and expected.first() is None) or ( -len(actual.columns) == 0 and len(expected.columns) == 0 +if ( +isinstance(expected, DataFrame) +and ( +(actual.first() is None and expected.first() is None) +or (len(actual.columns) == 0 and len(expected.columns) == 0) +) +or isinstance(expected, list) +and ((actual.first() is None or len(actual.columns) == 0) and len(expected) == 0) ): return True - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-44264][PYTHON][ML] FunctionPickler Class
This is an automated email from the ASF dual-hosted git repository. xinrong pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new e000cb868cc [SPARK-44264][PYTHON][ML] FunctionPickler Class e000cb868cc is described below commit e000cb868ccb1a4f48a8356ccfc736e16ed1c1b5 Author: Mathew Jacob AuthorDate: Fri Jul 14 14:12:08 2023 -0700 [SPARK-44264][PYTHON][ML] FunctionPickler Class ### What changes were proposed in this pull request? This PR introduces the FunctionPickler utility class that will be responsible for pickling functions and their arguments, creating scripts that will run those functions and pickle their output, as well as extracting objects from a pickle file. ### Why are the changes needed? This is used to abstract away the responsibility of pickling from the TorchDistributor, as that is relatively tangential to the actual distributed training. Additionally, for future distributors or anything that uses pickling to transmit objects, this class can prove useful with its built-in functionality. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Wrote unit tests. Checklist: - [x] Pickles a function and its arguments to a file. - [x] Creates a script that, given a path to pickled functions and arguments, will run the function with the arguments and then pickle the output to another location. - [x] Extracts output given a pickle file. - [x] Unit tests for first feature. - [x] Unit tests for second feature. - [x] Unit tests for third feature. Closes #41946 from mathewjacob1002/function_pickler. Lead-authored-by: Mathew Jacob Co-authored-by: Mathew Jacob <134338709+mathewjacob1...@users.noreply.github.com> Signed-off-by: Xinrong Meng --- python/pyspark/ml/dl_util.py| 150 ++ python/pyspark/ml/tests/test_dl_util.py | 186 2 files changed, 336 insertions(+) diff --git a/python/pyspark/ml/dl_util.py b/python/pyspark/ml/dl_util.py new file mode 100644 index 000..8ead529d7b7 --- /dev/null +++ b/python/pyspark/ml/dl_util.py @@ -0,0 +1,150 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# +import os +import tempfile +import textwrap +from typing import Any, Callable + +from pyspark import cloudpickle + + +class FunctionPickler: +""" +This class provides a way to pickle a function and its arguments. +It also provides a way to create a script that can run a +function with its arguments when they have been pickled to a file. +It also provides a way of extracting the contents of a pickle file. +""" + +@staticmethod +def pickle_fn_and_save( +fn: Callable, file_path: str, save_dir: str, *args: Any, **kwargs: Any +) -> str: +""" +Given a function and args, this function will pickle them to a file. 
+ +Parameters +-- +fn: Callable +The picklable function that will be pickled to a file. +file_path: str +The path at which to save the pickled function, args, and kwargs. If it's the +empty string, the function will decide on a random name. +save_dir: str +The directory in which to save the file with the pickled function and arguments. +Does nothing if file_path is specified. If both file_path and save_dir are empty, +the function will write the file to the current working directory with a random +name. +*args: Any +Arguments of fn that will be pickled. +**kwargs: Any +Keyword arguments to fn that will be pickled. + +Returns +--- +str +The path to the file where the function and arguments are pickled. +""" +if file_path != "": +with open(file_path, "wb") as f: +cloudpickle.dump((fn, args, kwargs), f
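The mechanism underneath `pickle_fn_and_save` is plain cloudpickle, which the diff imports from `pyspark`. A standalone sketch (the function name and file path are illustrative):

```py
from pyspark import cloudpickle

def add(a, b):
    return a + b

# Pickle the function together with its positional and keyword arguments,
# matching the (fn, args, kwargs) tuple written above.
path = "/tmp/fn_args.pkl"  # illustrative path
with open(path, "wb") as f:
    cloudpickle.dump((add, (1, 2), {}), f)

# A generated script can later load and run it, pickling the result back out.
with open(path, "rb") as f:
    fn, args, kwargs = cloudpickle.load(f)
print(fn(*args, **kwargs))  # 3
```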
[spark] branch master updated: [SPARK-44398][CONNECT] Scala foreachBatch API
This is an automated email from the ASF dual-hosted git repository. xinrong pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 4771853c9bc [SPARK-44398][CONNECT] Scala foreachBatch API 4771853c9bc is described below commit 4771853c9bc26b8741091d63d77c4b6487e74189 Author: Raghu Angadi AuthorDate: Thu Jul 13 10:47:49 2023 -0700 [SPARK-44398][CONNECT] Scala foreachBatch API ### What changes were proposed in this pull request? This implements Scala foreachBatch(). The implementation is basic and needs some more enhancements. The server side will be shared by the Python implementation as well. One notable hack in this PR is that it runs the user's `foreachBatch()` with a regular (legacy) DataFrame, rather than setting up a remote Spark Connect session and Connect DataFrame. ### Why are the changes needed? Adds foreachBatch() support in Scala Spark Connect. ### Does this PR introduce _any_ user-facing change? Yes. Adds the foreachBatch() API. ### How was this patch tested? - A simple unit test. Closes #41969 from rangadi/feb-scala. Authored-by: Raghu Angadi Signed-off-by: Xinrong Meng --- .../spark/sql/streaming/DataStreamWriter.scala | 28 ++- .../spark/sql/streaming/StreamingQuerySuite.scala | 52 - .../src/main/protobuf/spark/connect/commands.proto | 11 +-- .../sql/connect/planner/SparkConnectPlanner.scala | 25 +- .../planner/StreamingForeachBatchHelper.scala | 69 + python/pyspark/sql/connect/proto/commands_pb2.py | 88 +++--- python/pyspark/sql/connect/proto/commands_pb2.pyi | 46 +++ python/pyspark/sql/connect/streaming/readwriter.py | 4 +- 8 files changed, 251 insertions(+), 72 deletions(-) diff --git a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala index 9f63f68a000..ad76ab4a1bc 100644 --- a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala +++ b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala @@ -30,12 +30,15 @@ import org.apache.spark.connect.proto.Command import org.apache.spark.connect.proto.WriteStreamOperationStart import org.apache.spark.internal.Logging import org.apache.spark.sql.{Dataset, ForeachWriter} +import org.apache.spark.sql.connect.common.DataTypeProtoConverter import org.apache.spark.sql.connect.common.ForeachWriterPacket import org.apache.spark.sql.execution.streaming.AvailableNowTrigger import org.apache.spark.sql.execution.streaming.ContinuousTrigger import org.apache.spark.sql.execution.streaming.OneTimeTrigger import org.apache.spark.sql.execution.streaming.ProcessingTimeTrigger +import org.apache.spark.sql.types.NullType import org.apache.spark.util.SparkSerDeUtils +import org.apache.spark.util.Utils /** * Interface used to write a streaming `Dataset` to external storage systems (e.g. file systems, @@ -218,7 +221,30 @@ final class DataStreamWriter[T] private[sql] (ds: Dataset[T]) extends Logging { val scalaWriterBuilder = proto.ScalarScalaUDF .newBuilder() .setPayload(ByteString.copyFrom(serialized)) -sinkBuilder.getForeachWriterBuilder.setScalaWriter(scalaWriterBuilder) +sinkBuilder.getForeachWriterBuilder.setScalaFunction(scalaWriterBuilder) +this + } + + /** + * :: Experimental :: + * + * (Scala-specific) Sets the output of the streaming query to be processed using the provided + * function. 
This is supported only in the micro-batch execution modes (that is, when the + * trigger is not continuous). In every micro-batch, the provided function will be called + * with (i) the output rows as a Dataset and (ii) the batch identifier. The + * batchId can be used to deduplicate and transactionally write the output (that is, the + * provided Dataset) to external systems. The output Dataset is guaranteed to be exactly the + * same for the same batchId (assuming all operations are deterministic in the query). + * + * @since 3.5.0 + */ + @Evolving + def foreachBatch(function: (Dataset[T], Long) => Unit): DataStreamWriter[T] = { +val serializedFn = Utils.serialize(function) +sinkBuilder.getForeachBatchBuilder.getScalaFunctionBuilder + .setPayload(ByteString.copyFrom(serializedFn)) + .setOutputType(DataTypeProtoConverter.toConnectProtoType(NullType)) // Unused. + .setNullable(true) // Unused. this } diff --git a/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/streaming/StreamingQuerySuite.scala b/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/streaming/StreamingQuerySuite.scala index 6ddcedf19cb..438e6e0c2fe 100644 --- a/connector/connect/cl
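The commit adds the Scala API; for orientation, the same pattern in PySpark (which already exposes `foreachBatch`) looks like the sketch below — the rate source and function name are just for illustration, and an active `spark` session is assumed:

```py
def process_batch(batch_df, batch_id):
    # batch_id lets the function deduplicate or write transactionally.
    batch_df.show()

query = (
    spark.readStream.format("rate").load()
    .writeStream.foreachBatch(process_batch)
    .start()
)
query.stop()
```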
[spark] branch master updated: [SPARK-44150][PYTHON][FOLLOW-UP] Revert commits
This is an automated email from the ASF dual-hosted git repository. xinrong pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new e505244460b [SPARK-44150][PYTHON][FOLLOW-UP] Revert commits e505244460b is described below commit e505244460baa49f862d36333792c9d924cb4dde Author: Xinrong Meng AuthorDate: Thu Jun 29 14:55:03 2023 -0700 [SPARK-44150][PYTHON][FOLLOW-UP] Revert commits ### What changes were proposed in this pull request? Revert two commits of [SPARK-44150] that block master CI. ### Why are the changes needed? N/A ### Does this PR introduce _any_ user-facing change? N/A ### How was this patch tested? N/A Closes #41799 from xinrong-meng/revert. Authored-by: Xinrong Meng Signed-off-by: Xinrong Meng --- python/pyspark/sql/pandas/serializers.py | 32 +++ python/pyspark/sql/tests/test_arrow_python_udf.py | 39 --- python/pyspark/worker.py | 3 -- 3 files changed, 5 insertions(+), 69 deletions(-) diff --git a/python/pyspark/sql/pandas/serializers.py b/python/pyspark/sql/pandas/serializers.py index 12d4c3077fe..307fcc33752 100644 --- a/python/pyspark/sql/pandas/serializers.py +++ b/python/pyspark/sql/pandas/serializers.py @@ -190,7 +190,7 @@ class ArrowStreamPandasSerializer(ArrowStreamSerializer): ) return converter(s) -def _create_array(self, series, arrow_type, spark_type=None, arrow_cast=False): +def _create_array(self, series, arrow_type, spark_type=None): """ Create an Arrow Array from the given pandas.Series and optional type. @@ -202,9 +202,6 @@ class ArrowStreamPandasSerializer(ArrowStreamSerializer): If None, pyarrow's inferred type will be used spark_type : DataType, optional If None, spark type converted from arrow_type will be used -arrow_cast: bool, optional -Whether to apply Arrow casting when the user-specified return type mismatches the -actual return values. 
Returns --- @@ -229,14 +226,7 @@ class ArrowStreamPandasSerializer(ArrowStreamSerializer): else: mask = series.isnull() try: -if arrow_cast: -return pa.Array.from_pandas(series, mask=mask).cast( -target_type=arrow_type, safe=self._safecheck -) -else: -return pa.Array.from_pandas( -series, mask=mask, type=arrow_type, safe=self._safecheck -) +return pa.Array.from_pandas(series, mask=mask, type=arrow_type, safe=self._safecheck) except TypeError as e: error_msg = ( "Exception thrown when converting pandas.Series (%s) " @@ -329,14 +319,12 @@ class ArrowStreamPandasUDFSerializer(ArrowStreamPandasSerializer): df_for_struct=False, struct_in_pandas="dict", ndarray_as_list=False, -arrow_cast=False, ): super(ArrowStreamPandasUDFSerializer, self).__init__(timezone, safecheck) self._assign_cols_by_name = assign_cols_by_name self._df_for_struct = df_for_struct self._struct_in_pandas = struct_in_pandas self._ndarray_as_list = ndarray_as_list -self._arrow_cast = arrow_cast def arrow_to_pandas(self, arrow_column): import pyarrow.types as types @@ -398,13 +386,7 @@ class ArrowStreamPandasUDFSerializer(ArrowStreamPandasSerializer): # Assign result columns by schema name if user labeled with strings elif self._assign_cols_by_name and any(isinstance(name, str) for name in s.columns): arrs_names = [ -( -self._create_array( -s[field.name], field.type, arrow_cast=self._arrow_cast -), -field.name, -) -for field in t +(self._create_array(s[field.name], field.type), field.name) for field in t ] # Assign result columns by position else: @@ -412,11 +394,7 @@ class ArrowStreamPandasUDFSerializer(ArrowStreamPandasSerializer): # the selected series has name '1', so we rename it to field.name # as the name is used by _create_array to provide a meaningful error message ( -self._create_array( -s[s.columns[i]].rename(field.name), -field.type, -
[spark] branch master updated (6e56cfeaca8 -> 414bc75ac5b)
This is an automated email from the ASF dual-hosted git repository. xinrong pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 6e56cfeaca8 [SPARK-44150][PYTHON][CONNECT] Explicit Arrow casting for mismatched return type in Arrow Python UDF add 414bc75ac5b [SPARK-44150][PYTHON][FOLLOW-UP] Fix ArrowStreamPandasSerializer to set arguments properly No new revisions were added by this update. Summary of changes: python/pyspark/sql/pandas/serializers.py | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-)
[spark] branch master updated: [SPARK-44150][PYTHON][CONNECT] Explicit Arrow casting for mismatched return type in Arrow Python UDF
This is an automated email from the ASF dual-hosted git repository. xinrong pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 6e56cfeaca8 [SPARK-44150][PYTHON][CONNECT] Explicit Arrow casting for mismatched return type in Arrow Python UDF 6e56cfeaca8 is described below commit 6e56cfeaca884b1ccfaa8524c70f12f118bc840c Author: Xinrong Meng AuthorDate: Thu Jun 29 11:46:06 2023 -0700 [SPARK-44150][PYTHON][CONNECT] Explicit Arrow casting for mismatched return type in Arrow Python UDF ### What changes were proposed in this pull request? Explicit Arrow casting for the mismatched return type of Arrow Python UDF. ### Why are the changes needed? A more standardized and coherent type coercion. Please refer to https://github.com/apache/spark/pull/41706 for a comprehensive comparison between type coercion rules of Arrow and Pickle(used by the default Python UDF) separately. See more at [[Design] Type-coercion in Arrow Python UDFs](https://docs.google.com/document/d/e/2PACX-1vTEGElOZfhl9NfgbBw4CTrlm-8F_xQCAKNOXouz-7mg5vYobS7lCGUsGkDZxPY0wV5YkgoZmkYlxccU/pub). ### Does this PR introduce _any_ user-facing change? Yes. FROM ```py >>> df = spark.createDataFrame(['1', '2'], schema='string') df.select(pandas_udf(lambda x: x, 'int')('value')).show() >>> df.select(pandas_udf(lambda x: x, 'int')('value')).show() ... org.apache.spark.api.python.PythonException: Traceback (most recent call last): ... pyarrow.lib.ArrowInvalid: Could not convert '1' with type str: tried to convert to int32 ``` TO ```py >>> df = spark.createDataFrame(['1', '2'], schema='string') >>> df.select(pandas_udf(lambda x: x, 'int')('value')).show() +---+ |(value)| +---+ | 1| | 2| +---+ ``` ### How was this patch tested? Unit tests. Closes #41503 from xinrong-meng/type_coersion. Authored-by: Xinrong Meng Signed-off-by: Xinrong Meng --- python/pyspark/sql/pandas/serializers.py | 30 ++--- python/pyspark/sql/tests/test_arrow_python_udf.py | 39 +++ python/pyspark/worker.py | 3 ++ 3 files changed, 67 insertions(+), 5 deletions(-) diff --git a/python/pyspark/sql/pandas/serializers.py b/python/pyspark/sql/pandas/serializers.py index 307fcc33752..a99eda9cbea 100644 --- a/python/pyspark/sql/pandas/serializers.py +++ b/python/pyspark/sql/pandas/serializers.py @@ -190,7 +190,7 @@ class ArrowStreamPandasSerializer(ArrowStreamSerializer): ) return converter(s) -def _create_array(self, series, arrow_type, spark_type=None): +def _create_array(self, series, arrow_type, spark_type=None, arrow_cast=False): """ Create an Arrow Array from the given pandas.Series and optional type. @@ -202,6 +202,9 @@ class ArrowStreamPandasSerializer(ArrowStreamSerializer): If None, pyarrow's inferred type will be used spark_type : DataType, optional If None, spark type converted from arrow_type will be used +arrow_cast: bool, optional +Whether to apply Arrow casting when the user-specified return type mismatches the +actual return values. 
Returns --- @@ -226,7 +229,12 @@ class ArrowStreamPandasSerializer(ArrowStreamSerializer): else: mask = series.isnull() try: -return pa.Array.from_pandas(series, mask=mask, type=arrow_type, safe=self._safecheck) +if arrow_cast: +return pa.Array.from_pandas(series, mask=mask, type=arrow_type).cast( +target_type=arrow_type, safe=self._safecheck +) +else: +return pa.Array.from_pandas(series, mask=mask, safe=self._safecheck) except TypeError as e: error_msg = ( "Exception thrown when converting pandas.Series (%s) " @@ -319,12 +327,14 @@ class ArrowStreamPandasUDFSerializer(ArrowStreamPandasSerializer): df_for_struct=False, struct_in_pandas="dict", ndarray_as_list=False, +arrow_cast=False, ): super(ArrowStreamPandasUDFSerializer, self).__init__(timezone, safecheck) self._assign_cols_by_name = assign_cols_by_name self._df_for_struct = df_for_struct self._struct_in_pandas = struct_in_pandas self._ndarray_as_list = ndarray_as_list +self._arrow_cast = arrow_cast def arrow_to_pandas(self, arrow_column)
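The casting behavior itself can be observed with pyarrow alone (a standalone sketch outside Spark; the variable names are illustrative):

```py
import pandas as pd
import pyarrow as pa

s = pd.Series(["1", "2"])

# Building directly with type=pa.int32() raises ArrowInvalid for string data;
# the build-then-cast approach adopted here lets Arrow parse the strings.
arr = pa.Array.from_pandas(s).cast(target_type=pa.int32(), safe=True)
print(arr)  # [1, 2]
```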
[spark] branch master updated: [SPARK-43893][PYTHON][CONNECT] Non-atomic data type support in Arrow-optimized Python UDF
This is an automated email from the ASF dual-hosted git repository. xinrong pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 94098853592 [SPARK-43893][PYTHON][CONNECT] Non-atomic data type support in Arrow-optimized Python UDF 94098853592 is described below commit 94098853592b524f52e9a340166b96ddeda4e898 Author: Xinrong Meng AuthorDate: Tue Jun 6 15:48:14 2023 -0700 [SPARK-43893][PYTHON][CONNECT] Non-atomic data type support in Arrow-optimized Python UDF ### What changes were proposed in this pull request? Support non-atomic data types in input and output of Arrow-optimized Python UDF. Non-atomic data types refer to: ArrayType, MapType, and StructType. ### Why are the changes needed? Parity with pickled Python UDFs. ### Does this PR introduce _any_ user-facing change? Non-atomic data types are accepted as both input and output of Arrow-optimized Python UDF. For example, ```py >>> df = spark.range(1).selectExpr("struct(1, struct('John', 30, ('value', 10))) as nested_struct") >>> df.select(udf(lambda x: str(x))("nested_struct")).first() Row((nested_struct)="Row(col1=1, col2=Row(col1='John', col2=30, col3=Row(col1='value', col2=10)))") ``` ### How was this patch tested? Unit tests. Closes #41321 from xinrong-meng/arrow_udf_struct. Authored-by: Xinrong Meng Signed-off-by: Xinrong Meng --- python/pyspark/sql/pandas/serializers.py | 22 --- python/pyspark/sql/tests/test_arrow_python_udf.py | 17 - python/pyspark/sql/tests/test_udf.py | 45 +++ python/pyspark/sql/udf.py | 15 +--- python/pyspark/worker.py | 13 +-- 5 files changed, 79 insertions(+), 33 deletions(-) diff --git a/python/pyspark/sql/pandas/serializers.py b/python/pyspark/sql/pandas/serializers.py index 84471143367..12d0bee88ad 100644 --- a/python/pyspark/sql/pandas/serializers.py +++ b/python/pyspark/sql/pandas/serializers.py @@ -172,7 +172,7 @@ class ArrowStreamPandasSerializer(ArrowStreamSerializer): self._timezone = timezone self._safecheck = safecheck -def arrow_to_pandas(self, arrow_column): +def arrow_to_pandas(self, arrow_column, struct_in_pandas="dict"): # If the given column is a date type column, creates a series of datetime.date directly # instead of creating datetime64[ns] as intermediate data to avoid overflow caused by # datetime64[ns] type handling. 
@@ -184,7 +184,7 @@ class ArrowStreamPandasSerializer(ArrowStreamSerializer): data_type=from_arrow_type(arrow_column.type, prefer_timestamp_ntz=True), nullable=True, timezone=self._timezone, -struct_in_pandas="dict", +struct_in_pandas=struct_in_pandas, error_on_duplicated_field_names=True, ) return converter(s) @@ -310,10 +310,18 @@ class ArrowStreamPandasUDFSerializer(ArrowStreamPandasSerializer): Serializer used by Python worker to evaluate Pandas UDFs """ -def __init__(self, timezone, safecheck, assign_cols_by_name, df_for_struct=False): +def __init__( +self, +timezone, +safecheck, +assign_cols_by_name, +df_for_struct=False, +struct_in_pandas="dict", +): super(ArrowStreamPandasUDFSerializer, self).__init__(timezone, safecheck) self._assign_cols_by_name = assign_cols_by_name self._df_for_struct = df_for_struct +self._struct_in_pandas = struct_in_pandas def arrow_to_pandas(self, arrow_column): import pyarrow.types as types @@ -323,13 +331,15 @@ class ArrowStreamPandasUDFSerializer(ArrowStreamPandasSerializer): series = [ super(ArrowStreamPandasUDFSerializer, self) -.arrow_to_pandas(column) +.arrow_to_pandas(column, self._struct_in_pandas) .rename(field.name) for column, field in zip(arrow_column.flatten(), arrow_column.type) ] s = pd.concat(series, axis=1) else: -s = super(ArrowStreamPandasUDFSerializer, self).arrow_to_pandas(arrow_column) +s = super(ArrowStreamPandasUDFSerializer, self).arrow_to_pandas( +arrow_column, self._struct_in_pandas +) return s def _create_batch(self, series): @@ -360,7 +370,7 @@ class ArrowStreamPandasUDFSerializer(ArrowStreamPandasSerializer): arrs = [] for s, t in series: -if t is not None and pa.types.is_struct(t): +if self._struct_in_pandas ==
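A sketch of non-atomic input under this change (assuming the `useArrow` flag shown in the EvalType commit further down; the function name is illustrative): a struct argument arrives as a `Row`, matching the commit's doctest.

```py
from pyspark.sql.functions import udf

# Unnamed struct fields default to col1, col2, ..., as in the doctest above.
@udf(returnType="string", useArrow=True)
def describe(person):
    return f"{person.col1} is {person.col2}"

spark.sql("SELECT struct('John', 30) AS person").select(describe("person")).show()
```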
[spark] branch master updated: [SPARK-41532][CONNECT][FOLLOWUP] add error class `SESSION_NOT_SAME` into error_classes.py
This is an automated email from the ASF dual-hosted git repository. xinrong pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 40dd5235373 [SPARK-41532][CONNECT][FOLLOWUP] add error class `SESSION_NOT_SAME` into error_classes.py 40dd5235373 is described below commit 40dd5235373891bdcc536e25082597aca24e6507 Author: Jia Fan AuthorDate: Mon May 22 10:51:25 2023 -0700 [SPARK-41532][CONNECT][FOLLOWUP] add error class `SESSION_NOT_SAME` into error_classes.py ### What changes were proposed in this pull request? This is a follow-up PR for #40684. Add the definition of error class `SESSION_NOT_SAME` to `error_classes.py` with a template error message. ### Why are the changes needed? To unify error messages. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Added a new test. Closes #41259 from Hisoka-X/follow_up_session_not_same. Authored-by: Jia Fan Signed-off-by: Xinrong Meng --- python/pyspark/errors/error_classes.py | 5 + python/pyspark/sql/connect/dataframe.py | 5 - .../pyspark/sql/tests/connect/test_connect_basic.py | 21 ++--- 3 files changed, 27 insertions(+), 4 deletions(-) diff --git a/python/pyspark/errors/error_classes.py b/python/pyspark/errors/error_classes.py index c7b00e0736d..817b8ce60db 100644 --- a/python/pyspark/errors/error_classes.py +++ b/python/pyspark/errors/error_classes.py @@ -576,6 +576,11 @@ ERROR_CLASSES_JSON = """ "Result vector from pandas_udf was not the required length: expected , got ." ] }, + "SESSION_NOT_SAME" : { +"message" : [ + "Both Datasets must belong to the same SparkSession." +] + }, "SESSION_OR_CONTEXT_EXISTS" : { "message" : [ "There should not be an existing Spark Session or Spark Context." 
diff --git a/python/pyspark/sql/connect/dataframe.py b/python/pyspark/sql/connect/dataframe.py index 7a5ba50b3c6..4563366ef0f 100644 --- a/python/pyspark/sql/connect/dataframe.py +++ b/python/pyspark/sql/connect/dataframe.py @@ -265,7 +265,10 @@ class DataFrame: def checkSameSparkSession(self, other: "DataFrame") -> None: if self._session.session_id != other._session.session_id: -raise SessionNotSameException("Both Datasets must belong to the same SparkSession") +raise SessionNotSameException( +error_class="SESSION_NOT_SAME", +message_parameters={}, +) def coalesce(self, numPartitions: int) -> "DataFrame": if not numPartitions > 0: diff --git a/python/pyspark/sql/tests/connect/test_connect_basic.py b/python/pyspark/sql/tests/connect/test_connect_basic.py index dd5e52894c9..7225b6aa8d0 100644 --- a/python/pyspark/sql/tests/connect/test_connect_basic.py +++ b/python/pyspark/sql/tests/connect/test_connect_basic.py @@ -1815,14 +1815,29 @@ class SparkConnectBasicTests(SparkConnectSQLTestCase): spark2 = RemoteSparkSession(connection="sc://localhost") df2 = spark2.range(10).limit(3) -with self.assertRaises(SessionNotSameException): +with self.assertRaises(SessionNotSameException) as e1: df.union(df2).collect() +self.check_error( +exception=e1.exception, +error_class="SESSION_NOT_SAME", +message_parameters={}, +) -with self.assertRaises(SessionNotSameException): +with self.assertRaises(SessionNotSameException) as e2: df.unionByName(df2).collect() +self.check_error( +exception=e2.exception, +error_class="SESSION_NOT_SAME", +message_parameters={}, +) -with self.assertRaises(SessionNotSameException): +with self.assertRaises(SessionNotSameException) as e3: df.join(df2).collect() +self.check_error( +exception=e3.exception, +error_class="SESSION_NOT_SAME", +message_parameters={}, +) def test_extended_hint_types(self): cdf = self.connect.range(100).toDF("id")
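A sketch of the guarded behavior, mirroring the test above (hedged: it assumes a Spark Connect server at `sc://localhost` and an existing `spark` session against it):

```py
from pyspark.sql.connect.session import SparkSession as RemoteSparkSession

# A second, distinct session against the same server.
spark2 = RemoteSparkSession(connection="sc://localhost")

df = spark.range(10)
df2 = spark2.range(10).limit(3)

# Raises SessionNotSameException carrying the new SESSION_NOT_SAME error class.
df.union(df2).collect()
```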
[spark] branch master updated: [SPARK-43543][PYTHON] Fix nested MapType behavior in Pandas UDF
This is an automated email from the ASF dual-hosted git repository. xinrong pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new bc6f69a988f [SPARK-43543][PYTHON] Fix nested MapType behavior in Pandas UDF bc6f69a988f is described below commit bc6f69a988f13e5e22cb055e60693a545f0cbadb Author: Xinrong Meng AuthorDate: Fri May 19 14:54:59 2023 -0700 [SPARK-43543][PYTHON] Fix nested MapType behavior in Pandas UDF ### What changes were proposed in this pull request? Fix nested MapType behavior in Pandas UDF (and Arrow-optimized Python UDF). Previously during Arrow-pandas conversion, only the outermost layer is converted to a dictionary; but now nested MapType will be converted to nested dictionaries. That applies to Spark Connect as well. ### Why are the changes needed? Correctness and consistency (with `createDataFrame` and `toPandas` when Arrow is enabled). ### Does this PR introduce _any_ user-facing change? Yes. Nested MapType type support is corrected in Pandas UDF ```py >>> schema = StructType([ ... StructField("id", StringType(), True), ... StructField("attributes", MapType(StringType(), MapType(StringType(), StringType())), True) ... ]) >>> >>> data = [ ...("1", {"personal": {"name": "John", "city": "New York"}}), ... ] >>> df = spark.createDataFrame(data, schema) >>> pandas_udf(StringType()) ... def f(s: pd.Series) -> pd.Series: ...return s.astype(str) ... >>> df.select(f(df.attributes)).show(truncate=False) ``` The results of `df.select(f(df.attributes)).show(truncate=False)` is corrected **FROM** ```py +--+ |f(attributes) | +--+ |{'personal': [('name', 'John'), ('city', 'New York')]}| +--+ ``` **TO** ```py >>> df.select(f(df.attributes)).show(truncate=False) +--+ |f(attributes) | +--+ |{'personal': {'name': 'John', 'city': 'New York'}}| +--+ ``` **Another more obvious example:** ```py >>> pandas_udf(StringType()) ... def extract_name(s:pd.Series) -> pd.Series: ... return s.apply(lambda x: x['personal']['name']) ... >>> df.select(extract_name(df.attributes)).show(truncate=False) ``` `df.select(extract_name(df.attributes)).show(truncate=False)` is corrected **FROM** ```py org.apache.spark.api.python.PythonException: Traceback (most recent call last): ... TypeError: list indices must be integers or slices, not str ``` **TO** ```py +--------+ |extract_name(attributes)| ++ |John| ++ ``` ### How was this patch tested? Unit tests. Closes #41147 from xinrong-meng/nestedType. Authored-by: Xinrong Meng Signed-off-by: Xinrong Meng --- python/pyspark/sql/pandas/serializers.py | 91 -- .../sql/tests/pandas/test_pandas_udf_scalar.py | 30 +++ 2 files changed, 47 insertions(+), 74 deletions(-) diff --git a/python/pyspark/sql/pandas/serializers.py b/python/pyspark/sql/pandas/serializers.py index 9b5db2d000d..e81d90fc23e 100644 --- a/python/pyspark/sql/pandas/serializers.py +++ b/python/pyspark/sql/pandas/serializers.py @@ -21,7 +21,12 @@ Serializers for PyArrow and pandas conversions. 
See `pyspark.serializers` for mo from pyspark.errors import PySparkTypeError, PySparkValueError from pyspark.serializers import Serializer, read_int, write_int, UTF8Deserializer, CPickleSerializer -from pyspark.sql.pandas.types import from_arrow_type, to_arrow_type, _create_converter_from_pandas +from pyspark.sql.pandas.types import ( +from_arrow_type, +to_arrow_type, +_create_converter_from_pandas, +_create_converter_to_pandas, +) from pyspark.sql.types import StringType, StructType, BinaryType, StructField, LongType @@ -168,23 +173,21 @@ class ArrowStreamPandasSerializer(ArrowStreamSerializer): self._safecheck = safecheck def arrow_to_pandas(self, arrow_column): -from pyspark.sql.
[spark-website] branch asf-site updated: Update Apache Spark 3.5 Release Window
This is an automated email from the ASF dual-hosted git repository. xinrong pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/spark-website.git The following commit(s) were added to refs/heads/asf-site by this push: new 18ca078b23 Update Apache Spark 3.5 Release Window 18ca078b23 is described below commit 18ca078b23f826c24bed32df1dc89854a91cb580 Author: Xinrong Meng AuthorDate: Thu May 11 17:42:37 2023 -0700 Update Apache Spark 3.5 Release Window Update Apache Spark 3.5 Release Window, with proposed dates: ``` | July 16th 2023 | Code freeze. Release branch cut.| | Late July 2023 | QA period. Focus on bug fixes, tests, stability and docs. Generally, no new features merged.| | August 2023| Release candidates (RC), voting, etc. until final release passes| ``` Author: Xinrong Meng Closes #461 from xinrong-meng/3.5release_window. --- site/versioning-policy.html | 8 versioning-policy.md| 8 2 files changed, 8 insertions(+), 8 deletions(-) diff --git a/site/versioning-policy.html b/site/versioning-policy.html index d25bd676c7..74b559d5e8 100644 --- a/site/versioning-policy.html +++ b/site/versioning-policy.html @@ -250,7 +250,7 @@ available APIs. Hence, Spark 2.3.0 would generally be released about 6 months after 2.2.0. Maintenance releases happen as needed in between feature releases. Major releases do not happen according to a fixed schedule. -Spark 3.4 release window +Spark 3.5 release window @@ -261,15 +261,15 @@ in between feature releases. Major releases do not happen according to a fixed s - January 16th 2023 + July 16th 2023 Code freeze. Release branch cut. - Late January 2023 + Late July 2023 QA period. Focus on bug fixes, tests, stability and docs. Generally, no new features merged. - February 2023 + August 2023 Release candidates (RC), voting, etc. until final release passes diff --git a/versioning-policy.md b/versioning-policy.md index 153085259f..0f3892e8a2 100644 --- a/versioning-policy.md +++ b/versioning-policy.md @@ -103,13 +103,13 @@ The branch is cut every January and July, so feature ("minor") releases occur ab Hence, Spark 2.3.0 would generally be released about 6 months after 2.2.0. Maintenance releases happen as needed in between feature releases. Major releases do not happen according to a fixed schedule. -Spark 3.4 release window +Spark 3.5 release window | Date | Event | | - | - | -| January 16th 2023 | Code freeze. Release branch cut.| -| Late January 2023 | QA period. Focus on bug fixes, tests, stability and docs. Generally, no new features merged.| -| February 2023 | Release candidates (RC), voting, etc. until final release passes| +| July 16th 2023 | Code freeze. Release branch cut.| +| Late July 2023 | QA period. Focus on bug fixes, tests, stability and docs. Generally, no new features merged.| +| August 2023 | Release candidates (RC), voting, etc. until final release passes| Maintenance releases and EOL
[spark] branch master updated: [SPARK-43412][PYTHON][CONNECT] Introduce `SQL_ARROW_BATCHED_UDF` EvalType for Arrow-optimized Python UDFs
This is an automated email from the ASF dual-hosted git repository. xinrong pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 32ab341071a [SPARK-43412][PYTHON][CONNECT] Introduce `SQL_ARROW_BATCHED_UDF` EvalType for Arrow-optimized Python UDFs 32ab341071a is described below commit 32ab341071aa69917f820baf5f61668c2455f1db Author: Xinrong Meng AuthorDate: Wed May 10 13:09:15 2023 -0700 [SPARK-43412][PYTHON][CONNECT] Introduce `SQL_ARROW_BATCHED_UDF` EvalType for Arrow-optimized Python UDFs ### What changes were proposed in this pull request? Introduce `SQL_ARROW_BATCHED_UDF` EvalType for Arrow-optimized Python UDFs. An EvalType is used to uniquely identify a UDF type in PySpark. ### Why are the changes needed? We are about to improve nested non-atomic input/output support of an Arrow-optimized Python UDF. However, currently, it shares the same EvalType with a pickled Python UDF, but the same implementation with a Pandas UDF. Introducing an EvalType enables isolating the changes to Arrow-optimized Python UDFs. The PR is also a pre-requisite for registering an Arrow-optimized Python UDF. ### Does this PR introduce _any_ user-facing change? No user-facing behavior/result changes for Arrow-optimized Python UDFs. An `evalType`, as an attribute mainly designed for internal use, is changed as shown below: ```py >>> udf(lambda x: str(x), useArrow=True).evalType == PythonEvalType.SQL_ARROW_BATCHED_UDF True # whereas >>> udf(lambda x: str(x), useArrow=False).evalType == PythonEvalType.SQL_BATCHED_UDF True ``` ### How was this patch tested? A new unit test `test_eval_type` and existing tests. Closes #41053 from xinrong-meng/evalTypeArrowPyUDF. Authored-by: Xinrong Meng Signed-off-by: Xinrong Meng --- .../main/scala/org/apache/spark/api/python/PythonRunner.scala| 2 ++ python/pyspark/rdd.py| 3 ++- python/pyspark/sql/_typing.pyi | 1 + python/pyspark/sql/connect/functions.py | 7 +-- python/pyspark/sql/connect/udf.py| 3 +-- python/pyspark/sql/functions.py | 6 +- python/pyspark/sql/pandas/functions.py | 3 +++ python/pyspark/sql/tests/test_arrow_python_udf.py| 9 + python/pyspark/sql/udf.py| 8 +++- python/pyspark/worker.py | 9 ++--- .../org/apache/spark/sql/catalyst/expressions/PythonUDF.scala| 1 + .../apache/spark/sql/execution/python/ExtractPythonUDFs.scala| 3 ++- 12 files changed, 32 insertions(+), 23 deletions(-) diff --git a/core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala b/core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala index 0b420f268ee..912e76005f0 100644 --- a/core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala +++ b/core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala @@ -44,6 +44,7 @@ private[spark] object PythonEvalType { val NON_UDF = 0 val SQL_BATCHED_UDF = 100 + val SQL_ARROW_BATCHED_UDF = 101 val SQL_SCALAR_PANDAS_UDF = 200 val SQL_GROUPED_MAP_PANDAS_UDF = 201 @@ -58,6 +59,7 @@ private[spark] object PythonEvalType { def toString(pythonEvalType: Int): String = pythonEvalType match { case NON_UDF => "NON_UDF" case SQL_BATCHED_UDF => "SQL_BATCHED_UDF" +case SQL_ARROW_BATCHED_UDF => "SQL_ARROW_BATCHED_UDF" case SQL_SCALAR_PANDAS_UDF => "SQL_SCALAR_PANDAS_UDF" case SQL_GROUPED_MAP_PANDAS_UDF => "SQL_GROUPED_MAP_PANDAS_UDF" case SQL_GROUPED_AGG_PANDAS_UDF => "SQL_GROUPED_AGG_PANDAS_UDF" diff --git a/python/pyspark/rdd.py b/python/pyspark/rdd.py index 13f93fbdad6..e6ef7f6108e 100644 --- a/python/pyspark/rdd.py +++ 
b/python/pyspark/rdd.py @@ -110,7 +110,7 @@ if TYPE_CHECKING: ) from pyspark.sql.dataframe import DataFrame from pyspark.sql.types import AtomicType, StructType -from pyspark.sql._typing import AtomicValue, RowLike, SQLBatchedUDFType +from pyspark.sql._typing import AtomicValue, RowLike, SQLArrowBatchedUDFType, SQLBatchedUDFType from py4j.java_gateway import JavaObject from py4j.java_collections import JavaArray @@ -140,6 +140,7 @@ class PythonEvalType: NON_UDF: "NonUDFType" = 0 SQL_BATCHED_UDF: "SQLBatchedUDFType" = 100 +SQL_ARROW_BATCHED_UDF: "SQLArrowBatchedUDFType" = 101 SQL_SCALAR_PANDAS_UDF: "PandasScalarUDFType" = 200 SQL_GROUPED_MAP_PANDAS_UDF: "Pand
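Mirroring the commit's own doctest, with the import paths the diff touches:

```py
from pyspark.rdd import PythonEvalType
from pyspark.sql.functions import udf

f = udf(lambda x: str(x), useArrow=True)
assert f.evalType == PythonEvalType.SQL_ARROW_BATCHED_UDF

g = udf(lambda x: str(x), useArrow=False)
assert g.evalType == PythonEvalType.SQL_BATCHED_UDF
```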
[spark] branch master updated: [SPARK-41971][SQL][PYTHON] Add a config for pandas conversion how to handle struct types
This is an automated email from the ASF dual-hosted git repository. xinrong pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 305aa4a89ef [SPARK-41971][SQL][PYTHON] Add a config for pandas conversion how to handle struct types 305aa4a89ef is described below commit 305aa4a89efe02f517f82039225a99b31b20146f Author: Takuya UESHIN AuthorDate: Thu May 4 11:01:28 2023 -0700 [SPARK-41971][SQL][PYTHON] Add a config for pandas conversion how to handle struct types ### What changes were proposed in this pull request? Adds a config for pandas conversion how to handle struct types. - `spark.sql.execution.pandas.structHandlingMode` (default: `"legacy"`) The conversion mode of struct type when creating pandas DataFrame. When `"legacy"`, the behavior is the same as before, except that with Arrow and Spark Connect will raise a more readable exception when there are duplicated nested field names. ```py >>> spark.sql("values (1, struct(1 as a, 2 as a)) as t(x, y)").toPandas() Traceback (most recent call last): ... pyspark.errors.exceptions.connect.UnsupportedOperationException: [DUPLICATED_FIELD_NAME_IN_ARROW_STRUCT] Duplicated field names in Arrow Struct are not allowed, got [a, a]. ``` When `"row"`, convert to Row object regardless of Arrow optimization. ```py >>> spark.conf.set('spark.sql.execution.pandas.structHandlingMode', 'row') >>> spark.conf.set('spark.sql.execution.arrow.pyspark.enabled', False) >>> spark.sql("values (1, struct(1 as a, 2 as b)) as t(x, y)").toPandas() x y 0 1 (1, 2) >>> spark.sql("values (1, struct(1 as a, 2 as a)) as t(x, y)").toPandas() x y 0 1 (1, 2) >>> spark.conf.set('spark.sql.execution.arrow.pyspark.enabled', True) >>> spark.sql("values (1, struct(1 as a, 2 as b)) as t(x, y)").toPandas() x y 0 1 (1, 2) >>> spark.sql("values (1, struct(1 as a, 2 as a)) as t(x, y)").toPandas() x y 0 1 (1, 2) ``` When `"dict"`, convert to dict and use suffixed key names, e.g., `a_0`, `a_1`, if there are duplicated nested field names, regardless of Arrow optimization. ```py >>> spark.conf.set('spark.sql.execution.pandas.structHandlingMode', 'dict') >>> spark.conf.set('spark.sql.execution.arrow.pyspark.enabled', False) >>> spark.sql("values (1, struct(1 as a, 2 as b)) as t(x, y)").toPandas() x y 0 1 {'a': 1, 'b': 2} >>> spark.sql("values (1, struct(1 as a, 2 as a)) as t(x, y)").toPandas() x y 0 1 {'a_0': 1, 'a_1': 2} >>> spark.conf.set('spark.sql.execution.arrow.pyspark.enabled', True) >>> spark.sql("values (1, struct(1 as a, 2 as b)) as t(x, y)").toPandas() x y 0 1 {'a': 1, 'b': 2} >>> spark.sql("values (1, struct(1 as a, 2 as a)) as t(x, y)").toPandas() x y 0 1 {'a_0': 1, 'a_1': 2} ``` ### Why are the changes needed? Currently there are three behaviors when `df.toPandas()` with nested struct types: - vanilla PySpark with Arrow optimization disabled ```py >>> spark.conf.set('spark.sql.execution.arrow.pyspark.enabled', False) >>> spark.sql("values (1, struct(1 as a, 2 as b)) as t(x, y)").toPandas() x y 0 1 (1, 2) ``` using `Row` object for struct types. It can use duplicated field names. ```py >>> spark.sql("values (1, struct(1 as a, 2 as a)) as t(x, y)").toPandas() x y 0 1 (1, 2) ``` - vanilla PySpark with Arrow optimization enabled ```py >>> spark.conf.set('spark.sql.execution.arrow.pyspark.enabled', True) >>> spark.sql("values (1, struct(1 as a, 2 as b)) as t(x, y)").toPandas() x y 0 1 {'a': 1, 'b': 2} ``` using `dict` for struct types. 
It raises an Exception when there are duplicated nested field names: ```py >>> spark.sql("values (1, struct(1 as a, 2 as a)) as t(x, y)").toPandas() Traceback (most recent call last): ... pyarrow.lib.ArrowInvalid: Ran out of field metadata, likely malformed ``` - Spark C
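For completeness, a sketch of setting the new conf when building a session (the value is one of the three modes described above; the commit message itself shows the runtime `spark.conf.set` variant):

```py
from pyspark.sql import SparkSession

# "dict" forces dictionaries, with suffixed keys for duplicated nested field names.
spark = (
    SparkSession.builder
    .config("spark.sql.execution.pandas.structHandlingMode", "dict")
    .getOrCreate()
)
print(spark.conf.get("spark.sql.execution.pandas.structHandlingMode"))  # dict
```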
[spark] branch master updated (8711c1a6ad9 -> 5cb7e6ffd91)
This is an automated email from the ASF dual-hosted git repository. xinrong pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 8711c1a6ad9 [SPARK-42945][CONNECT][FOLLOW-UP] Add user_id and session_id when logging errors add 5cb7e6ffd91 [SPARK-43032][CONNECT][SS] Add Streaming query manager No new revisions were added by this update. Summary of changes: .../src/main/protobuf/spark/connect/base.proto | 3 + .../src/main/protobuf/spark/connect/commands.proto | 50 - .../sql/connect/planner/SparkConnectPlanner.scala | 76 ++- .../sql/connect/service/SparkConnectService.scala | 8 +- python/pyspark/sql/connect/client.py | 3 + python/pyspark/sql/connect/proto/base_pb2.py | 124 ++-- python/pyspark/sql/connect/proto/base_pb2.pyi | 13 ++ python/pyspark/sql/connect/proto/commands_pb2.py | 210 +-- python/pyspark/sql/connect/proto/commands_pb2.pyi | 225 + python/pyspark/sql/connect/session.py | 9 +- python/pyspark/sql/connect/streaming/__init__.py | 1 + python/pyspark/sql/connect/streaming/query.py | 87 +++- python/pyspark/sql/connect/streaming/readwriter.py | 3 +- .../connect/streaming/test_parity_streaming.py | 27 +-- .../sql/tests/connect/test_connect_basic.py| 1 - .../test_parity_pandas_grouped_map_with_state.py | 6 +- 16 files changed, 673 insertions(+), 173 deletions(-)
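A sketch of the manager surface this wires up for Connect — `spark.streams` is the `StreamingQueryManager` (the rate source, memory sink, and query name are illustrative; an active `spark` session is assumed):

```py
query = (
    spark.readStream.format("rate").load()
    .writeStream.format("memory").queryName("rates")
    .start()
)

print([q.id for q in spark.streams.active])  # includes the query above
spark.streams.get(query.id).stop()
```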
svn commit: r61288 - in /dev/spark: v3.2.4-rc1-docs/ v3.4.0-rc7-docs/
Author: xinrong Date: Fri Apr 14 20:31:17 2023 New Revision: 61288 Log: Removing RC artifacts. Removed: dev/spark/v3.2.4-rc1-docs/ dev/spark/v3.4.0-rc7-docs/
[spark-website] branch asf-site updated: Fix the download page of Spark 3.4.0
This is an automated email from the ASF dual-hosted git repository. xinrong pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/spark-website.git The following commit(s) were added to refs/heads/asf-site by this push: new 624de69568 Fix the download page of Spark 3.4.0 624de69568 is described below commit 624de69568e5c743206a63cfc49d8647e41e1167 Author: Gengliang Wang AuthorDate: Fri Apr 14 13:03:59 2023 -0700 Fix the download page of Spark 3.4.0 Currently it shows 3.3.2 on top: https://user-images.githubusercontent.com/1097932/232143660-9d97a7c0-5eb0-44af-9f06-41cb6386a2dd.png After fix: https://user-images.githubusercontent.com/1097932/232143685-ee5b06e6-3af9-43ea-8690-209f4d8cd25f.png Author: Gengliang Wang Closes #451 from gengliangwang/fixDownload. --- js/downloads.js | 2 +- site/js/downloads.js | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/js/downloads.js b/js/downloads.js index 915b9c8809..9781273310 100644 --- a/js/downloads.js +++ b/js/downloads.js @@ -25,9 +25,9 @@ var packagesV12 = [hadoop3p3, hadoop3p3scala213, hadoop2p7, hadoopFree, sources] // 3.3.0+ var packagesV13 = [hadoop3p, hadoop3pscala213, hadoop2p, hadoopFree, sources]; +addRelease("3.4.0", new Date("04/13/2023"), packagesV13, true); addRelease("3.3.2", new Date("02/17/2023"), packagesV13, true); addRelease("3.2.4", new Date("04/13/2023"), packagesV12, true); -addRelease("3.4.0", new Date("04/13/2023"), packagesV13, true); function append(el, contents) { el.innerHTML += contents; diff --git a/site/js/downloads.js b/site/js/downloads.js index 915b9c8809..9781273310 100644 --- a/site/js/downloads.js +++ b/site/js/downloads.js @@ -25,9 +25,9 @@ var packagesV12 = [hadoop3p3, hadoop3p3scala213, hadoop2p7, hadoopFree, sources] // 3.3.0+ var packagesV13 = [hadoop3p, hadoop3pscala213, hadoop2p, hadoopFree, sources]; +addRelease("3.4.0", new Date("04/13/2023"), packagesV13, true); addRelease("3.3.2", new Date("02/17/2023"), packagesV13, true); addRelease("3.2.4", new Date("04/13/2023"), packagesV12, true); -addRelease("3.4.0", new Date("04/13/2023"), packagesV13, true); function append(el, contents) { el.innerHTML += contents;
svn commit: r61281 - in /dev/spark/v3.4.0-rc7-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/R/articles/ _site/api/R/deps/ _site/api/R/deps/bootstrap-5.2.2/ _site/api/R/deps/jquery-3.6.0/ _site/api
Author: xinrong Date: Fri Apr 14 18:58:10 2023 New Revision: 61281 Log: Apache Spark v3.4.0-rc7 docs [This commit notification would consist of 2789 parts, which exceeds the limit of 50, so it was shortened to this summary.]
svn commit: r61236 - in /dev/spark: v3.4.0-rc1-bin/ v3.4.0-rc1-docs/ v3.4.0-rc2-bin/ v3.4.0-rc2-docs/ v3.4.0-rc3-bin/ v3.4.0-rc3-docs/ v3.4.0-rc4-bin/ v3.4.0-rc4-docs/ v3.4.0-rc5-bin/ v3.4.0-rc5-docs/
Author: xinrong Date: Thu Apr 13 19:33:23 2023 New Revision: 61236 Log: Removing RC artifacts. Removed: dev/spark/v3.4.0-rc1-bin/ dev/spark/v3.4.0-rc1-docs/ dev/spark/v3.4.0-rc2-bin/ dev/spark/v3.4.0-rc2-docs/ dev/spark/v3.4.0-rc3-bin/ dev/spark/v3.4.0-rc3-docs/ dev/spark/v3.4.0-rc4-bin/ dev/spark/v3.4.0-rc4-docs/ dev/spark/v3.4.0-rc5-bin/ dev/spark/v3.4.0-rc5-docs/ dev/spark/v3.4.0-rc6-bin/ dev/spark/v3.4.0-rc6-docs/ dev/spark/v3.4.0-rc7-docs/
svn commit: r61125 - in /dev/spark/v3.4.0-rc7-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/R/articles/ _site/api/R/deps/ _site/api/R/deps/bootstrap-5.2.2/ _site/api/R/deps/jquery-3.6.0/ _site/api
Author: xinrong Date: Fri Apr 7 19:11:49 2023 New Revision: 61125 Log: Apache Spark v3.4.0-rc7 docs [This commit notification would consist of 2789 parts, which exceeds the limit of 50, so it was shortened to this summary.]
svn commit: r61114 - /dev/spark/v3.4.0-rc7-bin/
Author: xinrong Date: Fri Apr 7 02:45:25 2023 New Revision: 61114 Log: Apache Spark v3.4.0-rc7 Added: dev/spark/v3.4.0-rc7-bin/ dev/spark/v3.4.0-rc7-bin/SparkR_3.4.0.tar.gz (with props) dev/spark/v3.4.0-rc7-bin/SparkR_3.4.0.tar.gz.asc dev/spark/v3.4.0-rc7-bin/SparkR_3.4.0.tar.gz.sha512 dev/spark/v3.4.0-rc7-bin/pyspark-3.4.0.tar.gz (with props) dev/spark/v3.4.0-rc7-bin/pyspark-3.4.0.tar.gz.asc dev/spark/v3.4.0-rc7-bin/pyspark-3.4.0.tar.gz.sha512 dev/spark/v3.4.0-rc7-bin/spark-3.4.0-bin-hadoop3-scala2.13.tgz (with props) dev/spark/v3.4.0-rc7-bin/spark-3.4.0-bin-hadoop3-scala2.13.tgz.asc dev/spark/v3.4.0-rc7-bin/spark-3.4.0-bin-hadoop3-scala2.13.tgz.sha512 dev/spark/v3.4.0-rc7-bin/spark-3.4.0-bin-hadoop3.tgz (with props) dev/spark/v3.4.0-rc7-bin/spark-3.4.0-bin-hadoop3.tgz.asc dev/spark/v3.4.0-rc7-bin/spark-3.4.0-bin-hadoop3.tgz.sha512 dev/spark/v3.4.0-rc7-bin/spark-3.4.0-bin-without-hadoop.tgz (with props) dev/spark/v3.4.0-rc7-bin/spark-3.4.0-bin-without-hadoop.tgz.asc dev/spark/v3.4.0-rc7-bin/spark-3.4.0-bin-without-hadoop.tgz.sha512 dev/spark/v3.4.0-rc7-bin/spark-3.4.0.tgz (with props) dev/spark/v3.4.0-rc7-bin/spark-3.4.0.tgz.asc dev/spark/v3.4.0-rc7-bin/spark-3.4.0.tgz.sha512 Added: dev/spark/v3.4.0-rc7-bin/SparkR_3.4.0.tar.gz == Binary file - no diff available. Propchange: dev/spark/v3.4.0-rc7-bin/SparkR_3.4.0.tar.gz -- svn:mime-type = application/octet-stream Added: dev/spark/v3.4.0-rc7-bin/SparkR_3.4.0.tar.gz.asc == --- dev/spark/v3.4.0-rc7-bin/SparkR_3.4.0.tar.gz.asc (added) +++ dev/spark/v3.4.0-rc7-bin/SparkR_3.4.0.tar.gz.asc Fri Apr 7 02:45:25 2023 @@ -0,0 +1,17 @@ +-BEGIN PGP SIGNATURE- + +iQJHBAABCgAxFiEEzGiz0W/jOnZnBRYLp+V5CMek4bEFAmQvg1kTHHhpbnJvbmdA +YXBhY2hlLm9yZwAKCRCn5XkIx6ThsXF3D/9DJKcP/8+T/T2cddS049hOxspKDbm2 +Q1oIy04RZ1KllpeZtZVxpUCy7vE7F2srNjFrZ3OMY76/DeyBdwUBLGbrpA51FBRy +RmVM2x9Z9zj2rhfWK02IqC9a7RueMif15UwIGQSCEsS3H5ep3eHR2O4Vqof42rpj +Qf8hTqRC3y6OPxKS/kyhwof3CtzSe5TzmGeQ8GLlsr1cOQ1K8V6tRv4L4xtqYKlx +NA0ekUWKMylVzNj7AxdoWUpRCJyy+GbzT8PKp53imwaUjVp3FU8F3yZTd3kj9rxY +aNY5pWVTj2930gqDKHnJcGs3jq39GfjKu1hKMN+XAwmJEi//I2W96xvbEjoBxEh3 +SES5oyPLGCUHhWPFB+wsw3hD3JelJKI7X7KLdOl5KTccECbTIxm141zv/tB3RNRE +07DmCYiVrvsi5+CTngbXCcJVG0PZJ59vlSE58bYLe0cafKjRXMHWX1YT+YeeES4m +jWhU9PClnAnS4Z7uCrmcI9/nXFiavNkSdp2yRLfS4Eew1Mtavd49exk68NrVhKBs +VY1h3Sl1NY7UfcaWtUrCng8bCyHbWNIwoZ8yNJaDXKbvKyxTX88T+x4ulysyB6Xo +7bAnx1KlrZBaVRG/iE6dnLglokW7dbBoE09QcBDslPjbfTSX8ldaPSGKK1Bwe/D3 +1nb+LTsY6sZNQg== +=s5Gu +-END PGP SIGNATURE- Added: dev/spark/v3.4.0-rc7-bin/SparkR_3.4.0.tar.gz.sha512 == --- dev/spark/v3.4.0-rc7-bin/SparkR_3.4.0.tar.gz.sha512 (added) +++ dev/spark/v3.4.0-rc7-bin/SparkR_3.4.0.tar.gz.sha512 Fri Apr 7 02:45:25 2023 @@ -0,0 +1 @@ +4c7c4fa6018cb000dca30a34b5d963b30334be7230161a4f01b5c3192a3f87b0a54030b08f9bbfd58a3a796fb4bb7c607c1ba757f303f4da21e0f50dbce77b94 SparkR_3.4.0.tar.gz Added: dev/spark/v3.4.0-rc7-bin/pyspark-3.4.0.tar.gz == Binary file - no diff available. 
Propchange: dev/spark/v3.4.0-rc7-bin/pyspark-3.4.0.tar.gz -- svn:mime-type = application/octet-stream Added: dev/spark/v3.4.0-rc7-bin/pyspark-3.4.0.tar.gz.asc == --- dev/spark/v3.4.0-rc7-bin/pyspark-3.4.0.tar.gz.asc (added) +++ dev/spark/v3.4.0-rc7-bin/pyspark-3.4.0.tar.gz.asc Fri Apr 7 02:45:25 2023 @@ -0,0 +1,17 @@ +-BEGIN PGP SIGNATURE- + +iQJHBAABCgAxFiEEzGiz0W/jOnZnBRYLp+V5CMek4bEFAmQvg1sTHHhpbnJvbmdA +YXBhY2hlLm9yZwAKCRCn5XkIx6ThsWuEEACDllGUxRb6YQEI2/pjfmRjtRo3WFMy +SZzNTkBaGmJii8wB9I9OuUtJ5k1MCHSjWW6cwMZ4JJb8BDZ2ZVDy/66tZoKAIK7u +aUvbjcF8IpudwqTTn0VBfQVsLzE9G4clEoGFJpeCvg61+CqY1sxkhtg6FFMTgyhb +aMZOlz9uvEnYoYXoM0ZU4aLNAxclnhmE42+5j1MF3aiSR3Q/WaZEx/ECcEF4XhE1 +Q+53AmvnPm6yFFcqRQd023xWMnP6Y1zBBLnp2GZ2/SzCUkJrfvdueCDiOaiFrdnO +Jrf45ZBMaOcloy/tGSKl/ykjjYKEUVk980Y6guC63Nym+sf19Da8eD2AqQSxxLiQ +4tLH8owFHP4tr4C4MmfVD3R1HyNFk97scRDjCrCA0wMGLy9B3oSbE0yoRDRxZyei +dT7y2OsGYQ7bSV1+sV6uQB59QarxBINOrl5nH/L8qz+H7tWA/UMCHCmlSyuYc/m4 +D0IMj4cDrpbahVN1dQelDOwO+pmMrlXMXkA4HAwJPQd5V0wcGWJWYlEz4FeoGr+0 +BkuNdngw21NnwH8ebbW2KbdNe235yfNfXK+pVQq5NerUKBuBpzM73AqI3idjFzTd +pgeYrmbUMQxgPKZgZ/Fm025fwxW6e1z9aJdPJOa1baXT1gaiUtalzok7/En2t/Wj +48RFugofvd1TGA== +=s7ET +-END PGP SIGNATURE- Added: dev/spark/v3.4.0-rc7-bin/pyspark-3.4.0.tar.gz.sha512 == --- dev/spark/v3.4.0-rc7-bin/pyspark-3.4.0.tar.gz.sha512 (added) +++ dev/spark
[spark] 01/01: Preparing development version 3.4.1-SNAPSHOT
This is an automated email from the ASF dual-hosted git repository. xinrong pushed a commit to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git commit e4eea55d0a2ef7a8b8a44994750fdfd383517944 Author: Xinrong Meng AuthorDate: Fri Apr 7 01:28:49 2023 + Preparing development version 3.4.1-SNAPSHOT --- R/pkg/DESCRIPTION | 2 +- assembly/pom.xml | 2 +- common/kvstore/pom.xml | 2 +- common/network-common/pom.xml | 2 +- common/network-shuffle/pom.xml | 2 +- common/network-yarn/pom.xml| 2 +- common/sketch/pom.xml | 2 +- common/tags/pom.xml| 2 +- common/unsafe/pom.xml | 2 +- connector/avro/pom.xml | 2 +- connector/connect/client/jvm/pom.xml | 2 +- connector/connect/common/pom.xml | 2 +- connector/connect/server/pom.xml | 2 +- connector/docker-integration-tests/pom.xml | 2 +- connector/kafka-0-10-assembly/pom.xml | 2 +- connector/kafka-0-10-sql/pom.xml | 2 +- connector/kafka-0-10-token-provider/pom.xml| 2 +- connector/kafka-0-10/pom.xml | 2 +- connector/kinesis-asl-assembly/pom.xml | 2 +- connector/kinesis-asl/pom.xml | 2 +- connector/protobuf/pom.xml | 2 +- connector/spark-ganglia-lgpl/pom.xml | 2 +- core/pom.xml | 2 +- docs/_config.yml | 6 +++--- examples/pom.xml | 2 +- graphx/pom.xml | 2 +- hadoop-cloud/pom.xml | 2 +- launcher/pom.xml | 2 +- mllib-local/pom.xml| 2 +- mllib/pom.xml | 2 +- pom.xml| 2 +- python/pyspark/version.py | 2 +- repl/pom.xml | 2 +- resource-managers/kubernetes/core/pom.xml | 2 +- resource-managers/kubernetes/integration-tests/pom.xml | 2 +- resource-managers/mesos/pom.xml| 2 +- resource-managers/yarn/pom.xml | 2 +- sql/catalyst/pom.xml | 2 +- sql/core/pom.xml | 2 +- sql/hive-thriftserver/pom.xml | 2 +- sql/hive/pom.xml | 2 +- streaming/pom.xml | 2 +- tools/pom.xml | 2 +- 43 files changed, 45 insertions(+), 45 deletions(-) diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION index 4a32762b34c..fa7028630a8 100644 --- a/R/pkg/DESCRIPTION +++ b/R/pkg/DESCRIPTION @@ -1,6 +1,6 @@ Package: SparkR Type: Package -Version: 3.4.0 +Version: 3.4.1 Title: R Front End for 'Apache Spark' Description: Provides an R Front end for 'Apache Spark' <https://spark.apache.org>. Authors@R: diff --git a/assembly/pom.xml b/assembly/pom.xml index c58da7aa112..b86fee4bceb 100644 --- a/assembly/pom.xml +++ b/assembly/pom.xml @@ -21,7 +21,7 @@ org.apache.spark spark-parent_2.12 -3.4.0 +3.4.1-SNAPSHOT ../pom.xml diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml index 95ea15552da..f9ecfb3d692 100644 --- a/common/kvstore/pom.xml +++ b/common/kvstore/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.4.0 +3.4.1-SNAPSHOT ../../pom.xml diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml index e4d98471bf9..22ee65b7d25 100644 --- a/common/network-common/pom.xml +++ b/common/network-common/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.4.0 +3.4.1-SNAPSHOT ../../pom.xml diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml index 7a6d5aedf65..2c67da81ca4 100644 --- a/common/network-shuffle/pom.xml +++ b/common/network-shuffle/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.4.0 +3.4.1-SNAPSHOT ../../pom.xml diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml index 1c421754083..219682e047d 100644 --- a/common/network-yarn/pom.xml +++ b/common/network-yarn/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.4.0 +3.4.1-SNAPSHOT ../../pom.xml diff --git a/common/sketch/pom.xml b/common/sketch
[spark] branch branch-3.4 updated (b2ff4c4f7ec -> e4eea55d0a2)
This is an automated email from the ASF dual-hosted git repository. xinrong pushed a change to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git from b2ff4c4f7ec [SPARK-39696][CORE] Fix data race in access to TaskMetrics.externalAccums add 87a5442f7ed Preparing Spark release v3.4.0-rc7 new e4eea55d0a2 Preparing development version 3.4.1-SNAPSHOT The 1 revision listed above as "new" is entirely new to this repository and will be described in a separate email. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes:
[spark] 01/01: Preparing Spark release v3.4.0-rc7
This is an automated email from the ASF dual-hosted git repository. xinrong pushed a commit to tag v3.4.0-rc7 in repository https://gitbox.apache.org/repos/asf/spark.git commit 87a5442f7ed96b11051d8a9333476d080054e5a0 Author: Xinrong Meng AuthorDate: Fri Apr 7 01:28:44 2023 + Preparing Spark release v3.4.0-rc7 --- R/pkg/DESCRIPTION | 2 +- assembly/pom.xml | 2 +- common/kvstore/pom.xml | 2 +- common/network-common/pom.xml | 2 +- common/network-shuffle/pom.xml | 2 +- common/network-yarn/pom.xml| 2 +- common/sketch/pom.xml | 2 +- common/tags/pom.xml| 2 +- common/unsafe/pom.xml | 2 +- connector/avro/pom.xml | 2 +- connector/connect/client/jvm/pom.xml | 2 +- connector/connect/common/pom.xml | 2 +- connector/connect/server/pom.xml | 2 +- connector/docker-integration-tests/pom.xml | 2 +- connector/kafka-0-10-assembly/pom.xml | 2 +- connector/kafka-0-10-sql/pom.xml | 2 +- connector/kafka-0-10-token-provider/pom.xml| 2 +- connector/kafka-0-10/pom.xml | 2 +- connector/kinesis-asl-assembly/pom.xml | 2 +- connector/kinesis-asl/pom.xml | 2 +- connector/protobuf/pom.xml | 2 +- connector/spark-ganglia-lgpl/pom.xml | 2 +- core/pom.xml | 2 +- docs/_config.yml | 6 +++--- examples/pom.xml | 2 +- graphx/pom.xml | 2 +- hadoop-cloud/pom.xml | 2 +- launcher/pom.xml | 2 +- mllib-local/pom.xml| 2 +- mllib/pom.xml | 2 +- pom.xml| 2 +- python/pyspark/version.py | 2 +- repl/pom.xml | 2 +- resource-managers/kubernetes/core/pom.xml | 2 +- resource-managers/kubernetes/integration-tests/pom.xml | 2 +- resource-managers/mesos/pom.xml| 2 +- resource-managers/yarn/pom.xml | 2 +- sql/catalyst/pom.xml | 2 +- sql/core/pom.xml | 2 +- sql/hive-thriftserver/pom.xml | 2 +- sql/hive/pom.xml | 2 +- streaming/pom.xml | 2 +- tools/pom.xml | 2 +- 43 files changed, 45 insertions(+), 45 deletions(-) diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION index fa7028630a8..4a32762b34c 100644 --- a/R/pkg/DESCRIPTION +++ b/R/pkg/DESCRIPTION @@ -1,6 +1,6 @@ Package: SparkR Type: Package -Version: 3.4.1 +Version: 3.4.0 Title: R Front End for 'Apache Spark' Description: Provides an R Front end for 'Apache Spark' <https://spark.apache.org>. Authors@R: diff --git a/assembly/pom.xml b/assembly/pom.xml index b86fee4bceb..c58da7aa112 100644 --- a/assembly/pom.xml +++ b/assembly/pom.xml @@ -21,7 +21,7 @@ org.apache.spark spark-parent_2.12 -3.4.1-SNAPSHOT +3.4.0 ../pom.xml diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml index f9ecfb3d692..95ea15552da 100644 --- a/common/kvstore/pom.xml +++ b/common/kvstore/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.4.1-SNAPSHOT +3.4.0 ../../pom.xml diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml index 22ee65b7d25..e4d98471bf9 100644 --- a/common/network-common/pom.xml +++ b/common/network-common/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.4.1-SNAPSHOT +3.4.0 ../../pom.xml diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml index 2c67da81ca4..7a6d5aedf65 100644 --- a/common/network-shuffle/pom.xml +++ b/common/network-shuffle/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.4.1-SNAPSHOT +3.4.0 ../../pom.xml diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml index 219682e047d..1c421754083 100644 --- a/common/network-yarn/pom.xml +++ b/common/network-yarn/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.4.1-SNAPSHOT +3.4.0 ../../pom.xml diff --git a/common/sketch/pom.xml b/common/sketch/pom.xml index 22ce7
[spark] tag v3.4.0-rc7 created (now 87a5442f7ed)
This is an automated email from the ASF dual-hosted git repository. xinrong pushed a change to tag v3.4.0-rc7 in repository https://gitbox.apache.org/repos/asf/spark.git at 87a5442f7ed (commit) This tag includes the following new commits: new 87a5442f7ed Preparing Spark release v3.4.0-rc7 The 1 revision listed above as "new" is entirely new to this repository and will be described in a separate email. The revisions listed as "add" were already present in the repository and have only been added to this reference.
svn commit: r61110 - in /dev/spark/v3.4.0-rc6-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/R/articles/ _site/api/R/deps/ _site/api/R/deps/bootstrap-5.2.2/ _site/api/R/deps/jquery-3.6.0/ _site/api
Author: xinrong Date: Thu Apr 6 19:21:26 2023 New Revision: 61110 Log: Apache Spark v3.4.0-rc6 docs [This commit notification would consist of 2789 parts, which exceeds the limit of 50, so it was shortened to this summary.]
svn commit: r61108 - /dev/spark/v3.4.0-rc6-bin/
Author: xinrong Date: Thu Apr 6 17:58:16 2023 New Revision: 61108 Log: Apache Spark v3.4.0-rc6 Added: dev/spark/v3.4.0-rc6-bin/ dev/spark/v3.4.0-rc6-bin/SparkR_3.4.0.tar.gz (with props) dev/spark/v3.4.0-rc6-bin/SparkR_3.4.0.tar.gz.asc dev/spark/v3.4.0-rc6-bin/SparkR_3.4.0.tar.gz.sha512 dev/spark/v3.4.0-rc6-bin/pyspark-3.4.0.tar.gz (with props) dev/spark/v3.4.0-rc6-bin/pyspark-3.4.0.tar.gz.asc dev/spark/v3.4.0-rc6-bin/pyspark-3.4.0.tar.gz.sha512 dev/spark/v3.4.0-rc6-bin/spark-3.4.0-bin-hadoop3-scala2.13.tgz (with props) dev/spark/v3.4.0-rc6-bin/spark-3.4.0-bin-hadoop3-scala2.13.tgz.asc dev/spark/v3.4.0-rc6-bin/spark-3.4.0-bin-hadoop3-scala2.13.tgz.sha512 dev/spark/v3.4.0-rc6-bin/spark-3.4.0-bin-hadoop3.tgz (with props) dev/spark/v3.4.0-rc6-bin/spark-3.4.0-bin-hadoop3.tgz.asc dev/spark/v3.4.0-rc6-bin/spark-3.4.0-bin-hadoop3.tgz.sha512 dev/spark/v3.4.0-rc6-bin/spark-3.4.0-bin-without-hadoop.tgz (with props) dev/spark/v3.4.0-rc6-bin/spark-3.4.0-bin-without-hadoop.tgz.asc dev/spark/v3.4.0-rc6-bin/spark-3.4.0-bin-without-hadoop.tgz.sha512 dev/spark/v3.4.0-rc6-bin/spark-3.4.0.tgz (with props) dev/spark/v3.4.0-rc6-bin/spark-3.4.0.tgz.asc dev/spark/v3.4.0-rc6-bin/spark-3.4.0.tgz.sha512 Added: dev/spark/v3.4.0-rc6-bin/SparkR_3.4.0.tar.gz == Binary file - no diff available. Propchange: dev/spark/v3.4.0-rc6-bin/SparkR_3.4.0.tar.gz -- svn:mime-type = application/octet-stream Added: dev/spark/v3.4.0-rc6-bin/SparkR_3.4.0.tar.gz.asc == --- dev/spark/v3.4.0-rc6-bin/SparkR_3.4.0.tar.gz.asc (added) +++ dev/spark/v3.4.0-rc6-bin/SparkR_3.4.0.tar.gz.asc Thu Apr 6 17:58:16 2023 @@ -0,0 +1,17 @@ +-BEGIN PGP SIGNATURE- + +iQJHBAABCgAxFiEEzGiz0W/jOnZnBRYLp+V5CMek4bEFAmQvB8kTHHhpbnJvbmdA +YXBhY2hlLm9yZwAKCRCn5XkIx6ThsToQEACqdYF76eiLZgfskKs8ZpVroBu6FV0l +kT6CPB72l1l1vrfSDa9BbYW/Ty0QB0/t2ZV74p1avk5w/qyM6Otg7Gtkx3qFBMZw +YIcMUFdeeXYc8hiOLFqoTHfdQVzvJNaoXofbfZAOcEOR4cRhofXPsgRYGQK8ZJwQ +2Ek9a6KKUzn8bWfS2v+Z/bjLfArZ0QP2/qs9qdghsJqfhS6vGvFz9H45vfzpJyGw +WdRQIRdmGvsxX9cyOG6QJv9Aq7MuT+hDBM0H/yip3wppEKSjIByj0MqapnuUrkML +06SeK3fVx/sy9UzEHKWZKGDDiqlx5TCCaGC44N/+yiytmtrB3RxKhSiFy4G2s41+ +fqkMVgA3tbR2zIea/FJHYo7iO4YZMKN9YmXYFFZzARcwZgUVbyDvoLg07Rsww921 +FcoPYiUsFmA7Eb1vyp0HWmXYqwqSkuRujLkf4LkpX1JiRh0I2EEThPQ042nN+trN +2iW35q9WCOJVbcdLcMv6KVP3Ipa6A9BGc4bvd+cmi7P9Fv8zgboDbIV8XiC45dRb +v1C8NZ9Zca8V3XAdy+nds8fJW1Bvc6O12ch8MtMauV4TH22rTfmWBuVABsglQQlG +c8sCLSOdRo1k80pBFZFg4ZFMFs/NjNa0PDtD8hZIhJEk24AaxCLQT/YlyUu9flqp +37JM51CLEIL+xA== +=2jm9 +-END PGP SIGNATURE- Added: dev/spark/v3.4.0-rc6-bin/SparkR_3.4.0.tar.gz.sha512 == --- dev/spark/v3.4.0-rc6-bin/SparkR_3.4.0.tar.gz.sha512 (added) +++ dev/spark/v3.4.0-rc6-bin/SparkR_3.4.0.tar.gz.sha512 Thu Apr 6 17:58:16 2023 @@ -0,0 +1 @@ +2101b54ecaf72f6175a7e83651bee5cd2ca292ecb54f59461d7edb535d03130c9019eaa502912562d5aa1d5cec54f0eed3f862b51b6467b5beb6fcf09945b7b4 SparkR_3.4.0.tar.gz Added: dev/spark/v3.4.0-rc6-bin/pyspark-3.4.0.tar.gz == Binary file - no diff available. 
Propchange: dev/spark/v3.4.0-rc6-bin/pyspark-3.4.0.tar.gz -- svn:mime-type = application/octet-stream Added: dev/spark/v3.4.0-rc6-bin/pyspark-3.4.0.tar.gz.asc == --- dev/spark/v3.4.0-rc6-bin/pyspark-3.4.0.tar.gz.asc (added) +++ dev/spark/v3.4.0-rc6-bin/pyspark-3.4.0.tar.gz.asc Thu Apr 6 17:58:16 2023 @@ -0,0 +1,17 @@ +-BEGIN PGP SIGNATURE- + +iQJHBAABCgAxFiEEzGiz0W/jOnZnBRYLp+V5CMek4bEFAmQvB8sTHHhpbnJvbmdA +YXBhY2hlLm9yZwAKCRCn5XkIx6ThsQb0D/4072E3g9OGkQjzd1KuzEoY3bk79eRC +l6qohjnUlhP4j1l/sqNdtiS1hz1g5PgDpJlxlj20st3A9P7A0bOZN3r4BsMd38nA ++D0xIjdgqhax3XZVHhETudPwKWyboWM+Id32cuiJifGYPz9MnJBkTFQMxlZWz7Ny +hbwNC2H435anO1BGiuyiUaFztfoOJ5aMZZaQHfXTAszwm7KJhkpZP1NC0YVdklhI +71id0OYNziIIkYLJpSAlzQk2RLvR8Ok9NyELSOc6AzQ5tmLIVLWFVb9tfH69cYo8 +DHOEQqD4KdwDsb030lvXbQ4n6blns1b+i7gOdWzr6a/sQd1TwGq2SDkYlcQ++8/W +HU7+9C9Oula/RpzYcvPiWnneoAjN7zZgJfYm8aCEP62mCH/eQVJePDBnRLQUTtWD +gbBIId/qFXDYi1DOmFzz6Awh/EGA04TrnBbKqVSPC9g6p3VQCoUTNKwVKLCyvQx5 +QxbtpP7FjSxdB4TQAiDyo0U/o6b6AEx+wz43G14sv9gD3wNK8wtIBbh2PMrQuL0M +7QSgFwVkp6vLmRjsrSslrxW8zqbfc0HkrTSNnV2odtRcv0ZsAEikWMki68cnkjbC +GPFiUxjlNz1yMRrG/3dnmfHOvnlt84HtzUzxObxVO2xXgjSV5mEG94hywTEHeTA1 +dceim2kd4JjGSg== +=0nLo +-END PGP SIGNATURE- Added: dev/spark/v3.4.0-rc6-bin/pyspark-3.4.0.tar.gz.sha512 == --- dev/spark/v3.4.0-rc6-bin/pyspark-3.4.0.tar.gz.sha512 (added) +++ dev/spark
[spark] 01/01: Preparing development version 3.4.1-SNAPSHOT
This is an automated email from the ASF dual-hosted git repository. xinrong pushed a commit to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git commit 1d974a7a78fa2a4d688d5e8606dcd084ab08b220 Author: Xinrong Meng AuthorDate: Thu Apr 6 16:38:33 2023 + Preparing development version 3.4.1-SNAPSHOT --- R/pkg/DESCRIPTION | 2 +- assembly/pom.xml | 2 +- common/kvstore/pom.xml | 2 +- common/network-common/pom.xml | 2 +- common/network-shuffle/pom.xml | 2 +- common/network-yarn/pom.xml| 2 +- common/sketch/pom.xml | 2 +- common/tags/pom.xml| 2 +- common/unsafe/pom.xml | 2 +- connector/avro/pom.xml | 2 +- connector/connect/client/jvm/pom.xml | 2 +- connector/connect/common/pom.xml | 2 +- connector/connect/server/pom.xml | 2 +- connector/docker-integration-tests/pom.xml | 2 +- connector/kafka-0-10-assembly/pom.xml | 2 +- connector/kafka-0-10-sql/pom.xml | 2 +- connector/kafka-0-10-token-provider/pom.xml| 2 +- connector/kafka-0-10/pom.xml | 2 +- connector/kinesis-asl-assembly/pom.xml | 2 +- connector/kinesis-asl/pom.xml | 2 +- connector/protobuf/pom.xml | 2 +- connector/spark-ganglia-lgpl/pom.xml | 2 +- core/pom.xml | 2 +- docs/_config.yml | 6 +++--- examples/pom.xml | 2 +- graphx/pom.xml | 2 +- hadoop-cloud/pom.xml | 2 +- launcher/pom.xml | 2 +- mllib-local/pom.xml| 2 +- mllib/pom.xml | 2 +- pom.xml| 2 +- python/pyspark/version.py | 2 +- repl/pom.xml | 2 +- resource-managers/kubernetes/core/pom.xml | 2 +- resource-managers/kubernetes/integration-tests/pom.xml | 2 +- resource-managers/mesos/pom.xml| 2 +- resource-managers/yarn/pom.xml | 2 +- sql/catalyst/pom.xml | 2 +- sql/core/pom.xml | 2 +- sql/hive-thriftserver/pom.xml | 2 +- sql/hive/pom.xml | 2 +- streaming/pom.xml | 2 +- tools/pom.xml | 2 +- 43 files changed, 45 insertions(+), 45 deletions(-) diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION index 4a32762b34c..fa7028630a8 100644 --- a/R/pkg/DESCRIPTION +++ b/R/pkg/DESCRIPTION @@ -1,6 +1,6 @@ Package: SparkR Type: Package -Version: 3.4.0 +Version: 3.4.1 Title: R Front End for 'Apache Spark' Description: Provides an R Front end for 'Apache Spark' <https://spark.apache.org>. Authors@R: diff --git a/assembly/pom.xml b/assembly/pom.xml index c58da7aa112..b86fee4bceb 100644 --- a/assembly/pom.xml +++ b/assembly/pom.xml @@ -21,7 +21,7 @@ org.apache.spark spark-parent_2.12 -3.4.0 +3.4.1-SNAPSHOT ../pom.xml diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml index 95ea15552da..f9ecfb3d692 100644 --- a/common/kvstore/pom.xml +++ b/common/kvstore/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.4.0 +3.4.1-SNAPSHOT ../../pom.xml diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml index e4d98471bf9..22ee65b7d25 100644 --- a/common/network-common/pom.xml +++ b/common/network-common/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.4.0 +3.4.1-SNAPSHOT ../../pom.xml diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml index 7a6d5aedf65..2c67da81ca4 100644 --- a/common/network-shuffle/pom.xml +++ b/common/network-shuffle/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.4.0 +3.4.1-SNAPSHOT ../../pom.xml diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml index 1c421754083..219682e047d 100644 --- a/common/network-yarn/pom.xml +++ b/common/network-yarn/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.4.0 +3.4.1-SNAPSHOT ../../pom.xml diff --git a/common/sketch/pom.xml b/common/sketch
[spark] branch branch-3.4 updated (90376424779 -> 1d974a7a78f)
This is an automated email from the ASF dual-hosted git repository. xinrong pushed a change to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git from 90376424779 [SPARK-43041][SQL] Restore constructors of exceptions for compatibility in connector API add 28d0723beb3 Preparing Spark release v3.4.0-rc6 new 1d974a7a78f Preparing development version 3.4.1-SNAPSHOT The 1 revision listed above as "new" is entirely new to this repository and will be described in a separate email. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes:
[spark] 01/01: Preparing Spark release v3.4.0-rc6
This is an automated email from the ASF dual-hosted git repository. xinrong pushed a commit to tag v3.4.0-rc6 in repository https://gitbox.apache.org/repos/asf/spark.git commit 28d0723beb3579c17df84bb22c98a487d7a72023 Author: Xinrong Meng AuthorDate: Thu Apr 6 16:38:28 2023 + Preparing Spark release v3.4.0-rc6 --- R/pkg/DESCRIPTION | 2 +- assembly/pom.xml | 2 +- common/kvstore/pom.xml | 2 +- common/network-common/pom.xml | 2 +- common/network-shuffle/pom.xml | 2 +- common/network-yarn/pom.xml| 2 +- common/sketch/pom.xml | 2 +- common/tags/pom.xml| 2 +- common/unsafe/pom.xml | 2 +- connector/avro/pom.xml | 2 +- connector/connect/client/jvm/pom.xml | 2 +- connector/connect/common/pom.xml | 2 +- connector/connect/server/pom.xml | 2 +- connector/docker-integration-tests/pom.xml | 2 +- connector/kafka-0-10-assembly/pom.xml | 2 +- connector/kafka-0-10-sql/pom.xml | 2 +- connector/kafka-0-10-token-provider/pom.xml| 2 +- connector/kafka-0-10/pom.xml | 2 +- connector/kinesis-asl-assembly/pom.xml | 2 +- connector/kinesis-asl/pom.xml | 2 +- connector/protobuf/pom.xml | 2 +- connector/spark-ganglia-lgpl/pom.xml | 2 +- core/pom.xml | 2 +- docs/_config.yml | 6 +++--- examples/pom.xml | 2 +- graphx/pom.xml | 2 +- hadoop-cloud/pom.xml | 2 +- launcher/pom.xml | 2 +- mllib-local/pom.xml| 2 +- mllib/pom.xml | 2 +- pom.xml| 2 +- python/pyspark/version.py | 2 +- repl/pom.xml | 2 +- resource-managers/kubernetes/core/pom.xml | 2 +- resource-managers/kubernetes/integration-tests/pom.xml | 2 +- resource-managers/mesos/pom.xml| 2 +- resource-managers/yarn/pom.xml | 2 +- sql/catalyst/pom.xml | 2 +- sql/core/pom.xml | 2 +- sql/hive-thriftserver/pom.xml | 2 +- sql/hive/pom.xml | 2 +- streaming/pom.xml | 2 +- tools/pom.xml | 2 +- 43 files changed, 45 insertions(+), 45 deletions(-) diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION index fa7028630a8..4a32762b34c 100644 --- a/R/pkg/DESCRIPTION +++ b/R/pkg/DESCRIPTION @@ -1,6 +1,6 @@ Package: SparkR Type: Package -Version: 3.4.1 +Version: 3.4.0 Title: R Front End for 'Apache Spark' Description: Provides an R Front end for 'Apache Spark' <https://spark.apache.org>. Authors@R: diff --git a/assembly/pom.xml b/assembly/pom.xml index b86fee4bceb..c58da7aa112 100644 --- a/assembly/pom.xml +++ b/assembly/pom.xml @@ -21,7 +21,7 @@ org.apache.spark spark-parent_2.12 -3.4.1-SNAPSHOT +3.4.0 ../pom.xml diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml index f9ecfb3d692..95ea15552da 100644 --- a/common/kvstore/pom.xml +++ b/common/kvstore/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.4.1-SNAPSHOT +3.4.0 ../../pom.xml diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml index 22ee65b7d25..e4d98471bf9 100644 --- a/common/network-common/pom.xml +++ b/common/network-common/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.4.1-SNAPSHOT +3.4.0 ../../pom.xml diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml index 2c67da81ca4..7a6d5aedf65 100644 --- a/common/network-shuffle/pom.xml +++ b/common/network-shuffle/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.4.1-SNAPSHOT +3.4.0 ../../pom.xml diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml index 219682e047d..1c421754083 100644 --- a/common/network-yarn/pom.xml +++ b/common/network-yarn/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.4.1-SNAPSHOT +3.4.0 ../../pom.xml diff --git a/common/sketch/pom.xml b/common/sketch/pom.xml index 22ce7
[spark] tag v3.4.0-rc6 created (now 28d0723beb3)
This is an automated email from the ASF dual-hosted git repository. xinrong pushed a change to tag v3.4.0-rc6 in repository https://gitbox.apache.org/repos/asf/spark.git at 28d0723beb3 (commit) This tag includes the following new commits: new 28d0723beb3 Preparing Spark release v3.4.0-rc6 The 1 revision listed above as "new" is entirely new to this repository and will be described in a separate email. The revisions listed as "add" were already present in the repository and have only been added to this reference.
svn commit: r60926 - in /dev/spark/v3.4.0-rc5-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/R/articles/ _site/api/R/deps/ _site/api/R/deps/bootstrap-5.2.2/ _site/api/R/deps/jquery-3.6.0/ _site/api
Author: xinrong Date: Thu Mar 30 04:53:57 2023 New Revision: 60926 Log: Apache Spark v3.4.0-rc5 docs [This commit notification would consist of 2789 parts, which exceeds the limit of 50, so it was shortened to this summary.]
svn commit: r60925 - /dev/spark/v3.4.0-rc5-bin/
Author: xinrong Date: Thu Mar 30 03:39:03 2023 New Revision: 60925 Log: Apache Spark v3.4.0-rc5 Added: dev/spark/v3.4.0-rc5-bin/ dev/spark/v3.4.0-rc5-bin/SparkR_3.4.0.tar.gz (with props) dev/spark/v3.4.0-rc5-bin/SparkR_3.4.0.tar.gz.asc dev/spark/v3.4.0-rc5-bin/SparkR_3.4.0.tar.gz.sha512 dev/spark/v3.4.0-rc5-bin/pyspark-3.4.0.tar.gz (with props) dev/spark/v3.4.0-rc5-bin/pyspark-3.4.0.tar.gz.asc dev/spark/v3.4.0-rc5-bin/pyspark-3.4.0.tar.gz.sha512 dev/spark/v3.4.0-rc5-bin/spark-3.4.0-bin-hadoop3-scala2.13.tgz (with props) dev/spark/v3.4.0-rc5-bin/spark-3.4.0-bin-hadoop3-scala2.13.tgz.asc dev/spark/v3.4.0-rc5-bin/spark-3.4.0-bin-hadoop3-scala2.13.tgz.sha512 dev/spark/v3.4.0-rc5-bin/spark-3.4.0-bin-hadoop3.tgz (with props) dev/spark/v3.4.0-rc5-bin/spark-3.4.0-bin-hadoop3.tgz.asc dev/spark/v3.4.0-rc5-bin/spark-3.4.0-bin-hadoop3.tgz.sha512 dev/spark/v3.4.0-rc5-bin/spark-3.4.0-bin-without-hadoop.tgz (with props) dev/spark/v3.4.0-rc5-bin/spark-3.4.0-bin-without-hadoop.tgz.asc dev/spark/v3.4.0-rc5-bin/spark-3.4.0-bin-without-hadoop.tgz.sha512 dev/spark/v3.4.0-rc5-bin/spark-3.4.0.tgz (with props) dev/spark/v3.4.0-rc5-bin/spark-3.4.0.tgz.asc dev/spark/v3.4.0-rc5-bin/spark-3.4.0.tgz.sha512 Added: dev/spark/v3.4.0-rc5-bin/SparkR_3.4.0.tar.gz == Binary file - no diff available. Propchange: dev/spark/v3.4.0-rc5-bin/SparkR_3.4.0.tar.gz -- svn:mime-type = application/octet-stream Added: dev/spark/v3.4.0-rc5-bin/SparkR_3.4.0.tar.gz.asc == --- dev/spark/v3.4.0-rc5-bin/SparkR_3.4.0.tar.gz.asc (added) +++ dev/spark/v3.4.0-rc5-bin/SparkR_3.4.0.tar.gz.asc Thu Mar 30 03:39:03 2023 @@ -0,0 +1,17 @@ +-BEGIN PGP SIGNATURE- + +iQJHBAABCgAxFiEEzGiz0W/jOnZnBRYLp+V5CMek4bEFAmQlA+wTHHhpbnJvbmdA +YXBhY2hlLm9yZwAKCRCn5XkIx6Thsdo/D/9CLT5v+RVNTX0mmZq501F205cDUan+ +tiC/G2ddtGfSLcRAWeWqoDFWOkeupwEqtKMoqQGnElXM7qVF2miBfcohBxm3151l +UBJD6paLgSrI2omxxqBNTB265BbojbmQcZx5UjHzO/opVahllET/7RXI6I8k/gsC +hpoSJe77SHPXsLQpSFPaxct7Qy6IwwLq8yvVZIFlrYgjqvWBa3zsnqb4T6W859lb +uiAAWJTJ0xQPF/u9TmXM8a9vFRfo3rXuttW8W7wKlHQjZgDJpNSJyQCaVmWYUssM +2nzrfiwy7/E5wGzFsdxzO8lOlyeA6Cdmhwo8G5xcZnjNt9032DrAYFdo5rIoim9v +irsqWyOJ5XclUOWpxKpXdYPcQGpEW74vUBymAW5P6jt0Yi2/3qvZSiwh1qceJ8Fo +nut0HUWIFkohDoattkCjoA1yconcJd4+FuoDxrCX+QWAlchgR4eijMWfYCyH/7LX +SucOJOK80psdGnZGuecuRjCzhvnbPjjNjS3dYMrudLlgxHyb2ahjeHXpVyDjI/O6 +AwUmJtUEGHk0Ypa8OHlgzB8UUaZRQDIiwL8j8tlIHYMt+VdQLUtvyK+hqe45It6F +OAlocOnign7Ej/9EGyJfKXX0gZr6NmkuANWggPRIrIs1NSnqz4bDWQRGwVOkpb7x +NOdLdMoi6QMC0A== +=H+Kf +-END PGP SIGNATURE- Added: dev/spark/v3.4.0-rc5-bin/SparkR_3.4.0.tar.gz.sha512 == --- dev/spark/v3.4.0-rc5-bin/SparkR_3.4.0.tar.gz.sha512 (added) +++ dev/spark/v3.4.0-rc5-bin/SparkR_3.4.0.tar.gz.sha512 Thu Mar 30 03:39:03 2023 @@ -0,0 +1 @@ +c3086edefab6656535e234fd11d0a2a4d4c6ede97b85f94801d06064bd89c6f58196714e335e92ffd2ac83c82714ad8a9a51165621ecff194af290c1eb537ef2 SparkR_3.4.0.tar.gz Added: dev/spark/v3.4.0-rc5-bin/pyspark-3.4.0.tar.gz == Binary file - no diff available. 
Propchange: dev/spark/v3.4.0-rc5-bin/pyspark-3.4.0.tar.gz -- svn:mime-type = application/octet-stream Added: dev/spark/v3.4.0-rc5-bin/pyspark-3.4.0.tar.gz.asc == --- dev/spark/v3.4.0-rc5-bin/pyspark-3.4.0.tar.gz.asc (added) +++ dev/spark/v3.4.0-rc5-bin/pyspark-3.4.0.tar.gz.asc Thu Mar 30 03:39:03 2023 @@ -0,0 +1,17 @@ +-BEGIN PGP SIGNATURE- + +iQJHBAABCgAxFiEEzGiz0W/jOnZnBRYLp+V5CMek4bEFAmQlA+4THHhpbnJvbmdA +YXBhY2hlLm9yZwAKCRCn5XkIx6Thsb0pEACXvrvU/8Xh7ns7J8RtV/Wmf4oMu9Mk +i6G8JwBUTS1kqRe9Xb1g3GJxNil8HTta1yNKgjvkTDc6EXIYrtQD4PpL6cuumckW +0+itx9dih22OcvfN6sJNizAtRoTcpXx7UHq00dAjzHHbOv0dwGqnjKRU3UUQ/XnY +RjT3kM4isf95TzAmEFwsXNSzkUY0+EzDgfhnDAwb60nzTzZ2bEiZnLP1JC2iScDI +jSXMoWtZTaJz51bssKzzXpVmrwBxLDgSPlDM5KVmeD+WQMqS7Hk51bSikSEW1X39 +CO7hEXw+SYLQB5yKaqu03diErTOWmP6aJ8tbHCPWNrs3JMJkm4/Cj6Sc2JOktixO +Ns8Pc82kpnvG0eWCMXwihZa7pxnq59ByZsxYAfmcIdf4q02VJNetFjplgXAs2jjy +n9UZ6l8ZrCjUW2/AB3TSSibXLXMvuI6PLSYnKY9IP0t0dqxnBIKkACTx8qBA/o+I +0n02LBJCD8ZPJvHpI2MGlaFGftbQx4LUXX4CFlAz+RI9iizCbpjrDYFzvXBEY7ri +46i5uL+sHkP6Uj/8fNJ3QRhggb19i0NajzofSs5vNsVk2qHjHokIjG/kOkpCfBzC +6rM5zd/OyQNZmbHThlOjAdEvTSgasXb/5uHpwWDHbTlPGJYMZOWzuBdDSfBlHW/t +56VKCDfYO11shA== +=a3bs +-END PGP SIGNATURE- Added: dev/spark/v3.4.0-rc5-bin/pyspark-3.4.0.tar.gz.sha512 == --- dev/spark/v3.4.0-rc5-bin/pyspark-3.4.0.tar.gz.sha512 (added) +++ dev/spark
[spark] 01/01: Preparing development version 3.4.1-SNAPSHOT
This is an automated email from the ASF dual-hosted git repository. xinrong pushed a commit to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git commit 6a6f50444d43af24773ecc158aa127027f088288 Author: Xinrong Meng AuthorDate: Thu Mar 30 02:18:32 2023 + Preparing development version 3.4.1-SNAPSHOT --- R/pkg/DESCRIPTION | 2 +- assembly/pom.xml | 2 +- common/kvstore/pom.xml | 2 +- common/network-common/pom.xml | 2 +- common/network-shuffle/pom.xml | 2 +- common/network-yarn/pom.xml| 2 +- common/sketch/pom.xml | 2 +- common/tags/pom.xml| 2 +- common/unsafe/pom.xml | 2 +- connector/avro/pom.xml | 2 +- connector/connect/client/jvm/pom.xml | 2 +- connector/connect/common/pom.xml | 2 +- connector/connect/server/pom.xml | 2 +- connector/docker-integration-tests/pom.xml | 2 +- connector/kafka-0-10-assembly/pom.xml | 2 +- connector/kafka-0-10-sql/pom.xml | 2 +- connector/kafka-0-10-token-provider/pom.xml| 2 +- connector/kafka-0-10/pom.xml | 2 +- connector/kinesis-asl-assembly/pom.xml | 2 +- connector/kinesis-asl/pom.xml | 2 +- connector/protobuf/pom.xml | 2 +- connector/spark-ganglia-lgpl/pom.xml | 2 +- core/pom.xml | 2 +- docs/_config.yml | 6 +++--- examples/pom.xml | 2 +- graphx/pom.xml | 2 +- hadoop-cloud/pom.xml | 2 +- launcher/pom.xml | 2 +- mllib-local/pom.xml| 2 +- mllib/pom.xml | 2 +- pom.xml| 2 +- python/pyspark/version.py | 2 +- repl/pom.xml | 2 +- resource-managers/kubernetes/core/pom.xml | 2 +- resource-managers/kubernetes/integration-tests/pom.xml | 2 +- resource-managers/mesos/pom.xml| 2 +- resource-managers/yarn/pom.xml | 2 +- sql/catalyst/pom.xml | 2 +- sql/core/pom.xml | 2 +- sql/hive-thriftserver/pom.xml | 2 +- sql/hive/pom.xml | 2 +- streaming/pom.xml | 2 +- tools/pom.xml | 2 +- 43 files changed, 45 insertions(+), 45 deletions(-) diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION index 4a32762b34c..fa7028630a8 100644 --- a/R/pkg/DESCRIPTION +++ b/R/pkg/DESCRIPTION @@ -1,6 +1,6 @@ Package: SparkR Type: Package -Version: 3.4.0 +Version: 3.4.1 Title: R Front End for 'Apache Spark' Description: Provides an R Front end for 'Apache Spark' <https://spark.apache.org>. Authors@R: diff --git a/assembly/pom.xml b/assembly/pom.xml index c58da7aa112..b86fee4bceb 100644 --- a/assembly/pom.xml +++ b/assembly/pom.xml @@ -21,7 +21,7 @@ org.apache.spark spark-parent_2.12 -3.4.0 +3.4.1-SNAPSHOT ../pom.xml diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml index 95ea15552da..f9ecfb3d692 100644 --- a/common/kvstore/pom.xml +++ b/common/kvstore/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.4.0 +3.4.1-SNAPSHOT ../../pom.xml diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml index e4d98471bf9..22ee65b7d25 100644 --- a/common/network-common/pom.xml +++ b/common/network-common/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.4.0 +3.4.1-SNAPSHOT ../../pom.xml diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml index 7a6d5aedf65..2c67da81ca4 100644 --- a/common/network-shuffle/pom.xml +++ b/common/network-shuffle/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.4.0 +3.4.1-SNAPSHOT ../../pom.xml diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml index 1c421754083..219682e047d 100644 --- a/common/network-yarn/pom.xml +++ b/common/network-yarn/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.4.0 +3.4.1-SNAPSHOT ../../pom.xml diff --git a/common/sketch/pom.xml b/common/sketch
[spark] branch branch-3.4 updated (ce36692eeee -> 6a6f50444d4)
This is an automated email from the ASF dual-hosted git repository. xinrong pushed a change to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git from ce36692eeee [SPARK-42631][CONNECT][FOLLOW-UP] Expose Column.expr to extensions add f39ad617d32 Preparing Spark release v3.4.0-rc5 new 6a6f50444d4 Preparing development version 3.4.1-SNAPSHOT The 1 revision listed above as "new" is entirely new to this repository and will be described in a separate email. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes:
[spark] tag v3.4.0-rc5 created (now f39ad617d32)
This is an automated email from the ASF dual-hosted git repository. xinrong pushed a change to tag v3.4.0-rc5 in repository https://gitbox.apache.org/repos/asf/spark.git at f39ad617d32 (commit) This tag includes the following new commits: new f39ad617d32 Preparing Spark release v3.4.0-rc5 The 1 revision listed above as "new" is entirely new to this repository and will be described in a separate email. The revisions listed as "add" were already present in the repository and have only been added to this reference.
[spark] 01/01: Preparing Spark release v3.4.0-rc5
This is an automated email from the ASF dual-hosted git repository. xinrong pushed a commit to tag v3.4.0-rc5 in repository https://gitbox.apache.org/repos/asf/spark.git commit f39ad617d32a671e120464e4a75986241d72c487 Author: Xinrong Meng AuthorDate: Thu Mar 30 02:18:27 2023 + Preparing Spark release v3.4.0-rc5 --- R/pkg/DESCRIPTION | 2 +- assembly/pom.xml | 2 +- common/kvstore/pom.xml | 2 +- common/network-common/pom.xml | 2 +- common/network-shuffle/pom.xml | 2 +- common/network-yarn/pom.xml| 2 +- common/sketch/pom.xml | 2 +- common/tags/pom.xml| 2 +- common/unsafe/pom.xml | 2 +- connector/avro/pom.xml | 2 +- connector/connect/client/jvm/pom.xml | 2 +- connector/connect/common/pom.xml | 2 +- connector/connect/server/pom.xml | 2 +- connector/docker-integration-tests/pom.xml | 2 +- connector/kafka-0-10-assembly/pom.xml | 2 +- connector/kafka-0-10-sql/pom.xml | 2 +- connector/kafka-0-10-token-provider/pom.xml| 2 +- connector/kafka-0-10/pom.xml | 2 +- connector/kinesis-asl-assembly/pom.xml | 2 +- connector/kinesis-asl/pom.xml | 2 +- connector/protobuf/pom.xml | 2 +- connector/spark-ganglia-lgpl/pom.xml | 2 +- core/pom.xml | 2 +- docs/_config.yml | 6 +++--- examples/pom.xml | 2 +- graphx/pom.xml | 2 +- hadoop-cloud/pom.xml | 2 +- launcher/pom.xml | 2 +- mllib-local/pom.xml| 2 +- mllib/pom.xml | 2 +- pom.xml| 2 +- python/pyspark/version.py | 2 +- repl/pom.xml | 2 +- resource-managers/kubernetes/core/pom.xml | 2 +- resource-managers/kubernetes/integration-tests/pom.xml | 2 +- resource-managers/mesos/pom.xml| 2 +- resource-managers/yarn/pom.xml | 2 +- sql/catalyst/pom.xml | 2 +- sql/core/pom.xml | 2 +- sql/hive-thriftserver/pom.xml | 2 +- sql/hive/pom.xml | 2 +- streaming/pom.xml | 2 +- tools/pom.xml | 2 +- 43 files changed, 45 insertions(+), 45 deletions(-) diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION index fa7028630a8..4a32762b34c 100644 --- a/R/pkg/DESCRIPTION +++ b/R/pkg/DESCRIPTION @@ -1,6 +1,6 @@ Package: SparkR Type: Package -Version: 3.4.1 +Version: 3.4.0 Title: R Front End for 'Apache Spark' Description: Provides an R Front end for 'Apache Spark' <https://spark.apache.org>. Authors@R: diff --git a/assembly/pom.xml b/assembly/pom.xml index b86fee4bceb..c58da7aa112 100644 --- a/assembly/pom.xml +++ b/assembly/pom.xml @@ -21,7 +21,7 @@ org.apache.spark spark-parent_2.12 -3.4.1-SNAPSHOT +3.4.0 ../pom.xml diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml index f9ecfb3d692..95ea15552da 100644 --- a/common/kvstore/pom.xml +++ b/common/kvstore/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.4.1-SNAPSHOT +3.4.0 ../../pom.xml diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml index 22ee65b7d25..e4d98471bf9 100644 --- a/common/network-common/pom.xml +++ b/common/network-common/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.4.1-SNAPSHOT +3.4.0 ../../pom.xml diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml index 2c67da81ca4..7a6d5aedf65 100644 --- a/common/network-shuffle/pom.xml +++ b/common/network-shuffle/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.4.1-SNAPSHOT +3.4.0 ../../pom.xml diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml index 219682e047d..1c421754083 100644 --- a/common/network-yarn/pom.xml +++ b/common/network-yarn/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.4.1-SNAPSHOT +3.4.0 ../../pom.xml diff --git a/common/sketch/pom.xml b/common/sketch/pom.xml inde
[spark] branch branch-3.4 updated (b74f7922577 -> 3122d4f4c76)
This is an automated email from the ASF dual-hosted git repository. xinrong pushed a change to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git from b74f7922577 [SPARK-42861][SQL] Use private[sql] instead of protected[sql] to avoid generating API doc add 3122d4f4c76 [SPARK-42891][CONNECT][PYTHON][3.4] Implement CoGrouped Map API No new revisions were added by this update. Summary of changes: .../main/protobuf/spark/connect/relations.proto| 18 ++ .../sql/connect/planner/SparkConnectPlanner.scala | 22 ++ dev/sparktestsupport/modules.py| 1 + python/pyspark/sql/connect/_typing.py | 2 + python/pyspark/sql/connect/group.py| 49 +++- python/pyspark/sql/connect/plan.py | 40 python/pyspark/sql/connect/proto/relations_pb2.py | 250 +++-- python/pyspark/sql/connect/proto/relations_pb2.pyi | 80 +++ python/pyspark/sql/pandas/group_ops.py | 9 + .../sql/tests/connect/test_connect_basic.py| 5 +- ..._map.py => test_parity_pandas_cogrouped_map.py} | 54 ++--- .../sql/tests/pandas/test_pandas_cogrouped_map.py | 6 +- 12 files changed, 374 insertions(+), 162 deletions(-) copy python/pyspark/sql/tests/connect/{test_parity_pandas_grouped_map.py => test_parity_pandas_cogrouped_map.py} (61%) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
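For readers unfamiliar with the API being ported here, a short sketch of cogrouped-map usage (assuming an active session `spark`; this mirrors the existing non-Connect `applyInPandas` API rather than anything newly invented in the PR):

```py
import pandas as pd

df1 = spark.createDataFrame([(1, 1.0), (2, 2.0)], ("id", "v1"))
df2 = spark.createDataFrame([(1, "x"), (2, "y")], ("id", "v2"))

def merge(left: pd.DataFrame, right: pd.DataFrame) -> pd.DataFrame:
    # Each call receives the pandas rows for one cogrouped key.
    return pd.merge(left, right, on="id")

out = (
    df1.groupby("id")
    .cogroup(df2.groupby("id"))
    .applyInPandas(merge, schema="id long, v1 double, v2 string")
)
out.show()
```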
[spark] branch branch-3.4 updated: [SPARK-42864][ML][3.4] Make `IsotonicRegression.PointsAccumulator` private
This is an automated email from the ASF dual-hosted git repository. xinrong pushed a commit to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.4 by this push: new 5222cfd58a7 [SPARK-42864][ML][3.4] Make `IsotonicRegression.PointsAccumulator` private 5222cfd58a7 is described below commit 5222cfd58a717fec7a025fdf4dfcde0bb4daf80c Author: Ruifeng Zheng AuthorDate: Tue Mar 21 12:55:44 2023 +0800 [SPARK-42864][ML][3.4] Make `IsotonicRegression.PointsAccumulator` private ### What changes were proposed in this pull request? Make `IsotonicRegression.PointsAccumulator` private, which was introduced in https://github.com/apache/spark/commit/3d05c7e037eff79de8ef9f6231aca8340bcc65ef ### Why are the changes needed? `PointsAccumulator` is implementation details, should not be exposed ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? existing UT Closes #40500 from zhengruifeng/isotonicRegression_private. Authored-by: Ruifeng Zheng Signed-off-by: Xinrong Meng --- .../org/apache/spark/mllib/regression/IsotonicRegression.scala | 6 ++ 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/mllib/src/main/scala/org/apache/spark/mllib/regression/IsotonicRegression.scala b/mllib/src/main/scala/org/apache/spark/mllib/regression/IsotonicRegression.scala index fbf0dc9c357..12a78ef4ec1 100644 --- a/mllib/src/main/scala/org/apache/spark/mllib/regression/IsotonicRegression.scala +++ b/mllib/src/main/scala/org/apache/spark/mllib/regression/IsotonicRegression.scala @@ -331,7 +331,7 @@ class IsotonicRegression private (private var isotonic: Boolean) extends Seriali if (cleanInput.length <= 1) { cleanInput } else { - val pointsAccumulator = new IsotonicRegression.PointsAccumulator + val pointsAccumulator = new PointsAccumulator // Go through input points, merging all points with equal feature values into a single point. // Equality of features is defined by shouldAccumulate method. The label of the accumulated @@ -490,15 +490,13 @@ class IsotonicRegression private (private var isotonic: Boolean) extends Seriali .sortBy(_._2) poolAdjacentViolators(parallelStepResult) } -} -object IsotonicRegression { /** * Utility class, holds a buffer of all points with unique features so far, and performs * weighted sum accumulation of points. Hides these details for better readability of the * main algorithm. */ - class PointsAccumulator { + private class PointsAccumulator { private val output = ArrayBuffer[(Double, Double, Double)]() private var (currentLabel: Double, currentFeature: Double, currentWeight: Double) = (0d, 0d, 0d) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
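Since `PointsAccumulator` was never part of the public surface, user-facing training code is unaffected. A minimal sketch for reference (assuming an active SparkContext `sc`; the RDD-based `pyspark.mllib` API shown here wraps the patched Scala class):

```py
from pyspark.mllib.regression import IsotonicRegression

# (label, feature, weight) triples; points with equal features are merged
# internally by the (now private) PointsAccumulator before the
# pool-adjacent-violators pass.
data = sc.parallelize([(1.0, 1.0, 1.0), (2.0, 2.0, 1.0), (3.0, 3.0, 1.0)])
model = IsotonicRegression.train(data, isotonic=True)
print(model.predict(2.5))  # interpolated prediction between known features
```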
svn commit: r60509 - in /dev/spark/v3.4.0-rc4-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/R/articles/ _site/api/R/deps/ _site/api/R/deps/bootstrap-5.2.2/ _site/api/R/deps/jquery-3.6.0/ _site/api
Author: xinrong Date: Fri Mar 10 06:12:05 2023 New Revision: 60509 Log: Apache Spark v3.4.0-rc4 docs [This commit notification would consist of 2807 parts, which exceeds the limit of 50, so it was shortened to this summary.]
svn commit: r60507 - /dev/spark/v3.4.0-rc4-bin/
Author: xinrong Date: Fri Mar 10 04:47:20 2023 New Revision: 60507 Log: Apache Spark v3.4.0-rc4 Added: dev/spark/v3.4.0-rc4-bin/ dev/spark/v3.4.0-rc4-bin/SparkR_3.4.0.tar.gz (with props) dev/spark/v3.4.0-rc4-bin/SparkR_3.4.0.tar.gz.asc dev/spark/v3.4.0-rc4-bin/SparkR_3.4.0.tar.gz.sha512 dev/spark/v3.4.0-rc4-bin/pyspark-3.4.0.tar.gz (with props) dev/spark/v3.4.0-rc4-bin/pyspark-3.4.0.tar.gz.asc dev/spark/v3.4.0-rc4-bin/pyspark-3.4.0.tar.gz.sha512 dev/spark/v3.4.0-rc4-bin/spark-3.4.0-bin-hadoop3-scala2.13.tgz (with props) dev/spark/v3.4.0-rc4-bin/spark-3.4.0-bin-hadoop3-scala2.13.tgz.asc dev/spark/v3.4.0-rc4-bin/spark-3.4.0-bin-hadoop3-scala2.13.tgz.sha512 dev/spark/v3.4.0-rc4-bin/spark-3.4.0-bin-hadoop3.tgz (with props) dev/spark/v3.4.0-rc4-bin/spark-3.4.0-bin-hadoop3.tgz.asc dev/spark/v3.4.0-rc4-bin/spark-3.4.0-bin-hadoop3.tgz.sha512 dev/spark/v3.4.0-rc4-bin/spark-3.4.0-bin-without-hadoop.tgz (with props) dev/spark/v3.4.0-rc4-bin/spark-3.4.0-bin-without-hadoop.tgz.asc dev/spark/v3.4.0-rc4-bin/spark-3.4.0-bin-without-hadoop.tgz.sha512 dev/spark/v3.4.0-rc4-bin/spark-3.4.0.tgz (with props) dev/spark/v3.4.0-rc4-bin/spark-3.4.0.tgz.asc dev/spark/v3.4.0-rc4-bin/spark-3.4.0.tgz.sha512 Added: dev/spark/v3.4.0-rc4-bin/SparkR_3.4.0.tar.gz == Binary file - no diff available. Propchange: dev/spark/v3.4.0-rc4-bin/SparkR_3.4.0.tar.gz -- svn:mime-type = application/octet-stream Added: dev/spark/v3.4.0-rc4-bin/SparkR_3.4.0.tar.gz.asc == --- dev/spark/v3.4.0-rc4-bin/SparkR_3.4.0.tar.gz.asc (added) +++ dev/spark/v3.4.0-rc4-bin/SparkR_3.4.0.tar.gz.asc Fri Mar 10 04:47:20 2023 @@ -0,0 +1,17 @@ +-BEGIN PGP SIGNATURE- + +iQJHBAABCgAxFiEEzGiz0W/jOnZnBRYLp+V5CMek4bEFAmQKtesTHHhpbnJvbmdA +YXBhY2hlLm9yZwAKCRCn5XkIx6Thsed/D/9ECWrN2Ra7rPZt1lvSh9H/DON0HzZ0 +UXLPKZpCXkdFM7TXMksVF0qE/iqPwfgfxv9uY0Ura71+to/+6L1l9U+svKwNl7ze +0vby8tZMLwiqpVlIihLObrLXLSfUF9hBOo1Xuh60DZjiNaACZ/5Pi0vIhIQiiLJb +TOG5bFejim9/8pbK9l54M2eP9e1fxYDLAwZCGCvtzN0Ddf1hhZQomG4QJeCJV9YZ +/rSF6cmyale+0U/UIE/ci9Jj7gzzxAxa5CBFVYyjsNLRksM9LzbYGck2VuC6UZT4 +TdcF1Ia834BnSCOEgesyPrM7FD6ljNr7ks7UMI3PG4yVtAdeNzDCyZhX6OXU+zCY +olbqHl1RzAgrvA+rUoQH6vRaKVKTFQTSkohrQSg3tmSqPYfxNxac75K7I3F9A5qM +DXHkXrSAdCOV+T88yw75zjr2xLiLLGIuBrYc/5lk3JxS9Rw6aDrfxLgZMpfdnsuL +PxAMai2xnZhvQrAAIPUKRN+TR72fpVFIAJB9nEReDF6m9cmhdhQt+xKR6xCDs9fb +Cx+G8ZBPvJeheGFiKmjeAT4zh+C3B7BxhlvvCP5Q6GOtWv+8CBardAVV2OSP2T/t +SxFEjBZwqNrwtBFY0txYnDTGnv6vK3dG86FnaE6R57p2W5vAKrmSmp3ZL+YhUKe7 +HGk4OdoEG93bww== +=FJdV +-END PGP SIGNATURE- Added: dev/spark/v3.4.0-rc4-bin/SparkR_3.4.0.tar.gz.sha512 == --- dev/spark/v3.4.0-rc4-bin/SparkR_3.4.0.tar.gz.sha512 (added) +++ dev/spark/v3.4.0-rc4-bin/SparkR_3.4.0.tar.gz.sha512 Fri Mar 10 04:47:20 2023 @@ -0,0 +1 @@ +6e7bc60a43243e9026e92af81386fc9d57a6231c1a59a6fb4e39cf16cd150a3b5e1e9237b377d3e5a74d16f804f9a5d13a897e8455f640eacec1c79ef3d10407 SparkR_3.4.0.tar.gz Added: dev/spark/v3.4.0-rc4-bin/pyspark-3.4.0.tar.gz == Binary file - no diff available. 
Propchange: dev/spark/v3.4.0-rc4-bin/pyspark-3.4.0.tar.gz -- svn:mime-type = application/octet-stream Added: dev/spark/v3.4.0-rc4-bin/pyspark-3.4.0.tar.gz.asc == --- dev/spark/v3.4.0-rc4-bin/pyspark-3.4.0.tar.gz.asc (added) +++ dev/spark/v3.4.0-rc4-bin/pyspark-3.4.0.tar.gz.asc Fri Mar 10 04:47:20 2023 @@ -0,0 +1,17 @@ +-BEGIN PGP SIGNATURE- + +iQJHBAABCgAxFiEEzGiz0W/jOnZnBRYLp+V5CMek4bEFAmQKte0THHhpbnJvbmdA +YXBhY2hlLm9yZwAKCRCn5XkIx6ThsWYaD/9zcUOMr+07jy7Ok2xKkEq4hlsSH1xu +4Y61P1CTFhtOc9DG/O2jfX8Tsnp/b6gY3nJHGhrtdY0LCMPiMG+5uHO3/wO53pE0 +6DEtZH1I38rbILpb9kDCftCQS6keZR79Zl8N0G5D+P56grNdI4aqDo1Ntxvs366r +0rAWGIpVbvr5w5MBqvyn96Sk2ac/SbZVeE5NHCVwPWCQz6povLTDDESWETQIW5TZ +VTQsErI4joWplWWlI8D8x8XABVaD0BaKFwuJpPploKVkhSyOECUDM5W0xhuGNArn +h5GofcXXvCBKqoI3ngXg72G6fVamDJ0b/DCsmpLflwEaInhlDYj9BVbTUAgvYHwa +eDgLEbvZ4at/5OVf+A/VxnXLfL1DJLiGgfk7J4QqNMTdqfCtyEs4yxQ4t6OZ93mN +g6VcNYzayKEZffmC29QDtce5wpl530C543cSW7QFMgIg0ly0pfDF1J63hsQ86TZV +D/Nu41KiQXFq4CMD08mxu1gSTllTIED+5VUcbJpmep2Pa28tIvleVCxXQBXpx5Bw +pz3AJIU/Og4y8xZfspeUON9qvSHAwLGO6T9QAslaciJA/mK2vNzHLgaTSZtXRSzv +MIsmpfEHoE8HsgUk/YLCheSNTZRkKgCWySMBnNaY0HFF86R/HvA+rL97CoFTKX9C +Gpsg/vHReYkRFw== +=4f38 +-END PGP SIGNATURE- Added: dev/spark/v3.4.0-rc4-bin/pyspark-3.4.0.tar.gz.sha512 == --- dev/spark/v3.4.0-rc4-bin/pyspark-3.4.0.tar.gz.sha512 (added) +++ dev/spark
[spark] branch branch-3.4 updated (49cf58e30c7 -> bc1671023c3)
This is an automated email from the ASF dual-hosted git repository. xinrong pushed a change to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git from 49cf58e30c7 [SPARK-42739][BUILD] Ensure release tag to be pushed to release branch add 4000d6884ce Preparing Spark release v3.4.0-rc4 new bc1671023c3 Preparing development version 3.4.1-SNAPSHOT The 1 revision listed above as "new" is entirely new to this repository and will be described in a separate email. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes:
[spark] 01/01: Preparing development version 3.4.1-SNAPSHOT
This is an automated email from the ASF dual-hosted git repository. xinrong pushed a commit to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git commit bc1671023c3360380bbb67ae8fec959efb072996 Author: Xinrong Meng AuthorDate: Fri Mar 10 03:26:54 2023 + Preparing development version 3.4.1-SNAPSHOT --- R/pkg/DESCRIPTION | 2 +- assembly/pom.xml | 2 +- common/kvstore/pom.xml | 2 +- common/network-common/pom.xml | 2 +- common/network-shuffle/pom.xml | 2 +- common/network-yarn/pom.xml| 2 +- common/sketch/pom.xml | 2 +- common/tags/pom.xml| 2 +- common/unsafe/pom.xml | 2 +- connector/avro/pom.xml | 2 +- connector/connect/client/jvm/pom.xml | 2 +- connector/connect/common/pom.xml | 2 +- connector/connect/server/pom.xml | 2 +- connector/docker-integration-tests/pom.xml | 2 +- connector/kafka-0-10-assembly/pom.xml | 2 +- connector/kafka-0-10-sql/pom.xml | 2 +- connector/kafka-0-10-token-provider/pom.xml| 2 +- connector/kafka-0-10/pom.xml | 2 +- connector/kinesis-asl-assembly/pom.xml | 2 +- connector/kinesis-asl/pom.xml | 2 +- connector/protobuf/pom.xml | 2 +- connector/spark-ganglia-lgpl/pom.xml | 2 +- core/pom.xml | 2 +- docs/_config.yml | 6 +++--- examples/pom.xml | 2 +- graphx/pom.xml | 2 +- hadoop-cloud/pom.xml | 2 +- launcher/pom.xml | 2 +- mllib-local/pom.xml| 2 +- mllib/pom.xml | 2 +- pom.xml| 2 +- python/pyspark/version.py | 2 +- repl/pom.xml | 2 +- resource-managers/kubernetes/core/pom.xml | 2 +- resource-managers/kubernetes/integration-tests/pom.xml | 2 +- resource-managers/mesos/pom.xml| 2 +- resource-managers/yarn/pom.xml | 2 +- sql/catalyst/pom.xml | 2 +- sql/core/pom.xml | 2 +- sql/hive-thriftserver/pom.xml | 2 +- sql/hive/pom.xml | 2 +- streaming/pom.xml | 2 +- tools/pom.xml | 2 +- 43 files changed, 45 insertions(+), 45 deletions(-) diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION index 4a32762b34c..fa7028630a8 100644 --- a/R/pkg/DESCRIPTION +++ b/R/pkg/DESCRIPTION @@ -1,6 +1,6 @@ Package: SparkR Type: Package -Version: 3.4.0 +Version: 3.4.1 Title: R Front End for 'Apache Spark' Description: Provides an R Front end for 'Apache Spark' <https://spark.apache.org>. Authors@R: diff --git a/assembly/pom.xml b/assembly/pom.xml index c58da7aa112..b86fee4bceb 100644 --- a/assembly/pom.xml +++ b/assembly/pom.xml @@ -21,7 +21,7 @@ org.apache.spark spark-parent_2.12 -3.4.0 +3.4.1-SNAPSHOT ../pom.xml diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml index 95ea15552da..f9ecfb3d692 100644 --- a/common/kvstore/pom.xml +++ b/common/kvstore/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.4.0 +3.4.1-SNAPSHOT ../../pom.xml diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml index e4d98471bf9..22ee65b7d25 100644 --- a/common/network-common/pom.xml +++ b/common/network-common/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.4.0 +3.4.1-SNAPSHOT ../../pom.xml diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml index 7a6d5aedf65..2c67da81ca4 100644 --- a/common/network-shuffle/pom.xml +++ b/common/network-shuffle/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.4.0 +3.4.1-SNAPSHOT ../../pom.xml diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml index 1c421754083..219682e047d 100644 --- a/common/network-yarn/pom.xml +++ b/common/network-yarn/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.4.0 +3.4.1-SNAPSHOT ../../pom.xml diff --git a/common/sketch/pom.xml b/common/sketch
[spark] 01/01: Preparing Spark release v3.4.0-rc4
This is an automated email from the ASF dual-hosted git repository. xinrong pushed a commit to tag v3.4.0-rc4 in repository https://gitbox.apache.org/repos/asf/spark.git commit 4000d6884ce973eb420e871c8d333431490be763 Author: Xinrong Meng AuthorDate: Fri Mar 10 03:26:48 2023 + Preparing Spark release v3.4.0-rc4 --- R/pkg/DESCRIPTION | 2 +- assembly/pom.xml | 2 +- common/kvstore/pom.xml | 2 +- common/network-common/pom.xml | 2 +- common/network-shuffle/pom.xml | 2 +- common/network-yarn/pom.xml| 2 +- common/sketch/pom.xml | 2 +- common/tags/pom.xml| 2 +- common/unsafe/pom.xml | 2 +- connector/avro/pom.xml | 2 +- connector/connect/client/jvm/pom.xml | 2 +- connector/connect/common/pom.xml | 2 +- connector/connect/server/pom.xml | 2 +- connector/docker-integration-tests/pom.xml | 2 +- connector/kafka-0-10-assembly/pom.xml | 2 +- connector/kafka-0-10-sql/pom.xml | 2 +- connector/kafka-0-10-token-provider/pom.xml| 2 +- connector/kafka-0-10/pom.xml | 2 +- connector/kinesis-asl-assembly/pom.xml | 2 +- connector/kinesis-asl/pom.xml | 2 +- connector/protobuf/pom.xml | 2 +- connector/spark-ganglia-lgpl/pom.xml | 2 +- core/pom.xml | 2 +- docs/_config.yml | 6 +++--- examples/pom.xml | 2 +- graphx/pom.xml | 2 +- hadoop-cloud/pom.xml | 2 +- launcher/pom.xml | 2 +- mllib-local/pom.xml| 2 +- mllib/pom.xml | 2 +- pom.xml| 2 +- python/pyspark/version.py | 2 +- repl/pom.xml | 2 +- resource-managers/kubernetes/core/pom.xml | 2 +- resource-managers/kubernetes/integration-tests/pom.xml | 2 +- resource-managers/mesos/pom.xml| 2 +- resource-managers/yarn/pom.xml | 2 +- sql/catalyst/pom.xml | 2 +- sql/core/pom.xml | 2 +- sql/hive-thriftserver/pom.xml | 2 +- sql/hive/pom.xml | 2 +- streaming/pom.xml | 2 +- tools/pom.xml | 2 +- 43 files changed, 45 insertions(+), 45 deletions(-) diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION index fa7028630a8..4a32762b34c 100644 --- a/R/pkg/DESCRIPTION +++ b/R/pkg/DESCRIPTION @@ -1,6 +1,6 @@ Package: SparkR Type: Package -Version: 3.4.1 +Version: 3.4.0 Title: R Front End for 'Apache Spark' Description: Provides an R Front end for 'Apache Spark' <https://spark.apache.org>. Authors@R: diff --git a/assembly/pom.xml b/assembly/pom.xml index b86fee4bceb..c58da7aa112 100644 --- a/assembly/pom.xml +++ b/assembly/pom.xml @@ -21,7 +21,7 @@ org.apache.spark spark-parent_2.12 -3.4.1-SNAPSHOT +3.4.0 ../pom.xml diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml index f9ecfb3d692..95ea15552da 100644 --- a/common/kvstore/pom.xml +++ b/common/kvstore/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.4.1-SNAPSHOT +3.4.0 ../../pom.xml diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml index 22ee65b7d25..e4d98471bf9 100644 --- a/common/network-common/pom.xml +++ b/common/network-common/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.4.1-SNAPSHOT +3.4.0 ../../pom.xml diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml index 2c67da81ca4..7a6d5aedf65 100644 --- a/common/network-shuffle/pom.xml +++ b/common/network-shuffle/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.4.1-SNAPSHOT +3.4.0 ../../pom.xml diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml index 219682e047d..1c421754083 100644 --- a/common/network-yarn/pom.xml +++ b/common/network-yarn/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.4.1-SNAPSHOT +3.4.0 ../../pom.xml diff --git a/common/sketch/pom.xml b/common/sketch/pom.xml inde
[spark] tag v3.4.0-rc4 created (now 4000d6884ce)
This is an automated email from the ASF dual-hosted git repository. xinrong pushed a change to tag v3.4.0-rc4 in repository https://gitbox.apache.org/repos/asf/spark.git at 4000d6884ce (commit) This tag includes the following new commits: new 4000d6884ce Preparing Spark release v3.4.0-rc4 The 1 revision listed above as "new" is entirely new to this repository and will be described in a separate email. The revisions listed as "add" were already present in the repository and have only been added to this reference.
[spark] branch branch-3.4 updated: [SPARK-42739][BUILD] Ensure release tag to be pushed to release branch
This is an automated email from the ASF dual-hosted git repository. xinrong pushed a commit to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.4 by this push: new 49cf58e30c7 [SPARK-42739][BUILD] Ensure release tag to be pushed to release branch 49cf58e30c7 is described below commit 49cf58e30c79734af4a30787a0220aeba69839c5 Author: Xinrong Meng AuthorDate: Fri Mar 10 11:04:34 2023 +0800 [SPARK-42739][BUILD] Ensure release tag to be pushed to release branch ### What changes were proposed in this pull request? In the release script, add a check to ensure the release tag is pushed to the release branch. ### Why are the changes needed? To ensure the success of an RC cut; otherwise, release conductors have to check this manually. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Manual test. ``` ~/spark [_d_branch] $ git commit -am '_d_commmit' ... ~/spark [_d_branch] $ git tag '_d_tag' ~/spark [_d_branch] $ git push origin _d_tag ~/spark [_d_branch] $ git branch -r --contains tags/_d_tag | grep origin ~/spark [_d_branch] $ echo $? 1 ~/spark [_d_branch] $ git push origin HEAD:_d_branch ... ~/spark [_d_branch] $ git branch -r --contains tags/_d_tag | grep origin origin/_d_branch ~/spark [_d_branch] $ echo $? 0 ``` Closes #40357 from xinrong-meng/chk_release. Authored-by: Xinrong Meng Signed-off-by: Xinrong Meng (cherry picked from commit 785188dd8b5e74510c29edbff5b9991d88855e43) Signed-off-by: Xinrong Meng --- dev/create-release/release-tag.sh | 6 ++ 1 file changed, 6 insertions(+) diff --git a/dev/create-release/release-tag.sh b/dev/create-release/release-tag.sh index 255bda37ad8..fa701dd74b2 100755 --- a/dev/create-release/release-tag.sh +++ b/dev/create-release/release-tag.sh @@ -122,6 +122,12 @@ if ! is_dry_run; then git push origin $RELEASE_TAG if [[ $RELEASE_VERSION != *"preview"* ]]; then git push origin HEAD:$GIT_BRANCH +if git branch -r --contains tags/$RELEASE_TAG | grep origin; then + echo "Pushed $RELEASE_TAG to $GIT_BRANCH." +else + echo "Failed to push $RELEASE_TAG to $GIT_BRANCH. Please start over." + exit 1 +fi else echo "It's preview release. We only push $RELEASE_TAG to remote." fi
[spark] branch master updated: [SPARK-42739][BUILD] Ensure release tag to be pushed to release branch
This is an automated email from the ASF dual-hosted git repository. xinrong pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 785188dd8b5 [SPARK-42739][BUILD] Ensure release tag to be pushed to release branch 785188dd8b5 is described below commit 785188dd8b5e74510c29edbff5b9991d88855e43 Author: Xinrong Meng AuthorDate: Fri Mar 10 11:04:34 2023 +0800 [SPARK-42739][BUILD] Ensure release tag to be pushed to release branch ### What changes were proposed in this pull request? In the release script, add a check to ensure the release tag is pushed to the release branch. ### Why are the changes needed? To ensure the success of an RC cut; otherwise, release conductors have to check this manually. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Manual test. ``` ~/spark [_d_branch] $ git commit -am '_d_commmit' ... ~/spark [_d_branch] $ git tag '_d_tag' ~/spark [_d_branch] $ git push origin _d_tag ~/spark [_d_branch] $ git branch -r --contains tags/_d_tag | grep origin ~/spark [_d_branch] $ echo $? 1 ~/spark [_d_branch] $ git push origin HEAD:_d_branch ... ~/spark [_d_branch] $ git branch -r --contains tags/_d_tag | grep origin origin/_d_branch ~/spark [_d_branch] $ echo $? 0 ``` Closes #40357 from xinrong-meng/chk_release. Authored-by: Xinrong Meng Signed-off-by: Xinrong Meng --- dev/create-release/release-tag.sh | 6 ++ 1 file changed, 6 insertions(+) diff --git a/dev/create-release/release-tag.sh b/dev/create-release/release-tag.sh index 255bda37ad8..fa701dd74b2 100755 --- a/dev/create-release/release-tag.sh +++ b/dev/create-release/release-tag.sh @@ -122,6 +122,12 @@ if ! is_dry_run; then git push origin $RELEASE_TAG if [[ $RELEASE_VERSION != *"preview"* ]]; then git push origin HEAD:$GIT_BRANCH +if git branch -r --contains tags/$RELEASE_TAG | grep origin; then + echo "Pushed $RELEASE_TAG to $GIT_BRANCH." +else + echo "Failed to push $RELEASE_TAG to $GIT_BRANCH. Please start over." + exit 1 +fi else echo "It's preview release. We only push $RELEASE_TAG to remote." fi
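The added shell check is the whole safeguard: a tag counts as safely on the release branch only when `git branch -r --contains tags/$RELEASE_TAG` lists at least one origin/* branch. A standalone Python sketch of the same check follows, assuming `git` is on PATH and the working directory is a clone with an `origin` remote.
```python
# Rough Python equivalent of the shell check added above: verify that a release
# tag is reachable from some remote branch before declaring the push successful.
import subprocess
import sys

def tag_on_remote_branch(tag: str) -> bool:
    out = subprocess.run(
        ["git", "branch", "-r", "--contains", f"tags/{tag}"],
        capture_output=True, text=True, check=True,
    ).stdout
    # Like piping through `grep origin`: succeed only if at least one
    # origin/* branch contains the tag.
    return any(line.strip().startswith("origin/") for line in out.splitlines())

if __name__ == "__main__":
    tag = sys.argv[1] if len(sys.argv) > 1 else "v3.4.0-rc4"
    if tag_on_remote_branch(tag):
        print(f"Pushed {tag} to a remote branch.")
    else:
        sys.exit(f"Failed to find {tag} on any remote branch. Please start over.")
```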
svn commit: r60500 - in /dev/spark/v3.4.0-rc3-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/R/articles/ _site/api/R/deps/ _site/api/R/deps/bootstrap-5.2.2/ _site/api/R/deps/jquery-3.6.0/ _site/api
Author: xinrong Date: Thu Mar 9 07:54:14 2023 New Revision: 60500 Log: Apache Spark v3.4.0-rc3 docs [This commit notification would consist of 2807 parts, which exceeds the limit of 50, so it was shortened to this summary.]
svn commit: r60498 - /dev/spark/v3.4.0-rc3-bin/
Author: xinrong Date: Thu Mar 9 07:11:38 2023 New Revision: 60498 Log: Apache Spark v3.4.0-rc3 Added: dev/spark/v3.4.0-rc3-bin/ dev/spark/v3.4.0-rc3-bin/SparkR_3.4.0.tar.gz (with props) dev/spark/v3.4.0-rc3-bin/SparkR_3.4.0.tar.gz.asc dev/spark/v3.4.0-rc3-bin/SparkR_3.4.0.tar.gz.sha512 dev/spark/v3.4.0-rc3-bin/pyspark-3.4.0.tar.gz (with props) dev/spark/v3.4.0-rc3-bin/pyspark-3.4.0.tar.gz.asc dev/spark/v3.4.0-rc3-bin/pyspark-3.4.0.tar.gz.sha512 dev/spark/v3.4.0-rc3-bin/spark-3.4.0-bin-hadoop3-scala2.13.tgz (with props) dev/spark/v3.4.0-rc3-bin/spark-3.4.0-bin-hadoop3-scala2.13.tgz.asc dev/spark/v3.4.0-rc3-bin/spark-3.4.0-bin-hadoop3-scala2.13.tgz.sha512 dev/spark/v3.4.0-rc3-bin/spark-3.4.0-bin-hadoop3.tgz (with props) dev/spark/v3.4.0-rc3-bin/spark-3.4.0-bin-hadoop3.tgz.asc dev/spark/v3.4.0-rc3-bin/spark-3.4.0-bin-hadoop3.tgz.sha512 dev/spark/v3.4.0-rc3-bin/spark-3.4.0-bin-without-hadoop.tgz (with props) dev/spark/v3.4.0-rc3-bin/spark-3.4.0-bin-without-hadoop.tgz.asc dev/spark/v3.4.0-rc3-bin/spark-3.4.0-bin-without-hadoop.tgz.sha512 dev/spark/v3.4.0-rc3-bin/spark-3.4.0.tgz (with props) dev/spark/v3.4.0-rc3-bin/spark-3.4.0.tgz.asc dev/spark/v3.4.0-rc3-bin/spark-3.4.0.tgz.sha512 Added: dev/spark/v3.4.0-rc3-bin/SparkR_3.4.0.tar.gz == Binary file - no diff available. Propchange: dev/spark/v3.4.0-rc3-bin/SparkR_3.4.0.tar.gz -- svn:mime-type = application/octet-stream Added: dev/spark/v3.4.0-rc3-bin/SparkR_3.4.0.tar.gz.asc == --- dev/spark/v3.4.0-rc3-bin/SparkR_3.4.0.tar.gz.asc (added) +++ dev/spark/v3.4.0-rc3-bin/SparkR_3.4.0.tar.gz.asc Thu Mar 9 07:11:38 2023 @@ -0,0 +1,17 @@ +-BEGIN PGP SIGNATURE- + +iQJHBAABCgAxFiEEzGiz0W/jOnZnBRYLp+V5CMek4bEFAmQJhjwTHHhpbnJvbmdA +YXBhY2hlLm9yZwAKCRCn5XkIx6ThsRdDEACd98Pk0bSFtKVHER3hjis2R2cg1pgG +gWiqBZArn1GiB6ck0KHglMklJTFFsw2q9/mro42uVhj0b0hJYcTb2hBO+7vyEYeU +a+YGhik6FXaQQBL1+oB5aTn2FcnNi7no1Qa+x4opkG7d1giapzQe/oZK1D7RNiYZ +FAdoDhsUTYCeWDVXbRAcEMca49ltsZDPe45XRHwSgXT45hi6s9oRd78G6v2srbMb ++g7ce4KzAhupZrb5wCnP1MmiWWG1gnfcG0n11LDsiAhYPzzDgW/S4urcqIhWu0+4 +uUSrL6es4mprt1SMybBbmyGrHLuXjdmbBy5XHWy576GoCANdJRffImtmbXFFqp5q +uau5MDCMFcQwp8pOGjTIDYL4q0p9Kpx3mQ2ykQxWiWg/TgVBQ2leadya8yUV9zZ9 +Y6vuRf9R3iYcXTp3B5XlOWtzjYBICa2XQlizOV3U35xybhSFQHLdUSdBBPMLFsDS +YxYw1+dm8SjGfHhtsTOsk0ZhgSNgpDC8PBP6UUlz/8qRy4UdjQRrVgkqFmIFcLZs +CPdX5XlH32PQYtN55qGc6AZECoUpbpigGZetvKqdD5SWyf8maRZZsD+XdR7BT9rk +LLQTJKak3VQRAn80ONx+JxgzH3B5uV1ldN22vr5nLECpJZDbGjC6etystZDujEYh +szr47LujCxLTNw== +=l4pQ +-END PGP SIGNATURE- Added: dev/spark/v3.4.0-rc3-bin/SparkR_3.4.0.tar.gz.sha512 == --- dev/spark/v3.4.0-rc3-bin/SparkR_3.4.0.tar.gz.sha512 (added) +++ dev/spark/v3.4.0-rc3-bin/SparkR_3.4.0.tar.gz.sha512 Thu Mar 9 07:11:38 2023 @@ -0,0 +1 @@ +4703ffdbf82aaf5b30b6afe680a2b21ca15c957863c3648e7e5f120663506fc9e633727a6b7809f7cff7763a9f6227902f6d83fac7c87d3791234afef147cfc3 SparkR_3.4.0.tar.gz Added: dev/spark/v3.4.0-rc3-bin/pyspark-3.4.0.tar.gz == Binary file - no diff available. 
Propchange: dev/spark/v3.4.0-rc3-bin/pyspark-3.4.0.tar.gz -- svn:mime-type = application/octet-stream Added: dev/spark/v3.4.0-rc3-bin/pyspark-3.4.0.tar.gz.asc == --- dev/spark/v3.4.0-rc3-bin/pyspark-3.4.0.tar.gz.asc (added) +++ dev/spark/v3.4.0-rc3-bin/pyspark-3.4.0.tar.gz.asc Thu Mar 9 07:11:38 2023 @@ -0,0 +1,17 @@ +-BEGIN PGP SIGNATURE- + +iQJHBAABCgAxFiEEzGiz0W/jOnZnBRYLp+V5CMek4bEFAmQJhj4THHhpbnJvbmdA +YXBhY2hlLm9yZwAKCRCn5XkIx6ThsaMFD/0VbikHk10VpDiRp7RVhquRXR/qHkiK +ioI02DrZJsZiRElV69Bfxvb1HQSKJhE9xXC+GkS7N+s0neNMXBpYsSxigRICG+Vi +nPJifZVCNzpckkD5t8t+07X5eTRR7VoRPsHkaYSNKxXiMfXYbOpBOLcP/cvrdPSi +nXsOnLm3dhxU7kMS+Qy4jbCzQN1fb4XPagxdvPji/aKo6LBw/YiqWHPhHcHlW89h +cGRAQpN1VjfNkO1zfGxV/h5kD8L/my0zsVMOxtF/r6Qc7FZGBilfMuw8d+8WSVAr +kRx+s2kB8vuH/undWoRSwpItqv0/gcyFCCvMmLQlbEA0Ku/ldE88XESIuI25uTcC +tVJFC01Gauh7KlkI4hzsuwlhcDH/geLE1DS59fKC5UMqEYvaKQyQZFzyX0/eFIIS +8KRZo3B5NUfEXE3fMDOGE8FgJ76QPQ3HO2tB9f+ICeu1/1RioqgucZ7jcKfFIx/J +FzZ7FkNuLSl3CEnH5BlqdoaCCdmOsZVqcPgaZaGUncgK6ygBWEIEK/I6pE9Sye+Y +ncBM76ZJf3NsE4Kzdw/v0NCrLaTdIMIK3W3fvVY94IPdk2EY6MuEnGDqG1bn88u4 +zYfP118WS4KtN6fSkczHGf+7+LQIiWrovIb+cQP+TXKeCinRbK1/I6pBWnn4/0u1 +DApXYisgegSYPg== +=ykwM +-END PGP SIGNATURE- Added: dev/spark/v3.4.0-rc3-bin/pyspark-3.4.0.tar.gz.sha512 == --- dev/spark/v3.4.0-rc3-bin/pyspark-3.4.0.tar.gz.sha512 (added) +++ dev/spark
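Each artifact above is published with a detached `.asc` signature and a `.sha512` checksum. The sketch below covers the checksum half of verification, assuming the sidecar file holds a hex digest followed by the file name, as in the `SparkR_3.4.0.tar.gz.sha512` entry shown.
```python
# Illustrative check of a downloaded artifact against its .sha512 sidecar.
# Assumes the sidecar format "<hex digest> <filename>", matching the entries
# shown in this commit.
import hashlib
from pathlib import Path

def sha512_matches(artifact: str) -> bool:
    expected = Path(artifact + ".sha512").read_text().split()[0].lower()
    h = hashlib.sha512()
    with open(artifact, "rb") as f:
        # Hash in 1 MiB chunks so large tarballs need not fit in memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest() == expected

if __name__ == "__main__":
    print(sha512_matches("pyspark-3.4.0.tar.gz"))
```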
[spark] 01/01: Preparing Spark release v3.4.0-rc3
This is an automated email from the ASF dual-hosted git repository. xinrong pushed a commit to tag v3.4.0-rc3 in repository https://gitbox.apache.org/repos/asf/spark.git commit b9be9ce15a82b18cca080ee365d308c0820a29a9 Author: Xinrong Meng AuthorDate: Thu Mar 9 05:34:00 2023 + Preparing Spark release v3.4.0-rc3 --- R/pkg/DESCRIPTION | 2 +- assembly/pom.xml | 2 +- common/kvstore/pom.xml | 2 +- common/network-common/pom.xml | 2 +- common/network-shuffle/pom.xml | 2 +- common/network-yarn/pom.xml| 2 +- common/sketch/pom.xml | 2 +- common/tags/pom.xml| 2 +- common/unsafe/pom.xml | 2 +- connector/avro/pom.xml | 2 +- connector/connect/client/jvm/pom.xml | 2 +- connector/connect/common/pom.xml | 2 +- connector/connect/server/pom.xml | 2 +- connector/docker-integration-tests/pom.xml | 2 +- connector/kafka-0-10-assembly/pom.xml | 2 +- connector/kafka-0-10-sql/pom.xml | 2 +- connector/kafka-0-10-token-provider/pom.xml| 2 +- connector/kafka-0-10/pom.xml | 2 +- connector/kinesis-asl-assembly/pom.xml | 2 +- connector/kinesis-asl/pom.xml | 2 +- connector/protobuf/pom.xml | 2 +- connector/spark-ganglia-lgpl/pom.xml | 2 +- core/pom.xml | 2 +- docs/_config.yml | 6 +++--- examples/pom.xml | 2 +- graphx/pom.xml | 2 +- hadoop-cloud/pom.xml | 2 +- launcher/pom.xml | 2 +- mllib-local/pom.xml| 2 +- mllib/pom.xml | 2 +- pom.xml| 2 +- python/pyspark/version.py | 2 +- repl/pom.xml | 2 +- resource-managers/kubernetes/core/pom.xml | 2 +- resource-managers/kubernetes/integration-tests/pom.xml | 2 +- resource-managers/mesos/pom.xml| 2 +- resource-managers/yarn/pom.xml | 2 +- sql/catalyst/pom.xml | 2 +- sql/core/pom.xml | 2 +- sql/hive-thriftserver/pom.xml | 2 +- sql/hive/pom.xml | 2 +- streaming/pom.xml | 2 +- tools/pom.xml | 2 +- 43 files changed, 45 insertions(+), 45 deletions(-) diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION index fa7028630a8..4a32762b34c 100644 --- a/R/pkg/DESCRIPTION +++ b/R/pkg/DESCRIPTION @@ -1,6 +1,6 @@ Package: SparkR Type: Package -Version: 3.4.1 +Version: 3.4.0 Title: R Front End for 'Apache Spark' Description: Provides an R Front end for 'Apache Spark' <https://spark.apache.org>. Authors@R: diff --git a/assembly/pom.xml b/assembly/pom.xml index b86fee4bceb..c58da7aa112 100644 --- a/assembly/pom.xml +++ b/assembly/pom.xml @@ -21,7 +21,7 @@ org.apache.spark spark-parent_2.12 -3.4.1-SNAPSHOT +3.4.0 ../pom.xml diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml index f9ecfb3d692..95ea15552da 100644 --- a/common/kvstore/pom.xml +++ b/common/kvstore/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.4.1-SNAPSHOT +3.4.0 ../../pom.xml diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml index 22ee65b7d25..e4d98471bf9 100644 --- a/common/network-common/pom.xml +++ b/common/network-common/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.4.1-SNAPSHOT +3.4.0 ../../pom.xml diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml index 2c67da81ca4..7a6d5aedf65 100644 --- a/common/network-shuffle/pom.xml +++ b/common/network-shuffle/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.4.1-SNAPSHOT +3.4.0 ../../pom.xml diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml index 219682e047d..1c421754083 100644 --- a/common/network-yarn/pom.xml +++ b/common/network-yarn/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.4.1-SNAPSHOT +3.4.0 ../../pom.xml diff --git a/common/sketch/pom.xml b/common/sketch/pom.xml index 22ce7
[spark] tag v3.4.0-rc3 created (now b9be9ce15a8)
This is an automated email from the ASF dual-hosted git repository. xinrong pushed a change to tag v3.4.0-rc3 in repository https://gitbox.apache.org/repos/asf/spark.git at b9be9ce15a8 (commit) This tag includes the following new commits: new b9be9ce15a8 Preparing Spark release v3.4.0-rc3 The 1 revision listed above as "new" is entirely new to this repository and will be described in a separate email. The revisions listed as "add" were already present in the repository and have only been added to this reference.
[spark] branch branch-3.4 updated: [SPARK-42643][CONNECT][PYTHON] Register Java (aggregate) user-defined functions
This is an automated email from the ASF dual-hosted git repository. xinrong pushed a commit to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.4 by this push: new 0e959a53908 [SPARK-42643][CONNECT][PYTHON] Register Java (aggregate) user-defined functions 0e959a53908 is described below commit 0e959a539086cda5dd911477ee5568ab540a2249 Author: Xinrong Meng AuthorDate: Wed Mar 8 14:23:18 2023 +0800 [SPARK-42643][CONNECT][PYTHON] Register Java (aggregate) user-defined functions ### What changes were proposed in this pull request? Implement `spark.udf.registerJavaFunction` and `spark.udf.registerJavaUDAF`. A new proto `JavaUDF` is introduced. ### Why are the changes needed? Parity with vanilla PySpark. ### Does this PR introduce _any_ user-facing change? Yes. `spark.udf.registerJavaFunction` and `spark.udf.registerJavaUDAF` are supported now. ### How was this patch tested? Parity unit tests. Closes #40244 from xinrong-meng/registerJava. Authored-by: Xinrong Meng Signed-off-by: Xinrong Meng (cherry picked from commit 92aa08786feaf473330a863d19b0c902b721789e) Signed-off-by: Xinrong Meng --- .../main/protobuf/spark/connect/expressions.proto | 13 - .../sql/connect/planner/SparkConnectPlanner.scala | 21 python/pyspark/sql/connect/client.py | 39 ++- python/pyspark/sql/connect/expressions.py | 44 +++-- .../pyspark/sql/connect/proto/expressions_pb2.py | 26 +++--- .../pyspark/sql/connect/proto/expressions_pb2.pyi | 56 +- python/pyspark/sql/connect/udf.py | 17 ++- .../pyspark/sql/tests/connect/test_parity_udf.py | 30 +++- python/pyspark/sql/udf.py | 6 +++ 9 files changed, 212 insertions(+), 40 deletions(-) diff --git a/connector/connect/common/src/main/protobuf/spark/connect/expressions.proto b/connector/connect/common/src/main/protobuf/spark/connect/expressions.proto index 6eb769ad27e..0aee3ca13b9 100644 --- a/connector/connect/common/src/main/protobuf/spark/connect/expressions.proto +++ b/connector/connect/common/src/main/protobuf/spark/connect/expressions.proto @@ -312,7 +312,7 @@ message Expression { message CommonInlineUserDefinedFunction { // (Required) Name of the user-defined function. string function_name = 1; - // (Required) Indicate if the user-defined function is deterministic. + // (Optional) Indicate if the user-defined function is deterministic. bool deterministic = 2; // (Optional) Function arguments. Empty arguments are allowed. 
repeated Expression arguments = 3; @@ -320,6 +320,7 @@ message CommonInlineUserDefinedFunction { oneof function { PythonUDF python_udf = 4; ScalarScalaUDF scalar_scala_udf = 5; +JavaUDF java_udf = 6; } } @@ -345,3 +346,13 @@ message ScalarScalaUDF { bool nullable = 4; } +message JavaUDF { + // (Required) Fully qualified name of Java class + string class_name = 1; + + // (Optional) Output type of the Java UDF + optional string output_type = 2; + + // (Required) Indicate if the Java user-defined function is an aggregate function + bool aggregate = 3; +} diff --git a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala index d7b3c057d92..3b9443f4e3c 100644 --- a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala +++ b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala @@ -1552,6 +1552,8 @@ class SparkConnectPlanner(val session: SparkSession) { fun.getFunctionCase match { case proto.CommonInlineUserDefinedFunction.FunctionCase.PYTHON_UDF => handleRegisterPythonUDF(fun) + case proto.CommonInlineUserDefinedFunction.FunctionCase.JAVA_UDF => +handleRegisterJavaUDF(fun) case _ => throw InvalidPlanInput( s"Function with ID: ${fun.getFunctionCase.getNumber} is not supported") @@ -1577,6 +1579,25 @@ class SparkConnectPlanner(val session: SparkSession) { session.udf.registerPython(fun.getFunctionName, udpf) } + private def handleRegisterJavaUDF(fun: proto.CommonInlineUserDefinedFunction): Unit = { +val udf = fun.getJavaUdf +val dataType = + if (udf.hasOutputType) { +DataType.parseTypeWithFallback( + schema = udf.getOutputType, + parser = DataType.fromDDL, + fallbackParser = DataType.fromJson) match { + case s: DataType => s + case other => throw InvalidPlanInput(s"Invalid return type $other") +} + } else null +if (udf.getAggregate) {
[spark] branch master updated: [SPARK-42643][CONNECT][PYTHON] Register Java (aggregate) user-defined functions
This is an automated email from the ASF dual-hosted git repository. xinrong pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 92aa08786fe [SPARK-42643][CONNECT][PYTHON] Register Java (aggregate) user-defined functions 92aa08786fe is described below commit 92aa08786feaf473330a863d19b0c902b721789e Author: Xinrong Meng AuthorDate: Wed Mar 8 14:23:18 2023 +0800 [SPARK-42643][CONNECT][PYTHON] Register Java (aggregate) user-defined functions ### What changes were proposed in this pull request? Implement `spark.udf.registerJavaFunction` and `spark.udf.registerJavaUDAF`. A new proto `JavaUDF` is introduced. ### Why are the changes needed? Parity with vanilla PySpark. ### Does this PR introduce _any_ user-facing change? Yes. `spark.udf.registerJavaFunction` and `spark.udf.registerJavaUDAF` are supported now. ### How was this patch tested? Parity unit tests. Closes #40244 from xinrong-meng/registerJava. Authored-by: Xinrong Meng Signed-off-by: Xinrong Meng --- .../main/protobuf/spark/connect/expressions.proto | 13 - .../sql/connect/planner/SparkConnectPlanner.scala | 21 python/pyspark/sql/connect/client.py | 39 ++- python/pyspark/sql/connect/expressions.py | 44 +++-- .../pyspark/sql/connect/proto/expressions_pb2.py | 26 +++--- .../pyspark/sql/connect/proto/expressions_pb2.pyi | 56 +- python/pyspark/sql/connect/udf.py | 17 ++- .../pyspark/sql/tests/connect/test_parity_udf.py | 30 +++- python/pyspark/sql/udf.py | 6 +++ 9 files changed, 212 insertions(+), 40 deletions(-) diff --git a/connector/connect/common/src/main/protobuf/spark/connect/expressions.proto b/connector/connect/common/src/main/protobuf/spark/connect/expressions.proto index 6eb769ad27e..0aee3ca13b9 100644 --- a/connector/connect/common/src/main/protobuf/spark/connect/expressions.proto +++ b/connector/connect/common/src/main/protobuf/spark/connect/expressions.proto @@ -312,7 +312,7 @@ message Expression { message CommonInlineUserDefinedFunction { // (Required) Name of the user-defined function. string function_name = 1; - // (Required) Indicate if the user-defined function is deterministic. + // (Optional) Indicate if the user-defined function is deterministic. bool deterministic = 2; // (Optional) Function arguments. Empty arguments are allowed. 
repeated Expression arguments = 3; @@ -320,6 +320,7 @@ message CommonInlineUserDefinedFunction { oneof function { PythonUDF python_udf = 4; ScalarScalaUDF scalar_scala_udf = 5; +JavaUDF java_udf = 6; } } @@ -345,3 +346,13 @@ message ScalarScalaUDF { bool nullable = 4; } +message JavaUDF { + // (Required) Fully qualified name of Java class + string class_name = 1; + + // (Optional) Output type of the Java UDF + optional string output_type = 2; + + // (Required) Indicate if the Java user-defined function is an aggregate function + bool aggregate = 3; +} diff --git a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala index d7b3c057d92..3b9443f4e3c 100644 --- a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala +++ b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala @@ -1552,6 +1552,8 @@ class SparkConnectPlanner(val session: SparkSession) { fun.getFunctionCase match { case proto.CommonInlineUserDefinedFunction.FunctionCase.PYTHON_UDF => handleRegisterPythonUDF(fun) + case proto.CommonInlineUserDefinedFunction.FunctionCase.JAVA_UDF => +handleRegisterJavaUDF(fun) case _ => throw InvalidPlanInput( s"Function with ID: ${fun.getFunctionCase.getNumber} is not supported") @@ -1577,6 +1579,25 @@ class SparkConnectPlanner(val session: SparkSession) { session.udf.registerPython(fun.getFunctionName, udpf) } + private def handleRegisterJavaUDF(fun: proto.CommonInlineUserDefinedFunction): Unit = { +val udf = fun.getJavaUdf +val dataType = + if (udf.hasOutputType) { +DataType.parseTypeWithFallback( + schema = udf.getOutputType, + parser = DataType.fromDDL, + fallbackParser = DataType.fromJson) match { + case s: DataType => s + case other => throw InvalidPlanInput(s"Invalid return type $other") +} + } else null +if (udf.getAggregate) { + session.udf.registerJavaUDAF(fun.getFunctionName, udf.getClassName) +} else { + session.udf.registerJava(fu
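To make the newly supported calls concrete, a client-side session might look like the hedged sketch below. The `com.example.*` class names are placeholders and the remote URL is illustrative; both depend on your deployment, and the UDF classes must already be on the server's classpath.
```python
# Hypothetical usage of the calls this commit enables from a Spark Connect
# Python client. com.example.StrLen and com.example.MyAverage are placeholder
# Java classes, assumed to be on the server's classpath.
from pyspark.sql import SparkSession

spark = SparkSession.builder.remote("sc://localhost").getOrCreate()

# Scalar Java UDF: the optional return type maps to JavaUDF.output_type above.
spark.udf.registerJavaFunction("strLen", "com.example.StrLen", "integer")

# Java UDAF: registered with aggregate = true in the JavaUDF proto.
spark.udf.registerJavaUDAF("myAverage", "com.example.MyAverage")

spark.sql("SELECT strLen('spark connect')").show()
```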
svn commit: r60407 - in /dev/spark/v3.4.0-rc2-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/R/articles/ _site/api/R/deps/ _site/api/R/deps/bootstrap-5.2.2/ _site/api/R/deps/jquery-3.6.0/ _site/api
Author: xinrong Date: Thu Mar 2 09:35:10 2023 New Revision: 60407 Log: Apache Spark v3.4.0-rc2 docs [This commit notification would consist of 2806 parts, which exceeds the limit of 50, so it was shortened to this summary.]
svn commit: r60406 - /dev/spark/v3.4.0-rc2-bin/
Author: xinrong Date: Thu Mar 2 07:42:27 2023 New Revision: 60406 Log: Apache Spark v3.4.0-rc2 Added: dev/spark/v3.4.0-rc2-bin/ dev/spark/v3.4.0-rc2-bin/SparkR_3.4.0.tar.gz (with props) dev/spark/v3.4.0-rc2-bin/SparkR_3.4.0.tar.gz.asc dev/spark/v3.4.0-rc2-bin/SparkR_3.4.0.tar.gz.sha512 dev/spark/v3.4.0-rc2-bin/pyspark-3.4.0.tar.gz (with props) dev/spark/v3.4.0-rc2-bin/pyspark-3.4.0.tar.gz.asc dev/spark/v3.4.0-rc2-bin/pyspark-3.4.0.tar.gz.sha512 dev/spark/v3.4.0-rc2-bin/spark-3.4.0-bin-hadoop3-scala2.13.tgz (with props) dev/spark/v3.4.0-rc2-bin/spark-3.4.0-bin-hadoop3-scala2.13.tgz.asc dev/spark/v3.4.0-rc2-bin/spark-3.4.0-bin-hadoop3-scala2.13.tgz.sha512 dev/spark/v3.4.0-rc2-bin/spark-3.4.0-bin-hadoop3.tgz (with props) dev/spark/v3.4.0-rc2-bin/spark-3.4.0-bin-hadoop3.tgz.asc dev/spark/v3.4.0-rc2-bin/spark-3.4.0-bin-hadoop3.tgz.sha512 dev/spark/v3.4.0-rc2-bin/spark-3.4.0-bin-without-hadoop.tgz (with props) dev/spark/v3.4.0-rc2-bin/spark-3.4.0-bin-without-hadoop.tgz.asc dev/spark/v3.4.0-rc2-bin/spark-3.4.0-bin-without-hadoop.tgz.sha512 dev/spark/v3.4.0-rc2-bin/spark-3.4.0.tgz (with props) dev/spark/v3.4.0-rc2-bin/spark-3.4.0.tgz.asc dev/spark/v3.4.0-rc2-bin/spark-3.4.0.tgz.sha512 Added: dev/spark/v3.4.0-rc2-bin/SparkR_3.4.0.tar.gz == Binary file - no diff available. Propchange: dev/spark/v3.4.0-rc2-bin/SparkR_3.4.0.tar.gz -- svn:mime-type = application/octet-stream Added: dev/spark/v3.4.0-rc2-bin/SparkR_3.4.0.tar.gz.asc == --- dev/spark/v3.4.0-rc2-bin/SparkR_3.4.0.tar.gz.asc (added) +++ dev/spark/v3.4.0-rc2-bin/SparkR_3.4.0.tar.gz.asc Thu Mar 2 07:42:27 2023 @@ -0,0 +1,17 @@ +-BEGIN PGP SIGNATURE- + +iQJHBAABCgAxFiEEzGiz0W/jOnZnBRYLp+V5CMek4bEFAmQAUvITHHhpbnJvbmdA +YXBhY2hlLm9yZwAKCRCn5XkIx6ThsRPdEACTNy0qOWmOXidbPjyZzJCdr8zkAIZX +UGhVWPrlF0sQR1FzTtPPwwI4sywSC+DNAetcYVXEzOVfNf3I/UgE02183xCQJVfx +EaE+lpCmIwFjY+AcPGwz7fZ+aTxFa2f9wu04G+q9Uaw40Ys/WMmvck/Wg4Ih0nj3 +PbBuftQIy5K1YHJOx6PvkzCpZsmP4njNGrJ+IJU8vpYh35zp8E3jkfbECCvKkTWE +ABWGxpAKjN5npkarbNpZp8Emd6EtrRYaJzDPApjW6GFSQAmZwE0WJj2nKJu4Aszu +fstx27dZ4bvx3bgbfSEmRgTc5VD7glzvWKIWqt0PdkDq1AQdwdFodZfJFqXUccuk +G3yL+RTrggtvDBEjcMh+ym6kOrHmUBgy7SqPfOI5UPO8PQ+KdhE94tqXfhHAl5QS +Okw1XWc2EQzDyeu/j+Kp4yc0tbZRnuqkAzS5yLJVix0z4GBOyRyvTsDLykwEM9h+ +jniFAkWfu+su9JRMfIdaXqak1DgyVZ9bxZOfLIo7lA5U4vYxCZM5TU8ToNDnnOWd +O0pbweQ/W4UdXP6AYEJt2J8wItDiv+xry4jI9JqTEPV5IbrAZjZmJ/RoMzjeh+eA +WwqSEXuWXrUStb9bPfhFnryYmbKGYGG7dRP6HnnaFlevBc6qrNlMPL3xedZsk12b +opcLL5skNQoHuA== +=6ENL +-END PGP SIGNATURE- Added: dev/spark/v3.4.0-rc2-bin/SparkR_3.4.0.tar.gz.sha512 == --- dev/spark/v3.4.0-rc2-bin/SparkR_3.4.0.tar.gz.sha512 (added) +++ dev/spark/v3.4.0-rc2-bin/SparkR_3.4.0.tar.gz.sha512 Thu Mar 2 07:42:27 2023 @@ -0,0 +1 @@ +9f719616f1547d449488957cc74d6dd9080e32096e1178deb0c339be47bb06e158d8b0c7a80f1e53595b34467d5b5b7f23d66643cca1fa9f5e8c7b9687893b59 SparkR_3.4.0.tar.gz Added: dev/spark/v3.4.0-rc2-bin/pyspark-3.4.0.tar.gz == Binary file - no diff available. 
Propchange: dev/spark/v3.4.0-rc2-bin/pyspark-3.4.0.tar.gz -- svn:mime-type = application/octet-stream Added: dev/spark/v3.4.0-rc2-bin/pyspark-3.4.0.tar.gz.asc == --- dev/spark/v3.4.0-rc2-bin/pyspark-3.4.0.tar.gz.asc (added) +++ dev/spark/v3.4.0-rc2-bin/pyspark-3.4.0.tar.gz.asc Thu Mar 2 07:42:27 2023 @@ -0,0 +1,17 @@ +-BEGIN PGP SIGNATURE- + +iQJHBAABCgAxFiEEzGiz0W/jOnZnBRYLp+V5CMek4bEFAmQAUvQTHHhpbnJvbmdA +YXBhY2hlLm9yZwAKCRCn5XkIx6ThsSCWD/9LhMRWQlceBu5wuEwGdwCTSEvQvpCl +zDn1HKCwqXQvzj5YPOllSBolHCQgy1U3S08CeGF8kB+hT/MSozif/+qzMNTFWfz8 +EEyB02XxWjOXO38muJ51/r3WXseoB0L/yMqdipgZAQRT5A5i9xBZqH718a7k6pow +m+/8qD4oMYmnWE9X2TwW47uSCMpKOgZRSALBwx5HAQ6HADHfW3q6Rwdm6yL6vv0J +n/FTMjeKAKwetSYhwDwPCXaTTKaw8h90IWHOykZdv8IoynUO4egKfoeHeOKQ8Dyl +8mlqIWsQi0wdcrfAlKp2HjD001j0iUV8ZfDkZsmReTRNf8Y7yKdFF6BBAW+zPwAw +ILsb0HeP50s36WiON7Ywjy8pXJdOBN+6QiM9CIP7c5D45RNAbPe8ARhDZwuHZTMy +7jzAYnrjDIXlrFGmpFS2I+xk0/ZoI2H6BC8V7t5ZvhJ8Qm7SifAgfOt5G9rlUwu0 +BnCE3INQghRq5mv9aH40aHZPhVUN8woTxUussNXeqds4cAVXdvj7BQJMqZtplj1N +k4bFKvjjtO/GbrbTcNTClqk7CtII4GRQCJWmV7ksvDejavRfDMJn6Bt/ZhHYfDPw +rOXXuMX/HdVgH1E+RhntqnejilGuKNsWf08dZPgQ1kwMd2fnygDMoaUbG769nJqW +JLAkWKLvu+YXFA== +=R11G +-END PGP SIGNATURE- Added: dev/spark/v3.4.0-rc2-bin/pyspark-3.4.0.tar.gz.sha512 == --- dev/spark/v3.4.0-rc2-bin/pyspark-3.4.0.tar.gz.sha512 (added) +++ dev/spark
[spark] branch branch-3.4 updated (4fa4d2fd54c -> aeacf0d0f24)
This is an automated email from the ASF dual-hosted git repository. xinrong pushed a change to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git from 4fa4d2fd54c [SPARK-41823][CONNECT][FOLLOW-UP][TESTS] Disable ANSI mode in ProtoToParsedPlanTestSuite add 759511bb59b Preparing Spark release v3.4.0-rc2 new aeacf0d0f24 Preparing development version 3.4.1-SNAPSHOT The 1 revision listed above as "new" is entirely new to this repository and will be described in a separate email. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes:
[spark] 01/01: Preparing development version 3.4.1-SNAPSHOT
This is an automated email from the ASF dual-hosted git repository. xinrong pushed a commit to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git commit aeacf0d0f24ec509b7bbf318bb71edb1cba8bc36 Author: Xinrong Meng AuthorDate: Thu Mar 2 06:25:37 2023 + Preparing development version 3.4.1-SNAPSHOT --- R/pkg/DESCRIPTION | 2 +- assembly/pom.xml | 2 +- common/kvstore/pom.xml | 2 +- common/network-common/pom.xml | 2 +- common/network-shuffle/pom.xml | 2 +- common/network-yarn/pom.xml| 2 +- common/sketch/pom.xml | 2 +- common/tags/pom.xml| 2 +- common/unsafe/pom.xml | 2 +- connector/avro/pom.xml | 2 +- connector/connect/client/jvm/pom.xml | 2 +- connector/connect/common/pom.xml | 2 +- connector/connect/server/pom.xml | 2 +- connector/docker-integration-tests/pom.xml | 2 +- connector/kafka-0-10-assembly/pom.xml | 2 +- connector/kafka-0-10-sql/pom.xml | 2 +- connector/kafka-0-10-token-provider/pom.xml| 2 +- connector/kafka-0-10/pom.xml | 2 +- connector/kinesis-asl-assembly/pom.xml | 2 +- connector/kinesis-asl/pom.xml | 2 +- connector/protobuf/pom.xml | 2 +- connector/spark-ganglia-lgpl/pom.xml | 2 +- core/pom.xml | 2 +- docs/_config.yml | 6 +++--- examples/pom.xml | 2 +- graphx/pom.xml | 2 +- hadoop-cloud/pom.xml | 2 +- launcher/pom.xml | 2 +- mllib-local/pom.xml| 2 +- mllib/pom.xml | 2 +- pom.xml| 2 +- python/pyspark/version.py | 2 +- repl/pom.xml | 2 +- resource-managers/kubernetes/core/pom.xml | 2 +- resource-managers/kubernetes/integration-tests/pom.xml | 2 +- resource-managers/mesos/pom.xml| 2 +- resource-managers/yarn/pom.xml | 2 +- sql/catalyst/pom.xml | 2 +- sql/core/pom.xml | 2 +- sql/hive-thriftserver/pom.xml | 2 +- sql/hive/pom.xml | 2 +- streaming/pom.xml | 2 +- tools/pom.xml | 2 +- 43 files changed, 45 insertions(+), 45 deletions(-) diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION index 4a32762b34c..fa7028630a8 100644 --- a/R/pkg/DESCRIPTION +++ b/R/pkg/DESCRIPTION @@ -1,6 +1,6 @@ Package: SparkR Type: Package -Version: 3.4.0 +Version: 3.4.1 Title: R Front End for 'Apache Spark' Description: Provides an R Front end for 'Apache Spark' <https://spark.apache.org>. Authors@R: diff --git a/assembly/pom.xml b/assembly/pom.xml index 58dd9ef46e0..a4111eb64d9 100644 --- a/assembly/pom.xml +++ b/assembly/pom.xml @@ -21,7 +21,7 @@ org.apache.spark spark-parent_2.12 -3.4.0 +3.4.1-SNAPSHOT ../pom.xml diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml index 95ea15552da..f9ecfb3d692 100644 --- a/common/kvstore/pom.xml +++ b/common/kvstore/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.4.0 +3.4.1-SNAPSHOT ../../pom.xml diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml index e4d98471bf9..22ee65b7d25 100644 --- a/common/network-common/pom.xml +++ b/common/network-common/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.4.0 +3.4.1-SNAPSHOT ../../pom.xml diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml index 7a6d5aedf65..2c67da81ca4 100644 --- a/common/network-shuffle/pom.xml +++ b/common/network-shuffle/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.4.0 +3.4.1-SNAPSHOT ../../pom.xml diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml index 1c421754083..219682e047d 100644 --- a/common/network-yarn/pom.xml +++ b/common/network-yarn/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.4.0 +3.4.1-SNAPSHOT ../../pom.xml diff --git a/common/sketch/pom.xml b/common/sketch
[spark] tag v3.4.0-rc2 created (now 759511bb59b)
This is an automated email from the ASF dual-hosted git repository. xinrong pushed a change to tag v3.4.0-rc2 in repository https://gitbox.apache.org/repos/asf/spark.git at 759511bb59b (commit) This tag includes the following new commits: new 759511bb59b Preparing Spark release v3.4.0-rc2 The 1 revision listed above as "new" is entirely new to this repository and will be described in a separate email. The revisions listed as "add" were already present in the repository and have only been added to this reference.
[spark] 01/01: Preparing Spark release v3.4.0-rc2
This is an automated email from the ASF dual-hosted git repository. xinrong pushed a commit to tag v3.4.0-rc2 in repository https://gitbox.apache.org/repos/asf/spark.git commit 759511bb59b206ac5ff18f377c239a2f38bf5db6 Author: Xinrong Meng AuthorDate: Thu Mar 2 06:25:32 2023 + Preparing Spark release v3.4.0-rc2 --- R/pkg/DESCRIPTION | 2 +- assembly/pom.xml | 2 +- common/kvstore/pom.xml | 2 +- common/network-common/pom.xml | 2 +- common/network-shuffle/pom.xml | 2 +- common/network-yarn/pom.xml| 2 +- common/sketch/pom.xml | 2 +- common/tags/pom.xml| 2 +- common/unsafe/pom.xml | 2 +- connector/avro/pom.xml | 2 +- connector/connect/client/jvm/pom.xml | 2 +- connector/connect/common/pom.xml | 2 +- connector/connect/server/pom.xml | 2 +- connector/docker-integration-tests/pom.xml | 2 +- connector/kafka-0-10-assembly/pom.xml | 2 +- connector/kafka-0-10-sql/pom.xml | 2 +- connector/kafka-0-10-token-provider/pom.xml| 2 +- connector/kafka-0-10/pom.xml | 2 +- connector/kinesis-asl-assembly/pom.xml | 2 +- connector/kinesis-asl/pom.xml | 2 +- connector/protobuf/pom.xml | 2 +- connector/spark-ganglia-lgpl/pom.xml | 2 +- core/pom.xml | 2 +- docs/_config.yml | 6 +++--- examples/pom.xml | 2 +- graphx/pom.xml | 2 +- hadoop-cloud/pom.xml | 2 +- launcher/pom.xml | 2 +- mllib-local/pom.xml| 2 +- mllib/pom.xml | 2 +- pom.xml| 2 +- python/pyspark/version.py | 2 +- repl/pom.xml | 2 +- resource-managers/kubernetes/core/pom.xml | 2 +- resource-managers/kubernetes/integration-tests/pom.xml | 2 +- resource-managers/mesos/pom.xml| 2 +- resource-managers/yarn/pom.xml | 2 +- sql/catalyst/pom.xml | 2 +- sql/core/pom.xml | 2 +- sql/hive-thriftserver/pom.xml | 2 +- sql/hive/pom.xml | 2 +- streaming/pom.xml | 2 +- tools/pom.xml | 2 +- 43 files changed, 45 insertions(+), 45 deletions(-) diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION index fa7028630a8..4a32762b34c 100644 --- a/R/pkg/DESCRIPTION +++ b/R/pkg/DESCRIPTION @@ -1,6 +1,6 @@ Package: SparkR Type: Package -Version: 3.4.1 +Version: 3.4.0 Title: R Front End for 'Apache Spark' Description: Provides an R Front end for 'Apache Spark' <https://spark.apache.org>. Authors@R: diff --git a/assembly/pom.xml b/assembly/pom.xml index a4111eb64d9..58dd9ef46e0 100644 --- a/assembly/pom.xml +++ b/assembly/pom.xml @@ -21,7 +21,7 @@ org.apache.spark spark-parent_2.12 -3.4.1-SNAPSHOT +3.4.0 ../pom.xml diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml index f9ecfb3d692..95ea15552da 100644 --- a/common/kvstore/pom.xml +++ b/common/kvstore/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.4.1-SNAPSHOT +3.4.0 ../../pom.xml diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml index 22ee65b7d25..e4d98471bf9 100644 --- a/common/network-common/pom.xml +++ b/common/network-common/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.4.1-SNAPSHOT +3.4.0 ../../pom.xml diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml index 2c67da81ca4..7a6d5aedf65 100644 --- a/common/network-shuffle/pom.xml +++ b/common/network-shuffle/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.4.1-SNAPSHOT +3.4.0 ../../pom.xml diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml index 219682e047d..1c421754083 100644 --- a/common/network-yarn/pom.xml +++ b/common/network-yarn/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.4.1-SNAPSHOT +3.4.0 ../../pom.xml diff --git a/common/sketch/pom.xml b/common/sketch/pom.xml index 22ce7
[spark] branch branch-3.4 updated: [SPARK-42510][CONNECT][PYTHON] Implement `DataFrame.mapInPandas`
This is an automated email from the ASF dual-hosted git repository. xinrong pushed a commit to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.4 by this push: new 000895da3f6 [SPARK-42510][CONNECT][PYTHON] Implement `DataFrame.mapInPandas` 000895da3f6 is described below commit 000895da3f6c0d17ccfdfe79c0ca34dfb9fb6e7b Author: Xinrong Meng AuthorDate: Sat Feb 25 07:39:54 2023 +0800 [SPARK-42510][CONNECT][PYTHON] Implement `DataFrame.mapInPandas` ### What changes were proposed in this pull request? Implement `DataFrame.mapInPandas` and enable parity tests against vanilla PySpark. A proto message `FrameMap` is introduced for `mapInPandas` and `mapInArrow` (to be implemented next). ### Why are the changes needed? To reach parity with vanilla PySpark. ### Does this PR introduce _any_ user-facing change? Yes. `DataFrame.mapInPandas` is supported. An example is shown below. ```py >>> df = spark.range(2) >>> def filter_func(iterator): ... for pdf in iterator: ... yield pdf[pdf.id == 1] ... >>> df.mapInPandas(filter_func, df.schema) DataFrame[id: bigint] >>> df.mapInPandas(filter_func, df.schema).show() +---+ | id| +---+ | 1| +---+ ``` ### How was this patch tested? Unit tests. Closes #40104 from xinrong-meng/mapInPandas. Lead-authored-by: Xinrong Meng Co-authored-by: Xinrong Meng Signed-off-by: Xinrong Meng (cherry picked from commit 9abccad1d93a243d7e47e53dcbc85568a460c529) Signed-off-by: Xinrong Meng --- .../main/protobuf/spark/connect/relations.proto| 10 + .../sql/connect/planner/SparkConnectPlanner.scala | 18 +- dev/sparktestsupport/modules.py| 1 + python/pyspark/sql/connect/_typing.py | 8 +- python/pyspark/sql/connect/client.py | 2 +- python/pyspark/sql/connect/dataframe.py| 22 +- python/pyspark/sql/connect/expressions.py | 6 +- python/pyspark/sql/connect/plan.py | 25 ++- python/pyspark/sql/connect/proto/relations_pb2.py | 222 +++-- python/pyspark/sql/connect/proto/relations_pb2.pyi | 36 python/pyspark/sql/connect/types.py| 4 +- python/pyspark/sql/connect/udf.py | 20 +- python/pyspark/sql/pandas/map_ops.py | 3 + .../sql/tests/connect/test_parity_pandas_map.py| 50 + python/pyspark/sql/tests/pandas/test_pandas_map.py | 46 +++-- 15 files changed, 331 insertions(+), 142 deletions(-) diff --git a/connector/connect/common/src/main/protobuf/spark/connect/relations.proto b/connector/connect/common/src/main/protobuf/spark/connect/relations.proto index 29fffd65c75..4d96b6b0c7e 100644 --- a/connector/connect/common/src/main/protobuf/spark/connect/relations.proto +++ b/connector/connect/common/src/main/protobuf/spark/connect/relations.proto @@ -60,6 +60,7 @@ message Relation { Unpivot unpivot = 25; ToSchema to_schema = 26; RepartitionByExpression repartition_by_expression = 27; +FrameMap frame_map = 28; // NA functions NAFill fill_na = 90; @@ -768,3 +769,12 @@ message RepartitionByExpression { // (Optional) number of partitions, must be positive. optional int32 num_partitions = 3; } + +message FrameMap { + // (Required) Input relation for a Frame Map API: mapInPandas, mapInArrow. + Relation input = 1; + + // (Required) Input user-defined function of a Frame Map API.
+ CommonInlineUserDefinedFunction func = 2; +} + diff --git a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala index 268bf02fad9..cc43c1cace3 100644 --- a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala +++ b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala @@ -24,7 +24,7 @@ import com.google.common.collect.{Lists, Maps} import com.google.protobuf.{Any => ProtoAny} import org.apache.spark.TaskContext -import org.apache.spark.api.python.SimplePythonFunction +import org.apache.spark.api.python.{PythonEvalType, SimplePythonFunction} import org.apache.spark.connect.proto import org.apache.spark.sql.{Column, Dataset, Encoders, SparkSession} import org.apache.spark.sql.catalyst.{expressions, AliasIdentifier, FunctionIdentifier} @@ -106,6 +106,8 @@ class SparkConnectPlanner(val session: SparkSession) { case proto.Relation.RelTypeCase.UNPIVOT => transformUnpivot(rel.getUnpivot) case proto.Relation.RelTypeCase.REPARTITION_BY_EXPRESSION => transformRepartitionByExpression(rel.getRepartitionByExpression) +
[spark] branch master updated: [SPARK-42510][CONNECT][PYTHON] Implement `DataFrame.mapInPandas`
This is an automated email from the ASF dual-hosted git repository. xinrong pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 9abccad1d93 [SPARK-42510][CONNECT][PYTHON] Implement `DataFrame.mapInPandas` 9abccad1d93 is described below commit 9abccad1d93a243d7e47e53dcbc85568a460c529 Author: Xinrong Meng AuthorDate: Sat Feb 25 07:39:54 2023 +0800 [SPARK-42510][CONNECT][PYTHON] Implement `DataFrame.mapInPandas` ### What changes were proposed in this pull request? Implement `DataFrame.mapInPandas` and enable parity tests against vanilla PySpark. A proto message `FrameMap` is introduced for `mapInPandas` and `mapInArrow` (to be implemented next). ### Why are the changes needed? To reach parity with vanilla PySpark. ### Does this PR introduce _any_ user-facing change? Yes. `DataFrame.mapInPandas` is supported. An example is shown below. ```py >>> df = spark.range(2) >>> def filter_func(iterator): ... for pdf in iterator: ... yield pdf[pdf.id == 1] ... >>> df.mapInPandas(filter_func, df.schema) DataFrame[id: bigint] >>> df.mapInPandas(filter_func, df.schema).show() +---+ | id| +---+ | 1| +---+ ``` ### How was this patch tested? Unit tests. Closes #40104 from xinrong-meng/mapInPandas. Lead-authored-by: Xinrong Meng Co-authored-by: Xinrong Meng Signed-off-by: Xinrong Meng --- .../main/protobuf/spark/connect/relations.proto| 10 + .../sql/connect/planner/SparkConnectPlanner.scala | 18 +- dev/sparktestsupport/modules.py| 1 + python/pyspark/sql/connect/_typing.py | 8 +- python/pyspark/sql/connect/client.py | 2 +- python/pyspark/sql/connect/dataframe.py| 22 +- python/pyspark/sql/connect/expressions.py | 6 +- python/pyspark/sql/connect/plan.py | 25 ++- python/pyspark/sql/connect/proto/relations_pb2.py | 222 +++-- python/pyspark/sql/connect/proto/relations_pb2.pyi | 36 python/pyspark/sql/connect/types.py| 4 +- python/pyspark/sql/connect/udf.py | 20 +- python/pyspark/sql/pandas/map_ops.py | 3 + .../sql/tests/connect/test_parity_pandas_map.py| 50 + python/pyspark/sql/tests/pandas/test_pandas_map.py | 46 +++-- 15 files changed, 331 insertions(+), 142 deletions(-) diff --git a/connector/connect/common/src/main/protobuf/spark/connect/relations.proto b/connector/connect/common/src/main/protobuf/spark/connect/relations.proto index 29fffd65c75..4d96b6b0c7e 100644 --- a/connector/connect/common/src/main/protobuf/spark/connect/relations.proto +++ b/connector/connect/common/src/main/protobuf/spark/connect/relations.proto @@ -60,6 +60,7 @@ message Relation { Unpivot unpivot = 25; ToSchema to_schema = 26; RepartitionByExpression repartition_by_expression = 27; +FrameMap frame_map = 28; // NA functions NAFill fill_na = 90; @@ -768,3 +769,12 @@ message RepartitionByExpression { // (Optional) number of partitions, must be positive. optional int32 num_partitions = 3; } + +message FrameMap { + // (Required) Input relation for a Frame Map API: mapInPandas, mapInArrow. + Relation input = 1; + + // (Required) Input user-defined function of a Frame Map API.
+ CommonInlineUserDefinedFunction func = 2; +} + diff --git a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala index 268bf02fad9..cc43c1cace3 100644 --- a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala +++ b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala @@ -24,7 +24,7 @@ import com.google.common.collect.{Lists, Maps} import com.google.protobuf.{Any => ProtoAny} import org.apache.spark.TaskContext -import org.apache.spark.api.python.SimplePythonFunction +import org.apache.spark.api.python.{PythonEvalType, SimplePythonFunction} import org.apache.spark.connect.proto import org.apache.spark.sql.{Column, Dataset, Encoders, SparkSession} import org.apache.spark.sql.catalyst.{expressions, AliasIdentifier, FunctionIdentifier} @@ -106,6 +106,8 @@ class SparkConnectPlanner(val session: SparkSession) { case proto.Relation.RelTypeCase.UNPIVOT => transformUnpivot(rel.getUnpivot) case proto.Relation.RelTypeCase.REPARTITION_BY_EXPRESSION => transformRepartitionByExpression(rel.getRepartitionByExpression) + case proto.Relation.RelTypeCase.FRAME_MAP => +transformFrameMap(rel.getFrameMap) case proto.Relation.R
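The filter example in the commit message reuses `df.schema`. Since `mapInPandas` may also change the shape of each batch, here is one more hypothetical sketch whose output schema is widened and declared as a DDL string.
```python
# A second, hypothetical mapInPandas sketch to complement the filter example in
# the commit message: each pandas batch is transformed, and the output schema
# is declared as a DDL string rather than reusing df.schema.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.range(5)

def double_ids(batches):
    for pdf in batches:  # pdf is a pandas.DataFrame holding one input batch
        pdf["doubled"] = pdf["id"] * 2
        yield pdf

df.mapInPandas(double_ids, "id long, doubled long").show()
```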
svn commit: r60251 - /dev/spark/KEYS
Author: xinrong Date: Wed Feb 22 07:01:27 2023 New Revision: 60251 Log: Update KEYS Modified: dev/spark/KEYS Modified: dev/spark/KEYS == --- dev/spark/KEYS (original) +++ dev/spark/KEYS Wed Feb 22 07:01:27 2023 @@ -1850,59 +1850,59 @@ JznYPjY83fSKkeCh =3Ggj -END PGP PUBLIC KEY BLOCK- -pub rsa4096 2022-08-16 [SC] - 0C33D35E1A9296B32CF31005ACD84F20930B47E8 -uid [ultimate] Xinrong Meng (CODE SIGNING KEY) -sub rsa4096 2022-08-16 [E] +pub rsa4096 2023-02-21 [SC] + CC68B3D16FE33A766705160BA7E57908C7A4E1B1 +uid [ultimate] Xinrong Meng (RELEASE SIGNING KEY) +sub rsa4096 2023-02-21 [E] -BEGIN PGP PUBLIC KEY BLOCK- -mQINBGL64s8BEADCeefEm9XB63o/xIGpnwurEL24h5LsZdA7k7juZ5C1Fu6m5amT -0A1n49YncYv6jDQD8xh+eiZ11+mYEAzkmGD+aVEMQA0/Zrp0rMe22Ymq5fQHfRCO -88sQl4PvmqaElcAswFz7RP+55GWSIfEbZIJhZQdukaVCZuC+Xpb68TAj2OSXZ+Mt -m8RdJXIJpmD0P6R7bvY4LPZL8tY7wtnxUj1I9wRnXc0AnbPfI6gGyF+b0x54b4Ey -2+sZ6tNH501I9hgdEOWj+nqQFZTTzZQPI1r3nPIA28T9VDOKi5dmoI6iXFjCWZ2N -dmsw8GN+45V1udOgylE2Mop7URzOQYlqaFnJvXzO/nZhAqbetrMmZ6jmlbqLEq/D -C8cgYFuMwER3oAC0OwpSz2HLCya95xHDdPqX+Iag0h0bbFBxSNpgzQiUk1mvSYXa -+7HGQ3rIfy7+87hA1BIHaN0L1oOw37UWk2IGDvS29JlGJ3SJDX5Ir5uBvW6k9So6 -xG9vT+l+R878rLcjJLJT4Me4pk4z8O4Uo+IY0uptiTYnvYRXBOw9wk9KpSckbr+s -I2keVwa+0fui4c1ESwNHR8HviALho9skvwaCAP3TUZ43SHeDU840M9LwDWc6VNc1 -x30YbgYeKtyU1deh7pcBhykUJPrZ457OllG8SbnhAncwmf8TaJjUkQARAQAB -tDRYaW5yb25nIE1lbmcgKENPREUgU0lHTklORyBLRVkpIDx4aW5yb25nQGFwYWNo -ZS5vcmc+iQJOBBMBCAA4FiEEDDPTXhqSlrMs8xAFrNhPIJMLR+gFAmL64s8CGwMF -CwkIBwIGFQoJCAsCBBYCAwECHgECF4AACgkQrNhPIJMLR+gNSRAAkhNM7vAFRwaX -MachhS97+L2ZklerzeZuCP0zeYZ9gZloGUx+eM3MWOglUcKH0f6DjPitMMCr1Qbo -OsENANTS5ZOp4r4rhbbNhYbA8Wbx8H+ZABmCuUNJMjmeVh3qL1WmHclApegqxiSH -uc9xXB1RZOJH2pS2v7UXW2c/Y745oT/YxWX9hBeJUPWmg6M6jn1/osnqmUngXSvB -HNzxzHT1gJJNEcRU3r5bKAJlLWBZzLO4pIgtFqIfpS79ieG54OwedrW3oqOheFKa -LTYInFAdscmZwIo8jHakqf+UMu3H5dzABBRATDvcci7nBPi+J8F7qLvklzb1zd0L -Ir/QnAy3zFUYUbwwRXDy0Gi0HsU5xP9QYT3pmtW3I+Xlwpso417XoE+1DYtizjbx -FuJaSNs7K7VPaELezdvtFL0SGYNkpxz7EiVcW6TxmLsLBoNAeaKhHYtwhblQKznv -6mEbjmiAo3oB68ghI+3xW2mZ+T+t3sgl5aNWiZ6RQx5v4liYc4vShmewcKGWvN7T -RC5Ert0GxMJGsx7fIRAgWDOI1aMj5bx9H23d3RKxJWrRCXhSlg1lyzVj+GCrhYAy -16/JH5ph0m+FCVwAP0GhHsZCQV1AT+YL7lgEZvmGq0ucDShc69lLh7qsxMg7zckk -l66F14Imuz0EasVCdI3IwkuTFch9Quu5Ag0EYvrizwEQANpINEPd+Vio1D0opPBO -Sa4keWk5IvvGETt6jUBemQten1gOB89Zba3E8ZgJpPobaThFrpsQJ9wNM8+KBHGm -U+DTP+JC+65J9Eq6KA8qcH2jn3xKBWipWUACKUCvpFSNq63f3+RVbAyTYdykRhEU -Ih+7eFtl3X0Q6v92TMZL26euXqt73UoOsoulKEmfSyhiQBQX7WNCtq3JR/mZ4+OA -/N3J7qw+emvKG3t8h3/5CtpZWEMaJwaGyyENScsw5KEOYjl9o11mMeYRYfZ0n0h7 -DA8BmBl/k71+UvdopdzuwjRib02uZfdCC15tltLpoVeL/pa0GRmTRuCJARwjDD95 -xbrrYYqw2wD6l3Mtv/EooIBdzGpP15VnD4DFC5W9vxnxuEfSnX0DxCObsd6MCzZw -GOiF4HudfFzB2SiE/OXNaAxdpSD9C8n0Y3ac74dk6uamzCkSnCjzzAOytFZY18fi -N5ihDA9+2TeEOL0RVrQw0Mdc4X80A1dlCJ6Gh1Py4WOtDxB5UmSY2olvV6p5pRRD -1HEnM9bivPdEErYpUI72K4L5feXFxt/obQ0rZMmmnYMldAcPcqsTMVgPWZICK/z0 -X/SrOR0YEa28XA+V69o4TwPR77oUK6t3SiFzAi3VmQtAP6NkqL+FNMa0V1ZiEPse -lZhKVziNh5Jb8bnkQA6+9Md3ABEBAAGJAjYEGAEIACAWIQQMM9NeGpKWsyzzEAWs -2E8gkwtH6AUCYvrizwIbDAAKCRCs2E8gkwtH6OYIEACtPjMCg+x+vxVU8KhqwxpA -UyDOuNbzB2TSMmETgGqHDqk/F4eSlMvZTukGlo5yPDYXhd7vUT45mrlRq8ljzBLr -NkX2mkGgocdjAjSF2rgugMb+APpKNFxZtUPKosyyOPS9z4+4tjxfCpj2u2hZy8PD -C3/6dz9Yga0kgWu2GWFZFFZiGxPyUCkjnUBWz53dT/1JwWt3W81bihVfhLX9CVgO -KPEoZ96BaEucAHY0r/yq0zAq/+DCTYRrDLkeuZaDTB1RThWOrW+GCoPcIxbLi4/j -/YkIGQCaYvpVsuacklwqhSxhucqctRklGHLrjLdxrqcS1pIfraCsRJazUoO1Uu7n -DQ/aF9fczzX9nKv7t341lGn+Ujv5EEuaA/y38XSffsHxCmpEcvjGAH0NZsjHbYd/ -abeFTAnMV1r2r9/UcyuosEsaRyjW4Ljd51wWyGVv4Ky40HJYRmtefJX+1QDAntPJ 
-lVPHQCa2B/YIDrFeokXFxDqONkA+fFm+lDb83lhAAhjxCwfbytZqJFTvYh7TQTLx -3+ZA1BoFhxIHnR2mrFK+yqny9w6YAeZ8YMG5edH1EKoNVfic7OwwId1eQL6FCKCv -F3sNZiCC3i7P6THg9hZSF1eNbfiuZuMxUbw3OZgYhyXLB023vEZ1mUQCAcbfsQxU -sw6Rs2zVSxvPcg5CN8APig== -=fujW +mQINBGP0Hf0BEACyHWHb/DyfpkIC64sJQKR7GGLBicFOxsVNYrxxcZJvdnfjFnHC +ajib6m6dIQ5g+YgH23U/jIpHhZbXLWrQkyuYW4JbaG8uobK5S7crAqpYjtwRJHRe +R4f8DO6nWUNxZGHYFU46zvt7GuBjN005u+X2Oxq9xau+CVgkS1r/vbykxDwGOcYM +/vmgITo+Zk2zs2Krea+ul0aVZRvhGB8ZHHSdz83NTDm0DwlzALFodLWIRvSblqtZ +SPVKntzmN6OYjVjPMK6HgLlVlH2WqOIexuZnbadioM6+Hg/eihXQVLU7wpBBliFA +KTUnCNRRxEF8M7zPKEpyQbV2KJqMLdGLpE+ZEfzOKUxbCBmzF1MQ5Pxm4mm8RlvA +DDoOI/I3IstoizsxI6hV7U3w22R4c++qmFtX/lzgDnCKfISBTQaofiVlvMg7fx+f +7bA1oJxlMJMpjNO9s3qudMAxtrSzHUnIt2ThsxcsL+wfu/HxvR1+PfX6eCCXaVjN +/ii0EkWbHBq6Jb1IDzKuU02oX0TWQisDqn+IHq8/Q46PH3H2nF6hfg8zJXMkTusc +T8AmCoQCeVEPMbnVTWW9sVJC2gQPrCQJHEUbu5OHb9REtJ3GqtRw+mogTrpO5ads +PO61a94fJQcTDgR59hShrXiXxUK07C/rXqexcVnXEZyfn/5ZnqmgdVNt2wARAQAB +tDdYaW5yb25nIE1lbmcgKFJFTEVBU0UgU0lHTklORyBLRVkpIDx4aW5yb25nQGFw +YWNoZS5vcmc+iQJOBBMBCgA4FiEEzGiz0W/jOnZnBRYLp+V5CMek4bEFAmP0Hf0C +GwMFCwkIBwIGFQoJCAsCBBYCAwECHgECF4AACgkQp+V5CMek4bFlWg//YIN9HNQ2 +yj3gW9lXVTWtSzJvlnwZr5V9JBGevpWMNF3U38Dk0nlQUiSvHdpfQjIyITOYR9Iv +GxuZCp5szVaRc00pfQWFy684zLvwqrjKekLzCpkqTOGXHO2RxeJH2ZBqcI9OSpR5 +B2J94dlQItM/bKsXhMNOwmVtS6kSW36aN/0Nd9ZQF
svn commit: r60249 - /dev/spark/KEYS
Author: xinrong
Date: Wed Feb 22 03:51:47 2023
New Revision: 60249

Log:
Update KEYS

Modified:
    dev/spark/KEYS

Modified: dev/spark/KEYS
==============================================================================
--- dev/spark/KEYS (original)
+++ dev/spark/KEYS Wed Feb 22 03:51:47 2023
@@ -1848,4 +1848,61 @@ P+3d/bY7eHLaFnkIuQR2dzaJti/nf2b/7VQHLm6H
 Y2wH1LgDJJsoBLPFNxhgTLjMlErwsZlacmXyogrmOS+ZvgQz/LZ1mIryTAkd1Gym
 JznYPjY83fSKkeCh
 =3Ggj
------END PGP PUBLIC KEY BLOCK-----
\ No newline at end of file
+-----END PGP PUBLIC KEY BLOCK-----
+
+pub   rsa4096 2022-08-16 [SC]
+      0C33D35E1A9296B32CF31005ACD84F20930B47E8
+uid   [ultimate] Xinrong Meng (CODE SIGNING KEY)
+sub   rsa4096 2022-08-16 [E]
+-----BEGIN PGP PUBLIC KEY BLOCK-----
+
+mQINBGL64s8BEADCeefEm9XB63o/xIGpnwurEL24h5LsZdA7k7juZ5C1Fu6m5amT
+0A1n49YncYv6jDQD8xh+eiZ11+mYEAzkmGD+aVEMQA0/Zrp0rMe22Ymq5fQHfRCO
+88sQl4PvmqaElcAswFz7RP+55GWSIfEbZIJhZQdukaVCZuC+Xpb68TAj2OSXZ+Mt
+m8RdJXIJpmD0P6R7bvY4LPZL8tY7wtnxUj1I9wRnXc0AnbPfI6gGyF+b0x54b4Ey
+2+sZ6tNH501I9hgdEOWj+nqQFZTTzZQPI1r3nPIA28T9VDOKi5dmoI6iXFjCWZ2N
+dmsw8GN+45V1udOgylE2Mop7URzOQYlqaFnJvXzO/nZhAqbetrMmZ6jmlbqLEq/D
+C8cgYFuMwER3oAC0OwpSz2HLCya95xHDdPqX+Iag0h0bbFBxSNpgzQiUk1mvSYXa
++7HGQ3rIfy7+87hA1BIHaN0L1oOw37UWk2IGDvS29JlGJ3SJDX5Ir5uBvW6k9So6
+xG9vT+l+R878rLcjJLJT4Me4pk4z8O4Uo+IY0uptiTYnvYRXBOw9wk9KpSckbr+s
+I2keVwa+0fui4c1ESwNHR8HviALho9skvwaCAP3TUZ43SHeDU840M9LwDWc6VNc1
+x30YbgYeKtyU1deh7pcBhykUJPrZ457OllG8SbnhAncwmf8TaJjUkQARAQAB
+tDRYaW5yb25nIE1lbmcgKENPREUgU0lHTklORyBLRVkpIDx4aW5yb25nQGFwYWNo
+ZS5vcmc+iQJOBBMBCAA4FiEEDDPTXhqSlrMs8xAFrNhPIJMLR+gFAmL64s8CGwMF
+CwkIBwIGFQoJCAsCBBYCAwECHgECF4AACgkQrNhPIJMLR+gNSRAAkhNM7vAFRwaX
+MachhS97+L2ZklerzeZuCP0zeYZ9gZloGUx+eM3MWOglUcKH0f6DjPitMMCr1Qbo
+OsENANTS5ZOp4r4rhbbNhYbA8Wbx8H+ZABmCuUNJMjmeVh3qL1WmHclApegqxiSH
+uc9xXB1RZOJH2pS2v7UXW2c/Y745oT/YxWX9hBeJUPWmg6M6jn1/osnqmUngXSvB
+HNzxzHT1gJJNEcRU3r5bKAJlLWBZzLO4pIgtFqIfpS79ieG54OwedrW3oqOheFKa
+LTYInFAdscmZwIo8jHakqf+UMu3H5dzABBRATDvcci7nBPi+J8F7qLvklzb1zd0L
+Ir/QnAy3zFUYUbwwRXDy0Gi0HsU5xP9QYT3pmtW3I+Xlwpso417XoE+1DYtizjbx
+FuJaSNs7K7VPaELezdvtFL0SGYNkpxz7EiVcW6TxmLsLBoNAeaKhHYtwhblQKznv
+6mEbjmiAo3oB68ghI+3xW2mZ+T+t3sgl5aNWiZ6RQx5v4liYc4vShmewcKGWvN7T
+RC5Ert0GxMJGsx7fIRAgWDOI1aMj5bx9H23d3RKxJWrRCXhSlg1lyzVj+GCrhYAy
+16/JH5ph0m+FCVwAP0GhHsZCQV1AT+YL7lgEZvmGq0ucDShc69lLh7qsxMg7zckk
+l66F14Imuz0EasVCdI3IwkuTFch9Quu5Ag0EYvrizwEQANpINEPd+Vio1D0opPBO
+Sa4keWk5IvvGETt6jUBemQten1gOB89Zba3E8ZgJpPobaThFrpsQJ9wNM8+KBHGm
+U+DTP+JC+65J9Eq6KA8qcH2jn3xKBWipWUACKUCvpFSNq63f3+RVbAyTYdykRhEU
+Ih+7eFtl3X0Q6v92TMZL26euXqt73UoOsoulKEmfSyhiQBQX7WNCtq3JR/mZ4+OA
+/N3J7qw+emvKG3t8h3/5CtpZWEMaJwaGyyENScsw5KEOYjl9o11mMeYRYfZ0n0h7
+DA8BmBl/k71+UvdopdzuwjRib02uZfdCC15tltLpoVeL/pa0GRmTRuCJARwjDD95
+xbrrYYqw2wD6l3Mtv/EooIBdzGpP15VnD4DFC5W9vxnxuEfSnX0DxCObsd6MCzZw
+GOiF4HudfFzB2SiE/OXNaAxdpSD9C8n0Y3ac74dk6uamzCkSnCjzzAOytFZY18fi
+N5ihDA9+2TeEOL0RVrQw0Mdc4X80A1dlCJ6Gh1Py4WOtDxB5UmSY2olvV6p5pRRD
+1HEnM9bivPdEErYpUI72K4L5feXFxt/obQ0rZMmmnYMldAcPcqsTMVgPWZICK/z0
+X/SrOR0YEa28XA+V69o4TwPR77oUK6t3SiFzAi3VmQtAP6NkqL+FNMa0V1ZiEPse
+lZhKVziNh5Jb8bnkQA6+9Md3ABEBAAGJAjYEGAEIACAWIQQMM9NeGpKWsyzzEAWs
+2E8gkwtH6AUCYvrizwIbDAAKCRCs2E8gkwtH6OYIEACtPjMCg+x+vxVU8KhqwxpA
+UyDOuNbzB2TSMmETgGqHDqk/F4eSlMvZTukGlo5yPDYXhd7vUT45mrlRq8ljzBLr
+NkX2mkGgocdjAjSF2rgugMb+APpKNFxZtUPKosyyOPS9z4+4tjxfCpj2u2hZy8PD
+C3/6dz9Yga0kgWu2GWFZFFZiGxPyUCkjnUBWz53dT/1JwWt3W81bihVfhLX9CVgO
+KPEoZ96BaEucAHY0r/yq0zAq/+DCTYRrDLkeuZaDTB1RThWOrW+GCoPcIxbLi4/j
+/YkIGQCaYvpVsuacklwqhSxhucqctRklGHLrjLdxrqcS1pIfraCsRJazUoO1Uu7n
+DQ/aF9fczzX9nKv7t341lGn+Ujv5EEuaA/y38XSffsHxCmpEcvjGAH0NZsjHbYd/
+abeFTAnMV1r2r9/UcyuosEsaRyjW4Ljd51wWyGVv4Ky40HJYRmtefJX+1QDAntPJ
+lVPHQCa2B/YIDrFeokXFxDqONkA+fFm+lDb83lhAAhjxCwfbytZqJFTvYh7TQTLx
+3+ZA1BoFhxIHnR2mrFK+yqny9w6YAeZ8YMG5edH1EKoNVfic7OwwId1eQL6FCKCv
+F3sNZiCC3i7P6THg9hZSF1eNbfiuZuMxUbw3OZgYhyXLB023vEZ1mUQCAcbfsQxU
+sw6Rs2zVSxvPcg5CN8APig==
+=fujW
+-----END PGP PUBLIC KEY BLOCK-----
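For context on how a KEYS update like this gets used: release verifiers import the file once, then check each staged artifact against its detached .asc signature. The sketch below is illustrative only, not part of the commit above; it assumes `gpg` is on the PATH and that KEYS, an artifact, and its signature have already been downloaded into the working directory.

    import subprocess
    from pathlib import Path

    def verify_artifact(keys: Path, artifact: Path, signature: Path) -> bool:
        """Import the published KEYS file, then verify the detached signature."""
        # Importing is idempotent: keys that are already known are skipped.
        subprocess.run(["gpg", "--import", str(keys)], check=True)
        # gpg --verify takes the detached signature first, then the signed file,
        # and exits non-zero on a bad or unknown signature.
        result = subprocess.run(["gpg", "--verify", str(signature), str(artifact)])
        return result.returncode == 0

    if __name__ == "__main__":
        ok = verify_artifact(
            Path("KEYS"),
            Path("spark-3.4.0-bin-hadoop3.tgz"),
            Path("spark-3.4.0-bin-hadoop3.tgz.asc"),
        )
        print("good signature" if ok else "verification FAILED")

The exit-code check is the whole contract here: a signature made by a key that is not in KEYS fails the same way a corrupted download does.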
svn commit: r60241 - in /dev/spark/v3.4.0-rc1-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/R/articles/ _site/api/R/deps/ _site/api/R/deps/bootstrap-5.2.2/ _site/api/R/deps/jquery-3.6.0/ _site/api
Author: xinrong
Date: Tue Feb 21 13:34:14 2023
New Revision: 60241

Log:
Apache Spark v3.4.0-rc1 docs

[This commit notification would consist of 2806 parts,
which exceeds the limit of 50 ones, so it was shortened to the summary.]
svn commit: r60238 - /dev/spark/v3.4.0-rc1-bin/
Author: xinrong
Date: Tue Feb 21 11:57:55 2023
New Revision: 60238

Log:
Apache Spark v3.4.0-rc1

Added:
    dev/spark/v3.4.0-rc1-bin/
    dev/spark/v3.4.0-rc1-bin/SparkR_3.4.0.tar.gz   (with props)
    dev/spark/v3.4.0-rc1-bin/SparkR_3.4.0.tar.gz.asc
    dev/spark/v3.4.0-rc1-bin/SparkR_3.4.0.tar.gz.sha512
    dev/spark/v3.4.0-rc1-bin/pyspark-3.4.0.tar.gz   (with props)
    dev/spark/v3.4.0-rc1-bin/pyspark-3.4.0.tar.gz.asc
    dev/spark/v3.4.0-rc1-bin/pyspark-3.4.0.tar.gz.sha512
    dev/spark/v3.4.0-rc1-bin/spark-3.4.0-bin-hadoop3-scala2.13.tgz   (with props)
    dev/spark/v3.4.0-rc1-bin/spark-3.4.0-bin-hadoop3-scala2.13.tgz.asc
    dev/spark/v3.4.0-rc1-bin/spark-3.4.0-bin-hadoop3-scala2.13.tgz.sha512
    dev/spark/v3.4.0-rc1-bin/spark-3.4.0-bin-hadoop3.tgz   (with props)
    dev/spark/v3.4.0-rc1-bin/spark-3.4.0-bin-hadoop3.tgz.asc
    dev/spark/v3.4.0-rc1-bin/spark-3.4.0-bin-hadoop3.tgz.sha512
    dev/spark/v3.4.0-rc1-bin/spark-3.4.0-bin-without-hadoop.tgz   (with props)
    dev/spark/v3.4.0-rc1-bin/spark-3.4.0-bin-without-hadoop.tgz.asc
    dev/spark/v3.4.0-rc1-bin/spark-3.4.0-bin-without-hadoop.tgz.sha512
    dev/spark/v3.4.0-rc1-bin/spark-3.4.0.tgz   (with props)
    dev/spark/v3.4.0-rc1-bin/spark-3.4.0.tgz.asc
    dev/spark/v3.4.0-rc1-bin/spark-3.4.0.tgz.sha512

Added: dev/spark/v3.4.0-rc1-bin/SparkR_3.4.0.tar.gz
==============================================================================
Binary file - no diff available.

Propchange: dev/spark/v3.4.0-rc1-bin/SparkR_3.4.0.tar.gz
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: dev/spark/v3.4.0-rc1-bin/SparkR_3.4.0.tar.gz.asc
==============================================================================
--- dev/spark/v3.4.0-rc1-bin/SparkR_3.4.0.tar.gz.asc (added)
+++ dev/spark/v3.4.0-rc1-bin/SparkR_3.4.0.tar.gz.asc Tue Feb 21 11:57:55 2023
@@ -0,0 +1,17 @@
+-----BEGIN PGP SIGNATURE-----
+
+iQJHBAABCgAxFiEEzGiz0W/jOnZnBRYLp+V5CMek4bEFAmP0sVMTHHhpbnJvbmdA
+YXBhY2hlLm9yZwAKCRCn5XkIx6Thsbk1D/4wKDoCUBbr0bOOPpGKbMyWggJQDdvl
+xCDXR5nFFkLdY6vZFerIp32jX1JFQA2Enr24iCBy00ERszFT9LMRP66nOG3OseU1
+6eI4Y4l5ACAD35qdUjFsuPNPy71Q2HqWrY52isMZWfj8TYY9X3T3w9Wox6KgTOon
+rGoOtj+N6tAF5ACvJIX43li8JPesJQNl1epbu2LtrZa+tFyfgQBowuHmhiQ5PQ/v
+EufANZytLWllzX81EfNbiJ9hN9geqIHgXew6b1rtd8IS05PdDimA/uwtP+LqBBqq
+MKfUA6Tf8T9SpN36ZN6/lfOKVKu0OFXc9qfJIj9cdBfhTcoP1vUGVMqNtWEQQFqo
+DZVRnBrnnx5lQOYry3gm4UgdLtHpwqvOZtqpmbvSHV503+JCqBnFnw8jvGzaVfWZ
+OIPa4AuhjAxqMcnCdLHmpg/QcX07/tPXPO0kpEWz7a1QjF6C+gidtbgIghY/HIzs
+lNfI3TdWop3Wwnpa0kHHlwi15jfeaxnPQDtIw/YRWojbztE0wG8rXycoWl2h0o05
+XQ55Rl9qEviW3GPOW52SGAD47+2j3eU6lFEs+xz85E/jxIneYkuweMJ5Vk1iTdEH
+7yfjQqVozR3QeyaYll9W1ax50LUtrMx5vTMdy82L0yzg0NQctqEa+I3HRQjgxVFB
+7gqTLxqG8bpyPA==
+=+Kud
+-----END PGP SIGNATURE-----

Added: dev/spark/v3.4.0-rc1-bin/SparkR_3.4.0.tar.gz.sha512
==============================================================================
--- dev/spark/v3.4.0-rc1-bin/SparkR_3.4.0.tar.gz.sha512 (added)
+++ dev/spark/v3.4.0-rc1-bin/SparkR_3.4.0.tar.gz.sha512 Tue Feb 21 11:57:55 2023
@@ -0,0 +1 @@
+21574f5fb95f397640c896678002559a10b6e264b3887115128bde380682065e8a3883dd94136c318c78f3047a7cd4a2763b617863686329b47532983f171240 SparkR_3.4.0.tar.gz

Added: dev/spark/v3.4.0-rc1-bin/pyspark-3.4.0.tar.gz
==============================================================================
Binary file - no diff available.

Propchange: dev/spark/v3.4.0-rc1-bin/pyspark-3.4.0.tar.gz
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: dev/spark/v3.4.0-rc1-bin/pyspark-3.4.0.tar.gz.asc
==============================================================================
--- dev/spark/v3.4.0-rc1-bin/pyspark-3.4.0.tar.gz.asc (added)
+++ dev/spark/v3.4.0-rc1-bin/pyspark-3.4.0.tar.gz.asc Tue Feb 21 11:57:55 2023
@@ -0,0 +1,17 @@
+-----BEGIN PGP SIGNATURE-----
+
+iQJHBAABCgAxFiEEzGiz0W/jOnZnBRYLp+V5CMek4bEFAmP0sVUTHHhpbnJvbmdA
+YXBhY2hlLm9yZwAKCRCn5XkIx6ThsWbPD/9dWcxjrRR54QccE8zwX5oaiboVFXuI
+0BLahV54IQi4HZjVgRHzbEWD/qaemW5Brcos003nsaGnXT0m0oi656X2967ZuJTk
+zYanrIafACwplVo7uxcq2VBp6IKcDkWEUL42fAcV5GN1/1NpNHqzZqZMGe5ufKLB
+05Np0ac8L6XXMpIG0to6H1LEmAW7/4PBARpzt6/TgZjoEI7a7YHMUlL0OjmHmP/m
+3Ck8slg+Osk2opYJL4AXycFh36Ns43OG3TnhfLYyDG0jtiXpWBZ4Yt2bin55j0f/
+yrDe1lDlRJ14pXay2f/s5eFrz16qHfRluWZzxcEyJjZva1AD5V1XMh/zsRGDfvUZ
+BkEM2GHYn3gZH9uuGfYbqL+pcZgrmVjZMgcZfhjyxLrRW8WBFr9g5lCIQF+4lpU8
+JwM4W3eOLyaC3wpVTfPU8rJfGExeBLhJ7zAyw65+yUx27KMUWatzGuQSA63iE1bg
+FIruQABSDsenFARnLybB8l41t0PTGlWU9+g5E4BlU/+GbnxaQEuOTSnZOenhPOGe
+n2g4Yfr81aYqVX8VKL0wzYXeB39SaXrtGhUaWVjFookNb42SNB1IPG2xQ+qQtcMw
+jv1m+1BIMWXDLZcLlrIViEzoyNhIy83CipDujJpoh4tlXb3OHOJqYuIZjMPhgVcB
+vtJFP8xIOdwRIg==
+=058e
+-----END PGP SIGNATURE-----

Added: dev/spark/v3.4.0-rc1-bin/pyspark-3.4.0.tar.gz.sha512
==============================================================================
--- dev/spark/v3.4.0-rc1-bin/pyspark-3.4.0.tar.gz.sha512 (added)
+++ dev/spark
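Each .sha512 file staged above holds the hex digest followed by the artifact name, as the SparkR_3.4.0.tar.gz.sha512 hunk shows. A minimal sketch of the corresponding check in Python, assuming the artifact and its .sha512 file have been downloaded side by side; this is an illustration, not Spark's release tooling:

    import hashlib
    from pathlib import Path

    def sha512_matches(artifact: Path, digest_file: Path) -> bool:
        """Compare an artifact's SHA-512 digest against the staged .sha512 file."""
        # The first whitespace-separated token is the digest; the rest is the name.
        expected = digest_file.read_text().split()[0].lower()
        h = hashlib.sha512()
        with artifact.open("rb") as f:
            # Stream in 1 MiB chunks so multi-hundred-MB tarballs stay out of memory.
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest() == expected

    print(sha512_matches(Path("SparkR_3.4.0.tar.gz"),
                         Path("SparkR_3.4.0.tar.gz.sha512")))

The checksum guards against corrupt downloads; the .asc signatures above are what tie the bits to the release manager's key in KEYS.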
[spark] branch branch-3.4 updated (f394322be3b -> 63be7fd7334)
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a change to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git

    from f394322be3b  [SPARK-42507][SQL][TESTS] Simplify ORC schema merging conflict error check
     add e2484f626bb  Preparing Spark release v3.4.0-rc1
     new 63be7fd7334  Preparing development version 3.4.1-SNAPSHOT

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.

Summary of changes:
[spark] 01/01: Preparing development version 3.4.1-SNAPSHOT
This is an automated email from the ASF dual-hosted git repository. xinrong pushed a commit to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git commit 63be7fd7334111474e79d88c687d376ede30e37f Author: Xinrong Meng AuthorDate: Tue Feb 21 10:39:26 2023 + Preparing development version 3.4.1-SNAPSHOT --- R/pkg/DESCRIPTION | 2 +- assembly/pom.xml | 2 +- common/kvstore/pom.xml | 2 +- common/network-common/pom.xml | 2 +- common/network-shuffle/pom.xml | 2 +- common/network-yarn/pom.xml| 2 +- common/sketch/pom.xml | 2 +- common/tags/pom.xml| 2 +- common/unsafe/pom.xml | 2 +- connector/avro/pom.xml | 2 +- connector/connect/client/jvm/pom.xml | 2 +- connector/connect/common/pom.xml | 2 +- connector/connect/server/pom.xml | 2 +- connector/docker-integration-tests/pom.xml | 2 +- connector/kafka-0-10-assembly/pom.xml | 2 +- connector/kafka-0-10-sql/pom.xml | 2 +- connector/kafka-0-10-token-provider/pom.xml| 2 +- connector/kafka-0-10/pom.xml | 2 +- connector/kinesis-asl-assembly/pom.xml | 2 +- connector/kinesis-asl/pom.xml | 2 +- connector/protobuf/pom.xml | 2 +- connector/spark-ganglia-lgpl/pom.xml | 2 +- core/pom.xml | 2 +- docs/_config.yml | 6 +++--- examples/pom.xml | 2 +- graphx/pom.xml | 2 +- hadoop-cloud/pom.xml | 2 +- launcher/pom.xml | 2 +- mllib-local/pom.xml| 2 +- mllib/pom.xml | 2 +- pom.xml| 2 +- python/pyspark/version.py | 2 +- repl/pom.xml | 2 +- resource-managers/kubernetes/core/pom.xml | 2 +- resource-managers/kubernetes/integration-tests/pom.xml | 2 +- resource-managers/mesos/pom.xml| 2 +- resource-managers/yarn/pom.xml | 2 +- sql/catalyst/pom.xml | 2 +- sql/core/pom.xml | 2 +- sql/hive-thriftserver/pom.xml | 2 +- sql/hive/pom.xml | 2 +- streaming/pom.xml | 2 +- tools/pom.xml | 2 +- 43 files changed, 45 insertions(+), 45 deletions(-) diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION index 4a32762b34c..fa7028630a8 100644 --- a/R/pkg/DESCRIPTION +++ b/R/pkg/DESCRIPTION @@ -1,6 +1,6 @@ Package: SparkR Type: Package -Version: 3.4.0 +Version: 3.4.1 Title: R Front End for 'Apache Spark' Description: Provides an R Front end for 'Apache Spark' <https://spark.apache.org>. Authors@R: diff --git a/assembly/pom.xml b/assembly/pom.xml index 58dd9ef46e0..a4111eb64d9 100644 --- a/assembly/pom.xml +++ b/assembly/pom.xml @@ -21,7 +21,7 @@ org.apache.spark spark-parent_2.12 -3.4.0 +3.4.1-SNAPSHOT ../pom.xml diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml index 95ea15552da..f9ecfb3d692 100644 --- a/common/kvstore/pom.xml +++ b/common/kvstore/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.4.0 +3.4.1-SNAPSHOT ../../pom.xml diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml index e4d98471bf9..22ee65b7d25 100644 --- a/common/network-common/pom.xml +++ b/common/network-common/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.4.0 +3.4.1-SNAPSHOT ../../pom.xml diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml index 7a6d5aedf65..2c67da81ca4 100644 --- a/common/network-shuffle/pom.xml +++ b/common/network-shuffle/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.4.0 +3.4.1-SNAPSHOT ../../pom.xml diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml index 1c421754083..219682e047d 100644 --- a/common/network-yarn/pom.xml +++ b/common/network-yarn/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.4.0 +3.4.1-SNAPSHOT ../../pom.xml diff --git a/common/sketch/pom.xml b/common/sketch
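The commit above (like the matching "Preparing ..." commits elsewhere in this digest; its diff is truncated by the archive) is a mechanical version sweep: the version element of 43 build files flips between 3.4.0 and 3.4.1-SNAPSHOT. Spark's own release scripts perform that rewrite; purely to illustrate the pattern, here is a sketch (the function name and regex are assumptions, not the project's tooling) that bumps the first <version> tag, i.e. the parent version, in every pom.xml under a checkout:

    import re
    from pathlib import Path

    def bump_parent_version(root: Path, old: str, new: str) -> int:
        """Rewrite the first <version>old</version> in each pom.xml; return count."""
        pattern = re.compile(rf"<version>{re.escape(old)}</version>")
        changed = 0
        for pom in root.rglob("pom.xml"):
            text = pom.read_text()
            # count=1 touches only the parent block's version, mirroring the
            # one-line-per-file diffs in the notification above.
            new_text, n = pattern.subn(f"<version>{new}</version>", text, count=1)
            if n:
                pom.write_text(new_text)
                changed += 1
        return changed

    changed = bump_parent_version(Path("."), "3.4.0", "3.4.1-SNAPSHOT")
    print(f"{changed} pom.xml files updated")

Note that the same sweep also touches non-Maven files (R/pkg/DESCRIPTION, python/pyspark/version.py, docs/_config.yml), which is why the real scripts are more than a one-regex affair.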
[spark] tag v3.4.0-rc1 created (now e2484f626bb)
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a change to tag v3.4.0-rc1
in repository https://gitbox.apache.org/repos/asf/spark.git

    at e2484f626bb (commit)

This tag includes the following new commits:

     new e2484f626bb  Preparing Spark release v3.4.0-rc1

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.
[spark] 01/01: Preparing Spark release v3.4.0-rc1
This is an automated email from the ASF dual-hosted git repository. xinrong pushed a commit to tag v3.4.0-rc1 in repository https://gitbox.apache.org/repos/asf/spark.git commit e2484f626bb338274665a49078b528365ea18c3b Author: Xinrong Meng AuthorDate: Tue Feb 21 10:39:21 2023 + Preparing Spark release v3.4.0-rc1 --- R/pkg/DESCRIPTION | 2 +- assembly/pom.xml | 2 +- common/kvstore/pom.xml | 2 +- common/network-common/pom.xml | 2 +- common/network-shuffle/pom.xml | 2 +- common/network-yarn/pom.xml| 2 +- common/sketch/pom.xml | 2 +- common/tags/pom.xml| 2 +- common/unsafe/pom.xml | 2 +- connector/avro/pom.xml | 2 +- connector/connect/client/jvm/pom.xml | 2 +- connector/connect/common/pom.xml | 2 +- connector/connect/server/pom.xml | 2 +- connector/docker-integration-tests/pom.xml | 2 +- connector/kafka-0-10-assembly/pom.xml | 2 +- connector/kafka-0-10-sql/pom.xml | 2 +- connector/kafka-0-10-token-provider/pom.xml| 2 +- connector/kafka-0-10/pom.xml | 2 +- connector/kinesis-asl-assembly/pom.xml | 2 +- connector/kinesis-asl/pom.xml | 2 +- connector/protobuf/pom.xml | 2 +- connector/spark-ganglia-lgpl/pom.xml | 2 +- core/pom.xml | 2 +- docs/_config.yml | 6 +++--- examples/pom.xml | 2 +- graphx/pom.xml | 2 +- hadoop-cloud/pom.xml | 2 +- launcher/pom.xml | 2 +- mllib-local/pom.xml| 2 +- mllib/pom.xml | 2 +- pom.xml| 2 +- python/pyspark/version.py | 2 +- repl/pom.xml | 2 +- resource-managers/kubernetes/core/pom.xml | 2 +- resource-managers/kubernetes/integration-tests/pom.xml | 2 +- resource-managers/mesos/pom.xml| 2 +- resource-managers/yarn/pom.xml | 2 +- sql/catalyst/pom.xml | 2 +- sql/core/pom.xml | 2 +- sql/hive-thriftserver/pom.xml | 2 +- sql/hive/pom.xml | 2 +- streaming/pom.xml | 2 +- tools/pom.xml | 2 +- 43 files changed, 45 insertions(+), 45 deletions(-) diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION index fa7028630a8..4a32762b34c 100644 --- a/R/pkg/DESCRIPTION +++ b/R/pkg/DESCRIPTION @@ -1,6 +1,6 @@ Package: SparkR Type: Package -Version: 3.4.1 +Version: 3.4.0 Title: R Front End for 'Apache Spark' Description: Provides an R Front end for 'Apache Spark' <https://spark.apache.org>. Authors@R: diff --git a/assembly/pom.xml b/assembly/pom.xml index a4111eb64d9..58dd9ef46e0 100644 --- a/assembly/pom.xml +++ b/assembly/pom.xml @@ -21,7 +21,7 @@ org.apache.spark spark-parent_2.12 -3.4.1-SNAPSHOT +3.4.0 ../pom.xml diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml index f9ecfb3d692..95ea15552da 100644 --- a/common/kvstore/pom.xml +++ b/common/kvstore/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.4.1-SNAPSHOT +3.4.0 ../../pom.xml diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml index 22ee65b7d25..e4d98471bf9 100644 --- a/common/network-common/pom.xml +++ b/common/network-common/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.4.1-SNAPSHOT +3.4.0 ../../pom.xml diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml index 2c67da81ca4..7a6d5aedf65 100644 --- a/common/network-shuffle/pom.xml +++ b/common/network-shuffle/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.4.1-SNAPSHOT +3.4.0 ../../pom.xml diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml index 219682e047d..1c421754083 100644 --- a/common/network-yarn/pom.xml +++ b/common/network-yarn/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.4.1-SNAPSHOT +3.4.0 ../../pom.xml diff --git a/common/sketch/pom.xml b/common/sketch/pom.xml inde
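Read bottom-up, this trio of notifications is the standard RC rhythm: a commit pinning 3.4.0, the v3.4.0-rc1 tag on that commit, then an immediate commit reopening 3.4.1-SNAPSHOT so the branch never sits at a release version. As plain git the sequence is roughly the sketch below; the real flow is driven by Spark's release automation, and it assumes the version edits for each step are already in the working tree:

    import subprocess

    def git(*args: str) -> None:
        """Run one git command, raising on failure."""
        subprocess.run(("git", *args), check=True)

    # On branch-3.4, with the 3.4.0 version bump already applied:
    git("commit", "-am", "Preparing Spark release v3.4.0-rc1")
    git("tag", "v3.4.0-rc1")
    # After editing versions to 3.4.1-SNAPSHOT:
    git("commit", "-am", "Preparing development version 3.4.1-SNAPSHOT")
    # One push publishes both commits plus the tag, which fans out as the
    # separate "tag created" and "branch updated" notifications seen here.
    git("push", "origin", "branch-3.4", "v3.4.0-rc1")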
[spark] branch branch-3.4 updated: [SPARK-42507][SQL][TESTS] Simplify ORC schema merging conflict error check
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.4 by this push:
     new f394322be3b  [SPARK-42507][SQL][TESTS] Simplify ORC schema merging conflict error check

f394322be3b is described below

commit f394322be3b9a0451e0dff158129b607549b9160
Author: Dongjoon Hyun
AuthorDate: Tue Feb 21 17:48:09 2023 +0800

    [SPARK-42507][SQL][TESTS] Simplify ORC schema merging conflict error check

    ### What changes were proposed in this pull request?
    This PR aims to simplify the ORC schema merging conflict error check.

    ### Why are the changes needed?
    Currently, `branch-3.4` CI is broken because of the nondeterministic
    order of partitions: the merge conflict can report the two incompatible
    types in either order, so asserting the exact error parameters is flaky.
    - https://github.com/apache/spark/runs/11463120795
    - https://github.com/apache/spark/runs/11463886897
    - https://github.com/apache/spark/runs/11467827738
    - https://github.com/apache/spark/runs/11471484144
    - https://github.com/apache/spark/runs/11471507531
    - https://github.com/apache/spark/runs/11474764316

    ![Screenshot 2023-02-20 at 12 30 19 PM](https://user-images.githubusercontent.com/9700541/220193503-6d6ce2ce-3fd6-4b01-b91c-bc1ec1f41c03.png)

    ### Does this PR introduce _any_ user-facing change?
    No

    ### How was this patch tested?
    Pass the CIs.

    Closes #40101 from dongjoon-hyun/SPARK-42507.

    Authored-by: Dongjoon Hyun
    Signed-off-by: Xinrong Meng
    (cherry picked from commit 0c20263dcd0c394f8bfd6fa2bfc62031135de06a)
    Signed-off-by: Xinrong Meng
---
 .../spark/sql/execution/datasources/orc/OrcSourceSuite.scala | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcSourceSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcSourceSuite.scala
index c821276431e..024f5f6b67e 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcSourceSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcSourceSuite.scala
@@ -455,11 +455,8 @@ abstract class OrcSuite
       throw new UnsupportedOperationException(s"Unknown ORC implementation: $impl")
     }

-    checkError(
-      exception = innerException.asInstanceOf[SparkException],
-      errorClass = "CANNOT_MERGE_INCOMPATIBLE_DATA_TYPE",
-      parameters = Map("left" -> "\"BIGINT\"", "right" -> "\"STRING\"")
-    )
+    assert(innerException.asInstanceOf[SparkException].getErrorClass ===
+      "CANNOT_MERGE_INCOMPATIBLE_DATA_TYPE")
   }

   // it is ok if no schema merging
[spark] branch master updated (0e8a20e6da1 -> 0c20263dcd0)
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from 0e8a20e6da1  [SPARK-37099][SQL] Introduce the group limit of Window for rank-based filter to optimize top-k computation
     add 0c20263dcd0  [SPARK-42507][SQL][TESTS] Simplify ORC schema merging conflict error check

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/execution/datasources/orc/OrcSourceSuite.scala | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)
[spark] 01/01: Preparing development version 3.4.1-SNAPSHOT
This is an automated email from the ASF dual-hosted git repository. xinrong pushed a commit to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git commit 1dfa58d78eba7080a244945c23f7b35b62dde12b Author: Xinrong Meng AuthorDate: Tue Feb 21 02:43:10 2023 + Preparing development version 3.4.1-SNAPSHOT --- R/pkg/DESCRIPTION | 2 +- assembly/pom.xml | 2 +- common/kvstore/pom.xml | 2 +- common/network-common/pom.xml | 2 +- common/network-shuffle/pom.xml | 2 +- common/network-yarn/pom.xml| 2 +- common/sketch/pom.xml | 2 +- common/tags/pom.xml| 2 +- common/unsafe/pom.xml | 2 +- connector/avro/pom.xml | 2 +- connector/connect/client/jvm/pom.xml | 2 +- connector/connect/common/pom.xml | 2 +- connector/connect/server/pom.xml | 2 +- connector/docker-integration-tests/pom.xml | 2 +- connector/kafka-0-10-assembly/pom.xml | 2 +- connector/kafka-0-10-sql/pom.xml | 2 +- connector/kafka-0-10-token-provider/pom.xml| 2 +- connector/kafka-0-10/pom.xml | 2 +- connector/kinesis-asl-assembly/pom.xml | 2 +- connector/kinesis-asl/pom.xml | 2 +- connector/protobuf/pom.xml | 2 +- connector/spark-ganglia-lgpl/pom.xml | 2 +- core/pom.xml | 2 +- docs/_config.yml | 6 +++--- examples/pom.xml | 2 +- graphx/pom.xml | 2 +- hadoop-cloud/pom.xml | 2 +- launcher/pom.xml | 2 +- mllib-local/pom.xml| 2 +- mllib/pom.xml | 2 +- pom.xml| 2 +- python/pyspark/version.py | 2 +- repl/pom.xml | 2 +- resource-managers/kubernetes/core/pom.xml | 2 +- resource-managers/kubernetes/integration-tests/pom.xml | 2 +- resource-managers/mesos/pom.xml| 2 +- resource-managers/yarn/pom.xml | 2 +- sql/catalyst/pom.xml | 2 +- sql/core/pom.xml | 2 +- sql/hive-thriftserver/pom.xml | 2 +- sql/hive/pom.xml | 2 +- streaming/pom.xml | 2 +- tools/pom.xml | 2 +- 43 files changed, 45 insertions(+), 45 deletions(-) diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION index 4a32762b34c..fa7028630a8 100644 --- a/R/pkg/DESCRIPTION +++ b/R/pkg/DESCRIPTION @@ -1,6 +1,6 @@ Package: SparkR Type: Package -Version: 3.4.0 +Version: 3.4.1 Title: R Front End for 'Apache Spark' Description: Provides an R Front end for 'Apache Spark' <https://spark.apache.org>. Authors@R: diff --git a/assembly/pom.xml b/assembly/pom.xml index 58dd9ef46e0..a4111eb64d9 100644 --- a/assembly/pom.xml +++ b/assembly/pom.xml @@ -21,7 +21,7 @@ org.apache.spark spark-parent_2.12 -3.4.0 +3.4.1-SNAPSHOT ../pom.xml diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml index 95ea15552da..f9ecfb3d692 100644 --- a/common/kvstore/pom.xml +++ b/common/kvstore/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.4.0 +3.4.1-SNAPSHOT ../../pom.xml diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml index e4d98471bf9..22ee65b7d25 100644 --- a/common/network-common/pom.xml +++ b/common/network-common/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.4.0 +3.4.1-SNAPSHOT ../../pom.xml diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml index 7a6d5aedf65..2c67da81ca4 100644 --- a/common/network-shuffle/pom.xml +++ b/common/network-shuffle/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.4.0 +3.4.1-SNAPSHOT ../../pom.xml diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml index 1c421754083..219682e047d 100644 --- a/common/network-yarn/pom.xml +++ b/common/network-yarn/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.4.0 +3.4.1-SNAPSHOT ../../pom.xml diff --git a/common/sketch/pom.xml b/common/sketch
[spark] branch branch-3.4 updated (4560d4c4f75 -> 1dfa58d78eb)
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a change to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git

    from 4560d4c4f75  [SPARK-41952][SQL] Fix Parquet zstd off-heap memory leak as a workaround for PARQUET-2160
     add 81d39dcf742  Preparing Spark release v3.4.0-rc1
     new 1dfa58d78eb  Preparing development version 3.4.1-SNAPSHOT

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.

Summary of changes:
[spark] tag v3.4.0-rc1 created (now 81d39dcf742)
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a change to tag v3.4.0-rc1
in repository https://gitbox.apache.org/repos/asf/spark.git

    at 81d39dcf742 (commit)

This tag includes the following new commits:

     new 81d39dcf742  Preparing Spark release v3.4.0-rc1

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.
[spark] 01/01: Preparing Spark release v3.4.0-rc1
This is an automated email from the ASF dual-hosted git repository. xinrong pushed a commit to tag v3.4.0-rc1 in repository https://gitbox.apache.org/repos/asf/spark.git commit 81d39dcf742ed7114d6e01ecc2487825651e30cb Author: Xinrong Meng AuthorDate: Tue Feb 21 02:43:05 2023 + Preparing Spark release v3.4.0-rc1 --- R/pkg/DESCRIPTION | 2 +- assembly/pom.xml | 2 +- common/kvstore/pom.xml | 2 +- common/network-common/pom.xml | 2 +- common/network-shuffle/pom.xml | 2 +- common/network-yarn/pom.xml| 2 +- common/sketch/pom.xml | 2 +- common/tags/pom.xml| 2 +- common/unsafe/pom.xml | 2 +- connector/avro/pom.xml | 2 +- connector/connect/client/jvm/pom.xml | 2 +- connector/connect/common/pom.xml | 2 +- connector/connect/server/pom.xml | 2 +- connector/docker-integration-tests/pom.xml | 2 +- connector/kafka-0-10-assembly/pom.xml | 2 +- connector/kafka-0-10-sql/pom.xml | 2 +- connector/kafka-0-10-token-provider/pom.xml| 2 +- connector/kafka-0-10/pom.xml | 2 +- connector/kinesis-asl-assembly/pom.xml | 2 +- connector/kinesis-asl/pom.xml | 2 +- connector/protobuf/pom.xml | 2 +- connector/spark-ganglia-lgpl/pom.xml | 2 +- core/pom.xml | 2 +- docs/_config.yml | 6 +++--- examples/pom.xml | 2 +- graphx/pom.xml | 2 +- hadoop-cloud/pom.xml | 2 +- launcher/pom.xml | 2 +- mllib-local/pom.xml| 2 +- mllib/pom.xml | 2 +- pom.xml| 2 +- python/pyspark/version.py | 2 +- repl/pom.xml | 2 +- resource-managers/kubernetes/core/pom.xml | 2 +- resource-managers/kubernetes/integration-tests/pom.xml | 2 +- resource-managers/mesos/pom.xml| 2 +- resource-managers/yarn/pom.xml | 2 +- sql/catalyst/pom.xml | 2 +- sql/core/pom.xml | 2 +- sql/hive-thriftserver/pom.xml | 2 +- sql/hive/pom.xml | 2 +- streaming/pom.xml | 2 +- tools/pom.xml | 2 +- 43 files changed, 45 insertions(+), 45 deletions(-) diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION index fa7028630a8..4a32762b34c 100644 --- a/R/pkg/DESCRIPTION +++ b/R/pkg/DESCRIPTION @@ -1,6 +1,6 @@ Package: SparkR Type: Package -Version: 3.4.1 +Version: 3.4.0 Title: R Front End for 'Apache Spark' Description: Provides an R Front end for 'Apache Spark' <https://spark.apache.org>. Authors@R: diff --git a/assembly/pom.xml b/assembly/pom.xml index a4111eb64d9..58dd9ef46e0 100644 --- a/assembly/pom.xml +++ b/assembly/pom.xml @@ -21,7 +21,7 @@ org.apache.spark spark-parent_2.12 -3.4.1-SNAPSHOT +3.4.0 ../pom.xml diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml index f9ecfb3d692..95ea15552da 100644 --- a/common/kvstore/pom.xml +++ b/common/kvstore/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.4.1-SNAPSHOT +3.4.0 ../../pom.xml diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml index 22ee65b7d25..e4d98471bf9 100644 --- a/common/network-common/pom.xml +++ b/common/network-common/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.4.1-SNAPSHOT +3.4.0 ../../pom.xml diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml index 2c67da81ca4..7a6d5aedf65 100644 --- a/common/network-shuffle/pom.xml +++ b/common/network-shuffle/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.4.1-SNAPSHOT +3.4.0 ../../pom.xml diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml index 219682e047d..1c421754083 100644 --- a/common/network-yarn/pom.xml +++ b/common/network-yarn/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.4.1-SNAPSHOT +3.4.0 ../../pom.xml diff --git a/common/sketch/pom.xml b/common/sketch/pom.xml inde
svn commit: r60229 - /dev/spark/v3.4.0-rc1-bin/
Author: xinrong
Date: Tue Feb 21 00:44:12 2023
New Revision: 60229

Log:
Removing RC artifacts.

Removed:
    dev/spark/v3.4.0-rc1-bin/
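Staging under dev/spark is plain Subversion, so withdrawing a candidate is a remote delete: no working copy is needed, because a URL-based `svn rm` commits directly. A sketch only; the dist.apache.org URL is an assumption for illustration, since the revision above records just the repository-relative path:

    import subprocess

    # A remote delete commits immediately, which is what r60229 records.
    # NOTE: the host/path below is assumed, not taken from the notification.
    subprocess.run(
        [
            "svn", "rm",
            "--message", "Removing RC artifacts.",
            "https://dist.apache.org/repos/dist/dev/spark/v3.4.0-rc1-bin/",
        ],
        check=True,
    )

This remove/re-stage cycle is why the same v3.4.0-rc1-bin/ tree appears several times in this digest: each failed candidate is deleted and the next one is uploaded under the same rc name.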
svn commit: r60203 - /dev/spark/v3.4.0-rc1-bin/
Author: xinrong
Date: Mon Feb 20 01:01:42 2023
New Revision: 60203

Log:
Apache Spark v3.4.0-rc1

Added:
    dev/spark/v3.4.0-rc1-bin/
    dev/spark/v3.4.0-rc1-bin/SparkR_3.4.0.tar.gz   (with props)
    dev/spark/v3.4.0-rc1-bin/SparkR_3.4.0.tar.gz.asc
    dev/spark/v3.4.0-rc1-bin/SparkR_3.4.0.tar.gz.sha512
    dev/spark/v3.4.0-rc1-bin/pyspark-3.4.0.tar.gz   (with props)
    dev/spark/v3.4.0-rc1-bin/pyspark-3.4.0.tar.gz.asc
    dev/spark/v3.4.0-rc1-bin/pyspark-3.4.0.tar.gz.sha512
    dev/spark/v3.4.0-rc1-bin/spark-3.4.0-bin-hadoop3-scala2.13.tgz   (with props)
    dev/spark/v3.4.0-rc1-bin/spark-3.4.0-bin-hadoop3-scala2.13.tgz.asc
    dev/spark/v3.4.0-rc1-bin/spark-3.4.0-bin-hadoop3-scala2.13.tgz.sha512
    dev/spark/v3.4.0-rc1-bin/spark-3.4.0-bin-hadoop3.tgz   (with props)
    dev/spark/v3.4.0-rc1-bin/spark-3.4.0-bin-hadoop3.tgz.asc
    dev/spark/v3.4.0-rc1-bin/spark-3.4.0-bin-hadoop3.tgz.sha512
    dev/spark/v3.4.0-rc1-bin/spark-3.4.0-bin-without-hadoop.tgz   (with props)
    dev/spark/v3.4.0-rc1-bin/spark-3.4.0-bin-without-hadoop.tgz.asc
    dev/spark/v3.4.0-rc1-bin/spark-3.4.0-bin-without-hadoop.tgz.sha512
    dev/spark/v3.4.0-rc1-bin/spark-3.4.0.tgz   (with props)
    dev/spark/v3.4.0-rc1-bin/spark-3.4.0.tgz.asc
    dev/spark/v3.4.0-rc1-bin/spark-3.4.0.tgz.sha512

Added: dev/spark/v3.4.0-rc1-bin/SparkR_3.4.0.tar.gz
==============================================================================
Binary file - no diff available.

Propchange: dev/spark/v3.4.0-rc1-bin/SparkR_3.4.0.tar.gz
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: dev/spark/v3.4.0-rc1-bin/SparkR_3.4.0.tar.gz.asc
==============================================================================
--- dev/spark/v3.4.0-rc1-bin/SparkR_3.4.0.tar.gz.asc (added)
+++ dev/spark/v3.4.0-rc1-bin/SparkR_3.4.0.tar.gz.asc Mon Feb 20 01:01:42 2023
@@ -0,0 +1,17 @@
+-----BEGIN PGP SIGNATURE-----
+
+iQJHBAABCgAxFiEEDDPTXhqSlrMs8xAFrNhPIJMLR+gFAmPxf5cTHHhpbnJvbmdA
+YXBhY2hlLm9yZwAKCRCs2E8gkwtH6I6gEACmdKxXlIrG6Nzi7Hv8Xie11LIRzVUP
+59kSQ/bOYEdloW5gx5nLg+Cpcwh+yvgEvT0clvTNznXD4NEDRuS9XyPsRoXos+Ct
+YL/xJo23+3zX1/OGE4P/fi7NXrgC3GmX3KKzpn3RkKuC6QRh6U1R1jlkl896LcHK
+fOcLDuLCAKA6fy+EmlkX6H4sZGGLM5b2gYJcukvbA8bH5kdyWF2mPgprYwVUtryE
+UfciZ9O5BSaawA5fo2MTmaI/9JAN9j1Vnxg+CQVnDN9arnQMp/0PegblyEa7ZRjt
+ww8r/Ylq5F9Yi1wFLLhkgyF7KzLQtO8Bl/ar1UoDhWnTnNaAEUbEtVCN2Wy1E1y/
+BK2nKYzNM3cqXnLXMyXxSVVl6Cx4NXVpDxt94VlvO5S+ijFmyd2DyN2G/MCF9yJg
+IQcad+vVtt6BdXbmFW+lD4eVFtXbX+eKrDPVKLMYCaWyTZkw3aCachSprjJabX0l
+ph4ogML8iOVQiODobKzI+S4EXRMe5KDD9VXAVbN+1jOyTdnU7WYqSWI3rh7BGBwO
+ihwBOHOjI+dkr0awBTmDKMXWaLeUYiDfXqeoVxNtXJ7SptPJcfkd47XpR9Tgw6yU
+oxYMHLMrYYAC6qFMxjWbJz029FJxBvRJCmynQPCd7p0tmPL0qteqGymckjGUv8ko
+TdJcHjdc2+UyeQ==
+=TUhq
+-----END PGP SIGNATURE-----

Added: dev/spark/v3.4.0-rc1-bin/SparkR_3.4.0.tar.gz.sha512
==============================================================================
--- dev/spark/v3.4.0-rc1-bin/SparkR_3.4.0.tar.gz.sha512 (added)
+++ dev/spark/v3.4.0-rc1-bin/SparkR_3.4.0.tar.gz.sha512 Mon Feb 20 01:01:42 2023
@@ -0,0 +1 @@
+38b2b86698d182620785b8f34d6f9a35e0a7f2ae2208e999cece2928ff66d50e75c621ce35189610d830f2475c2c134c3be5d4460050da65da23523d88707ceb SparkR_3.4.0.tar.gz

Added: dev/spark/v3.4.0-rc1-bin/pyspark-3.4.0.tar.gz
==============================================================================
Binary file - no diff available.

Propchange: dev/spark/v3.4.0-rc1-bin/pyspark-3.4.0.tar.gz
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: dev/spark/v3.4.0-rc1-bin/pyspark-3.4.0.tar.gz.asc
==============================================================================
--- dev/spark/v3.4.0-rc1-bin/pyspark-3.4.0.tar.gz.asc (added)
+++ dev/spark/v3.4.0-rc1-bin/pyspark-3.4.0.tar.gz.asc Mon Feb 20 01:01:42 2023
@@ -0,0 +1,17 @@
+-----BEGIN PGP SIGNATURE-----
+
+iQJHBAABCgAxFiEEDDPTXhqSlrMs8xAFrNhPIJMLR+gFAmPxf5wTHHhpbnJvbmdA
+YXBhY2hlLm9yZwAKCRCs2E8gkwtH6F2ZEACP0qBBbAv0z3lbq2Hvn3jeWyZVWbBy
+BVWvfadOOKqKeC9VAgdfY6t6WT8yti0g5Ax+WqmgWHHLgjOKRECTWdlaSqD5m9bh
+ALNphiKafoQjneqkwegNuN4uWNikGQzmCGqJLQG7bGy+9NoO2ib/pN6an4bmIxtb
+uqdglfB7bC+MXB4YKdqyW5LfE1gi3diSXngBdU0p0nBqsDiUcC+gCZPIt8z5AN8i
+c9rNoFrEEZ3jb14335AtkIufP6ebK2YT/1NF/FdirNB1hgtAfIRREi7jzptAuHYt
+jDvuNxo6O2+G80ExbK0z7Ab3Qv3seSzLJYaIalRSAIn+NqH60g9PRv1/80FYLVUv
+VYKKf4Y+KqGn4/rwaxWiUL1ggkbcbay1cpbJWxMc1ARKO1uUaTwjgEPoNEIXg0uU
+VYsQwfS61Tp+wkRLFQ/2yXp5S4kOgI+gyOpe2QVXioJvtgUc3CWCWBOsRvPUOLQt
+wv91pnqu+m7YcUfOmosJvtQudBCT/STz1fnMCug0YygWMj6u5QhTXpbj+UycOVkq
+Q0TvFe+kDsptQWKX2uHlYOvBA8CfzVDeauoDTvEOwx4lxPB1C6GZ1LrD/RTk5SEh
+5r8Wotul5JdbCxHpynqcDruGXBZv2SOa7ChF8q8S6CdrSxLdWWPekt0Q0zzg63cJ
+n4x/dQdcXBDaXA==
+=O8hd
+-----END PGP SIGNATURE-----

Added: dev/spark/v3.4.0-rc1-bin/pyspark-3.4.0.tar.gz.sha512
==============================================================================
--- dev/spark/v3.4.0-rc1-bin/pyspark-3.4.0.tar.gz.sha512 (added)
+++ dev/spark
svn commit: r60202 - /dev/spark/v3.4.0-rc1-bin/
Author: xinrong
Date: Mon Feb 20 00:49:21 2023
New Revision: 60202

Log:
Removing RC artifacts.

Removed:
    dev/spark/v3.4.0-rc1-bin/
[spark] branch branch-3.4 updated (2b54f076794 -> fdbc57aaf43)
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a change to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git

    from 2b54f076794  [SPARK-42430][DOC][FOLLOW-UP] Revise the java doc for TimestampNTZ & ANSI interval types
     add 96cff939031  Preparing Spark release v3.4.0-rc1
     new fdbc57aaf43  Preparing development version 3.4.1-SNAPSHOT

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.

Summary of changes:
[spark] 01/01: Preparing development version 3.4.1-SNAPSHOT
This is an automated email from the ASF dual-hosted git repository. xinrong pushed a commit to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git commit fdbc57aaf431745ced4a1bea4057553e0c939d32 Author: Xinrong Meng AuthorDate: Sat Feb 18 12:12:49 2023 + Preparing development version 3.4.1-SNAPSHOT --- R/pkg/DESCRIPTION | 2 +- assembly/pom.xml | 2 +- common/kvstore/pom.xml | 2 +- common/network-common/pom.xml | 2 +- common/network-shuffle/pom.xml | 2 +- common/network-yarn/pom.xml| 2 +- common/sketch/pom.xml | 2 +- common/tags/pom.xml| 2 +- common/unsafe/pom.xml | 2 +- connector/avro/pom.xml | 2 +- connector/connect/client/jvm/pom.xml | 2 +- connector/connect/common/pom.xml | 2 +- connector/connect/server/pom.xml | 2 +- connector/docker-integration-tests/pom.xml | 2 +- connector/kafka-0-10-assembly/pom.xml | 2 +- connector/kafka-0-10-sql/pom.xml | 2 +- connector/kafka-0-10-token-provider/pom.xml| 2 +- connector/kafka-0-10/pom.xml | 2 +- connector/kinesis-asl-assembly/pom.xml | 2 +- connector/kinesis-asl/pom.xml | 2 +- connector/protobuf/pom.xml | 2 +- connector/spark-ganglia-lgpl/pom.xml | 2 +- core/pom.xml | 2 +- docs/_config.yml | 6 +++--- examples/pom.xml | 2 +- graphx/pom.xml | 2 +- hadoop-cloud/pom.xml | 2 +- launcher/pom.xml | 2 +- mllib-local/pom.xml| 2 +- mllib/pom.xml | 2 +- pom.xml| 2 +- python/pyspark/version.py | 2 +- repl/pom.xml | 2 +- resource-managers/kubernetes/core/pom.xml | 2 +- resource-managers/kubernetes/integration-tests/pom.xml | 2 +- resource-managers/mesos/pom.xml| 2 +- resource-managers/yarn/pom.xml | 2 +- sql/catalyst/pom.xml | 2 +- sql/core/pom.xml | 2 +- sql/hive-thriftserver/pom.xml | 2 +- sql/hive/pom.xml | 2 +- streaming/pom.xml | 2 +- tools/pom.xml | 2 +- 43 files changed, 45 insertions(+), 45 deletions(-) diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION index 4a32762b34c..fa7028630a8 100644 --- a/R/pkg/DESCRIPTION +++ b/R/pkg/DESCRIPTION @@ -1,6 +1,6 @@ Package: SparkR Type: Package -Version: 3.4.0 +Version: 3.4.1 Title: R Front End for 'Apache Spark' Description: Provides an R Front end for 'Apache Spark' <https://spark.apache.org>. Authors@R: diff --git a/assembly/pom.xml b/assembly/pom.xml index 58dd9ef46e0..a4111eb64d9 100644 --- a/assembly/pom.xml +++ b/assembly/pom.xml @@ -21,7 +21,7 @@ org.apache.spark spark-parent_2.12 -3.4.0 +3.4.1-SNAPSHOT ../pom.xml diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml index 95ea15552da..f9ecfb3d692 100644 --- a/common/kvstore/pom.xml +++ b/common/kvstore/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.4.0 +3.4.1-SNAPSHOT ../../pom.xml diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml index e4d98471bf9..22ee65b7d25 100644 --- a/common/network-common/pom.xml +++ b/common/network-common/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.4.0 +3.4.1-SNAPSHOT ../../pom.xml diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml index 7a6d5aedf65..2c67da81ca4 100644 --- a/common/network-shuffle/pom.xml +++ b/common/network-shuffle/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.4.0 +3.4.1-SNAPSHOT ../../pom.xml diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml index 1c421754083..219682e047d 100644 --- a/common/network-yarn/pom.xml +++ b/common/network-yarn/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.4.0 +3.4.1-SNAPSHOT ../../pom.xml diff --git a/common/sketch/pom.xml b/common/sketch
[spark] 01/01: Preparing Spark release v3.4.0-rc1
This is an automated email from the ASF dual-hosted git repository. xinrong pushed a commit to tag v3.4.0-rc1 in repository https://gitbox.apache.org/repos/asf/spark.git commit 96cff93903153a3bcdca02d346daa9d65614d00a Author: Xinrong Meng AuthorDate: Sat Feb 18 12:11:25 2023 + Preparing Spark release v3.4.0-rc1 --- R/pkg/DESCRIPTION | 2 +- assembly/pom.xml | 2 +- common/kvstore/pom.xml | 2 +- common/network-common/pom.xml | 2 +- common/network-shuffle/pom.xml | 2 +- common/network-yarn/pom.xml| 2 +- common/sketch/pom.xml | 2 +- common/tags/pom.xml| 2 +- common/unsafe/pom.xml | 2 +- connector/avro/pom.xml | 2 +- connector/connect/client/jvm/pom.xml | 2 +- connector/connect/common/pom.xml | 2 +- connector/connect/server/pom.xml | 2 +- connector/docker-integration-tests/pom.xml | 2 +- connector/kafka-0-10-assembly/pom.xml | 2 +- connector/kafka-0-10-sql/pom.xml | 2 +- connector/kafka-0-10-token-provider/pom.xml| 2 +- connector/kafka-0-10/pom.xml | 2 +- connector/kinesis-asl-assembly/pom.xml | 2 +- connector/kinesis-asl/pom.xml | 2 +- connector/protobuf/pom.xml | 2 +- connector/spark-ganglia-lgpl/pom.xml | 2 +- core/pom.xml | 2 +- docs/_config.yml | 6 +++--- examples/pom.xml | 2 +- graphx/pom.xml | 2 +- hadoop-cloud/pom.xml | 2 +- launcher/pom.xml | 2 +- mllib-local/pom.xml| 2 +- mllib/pom.xml | 2 +- pom.xml| 2 +- python/pyspark/version.py | 2 +- repl/pom.xml | 2 +- resource-managers/kubernetes/core/pom.xml | 2 +- resource-managers/kubernetes/integration-tests/pom.xml | 2 +- resource-managers/mesos/pom.xml| 2 +- resource-managers/yarn/pom.xml | 2 +- sql/catalyst/pom.xml | 2 +- sql/core/pom.xml | 2 +- sql/hive-thriftserver/pom.xml | 2 +- sql/hive/pom.xml | 2 +- streaming/pom.xml | 2 +- tools/pom.xml | 2 +- 43 files changed, 45 insertions(+), 45 deletions(-) diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION index fa7028630a8..4a32762b34c 100644 --- a/R/pkg/DESCRIPTION +++ b/R/pkg/DESCRIPTION @@ -1,6 +1,6 @@ Package: SparkR Type: Package -Version: 3.4.1 +Version: 3.4.0 Title: R Front End for 'Apache Spark' Description: Provides an R Front end for 'Apache Spark' <https://spark.apache.org>. Authors@R: diff --git a/assembly/pom.xml b/assembly/pom.xml index a4111eb64d9..58dd9ef46e0 100644 --- a/assembly/pom.xml +++ b/assembly/pom.xml @@ -21,7 +21,7 @@ org.apache.spark spark-parent_2.12 -3.4.1-SNAPSHOT +3.4.0 ../pom.xml diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml index f9ecfb3d692..95ea15552da 100644 --- a/common/kvstore/pom.xml +++ b/common/kvstore/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.4.1-SNAPSHOT +3.4.0 ../../pom.xml diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml index 22ee65b7d25..e4d98471bf9 100644 --- a/common/network-common/pom.xml +++ b/common/network-common/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.4.1-SNAPSHOT +3.4.0 ../../pom.xml diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml index 2c67da81ca4..7a6d5aedf65 100644 --- a/common/network-shuffle/pom.xml +++ b/common/network-shuffle/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.4.1-SNAPSHOT +3.4.0 ../../pom.xml diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml index 219682e047d..1c421754083 100644 --- a/common/network-yarn/pom.xml +++ b/common/network-yarn/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.4.1-SNAPSHOT +3.4.0 ../../pom.xml diff --git a/common/sketch/pom.xml b/common/sketch/pom.xml inde