[spark] branch master updated: [SPARK-38153][BUILD] Remove option newlines.topLevelStatements in scalafmt.conf
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 69c2c34  [SPARK-38153][BUILD] Remove option newlines.topLevelStatements in scalafmt.conf

69c2c34 is described below

commit 69c2c34e80a21927c3ca3664c477748ff47a3804
Author: Gengliang Wang
AuthorDate: Tue Feb 8 23:01:24 2022 -0800

    [SPARK-38153][BUILD] Remove option newlines.topLevelStatements in scalafmt.conf

    ### What changes were proposed in this pull request?

    Remove option newlines.topLevelStatements in scalafmt.conf

    ### Why are the changes needed?

    The configuration
    ```
    newlines.topLevelStatements = [before,after]
    ```
    is to add a blank line before the first member or after the last member of the class. This is neither encouraged nor discouraged as per https://github.com/databricks/scala-style-guide#blanklines

    **Without the conf, scalafmt will still add blank lines between consecutive members (or initializers) of a class.**

    As I tried running the script `./dev/scalafmt`, I saw unnecessary blank lines
    ![image](https://user-images.githubusercontent.com/1097932/153122925-3238f15c-312b-4973-8e2d-92978bf5c6ad.png)

    ### Does this PR introduce _any_ user-facing change?

    No

    ### How was this patch tested?

    Manual test

    Closes #35455 from gengliangwang/removeToplevelLine.

    Authored-by: Gengliang Wang
    Signed-off-by: Dongjoon Hyun
---
 dev/.scalafmt.conf | 1 -
 1 file changed, 1 deletion(-)

diff --git a/dev/.scalafmt.conf b/dev/.scalafmt.conf
index 9598540..d2196e6 100644
--- a/dev/.scalafmt.conf
+++ b/dev/.scalafmt.conf
@@ -25,4 +25,3 @@ optIn = {
 danglingParentheses = false
 docstrings = JavaDoc
 maxColumn = 98
-newlines.topLevelStatements = [before,after]

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (fdda0eb -> 8a559b3)
dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from fdda0eb  [SPARK-38147][BUILD][MLLIB] Upgrade `shapeless` to 2.3.7
 add 8a559b3  [SPARK-38149][BUILD] Upgrade joda-time to 2.10.13

No new revisions were added by this update.

Summary of changes:
 dev/deps/spark-deps-hadoop-2-hive-2.3 | 2 +-
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 2 +-
 pom.xml                               | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)
[spark] branch master updated: [SPARK-38147][BUILD][MLLIB] Upgrade `shapeless` to 2.3.7
dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new fdda0eb  [SPARK-38147][BUILD][MLLIB] Upgrade `shapeless` to 2.3.7

fdda0eb is described below

commit fdda0ebc752e6c8d9d71a6a83454c9c072c6d743
Author: Dongjoon Hyun
AuthorDate: Tue Feb 8 20:56:11 2022 -0800

    [SPARK-38147][BUILD][MLLIB] Upgrade `shapeless` to 2.3.7

    ### What changes were proposed in this pull request?

    This PR aims to upgrade `shapeless` to 2.3.7.

    ### Why are the changes needed?

    This will bring the latest bug fixes.
    - https://github.com/milessabin/shapeless/releases/tag/v2.3.7 (Released on May 16, 2021)
    - https://github.com/milessabin/shapeless/releases/tag/v2.3.6 (This is recommended to skip.)
    - https://github.com/milessabin/shapeless/releases/tag/v2.3.5 (This is recommended to skip.)
    - https://github.com/milessabin/shapeless/releases/tag/v2.3.4

    ### Does this PR introduce _any_ user-facing change?

    No. `v2.3.7` is backward binary-compatible with `v2.3.3`.
    - https://github.com/milessabin/shapeless/releases/tag/v2.3.7

    ### How was this patch tested?

    Pass the CIs.

    Closes #35450 from dongjoon-hyun/SPARK-38147.

    Authored-by: Dongjoon Hyun
    Signed-off-by: Dongjoon Hyun
---
 dev/deps/spark-deps-hadoop-2-hive-2.3 | 3 +--
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 3 +--
 pom.xml                               | 5 +
 3 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/dev/deps/spark-deps-hadoop-2-hive-2.3 b/dev/deps/spark-deps-hadoop-2-hive-2.3
index 8284237..63e3f87 100644
--- a/dev/deps/spark-deps-hadoop-2-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-2-hive-2.3
@@ -193,7 +193,6 @@ log4j-core/2.17.1//log4j-core-2.17.1.jar
 log4j-slf4j-impl/2.17.1//log4j-slf4j-impl-2.17.1.jar
 logging-interceptor/3.12.12//logging-interceptor-3.12.12.jar
 lz4-java/1.8.0//lz4-java-1.8.0.jar
-macro-compat_2.12/1.1.1//macro-compat_2.12-1.1.1.jar
 mesos/1.4.3/shaded-protobuf/mesos-1.4.3-shaded-protobuf.jar
 metrics-core/4.2.7//metrics-core-4.2.7.jar
 metrics-graphite/4.2.7//metrics-graphite-4.2.7.jar
@@ -243,7 +242,7 @@ scala-library/2.12.15//scala-library-2.12.15.jar
 scala-parser-combinators_2.12/1.1.2//scala-parser-combinators_2.12-1.1.2.jar
 scala-reflect/2.12.15//scala-reflect-2.12.15.jar
 scala-xml_2.12/1.2.0//scala-xml_2.12-1.2.0.jar
-shapeless_2.12/2.3.3//shapeless_2.12-2.3.3.jar
+shapeless_2.12/2.3.7//shapeless_2.12-2.3.7.jar
 shims/0.9.23//shims-0.9.23.jar
 slf4j-api/1.7.32//slf4j-api-1.7.32.jar
 snakeyaml/1.28//snakeyaml-1.28.jar
diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3
index f169277..88cd560 100644
--- a/dev/deps/spark-deps-hadoop-3-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -179,7 +179,6 @@ log4j-core/2.17.1//log4j-core-2.17.1.jar
 log4j-slf4j-impl/2.17.1//log4j-slf4j-impl-2.17.1.jar
 logging-interceptor/3.12.12//logging-interceptor-3.12.12.jar
 lz4-java/1.8.0//lz4-java-1.8.0.jar
-macro-compat_2.12/1.1.1//macro-compat_2.12-1.1.1.jar
 mesos/1.4.3/shaded-protobuf/mesos-1.4.3-shaded-protobuf.jar
 metrics-core/4.2.7//metrics-core-4.2.7.jar
 metrics-graphite/4.2.7//metrics-graphite-4.2.7.jar
@@ -229,7 +228,7 @@ scala-library/2.12.15//scala-library-2.12.15.jar
 scala-parser-combinators_2.12/1.1.2//scala-parser-combinators_2.12-1.1.2.jar
 scala-reflect/2.12.15//scala-reflect-2.12.15.jar
 scala-xml_2.12/1.2.0//scala-xml_2.12-1.2.0.jar
-shapeless_2.12/2.3.3//shapeless_2.12-2.3.3.jar
+shapeless_2.12/2.3.7//shapeless_2.12-2.3.7.jar
 shims/0.9.23//shims-0.9.23.jar
 slf4j-api/1.7.32//slf4j-api-1.7.32.jar
 snakeyaml/1.28//snakeyaml-1.28.jar
diff --git a/pom.xml b/pom.xml
index f8f13fc..496c370 100644
--- a/pom.xml
+++ b/pom.xml
@@ -1027,6 +1027,11 @@
+      com.chuusai
+      shapeless_${scala.binary.version}
+      2.3.7
+
+
       org.json4s
       json4s-jackson_${scala.binary.version}
       3.7.0-M11
[spark] branch master updated (0af4fc8 -> de5e45a)
dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 0af4fc8  [SPARK-37414][PYTHON][ML] Inline type hints for pyspark.ml.tuning
 add de5e45a  [SPARK-38086][SQL] Make ArrowColumnVector Extendable

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/vectorized/ArrowColumnVector.java | 59 +-
 .../sql/vectorized/ArrowColumnVectorSuite.scala | 31 
 2 files changed, 65 insertions(+), 25 deletions(-)
[spark] branch branch-3.1 updated (1aa9ef0 -> 66e73c4)
dongjoon pushed a change to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 1aa9ef0  [SPARK-38144][CORE] Remove unused `spark.storage.safetyFraction` config
 add 66e73c4  [SPARK-38030][SQL][3.1] Canonicalization should not remove nullability of AttributeReference dataType

No new revisions were added by this update.

Summary of changes:
 .../sql/catalyst/expressions/Canonicalize.scala  |  7 +++
 .../catalyst/expressions/CanonicalizeSuite.scala | 15 +-
 .../adaptive/AdaptiveQueryExecSuite.scala        | 24 --
 3 files changed, 39 insertions(+), 7 deletions(-)
[spark] branch branch-3.2 updated (ba3c8c5 -> d62735d)
dongjoon pushed a change to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git.

from ba3c8c5  [SPARK-38144][CORE] Remove unused `spark.storage.safetyFraction` config
 add d62735d  [SPARK-38030][SQL][3.2] Canonicalization should not remove nullability of AttributeReference dataType

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/expressions/Canonicalize.scala |  7 +++
 .../sql/catalyst/expressions/CanonicalizeSuite.scala  | 15 ++-
 2 files changed, 17 insertions(+), 5 deletions(-)
[spark] branch master updated: [SPARK-37414][PYTHON][ML] Inline type hints for pyspark.ml.tuning
zero323 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 0af4fc8  [SPARK-37414][PYTHON][ML] Inline type hints for pyspark.ml.tuning

0af4fc8 is described below

commit 0af4fc8ea837d9d7f7f023e2eb3b9807fb073db1
Author: zero323
AuthorDate: Wed Feb 9 03:16:20 2022 +0100

    [SPARK-37414][PYTHON][ML] Inline type hints for pyspark.ml.tuning

    ### What changes were proposed in this pull request?

    This PR migrates `pyspark.ml.tuning` type annotations from stub file to inline type hints.

    ### Why are the changes needed?

    Part of ongoing migration of type hints.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Existing tests.

    Closes #35406 from zero323/SPARK-37414.

    Authored-by: zero323
    Signed-off-by: zero323
---
 python/pyspark/ml/base.py           |   3 +-
 python/pyspark/ml/param/__init__.py |   2 +-
 python/pyspark/ml/tuning.py         | 476 ++--
 python/pyspark/ml/tuning.pyi        | 228 -
 python/pyspark/ml/util.py           |   7 +-
 5 files changed, 302 insertions(+), 414 deletions(-)

diff --git a/python/pyspark/ml/base.py b/python/pyspark/ml/base.py
index 9e8252d..20540eb 100644
--- a/python/pyspark/ml/base.py
+++ b/python/pyspark/ml/base.py
@@ -24,7 +24,6 @@ from typing import (
     Any,
     Callable,
     Generic,
-    Iterable,
     Iterator,
     List,
     Optional,
@@ -133,7 +132,7 @@ class Estimator(Params, Generic[M], metaclass=ABCMeta):
     def fitMultiple(
         self, dataset: DataFrame, paramMaps: Sequence["ParamMap"]
-    ) -> Iterable[Tuple[int, M]]:
+    ) -> Iterator[Tuple[int, M]]:
         """
         Fits a model to the input dataset for each param map in `paramMaps`.
diff --git a/python/pyspark/ml/param/__init__.py b/python/pyspark/ml/param/__init__.py
index ee2c289..6c223c6 100644
--- a/python/pyspark/ml/param/__init__.py
+++ b/python/pyspark/ml/param/__init__.py
@@ -569,7 +569,7 @@ class Params(Identifiable, metaclass=ABCMeta):
         to._set(**{param.name: paramMap[param]})
         return to

-    def _resetUid(self: "P", newUid: Any) -> "P":
+    def _resetUid(self: P, newUid: Any) -> P:
         """
         Changes the uid of this instance. This updates both
         the stored uid and the parent uid of params and param maps.
diff --git a/python/pyspark/ml/tuning.py b/python/pyspark/ml/tuning.py
index 47805c9..9fae5fe 100644
--- a/python/pyspark/ml/tuning.py
+++ b/python/pyspark/ml/tuning.py
@@ -20,12 +20,28 @@ import sys
 import itertools
 from multiprocessing.pool import ThreadPool

+from typing import (
+    Any,
+    Callable,
+    Dict,
+    Iterable,
+    List,
+    Optional,
+    Sequence,
+    Tuple,
+    Type,
+    Union,
+    cast,
+    overload,
+    TYPE_CHECKING,
+)
+
 import numpy as np

 from pyspark import keyword_only, since, SparkContext, inheritable_thread_target
 from pyspark.ml import Estimator, Transformer, Model
 from pyspark.ml.common import inherit_doc, _py2java, _java2py
-from pyspark.ml.evaluation import Evaluator
+from pyspark.ml.evaluation import Evaluator, JavaEvaluator
 from pyspark.ml.param import Params, Param, TypeConverters
 from pyspark.ml.param.shared import HasCollectSubModels, HasParallelism, HasSeed
 from pyspark.ml.util import (
@@ -43,6 +59,13 @@ from pyspark.ml.wrapper import JavaParams, JavaEstimator, JavaWrapper
 from pyspark.sql.functions import col, lit, rand, UserDefinedFunction
 from pyspark.sql.types import BooleanType

+from pyspark.sql.dataframe import DataFrame
+
+if TYPE_CHECKING:
+    from pyspark.ml._typing import ParamMap
+    from py4j.java_gateway import JavaObject  # type: ignore[import]
+    from py4j.java_collections import JavaArray  # type: ignore[import]
+

 __all__ = [
     "ParamGridBuilder",
     "CrossValidator",
@@ -52,7 +75,14 @@ __all__ = [
 ]


-def _parallelFitTasks(est, train, eva, validation, epm, collectSubModel):
+def _parallelFitTasks(
+    est: Estimator,
+    train: DataFrame,
+    eva: Evaluator,
+    validation: DataFrame,
+    epm: Sequence["ParamMap"],
+    collectSubModel: bool,
+) -> List[Callable[[], Tuple[int, float, Transformer]]]:
     """
     Creates a list of callables which can be called from different threads to fit and evaluate
     an estimator in parallel. Each callable returns an `(index, metric)` pair.
@@ -79,7 +109,7 @@ def _parallelFitTasks(est, train, eva, validation, epm, collectSubModel):
     """
     modelIter = est.fitMultiple(train, epm)

-    def singleTask():
+    def singleTask() -> Tuple[int, float, Transformer]:
         index, model = next(modelIter)
         # TODO: duplicate evaluator to take extra params from input
         # Note: Supporting tuning params in
[spark] branch master updated: [SPARK-37404][PYTHON][ML] Inline type hints for pyspark.ml.evaluation.py
zero323 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 2a9416a  [SPARK-37404][PYTHON][ML] Inline type hints for pyspark.ml.evaluation.py

2a9416a is described below

commit 2a9416abd5c8d83901c09a08433bf91e649dc43f
Author: zero323
AuthorDate: Wed Feb 9 01:32:57 2022 +0100

    [SPARK-37404][PYTHON][ML] Inline type hints for pyspark.ml.evaluation.py

    ### What changes were proposed in this pull request?

    This PR migrates `pyspark.ml.evaluation` type annotations from stub file to inline type hints.

    ### Why are the changes needed?

    Part of ongoing migration of type hints.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Existing tests.

    Closes #35403 from zero323/SPARK-37404.

    Authored-by: zero323
    Signed-off-by: zero323
---
 python/pyspark/ml/_typing.pyi                      |   2 +
 python/pyspark/ml/evaluation.py                    | 349 -
 python/pyspark/ml/evaluation.pyi                   | 277 
 python/pyspark/ml/tests/typing/test_evaluation.yml |   2 +
 4 files changed, 207 insertions(+), 423 deletions(-)

diff --git a/python/pyspark/ml/_typing.pyi b/python/pyspark/ml/_typing.pyi
index 7862078..12d831f 100644
--- a/python/pyspark/ml/_typing.pyi
+++ b/python/pyspark/ml/_typing.pyi
@@ -71,6 +71,8 @@ MultilabelClassificationEvaluatorMetricType = Union[
     Literal["microF1Measure"],
 ]
 ClusteringEvaluatorMetricType = Literal["silhouette"]
+ClusteringEvaluatorDistanceMeasureType = Union[Literal["squaredEuclidean"], Literal["cosine"]]
+
 RankingEvaluatorMetricType = Union[
     Literal["meanAveragePrecision"],
     Literal["meanAveragePrecisionAtK"],
diff --git a/python/pyspark/ml/evaluation.py b/python/pyspark/ml/evaluation.py
index be63a8f..ff0e5b9 100644
--- a/python/pyspark/ml/evaluation.py
+++ b/python/pyspark/ml/evaluation.py
@@ -18,6 +18,8 @@ import sys
 from abc import abstractmethod, ABCMeta

+from typing import Any, Dict, Optional, TYPE_CHECKING
+
 from pyspark import since, keyword_only
 from pyspark.ml.wrapper import JavaParams
 from pyspark.ml.param import Param, Params, TypeConverters
@@ -31,6 +33,20 @@ from pyspark.ml.param.shared import (
 )
 from pyspark.ml.common import inherit_doc
 from pyspark.ml.util import JavaMLReadable, JavaMLWritable
+from pyspark.sql.dataframe import DataFrame
+
+if TYPE_CHECKING:
+    from pyspark.ml._typing import (
+        ParamMap,
+        BinaryClassificationEvaluatorMetricType,
+        ClusteringEvaluatorDistanceMeasureType,
+        ClusteringEvaluatorMetricType,
+        MulticlassClassificationEvaluatorMetricType,
+        MultilabelClassificationEvaluatorMetricType,
+        RankingEvaluatorMetricType,
+        RegressionEvaluatorMetricType,
+    )
+

 __all__ = [
     "Evaluator",
@@ -54,7 +70,7 @@ class Evaluator(Params, metaclass=ABCMeta):
         pass

     @abstractmethod
-    def _evaluate(self, dataset):
+    def _evaluate(self, dataset: DataFrame) -> float:
         """
         Evaluates the output.
@@ -70,7 +86,7 @@ class Evaluator(Params, metaclass=ABCMeta):
         """
         raise NotImplementedError()

-    def evaluate(self, dataset, params=None):
+    def evaluate(self, dataset: DataFrame, params: Optional["ParamMap"] = None) -> float:
         """
         Evaluates the output with optional parameters.
@@ -99,7 +115,7 @@ class Evaluator(Params, metaclass=ABCMeta):
             raise TypeError("Params must be a param map but got %s." % type(params))

     @since("1.5.0")
-    def isLargerBetter(self):
+    def isLargerBetter(self) -> bool:
         """
         Indicates whether the metric returned by :py:meth:`evaluate` should be maximized
         (True, default) or minimized (False).
@@ -115,7 +131,7 @@ class JavaEvaluator(JavaParams, Evaluator, metaclass=ABCMeta):
     implementations.
     """

-    def _evaluate(self, dataset):
+    def _evaluate(self, dataset: DataFrame) -> float:
         """
         Evaluates the output.
@@ -130,16 +146,23 @@ class JavaEvaluator(JavaParams, Evaluator, metaclass=ABCMeta):
             evaluation metric
         """
         self._transfer_params_to_java()
+        assert self._java_obj is not None
         return self._java_obj.evaluate(dataset._jdf)

-    def isLargerBetter(self):
+    def isLargerBetter(self) -> bool:
         self._transfer_params_to_java()
+        assert self._java_obj is not None
         return self._java_obj.isLargerBetter()


 @inherit_doc
 class BinaryClassificationEvaluator(
-    JavaEvaluator, HasLabelCol, HasRawPredictionCol, HasWeightCol, JavaMLReadable, JavaMLWritable
+    JavaEvaluator,
+    HasLabelCol,
+    HasRawPredictionCol,
+    HasWeightCol,
+
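The `_typing.pyi` change above adds a `Literal`-based alias so that static checkers reject unknown distance-measure strings at the call site. A standalone sketch of the same idea follows; the alias names mirror the stub, but `set_distance_measure` is a hypothetical validation helper, not a pyspark API. (Type checkers treat the stub's `Union[Literal[...], Literal[...]]` as equivalent to the single multi-value `Literal` used here.)

```python
from typing import Literal, get_args

# Illustrative aliases in the style of pyspark.ml._typing:
ClusteringEvaluatorMetricType = Literal["silhouette"]
ClusteringEvaluatorDistanceMeasureType = Literal["squaredEuclidean", "cosine"]


def set_distance_measure(value: ClusteringEvaluatorDistanceMeasureType) -> str:
    # Literal is not enforced at runtime, so validate explicitly here;
    # a static checker such as mypy flags bad values at the call site.
    allowed = get_args(ClusteringEvaluatorDistanceMeasureType)
    if value not in allowed:
        raise ValueError(f"distanceMeasure must be one of {allowed}, got {value!r}")
    return value


print(set_distance_measure("cosine"))  # cosine
```

With this in place, `set_distance_measure("manhattan")` raises a `ValueError` at runtime and is also rejected by mypy before the code ever runs, which is the payoff of encoding the allowed metric strings in the type instead of only in docstrings.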
[spark] branch master updated (cc53a0e -> 8d2e08f)
viirya pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from cc53a0e  [SPARK-38144][CORE] Remove unused `spark.storage.safetyFraction` config
 add 8d2e08f  [SPARK-38069][SQL][SS] Improve the calculation of time window

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/analysis/Analyzer.scala | 21 +
 1 file changed, 9 insertions(+), 12 deletions(-)
[spark] branch branch-3.1 updated (0b35c24 -> 1aa9ef0)
dongjoon pushed a change to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 0b35c24  Preparing development version 3.1.4-SNAPSHOT
 add 1aa9ef0  [SPARK-38144][CORE] Remove unused `spark.storage.safetyFraction` config

No new revisions were added by this update.

Summary of changes:
 core/src/main/scala/org/apache/spark/internal/config/package.scala | 5 -
 1 file changed, 5 deletions(-)
[spark] branch branch-3.2 updated (f58daa7 -> ba3c8c5)
dongjoon pushed a change to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git.

from f58daa7  [SPARK-38142][SQL][TESTS] Move `ArrowColumnVectorSuite` to `org.apache.spark.sql.vectorized`
 add ba3c8c5  [SPARK-38144][CORE] Remove unused `spark.storage.safetyFraction` config

No new revisions were added by this update.

Summary of changes:
 core/src/main/scala/org/apache/spark/internal/config/package.scala | 5 -
 1 file changed, 5 deletions(-)
[spark] branch master updated (43cce92 -> cc53a0e)
dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 43cce92  [SPARK-38124][SQL][SS] Introduce StatefulOpClusteredDistribution and apply to stream-stream join
 add cc53a0e  [SPARK-38144][CORE] Remove unused `spark.storage.safetyFraction` config

No new revisions were added by this update.

Summary of changes:
 core/src/main/scala/org/apache/spark/internal/config/package.scala | 5 -
 1 file changed, 5 deletions(-)
[spark] branch master updated (5b02a34 -> 43cce92)
kabhwan pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 5b02a34  [SPARK-38142][SQL][TESTS] Move `ArrowColumnVectorSuite` to `org.apache.spark.sql.vectorized`
 add 43cce92  [SPARK-38124][SQL][SS] Introduce StatefulOpClusteredDistribution and apply to stream-stream join

No new revisions were added by this update.

Summary of changes:
 .../sql/catalyst/plans/physical/partitioning.scala | 40 ++
 .../streaming/StreamingSymmetricHashJoinExec.scala |  4 +--
 .../spark/sql/streaming/StreamingJoinSuite.scala   |  2 +-
 3 files changed, 43 insertions(+), 3 deletions(-)
[spark] branch branch-3.2 updated (adba516 -> f58daa7)
dongjoon pushed a change to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git.

from adba516  [SPARK-37934][BUILD][3.2] Upgrade Jetty version to 9.4.44
 add f58daa7  [SPARK-38142][SQL][TESTS] Move `ArrowColumnVectorSuite` to `org.apache.spark.sql.vectorized`

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/{execution => }/vectorized/ArrowColumnVectorSuite.scala | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)
 rename sql/core/src/test/scala/org/apache/spark/sql/{execution => }/vectorized/ArrowColumnVectorSuite.scala (99%)
[spark] branch master updated (899d3bb -> 5b02a34)
dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 899d3bb  [SPARK-34183][SS] DataSource V2: Required distribution and ordering in micro-batch execution
 add 5b02a34  [SPARK-38142][SQL][TESTS] Move `ArrowColumnVectorSuite` to `org.apache.spark.sql.vectorized`

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/{execution => }/vectorized/ArrowColumnVectorSuite.scala | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)
 rename sql/core/src/test/scala/org/apache/spark/sql/{execution => }/vectorized/ArrowColumnVectorSuite.scala (99%)
[spark] branch master updated (c69f08f8 -> 899d3bb)
kabhwan pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from c69f08f8  [SPARK-34826][SHUFFLE] Adaptively fetch shuffle mergers for push based shuffle
 add 899d3bb  [SPARK-34183][SS] DataSource V2: Required distribution and ordering in micro-batch execution

No new revisions were added by this update.

Summary of changes:
 .../write/RequiresDistributionAndOrdering.java     |  10 +
 .../spark/sql/errors/QueryCompilationErrors.scala  |   5 +
 .../sql/connector/catalog/InMemoryTable.scala      |   5 +-
 .../sql/execution/datasources/v2/V2Writes.scala    |  46 +++-
 .../execution/streaming/MicroBatchExecution.scala  |  11 +-
 .../sql/execution/streaming/StreamExecution.scala  |  12 +-
 .../streaming/continuous/ContinuousExecution.scala |  33 ++-
 .../sources/WriteToMicroBatchDataSource.scala      |  20 +-
 .../WriteDistributionAndOrderingSuite.scala        | 294 -
 9 files changed, 403 insertions(+), 33 deletions(-)
[spark] branch branch-3.2 updated: [SPARK-37934][BUILD][3.2] Upgrade Jetty version to 9.4.44
sarutak pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.2 by this push:
     new adba516  [SPARK-37934][BUILD][3.2] Upgrade Jetty version to 9.4.44

adba516 is described below

commit adba5165a56bd4e7a71fcad77c568c0cbc2e7f97
Author: Jack Richard Buggins
AuthorDate: Wed Feb 9 02:28:03 2022 +0900

    [SPARK-37934][BUILD][3.2] Upgrade Jetty version to 9.4.44

    ### What changes were proposed in this pull request?

    This pull request provides a minor update to the Jetty version, from `9.4.43.v20210629` to `9.4.44.v20210927`, which is required against branch-3.2 to fully resolve https://issues.apache.org/jira/browse/SPARK-37934

    ### Why are the changes needed?

    As discussed in https://github.com/apache/spark/pull/35338, the DoS vector is available even within a private or restricted network. The result below is the output of a Twistlock scan, which also detects this vulnerability.

    ```
    Source: https://github.com/eclipse/jetty.project/issues/6973
    CVE: PRISMA-2021-0182
    Sev.: medium
    Package Name: org.eclipse.jetty_jetty-server
    Package Ver.: 9.4.43.v20210629
    Status: fixed in 9.4.44
    Description: org.eclipse.jetty_jetty-server package versions before 9.4.44 are
    vulnerable to DoS (Denial of Service). Logback-access calls
    Request.getParameterNames() for request logging. That will force a request body
    read (if it hasn't been read before) per the servlet. This will now consume
    resources to read the request body content, which could easily be malicious
    (in size? in keys? etc), even though the application intentionally didn't read
    the request body.
    ```

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    * Core local
    ```
    $ build/sbt
    > project core
    > test
    ```
    * CI

    Closes #35442 from JackBuggins/branch-3.2.

    Authored-by: Jack Richard Buggins
    Signed-off-by: Kousuke Saruta
---
 pom.xml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/pom.xml b/pom.xml
index bc3f925..8af3d6a 100644
--- a/pom.xml
+++ b/pom.xml
@@ -138,7 +138,7 @@
     10.14.2.0
     1.12.2
     1.6.13
-    9.4.43.v20210629
+    9.4.44.v20210927
     4.0.3
     0.10.0
     2.5.0
[spark] branch master updated: [SPARK-34826][SHUFFLE] Adaptively fetch shuffle mergers for push based shuffle
mridulm80 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new c69f08f8  [SPARK-34826][SHUFFLE] Adaptively fetch shuffle mergers for push based shuffle

c69f08f8 is described below

commit c69f08f81042c3ecca4b5dfa5511c1217ae88096
Author: Venkata krishnan Sowrirajan
AuthorDate: Tue Feb 8 11:24:15 2022 -0600

    [SPARK-34826][SHUFFLE] Adaptively fetch shuffle mergers for push based shuffle

    ### What changes were proposed in this pull request?

    Currently shuffle mergers are fetched before the start of the ShuffleMapStage. But for initial stages this can be problematic, as shuffle mergers are nothing but unique hosts with shuffle services running, which could be very few based on executors, and this can cause the merge ratio to be low. With this approach, `ShuffleMapTask` queries for merger locations if they are not yet available and, once they are, starts using them for pushing the blocks. Since partitions are mapped uniquely to a merger location, it should be fine to not push for the earlier set of tasks. This should improve the merge ratio even for initial stages.

    ### Why are the changes needed?

    Performance improvement. No new API changes.

    ### Does this PR introduce _any_ user-facing change?

    No

    ### How was this patch tested?

    Added unit tests; this has also been working in our internal production environment for a while now.

    Closes #34122 from venkata91/SPARK-34826.

    Authored-by: Venkata krishnan Sowrirajan
    Signed-off-by: Mridul Muralidharan gmail.com>
---
 .../main/scala/org/apache/spark/Dependency.scala   |  39 ++--
 .../scala/org/apache/spark/MapOutputTracker.scala  |  88 +++-
 .../org/apache/spark/scheduler/DAGScheduler.scala  |  86 ++--
 .../org/apache/spark/scheduler/StageInfo.scala     |  16 +-
 .../spark/shuffle/ShuffleWriteProcessor.scala      |  14 +-
 .../spark/shuffle/sort/SortShuffleManager.scala    |   2 +-
 .../org/apache/spark/MapOutputTrackerSuite.scala   |  30 ++-
 .../apache/spark/scheduler/DAGSchedulerSuite.scala | 224 +++--
 8 files changed, 440 insertions(+), 59 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/Dependency.scala b/core/src/main/scala/org/apache/spark/Dependency.scala
index 8e348ee..fbb92b4 100644
--- a/core/src/main/scala/org/apache/spark/Dependency.scala
+++ b/core/src/main/scala/org/apache/spark/Dependency.scala
@@ -104,15 +104,17 @@ class ShuffleDependency[K: ClassTag, V: ClassTag, C: ClassTag](

   private[this] val numPartitions = rdd.partitions.length

-  // By default, shuffle merge is enabled for ShuffleDependency if push based shuffle
+  // By default, shuffle merge is allowed for ShuffleDependency if push based shuffle
   // is enabled
-  private[this] var _shuffleMergeEnabled = canShuffleMergeBeEnabled()
+  private[this] var _shuffleMergeAllowed = canShuffleMergeBeEnabled()

-  private[spark] def setShuffleMergeEnabled(shuffleMergeEnabled: Boolean): Unit = {
-    _shuffleMergeEnabled = shuffleMergeEnabled
+  private[spark] def setShuffleMergeAllowed(shuffleMergeAllowed: Boolean): Unit = {
+    _shuffleMergeAllowed = shuffleMergeAllowed
   }

-  def shuffleMergeEnabled : Boolean = _shuffleMergeEnabled
+  def shuffleMergeEnabled : Boolean = shuffleMergeAllowed && mergerLocs.nonEmpty
+
+  def shuffleMergeAllowed : Boolean = _shuffleMergeAllowed

   /**
    * Stores the location of the list of chosen external shuffle services for handling the
@@ -124,7 +126,7 @@ class ShuffleDependency[K: ClassTag, V: ClassTag, C: ClassTag](
    * Stores the information about whether the shuffle merge is finalized for the shuffle map stage
    * associated with this shuffle dependency
    */
-  private[this] var _shuffleMergedFinalized: Boolean = false
+  private[this] var _shuffleMergeFinalized: Boolean = false

   /**
    * shuffleMergeId is used to uniquely identify merging process of shuffle
@@ -135,31 +137,34 @@ class ShuffleDependency[K: ClassTag, V: ClassTag, C: ClassTag](
   def shuffleMergeId: Int = _shuffleMergeId

   def setMergerLocs(mergerLocs: Seq[BlockManagerId]): Unit = {
+    assert(shuffleMergeAllowed)
     this.mergerLocs = mergerLocs
   }

   def getMergerLocs: Seq[BlockManagerId] = mergerLocs

   private[spark] def markShuffleMergeFinalized(): Unit = {
-    _shuffleMergedFinalized = true
+    _shuffleMergeFinalized = true
+  }
+
+  private[spark] def isShuffleMergeFinalizedMarked: Boolean = {
+    _shuffleMergeFinalized
   }

   /**
-   * Returns true if push-based shuffle is disabled for this stage or empty RDD,
-   * or if the shuffle merge for this stage is finalized, i.e. the shuffle merge
-   * results for all partitions are available.
+   * Returns true if push-based shuffle is disabled or if the shuffle merge for
+   *
[spark] branch master updated (3d736d9 -> 6115f58)
sarutak pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 3d736d9  [SPARK-37412][PYTHON][ML] Inline typehints for pyspark.ml.stat
 add 6115f58  [MINOR][SQL] Remove redundant array creation in UnsafeRow

No new revisions were added by this update.

Summary of changes:
 .../java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
[spark] branch master updated: [SPARK-37412][PYTHON][ML] Inline typehints for pyspark.ml.stat
This is an automated email from the ASF dual-hosted git repository.

zero323 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 3d736d9  [SPARK-37412][PYTHON][ML] Inline typehints for pyspark.ml.stat
3d736d9 is described below

commit 3d736d978cdaf345d81833a6216d41464fa86c69
Author: zero323
AuthorDate: Tue Feb 8 12:46:30 2022 +0100

    [SPARK-37412][PYTHON][ML] Inline typehints for pyspark.ml.stat

    ### What changes were proposed in this pull request?

    This PR migrates `pyspark.ml.stat` type annotations from stub file to inline type hints.
    (second take, after issue resulting in reversion of #35401)

    ### Why are the changes needed?

    Part of ongoing migration of type hints.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Existing tests.

    Closes #35437 from zero323/SPARK-37412.

    Authored-by: zero323
    Signed-off-by: zero323
---
 python/pyspark/ml/stat.py  | 62 ++-
 python/pyspark/ml/stat.pyi | 73 --
 2 files changed, 41 insertions(+), 94 deletions(-)

diff --git a/python/pyspark/ml/stat.py b/python/pyspark/ml/stat.py
index 15bb6ca..3b3588c 100644
--- a/python/pyspark/ml/stat.py
+++ b/python/pyspark/ml/stat.py
@@ -17,12 +17,20 @@
 
 import sys
 
+from typing import Optional, Tuple, TYPE_CHECKING
+
 from pyspark import since, SparkContext
 from pyspark.ml.common import _java2py, _py2java
-from pyspark.ml.wrapper import JavaWrapper, _jvm
+from pyspark.ml.linalg import Matrix, Vector
+from pyspark.ml.wrapper import JavaWrapper, _jvm  # type: ignore[attr-defined]
 from pyspark.sql.column import Column, _to_seq
+from pyspark.sql.dataframe import DataFrame
 from pyspark.sql.functions import lit
 
+if TYPE_CHECKING:
+    from py4j.java_gateway import JavaObject  # type: ignore[import]
+
 
 class ChiSquareTest:
     """
@@ -37,7 +45,9 @@ class ChiSquareTest:
     """
 
     @staticmethod
-    def test(dataset, featuresCol, labelCol, flatten=False):
+    def test(
+        dataset: DataFrame, featuresCol: str, labelCol: str, flatten: bool = False
+    ) -> DataFrame:
         """
         Perform a Pearson's independence test using dataset.
@@ -95,6 +105,8 @@ class ChiSquareTest:
         4.0
         """
         sc = SparkContext._active_spark_context
+        assert sc is not None
+
         javaTestObj = _jvm().org.apache.spark.ml.stat.ChiSquareTest
         args = [_py2java(sc, arg) for arg in (dataset, featuresCol, labelCol, flatten)]
         return _java2py(sc, javaTestObj.test(*args))
@@ -116,7 +128,7 @@ class Correlation:
     """
 
     @staticmethod
-    def corr(dataset, column, method="pearson"):
+    def corr(dataset: DataFrame, column: str, method: str = "pearson") -> DataFrame:
         """
         Compute the correlation matrix with specified method using dataset.
@@ -162,6 +174,8 @@ class Correlation:
                [ 0.4       ,  0.9486... ,  NaN,  1.        ]])
         """
         sc = SparkContext._active_spark_context
+        assert sc is not None
+
         javaCorrObj = _jvm().org.apache.spark.ml.stat.Correlation
         args = [_py2java(sc, arg) for arg in (dataset, column, method)]
         return _java2py(sc, javaCorrObj.corr(*args))
@@ -181,7 +195,7 @@ class KolmogorovSmirnovTest:
     """
 
     @staticmethod
-    def test(dataset, sampleCol, distName, *params):
+    def test(dataset: DataFrame, sampleCol: str, distName: str, *params: float) -> DataFrame:
         """
         Conduct a one-sample, two-sided Kolmogorov-Smirnov test for probability distribution
         equality. Currently supports the normal distribution, taking as parameters the mean and
@@ -228,9 +242,11 @@ class KolmogorovSmirnovTest:
         0.175
         """
         sc = SparkContext._active_spark_context
+        assert sc is not None
+
         javaTestObj = _jvm().org.apache.spark.ml.stat.KolmogorovSmirnovTest
         dataset = _py2java(sc, dataset)
-        params = [float(param) for param in params]
+        params = [float(param) for param in params]  # type: ignore[assignment]
         return _java2py(
             sc, javaTestObj.test(dataset, sampleCol, distName, _jvm().PythonUtils.toSeq(params))
         )
@@ -284,7 +300,7 @@ class Summarizer:
 
     @staticmethod
     @since("2.4.0")
-    def mean(col, weightCol=None):
+    def mean(col: Column, weightCol: Optional[Column] = None) -> Column:
         """
         return a column of mean summary
         """
@@ -292,7 +308,7 @@ class Summarizer:
 
     @staticmethod
     @since("3.0.0")
-    def sum(col, weightCol=None):
+    def sum(col: Column, weightCol: Optional[Column] = None) -> Column:
         """
         return a column of sum summary
         """
@@ -300,7 +316,7 @@ class
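The recurring pattern in this diff is worth noting: `SparkContext._active_spark_context` is typed `Optional`, so each call site adds `assert sc is not None` to narrow it for the type checker before use. A minimal self-contained sketch of the same pattern (hypothetical module, not the real `pyspark.ml.stat`):

```python
from typing import Optional


class Context:
    """Stand-in for an object with an Optional class-level 'active instance'."""

    _active: Optional["Context"] = None

    @classmethod
    def get(cls) -> Optional["Context"]:
        return cls._active


def describe(name: str, ctx: Optional[Context] = None) -> str:
    active = ctx or Context.get()
    # The assert narrows Optional[Context] to Context for mypy,
    # exactly as `assert sc is not None` does in stat.py.
    assert active is not None
    return f"{name} via {type(active).__name__}"


Context._active = Context()
print(describe("ChiSquareTest"))  # ChiSquareTest via Context
```

Without the assert, a strict type checker would reject attribute access on the `Optional` value; with it, the runtime behavior is unchanged unless the context is genuinely missing.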
[spark] branch master updated: [SPARK-38136][INFRA][TESTS] Update GitHub Action test image and PyArrow dependency
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 6b62c30  [SPARK-38136][INFRA][TESTS] Update GitHub Action test image and PyArrow dependency
6b62c30 is described below

commit 6b62c30fa686dfca812b19f42e0333a0ddde2791
Author: Dongjoon Hyun
AuthorDate: Tue Feb 8 18:53:12 2022 +0900

    [SPARK-38136][INFRA][TESTS] Update GitHub Action test image and PyArrow dependency

    ### What changes were proposed in this pull request?

    This PR aims to update the `GitHub Action` test docker image to make the test environment
    up-to-date. For example, use `PyArrow 7.0.0` instead of `6.0.0`. In addition, `Python 3.8`'s
    `PyArrow` installation is also updated together to be consistent.
    Please note that this aims to upgrade the test infra instead of Spark itself.

    ### Why are the changes needed?

    | SW      | 20211228  | 20220207  |
    | ------- | --------- | --------- |
    | OpenJDK | 1.8.0_292 | 1.8.0_312 |
    | numpy   | 1.21.4    | 1.22.2    |
    | pandas  | 1.3.4     | 1.3.5     |
    | pyarrow | 6.0.0     | 7.0.0     |
    | scipy   | 1.7.2     | 1.8.0     |

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    - Pass the GitHub Action with the new image.
    - Check the package list in the GitHub Action log, or check the docker image directly.

    Closes #35434 from dongjoon-hyun/SPARK-38136.

    Authored-by: Dongjoon Hyun
    Signed-off-by: Hyukjin Kwon
---
 .github/workflows/build_and_test.yml | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/.github/workflows/build_and_test.yml b/.github/workflows/build_and_test.yml
index 4529cd9..ae35f50 100644
--- a/.github/workflows/build_and_test.yml
+++ b/.github/workflows/build_and_test.yml
@@ -252,7 +252,7 @@ jobs:
     - name: Install Python packages (Python 3.8)
       if: (contains(matrix.modules, 'sql') && !contains(matrix.modules, 'sql-'))
       run: |
-        python3.8 -m pip install 'numpy>=1.20.0' 'pyarrow<7.0.0' pandas scipy xmlrunner
+        python3.8 -m pip install 'numpy>=1.20.0' pyarrow pandas scipy xmlrunner
         python3.8 -m pip list
     # Run the tests.
     - name: Run tests
@@ -287,7 +287,7 @@ jobs:
     name: "Build modules (${{ format('{0}, {1} job', needs.configure-jobs.outputs.branch, needs.configure-jobs.outputs.type) }}): ${{ matrix.modules }}"
     runs-on: ubuntu-20.04
     container:
-      image: dongjoon/apache-spark-github-action-image:20211228
+      image: dongjoon/apache-spark-github-action-image:20220207
     strategy:
       fail-fast: false
       matrix:
@@ -391,7 +391,7 @@ jobs:
     name: "Build modules: sparkr"
     runs-on: ubuntu-20.04
     container:
-      image: dongjoon/apache-spark-github-action-image:20211228
+      image: dongjoon/apache-spark-github-action-image:20220207
     env:
       HADOOP_PROFILE: ${{ needs.configure-jobs.outputs.hadoop }}
       HIVE_PROFILE: hive2.3
@@ -462,7 +462,7 @@ jobs:
       PYSPARK_DRIVER_PYTHON: python3.9
       PYSPARK_PYTHON: python3.9
     container:
-      image: dongjoon/apache-spark-github-action-image:20211228
+      image: dongjoon/apache-spark-github-action-image:20220207
     steps:
     - name: Checkout Spark repository
       uses: actions/checkout@v2
@@ -530,7 +530,7 @@ jobs:
         # Jinja2 3.0.0+ causes error when building with Sphinx.
         # See also https://issues.apache.org/jira/browse/SPARK-35375.
         python3.9 -m pip install 'sphinx<3.1.0' mkdocs pydata_sphinx_theme ipython nbsphinx numpydoc 'jinja2<3.0.0'
-        python3.9 -m pip install sphinx_plotly_directive 'numpy>=1.20.0' 'pyarrow<7.0.0' pandas 'plotly>=4.8'
+        python3.9 -m pip install sphinx_plotly_directive 'numpy>=1.20.0' pyarrow pandas 'plotly>=4.8'
         apt-get update -y
         apt-get install -y ruby ruby-dev
         Rscript -e "install.packages(c('devtools', 'testthat', 'knitr', 'rmarkdown', 'roxygen2'), repos='https://cloud.r-project.org/')"
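The only behavioral change in the pip lines above is dropping the `'<7.0.0'` upper bound, which is what lets the workflow pick up PyArrow 7.0.0. A tiny illustrative checker for a single `<` constraint (a sketch, not pip's actual resolver) makes the effect of the old pin explicit:

```python
def satisfies_upper_bound(candidate: str, requirement: str) -> bool:
    """Toy check of one '<' constraint like "pyarrow<7.0.0" against a dotted version."""
    _name, bound = requirement.split("<")
    as_tuple = lambda v: tuple(int(part) for part in v.split("."))
    return as_tuple(candidate) < as_tuple(bound)


# With the old pin, PyArrow 7.0.0 could never be selected; 6.x still could.
assert not satisfies_upper_bound("7.0.0", "pyarrow<7.0.0")
assert satisfies_upper_bound("6.0.0", "pyarrow<7.0.0")
```

Removing the bound entirely (the `pyarrow` spelling in the new lines) means the newest release on PyPI is installed, keeping the per-job installs in step with the 20220207 docker image.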
[spark] branch master updated (2e703ae -> 08c851d)
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

 from 2e703ae  [SPARK-38030][SQL] Canonicalization should not remove nullability of AttributeReference dataType
  add 08c851d  [SPARK-37943][SQL] Use error classes in the compilation errors of grouping

No new revisions were added by this update.

Summary of changes:
 core/src/main/resources/error/error-classes.json   |  3 ++
 .../spark/sql/errors/QueryCompilationErrors.scala  |  4 ++-
 .../sql/errors/QueryCompilationErrorsSuite.scala   | 35 ++
 3 files changed, 41 insertions(+), 1 deletion(-)