[spark] branch master updated: [SPARK-43859][SQL] Override toString in LateralColumnAliasReference

2023-05-30 Thread yumwang
This is an automated email from the ASF dual-hosted git repository.

yumwang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 6fa910b2ab6 [SPARK-43859][SQL] Override toString in LateralColumnAliasReference
6fa910b2ab6 is described below

commit 6fa910b2ab6ac619cda76608a109d69e71a8c09d
Author: Yuming Wang 
AuthorDate: Wed May 31 13:11:59 2023 +0800

[SPARK-43859][SQL] Override toString in LateralColumnAliasReference

### What changes were proposed in this pull request?

This PR overrides `toString` in `LateralColumnAliasReference`.

### Why are the changes needed?

Improve the readability of logical plans.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manual test:
```sql
select id + 1 as a1, a1 + 2 as a3 from range(10);
```
Before this PR:
```
Project [(id#2L + 1) AS a1#0, (lateralAliasReference('a1, a1, 'a1) + 2) AS a3#1]
+- Range (0, 10, step=1, splits=None)
```

After this PR:
```
Project [(id#2L + 1) AS a1#0, (lateralAliasReference(a1) + 2) AS a3#1]
+- Range (0, 10, step=1, splits=None)
```
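
(For intuition, the same delegation pattern as a toy Python sketch — a hypothetical class, not the Scala change itself: the verbose default representation is replaced by one that reuses the compact `sql` form.)
```py
class LateralRefDemo:
    """Toy analogue of the change: repr (Scala's toString) delegates to sql."""

    def __init__(self, name: str) -> None:
        self.name = name

    @property
    def sql(self) -> str:
        return f"lateralAliasReference({self.name})"

    # Without this override, printing would fall back to a verbose default,
    # just as the plan above printed every constructor argument before the PR.
    def __repr__(self) -> str:
        return self.sql

print(LateralRefDemo("a1"))  # lateralAliasReference(a1)
```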

Closes #41359 from wangyum/SPARK-43859.

Authored-by: Yuming Wang 
Signed-off-by: Yuming Wang 
---
 .../org/apache/spark/sql/catalyst/expressions/namedExpressions.scala | 1 +
 1 file changed, 1 insertion(+)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/namedExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/namedExpressions.scala
index 4f4fbc07b16..c0ab13c4e9e 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/namedExpressions.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/namedExpressions.scala
@@ -459,6 +459,7 @@ case class LateralColumnAliasReference(ne: NamedExpression, nameParts: Seq[Strin
   override def dataType: DataType = ne.dataType
   override def prettyName: String = "lateralAliasReference"
   override def sql: String = s"$prettyName($name)"
+  override def toString: String = sql
 
   final override val nodePatterns: Seq[TreePattern] = Seq(LATERAL_COLUMN_ALIAS_REFERENCE)
 }


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-43868][SQL][TESTS] Remove `originalUDFs` from `TestHive` to ensure `ObjectHashAggregateExecBenchmark` can run successfully on Github Action

2023-05-30 Thread yumwang
This is an automated email from the ASF dual-hosted git repository.

yumwang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 3472619a261 [SPARK-43868][SQL][TESTS] Remove `originalUDFs` from `TestHive` to ensure `ObjectHashAggregateExecBenchmark` can run successfully on Github Action
3472619a261 is described below

commit 3472619a26106b211685798034ad4622e7053cdf
Author: yangjie01 
AuthorDate: Wed May 31 13:09:04 2023 +0800

[SPARK-43868][SQL][TESTS] Remove `originalUDFs` from `TestHive` to ensure `ObjectHashAggregateExecBenchmark` can run successfully on Github Action

### What changes were proposed in this pull request?
This PR removes `originalUDFs` from `TestHive` to ensure `ObjectHashAggregateExecBenchmark` can run successfully on GitHub Actions.

### Why are the changes needed?
After SPARK-43225, `org.codehaus.jackson:jackson-mapper-asl` became a test-scope dependency, so it is not on the classpath when GitHub Actions runs benchmarks, because the workflow uses

https://github.com/apache/spark/blob/d61c77cac17029ee27319e6b766b48d314a4dd31/.github/workflows/benchmark.yml#L179-L183

instead of the sbt `Test/runMain`.

`ObjectHashAggregateExecBenchmark` uses `TestHive`, and before this PR `TestHive` always called `org.apache.hadoop.hive.ql.exec.FunctionRegistry#getFunctionNames` to initialize `originalUDFs`, so running `ObjectHashAggregateExecBenchmark` on GitHub Actions failed with the following exception:

```
Error: Exception in thread "main" java.lang.NoClassDefFoundError: org/codehaus/jackson/map/type/TypeFactory
    at org.apache.hadoop.hive.ql.udf.UDFJson.(UDFJson.java:64)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge.getUdfClassInternal(GenericUDFBridge.java:142)
    at org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge.getUdfClass(GenericUDFBridge.java:132)
    at org.apache.hadoop.hive.ql.exec.FunctionInfo.getFunctionClass(FunctionInfo.java:151)
    at org.apache.hadoop.hive.ql.exec.Registry.addFunction(Registry.java:519)
    at org.apache.hadoop.hive.ql.exec.Registry.registerUDF(Registry.java:163)
    at org.apache.hadoop.hive.ql.exec.Registry.registerUDF(Registry.java:154)
    at org.apache.hadoop.hive.ql.exec.Registry.registerUDF(Registry.java:147)
    at org.apache.hadoop.hive.ql.exec.FunctionRegistry.(FunctionRegistry.java:322)
    at org.apache.spark.sql.hive.test.TestHiveSparkSession.(TestHive.scala:530)
    at org.apache.spark.sql.hive.test.TestHiveSparkSession.(TestHive.scala:185)
    at org.apache.spark.sql.hive.test.TestHiveContext.(TestHive.scala:133)
    at org.apache.spark.sql.hive.test.TestHive$.(TestHive.scala:54)
    at org.apache.spark.sql.hive.test.TestHive$.(TestHive.scala:53)
    at org.apache.spark.sql.execution.benchmark.ObjectHashAggregateExecBenchmark$.getSparkSession(ObjectHashAggregateExecBenchmark.scala:47)
    at org.apache.spark.sql.execution.benchmark.SqlBasedBenchmark.$init$(SqlBasedBenchmark.scala:35)
    at org.apache.spark.sql.execution.benchmark.ObjectHashAggregateExecBenchmark$.(ObjectHashAggregateExecBenchmark.scala:45)
    at org.apache.spark.sql.execution.benchmark.ObjectHashAggregateExecBenchmark.main(ObjectHashAggregateExecBenchmark.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.benchmark.Benchmarks$.$anonfun$main$7(Benchmarks.scala:128)
    at scala.collection.ArrayOps$.foreach$extension(ArrayOps.scala:1328)
    at org.apache.spark.benchmark.Benchmarks$.main(Benchmarks.scala:91)
    at org.apache.spark.benchmark.Benchmarks.main(Benchmarks.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1025)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:192)
```

[spark] branch master updated: [SPARK-43890][CONNECT][BUILD] Upgrade buf to v1.20.0

2023-05-30 Thread ruifengz
This is an automated email from the ASF dual-hosted git repository.

ruifengz pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new d61c77cac17 [SPARK-43890][CONNECT][BUILD] Upgrade buf to v1.20.0
d61c77cac17 is described below

commit d61c77cac17029ee27319e6b766b48d314a4dd31
Author: panbingkun 
AuthorDate: Wed May 31 11:39:14 2023 +0800

[SPARK-43890][CONNECT][BUILD] Upgrade buf to v1.20.0

### What changes were proposed in this pull request?
This PR upgrades buf from 1.19.0 to 1.20.0.

### Why are the changes needed?
1. Release notes: https://github.com/bufbuild/buf/releases; improvements as follows:
- Add --emit-defaults flag to buf curl to emit default values in JSON-encoded responses.
- Indent JSON-encoded responses from buf curl by default.
- Log a warning in case an import statement does not point to a file in the module, a file in a direct dependency, or a well-known type file.

2. https://github.com/bufbuild/buf/compare/v1.19.0...v1.20.0

3. Manual test: dev/connect-gen-protos.sh; this upgrade does not change the generated files.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
- Manually test
- Pass GA

Closes #41394 from panbingkun/SPARK-43890.

Authored-by: panbingkun 
Signed-off-by: Ruifeng Zheng 
---
 .github/workflows/build_and_test.yml| 2 +-
 python/docs/source/development/contributing.rst | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/.github/workflows/build_and_test.yml b/.github/workflows/build_and_test.yml
index cda5940f2bc..8aa0f42916e 100644
--- a/.github/workflows/build_and_test.yml
+++ b/.github/workflows/build_and_test.yml
@@ -616,7 +616,7 @@ jobs:
 - name: Install dependencies for Python code generation check
   run: |
 # See more in "Installation" https://docs.buf.build/installation#tarball
-curl -LO https://github.com/bufbuild/buf/releases/download/v1.19.0/buf-Linux-x86_64.tar.gz
+curl -LO https://github.com/bufbuild/buf/releases/download/v1.20.0/buf-Linux-x86_64.tar.gz
 mkdir -p $HOME/buf
 tar -xvzf buf-Linux-x86_64.tar.gz -C $HOME/buf --strip-components 1
 python3.9 -m pip install 'protobuf==3.19.5' 'mypy-protobuf==3.3.0'
diff --git a/python/docs/source/development/contributing.rst b/python/docs/source/development/contributing.rst
index 885ab8d8cba..32ae440711b 100644
--- a/python/docs/source/development/contributing.rst
+++ b/python/docs/source/development/contributing.rst
@@ -120,7 +120,7 @@ Prerequisite
 
 PySpark development requires to build Spark that needs a proper JDK installed, etc. See `Building Spark `_ for more details.
 
-Note that if you intend to contribute to Spark Connect in Python, ``buf`` version ``1.19.0`` is required, see `Buf Installation `_ for more details.
+Note that if you intend to contribute to Spark Connect in Python, ``buf`` version ``1.20.0`` is required, see `Buf Installation `_ for more details.
 
 Conda
 ~


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (4bee0bb91cc -> 8a1100806d6)

2023-05-30 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 4bee0bb91cc [SPARK-43760][SQL] Nullability of scalar subquery results
 add 8a1100806d6 [SPARK-43830][BUILD][FOLLOWUP] Update scalatest and scalatestplus related dependencies to newest version

No new revisions were added by this update.

Summary of changes:
 pom.xml | 12 
 1 file changed, 12 insertions(+)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (8c7a6fc81cc -> 4bee0bb91cc)

2023-05-30 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 8c7a6fc81cc [SPARK-43886][PYTHON] Accept generics tuple as typing hints of Pandas UDF
 add 4bee0bb91cc [SPARK-43760][SQL] Nullability of scalar subquery results

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/optimizer/subquery.scala  |  4 +++-
 .../scalar-subquery-predicate.sql.out| 17 +
 .../scalar-subquery/scalar-subquery-select.sql.out   | 20 
 .../scalar-subquery/scalar-subquery-predicate.sql|  7 +++
 .../scalar-subquery/scalar-subquery-select.sql   |  8 
 .../scalar-subquery-predicate.sql.out| 12 
 .../scalar-subquery/scalar-subquery-select.sql.out   | 13 +
 7 files changed, 80 insertions(+), 1 deletion(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-43886][PYTHON] Accept generics tuple as typing hints of Pandas UDF

2023-05-30 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 8c7a6fc81cc [SPARK-43886][PYTHON] Accept generics tuple as typing hints of Pandas UDF
8c7a6fc81cc is described below

commit 8c7a6fc81cceaba3d9c428baec336639b0d91205
Author: Xinrong Meng 
AuthorDate: Wed May 31 09:19:21 2023 +0900

[SPARK-43886][PYTHON] Accept generics tuple as typing hints of Pandas UDF

### What changes were proposed in this pull request?
Accept the builtin generic `tuple` as a typing hint in Pandas UDFs.

### Why are the changes needed?
Adapt to [PEP 585](https://peps.python.org/pep-0585/) with Python 3.9.

### Does this PR introduce _any_ user-facing change?
Yes. `tuple` is now accepted as a typing hint for Pandas UDFs.

FROM
```py
>>> @pandas_udf("long")
... def multiply(iterator: Iterator[tuple[pd.Series, pd.DataFrame]]) -> Iterator[pd.Series]:
...     for s1, df in iterator:
...         yield s1 * df.v
...
Traceback (most recent call last):
...
raise PySparkNotImplementedError(
pyspark.errors.exceptions.base.PySparkNotImplementedError: [UNSUPPORTED_SIGNATURE] Unsupported signature: (iterator: Iterator[tuple[pandas.core.series.Series, pandas.core.frame.DataFrame]]) -> Iterator[pandas.core.series.Series].
```

TO
```py
>>> @pandas_udf("long")
... def multiply(iterator: Iterator[tuple[pd.Series, pd.DataFrame]]) -> Iterator[pd.Series]:
...     for s1, df in iterator:
...         yield s1 * df.v
...
>>> multiply._unwrapped.evalType
204  # SQL_SCALAR_PANDAS_ITER_UDF
```

### How was this patch tested?
Unit tests.

Closes #41388 from xinrong-meng/tuple.

Authored-by: Xinrong Meng 
Signed-off-by: Hyukjin Kwon 
---
 python/pyspark/sql/pandas/typehints.py |  2 +-
 .../sql/tests/pandas/test_pandas_udf_typehints.py  | 24 ++
 2 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/python/pyspark/sql/pandas/typehints.py b/python/pyspark/sql/pandas/typehints.py
index 29ac81af944..f0c13e66a63 100644
--- a/python/pyspark/sql/pandas/typehints.py
+++ b/python/pyspark/sql/pandas/typehints.py
@@ -145,7 +145,7 @@ def check_tuple_annotation(
 # Tuple has _name but other types have __name__
 # Check if the name is Tuple first. After that, check the generic types.
 name = getattr(annotation, "_name", getattr(annotation, "__name__", None))
-return name == "Tuple" and (
+return name in ("Tuple", "tuple") and (
 parameter_check_func is None or all(map(parameter_check_func, annotation.__args__))
 )
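
(Side note on the check above: `typing.Tuple[...]` carries a `_name` attribute, while the PEP 585 builtin generic `tuple[...]` only exposes `__name__`, forwarded to its origin class — hence the two-step lookup and the extended name set. A minimal sketch, assuming Python 3.9+:)
```py
from typing import Tuple

for ann in (Tuple[int, str], tuple[int, str]):
    # Mirrors the lookup in check_tuple_annotation: _name first, then __name__.
    name = getattr(ann, "_name", getattr(ann, "__name__", None))
    print(name, ann.__args__)
# Tuple (<class 'int'>, <class 'str'>)
# tuple (<class 'int'>, <class 'str'>)
```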
 
diff --git a/python/pyspark/sql/tests/pandas/test_pandas_udf_typehints.py b/python/pyspark/sql/tests/pandas/test_pandas_udf_typehints.py
index 3cdf83e2d06..bfb874ffe53 100644
--- a/python/pyspark/sql/tests/pandas/test_pandas_udf_typehints.py
+++ b/python/pyspark/sql/tests/pandas/test_pandas_udf_typehints.py
@@ -14,6 +14,7 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 #
+import sys
 import unittest
 from inspect import signature
 from typing import Union, Iterator, Tuple, cast, get_type_hints
@@ -113,6 +114,29 @@ class PandasUDFTypeHintsTests(ReusedSQLTestCase):
 infer_eval_type(signature(func), get_type_hints(func)), PandasUDFType.SCALAR_ITER
 )
 
+@unittest.skipIf(sys.version_info < (3, 9), "Type hinting generics require Python 3.9.")
+def test_type_annotation_tuple_generics(self):
+def func(iter: Iterator[tuple[pd.DataFrame, pd.Series]]) -> Iterator[pd.DataFrame]:
+pass
+
+self.assertEqual(
+infer_eval_type(signature(func), get_type_hints(func)), PandasUDFType.SCALAR_ITER
+)
+
+def func(iter: Iterator[tuple[pd.DataFrame, ...]]) -> Iterator[pd.Series]:
+pass
+
+self.assertEqual(
+infer_eval_type(signature(func), get_type_hints(func)), PandasUDFType.SCALAR_ITER
+)
+
+def func(iter: Iterator[tuple[Union[pd.DataFrame, pd.Series], ...]]) -> Iterator[pd.Series]:
+pass
+
+self.assertEqual(
+infer_eval_type(signature(func), get_type_hints(func)), PandasUDFType.SCALAR_ITER
+)
+
 def test_type_annotation_group_agg(self):
 def func(col: pd.Series) -> str:
 pass
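
(For completeness, a minimal end-to-end sketch of the newly accepted hint — assuming Python 3.9+, a running `SparkSession`, and `pandas`/`pyarrow` installed:)
```py
from typing import Iterator

import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql.functions import pandas_udf

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1, 4), (2, 5), (3, 6)], ("a", "b"))

@pandas_udf("long")
def multiply(batches: Iterator[tuple[pd.Series, pd.Series]]) -> Iterator[pd.Series]:
    # Builtin-generic tuple hint: each element is one batch of both columns.
    for a, b in batches:
        yield a * b

df.select(multiply("a", "b")).show()
```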


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-41971][CONNECT][PYTHON][FOLLOWUP] Fix to_pandas to support the older Spark

2023-05-30 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 04125eb80e5 [SPARK-41971][CONNECT][PYTHON][FOLLOWUP] Fix to_pandas to support the older Spark
04125eb80e5 is described below

commit 04125eb80e5c5602bfaa9a5512706e31e49ca4c4
Author: Takuya UESHIN 
AuthorDate: Wed May 31 09:18:19 2023 +0900

[SPARK-41971][CONNECT][PYTHON][FOLLOWUP] Fix to_pandas to support the older Spark

### What changes were proposed in this pull request?

This is a follow-up of #40988.

Fix `to_pandas` to support the older Spark.

For the server:

```py
% ./sbin/start-connect-server.sh --packages org.apache.spark:spark-connect_2.12:3.4.0
```

with the client with the change here:

```py
>>> spark.sql("values (1, struct('x' as x)) as t(a, b)").toPandas()
   a   b
0  1  {'x': 'x'}
```

### Why are the changes needed?

The config `spark.sql.execution.pandas.structHandlingMode` introduced in #40988 does not exist in the older Spark (`<3.5`):

```py
>>> spark.sql("values (1, struct('x' as x)) as t(a, b)").toPandas()
Traceback (most recent call last):
...
pyspark.errors.exceptions.connect.SparkConnectGrpcException: (java.util.NoSuchElementException) spark.sql.execution.pandas.structHandlingMode
```

### Does this PR introduce _any_ user-facing change?

The newer Spark Connect client will work with `Spark<3.5`.

### How was this patch tested?

Manually.

Closes #41390 from ueshin/issues/SPARK-41971/config_with_default.

Authored-by: Takuya UESHIN 
Signed-off-by: Hyukjin Kwon 
---
 python/pyspark/sql/connect/client/core.py | 20 +---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/python/pyspark/sql/connect/client/core.py b/python/pyspark/sql/connect/client/core.py
index a0f790b2992..8da649e7765 100644
--- a/python/pyspark/sql/connect/client/core.py
+++ b/python/pyspark/sql/connect/client/core.py
@@ -726,11 +726,14 @@ class SparkConnectClient(object):
 
 if len(pdf.columns) > 0:
 timezone: Optional[str] = None
+if any(_has_type(f.dataType, TimestampType) for f in schema.fields):
+(timezone,) = self.get_configs("spark.sql.session.timeZone")
+
 struct_in_pandas: Optional[str] = None
 error_on_duplicated_field_names: bool = False
-if any(_has_type(f.dataType, (StructType, TimestampType)) for f in schema.fields):
-timezone, struct_in_pandas = self.get_configs(
-"spark.sql.session.timeZone", "spark.sql.execution.pandas.structHandlingMode"
+if any(_has_type(f.dataType, StructType) for f in schema.fields):
+(struct_in_pandas,) = self.get_config_with_defaults(
+("spark.sql.execution.pandas.structHandlingMode", "legacy"),
 )
 
 if struct_in_pandas == "legacy":
@@ -1108,6 +1111,17 @@ class SparkConnectClient(object):
 configs = dict(self.config(op).pairs)
 return tuple(configs.get(key) for key in keys)
 
+def get_config_with_defaults(
+self, *pairs: Tuple[str, Optional[str]]
+) -> Tuple[Optional[str], ...]:
+op = pb2.ConfigRequest.Operation(
+get_with_default=pb2.ConfigRequest.GetWithDefault(
+pairs=[pb2.KeyValue(key=key, value=default) for key, default in pairs]
+)
+)
+configs = dict(self.config(op).pairs)
+return tuple(configs.get(key) for key, _ in pairs)
+
 def config(self, operation: pb2.ConfigRequest.Operation) -> ConfigResult:
 """
 Call the config RPC of Spark Connect.
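
(The user-facing analogue of this fallback, for intuition: `spark.conf.get` accepts a default, so a key that is absent on an older server can be read the same way. A sketch assuming a running session:)
```py
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Returns "legacy" when the server does not define the key (e.g. Spark < 3.5)
# instead of failing with java.util.NoSuchElementException.
mode = spark.conf.get("spark.sql.execution.pandas.structHandlingMode", "legacy")
print(mode)
```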


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark-connect-go] branch main deleted (was bb8e2a7)

2023-05-30 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/spark-connect-go.git


 was bb8e2a7  [MINOR] Update README and CONTRIBUTING (#2)

The revisions that were on this branch are still contained in
other references; therefore, this change does not discard any commits
from the repository.


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (7d87fecda70 -> 11390c50972)

2023-05-30 Thread maxgekk
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 7d87fecda70 [SPARK-43878][BUILD] Upgrade `cyclonedx-maven-plugin` from 2.7.6 to 2.7.9
 add 11390c50972 [SPARK-43815][SQL] Add `to_varchar` alias for `to_char`

No new revisions were added by this update.

Summary of changes:
 .../sql/catalyst/analysis/FunctionRegistry.scala   |  1 +
 .../sql-functions/sql-expression-schema.md |  1 +
 .../sql-tests/analyzer-results/charvarchar.sql.out | 21 +++
 .../resources/sql-tests/inputs/charvarchar.sql |  5 +
 .../sql-tests/results/charvarchar.sql.out  | 24 ++
 5 files changed, 52 insertions(+)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-43878][BUILD] Upgrade `cyclonedx-maven-plugin` from 2.7.6 to 2.7.9

2023-05-30 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 7d87fecda70 [SPARK-43878][BUILD] Upgrade `cyclonedx-maven-plugin` from 2.7.6 to 2.7.9
7d87fecda70 is described below

commit 7d87fecda7077714601cdf7328a11d2719a99bb1
Author: panbingkun 
AuthorDate: Tue May 30 08:54:32 2023 -0700

[SPARK-43878][BUILD] Upgrade `cyclonedx-maven-plugin` from 2.7.6 to 2.7.9

### What changes were proposed in this pull request?
This PR upgrades `cyclonedx-maven-plugin` from 2.7.6 to 2.7.9.

### Why are the changes needed?
The release notes are as follows:
- https://github.com/CycloneDX/cyclonedx-maven-plugin/releases/tag/cyclonedx-maven-plugin-2.7.9
- https://github.com/CycloneDX/cyclonedx-maven-plugin/releases/tag/cyclonedx-maven-plugin-2.7.8
- https://github.com/CycloneDX/cyclonedx-maven-plugin/releases/tag/cyclonedx-maven-plugin-2.7.7

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass GA.

Closes #41378 from panbingkun/SPARK-43878.

Authored-by: panbingkun 
Signed-off-by: Dongjoon Hyun 
---
 pom.xml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/pom.xml b/pom.xml
index aa4376856d0..fbf3b2e5f53 100644
--- a/pom.xml
+++ b/pom.xml
@@ -3438,7 +3438,7 @@
       <plugin>
         <groupId>org.cyclonedx</groupId>
         <artifactId>cyclonedx-maven-plugin</artifactId>
-        <version>2.7.6</version>
+        <version>2.7.9</version>
         <executions>
           <execution>
             <phase>package</phase>


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (52a5d8dfe0d -> 40065595781)

2023-05-30 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 52a5d8dfe0d [SPARK-43680][PS][FOLLOWUP] Fix wrong usage for `is_remote`
 add 40065595781 [SPARK-43880][BUILD] Organize `hadoop-cloud` in standard maven project structure

No new revisions were added by this update.

Summary of changes:
 hadoop-cloud/pom.xml   | 230 ++---
 .../src/hadoop-3/test/resources/log4j2.properties  |  40 
 ...AbortableStreamBasedCheckpointFileManager.scala |   0
 .../io/cloud/BindingParquetOutputCommitter.scala   |   0
 .../io/cloud/PathOutputCommitProtocol.scala|   0
 .../io/cloud/abortable/AbortableFileSystem.java|   0
 .../abortable/AbstractAbortableFileSystem.java |   0
 ...ableStreamBasedCheckpointFileManagerSuite.scala |   0
 .../internal/io/cloud/CommitterBindingSuite.scala  |   0
 .../io/cloud/StubPathOutputCommitter.scala |   0
 10 files changed, 62 insertions(+), 208 deletions(-)
 delete mode 100644 hadoop-cloud/src/hadoop-3/test/resources/log4j2.properties
 rename hadoop-cloud/src/{hadoop-3 => }/main/scala/org/apache/spark/internal/io/cloud/AbortableStreamBasedCheckpointFileManager.scala (100%)
 rename hadoop-cloud/src/{hadoop-3 => }/main/scala/org/apache/spark/internal/io/cloud/BindingParquetOutputCommitter.scala (100%)
 rename hadoop-cloud/src/{hadoop-3 => }/main/scala/org/apache/spark/internal/io/cloud/PathOutputCommitProtocol.scala (100%)
 rename hadoop-cloud/src/{hadoop-3 => }/test/java/org/apache/spark/internal/io/cloud/abortable/AbortableFileSystem.java (100%)
 rename hadoop-cloud/src/{hadoop-3 => }/test/java/org/apache/spark/internal/io/cloud/abortable/AbstractAbortableFileSystem.java (100%)
 rename hadoop-cloud/src/{hadoop-3 => }/test/scala/org/apache/spark/internal/io/cloud/AbortableStreamBasedCheckpointFileManagerSuite.scala (100%)
 rename hadoop-cloud/src/{hadoop-3 => }/test/scala/org/apache/spark/internal/io/cloud/CommitterBindingSuite.scala (100%)
 rename hadoop-cloud/src/{hadoop-3 => }/test/scala/org/apache/spark/internal/io/cloud/StubPathOutputCommitter.scala (100%)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-43680][PS][FOLLOWUP] Fix wrong usage for `is_remote`

2023-05-30 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 52a5d8dfe0d [SPARK-43680][PS][FOLLOWUP] Fix wrong usage for `is_remote`
52a5d8dfe0d is described below

commit 52a5d8dfe0d0a9c85a8ac86be9e626d638510736
Author: itholic 
AuthorDate: Tue May 30 20:21:48 2023 +0900

[SPARK-43680][PS][FOLLOWUP] Fix wrong usage for `is_remote`

### What changes were proposed in this pull request?

This PR follows up on https://github.com/apache/spark/pull/41361 to fix the misuse of `is_remote` in `if` clauses.

### Why are the changes needed?

To fix the wrong function call.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manually tested.

Closes #41376 from itholic/nullop_followup.

Authored-by: itholic 
Signed-off-by: Hyukjin Kwon 
---
 python/pyspark/pandas/data_type_ops/null_ops.py | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/python/pyspark/pandas/data_type_ops/null_ops.py b/python/pyspark/pandas/data_type_ops/null_ops.py
index ddd7bddcfbd..ab86f074b99 100644
--- a/python/pyspark/pandas/data_type_ops/null_ops.py
+++ b/python/pyspark/pandas/data_type_ops/null_ops.py
@@ -46,32 +46,32 @@ class NullOps(DataTypeOps):
 def lt(self, left: IndexOpsLike, right: Any) -> SeriesOrIndex:
 _sanitize_list_like(right)
 result = pyspark_column_op("__lt__")(left, right)
-if is_remote:
-# In Spark Connect, it returns None instead of False, so we manually cast it.
+if is_remote():
+# TODO(SPARK-43877): Fix behavior difference for compare binary functions.
 result = result.fillna(False)
 return result
 
 def le(self, left: IndexOpsLike, right: Any) -> SeriesOrIndex:
 _sanitize_list_like(right)
 result = pyspark_column_op("__le__")(left, right)
-if is_remote:
-# In Spark Connect, it returns None instead of False, so we manually cast it.
+if is_remote():
+# TODO(SPARK-43877): Fix behavior difference for compare binary functions.
 result = result.fillna(False)
 return result
 
 def ge(self, left: IndexOpsLike, right: Any) -> SeriesOrIndex:
 _sanitize_list_like(right)
 result = pyspark_column_op("__ge__")(left, right)
-if is_remote:
-# In Spark Connect, it returns None instead of False, so we manually cast it.
+if is_remote():
+# TODO(SPARK-43877): Fix behavior difference for compare binary functions.
 result = result.fillna(False)
 return result
 
 def gt(self, left: IndexOpsLike, right: Any) -> SeriesOrIndex:
 _sanitize_list_like(right)
 result = pyspark_column_op("__gt__")(left, right)
-if is_remote:
-# In Spark Connect, it returns None instead of False, so we manually cast it.
+if is_remote():
+# TODO(SPARK-43877): Fix behavior difference for compare binary functions.
 result = result.fillna(False)
 return result
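
(The underlying bug reproduces without Spark: a bare function object is always truthy, so `if is_remote:` took the Spark Connect branch unconditionally. A self-contained sketch:)
```py
def is_remote() -> bool:
    return False

if is_remote:    # bug: tests the function object itself, which is always truthy
    print("branch taken even though is_remote() returns False")

if is_remote():  # fix: tests the value the call returns
    print("never printed")
```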
 


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark-connect-go] branch master created (now bb8e2a7)

2023-05-30 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark-connect-go.git


  at bb8e2a7  [MINOR] Update README and CONTRIBUTING (#2)

No new revisions were added by this update.


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark-connect-go] branch master deleted (was e7ecc83)

2023-05-30 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark-connect-go.git


 was e7ecc83  [MINOR] Add a merge script

The revisions that were on this branch are still contained in
other references; therefore, this change does not discard any commits
from the repository.


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark-connect-go] branch main updated: [MINOR] Update README and CONTRIBUTING (#2)

2023-05-30 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/spark-connect-go.git


The following commit(s) were added to refs/heads/main by this push:
 new bb8e2a7  [MINOR] Update README and CONTRIBUTING (#2)
bb8e2a7 is described below

commit bb8e2a784f18a57e3d913e5e8ece92b66cc7a8b0
Author: Martin Grund 
AuthorDate: Tue May 30 13:18:53 2023 +0200

[MINOR] Update README and CONTRIBUTING (#2)

This patch adds some additional background to the README file and adds contributing guidelines similar to the Spark project's.
---
 CONTRIBUTING.md | 16 
 README.md   | 17 -
 2 files changed, 32 insertions(+), 1 deletion(-)

diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
new file mode 100644
index 000..4e5a578
--- /dev/null
+++ b/CONTRIBUTING.md
@@ -0,0 +1,16 @@
+## Contributing to Spark
+
+*Before opening a pull request*, review the
+[Contributing to Spark guide](https://spark.apache.org/contributing.html).
+It lists steps that are required before creating a PR. In particular, consider:
+
+- Is the change important and ready enough to ask the community to spend time reviewing?
+- Have you searched for existing, related JIRAs and pull requests?
+- Is this a new feature that can stand alone as a [third party project](https://spark.apache.org/third-party-projects.html) ?
+- Is the change being proposed clearly explained and motivated?
+
+When you contribute code, you affirm that the contribution is your original work and that you
+license the work to the project under the project's open source license. Whether or not you
+state this explicitly, by submitting any copyrighted material via pull request, email, or
+other means you agree to license the material under the project's open source license and
+warrant that you have the legal authority to do so.
\ No newline at end of file
diff --git a/README.md b/README.md
index 79b03d6..beaeaeb 100644
--- a/README.md
+++ b/README.md
@@ -1,3 +1,18 @@
 # Apache Spark Connect Client for Golang
 
-Experimental
+This project houses the **experimental** client for [Spark
+Connect](https://spark.apache.org/docs/latest/spark-connect-overview.html) for
+[Apache Spark](https://spark.apache.org/) written in [Golang](https://go.dev/).
+
+
+## Current State of the Project
+
+Currently, the Spark Connect client for Golang is highly experimental and should
+not be used in any production setting. In addition, the PMC of the Apache Spark
+project reserves the right to withdraw and abandon the development of this project
+if it is not sustainable.
+
+## Contributing
+
+Please review the [Contribution to Spark guide](https://spark.apache.org/contributing.html)
+for information on how to get started contributing to the project.
\ No newline at end of file


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-43862][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_(1254 & 1315)

2023-05-30 Thread maxgekk
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 0c6ea478d6b [SPARK-43862][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_(1254 & 1315)
0c6ea478d6b is described below

commit 0c6ea478d6b448caab5c969be122159acef2bbeb
Author: panbingkun 
AuthorDate: Tue May 30 14:18:26 2023 +0300

[SPARK-43862][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_(1254 & 1315)

### What changes were proposed in this pull request?
This PR aims to:
1. Assign a name to the error class, including:
  - _LEGACY_ERROR_TEMP_1254 -> UNSUPPORTED_OVERWRITE.PATH
  - _LEGACY_ERROR_TEMP_1315 -> UNSUPPORTED_OVERWRITE.TABLE

2. Convert _LEGACY_ERROR_TEMP_0002 to INTERNAL_ERROR.
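
(A hypothetical repro of the newly named error, assuming a `SparkSession` and a parquet path that is both read and overwritten, as shown below:)
```py
df = spark.read.parquet("/tmp/some_table")
# Overwriting the same path the plan reads from raises an AnalysisException,
# now with error class UNSUPPORTED_OVERWRITE.PATH instead of _LEGACY_ERROR_TEMP_1254.
df.write.mode("overwrite").parquet("/tmp/some_table")
```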

### Why are the changes needed?
- The changes improve the error framework.
- Because the subclass `SparkSqlAstBuilder` of `AstBuilder` already overrides the methods `visitInsertOverwriteDir` and `visitInsertOverwriteHiveDir`, and in practice the Spark code base uses `SparkSqlParser` (and therefore `SparkSqlAstBuilder`), the two exceptions mentioned above in `AstBuilder` cannot be thrown from the user's perspective.

https://github.com/apache/spark/blob/88f69d6f92860823b1a90bc162ebca2b7c8132fc/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala#L46-L47

- visitInsertOverwriteDir

https://github.com/apache/spark/blob/88f69d6f92860823b1a90bc162ebca2b7c8132fc/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala#L802-L834

- visitInsertOverwriteHiveDir

https://github.com/apache/spark/blob/88f69d6f92860823b1a90bc162ebca2b7c8132fc/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala#L848-L866

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
- Manual testing:
$ build/sbt "test:testOnly *DDLParserSuite"
$ build/sbt "test:testOnly *InsertSuite"
$ build/sbt "test:testOnly *MetastoreDataSourcesSuite"
$ build/sbt "test:testOnly *HiveDDLSuite"

- Pass GA.

Closes #41367 from panbingkun/LEGACY_ERROR_TEMP_1254.

Authored-by: panbingkun 
Signed-off-by: Max Gekk 
---
 core/src/main/resources/error/error-classes.json   | 32 +-
 .../spark/sql/catalyst/parser/AstBuilder.scala |  4 +-
 .../spark/sql/errors/QueryCompilationErrors.scala  | 18 +++---
 .../spark/sql/errors/QueryParsingErrors.scala  |  5 +-
 .../spark/sql/catalyst/analysis/AnalysisTest.scala |  5 ++
 .../spark/sql/catalyst/parser/DDLParserSuite.scala | 20 +++
 .../org/apache/spark/sql/DataFrameWriter.scala |  6 +-
 .../apache/spark/sql/execution/command/ddl.scala   | 12 +++-
 .../execution/datasources/DataSourceStrategy.scala |  2 +-
 .../org/apache/spark/sql/sources/InsertSuite.scala | 70 --
 .../spark/sql/hive/MetastoreDataSourcesSuite.scala | 35 ++-
 .../spark/sql/hive/execution/HiveDDLSuite.scala| 11 ++--
 12 files changed, 149 insertions(+), 71 deletions(-)

diff --git a/core/src/main/resources/error/error-classes.json b/core/src/main/resources/error/error-classes.json
index 07ff6e1c7c2..8c3ba1e190d 100644
--- a/core/src/main/resources/error/error-classes.json
+++ b/core/src/main/resources/error/error-classes.json
@@ -2320,6 +2320,23 @@
   "grouping()/grouping_id() can only be used with 
GroupingSets/Cube/Rollup."
 ]
   },
+  "UNSUPPORTED_OVERWRITE" : {
+"message" : [
+  "Can't overwrite the target that is also being read from."
+],
+"subClass" : {
+  "PATH" : {
+"message" : [
+  "The target path is ."
+]
+  },
+  "TABLE" : {
+"message" : [
+  "The target table is ."
+]
+  }
+}
+  },
   "UNSUPPORTED_SAVE_MODE" : {
 "message" : [
   "The save mode  is not supported for:"
@@ -2477,11 +2494,6 @@
   "Invalid InsertIntoContext."
 ]
   },
-  "_LEGACY_ERROR_TEMP_0002" : {
-"message" : [
-  "INSERT OVERWRITE DIRECTORY is not supported."
-]
-  },
   "_LEGACY_ERROR_TEMP_0004" : {
 "message" : [
   "Empty source for merge: you should specify a source table/subquery in 
merge."
@@ -3669,11 +3681,6 @@
   "Cannot alter a table with ALTER VIEW. Please use ALTER TABLE instead."
 ]
   },
-  "_LEGACY_ERROR_TEMP_1254" : {
-"message" : [
-  "Cannot overwrite a path that is also being read from."
-]
-  },
   "_LEGACY_ERROR_TEMP_1255" : {
 "message" : [
   "Cannot drop built-in function ''."
@@ -3921,11 +3928,6 @@
   "'' does not support bucketBy and sortBy right now."
 ]
   },
-  "_LEGACY_ERROR_TEMP_1315" : {
-"message" : [
-  "Cannot overwrite table  that is also being read from."
-]
-  },
   "_LEGACY_ERROR_TEMP_1316" : {
 "message" : [

[spark] branch master updated: [SPARK-41593][FOLLOW-UP] Fix the case torch distributor logging server not shut down

2023-05-30 Thread weichenxu123
This is an automated email from the ASF dual-hosted git repository.

weichenxu123 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new e1619653895 [SPARK-41593][FOLLOW-UP] Fix the case torch distributor logging server not shut down
e1619653895 is described below

commit e1619653895b4d5e11d7121bdb7906355d8c17bf
Author: Weichen Xu 
AuthorDate: Tue May 30 19:13:20 2023 +0800

[SPARK-41593][FOLLOW-UP] Fix the case torch distributor logging server not shut down

### What changes were proposed in this pull request?

Fix the case where the torch distributor logging server is not shut down.

`_get_spark_task_function` and `_check_encryption` might raise an exception; in that case the logging server must be shut down, but it was not. This PR fixes that.

### Why are the changes needed?

Fix the case where the torch distributor logging server is not shut down.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Unit tests.

Closes #41375 from 
WeichenXu123/improve-torch-distributor-log-server-exception-handling.

Authored-by: Weichen Xu 
Signed-off-by: Weichen Xu 
---
 python/pyspark/ml/torch/distributor.py | 26 +-
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/python/pyspark/ml/torch/distributor.py b/python/pyspark/ml/torch/distributor.py
index ad8b4d8cc25..0249e6b4b2c 100644
--- a/python/pyspark/ml/torch/distributor.py
+++ b/python/pyspark/ml/torch/distributor.py
@@ -665,20 +665,20 @@ class TorchDistributor(Distributor):
 time.sleep(1)  # wait for the server to start
 self.log_streaming_server_port = log_streaming_server.port
 
-spark_task_function = self._get_spark_task_function(
-framework_wrapper_fn, train_object, spark_dataframe, *args, **kwargs
-)
-self._check_encryption()
-self.logger.info(
-f"Started distributed training with {self.num_processes} executor processes"
-)
-if spark_dataframe is not None:
-input_df = spark_dataframe
-else:
-input_df = self.spark.range(
-start=0, end=self.num_tasks, step=1, numPartitions=self.num_tasks
-)
 try:
+spark_task_function = self._get_spark_task_function(
+framework_wrapper_fn, train_object, spark_dataframe, *args, **kwargs
+)
+self._check_encryption()
+self.logger.info(
+f"Started distributed training with {self.num_processes} executor processes"
+)
+if spark_dataframe is not None:
+input_df = spark_dataframe
+else:
+input_df = self.spark.range(
+start=0, end=self.num_tasks, step=1, numPartitions=self.num_tasks
+)
 rows = input_df.mapInArrow(
 func=spark_task_function, schema="chunk binary", barrier=True
 ).collect()
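
(The shape of the fix, reduced to a Spark-free sketch with illustrative names: every step that can raise now sits inside the `try` block, so the cleanup clause always reaches the server shutdown.)
```py
class FakeLogStreamingServer:
    """Stand-in for the distributor's log streaming server."""

    def shutdown(self) -> None:
        print("log server shut down")

def run_distributed_training(should_fail: bool) -> None:
    server = FakeLogStreamingServer()
    try:
        # Task-function construction, encryption checks, and the job itself
        # all live inside the try block; a failure anywhere still reaches finally.
        if should_fail:
            raise RuntimeError("failed while building the task function")
        print("training finished")
    finally:
        server.shutdown()

run_distributed_training(should_fail=False)
try:
    run_distributed_training(should_fail=True)
except RuntimeError:
    pass  # the server was still shut down before the exception propagated
```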


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (1b50a7757cd -> 5d9c664773e)

2023-05-30 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 1b50a7757cd [SPARK-43696][SPARK-43697][SPARK-43698][SPARK-43699][PS] Fix `TimedeltaOps` for Spark Connect
 add 5d9c664773e [SPARK-43687][SPARK-43688][SPARK-43689][SPARK-43690][PS] Fix `NumOps` for Spark Connect

No new revisions were added by this update.

Summary of changes:
 python/pyspark/pandas/data_type_ops/num_ops.py | 42 +-
 .../connect/data_type_ops/test_parity_num_ops.py   | 16 -
 2 files changed, 17 insertions(+), 41 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark-connect-go] branch master deleted (was 9cb683d)

2023-05-30 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark-connect-go.git


 was 9cb683d  [MINOR] Add License file (#1)

The revisions that were on this branch are still contained in
other references; therefore, this change does not discard any commits
from the repository.


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark-connect-go] branch master created (now e7ecc83)

2023-05-30 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark-connect-go.git


  at e7ecc83  [MINOR] Add a merge script

No new revisions were added by this update.


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark-connect-go] branch main updated: [MINOR] Add a merge script

2023-05-30 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/spark-connect-go.git


The following commit(s) were added to refs/heads/main by this push:
 new e7ecc83  [MINOR] Add a merge script
e7ecc83 is described below

commit e7ecc83205dda4e7cea311f83149ffca30dcdadb
Author: Hyukjin Kwon 
AuthorDate: Tue May 30 20:04:36 2023 +0900

[MINOR] Add a merge script
---
 merge_connect_go_pr.py | 602 +
 1 file changed, 602 insertions(+)

diff --git a/merge_connect_go_pr.py b/merge_connect_go_pr.py
new file mode 100755
index 000..417c0ff
--- /dev/null
+++ b/merge_connect_go_pr.py
@@ -0,0 +1,602 @@
+#!/usr/bin/env python3
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# Utility for creating well-formed pull request merges and pushing them to Apache
+# Spark.
+#   usage: ./merge_connect_go_pr.py    (see config env vars below)
+#
+# This utility assumes you already have a local Spark git folder and that you
+# have added remotes corresponding to both (i) the github apache Spark
+# mirror and (ii) the apache git repo.
+
+import json
+import os
+import re
+import subprocess
+import sys
+import traceback
+from urllib.request import urlopen
+from urllib.request import Request
+from urllib.error import HTTPError
+
+try:
+import jira.client
+
+JIRA_IMPORTED = True
+except ImportError:
+JIRA_IMPORTED = False
+
+# Location of your Spark git development area
+SPARK_CONNECT_GO_HOME = os.environ.get("SPARK_CONNECT_GO_HOME", os.getcwd())
+# Remote name which points to the Gihub site
+PR_REMOTE_NAME = os.environ.get("PR_REMOTE_NAME", "apache-github")
+# Remote name which points to Apache git
+PUSH_REMOTE_NAME = os.environ.get("PUSH_REMOTE_NAME", "apache")
+# ASF JIRA username
+JIRA_USERNAME = os.environ.get("JIRA_USERNAME", "")
+# ASF JIRA password
+JIRA_PASSWORD = os.environ.get("JIRA_PASSWORD", "")
+# OAuth key used for issuing requests against the GitHub API. If this is not defined, then requests
+# will be unauthenticated. You should only need to configure this if you find yourself regularly
+# exceeding your IP's unauthenticated request rate limit. You can create an OAuth key at
+# https://github.com/settings/tokens. This script only requires the "public_repo" scope.
+GITHUB_OAUTH_KEY = os.environ.get("GITHUB_OAUTH_KEY")
+
+
+GITHUB_BASE = "https://github.com/apache/spark-connect-go/pull"
+GITHUB_API_BASE = "https://api.github.com/repos/apache/spark-connect-go"
+JIRA_BASE = "https://issues.apache.org/jira/browse"
+JIRA_API_BASE = "https://issues.apache.org/jira"
+# Prefix added to temporary branches
+BRANCH_PREFIX = "PR_TOOL"
+
+
+def get_json(url):
+try:
+request = Request(url)
+if GITHUB_OAUTH_KEY:
+request.add_header("Authorization", "token %s" % GITHUB_OAUTH_KEY)
+return json.load(urlopen(request))
+except HTTPError as e:
+if "X-RateLimit-Remaining" in e.headers and 
e.headers["X-RateLimit-Remaining"] == "0":
+print(
+"Exceeded the GitHub API rate limit; see the instructions in "
++ "dev/merge_connect_go_pr.py to configure an OAuth token for 
making authenticated "
++ "GitHub requests."
+)
+else:
+print("Unable to fetch URL, exiting: %s" % url)
+sys.exit(-1)
+
+
+def fail(msg):
+print(msg)
+clean_up()
+sys.exit(-1)
+
+
+def run_cmd(cmd):
+print(cmd)
+if isinstance(cmd, list):
+return subprocess.check_output(cmd).decode("utf-8")
+else:
+return subprocess.check_output(cmd.split(" ")).decode("utf-8")
+
+
+def continue_maybe(prompt):
+result = input("\n%s (y/n): " % prompt)
+if result.lower() != "y":
+fail("Okay, exiting")
+
+
+def clean_up():
+if "original_head" in globals():
+print("Restoring head pointer to %s" % original_head)
+run_cmd("git checkout %s" % original_head)
+
+branches = run_cmd("git branch").replace(" ", "").split("\n")
+
+for branch in list(filter(lambda x: x.startswith(BRANCH_PREFIX), branches)):
+print("Deleting local branch %s" % 

[spark-connect-go] branch master created (now 9cb683d)

2023-05-30 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark-connect-go.git


  at 9cb683d  [MINOR] Add License file (#1)

No new revisions were added by this update.


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark-connect-go] branch master deleted (was 9cb683d)

2023-05-30 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark-connect-go.git


 was 9cb683d  [MINOR] Add License file (#1)

The revisions that were on this branch are still contained in
other references; therefore, this change does not discard any commits
from the repository.


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark-connect-go] branch master created (now 9cb683d)

2023-05-30 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark-connect-go.git


  at 9cb683d  [MINOR] Add License file (#1)

No new revisions were added by this update.


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark-connect-go] branch master deleted (was 9cb683d)

2023-05-30 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark-connect-go.git


 was 9cb683d  [MINOR] Add License file (#1)

The revisions that were on this branch are still contained in
other references; therefore, this change does not discard any commits
from the repository.


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark-connect-go] branch master created (now 9cb683d)

2023-05-30 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark-connect-go.git


  at 9cb683d  [MINOR] Add License file (#1)

No new revisions were added by this update.


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark-connect-go] branch main updated: [MINOR] Add License file (#1)

2023-05-30 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/spark-connect-go.git


The following commit(s) were added to refs/heads/main by this push:
 new 9cb683d  [MINOR] Add License file (#1)
9cb683d is described below

commit 9cb683dff40e0939bf3e09144e0b72bc2b149146
Author: Martin Grund 
AuthorDate: Tue May 30 12:36:26 2023 +0200

[MINOR] Add License file (#1)

This patch adds the same APL 2.0 licence file as in the main repo minus the 
custom dependency licence remarks.
---
 LICENSE | 203 
 1 file changed, 203 insertions(+)

diff --git a/LICENSE b/LICENSE
new file mode 100644
index 000..ec336b2
--- /dev/null
+++ b/LICENSE
@@ -0,0 +1,203 @@
+   Apache License
+   Version 2.0, January 2004
+http://www.apache.org/licenses/
+
+   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+
+   1. Definitions.
+
+  "License" shall mean the terms and conditions for use, reproduction,
+  and distribution as defined by Sections 1 through 9 of this document.
+
+  "Licensor" shall mean the copyright owner or entity authorized by
+  the copyright owner that is granting the License.
+
+  "Legal Entity" shall mean the union of the acting entity and all
+  other entities that control, are controlled by, or are under common
+  control with that entity. For the purposes of this definition,
+  "control" means (i) the power, direct or indirect, to cause the
+  direction or management of such entity, whether by contract or
+  otherwise, or (ii) ownership of fifty percent (50%) or more of the
+  outstanding shares, or (iii) beneficial ownership of such entity.
+
+  "You" (or "Your") shall mean an individual or Legal Entity
+  exercising permissions granted by this License.
+
+  "Source" form shall mean the preferred form for making modifications,
+  including but not limited to software source code, documentation
+  source, and configuration files.
+
+  "Object" form shall mean any form resulting from mechanical
+  transformation or translation of a Source form, including but
+  not limited to compiled object code, generated documentation,
+  and conversions to other media types.
+
+  "Work" shall mean the work of authorship, whether in Source or
+  Object form, made available under the License, as indicated by a
+  copyright notice that is included in or attached to the work
+  (an example is provided in the Appendix below).
+
+  "Derivative Works" shall mean any work, whether in Source or Object
+  form, that is based on (or derived from) the Work and for which the
+  editorial revisions, annotations, elaborations, or other modifications
+  represent, as a whole, an original work of authorship. For the purposes
+  of this License, Derivative Works shall not include works that remain
+  separable from, or merely link (or bind by name) to the interfaces of,
+  the Work and Derivative Works thereof.
+
+  "Contribution" shall mean any work of authorship, including
+  the original version of the Work and any modifications or additions
+  to that Work or Derivative Works thereof, that is intentionally
+  submitted to Licensor for inclusion in the Work by the copyright owner
+  or by an individual or Legal Entity authorized to submit on behalf of
+  the copyright owner. For the purposes of this definition, "submitted"
+  means any form of electronic, verbal, or written communication sent
+  to the Licensor or its representatives, including but not limited to
+  communication on electronic mailing lists, source code control systems,
+  and issue tracking systems that are managed by, or on behalf of, the
+  Licensor for the purpose of discussing and improving the Work, but
+  excluding communication that is conspicuously marked or otherwise
+  designated in writing by the copyright owner as "Not a Contribution."
+
+  "Contributor" shall mean Licensor and any individual or Legal Entity
+  on behalf of whom a Contribution has been received by Licensor and
+  subsequently incorporated within the Work.
+
+   2. Grant of Copyright License. Subject to the terms and conditions of
+  this License, each Contributor hereby grants to You a perpetual,
+  worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+  copyright license to reproduce, prepare Derivative Works of,
+  publicly display, publicly perform, sublicense, and distribute the
+  Work and such Derivative Works in Source or Object form.
+
+   3. Grant of Patent License. Subject to the terms and conditions of
+  this License, each Contributor hereby grants to You a perpetual,
+  worldwide, 

[spark-connect-go] branch main created (now 2ceb9b6)

2023-05-30 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/spark-connect-go.git


  at 2ceb9b6  Initial commit

This branch includes the following new commits:

 new 2ceb9b6  Initial commit

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.



-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark-connect-go] 01/01: Initial commit

2023-05-30 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/spark-connect-go.git

commit 2ceb9b601c9f48fce006ac397145f0120d5ad2aa
Author: Hyukjin Kwon 
AuthorDate: Tue May 30 19:16:18 2023 +0900

Initial commit
---
 .asf.yaml | 30 ++
 README.md |  3 +++
 2 files changed, 33 insertions(+)

diff --git a/.asf.yaml b/.asf.yaml
new file mode 100644
index 000..c83552c
--- /dev/null
+++ b/.asf.yaml
@@ -0,0 +1,30 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# https://cwiki.apache.org/confluence/display/INFRA/git+-+.asf.yaml+features
+---
+github:
+  description: "Apache Spark Connect Client for Golang"
+  homepage: https://spark.apache.org/
+  enabled_merge_buttons:
+merge: false
+squash: true
+rebase: true
+
+notifications:
+  pullrequests: revi...@spark.apache.org
+  issues: revi...@spark.apache.org
+  commits: commits@spark.apache.org
+
diff --git a/README.md b/README.md
new file mode 100644
index 000..79b03d6
--- /dev/null
+++ b/README.md
@@ -0,0 +1,3 @@
+# Apache Spark Connect Client for Golang
+
+Experimental


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (844222231a6 -> 1b50a7757cd)

2023-05-30 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 844222231a6 [SPARK-43030][SQL][FOLLOWUP] CTE ref should keep the output attributes duplicated when renew
 add 1b50a7757cd [SPARK-43696][SPARK-43697][SPARK-43698][SPARK-43699][PS] Fix `TimedeltaOps` for Spark Connect

No new revisions were added by this update.

Summary of changes:
 python/pyspark/pandas/data_type_ops/timedelta_ops.py| 17 +
 .../connect/data_type_ops/test_parity_timedelta_ops.py  | 16 
 2 files changed, 5 insertions(+), 28 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org