[spark] branch master updated (cfb96eb -> 4288ddc)

2021-10-28 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from cfb96eb  [SPARK-37133][SQL] Add a config to optionally enforce ANSI reserved keywords
 add 4288ddc  [SPARK-37135][TEST] Fix `KryoSerializerBenchmark` and `DateTimeBenchmark` run failures

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/serializer/KryoSerializerBenchmark.scala| 2 ++
 .../org/apache/spark/sql/execution/benchmark/DateTimeBenchmark.scala   | 3 ++-
 2 files changed, 4 insertions(+), 1 deletion(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-36928][SQL] Handle ANSI intervals in ColumnarRow, ColumnarBatchRow and ColumnarArray

2021-10-28 Thread maxgekk
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new fd8d5ad  [SPARK-36928][SQL] Handle ANSI intervals in ColumnarRow, ColumnarBatchRow and ColumnarArray
fd8d5ad is described below

commit fd8d5ad2140d6405357b908dce2d00a21036dedb
Author: PengLei 
AuthorDate: Thu Oct 28 14:52:41 2021 +0300

[SPARK-36928][SQL] Handle ANSI intervals in ColumnarRow, ColumnarBatchRow and ColumnarArray

### What changes were proposed in this pull request?
1. Handle ANSI interval types in the `get` and `copy` methods of `ColumnarArray`
2. Handle ANSI interval types in the `get` and `copy` methods of `ColumnarBatchRow`
3. Handle ANSI interval types in the `get` and `copy` methods of `ColumnarRow` (a minimal sketch of the underlying type mapping follows below)
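
The reuse of the existing int/long code paths rests on the physical representation of ANSI intervals; the sketch below (a hypothetical helper, not code from this commit) restates that mapping in Scala.

```scala
import org.apache.spark.sql.types._

object IntervalPhysicalTypes {
  // Hypothetical helper, not part of this commit: ANSI year-month intervals are
  // stored as ints (like DATE) and day-time intervals as longs (like TIMESTAMP),
  // which is why ColumnarArray/ColumnarBatchRow/ColumnarRow can reuse the
  // existing int/long branches in `get` and `copy`.
  def physicalAccessor(dt: DataType): String = dt match {
    case _: IntegerType | _: DateType | _: YearMonthIntervalType => "getInt"
    case _: LongType | _: TimestampType | _: DayTimeIntervalType => "getLong"
    case other => s"not covered in this sketch: $other"
  }
}
```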

### Why are the changes needed?
[SPARK-36928](https://issues.apache.org/jira/browse/SPARK-36928)

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Add test case

Closes #34421 from Peng-Lei/SPARK-36928.

Authored-by: PengLei 
Signed-off-by: Max Gekk 
---
 .../apache/spark/sql/vectorized/ColumnarArray.java |  6 +-
 .../spark/sql/vectorized/ColumnarBatchRow.java |  8 +--
 .../apache/spark/sql/vectorized/ColumnarRow.java   |  8 +--
 .../execution/vectorized/ColumnVectorSuite.scala   | 69 ++
 .../execution/vectorized/ColumnarBatchSuite.scala  | 32 ++
 5 files changed, 113 insertions(+), 10 deletions(-)

diff --git 
a/sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ColumnarArray.java 
b/sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ColumnarArray.java
index 147dd24..2fb6b3f 100644
--- 
a/sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ColumnarArray.java
+++ 
b/sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ColumnarArray.java
@@ -57,9 +57,11 @@ public final class ColumnarArray extends ArrayData {
   return UnsafeArrayData.fromPrimitiveArray(toByteArray());
 } else if (dt instanceof ShortType) {
   return UnsafeArrayData.fromPrimitiveArray(toShortArray());
-} else if (dt instanceof IntegerType || dt instanceof DateType) {
+} else if (dt instanceof IntegerType || dt instanceof DateType
+|| dt instanceof YearMonthIntervalType) {
   return UnsafeArrayData.fromPrimitiveArray(toIntArray());
-} else if (dt instanceof LongType || dt instanceof TimestampType) {
+} else if (dt instanceof LongType || dt instanceof TimestampType
+|| dt instanceof DayTimeIntervalType) {
   return UnsafeArrayData.fromPrimitiveArray(toLongArray());
 } else if (dt instanceof FloatType) {
   return UnsafeArrayData.fromPrimitiveArray(toFloatArray());
diff --git 
a/sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ColumnarBatchRow.java
 
b/sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ColumnarBatchRow.java
index c6b7287e7..8c32d5c 100644
--- 
a/sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ColumnarBatchRow.java
+++ 
b/sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ColumnarBatchRow.java
@@ -52,9 +52,9 @@ public final class ColumnarBatchRow extends InternalRow {
   row.setByte(i, getByte(i));
 } else if (dt instanceof ShortType) {
   row.setShort(i, getShort(i));
-} else if (dt instanceof IntegerType) {
+} else if (dt instanceof IntegerType || dt instanceof 
YearMonthIntervalType) {
   row.setInt(i, getInt(i));
-} else if (dt instanceof LongType) {
+} else if (dt instanceof LongType || dt instanceof 
DayTimeIntervalType) {
   row.setLong(i, getLong(i));
 } else if (dt instanceof FloatType) {
   row.setFloat(i, getFloat(i));
@@ -151,9 +151,9 @@ public final class ColumnarBatchRow extends InternalRow {
   return getByte(ordinal);
 } else if (dataType instanceof ShortType) {
   return getShort(ordinal);
-} else if (dataType instanceof IntegerType) {
+} else if (dataType instanceof IntegerType || dataType instanceof 
YearMonthIntervalType) {
   return getInt(ordinal);
-} else if (dataType instanceof LongType) {
+} else if (dataType instanceof LongType || dataType instanceof 
DayTimeIntervalType) {
   return getLong(ordinal);
 } else if (dataType instanceof FloatType) {
   return getFloat(ordinal);
diff --git 
a/sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ColumnarRow.java 
b/sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ColumnarRow.java
index 4b9d3c5..da4b242 100644
--- 
a/sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ColumnarRow.java
+++ 
b/sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ColumnarRow.java
@@ -61,9 +61,9 @@ public final class ColumnarRow extends InternalRow {
   row.setByte(i, getByte(i));
 

[spark] branch master updated: [SPARK-37136][SQL] Remove code for Hive built-in functions not implemented in Spark

2021-10-28 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 7ed7afe  [SPARK-37136][SQL] Remove code for Hive built-in functions not implemented in Spark
7ed7afe is described below

commit 7ed7afecd44d546391794d2ad2b2e7ea7a8accf7
Author: Angerszh 
AuthorDate: Thu Oct 28 20:55:24 2021 +0800

[SPARK-37136][SQL] Remove code for Hive built-in functions not implemented in Spark

### What changes were proposed in this pull request?
Since `histogram_numeric` is now implemented in Spark, we can remove the code that checked which functions could be passed through to Hive.
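
For reference, a minimal sketch (assuming a build of master where `histogram_numeric` is available as a built-in aggregate) showing that the function now resolves through Spark's own registry, with no Hive pass-through involved:

```scala
import org.apache.spark.sql.SparkSession

object HistogramNumericCheck {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("histogram-check").getOrCreate()
    // histogram_numeric is resolved from Spark's built-in FunctionRegistry,
    // so HiveSessionCatalog no longer needs a pass-through list for it.
    spark.sql(
      "SELECT histogram_numeric(value, 3) FROM VALUES (1), (2), (3), (4), (5) AS t(value)"
    ).show(truncate = false)
    spark.stop()
  }
}
```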

### Why are the changes needed?
Remove unused code

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Existing UTs

Closes #34410 from AngersZh/SPARK-37136.

Authored-by: Angerszh 
Signed-off-by: Wenchen Fan 
---
 .../sql/catalyst/catalog/SessionCatalog.scala  |  6 +-
 .../apache/spark/sql/hive/HiveSessionCatalog.scala | 87 --
 2 files changed, 1 insertion(+), 92 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
index 141de75..12a5cbc 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
@@ -1507,16 +1507,12 @@ class SessionCatalog(
* Returns whether it is a temporary function. If not existed, returns false.
*/
   def isTemporaryFunction(name: FunctionIdentifier): Boolean = {
-// copied from HiveSessionCatalog
-val hiveFunctions = Seq()
-
 // A temporary function is a function that has been registered in 
functionRegistry
 // without a database name, and is neither a built-in function nor a Hive 
function
 name.database.isEmpty &&
   (functionRegistry.functionExists(name) || 
tableFunctionRegistry.functionExists(name)) &&
   !FunctionRegistry.builtin.functionExists(name) &&
-  !TableFunctionRegistry.builtin.functionExists(name) &&
-  !hiveFunctions.contains(name.funcName.toLowerCase(Locale.ROOT))
+  !TableFunctionRegistry.builtin.functionExists(name)
   }
 
   def isTempFunction(name: String): Boolean = {
diff --git 
a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionCatalog.scala 
b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionCatalog.scala
index b11774b..1cc0314 100644
--- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionCatalog.scala
+++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionCatalog.scala
@@ -17,20 +17,11 @@
 
 package org.apache.spark.sql.hive
 
-import java.util.Locale
-
-import scala.util.{Failure, Success, Try}
-import scala.util.control.NonFatal
-
 import org.apache.hadoop.conf.Configuration
-import org.apache.hadoop.hive.ql.exec.{FunctionRegistry => 
HiveFunctionRegistry}
 
-import org.apache.spark.sql.catalyst.FunctionIdentifier
 import org.apache.spark.sql.catalyst.analysis.{FunctionRegistry, 
TableFunctionRegistry}
 import org.apache.spark.sql.catalyst.catalog._
-import org.apache.spark.sql.catalyst.expressions.{Cast, Expression}
 import org.apache.spark.sql.catalyst.parser.ParserInterface
-import org.apache.spark.sql.types.{DecimalType, DoubleType}
 
 private[sql] class HiveSessionCatalog(
 externalCatalogBuilder: () => ExternalCatalog,
@@ -51,82 +42,4 @@ private[sql] class HiveSessionCatalog(
 parser,
 functionResourceLoader,
 functionExpressionBuilder) {
-
-  override def lookupFunction(name: FunctionIdentifier, children: 
Seq[Expression]): Expression = {
-try {
-  lookupFunction0(name, children)
-} catch {
-  case NonFatal(_) if 
children.exists(_.dataType.isInstanceOf[DecimalType]) =>
-// SPARK-16228 ExternalCatalog may recognize `double`-type only.
-val newChildren = children.map { child =>
-  if (child.dataType.isInstanceOf[DecimalType]) Cast(child, 
DoubleType) else child
-}
-lookupFunction0(name, newChildren)
-}
-  }
-
-  private def lookupFunction0(name: FunctionIdentifier, children: 
Seq[Expression]): Expression = {
-val database = name.database.map(formatDatabaseName)
-val funcName = name.copy(database = database)
-Try(super.lookupFunction(funcName, children)) match {
-  case Success(expr) => expr
-  case Failure(error) =>
-if (super.functionExists(name)) {
-  // If the function exists (either in functionRegistry or 
externalCatalog),
-  // it means that there is an error when we create the Expression 
using the given children.
-  // We need to throw the original exception.
-

[spark] branch master updated (7ed7afe -> 09ec7ca)

2021-10-28 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 7ed7afe  [SPARK-37136][SQL] Remove code for Hive built-in functions not implemented in Spark
 add 09ec7ca  [SPARK-37118][PYTHON][ML] Add distanceMeasure param to trainKMeansModel

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala  | 4 +++-
 python/pyspark/mllib/clustering.py| 8 ++--
 2 files changed, 9 insertions(+), 3 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [MINOR][SS][DOCS] Point to correct examples of Arbitrary Stateful Operations

2021-10-28 Thread viirya
This is an automated email from the ASF dual-hosted git repository.

viirya pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 5b2bbce  [MINOR][SS][DOCS] Point to correct examples of Arbitrary Stateful Operations
5b2bbce is described below

commit 5b2bbcef6854c495c32b37e383dd5f1f6ce23dd4
Author: Liang-Chi Hsieh 
AuthorDate: Thu Oct 28 09:22:42 2021 -0700

[MINOR][SS][DOCS] Point to correct examples of Arbitrary Stateful Operations

### What changes were proposed in this pull request?

This fixes incorrect example links in the Structured Streaming Programming Guide.

### Why are the changes needed?

StructuredSessionization.scala and JavaStructuredSessionization.java now use the session window expression, not `flatMapGroupsWithState`. The section covers arbitrary stateful operations and should point to other examples.
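
For readers of the guide, a minimal sketch of the kind of arbitrary stateful sessionization `flatMapGroupsWithState` enables, as opposed to the session-window approach those two examples now use; the socket source, host/port, and `SessionInfo` state shape are hypothetical:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.{GroupState, GroupStateTimeout, OutputMode}

case class Event(user: String, ts: Long)
case class SessionInfo(events: Long, lastSeen: Long)

object SessionizeSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("sessionize-sketch").getOrCreate()
    import spark.implicits._

    // Hypothetical input: lines of "user,epochMillis" from a local socket.
    val events = spark.readStream.format("socket")
      .option("host", "localhost").option("port", "9999")
      .load()
      .as[String]
      .map { line => val Array(u, t) = line.split(","); Event(u, t.toLong) }

    // Arbitrary per-user state kept across triggers and updated on every batch.
    val sessionCounts = events
      .groupByKey(_.user)
      .flatMapGroupsWithState[SessionInfo, (String, Long)](
        OutputMode.Update(), GroupStateTimeout.ProcessingTimeTimeout()) {
        (user: String, batch: Iterator[Event], state: GroupState[SessionInfo]) =>
          val old = state.getOption.getOrElse(SessionInfo(0L, 0L))
          val seen = batch.toSeq
          val updated = SessionInfo(old.events + seen.size, seen.map(_.ts).foldLeft(old.lastSeen)(math.max))
          state.update(updated)
          state.setTimeoutDuration("30 minutes")
          Iterator((user, updated.events))
      }

    sessionCounts.writeStream.outputMode("update").format("console").start().awaitTermination()
  }
}
```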

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Doc change only.

Closes #34408 from viirya/fix-ss-doc.

Authored-by: Liang-Chi Hsieh 
Signed-off-by: Liang-Chi Hsieh 
---
 docs/structured-streaming-programming-guide.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/structured-streaming-programming-guide.md 
b/docs/structured-streaming-programming-guide.md
index 6e98d5a..b36cdc7 100644
--- a/docs/structured-streaming-programming-guide.md
+++ b/docs/structured-streaming-programming-guide.md
@@ -1806,7 +1806,7 @@ However, as a side effect, data from the slower streams 
will be aggressively dro
 this configuration judiciously.
 
 ### Arbitrary Stateful Operations
-Many usecases require more advanced stateful operations than aggregations. For 
example, in many usecases, you have to track sessions from data streams of 
events. For doing such sessionization, you will have to save arbitrary types of 
data as state, and perform arbitrary operations on the state using the data 
stream events in every trigger. Since Spark 2.2, this can be done using the 
operation `mapGroupsWithState` and the more powerful operation 
`flatMapGroupsWithState`. Both operations a [...]
+Many usecases require more advanced stateful operations than aggregations. For 
example, in many usecases, you have to track sessions from data streams of 
events. For doing such sessionization, you will have to save arbitrary types of 
data as state, and perform arbitrary operations on the state using the data 
stream events in every trigger. Since Spark 2.2, this can be done using the 
operation `mapGroupsWithState` and the more powerful operation 
`flatMapGroupsWithState`. Both operations a [...]
 
 Though Spark cannot check and force it, the state function should be 
implemented with respect to the semantics of the output mode. For example, in 
Update mode Spark doesn't expect that the state function will emit rows which 
are older than current watermark plus allowed late record delay, whereas in 
Append mode the state function can emit these rows.
 

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.2 updated: [MINOR][SS][DOCS] Point to correct examples of Arbitrary Stateful Operations

2021-10-28 Thread viirya
This is an automated email from the ASF dual-hosted git repository.

viirya pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.2 by this push:
 new 9bfc5b1  [MINOR][SS][DOCS] Point to correct examples of Arbitrary Stateful Operations
9bfc5b1 is described below

commit 9bfc5b14c9b0fbb50dd537a509f2e094e1c5779e
Author: Liang-Chi Hsieh 
AuthorDate: Thu Oct 28 09:22:42 2021 -0700

[MINOR][SS][DOCS] Point to correct examples of Arbitrary Stateful Operations

### What changes were proposed in this pull request?

This fixes incorrect example links in the Structured Streaming Programming Guide.

### Why are the changes needed?

StructuredSessionization.scala and JavaStructuredSessionization.java now use the session window expression, not `flatMapGroupsWithState`. The section covers arbitrary stateful operations and should point to other examples.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Doc change only.

Closes #34408 from viirya/fix-ss-doc.

Authored-by: Liang-Chi Hsieh 
Signed-off-by: Liang-Chi Hsieh 
(cherry picked from commit 5b2bbcef6854c495c32b37e383dd5f1f6ce23dd4)
Signed-off-by: Liang-Chi Hsieh 
---
 docs/structured-streaming-programming-guide.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/structured-streaming-programming-guide.md 
b/docs/structured-streaming-programming-guide.md
index 18dfbec..4642d44 100644
--- a/docs/structured-streaming-programming-guide.md
+++ b/docs/structured-streaming-programming-guide.md
@@ -1806,7 +1806,7 @@ However, as a side effect, data from the slower streams 
will be aggressively dro
 this configuration judiciously.
 
 ### Arbitrary Stateful Operations
-Many usecases require more advanced stateful operations than aggregations. For 
example, in many usecases, you have to track sessions from data streams of 
events. For doing such sessionization, you will have to save arbitrary types of 
data as state, and perform arbitrary operations on the state using the data 
stream events in every trigger. Since Spark 2.2, this can be done using the 
operation `mapGroupsWithState` and the more powerful operation 
`flatMapGroupsWithState`. Both operations a [...]
+Many usecases require more advanced stateful operations than aggregations. For 
example, in many usecases, you have to track sessions from data streams of 
events. For doing such sessionization, you will have to save arbitrary types of 
data as state, and perform arbitrary operations on the state using the data 
stream events in every trigger. Since Spark 2.2, this can be done using the 
operation `mapGroupsWithState` and the more powerful operation 
`flatMapGroupsWithState`. Both operations a [...]
 
 Though Spark cannot check and force it, the state function should be 
implemented with respect to the semantics of the output mode. For example, in 
Update mode Spark doesn't expect that the state function will emit rows which 
are older than current watermark plus allowed late record delay, whereas in 
Append mode the state function can emit these rows.
 

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (5b2bbce -> b3b7e64)

2021-10-28 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 5b2bbce  [MINOR][SS][DOCS] Point to correct examples of Arbitrary Stateful Operations
 add b3b7e64  [SPARK-37141][TESTS] Fix WorkerSuite failure on MacOS

No new revisions were added by this update.

Summary of changes:
 core/src/test/scala/org/apache/spark/deploy/worker/WorkerSuite.scala | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-37143][TESTS] Supplement the missing Java 11 benchmark result files

2021-10-28 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 4a09e74  [SPARK-37143][TESTS] Supplement the missing Java 11 benchmark result files
4a09e74 is described below

commit 4a09e74f1d6c46f3ca5150e1b4eaff13a9034ab1
Author: yangjie01 
AuthorDate: Thu Oct 28 11:08:43 2021 -0700

[SPARK-37143][TESTS] Supplement the missing Java 11 benchmark result files

### What changes were proposed in this pull request?
`CharVarcharBenchmark-results.txt` and `UpdateFieldsBenchmark-results.txt` exist in the project, but `CharVarcharBenchmark-jdk11-results.txt` and `UpdateFieldsBenchmark-jdk11-results.txt` are missing, so this PR adds them.

### Why are the changes needed?
Supplement the missing Java 11 benchmark result files.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
The two files were generated from:
- [Run benchmarks: * (JDK 11)](https://github.com/LuciferYang/spark/actions/runs/1392788240)

Closes #34423 from LuciferYang/bench-jdk11.

Authored-by: yangjie01 
Signed-off-by: Dongjoon Hyun 
---
 .../CharVarcharBenchmark-jdk11-results.txt | 122 +
 .../UpdateFieldsBenchmark-jdk11-results.txt|  26 +
 2 files changed, 148 insertions(+)

diff --git a/sql/core/benchmarks/CharVarcharBenchmark-jdk11-results.txt 
b/sql/core/benchmarks/CharVarcharBenchmark-jdk11-results.txt
new file mode 100644
index 000..25740d0
--- /dev/null
+++ b/sql/core/benchmarks/CharVarcharBenchmark-jdk11-results.txt
@@ -0,0 +1,122 @@
+================================================================================================
+Char Varchar Write Side Perf w/o Tailing Spaces
+================================================================================================
+
+OpenJDK 64-Bit Server VM 11.0.13+8-LTS on Linux 5.8.0-1042-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Write with length 5:            Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
+--------------------------------------------------------------------------------------------------------------
+write string with length 5              11587           11682         133         3.5         289.7       1.0X
+write char with length 5                15479           15566         103         2.6         387.0       0.7X
+write varchar with length 5             12057           12165         122         3.3         301.4       1.0X
+
+OpenJDK 64-Bit Server VM 11.0.13+8-LTS on Linux 5.8.0-1042-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Write with length 10:           Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
+--------------------------------------------------------------------------------------------------------------
+write string with length 10              5660            5698          56         3.5         283.0       1.0X
+write char with length 10                9484            9542          60         2.1         474.2       0.6X
+write varchar with length 10             5908            5928          26         3.4         295.4       1.0X
+
+OpenJDK 64-Bit Server VM 11.0.13+8-LTS on Linux 5.8.0-1042-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Write with length 20:           Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
+--------------------------------------------------------------------------------------------------------------
+write string with length 20              2968            2981          21         3.4         296.8       1.0X
+write char with length 20                6647            6672          22         1.5         664.7       0.4X
+write varchar with length 20             3064            3070           6         3.3         306.4       1.0X
+
+OpenJDK 64-Bit Server VM 11.0.13+8-LTS on Linux 5.8.0-1042-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Write with length 40:           Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
+--------------------------------------------------------------------------------------------------------------
+write string with length 40              1508            1538          31         3.3         301.7       1.0X
+write char with length 40                5218            5233          13         1.0        1043.6       0.3X
+write varchar with length 40             1603            1609           6         3.1         320.7       0.9X
+
+OpenJDK 64-Bit Server VM 11.0.13+8-LTS on Linux 5.8.0-1042-azu

[spark] branch master updated: [SPARK-36627][CORE] Fix java deserialization of proxy classes

2021-10-28 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 90fe41b  [SPARK-36627][CORE] Fix java deserialization of proxy classes
90fe41b is described below

commit 90fe41b70a9d0403418aa05a220d38c20f51c6f9
Author: Samuel Souza 
AuthorDate: Thu Oct 28 18:15:38 2021 -0500

[SPARK-36627][CORE] Fix java deserialization of proxy classes

## Upstream SPARK-X ticket and PR link (if not applicable, explain)
https://issues.apache.org/jira/browse/SPARK-36627

## What changes were proposed in this pull request?
In JavaSerializer.JavaDeserializationStream we override resolveClass of ObjectInputStream to use the thread's context classloader. However, we do not override resolveProxyClass, which is used when deserializing Java proxy objects. As a result, Spark uses the wrong classloader when deserializing such objects, and the job fails with the following exception:

```
Caused by: org.apache.spark.SparkException: Job aborted due to stage 
failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 
in stage 1.0 (TID 4, , executor 1): java.lang.ClassNotFoundException: 

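For illustration, a minimal round trip of a dynamic proxy through `JavaSerializer` (a sketch only; `Greeter` and `GreeterHandler` are hypothetical, and in a plain local run with a single classloader this succeeds even without the fix — the failure showed up when the proxy's interfaces were only visible to a non-default classloader, e.g. on executors):

```scala
import java.lang.reflect.{InvocationHandler, Method, Proxy}

import org.apache.spark.SparkConf
import org.apache.spark.serializer.JavaSerializer

// Hypothetical interface and handler used only for this sketch.
trait Greeter { def greet(name: String): String }

class GreeterHandler extends InvocationHandler with Serializable {
  override def invoke(proxy: AnyRef, method: Method, args: Array[AnyRef]): AnyRef =
    s"hello, ${args(0)}"
}

object ProxyRoundTrip {
  def main(args: Array[String]): Unit = {
    val greeter = Proxy.newProxyInstance(
      getClass.getClassLoader, Array[Class[_]](classOf[Greeter]), new GreeterHandler)
      .asInstanceOf[Greeter]

    val instance = new JavaSerializer(new SparkConf()).newInstance()
    // Deserializing the proxy goes through resolveProxyClass; the fix makes it use
    // the same classloader that resolveClass already uses.
    val copy = instance.deserialize[Greeter](instance.serialize(greeter))
    println(copy.greet("spark")) // hello, spark
  }
}
```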
Signed-off-by: Sean Owen 
---
 .../apache/spark/serializer/JavaSerializer.scala   | 50 +++---
 .../spark/serializer/ContainsProxyClass.java   | 50 ++
 .../spark/serializer/JavaSerializerSuite.scala | 26 ++-
 3 files changed, 108 insertions(+), 18 deletions(-)

diff --git 
a/core/src/main/scala/org/apache/spark/serializer/JavaSerializer.scala 
b/core/src/main/scala/org/apache/spark/serializer/JavaSerializer.scala
index 077b035..9d76611 100644
--- a/core/src/main/scala/org/apache/spark/serializer/JavaSerializer.scala
+++ b/core/src/main/scala/org/apache/spark/serializer/JavaSerializer.scala
@@ -28,8 +28,10 @@ import org.apache.spark.internal.config._
 import org.apache.spark.util.{ByteBufferInputStream, ByteBufferOutputStream, 
Utils}
 
 private[spark] class JavaSerializationStream(
-out: OutputStream, counterReset: Int, extraDebugInfo: Boolean)
-  extends SerializationStream {
+out: OutputStream,
+counterReset: Int,
+extraDebugInfo: Boolean)
+extends SerializationStream {
   private val objOut = new ObjectOutputStream(out)
   private var counter = 0
 
@@ -59,9 +61,10 @@ private[spark] class JavaSerializationStream(
 }
 
 private[spark] class JavaDeserializationStream(in: InputStream, loader: 
ClassLoader)
-  extends DeserializationStream {
+extends DeserializationStream {
 
   private val objIn = new ObjectInputStream(in) {
+
 override def resolveClass(desc: ObjectStreamClass): Class[_] =
   try {
 // scalastyle:off classforname
@@ -71,6 +74,14 @@ private[spark] class JavaDeserializationStream(in: 
InputStream, loader: ClassLoa
 case e: ClassNotFoundException =>
   JavaDeserializationStream.primitiveMappings.getOrElse(desc.getName, 
throw e)
   }
+
+override def resolveProxyClass(ifaces: Array[String]): Class[_] = {
+  // scalastyle:off classforname
+  val resolved = ifaces.map(iface => Class.forName(iface, false, loader))
+  // scalastyle:on classforname
+  java.lang.reflect.Proxy.getProxyClass(loader, resolved: _*)
+}
+
   }
 
   def readObject[T: ClassTag](): T = objIn.readObject().asInstanceOf[T]
@@ -78,6 +89,7 @@ private[spark] class JavaDeserializationStream(in: 
InputStream, loader: ClassLoa
 }
 
 private object JavaDeserializationStream {
+
   val primitiveMappings = Map[String, Class[_]](
 "boolean" -> classOf[Boolean],
 "byte" -> classOf[Byte],
@@ -87,13 +99,15 @@ private object JavaDeserializationStream {
 "long" -> classOf[Long],
 "float" -> classOf[Float],
 "double" -> classOf[Double],
-"void" -> classOf[Void]
-  )
+"void" -> classOf[Void])
+
 }
 
 private[spark] class JavaSerializerInstance(
-counterReset: Int, extraDebugInfo: Boolean, defaultClassLoader: 
ClassLoader)
-  extends SerializerInstance {
+counterReset: Int,
+extraDebugInfo: Boolean,
+defaultClassLoader: ClassLoader)
+extends SerializerInstance {
 
   override def serialize[T: ClassTag](t: T): ByteBuffer = {
 val bos = new ByteBufferOutputStream()
@@ -126,6 +140,7 @@ private[spark] class JavaSerializerInstance(
   def deserializeStream(s: InputStream, loader: ClassLoader): 
DeserializationStream = {
 new JavaDeserializationStream(s, loader)
   }
+
 }
 
 /**
@@ -141,20 +156,23 @@ class JavaSerializer(conf: SparkConf) extends Serializer 
with Externalizable {
   private var counterReset = conf.get(SERIALIZER_OBJECT_STREAM_RESET)
   private var extraDebugInfo = conf.get(SERIALIZER_EXTRA_DEBUG_INFO)
 
-  protected def this() = this(new SparkConf())  // For deserialization only
+  protected def this() = this(new SparkConf()) // For dese

[spark] branch master updated: [SPARK-37020][SQL] DS V2 LIMIT push down

2021-10-28 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 9821a28  [SPARK-37020][SQL] DS V2 LIMIT push down
9821a28 is described below

commit 9821a286c7d5ee5e0668c49c893de158809ec38f
Author: Huaxin Gao 
AuthorDate: Thu Oct 28 16:59:12 2021 -0700

[SPARK-37020][SQL] DS V2 LIMIT push down

### What changes were proposed in this pull request?
Push down limit to data source for better performance

### Why are the changes needed?
For LIMIT, e.g. `SELECT * FROM table LIMIT 10`, Spark retrieves all the data from the table and then returns 10 rows. If we can push LIMIT down to the data source side, the data transferred to Spark will be dramatically reduced.
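
As a rough sketch of the connector side (hypothetical `ExampleScanBuilder`/`ExampleScan`, not classes from this PR), a DS v2 source opts in by mixing the new interface into its ScanBuilder:

```scala
import org.apache.spark.sql.connector.read.{Scan, ScanBuilder, SupportsPushDownLimit}
import org.apache.spark.sql.types.StructType

// Hypothetical connector classes sketching how a source accepts the pushed LIMIT.
class ExampleScanBuilder extends ScanBuilder with SupportsPushDownLimit {
  private var pushedLimit: Option[Int] = None

  // Returning true tells Spark the source will apply the limit itself; Spark still
  // re-applies LIMIT on the returned rows, so over-returning is only a perf issue.
  override def pushLimit(limit: Int): Boolean = {
    pushedLimit = Some(limit)
    true
  }

  override def build(): Scan = new ExampleScan(pushedLimit)
}

class ExampleScan(pushedLimit: Option[Int]) extends Scan {
  // A real implementation would also provide toBatch() and plan InputPartitions
  // that stop reading after `pushedLimit` rows; omitted in this sketch.
  override def readSchema(): StructType = new StructType().add("id", "long")
}
```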

### Does this PR introduce _any_ user-facing change?
Yes, a new interface `SupportsPushDownLimit`.

### How was this patch tested?
new test

Closes #34291 from huaxingao/pushdownLimit.

Authored-by: Huaxin Gao 
Signed-off-by: Huaxin Gao 
---
 docs/sql-data-sources-jdbc.md  |  9 +++
 .../spark/sql/connector/read/ScanBuilder.java  |  6 +-
 ...ScanBuilder.java => SupportsPushDownLimit.java} | 17 +++---
 .../spark/sql/execution/DataSourceScanExec.scala   |  4 +-
 .../execution/datasources/DataSourceStrategy.scala |  3 +
 .../execution/datasources/jdbc/JDBCOptions.scala   |  4 ++
 .../sql/execution/datasources/jdbc/JDBCRDD.scala   | 15 +++--
 .../execution/datasources/jdbc/JDBCRelation.scala  |  6 +-
 .../datasources/v2/DataSourceV2Strategy.scala  |  4 +-
 .../execution/datasources/v2/PushDownUtils.scala   | 13 +++-
 .../datasources/v2/V2ScanRelationPushDown.scala| 30 --
 .../execution/datasources/v2/jdbc/JDBCScan.scala   |  5 +-
 .../datasources/v2/jdbc/JDBCScanBuilder.scala  | 15 -
 .../org/apache/spark/sql/jdbc/DerbyDialect.scala   |  4 ++
 .../org/apache/spark/sql/jdbc/JdbcDialects.scala   | 12 
 .../apache/spark/sql/jdbc/MsSqlServerDialect.scala |  3 +
 .../apache/spark/sql/jdbc/TeradataDialect.scala|  3 +
 .../org/apache/spark/sql/jdbc/JDBCV2Suite.scala| 69 +-
 18 files changed, 191 insertions(+), 31 deletions(-)

diff --git a/docs/sql-data-sources-jdbc.md b/docs/sql-data-sources-jdbc.md
index 16d525e..361b92be 100644
--- a/docs/sql-data-sources-jdbc.md
+++ b/docs/sql-data-sources-jdbc.md
@@ -247,6 +247,15 @@ logging into the data sources.
   
 
   
+pushDownLimit
+false
+
+ The option to enable or disable LIMIT push-down into the JDBC data 
source. The default value is false, in which case Spark does not push down 
LIMIT to the JDBC data source. Otherwise, if value sets to true, LIMIT is 
pushed down to the JDBC data source. SPARK still applies LIMIT on the result 
from data source even if LIMIT is pushed down.
+
+read
+  
+
+  
 keytab
 (none)
 
diff --git 
a/sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/ScanBuilder.java
 
b/sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/ScanBuilder.java
index b46f620..20c9d2e 100644
--- 
a/sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/ScanBuilder.java
+++ 
b/sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/ScanBuilder.java
@@ -21,9 +21,9 @@ import org.apache.spark.annotation.Evolving;
 
 /**
  * An interface for building the {@link Scan}. Implementations can mixin 
SupportsPushDownXYZ
- * interfaces to do operator pushdown, and keep the operator pushdown result 
in the returned
- * {@link Scan}. When pushing down operators, Spark pushes down filters first, 
then pushes down
- * aggregates or applies column pruning.
+ * interfaces to do operator push down, and keep the operator push down result 
in the returned
+ * {@link Scan}. When pushing down operators, the push down order is:
+ * filter -> aggregate -> limit -> column pruning.
  *
  * @since 3.0.0
  */
diff --git 
a/sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/ScanBuilder.java
 
b/sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/SupportsPushDownLimit.java
similarity index 68%
copy from 
sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/ScanBuilder.java
copy to 
sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/SupportsPushDownLimit.java
index b46f620..7e50bf1 100644
--- 
a/sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/ScanBuilder.java
+++ 
b/sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/SupportsPushDownLimit.java
@@ -20,14 +20,17 @@ package org.apache.spark.sql.connector.read;
 import org.apache.spark.annotation.Evolving;
 
 /**
- * An interface for building the {@link Scan}. Implementations can mixin 
SupportsPushDownXYZ
- * interfaces to do operator pushdown, and keep the operator pushdown result 
in the returned
- * {@link Scan}. When pushing down op

[spark] branch master updated: [SPARK-34960][SQL] Aggregate push down for ORC

2021-10-28 Thread viirya
This is an automated email from the ASF dual-hosted git repository.

viirya pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 609e749  [SPARK-34960][SQL] Aggregate push down for ORC
609e749 is described below

commit 609e7498326ebedb06904a1f5bab59b739380b1a
Author: Cheng Su 
AuthorDate: Thu Oct 28 17:29:15 2021 -0700

[SPARK-34960][SQL] Aggregate push down for ORC

### What changes were proposed in this pull request?

This PR adds an aggregate push down feature for the ORC data source v2 reader.

At a high level, the PR does:

* The supported aggregate expressions are MIN/MAX/COUNT, the same as [Parquet aggregate push down](https://github.com/apache/spark/pull/33639).
* BooleanType, ByteType, ShortType, IntegerType, LongType, FloatType, DoubleType, DateType are allowed in MIN/MAX aggregate push down. All other column types are not allowed in MIN/MAX aggregate push down.
* All column types are supported in COUNT aggregate push down.
* Nested columns' sub-fields are disallowed in aggregate push down.
* If the file does not have valid statistics, Spark will throw an exception and fail the query.
* If the aggregate has a filter or a group-by column, the aggregate will not be pushed down.

At the code level, the PR does:
* `OrcScanBuilder`: `pushAggregation()` checks whether the aggregation can be pushed down. Most of the checking logic is shared between Parquet and ORC, extracted into `AggregatePushDownUtils.getSchemaForPushedAggregation()`. `OrcScanBuilder` will create an `OrcScan` with the aggregation and the aggregation data schema.
* `OrcScan`: `createReaderFactory` creates an ORC reader factory with the aggregation and schema. Similar change to `ParquetScan`.
* `OrcPartitionReaderFactory`: `buildReaderWithAggregates` creates an ORC reader with aggregate push down (i.e. it reads the ORC file footer to process column statistics instead of reading the actual data in the file). `buildColumnarReaderWithAggregates` creates a columnar ORC reader similarly. Both delegate the real work of reading the footer to `OrcUtils.createAggInternalRowFromFooter`.
* `OrcUtils.createAggInternalRowFromFooter`: reads the ORC file footer to process column statistics (the real heavy lifting happens here). Similar to `ParquetUtils.createAggInternalRowFromFooter`. Leverages utility methods such as `OrcFooterReader.readStatistics`.
* `OrcFooterReader`: `readStatistics` reads the ORC `ColumnStatistics[]` into Spark's `OrcColumnStatistics`. The transformation is needed because ORC `ColumnStatistics[]` stores all column statistics in a flattened array, which is hard to process, while Spark's `OrcColumnStatistics` stores the statistics in a nested tree structure (e.g. like `StructType`). This is used by `OrcUtils.createAggInternalRowFromFooter`.
* `OrcColumnStatistics`: the easy-to-manipulate structure for ORC `ColumnStatistics`. This is used by `OrcFooterReader.readStatistics`.

### Why are the changes needed?

To improve the performance of queries with aggregates.

### Does this PR introduce _any_ user-facing change?

Yes. A user-facing config `spark.sql.orc.aggregatePushdown` is added to 
control enabling/disabling the aggregate push down for ORC. By default the 
feature is disabled.
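
A minimal way to try it out (a sketch under the assumption that the v2 ORC reader must be in use for the push down to apply, and with a hypothetical local path):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{count, max, min}

object OrcAggPushDownDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("orc-agg-demo").getOrCreate()
    // Enable the new config; assumption: the push down only applies when the v2 ORC
    // reader is used (i.e. "orc" is not listed in spark.sql.sources.useV1SourceList).
    spark.conf.set("spark.sql.orc.aggregatePushdown", "true")
    spark.conf.set("spark.sql.sources.useV1SourceList", "")

    val path = "/tmp/orc_agg_pushdown_demo" // hypothetical path
    spark.range(0, 1000).toDF("id").write.mode("overwrite").orc(path)

    // MIN/MAX/COUNT with no filter and no group-by can be answered from ORC
    // footer statistics instead of scanning the row data.
    spark.read.orc(path).agg(min("id"), max("id"), count("id")).show()
    spark.stop()
  }
}
```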

### How was this patch tested?

Added unit test in `FileSourceAggregatePushDownSuite.scala`. Refactored all 
unit tests in https://github.com/apache/spark/pull/33639, and it now works for 
both Parquet and ORC.

Closes #34298 from c21/orc-agg.

Authored-by: Cheng Su 
Signed-off-by: Liang-Chi Hsieh 
---
 .../org/apache/spark/sql/internal/SQLConf.scala|  10 +
 .../org/apache/spark/sql/types/StructType.scala|   2 +-
 .../datasources/orc/OrcColumnStatistics.java   |  80 +
 .../execution/datasources/orc/OrcFooterReader.java |  67 +
 .../datasources/AggregatePushDownUtils.scala   | 141 +
 .../datasources/orc/OrcDeserializer.scala  |  16 +
 .../sql/execution/datasources/orc/OrcUtils.scala   | 122 +++-
 .../datasources/parquet/ParquetUtils.scala |  41 ---
 .../v2/orc/OrcPartitionReaderFactory.scala |  93 --
 .../sql/execution/datasources/v2/orc/OrcScan.scala |  45 ++-
 .../datasources/v2/orc/OrcScanBuilder.scala|  43 ++-
 .../v2/parquet/ParquetPartitionReaderFactory.scala |  14 +-
 .../datasources/v2/parquet/ParquetScan.scala   |  10 +-
 .../v2/parquet/ParquetScanBuilder.scala|  93 ++
 .../scala/org/apache/spark/sql/FileScanSuite.scala |   2 +-
 ...cala => FileSourceAggregatePushDownSuite.scala} | 324 -
 16 files changed, 804 insertions(+), 299 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
index fe3204b..def6b

[spark] branch master updated (609e749 -> f59e1d5)

2021-10-28 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 609e749  [SPARK-34960][SQL] Aggregate push down for ORC
 add f59e1d5  [SPARK-37139][PYTHON] Inline type hints for python/pyspark/taskcontext.py and version.py

No new revisions were added by this update.

Summary of changes:
 python/pyspark/sql/streaming.py |  4 +-
 python/pyspark/taskcontext.py   | 86 +++--
 python/pyspark/taskcontext.pyi  | 46 --
 python/pyspark/version.py   |  2 +-
 python/pyspark/version.pyi  | 19 -
 5 files changed, 53 insertions(+), 104 deletions(-)
 delete mode 100644 python/pyspark/taskcontext.pyi
 delete mode 100644 python/pyspark/version.pyi

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (f59e1d5 -> 337dbf17)

2021-10-28 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from f59e1d5  [SPARK-37139][PYTHON] Inline type hints for python/pyspark/taskcontext.py and version.py
 add 337dbf17 [SPARK-37042][PYTHON] Inline type hints for kinesis.py and listener.py in python/pyspark/streaming

No new revisions were added by this update.

Summary of changes:
 python/pyspark/streaming/kinesis.py   | 102 +-
 python/pyspark/streaming/kinesis.pyi  |  49 
 python/pyspark/streaming/listener.py  |  23 
 python/pyspark/streaming/listener.pyi |  35 
 4 files changed, 100 insertions(+), 109 deletions(-)
 delete mode 100644 python/pyspark/streaming/kinesis.pyi
 delete mode 100644 python/pyspark/streaming/listener.pyi

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (337dbf17 -> a74f76c)

2021-10-28 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 337dbf17 [SPARK-37042][PYTHON] Inline type hints for kinesis.py and listener.py in python/pyspark/streaming
 add a74f76c  [SPARK-37144][PYTHON] Inline type hints for python/pyspark/files.py

No new revisions were added by this update.

Summary of changes:
 python/pyspark/files.py  | 24 
 python/pyspark/files.pyi | 24 
 2 files changed, 16 insertions(+), 32 deletions(-)
 delete mode 100644 python/pyspark/files.pyi

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (a74f76c -> 92f18ad)

2021-10-28 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from a74f76c  [SPARK-37144][PYTHON] Inline type hints for python/pyspark/files.py
 add 92f18ad  [SPARK-37107][PYTHON] Inline type hints for python/pyspark/status.py

No new revisions were added by this update.

Summary of changes:
 python/pyspark/status.py  | 37 +
 python/pyspark/status.pyi | 42 --
 2 files changed, 25 insertions(+), 54 deletions(-)
 delete mode 100644 python/pyspark/status.pyi

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[GitHub] [spark-website] HyukjinKwon opened a new pull request #366: Update Spark 3.3 release window

2021-10-28 Thread GitBox


HyukjinKwon opened a new pull request #366:
URL: https://github.com/apache/spark-website/pull/366


   This PR proposes to update Spark 3.3 release window. See also 
https://mail-archives.apache.org/mod_mbox/spark-dev/202110.mbox/%3CCAMFhwAZVrNvXj3z9HhteHT22dC9N76zMqimPCrJW-GLFOSDHDA%40mail.gmail.com%3E


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-37134][PYTHON][DOCS] Clarify the options in "Using PySpark Native Features"

2021-10-28 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new f258d30  [SPARK-37134][PYTHON][DOCS] Clarify the options in "Using PySpark Native Features"
f258d30 is described below

commit f258d30b8ef5d1698da3af1dddc08dd47ca0db3b
Author: Hyukjin Kwon 
AuthorDate: Fri Oct 29 14:48:53 2021 +0900

[SPARK-37134][PYTHON][DOCS] Clarify the options in "Using PySpark Native Features"

### What changes were proposed in this pull request?

This PR proposes to fix:

```diff
- to the executors by:
+ to the executors by one of the following:
```

to clarify that doing one of many options works (instead of doing all 
options together).

### Why are the changes needed?

To prevent confusion.

### Does this PR introduce _any_ user-facing change?

Yes, this is user-facing documentation change.

### How was this patch tested?

Manually double checked.

Closes #34422

Closes #34432 from HyukjinKwon/SPARK-37134.

Authored-by: Hyukjin Kwon 
Signed-off-by: Hyukjin Kwon 
---
 python/docs/source/user_guide/python_packaging.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/python/docs/source/user_guide/python_packaging.rst 
b/python/docs/source/user_guide/python_packaging.rst
index 02ef7f6..6409c5f 100644
--- a/python/docs/source/user_guide/python_packaging.rst
+++ b/python/docs/source/user_guide/python_packaging.rst
@@ -63,7 +63,7 @@ Using PySpark Native Features
 -
 
 PySpark allows to upload Python files (``.py``), zipped Python packages 
(``.zip``), and Egg files (``.egg``)
-to the executors by:
+to the executors by one of the following:
 
 - Setting the configuration setting ``spark.submit.pyFiles``
 - Setting ``--py-files`` option in Spark scripts

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.2 updated: [SPARK-37134][PYTHON][DOCS] Clarify the options in "Using PySpark Native Features"

2021-10-28 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.2 by this push:
 new b58db1f  [SPARK-37134][PYTHON][DOCS] Clarify the options in "Using PySpark Native Features"
b58db1f is described below

commit b58db1f287b854cdeb54722c80bbd4558d848fab
Author: Hyukjin Kwon 
AuthorDate: Fri Oct 29 14:48:53 2021 +0900

[SPARK-37134][PYTHON][DOCS] Clarify the options in "Using PySpark Native Features"

### What changes were proposed in this pull request?

This PR proposes to fix:

```diff
- to the executors by:
+ to the executors by one of the following:
```

to clarify that doing one of many options works (instead of doing all 
options together).

### Why are the changes needed?

To prevent confusion.

### Does this PR introduce _any_ user-facing change?

Yes, this is user-facing documentation change.

### How was this patch tested?

Manually double checked.

Closes #34422

Closes #34432 from HyukjinKwon/SPARK-37134.

Authored-by: Hyukjin Kwon 
Signed-off-by: Hyukjin Kwon 
(cherry picked from commit f258d30b8ef5d1698da3af1dddc08dd47ca0db3b)
Signed-off-by: Hyukjin Kwon 
---
 python/docs/source/user_guide/python_packaging.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/python/docs/source/user_guide/python_packaging.rst 
b/python/docs/source/user_guide/python_packaging.rst
index 02ef7f6..6409c5f 100644
--- a/python/docs/source/user_guide/python_packaging.rst
+++ b/python/docs/source/user_guide/python_packaging.rst
@@ -63,7 +63,7 @@ Using PySpark Native Features
 -
 
 PySpark allows to upload Python files (``.py``), zipped Python packages 
(``.zip``), and Egg files (``.egg``)
-to the executors by:
+to the executors by one of the following:
 
 - Setting the configuration setting ``spark.submit.pyFiles``
 - Setting ``--py-files`` option in Spark scripts

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org