[spark] branch master updated (63ab38f -> 7c32415)

2021-06-03 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 63ab38f  [SPARK-35396][CORE][FOLLOWUP] Free memory entry immediately
 add 7c32415  [SPARK-35523] Fix the default value in Data Source Options page

No new revisions were added by this update.

Summary of changes:
 docs/sql-data-sources-avro.md    |   2 +-
 docs/sql-data-sources-csv.md     |   2 +-
 docs/sql-data-sources-jdbc.md    | 105 +++
 docs/sql-data-sources-json.md    |  84 +++
 docs/sql-data-sources-orc.md     |   6 +--
 docs/sql-data-sources-parquet.md |  10 ++--
 docs/sql-data-sources-text.md    |   4 +-
 7 files changed, 126 insertions(+), 87 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (221553c -> 63ab38f)

2021-06-03 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 221553c  [SPARK-35642][INFRA] Split pyspark-pandas tests to rebalance the test duration
 add 63ab38f  [SPARK-35396][CORE][FOLLOWUP] Free memory entry immediately

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/storage/memory/MemoryStore.scala  | 42 
 .../apache/spark/storage/MemoryStoreSuite.scala    | 74 +++---
 2 files changed, 38 insertions(+), 78 deletions(-)




[spark] branch master updated (3d158f9 -> 221553c)

2021-06-03 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 3d158f9  [SPARK-35587][PYTHON][DOCS] Initial porting of Koalas documentation
 add 221553c  [SPARK-35642][INFRA] Split pyspark-pandas tests to rebalance the test duration

No new revisions were added by this update.

Summary of changes:
 .github/workflows/build_and_test.yml |  2 ++
 dev/run-tests.py | 10 -
 dev/sparktestsupport/modules.py  | 40 +---
 3 files changed, 35 insertions(+), 17 deletions(-)




[spark] branch master updated (745bd09 -> 3d158f9)

2021-06-03 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 745bd09  [SPARK-35589][CORE][TESTS][FOLLOWUP] Remove the duplicated test coverage
 add 3d158f9  [SPARK-35587][PYTHON][DOCS] Initial porting of Koalas documentation

No new revisions were added by this update.

Summary of changes:
 .github/workflows/build_and_test.yml               |     3 +-
 dev/tox.ini                                        |     2 +-
 docs/_plugins/copy_api_dirs.rb                     |    26 +-
 python/docs/source/conf.py                         |     9 +
 python/docs/source/development/index.rst           |    18 +-
 python/docs/source/development/ps_contributing.rst |   192 +
 python/docs/source/development/ps_design.rst       |    85 +
 python/docs/source/getting_started/index.rst       |    16 +-
 python/docs/source/getting_started/ps_10mins.ipynb | 14471 +++
 python/docs/source/getting_started/ps_install.rst  |   145 +
 .../source/getting_started/ps_videos_blogs.rst     |   130 +
 python/docs/source/index.rst                       |     9 +
 python/docs/source/reference/index.rst             |    15 +
 python/docs/source/reference/ps_extensions.rst     |    21 +
 python/docs/source/reference/ps_frame.rst          |   327 +
 .../docs/source/reference/ps_general_functions.rst |    49 +
 python/docs/source/reference/ps_groupby.rst        |    88 +
 python/docs/source/reference/ps_indexing.rst       |   359 +
 python/docs/source/reference/ps_io.rst             |   103 +
 python/docs/source/reference/ps_ml.rst             |    28 +
 python/docs/source/reference/ps_series.rst         |   454 +
 python/docs/source/reference/ps_window.rst         |    31 +
 python/docs/source/user_guide/index.rst            |    20 +-
 .../docs/source/user_guide/ps_best_practices.rst   |   313 +
 python/docs/source/user_guide/ps_faq.rst           |    86 +
 python/docs/source/user_guide/ps_from_to_dbms.rst  |   107 +
 python/docs/source/user_guide/ps_options.rst       |   274 +
 .../docs/source/user_guide/ps_pandas_pyspark.rst   |   118 +
 .../docs/source/user_guide/ps_transform_apply.rst  |   121 +
 python/docs/source/user_guide/ps_typehints.rst     |   137 +
 python/docs/source/user_guide/ps_types.rst         |   228 +
 python/pyspark/pandas/accessors.py                 |     2 +-
 32 files changed, 17966 insertions(+), 21 deletions(-)
 create mode 100644 python/docs/source/development/ps_contributing.rst
 create mode 100644 python/docs/source/development/ps_design.rst
 create mode 100644 python/docs/source/getting_started/ps_10mins.ipynb
 create mode 100644 python/docs/source/getting_started/ps_install.rst
 create mode 100644 python/docs/source/getting_started/ps_videos_blogs.rst
 create mode 100644 python/docs/source/reference/ps_extensions.rst
 create mode 100644 python/docs/source/reference/ps_frame.rst
 create mode 100644 python/docs/source/reference/ps_general_functions.rst
 create mode 100644 python/docs/source/reference/ps_groupby.rst
 create mode 100644 python/docs/source/reference/ps_indexing.rst
 create mode 100644 python/docs/source/reference/ps_io.rst
 create mode 100644 python/docs/source/reference/ps_ml.rst
 create mode 100644 python/docs/source/reference/ps_series.rst
 create mode 100644 python/docs/source/reference/ps_window.rst
 create mode 100644 python/docs/source/user_guide/ps_best_practices.rst
 create mode 100644 python/docs/source/user_guide/ps_faq.rst
 create mode 100644 python/docs/source/user_guide/ps_from_to_dbms.rst
 create mode 100644 python/docs/source/user_guide/ps_options.rst
 create mode 100644 python/docs/source/user_guide/ps_pandas_pyspark.rst
 create mode 100644 python/docs/source/user_guide/ps_transform_apply.rst
 create mode 100644 python/docs/source/user_guide/ps_typehints.rst
 create mode 100644 python/docs/source/user_guide/ps_types.rst




[spark] branch master updated (7eeb07d -> 745bd09)

2021-06-03 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 7eeb07d  [SPARK-35606][PYTHON][INFRA] List Python 3.9 installed libraries in build_and_test workflow
 add 745bd09  [SPARK-35589][CORE][TESTS][FOLLOWUP] Remove the duplicated test coverage

No new revisions were added by this update.

Summary of changes:
 core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala | 4 ----
 1 file changed, 4 deletions(-)




[spark] branch master updated (878527d -> 7eeb07d)

2021-06-03 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 878527d  [SPARK-35612][SQL] Support LZ4 compression in ORC data source
 add 7eeb07d  [SPARK-35606][PYTHON][INFRA] List Python 3.9 installed libraries in build_and_test workflow

No new revisions were added by this update.

Summary of changes:
 .github/workflows/build_and_test.yml | 3 +++
 1 file changed, 3 insertions(+)




[spark] branch master updated (0342dcb -> 878527d)

2021-06-03 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 0342dcb  [SPARK-35580][SQL] Implement canonicalized method for HigherOrderFunction
 add 878527d  [SPARK-35612][SQL] Support LZ4 compression in ORC data source

No new revisions were added by this update.

Summary of changes:
 docs/sql-data-sources-orc.md |  2 +-
 .../main/scala/org/apache/spark/sql/internal/SQLConf.scala   |  4 ++--
 .../spark/sql/execution/datasources/orc/OrcOptions.scala |  1 +
 .../spark/sql/execution/datasources/orc/OrcUtils.scala   |  1 +
 .../spark/sql/execution/datasources/orc/OrcSourceSuite.scala | 12 +++-
 .../scala/org/apache/spark/sql/hive/orc/OrcFileFormat.scala  |  1 +
 6 files changed, 17 insertions(+), 4 deletions(-)




[spark] branch branch-3.1 updated (36577c7 -> ba02faa)

2021-06-03 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 36577c7  [SPARK-35610][CORE] Fix the memory leak introduced by the Executor's stop shutdown hook
 add ba02faa  [SPARK-35589][CORE][3.1] BlockManagerMasterEndpoint should not ignore index-only shuffle file during updating

No new revisions were added by this update.

Summary of changes:
 .../spark/shuffle/IndexShuffleBlockResolver.scala  |  2 +-
 .../spark/storage/BlockManagerMasterEndpoint.scala |  8 ++--
 .../apache/spark/storage/BlockManagerSuite.scala   | 48 ++
 3 files changed, 53 insertions(+), 5 deletions(-)




[spark] branch master updated (4f0db87 -> 0342dcb)

2021-06-03 Thread viirya
This is an automated email from the ASF dual-hosted git repository.

viirya pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 4f0db87  [SPARK-35416][K8S][FOLLOWUP] Use Set instead of ArrayBuffer
 add 0342dcb  [SPARK-35580][SQL] Implement canonicalized method for HigherOrderFunction

No new revisions were added by this update.

Summary of changes:
 .../expressions/higherOrderFunctions.scala |  17 
 .../expressions/HigherOrderFunctionsSuite.scala| 105 -
 .../spark/sql/execution/joins/HashedRelation.scala |   4 +-
 3 files changed, 120 insertions(+), 6 deletions(-)




[spark] branch master updated: [SPARK-35416][K8S][FOLLOWUP] Use Set instead of ArrayBuffer

2021-06-03 Thread mridulm80
This is an automated email from the ASF dual-hosted git repository.

mridulm80 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 4f0db87  [SPARK-35416][K8S][FOLLOWUP] Use Set instead of ArrayBuffer
4f0db87 is described below

commit 4f0db872a089257f1e894060e1757f9e5b973142
Author: Dongjoon Hyun 
AuthorDate: Thu Jun 3 10:41:11 2021 -0500

[SPARK-35416][K8S][FOLLOWUP] Use Set instead of ArrayBuffer

### What changes were proposed in this pull request?

This is a follow-up of https://github.com/apache/spark/pull/32564 .

### Why are the changes needed?

To use Set instead of ArrayBuffer and add a return type.
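A minimal standalone sketch of the collection choice (the object and method names here are invented for illustration; this is not the Spark code): a `Set` records each replaced resource once and gives O(1) membership checks, where an `ArrayBuffer` would accumulate duplicates:

```scala
import scala.collection.mutable

object ReplacedResourcesSketch {
  // With a buffer, a resource replaced twice is recorded twice.
  def trackWithBuffer(claims: Seq[String]): Int = {
    val replaced = mutable.ArrayBuffer[String]()
    claims.foreach(replaced += _)
    replaced.size
  }

  // With a set, duplicates collapse and membership checks are O(1).
  def trackWithSet(claims: Seq[String]): Int = {
    val replaced = mutable.Set[String]()
    claims.foreach(replaced += _)
    replaced.size
  }

  def main(args: Array[String]): Unit = {
    val claims = Seq("pvc-a", "pvc-b", "pvc-a") // "pvc-a" replaced twice
    println(trackWithBuffer(claims)) // 3
    println(trackWithSet(claims))    // 2
  }
}
```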

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

Closes #32758 from dongjoon-hyun/SPARK-35416-2.

Authored-by: Dongjoon Hyun 
Signed-off-by: Mridul Muralidharan 
---
 .../apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocator.scala  | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocator.scala b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocator.scala
index 3349e0c..606339a 100644
--- a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocator.scala
+++ b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocator.scala
@@ -390,8 +390,8 @@ private[spark] class ExecutorPodsAllocator(
   private def replacePVCsIfNeeded(
   pod: Pod,
   resources: Seq[HasMetadata],
-  reusablePVCs: mutable.Buffer[PersistentVolumeClaim]) = {
-val replacedResources = mutable.ArrayBuffer[HasMetadata]()
+  reusablePVCs: mutable.Buffer[PersistentVolumeClaim]): Seq[HasMetadata] = {
+val replacedResources = mutable.Set[HasMetadata]()
 resources.foreach {
   case pvc: PersistentVolumeClaim =>
 // Find one with the same storage class and size.
@@ -407,7 +407,7 @@ private[spark] class ExecutorPodsAllocator(
   }
   if (volume.nonEmpty) {
 val matchedPVC = reusablePVCs.remove(index)
-replacedResources.append(pvc)
+replacedResources.add(pvc)
 logInfo(s"Reuse PersistentVolumeClaim ${matchedPVC.getMetadata.getName}")
 
volume.get.getPersistentVolumeClaim.setClaimName(matchedPVC.getMetadata.getName)
   }




[spark] branch master updated: [SPARK-35316][SQL] UnwrapCastInBinaryComparison support In/InSet predicate

2021-06-03 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new cfde117  [SPARK-35316][SQL] UnwrapCastInBinaryComparison support In/InSet predicate
cfde117 is described below

commit cfde117c6fec758c44f77fe9f9379ba38940b51a
Author: Fu Chen 
AuthorDate: Thu Jun 3 14:45:17 2021 +

[SPARK-35316][SQL] UnwrapCastInBinaryComparison support In/InSet predicate

### What changes were proposed in this pull request?

This PR adds In/InSet predicate support for `UnwrapCastInBinaryComparison`.

The current implementation doesn't push down filters for `In/InSet` predicates that contain a `Cast`.

For instance:

```scala
spark.range(50).selectExpr("cast(id as int) as id").write.mode("overwrite").parquet("/tmp/parquet/t1")
spark.read.parquet("/tmp/parquet/t1").where("id in (1L, 2L, 4L)").explain
```

before this pr:

```
== Physical Plan ==
*(1) Filter cast(id#5 as bigint) IN (1,2,4)
+- *(1) ColumnarToRow
   +- FileScan parquet [id#5] Batched: true, DataFilters: [cast(id#5 as bigint) IN (1,2,4)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/parquet/t1], PartitionFilters: [], PushedFilters: [], ReadSchema: struct
```

after this pr:

```
== Physical Plan ==
*(1) Filter id#95 IN (1,2,4)
+- *(1) ColumnarToRow
   +- FileScan parquet [id#95] Batched: true, DataFilters: [id#95 IN (1,2,4)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/parquet/t1], PartitionFilters: [], PushedFilters: [In(id, [1,2,4])], ReadSchema: struct
```
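
The plans above show the effect of the rewrite: the `Long` literals are narrowed to the column's `Int` type so the filter compares unchanged column values instead of casting every row. A standalone sketch of that narrowing idea (the object and method names are invented for illustration; this is not the optimizer code):

```scala
object UnwrapCastInSketch {
  // Narrow the Long literals of an IN-list to Int, keeping only values
  // that are exactly representable as Int (any other value can never match).
  def unwrapInList(literals: Seq[Long]): Seq[Int] =
    literals.collect { case v if v >= Int.MinValue && v <= Int.MaxValue => v.toInt }

  def main(args: Array[String]): Unit = {
    val idColumn  = 0 until 50                    // Int column values, as in the example above
    val inList    = Seq(1L, 2L, 4L, 10000000000L) // Long literals from the query
    val unwrapped = unwrapInList(inList)          // Seq(1, 2, 4): the out-of-range literal is dropped
    // The filter now compares Int with Int; no per-row cast is needed.
    println(idColumn.filter(unwrapped.toSet).mkString(",")) // 1,2,4
  }
}
```

Dropping literals that don't fit the narrower type is safe for equality: a value outside the `Int` range can never equal an `Int` column value.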

### Does this PR introduce _any_ user-facing change?

No.
### How was this patch tested?

New test.

Closes #32488 from cfmcgrady/SPARK-35316.

Authored-by: Fu Chen 
Signed-off-by: Wenchen Fan 
---
 .../optimizer/UnwrapCastInBinaryComparison.scala   | 114 +++--
 .../UnwrapCastInBinaryComparisonSuite.scala        |  50 +
 2 files changed, 156 insertions(+), 8 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala
index 9f72751..097d810 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala
@@ -17,19 +17,23 @@
 
 package org.apache.spark.sql.catalyst.optimizer
 
+import scala.collection.immutable.HashSet
+import scala.collection.mutable.ArrayBuffer
+
 import org.apache.spark.sql.catalyst.expressions._
 import org.apache.spark.sql.catalyst.expressions.Literal.FalseLiteral
 import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
 import org.apache.spark.sql.catalyst.rules.Rule
-import org.apache.spark.sql.catalyst.trees.TreePattern.BINARY_COMPARISON
+import org.apache.spark.sql.catalyst.trees.TreePattern.{BINARY_COMPARISON, IN, INSET}
 import org.apache.spark.sql.types._
 
 /**
- * Unwrap casts in binary comparison operations with patterns like following:
+ * Unwrap casts in binary comparison or `In/InSet` operations with patterns like following:
  *
- * `BinaryComparison(Cast(fromExp, toType), Literal(value, toType))`
- *   or
- * `BinaryComparison(Literal(value, toType), Cast(fromExp, toType))`
+ * - `BinaryComparison(Cast(fromExp, toType), Literal(value, toType))`
+ * - `BinaryComparison(Literal(value, toType), Cast(fromExp, toType))`
+ * - `In(Cast(fromExp, toType), Seq(Literal(v1, toType), Literal(v2, toType), ...)`
+ * - `InSet(Cast(fromExp, toType), Set(v1, v2, ...))`
  *
 * This rule optimizes expressions with the above pattern by either replacing the cast with simpler
 * constructs, or moving the cast from the expression side to the literal side, which enables them
@@ -86,13 +90,22 @@ import org.apache.spark.sql.types._
 * Further, the above `if(isnull(fromExp), null, false)` is represented using conjunction
 * `and(isnull(fromExp), null)`, to enable further optimization and filter pushdown to data sources.
 * Similarly, `if(isnull(fromExp), null, true)` is represented with `or(isnotnull(fromExp), null)`.
+ *
+ * For `In/InSet` operation, first the rule transform the expression to Equals:
+ * `Seq(
+ *   EqualTo(Cast(fromExp, toType), Literal(v1, toType)),
+ *   EqualTo(Cast(fromExp, toType), Literal(v2, toType)),
+ *   ...
+ * )`
+ * and using the same rule with `BinaryComparison` show as before to optimize each `EqualTo`.
  */
 object UnwrapCastInBinaryComparison extends Rule[LogicalPlan] {
  override def apply(plan: LogicalPlan): LogicalPlan = plan.transformWithPruning(
-_.containsPattern(BINARY_COMPARISON), ruleId) 

[spark] branch master updated: [SPARK-35609][BUILD] Add style rules to prohibit use of a Guava API which is incompatible with newer versions

2021-06-03 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new c532f82  [SPARK-35609][BUILD] Add style rules to prohibit use of a Guava API which is incompatible with newer versions
c532f82 is described below

commit c532f8260ee2f2f4170dc50f7e890fafab438b76
Author: Kousuke Saruta 
AuthorDate: Thu Jun 3 21:52:41 2021 +0900

[SPARK-35609][BUILD] Add style rules to prohibit use of a Guava API which is incompatible with newer versions

### What changes were proposed in this pull request?

This PR adds rules to `checkstyle.xml` and `scalastyle-config.xml` to avoid reintroducing `Objects.toStringHelper`, a Guava API that is no longer present in newer Guava versions.

### Why are the changes needed?

SPARK-30272 (#26911) replaced `Objects.toStringHelper`, an API that Guava 14 provides, with the `commons.lang3` API because `Objects.toStringHelper` is no longer present in newer Guava. But `toStringHelper` was introduced into Spark again and had to be replaced once more in SPARK-35420 (#32567). I think it's better to have a style rule to avoid such repetition.

SPARK-30272 replaced some other APIs besides `Objects.toStringHelper`, but only `Objects.toStringHelper` seems to affect Spark for now, so I add rules just for it.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

I confirmed that `lint-java` and `lint-scala` detect usage of `toStringHelper` and make the lint check fail.
```
$ dev/lint-java
exec: curl --silent --show-error -L https://downloads.lightbend.com/scala/2.12.14/scala-2.12.14.tgz
Using `mvn` from path: /opt/maven/3.6.3//bin/mvn
Checkstyle checks failed at following occurrences:
[ERROR] src/main/java/org/apache/spark/network/protocol/OneWayMessage.java:[78] (regexp) RegexpSinglelineJava: Avoid using Object.toStringHelper. Use ToStringBuilder instead.

$ dev/lint-scala
Scalastyle checks failed at following occurrences:
[error] /home/kou/work/oss/spark/core/src/main/scala/org/apache/spark/rdd/RDD.scala:93:25: Avoid using Object.toStringHelper. Use ToStringBuilder instead.
[error] Total time: 25 s, completed 2021/06/02 16:18:25
```
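
The regexp rules behind these checks amount to scanning every source line for the banned call and emitting a fixed message. A standalone sketch of that idea (names invented for illustration; the real checks run inside the Checkstyle and Scalastyle machinery):

```scala
object BannedApiLintSketch {
  private val Banned  = raw"Objects\.toStringHelper".r
  private val Message = "Avoid using Object.toStringHelper. Use ToStringBuilder instead."

  // Returns one "<lineNumber>: <message>" entry per offending source line,
  // mirroring a single-line regexp lint rule.
  def check(lines: Seq[String]): Seq[String] =
    lines.zipWithIndex.collect {
      case (line, i) if Banned.findFirstIn(line).isDefined => s"${i + 1}: $Message"
    }

  def main(args: Array[String]): Unit = {
    val source = Seq(
      "import com.google.common.base.Objects",
      "override def toString: String = Objects.toStringHelper(this).toString"
    )
    check(source).foreach(println) // flags only line 2
  }
}
```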

Closes #32740 from sarutak/style-rule-for-guava.

Authored-by: Kousuke Saruta 
Signed-off-by: Kousuke Saruta 
---
 dev/checkstyle.xml| 5 -
 scalastyle-config.xml | 4 
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/dev/checkstyle.xml b/dev/checkstyle.xml
index 483fc7c..06c79a9 100644
--- a/dev/checkstyle.xml
+++ b/dev/checkstyle.xml
@@ -185,6 +185,9 @@
 
 
 
-
+
+
+
+
 
 
diff --git a/scalastyle-config.xml b/scalastyle-config.xml
index c1dc57b..c06b4ab 100644
--- a/scalastyle-config.xml
+++ b/scalastyle-config.xml
@@ -397,4 +397,8 @@ This file is divided into 3 sections:
 -1,0,1,2,3
   
 
+  
+Objects.toStringHelper
Avoid using Object.toStringHelper. Use ToStringBuilder instead.
+  
 
