[spark] branch master updated (63ab38f -> 7c32415)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 63ab38f  [SPARK-35396][CORE][FOLLOWUP] Free memory entry immediately
     add 7c32415  [SPARK-35523] Fix the default value in Data Source Options page

No new revisions were added by this update.

Summary of changes:
 docs/sql-data-sources-avro.md    |   2 +-
 docs/sql-data-sources-csv.md     |   2 +-
 docs/sql-data-sources-jdbc.md    | 105 +++
 docs/sql-data-sources-json.md    |  84 +++
 docs/sql-data-sources-orc.md     |   6 +--
 docs/sql-data-sources-parquet.md |  10 ++--
 docs/sql-data-sources-text.md    |   4 +-
 7 files changed, 126 insertions(+), 87 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (221553c -> 63ab38f)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 221553c  [SPARK-35642][INFRA] Split pyspark-pandas tests to rebalance the test duration
     add 63ab38f  [SPARK-35396][CORE][FOLLOWUP] Free memory entry immediately

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/storage/memory/MemoryStore.scala | 42
 .../apache/spark/storage/MemoryStoreSuite.scala   | 74 +++---
 2 files changed, 38 insertions(+), 78 deletions(-)
[spark] branch master updated (3d158f9 -> 221553c)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 3d158f9  [SPARK-35587][PYTHON][DOCS] Initial porting of Koalas documentation
     add 221553c  [SPARK-35642][INFRA] Split pyspark-pandas tests to rebalance the test duration

No new revisions were added by this update.

Summary of changes:
 .github/workflows/build_and_test.yml |  2 ++
 dev/run-tests.py                     | 10 -
 dev/sparktestsupport/modules.py      | 40 +---
 3 files changed, 35 insertions(+), 17 deletions(-)
[spark] branch master updated (745bd09 -> 3d158f9)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 745bd09  [SPARK-35589][CORE][TESTS][FOLLOWUP] Remove the duplicated test coverage
     add 3d158f9  [SPARK-35587][PYTHON][DOCS] Initial porting of Koalas documentation

No new revisions were added by this update.

Summary of changes:
 .github/workflows/build_and_test.yml               |     3 +-
 dev/tox.ini                                        |     2 +-
 docs/_plugins/copy_api_dirs.rb                     |    26 +-
 python/docs/source/conf.py                         |     9 +
 python/docs/source/development/index.rst           |    18 +-
 python/docs/source/development/ps_contributing.rst |   192 +
 python/docs/source/development/ps_design.rst       |    85 +
 python/docs/source/getting_started/index.rst       |    16 +-
 python/docs/source/getting_started/ps_10mins.ipynb | 14471 +++
 python/docs/source/getting_started/ps_install.rst  |   145 +
 .../source/getting_started/ps_videos_blogs.rst     |   130 +
 python/docs/source/index.rst                       |     9 +
 python/docs/source/reference/index.rst             |    15 +
 python/docs/source/reference/ps_extensions.rst     |    21 +
 python/docs/source/reference/ps_frame.rst          |   327 +
 .../docs/source/reference/ps_general_functions.rst |    49 +
 python/docs/source/reference/ps_groupby.rst        |    88 +
 python/docs/source/reference/ps_indexing.rst       |   359 +
 python/docs/source/reference/ps_io.rst             |   103 +
 python/docs/source/reference/ps_ml.rst             |    28 +
 python/docs/source/reference/ps_series.rst         |   454 +
 python/docs/source/reference/ps_window.rst         |    31 +
 python/docs/source/user_guide/index.rst            |    20 +-
 .../docs/source/user_guide/ps_best_practices.rst   |   313 +
 python/docs/source/user_guide/ps_faq.rst           |    86 +
 python/docs/source/user_guide/ps_from_to_dbms.rst  |   107 +
 python/docs/source/user_guide/ps_options.rst       |   274 +
 .../docs/source/user_guide/ps_pandas_pyspark.rst   |   118 +
 .../docs/source/user_guide/ps_transform_apply.rst  |   121 +
 python/docs/source/user_guide/ps_typehints.rst     |   137 +
 python/docs/source/user_guide/ps_types.rst         |   228 +
 python/pyspark/pandas/accessors.py                 |     2 +-
 32 files changed, 17966 insertions(+), 21 deletions(-)
 create mode 100644 python/docs/source/development/ps_contributing.rst
 create mode 100644 python/docs/source/development/ps_design.rst
 create mode 100644 python/docs/source/getting_started/ps_10mins.ipynb
 create mode 100644 python/docs/source/getting_started/ps_install.rst
 create mode 100644 python/docs/source/getting_started/ps_videos_blogs.rst
 create mode 100644 python/docs/source/reference/ps_extensions.rst
 create mode 100644 python/docs/source/reference/ps_frame.rst
 create mode 100644 python/docs/source/reference/ps_general_functions.rst
 create mode 100644 python/docs/source/reference/ps_groupby.rst
 create mode 100644 python/docs/source/reference/ps_indexing.rst
 create mode 100644 python/docs/source/reference/ps_io.rst
 create mode 100644 python/docs/source/reference/ps_ml.rst
 create mode 100644 python/docs/source/reference/ps_series.rst
 create mode 100644 python/docs/source/reference/ps_window.rst
 create mode 100644 python/docs/source/user_guide/ps_best_practices.rst
 create mode 100644 python/docs/source/user_guide/ps_faq.rst
 create mode 100644 python/docs/source/user_guide/ps_from_to_dbms.rst
 create mode 100644 python/docs/source/user_guide/ps_options.rst
 create mode 100644 python/docs/source/user_guide/ps_pandas_pyspark.rst
 create mode 100644 python/docs/source/user_guide/ps_transform_apply.rst
 create mode 100644 python/docs/source/user_guide/ps_typehints.rst
 create mode 100644 python/docs/source/user_guide/ps_types.rst
[spark] branch master updated (7eeb07d -> 745bd09)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 7eeb07d  [SPARK-35606][PYTHON][INFRA] List Python 3.9 installed libraries in build_and_test workflow
     add 745bd09  [SPARK-35589][CORE][TESTS][FOLLOWUP] Remove the duplicated test coverage

No new revisions were added by this update.

Summary of changes:
 core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala | 4
 1 file changed, 4 deletions(-)
[spark] branch master updated (878527d -> 7eeb07d)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 878527d  [SPARK-35612][SQL] Support LZ4 compression in ORC data source
     add 7eeb07d  [SPARK-35606][PYTHON][INFRA] List Python 3.9 installed libraries in build_and_test workflow

No new revisions were added by this update.

Summary of changes:
 .github/workflows/build_and_test.yml | 3 +++
 1 file changed, 3 insertions(+)
[spark] branch master updated (0342dcb -> 878527d)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 0342dcb  [SPARK-35580][SQL] Implement canonicalized method for HigherOrderFunction
     add 878527d  [SPARK-35612][SQL] Support LZ4 compression in ORC data source

No new revisions were added by this update.

Summary of changes:
 docs/sql-data-sources-orc.md                                 |  2 +-
 .../main/scala/org/apache/spark/sql/internal/SQLConf.scala   |  4 ++--
 .../spark/sql/execution/datasources/orc/OrcOptions.scala     |  1 +
 .../spark/sql/execution/datasources/orc/OrcUtils.scala       |  1 +
 .../spark/sql/execution/datasources/orc/OrcSourceSuite.scala | 12 +++-
 .../scala/org/apache/spark/sql/hive/orc/OrcFileFormat.scala  |  1 +
 6 files changed, 17 insertions(+), 4 deletions(-)
[spark] branch branch-3.1 updated (36577c7 -> ba02faa)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 36577c7  [SPARK-35610][CORE] Fix the memory leak introduced by the Executor's stop shutdown hook
     add ba02faa  [SPARK-35589][CORE][3.1] BlockManagerMasterEndpoint should not ignore index-only shuffle file during updating

No new revisions were added by this update.

Summary of changes:
 .../spark/shuffle/IndexShuffleBlockResolver.scala  |  2 +-
 .../spark/storage/BlockManagerMasterEndpoint.scala |  8 ++--
 .../apache/spark/storage/BlockManagerSuite.scala   | 48 ++
 3 files changed, 53 insertions(+), 5 deletions(-)
[spark] branch master updated (4f0db87 -> 0342dcb)
This is an automated email from the ASF dual-hosted git repository.

viirya pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 4f0db87  [SPARK-35416][K8S][FOLLOWUP] Use Set instead of ArrayBuffer
     add 0342dcb  [SPARK-35580][SQL] Implement canonicalized method for HigherOrderFunction

No new revisions were added by this update.

Summary of changes:
 .../expressions/higherOrderFunctions.scala         |  17
 .../expressions/HigherOrderFunctionsSuite.scala    | 105 -
 .../spark/sql/execution/joins/HashedRelation.scala |   4 +-
 3 files changed, 120 insertions(+), 6 deletions(-)
[spark] branch master updated: [SPARK-35416][K8S][FOLLOWUP] Use Set instead of ArrayBuffer
This is an automated email from the ASF dual-hosted git repository.

mridulm80 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 4f0db87  [SPARK-35416][K8S][FOLLOWUP] Use Set instead of ArrayBuffer

4f0db87 is described below

commit 4f0db872a089257f1e894060e1757f9e5b973142
Author: Dongjoon Hyun
AuthorDate: Thu Jun 3 10:41:11 2021 -0500

    [SPARK-35416][K8S][FOLLOWUP] Use Set instead of ArrayBuffer

    ### What changes were proposed in this pull request?
    This is a follow-up of https://github.com/apache/spark/pull/32564 .

    ### Why are the changes needed?
    To use Set instead of ArrayBuffer and add a return type.

    ### Does this PR introduce _any_ user-facing change?
    No.

    ### How was this patch tested?
    Pass the CIs.

    Closes #32758 from dongjoon-hyun/SPARK-35416-2.

    Authored-by: Dongjoon Hyun
    Signed-off-by: Mridul Muralidharan gmail.com>
---
 .../apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocator.scala | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocator.scala b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocator.scala
index 3349e0c..606339a 100644
--- a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocator.scala
+++ b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocator.scala
@@ -390,8 +390,8 @@ private[spark] class ExecutorPodsAllocator(
   private def replacePVCsIfNeeded(
       pod: Pod,
       resources: Seq[HasMetadata],
-      reusablePVCs: mutable.Buffer[PersistentVolumeClaim]) = {
-    val replacedResources = mutable.ArrayBuffer[HasMetadata]()
+      reusablePVCs: mutable.Buffer[PersistentVolumeClaim]): Seq[HasMetadata] = {
+    val replacedResources = mutable.Set[HasMetadata]()
     resources.foreach {
       case pvc: PersistentVolumeClaim =>
         // Find one with the same storage class and size.
@@ -407,7 +407,7 @@ private[spark] class ExecutorPodsAllocator(
         }
         if (volume.nonEmpty) {
           val matchedPVC = reusablePVCs.remove(index)
-          replacedResources.append(pvc)
+          replacedResources.add(pvc)
           logInfo(s"Reuse PersistentVolumeClaim ${matchedPVC.getMetadata.getName}")
           volume.get.getPersistentVolumeClaim.setClaimName(matchedPVC.getMetadata.getName)
         }
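The follow-up above swaps `mutable.ArrayBuffer` for `mutable.Set`, so the same resource cannot be recorded twice, and the added return type makes the method's contract explicit. A minimal Python analogue of the collection change (names are illustrative, not Spark's):

```python
# Illustrative only: collecting "replaced" resources with a list keeps
# duplicates, while a set collapses them -- mirroring the switch from
# ArrayBuffer.append to Set.add in the patch above.
seen_resources = ["pvc-a", "pvc-b", "pvc-a"]  # hypothetical PVC names

as_list = []    # old behaviour (ArrayBuffer.append)
as_set = set()  # new behaviour (Set.add)
for r in seen_resources:
    as_list.append(r)
    as_set.add(r)

print(len(as_list), len(as_set))  # 3 2
```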
[spark] branch master updated: [SPARK-35316][SQL] UnwrapCastInBinaryComparison support In/InSet predicate
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new cfde117  [SPARK-35316][SQL] UnwrapCastInBinaryComparison support In/InSet predicate

cfde117 is described below

commit cfde117c6fec758c44f77fe9f9379ba38940b51a
Author: Fu Chen
AuthorDate: Thu Jun 3 14:45:17 2021 +

    [SPARK-35316][SQL] UnwrapCastInBinaryComparison support In/InSet predicate

    ### What changes were proposed in this pull request?
    This PR adds In/InSet predicate support for `UnwrapCastInBinaryComparison`. The current implementation doesn't push down filters for `In`/`InSet` predicates that contain a `Cast`. For instance:

    ```scala
    spark.range(50).selectExpr("cast(id as int) as id").write.mode("overwrite").parquet("/tmp/parquet/t1")
    spark.read.parquet("/tmp/parquet/t1").where("id in (1L, 2L, 4L)").explain
    ```

    Before this PR:
    ```
    == Physical Plan ==
    *(1) Filter cast(id#5 as bigint) IN (1,2,4)
    +- *(1) ColumnarToRow
       +- FileScan parquet [id#5] Batched: true, DataFilters: [cast(id#5 as bigint) IN (1,2,4)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/parquet/t1], PartitionFilters: [], PushedFilters: [], ReadSchema: struct
    ```

    After this PR:
    ```
    == Physical Plan ==
    *(1) Filter id#95 IN (1,2,4)
    +- *(1) ColumnarToRow
       +- FileScan parquet [id#95] Batched: true, DataFilters: [id#95 IN (1,2,4)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/parquet/t1], PartitionFilters: [], PushedFilters: [In(id, [1,2,4])], ReadSchema: struct
    ```

    ### Does this PR introduce _any_ user-facing change?
    No.

    ### How was this patch tested?
    New test.

    Closes #32488 from cfmcgrady/SPARK-35316.
    Authored-by: Fu Chen
    Signed-off-by: Wenchen Fan
---
 .../optimizer/UnwrapCastInBinaryComparison.scala | 114 +++--
 .../UnwrapCastInBinaryComparisonSuite.scala      |  50 +
 2 files changed, 156 insertions(+), 8 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala
index 9f72751..097d810 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala
@@ -17,19 +17,23 @@
 package org.apache.spark.sql.catalyst.optimizer
 
+import scala.collection.immutable.HashSet
+import scala.collection.mutable.ArrayBuffer
+
 import org.apache.spark.sql.catalyst.expressions._
 import org.apache.spark.sql.catalyst.expressions.Literal.FalseLiteral
 import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
 import org.apache.spark.sql.catalyst.rules.Rule
-import org.apache.spark.sql.catalyst.trees.TreePattern.BINARY_COMPARISON
+import org.apache.spark.sql.catalyst.trees.TreePattern.{BINARY_COMPARISON, IN, INSET}
 import org.apache.spark.sql.types._
 
 /**
- * Unwrap casts in binary comparison operations with patterns like following:
+ * Unwrap casts in binary comparison or `In/InSet` operations with patterns like following:
  *
- * `BinaryComparison(Cast(fromExp, toType), Literal(value, toType))`
- * or
- * `BinaryComparison(Literal(value, toType), Cast(fromExp, toType))`
+ * - `BinaryComparison(Cast(fromExp, toType), Literal(value, toType))`
+ * - `BinaryComparison(Literal(value, toType), Cast(fromExp, toType))`
+ * - `In(Cast(fromExp, toType), Seq(Literal(v1, toType), Literal(v2, toType), ...)`
+ * - `InSet(Cast(fromExp, toType), Set(v1, v2, ...))`
  *
  * This rule optimizes expressions with the above pattern by either replacing the cast with simpler
  * constructs, or moving the cast from the expression side to the literal side, which enables them
@@ -86,13 +90,22 @@ import org.apache.spark.sql.types._
  * Further, the above `if(isnull(fromExp), null, false)` is represented using conjunction
  * `and(isnull(fromExp), null)`, to enable further optimization and filter pushdown to data sources.
  * Similarly, `if(isnull(fromExp), null, true)` is represented with `or(isnotnull(fromExp), null)`.
+ *
+ * For `In/InSet` operation, first the rule transform the expression to Equals:
+ * `Seq(
+ *   EqualTo(Cast(fromExp, toType), Literal(v1, toType)),
+ *   EqualTo(Cast(fromExp, toType), Literal(v2, toType)),
+ *   ...
+ * )`
+ * and using the same rule with `BinaryComparison` show as before to optimize each `EqualTo`.
  */
 object UnwrapCastInBinaryComparison extends Rule[LogicalPlan] {
   override def apply(plan: LogicalPlan): LogicalPlan = plan.transformWithPruning(
-    _.containsPattern(BINARY_COMPARISON), ruleId)
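The rewrite this commit describes can be sketched outside Spark. The following Python sketch is hypothetical, not Spark's actual rule: it shows the core idea that an `IN` list compared against an int column cast to bigint can drop the cast when every literal fits the narrower source type, which is what makes the filter pushdown-able:

```python
# Minimal sketch, assuming the source column is int32 and the cast widens
# to int64. Function and variable names are illustrative, not Spark's.
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def unwrap_in_predicate(column, literals):
    """Return (column, literals) with the cast removed when every literal
    fits in int32; return None when the cast must stay."""
    if all(INT32_MIN <= v <= INT32_MAX for v in literals):
        return column, [int(v) for v in literals]
    return None

# Literals 1, 2, 4 fit in int32, so `cast(id as bigint) IN (1, 2, 4)` can
# become `id IN (1, 2, 4)` and be pushed down to the Parquet scan.
print(unwrap_in_predicate("id", [1, 2, 4]))  # ('id', [1, 2, 4])
```

The real rule is finer-grained: it rewrites the `In` into per-literal `EqualTo` comparisons, so out-of-range literals can simply be simplified away instead of blocking the whole unwrap; the sketch above bails out conservatively.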
[spark] branch master updated: [SPARK-35609][BUILD] Add style rules to prohibit using a Guava API that is incompatible with newer versions
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new c532f82  [SPARK-35609][BUILD] Add style rules to prohibit using a Guava API that is incompatible with newer versions

c532f82 is described below

commit c532f8260ee2f2f4170dc50f7e890fafab438b76
Author: Kousuke Saruta
AuthorDate: Thu Jun 3 21:52:41 2021 +0900

    [SPARK-35609][BUILD] Add style rules to prohibit using a Guava API that is incompatible with newer versions

    ### What changes were proposed in this pull request?
    This PR adds rules to `checkstyle.xml` and `scalastyle-config.xml` to avoid reintroducing `Objects.toStringHelper`, a Guava API that is no longer present in newer Guava versions.

    ### Why are the changes needed?
    SPARK-30272 (#26911) replaced `Objects.toStringHelper`, an API that Guava 14 provides, with the `commons.lang3` API because `Objects.toStringHelper` is no longer present in newer Guava. But `toStringHelper` was introduced into Spark again and replaced in SPARK-35420 (#32567). I think it's better to have a style rule to avoid such repetition. SPARK-30272 replaced some APIs aside from `Objects.toStringHelper`, but only `Objects.toStringHelper` seems to affect Spark for now, so I add rules only for it.

    ### Does this PR introduce _any_ user-facing change?
    No.

    ### How was this patch tested?
    I confirmed that `lint-java` and `lint-scala` detect the usage of `toStringHelper` and let the lint check fail.

    ```
    $ dev/lint-java
    exec: curl --silent --show-error -L https://downloads.lightbend.com/scala/2.12.14/scala-2.12.14.tgz
    Using `mvn` from path: /opt/maven/3.6.3//bin/mvn
    Checkstyle checks failed at following occurrences:
    [ERROR] src/main/java/org/apache/spark/network/protocol/OneWayMessage.java:[78] (regexp) RegexpSinglelineJava: Avoid using Object.toStringHelper. Use ToStringBuilder instead.

    $ dev/lint-scala
    Scalastyle checks failed at following occurrences:
    [error] /home/kou/work/oss/spark/core/src/main/scala/org/apache/spark/rdd/RDD.scala:93:25: Avoid using Object.toStringHelper. Use ToStringBuilder instead.
    [error] Total time: 25 s, completed 2021/06/02 16:18:25
    ```

    Closes #32740 from sarutak/style-rule-for-guava.

    Authored-by: Kousuke Saruta
    Signed-off-by: Kousuke Saruta
---
 dev/checkstyle.xml    | 5 -
 scalastyle-config.xml | 4
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/dev/checkstyle.xml b/dev/checkstyle.xml
index 483fc7c..06c79a9 100644
--- a/dev/checkstyle.xml
+++ b/dev/checkstyle.xml
@@ -185,6 +185,9 @@
 -
 +
 +
 +
 +

diff --git a/scalastyle-config.xml b/scalastyle-config.xml
index c1dc57b..c06b4ab 100644
--- a/scalastyle-config.xml
+++ b/scalastyle-config.xml
@@ -397,4 +397,8 @@ This file is divided into 3 sections:
 -1,0,1,2,3
 +
 +Objects.toStringHelper
 +Avoid using Object.toStringHelper. Use ToStringBuilder instead.
 +
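Both added rules are single-line regex checks (the XML element markup was stripped by the email rendering above, but the banned pattern and the error message survive). A small Python sketch of what such a check does — the pattern and message are taken from the patch, while the linter scaffolding here is hypothetical, not Checkstyle's or Scalastyle's actual engine:

```python
import re

# Pattern and message as they appear in the patch above.
BANNED = re.compile(r"Objects\.toStringHelper")
MESSAGE = "Avoid using Object.toStringHelper. Use ToStringBuilder instead."

def lint(source: str):
    """Report (line_number, message) for every line using the banned API."""
    return [(lineno, MESSAGE)
            for lineno, line in enumerate(source.splitlines(), start=1)
            if BANNED.search(line)]

snippet = (
    "import com.google.common.base.Objects\n"
    "override def toString: String = Objects.toStringHelper(this).toString\n"
)
print(lint(snippet))  # flags line 2 only
```

This mirrors why both `RegexpSinglelineJava` (for Java) and a Scalastyle regex rule (for Scala) are needed: each linter only scans its own language's sources.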