spark git commit: [SPARK-22274][PYTHON][SQL] User-defined aggregation functions with pandas udf (full shuffle)
Repository: spark
Updated Branches: refs/heads/master 51eb75026 -> b2ce17b4c

[SPARK-22274][PYTHON][SQL] User-defined aggregation functions with pandas udf (full shuffle)

## What changes were proposed in this pull request?

Add support for using pandas UDFs with groupby().agg(). This PR introduces a new type of pandas UDF - the group aggregate pandas UDF. This type of UDF defines a transformation of multiple pandas Series -> a scalar value. Group aggregate pandas UDFs can be used with groupby().agg(). Note that group aggregate pandas UDFs don't support partial aggregation, i.e., a full shuffle is required. This PR doesn't support group aggregate pandas UDFs that return ArrayType, StructType or MapType. Support for these types is left for a future PR.

## How was this patch tested?

GroupbyAggPandasUDFTests

Author: Li Jin

Closes #19872 from icexelloss/SPARK-22274-groupby-agg.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b2ce17b4
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/b2ce17b4
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/b2ce17b4
Branch: refs/heads/master
Commit: b2ce17b4c9fea58140a57ca1846b2689b15c0d61
Parents: 51eb750
Author: Li Jin
Authored: Tue Jan 23 14:11:30 2018 +0900
Committer: Takuya UESHIN
Committed: Tue Jan 23 14:11:30 2018 +0900
--
.../apache/spark/api/python/PythonRunner.scala | 2 +
python/pyspark/rdd.py | 1 +
python/pyspark/sql/functions.py | 36 +-
python/pyspark/sql/group.py | 33 +-
python/pyspark/sql/tests.py | 486 ++-
python/pyspark/sql/udf.py | 13 +-
python/pyspark/worker.py | 22 +-
.../sql/catalyst/analysis/CheckAnalysis.scala | 14 +-
.../sql/catalyst/expressions/PythonUDF.scala | 64 +++
.../spark/sql/catalyst/planning/patterns.scala | 12 +-
.../spark/sql/RelationalGroupedDataset.scala | 1 -
.../spark/sql/execution/SparkStrategies.scala | 29 +-
.../python/AggregateInPandasExec.scala | 155 ++
.../execution/python/ExtractPythonUDFs.scala | 16 +-
.../spark/sql/execution/python/PythonUDF.scala | 41 --
.../python/UserDefinedPythonFunction.scala | 2 +-
16 files changed, 829 insertions(+), 98 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/b2ce17b4/core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala
--
diff --git a/core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala b/core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala index 1ec0e71..29148a7 100644 --- a/core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala +++ b/core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala @@ -39,12 +39,14 @@ private[spark] object PythonEvalType { val SQL_PANDAS_SCALAR_UDF = 200 val SQL_PANDAS_GROUP_MAP_UDF = 201 + val SQL_PANDAS_GROUP_AGG_UDF = 202 def toString(pythonEvalType: Int): String = pythonEvalType match { case NON_UDF => "NON_UDF" case SQL_BATCHED_UDF => "SQL_BATCHED_UDF" case SQL_PANDAS_SCALAR_UDF => "SQL_PANDAS_SCALAR_UDF" case SQL_PANDAS_GROUP_MAP_UDF => "SQL_PANDAS_GROUP_MAP_UDF" +case SQL_PANDAS_GROUP_AGG_UDF => "SQL_PANDAS_GROUP_AGG_UDF" } }

http://git-wip-us.apache.org/repos/asf/spark/blob/b2ce17b4/python/pyspark/rdd.py
--
diff --git a/python/pyspark/rdd.py b/python/pyspark/rdd.py index 1b39155..6b018c3 100644 --- a/python/pyspark/rdd.py +++ b/python/pyspark/rdd.py @@ -70,6 +70,7 @@ class PythonEvalType(object): SQL_PANDAS_SCALAR_UDF = 200 SQL_PANDAS_GROUP_MAP_UDF = 201 +SQL_PANDAS_GROUP_AGG_UDF = 202 def portable_hash(x):

http://git-wip-us.apache.org/repos/asf/spark/blob/b2ce17b4/python/pyspark/sql/functions.py
--
diff --git a/python/pyspark/sql/functions.py b/python/pyspark/sql/functions.py index 961b326..a291c9b 100644 --- a/python/pyspark/sql/functions.py +++ b/python/pyspark/sql/functions.py @@ -2089,6 +2089,8 @@ class PandasUDFType(object): GROUP_MAP = PythonEvalType.SQL_PANDAS_GROUP_MAP_UDF +GROUP_AGG = PythonEvalType.SQL_PANDAS_GROUP_AGG_UDF + @since(1.3) def udf(f=None, returnType=StringType()): @@ -2159,7 +2161,7 @@ def pandas_udf(f=None, returnType=None, functionType=None): 1. SCALAR A scalar UDF defines a transformation: One or more `pandas.Series` -> A `pandas.Series`. - The returnType should be a primitive data type, e.g., `DoubleType()`. + The returnType should be a
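For context, here is a minimal sketch of how the new UDF type is used, based on the PR description and the `GROUP_AGG` name introduced by this commit (the `spark` session, data, and column names are illustrative, not from the patch):

```python
from pyspark.sql.functions import pandas_udf, PandasUDFType

df = spark.createDataFrame(
    [(1, 1.0), (1, 2.0), (2, 3.0), (2, 5.0), (2, 10.0)], ("id", "v"))

# A group aggregate pandas UDF: receives one or more pandas.Series for a
# group and returns a single scalar. There is no partial aggregation, so
# all rows of a group are first shuffled to the same worker.
@pandas_udf("double", PandasUDFType.GROUP_AGG)
def mean_udf(v):
    return v.mean()

df.groupby("id").agg(mean_udf(df["v"])).show()
```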
svn commit: r24370 - in /dev/spark/2.3.1-SNAPSHOT-2018_01_22_18_01-7241556-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell Date: Tue Jan 23 02:15:25 2018 New Revision: 24370 Log: Apache Spark 2.3.1-SNAPSHOT-2018_01_22_18_01-7241556 docs [This commit notification would consist of 1442 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.] - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
svn commit: r24368 - in /dev/spark/2.4.0-SNAPSHOT-2018_01_22_16_01-51eb750-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell Date: Tue Jan 23 00:14:52 2018 New Revision: 24368 Log: Apache Spark 2.4.0-SNAPSHOT-2018_01_22_16_01-51eb750 docs [This commit notification would consist of 1442 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.] - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-22389][SQL] data source v2 partitioning reporting interface
Repository: spark
Updated Branches: refs/heads/master 76b8b840d -> 51eb75026

[SPARK-22389][SQL] data source v2 partitioning reporting interface

## What changes were proposed in this pull request?

This adds a new interface that allows a data source to report its partitioning and avoid a shuffle on the Spark side. The design closely follows the internal distribution/partitioning framework: Spark defines a `Distribution` interface and several concrete implementations, and asks the data source to report a `Partitioning`; the `Partitioning` tells Spark whether it can satisfy a given `Distribution`.

## How was this patch tested?

A new test.

Author: Wenchen Fan

Closes #20201 from cloud-fan/partition-reporting.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/51eb7502
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/51eb7502
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/51eb7502
Branch: refs/heads/master
Commit: 51eb750263dd710434ddb60311571fa3dcec66eb
Parents: 76b8b84
Author: Wenchen Fan
Authored: Mon Jan 22 15:21:09 2018 -0800
Committer: gatorsmile
Committed: Mon Jan 22 15:21:09 2018 -0800
--
.../catalyst/plans/physical/partitioning.scala | 2 +-
.../v2/reader/ClusteredDistribution.java | 38 +++
.../sql/sources/v2/reader/Distribution.java | 39 +++
.../sql/sources/v2/reader/Partitioning.java | 46 
.../v2/reader/SupportsReportPartitioning.java | 33 ++
.../datasources/v2/DataSourcePartitioning.scala | 56 ++
.../datasources/v2/DataSourceV2ScanExec.scala | 9 ++
.../v2/JavaPartitionAwareDataSource.java | 110 +++
.../sql/sources/v2/DataSourceV2Suite.scala | 79 +
9 files changed, 411 insertions(+), 1 deletion(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/51eb7502/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala
--
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala index 0189bd7..4d9a992 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala @@ -153,7 +153,7 @@ case class BroadcastDistribution(mode: BroadcastMode) extends Distribution { * 1. number of partitions. * 2. if it can satisfy a given distribution. */ -sealed trait Partitioning { +trait Partitioning { /** Returns the number of partitions that the data is split across */ val numPartitions: Int

http://git-wip-us.apache.org/repos/asf/spark/blob/51eb7502/sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/ClusteredDistribution.java
--
diff --git a/sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/ClusteredDistribution.java b/sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/ClusteredDistribution.java new file mode 100644 index 000..7346500 --- /dev/null +++ b/sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/ClusteredDistribution.java @@ -0,0 +1,38 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License.
You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.sources.v2.reader; + +import org.apache.spark.annotation.InterfaceStability; + +/** + * A concrete implementation of {@link Distribution}. Represents a distribution where records that + * share the same values for the {@link #clusteredColumns} will be produced by the same + * {@link ReadTask}. + */ +@InterfaceStability.Evolving +public class ClusteredDistribution implements Distribution { + + /** + * The names of the clustered columns. Note that they are order insensitive. + */ + public final String[] clusteredColumns; + + public
spark git commit: [SPARK-22389][SQL] data source v2 partitioning reporting interface
Repository: spark
Updated Branches: refs/heads/branch-2.3 566ef93a6 -> 7241556d8

[SPARK-22389][SQL] data source v2 partitioning reporting interface

## What changes were proposed in this pull request?

This adds a new interface that allows a data source to report its partitioning and avoid a shuffle on the Spark side. The design closely follows the internal distribution/partitioning framework: Spark defines a `Distribution` interface and several concrete implementations, and asks the data source to report a `Partitioning`; the `Partitioning` tells Spark whether it can satisfy a given `Distribution`.

## How was this patch tested?

A new test.

Author: Wenchen Fan

Closes #20201 from cloud-fan/partition-reporting.

(cherry picked from commit 51eb750263dd710434ddb60311571fa3dcec66eb)
Signed-off-by: gatorsmile

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/7241556d
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/7241556d
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/7241556d
Branch: refs/heads/branch-2.3
Commit: 7241556d8b550e22eed2341287812ea373dc1cb2
Parents: 566ef93
Author: Wenchen Fan
Authored: Mon Jan 22 15:21:09 2018 -0800
Committer: gatorsmile
Committed: Mon Jan 22 15:21:19 2018 -0800
--
.../catalyst/plans/physical/partitioning.scala | 2 +-
.../v2/reader/ClusteredDistribution.java | 38 +++
.../sql/sources/v2/reader/Distribution.java | 39 +++
.../sql/sources/v2/reader/Partitioning.java | 46 
.../v2/reader/SupportsReportPartitioning.java | 33 ++
.../datasources/v2/DataSourcePartitioning.scala | 56 ++
.../datasources/v2/DataSourceV2ScanExec.scala | 9 ++
.../v2/JavaPartitionAwareDataSource.java | 110 +++
.../sql/sources/v2/DataSourceV2Suite.scala | 79 +
9 files changed, 411 insertions(+), 1 deletion(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/7241556d/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala
--
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala index 0189bd7..4d9a992 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala @@ -153,7 +153,7 @@ case class BroadcastDistribution(mode: BroadcastMode) extends Distribution { * 1. number of partitions. * 2. if it can satisfy a given distribution. */ -sealed trait Partitioning { +trait Partitioning { /** Returns the number of partitions that the data is split across */ val numPartitions: Int

http://git-wip-us.apache.org/repos/asf/spark/blob/7241556d/sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/ClusteredDistribution.java
--
diff --git a/sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/ClusteredDistribution.java b/sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/ClusteredDistribution.java new file mode 100644 index 000..7346500 --- /dev/null +++ b/sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/ClusteredDistribution.java @@ -0,0 +1,38 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.sources.v2.reader; + +import org.apache.spark.annotation.InterfaceStability; + +/** + * A concrete implementation of {@link Distribution}. Represents a distribution where records that + * share the same values for the {@link #clusteredColumns} will be produced by the same + * {@link ReadTask}. + */ +@InterfaceStability.Evolving +public class ClusteredDistribution implements Distribution { + + /** + * The names
svn commit: r24366 - in /dev/spark/2.3.1-SNAPSHOT-2018_01_22_14_01-566ef93-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell Date: Mon Jan 22 22:14:49 2018 New Revision: 24366 Log: Apache Spark 2.3.1-SNAPSHOT-2018_01_22_14_01-566ef93 docs [This commit notification would consist of 1441 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.] - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
svn commit: r24365 - /dev/spark/KEYS
Author: sameerag Date: Mon Jan 22 21:25:48 2018 New Revision: 24365 Log: Update KEYS Modified: dev/spark/KEYS Modified: dev/spark/KEYS == --- dev/spark/KEYS (original) +++ dev/spark/KEYS Mon Jan 22 21:25:48 2018 @@ -403,40 +403,61 @@ dcqbOYBLINwxIMZA6N9qCGrST4DfqbAzGSvZ08oe =et2/ -END PGP PUBLIC KEY BLOCK- -pub rsa2048/A1CEDBA8AD0C022A 2018-01-11 [SC] - FA757B8D64ABBC21FC02BC1CA1CEDBA8AD0C022A -uid [ultimate] Sameer Agarwal-sub rsa2048/5B0E7FAD797FCBE2 2018-01-11 [E] +pub rsa4096 2018-01-17 [SC] + F2C64242EC1BEC69EA8FBE35DCE4BFD807461E96 +uid [ultimate] Sameer Agarwal (CODE SIGNING KEY) +sub rsa4096 2018-01-17 [E] -BEGIN PGP PUBLIC KEY BLOCK- -mQENBFpX9XgBCADGZb9Jywy7gJuoyzX3+8JA7kPnc6Ah/mTbCemzkq+NkrMQ+eXP -D6IyHH+ktCp8rG0KEZph3BwQ9m/9YpvGpyUjEAl7miWvnYQCoBfhoMdoM+/9R77G -yaUgV1z85n0rI7+EUmstitb1Q1qu6FJgO0r/YOBImEqD0VID+vuDVEmjg9DPX2K/ -fADhKHvQDbR5car8Oh9lXEdxn6oRdQif9spkX26P75Oa7oLbK5s1PQm/z2Wn0q6/ -9tsh+HNCKU4oNTboTXiuNEI4S3ypjb5zsSL2PMmxw+eSV859lBuL/THRN1xe3+3h -EK6Ma3UThtNcHpOHx+YJmiWahic9NHvO58jHABEBAAG0JFNhbWVlciBBZ2Fyd2Fs -IDxzYW1lZXJhZ0BhcGFjaGUub3JnPokBTgQTAQgAOBYhBPp1e41kq7wh/AK8HKHO -26itDAIqBQJaV/V4AhsDBQsJCAcCBhUKCQgLAgQWAgMBAh4BAheAAAoJEKHO26it -DAIqIZYH/AoMHZ27lfK1XfQqEujmz5KSWsSVImgMh/t7F61D9sIvnoiMkrhP9/RG -R/LJA8bIEIBR906Lto4fcuDboUhNYlGpOsJGSTQeEnGpuonNzNpOssFXYfxrGSRe -M062/9GwvOer7MthhLbNYSzah6lYnijHe67a5woL3mLEnJj0a8vc0DH0jxpe0d/8 -f0VVQnWe+oZOiFx/Gp+RLfqtnMQ+FrPlGu7WFDseXd9NtMzEVQpoQoBbJ29nBvAU -4AXjuBZa0dR7cZr4u8C+QMkJOBPEQcyBHYv0/MOT3ggABuLTSdJcGsj7NdCxkSZ2 -NTjjgi+OzLqsdU4srniy8vVDuaIqBhi5AQ0EWlf1eAEIAMk/n66XAoetLEyBHOO7 -wZJNnnCssuGOFh4+xLelOeB4Tx4fKeU9wWGUPaqHbyQJbYxEmVPH0Rq/VTfRYgGl -XuJXgi7f0A/Q0bhxc5A3DRMl5ifnT6Ame9yOUq9BFoH/VG7qO/GVQ7yRrp+cmj5h -kTSMUxYrzvHWzozxj9/P1bE5EGGsDjaHkA9t3RuzzV/mKjwpyCep72IxMbmRMfPM -vD/KaKfNryvyEBmqQpdvJXXremfs3warmvhkYnSpkIeUrRjt32jMO4MHzzC74w+J -/Cn4+0A/YuvFfU0YnjySRNMqpgT2EFA802QI+Mwj2D6fat8oKhnVvBAY+wHal1c2 -m/UAEQEAAYkBNgQYAQgAIBYhBPp1e41kq7wh/AK8HKHO26itDAIqBQJaV/V4AhsM -AAoJEKHO26itDAIqMi4IAJ1dyai2f03R1AgzI+W5enp8989vf5KVxwDPv4tJX87o -sAOSNYmPRXBbj2Hr2N+A+656vx3KkIIozuwuVSDbVDdDnxS6dUqvmA07qtKRXWEO -da8taStwiaetbCJQkLOr1kyrL6XgL+t5E1jMcDmZxF2Owu4NSaEVERtkovY89V4m -Ku0fEiDWr/6SWUcPnyPGpwZKccShDGl8JuwM/uRO5HKLeAJp93poqWeOtnpw1Xpw -RiLNdJXDBol1/+xtV2O3CzX0i4o6Z/hhderuJc/v57LlP/PnOVkGG4/mZA8G/kSC -jUFFi/fz1oSCMpcpdSOAhCs4oRFv2POgXTCLkpOJNSU= -=Oc/a +mQINBFpftRMBEADEsiDSnSg7EBdFoWdRhVrjePjsYyEq4Sxt61vkkwhrH/pZ8r07 +4kVSZV0hdc+7PLa27X400re6OgULDtQ7c3F1hcrcl72VLNo7iE5FcQITSRvXXsf0 +Lb6eHmkUjCrZW8FF5WLdr/XA/aC2YpuXYszCWH3f7It9864M8OjzKznGfR/Q+9kd +jq2l2d1gLhdMnBwOjxMlyDvU3N3wr1bGNf/s7QAltv5V3yNTPvH9I+iy9FbTuseE +vnMo3KnopEivmF0yqz2qlN3joVg7yAcMPWG92lRQzkUAkrQXcPvcsEvu22kipcOQ +SQQMcMQZFQh8E/dLzp4+DA2bRcshHnM5bWG9NZNMnXKRmcJrHmjJDstEN7LR+zwt +cRj9d0RwCFtS7M9YUX4eCc9Dqgtgg31GVNUZdUcZ1/OHqv+NJUOSZipoKJmAfcBN +OyEGhlWOGidd/3xJtK1GUtTd9iLqjcbcxHapeTOS3kNdXbAwuvX1ADkQ+CTYw5cd +jx2CAEKsBCz1r++/sApRPLIWSRBaGoF2HgGv89/33R66EVSmNhGkS3g6W6ICqrdY +vwhK92NJpapQFwhzk4U3ZrcRwXXktv7PlMFywuSXNbOT7XwkrGOUYqzzi7esV4uF +TDllNmwuVG7q3K7cvGDn69mbgYH8vULzEfuZQYhT9zYPaRePKaILqWLf6wARAQAB +tDdTYW1lZXIgQWdhcndhbCAoQ09ERSBTSUdOSU5HIEtFWSkgPHNhbWVlcmFnQGFw +YWNoZS5vcmc+iQJOBBMBCAA4FiEE8sZCQuwb7Gnqj7413OS/2AdGHpYFAlpftRMC +GwMFCwkIBwIGFQoJCAsCBBYCAwECHgECF4AACgkQ3OS/2AdGHpYqtg/+IrcrH66c +8A6+LurGr0ZDxQzI3Ka016UOkruLGI4oitqyzgJ/j6quGTxLNEcBToeh8IUqQDN0 +VriV9iPntIUarf9b6Yx6aCxSvBwls9k9PMZqWVu0oIAecWGvvniGooxJlrelpp0M +PJaEPHswH80d8rBDGjktBOrQIq8bak7jLomsFK1zGH6pPkAL9GYo4XK2Ik5OiRs3 +H8bJA/FS4sx17GR0IBWumBvYXtHvAmvfwIEeGtcE+cPj/S438N+fwuXI82c6EGIH 
+ubFM7uqylbZMlmDgdKkG6YmEQMqK0Ka84iLzUOzqFyOj/aTrKj9GKLc8bBVLU1DP +/PfMmJQDiETJGwwcKhRm9tYYH1DiMhWp5j1jyhOKIEKGUVJ8IxgpAkFURyOQaA4e +5rnPoC65Pp1JzTKXWqmjDm7MRgcP77WqWis7SDgMq56/tdCbjZ2WzyfBQCUlfKU3 +7Iax5qKtdoczZRYhdZGzT8d2pMvQVu9zGuwhiPU/nwFybY1haneZhWpXTKbJkNpc +Gzi2gE7pqXasjA+fn40tuMa4WZlrlvNhTONatcfVuNv1hGS/G+UJjhJzOo40AX2w +2TCmaj4jiwiqByc4QZKM/iGfVCN6GlOI3+1O1KzybqoQG2Tg/ug5unmAvc23ZYw7 +uu+BnBSTsCODqQG8fPRiDlYRdZtDyQQC8M25Ag0EWl+1EwEQAJ82cuI/R4StkgBX +zn7loZmSRZUx08EgsB8vq0s1h8g/pLdBN1h22sj9dnfcW4tFUxIKiwpLK84/Rlj7 +o2W8ZynpaKzR6pelV6Cb3+SMgtWe6DQnKaBRKJ3hzdcdA7Fp6aIjuzMsakOEOx3V +wmtHkCn5MgN/xQBAB3T65thTOFryYqcmEoKWkd5FegJwG4sjHCCARPjgv8ucY/Vs +6lZ0cxOB6qMO0jxH+FSMCZ4xmy7gpvQSs7D0/aj73kJ0Xv1sPZYxacf+P9MnF8jr +mI7jKODvtKNbffRzIK/c2YCcYHvb0PtkLN8hhsmtXcmm4ezQwqA1QZWJhtI7oiCX +A7AYrDKqsLPY4sgzeIzVmz35P/Y0baFp6Qt2eiHQ58I3Eu2+PG6x897So5j6obKi +FEfprFKOewjefPmt+yNxhXITXUAuw57uXR7PeIcIb6bynZjyUcK+Rr8+vfI1JPaS +ZVFaUn6KNFueK/bxDo4dzHMdj4gF9kGE+hPNRGepO7ba90QeaZSA6Bk3EUhovu8H +eMmN/ZsdgMwIHOO3JZ9aWV7wkak7df6qbNVGDhp/QycBAm6J/iG2xYfncYp9nyw8 +UAkrht5EMAdG14Qm3Vq9GGihUsthl2ehPeD37d2/pitTMfnf2Ac6TieHbye0JgL0 +wC3WvL7cLXGmvtIRfXzNd4oDmjGtABEBAAGJAjYEGAEIACAWIQTyxkJC7BvsaeqP +vjXc5L/YB0YelgUCWl+1EwIbDAAKCRDc5L/YB0YelrVgEACjcrAN9bY+Kv8eNcn0
svn commit: r24364 - in /dev/spark/v2.3.0-rc2-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _site/api/java/org/apache/spark
Author: sameerag Date: Mon Jan 22 20:30:45 2018 New Revision: 24364 Log: Apache Spark v2.3.0-rc2 docs [This commit notification would consist of 1444 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.] - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
svn commit: r24363 - in /dev/spark/2.4.0-SNAPSHOT-2018_01_22_12_01-76b8b84-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell Date: Mon Jan 22 20:17:30 2018 New Revision: 24363 Log: Apache Spark 2.4.0-SNAPSHOT-2018_01_22_12_01-76b8b84 docs [This commit notification would consist of 1441 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.] - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [MINOR] Typo fixes
Repository: spark
Updated Branches: refs/heads/master 446948af1 -> 76b8b840d

[MINOR] Typo fixes

## What changes were proposed in this pull request?

Typo fixes

## How was this patch tested?

Local build / Doc-only changes

Author: Jacek Laskowski

Closes #20344 from jaceklaskowski/typo-fixes.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/76b8b840
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/76b8b840
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/76b8b840
Branch: refs/heads/master
Commit: 76b8b840ddc951ee6203f9cccd2c2b9671c1b5e8
Parents: 446948a
Author: Jacek Laskowski
Authored: Mon Jan 22 13:55:14 2018 -0600
Committer: Sean Owen
Committed: Mon Jan 22 13:55:14 2018 -0600
--
core/src/main/scala/org/apache/spark/SparkContext.scala | 2 +-
.../apache/spark/sql/kafka010/KafkaSourceProvider.scala | 4 ++--
.../org/apache/spark/sql/kafka010/KafkaWriteTask.scala | 2 +-
.../java/org/apache/spark/sql/streaming/OutputMode.java | 2 +-
.../apache/spark/sql/catalyst/analysis/Analyzer.scala | 8 
.../apache/spark/sql/catalyst/analysis/unresolved.scala | 2 +-
.../sql/catalyst/expressions/aggregate/interfaces.scala | 12 +---
.../sql/catalyst/plans/logical/LogicalPlanVisitor.scala | 2 +-
.../logical/statsEstimation/BasicStatsPlanVisitor.scala | 2 +-
.../SizeInBytesOnlyStatsPlanVisitor.scala | 4 ++--
.../scala/org/apache/spark/sql/internal/SQLConf.scala | 2 +-
.../org/apache/spark/sql/catalyst/plans/PlanTest.scala | 2 +-
.../scala/org/apache/spark/sql/DataFrameWriter.scala | 2 +-
.../org/apache/spark/sql/execution/SparkSqlParser.scala | 2 +-
.../spark/sql/execution/WholeStageCodegenExec.scala | 2 +-
.../apache/spark/sql/execution/command/SetCommand.scala | 4 ++--
.../apache/spark/sql/execution/datasources/rules.scala | 2 +-
.../spark/sql/execution/streaming/HDFSMetadataLog.scala | 2 +-
.../spark/sql/execution/streaming/OffsetSeq.scala | 2 +-
.../spark/sql/execution/streaming/OffsetSeqLog.scala | 2 +-
.../sql/execution/streaming/StreamingQueryWrapper.scala | 2 +-
.../sql/execution/streaming/state/StateStore.scala | 2 +-
.../apache/spark/sql/execution/ui/ExecutionPage.scala | 2 +-
.../spark/sql/expressions/UserDefinedFunction.scala | 4 ++--
.../spark/sql/internal/BaseSessionStateBuilder.scala | 4 ++--
.../apache/spark/sql/streaming/DataStreamReader.scala | 6 +++---
.../sql-tests/results/columnresolution-negative.sql.out | 2 +-
.../sql-tests/results/columnresolution-views.sql.out | 2 +-
.../sql-tests/results/columnresolution.sql.out | 6 +++---
.../test/scala/org/apache/spark/sql/SQLQuerySuite.scala | 4 ++--
.../org/apache/spark/sql/execution/SQLViewSuite.scala | 2 +-
.../org/apache/spark/sql/hive/HiveExternalCatalog.scala | 4 ++--
32 files changed, 50 insertions(+), 52 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/76b8b840/core/src/main/scala/org/apache/spark/SparkContext.scala
--
diff --git a/core/src/main/scala/org/apache/spark/SparkContext.scala b/core/src/main/scala/org/apache/spark/SparkContext.scala index 31f3cb9..3828d4f 100644 --- a/core/src/main/scala/org/apache/spark/SparkContext.scala +++ b/core/src/main/scala/org/apache/spark/SparkContext.scala @@ -2276,7 +2276,7 @@ class SparkContext(config: SparkConf) extends Logging { } /** - * Clean a closure to make it ready to be serialized and send to tasks + * Clean a closure to make it ready to be serialized and sent to tasks * (removes unreferenced variables in $outer's, updates REPL variables) * If checkSerializable is set, clean will also proactively *
check to see if f is serializable and throw a SparkException http://git-wip-us.apache.org/repos/asf/spark/blob/76b8b840/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala -- diff --git a/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala b/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala index 3914370..62a998f 100644 --- a/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala +++ b/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala @@ -307,7 +307,7 @@ private[kafka010] class KafkaSourceProvider extends DataSourceRegister if (caseInsensitiveParams.contains(s"kafka.${ConsumerConfig.GROUP_ID_CONFIG}")) { throw new IllegalArgumentException( s"Kafka option
spark git commit: [MINOR] Typo fixes
Repository: spark
Updated Branches: refs/heads/branch-2.3 6facc7fb2 -> 566ef93a6

[MINOR] Typo fixes

## What changes were proposed in this pull request?

Typo fixes

## How was this patch tested?

Local build / Doc-only changes

Author: Jacek Laskowski

Closes #20344 from jaceklaskowski/typo-fixes.

(cherry picked from commit 76b8b840ddc951ee6203f9cccd2c2b9671c1b5e8)
Signed-off-by: Sean Owen

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/566ef93a
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/566ef93a
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/566ef93a
Branch: refs/heads/branch-2.3
Commit: 566ef93a672aea1803d6977883204780c2f6982d
Parents: 6facc7f
Author: Jacek Laskowski
Authored: Mon Jan 22 13:55:14 2018 -0600
Committer: Sean Owen
Committed: Mon Jan 22 13:55:22 2018 -0600
--
core/src/main/scala/org/apache/spark/SparkContext.scala | 2 +-
.../apache/spark/sql/kafka010/KafkaSourceProvider.scala | 4 ++--
.../org/apache/spark/sql/kafka010/KafkaWriteTask.scala | 2 +-
.../java/org/apache/spark/sql/streaming/OutputMode.java | 2 +-
.../apache/spark/sql/catalyst/analysis/Analyzer.scala | 8 
.../apache/spark/sql/catalyst/analysis/unresolved.scala | 2 +-
.../sql/catalyst/expressions/aggregate/interfaces.scala | 12 +---
.../sql/catalyst/plans/logical/LogicalPlanVisitor.scala | 2 +-
.../logical/statsEstimation/BasicStatsPlanVisitor.scala | 2 +-
.../SizeInBytesOnlyStatsPlanVisitor.scala | 4 ++--
.../scala/org/apache/spark/sql/internal/SQLConf.scala | 2 +-
.../org/apache/spark/sql/catalyst/plans/PlanTest.scala | 2 +-
.../scala/org/apache/spark/sql/DataFrameWriter.scala | 2 +-
.../org/apache/spark/sql/execution/SparkSqlParser.scala | 2 +-
.../spark/sql/execution/WholeStageCodegenExec.scala | 2 +-
.../apache/spark/sql/execution/command/SetCommand.scala | 4 ++--
.../apache/spark/sql/execution/datasources/rules.scala | 2 +-
.../spark/sql/execution/streaming/HDFSMetadataLog.scala | 2 +-
.../spark/sql/execution/streaming/OffsetSeq.scala | 2 +-
.../spark/sql/execution/streaming/OffsetSeqLog.scala | 2 +-
.../sql/execution/streaming/StreamingQueryWrapper.scala | 2 +-
.../sql/execution/streaming/state/StateStore.scala | 2 +-
.../apache/spark/sql/execution/ui/ExecutionPage.scala | 2 +-
.../spark/sql/expressions/UserDefinedFunction.scala | 4 ++--
.../spark/sql/internal/BaseSessionStateBuilder.scala | 4 ++--
.../apache/spark/sql/streaming/DataStreamReader.scala | 6 +++---
.../sql-tests/results/columnresolution-negative.sql.out | 2 +-
.../sql-tests/results/columnresolution-views.sql.out | 2 +-
.../sql-tests/results/columnresolution.sql.out | 6 +++---
.../test/scala/org/apache/spark/sql/SQLQuerySuite.scala | 4 ++--
.../org/apache/spark/sql/execution/SQLViewSuite.scala | 2 +-
.../org/apache/spark/sql/hive/HiveExternalCatalog.scala | 4 ++--
32 files changed, 50 insertions(+), 52 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/566ef93a/core/src/main/scala/org/apache/spark/SparkContext.scala
--
diff --git a/core/src/main/scala/org/apache/spark/SparkContext.scala b/core/src/main/scala/org/apache/spark/SparkContext.scala index 31f3cb9..3828d4f 100644 --- a/core/src/main/scala/org/apache/spark/SparkContext.scala +++ b/core/src/main/scala/org/apache/spark/SparkContext.scala @@ -2276,7 +2276,7 @@ class SparkContext(config: SparkConf) extends Logging { } /** - * Clean a closure to make it ready to be serialized and send to tasks + * Clean a closure to make it ready to be serialized and sent to tasks * (removes unreferenced variables
in $outer's, updates REPL variables) * If checkSerializable is set, clean will also proactively * check to see if f is serializable and throw a SparkException http://git-wip-us.apache.org/repos/asf/spark/blob/566ef93a/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala -- diff --git a/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala b/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala index 3914370..62a998f 100644 --- a/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala +++ b/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala @@ -307,7 +307,7 @@ private[kafka010] class KafkaSourceProvider extends DataSourceRegister if
svn commit: r24362 - /dev/spark/v2.3.0-rc2-bin/
Author: sameerag Date: Mon Jan 22 19:45:22 2018 New Revision: 24362 Log: Apache Spark v2.3.0-rc2 Added: dev/spark/v2.3.0-rc2-bin/ dev/spark/v2.3.0-rc2-bin/SparkR_2.3.0.tar.gz (with props) dev/spark/v2.3.0-rc2-bin/SparkR_2.3.0.tar.gz.asc dev/spark/v2.3.0-rc2-bin/SparkR_2.3.0.tar.gz.md5 dev/spark/v2.3.0-rc2-bin/SparkR_2.3.0.tar.gz.sha512 dev/spark/v2.3.0-rc2-bin/pyspark-2.3.0.tar.gz (with props) dev/spark/v2.3.0-rc2-bin/pyspark-2.3.0.tar.gz.asc dev/spark/v2.3.0-rc2-bin/pyspark-2.3.0.tar.gz.md5 dev/spark/v2.3.0-rc2-bin/pyspark-2.3.0.tar.gz.sha512 dev/spark/v2.3.0-rc2-bin/spark-2.3.0-bin-hadoop2.6.tgz (with props) dev/spark/v2.3.0-rc2-bin/spark-2.3.0-bin-hadoop2.6.tgz.asc dev/spark/v2.3.0-rc2-bin/spark-2.3.0-bin-hadoop2.6.tgz.md5 dev/spark/v2.3.0-rc2-bin/spark-2.3.0-bin-hadoop2.6.tgz.sha512 dev/spark/v2.3.0-rc2-bin/spark-2.3.0-bin-hadoop2.7.tgz (with props) dev/spark/v2.3.0-rc2-bin/spark-2.3.0-bin-hadoop2.7.tgz.asc dev/spark/v2.3.0-rc2-bin/spark-2.3.0-bin-hadoop2.7.tgz.md5 dev/spark/v2.3.0-rc2-bin/spark-2.3.0-bin-hadoop2.7.tgz.sha512 dev/spark/v2.3.0-rc2-bin/spark-2.3.0-bin-without-hadoop.tgz (with props) dev/spark/v2.3.0-rc2-bin/spark-2.3.0-bin-without-hadoop.tgz.asc dev/spark/v2.3.0-rc2-bin/spark-2.3.0-bin-without-hadoop.tgz.md5 dev/spark/v2.3.0-rc2-bin/spark-2.3.0-bin-without-hadoop.tgz.sha512 dev/spark/v2.3.0-rc2-bin/spark-2.3.0.tgz (with props) dev/spark/v2.3.0-rc2-bin/spark-2.3.0.tgz.asc dev/spark/v2.3.0-rc2-bin/spark-2.3.0.tgz.md5 dev/spark/v2.3.0-rc2-bin/spark-2.3.0.tgz.sha512 dev/spark/v2.3.0-rc2-bin/spark-parent_2.11.iml Added: dev/spark/v2.3.0-rc2-bin/SparkR_2.3.0.tar.gz == Binary file - no diff available. Propchange: dev/spark/v2.3.0-rc2-bin/SparkR_2.3.0.tar.gz -- svn:mime-type = application/octet-stream Added: dev/spark/v2.3.0-rc2-bin/SparkR_2.3.0.tar.gz.asc == --- dev/spark/v2.3.0-rc2-bin/SparkR_2.3.0.tar.gz.asc (added) +++ dev/spark/v2.3.0-rc2-bin/SparkR_2.3.0.tar.gz.asc Mon Jan 22 19:45:22 2018 @@ -0,0 +1,16 @@ +-BEGIN PGP SIGNATURE- + +iQIzBAABCAAdFiEE8sZCQuwb7Gnqj7413OS/2AdGHpYFAlpmPsoACgkQ3OS/2AdG +Hpb5gg/+P0jEiAZi7FRqfRiVW2O2qBe/Oj24CgwM3wbdxD9OMaywQkWmzAMaFSBJ +Pqkam/lxL3oy1GE+bQI8gMkfZIwneJK6fJwyCo5zqqLwZO+eDCDc1BWqEYn2sAvR +xVdOFE5RZ3qahOjH1JPnIsrUQT3aWfVBMMWTJLm+cEUhQ4yTmiABH2nqlqiFdRM4 +Cvw6r7wRo/bvPhnyc9Ly+Cu0UnBZFdV/qHdNqaJD/CoJPpuPEyuEv4Y0QN42MgC4 +RUY3YwaRerBS3wxEbO+zUVgnWZR7KlBQZVy40YjzLRhIjgo4KkiqX6hWIaPL+TlU +mTRWFvIQEZh/b7gZkCitLoO/t2iHvf2TvJqXFeWpieCDgXghmWdSVdg5UYREcxcY +gY86E8qfyPxnKquJHlBu/qExESjEzrvfaPgZcY9aQFrLaS9zBzRIr51Evz6dBiD5 +0UcgiQW98cZgDJqgwMqfTNosYB9GEEWlB7llLROy/iWZ9JEpZYNYk52JQieW7gWM +kUodYkoTOuquBE93TZiFRXEr9Er+ACofESh7kdm+MgPvFlLSYdCeaknf8+JB2Q+M +aASarUslmgOehCGU5cqRgBXEdvm7PDuLyzNfYOT6onmbMCm6QU/wygCy3DQTR+cp +75kTNlVqAISMQCC7S/3+8DSZhZffugnqnb6mmxa4uOqSsljczws= +=Is9J +-END PGP SIGNATURE- Added: dev/spark/v2.3.0-rc2-bin/SparkR_2.3.0.tar.gz.md5 == --- dev/spark/v2.3.0-rc2-bin/SparkR_2.3.0.tar.gz.md5 (added) +++ dev/spark/v2.3.0-rc2-bin/SparkR_2.3.0.tar.gz.md5 Mon Jan 22 19:45:22 2018 @@ -0,0 +1 @@ +SparkR_2.3.0.tar.gz: 58 7E C4 A4 7E 60 B1 AC F1 FB 81 96 F7 7E BD A0 Added: dev/spark/v2.3.0-rc2-bin/SparkR_2.3.0.tar.gz.sha512 == --- dev/spark/v2.3.0-rc2-bin/SparkR_2.3.0.tar.gz.sha512 (added) +++ dev/spark/v2.3.0-rc2-bin/SparkR_2.3.0.tar.gz.sha512 Mon Jan 22 19:45:22 2018 @@ -0,0 +1,3 @@ +SparkR_2.3.0.tar.gz: 86A461C9 84324BB0 DC525774 2D4CCCB8 F0F16495 3C147E25 + 3040DBE3 D2FFBE31 C1596FEB C1905139 92AAF623 C296E3DD + 7599955F DFC55EE1 BCF5691A 6FB02759 Added: dev/spark/v2.3.0-rc2-bin/pyspark-2.3.0.tar.gz == Binary file - no diff available. 
Propchange: dev/spark/v2.3.0-rc2-bin/pyspark-2.3.0.tar.gz -- svn:mime-type = application/octet-stream Added: dev/spark/v2.3.0-rc2-bin/pyspark-2.3.0.tar.gz.asc == --- dev/spark/v2.3.0-rc2-bin/pyspark-2.3.0.tar.gz.asc (added) +++ dev/spark/v2.3.0-rc2-bin/pyspark-2.3.0.tar.gz.asc Mon Jan 22 19:45:22 2018 @@ -0,0 +1,16 @@ +-BEGIN PGP SIGNATURE- + +iQIzBAABCAAdFiEE8sZCQuwb7Gnqj7413OS/2AdGHpYFAlpmPkkACgkQ3OS/2AdG +HpbGZBAAjfAgbQuI1ye/5BBDT5Zd65kT78FD4/E6l6Idu0r4DRVywrUyjp90Vc+3 ++g9/cLDF5faWq23KyWSYpkO9rOL96sx0z65KV+spdaSRwNk7z4NOfyvzHyxzHSoy +723l9coFwG5zD96PzmI2mTfOSrfrXyKs1nn/j8QBSDhkGxNhCEGMhUKYgYICJ34Q
[1/2] spark git commit: Preparing Spark release v2.3.0-rc2
Repository: spark
Updated Branches: refs/heads/branch-2.3 4e75b0cb4 -> 6facc7fb2

Preparing Spark release v2.3.0-rc2

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/489ecb0e
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/489ecb0e
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/489ecb0e
Branch: refs/heads/branch-2.3
Commit: 489ecb0ef23e5d9b705e5e5bae4fa3d871bdac91
Parents: 4e75b0c
Author: Sameer Agarwal
Authored: Mon Jan 22 10:49:08 2018 -0800
Committer: Sameer Agarwal
Committed: Mon Jan 22 10:49:08 2018 -0800
--
R/pkg/DESCRIPTION | 2 +-
assembly/pom.xml | 2 +-
common/kvstore/pom.xml | 2 +-
common/network-common/pom.xml | 2 +-
common/network-shuffle/pom.xml | 2 +-
common/network-yarn/pom.xml | 2 +-
common/sketch/pom.xml | 2 +-
common/tags/pom.xml | 2 +-
common/unsafe/pom.xml | 2 +-
core/pom.xml | 2 +-
docs/_config.yml | 4 ++--
examples/pom.xml | 2 +-
external/docker-integration-tests/pom.xml | 2 +-
external/flume-assembly/pom.xml | 2 +-
external/flume-sink/pom.xml | 2 +-
external/flume/pom.xml | 2 +-
external/kafka-0-10-assembly/pom.xml | 2 +-
external/kafka-0-10-sql/pom.xml | 2 +-
external/kafka-0-10/pom.xml | 2 +-
external/kafka-0-8-assembly/pom.xml | 2 +-
external/kafka-0-8/pom.xml | 2 +-
external/kinesis-asl-assembly/pom.xml | 2 +-
external/kinesis-asl/pom.xml | 2 +-
external/spark-ganglia-lgpl/pom.xml | 2 +-
graphx/pom.xml | 2 +-
hadoop-cloud/pom.xml | 2 +-
launcher/pom.xml | 2 +-
mllib-local/pom.xml | 2 +-
mllib/pom.xml | 2 +-
pom.xml | 2 +-
python/pyspark/version.py | 2 +-
repl/pom.xml | 2 +-
resource-managers/kubernetes/core/pom.xml | 2 +-
resource-managers/mesos/pom.xml | 2 +-
resource-managers/yarn/pom.xml | 2 +-
sql/catalyst/pom.xml | 2 +-
sql/core/pom.xml | 2 +-
sql/hive-thriftserver/pom.xml | 2 +-
sql/hive/pom.xml | 2 +-
streaming/pom.xml | 2 +-
tools/pom.xml | 2 +-
41 files changed, 42 insertions(+), 42 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/489ecb0e/R/pkg/DESCRIPTION
--
diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION index 29a8a00..6d46c31 100644 --- a/R/pkg/DESCRIPTION +++ b/R/pkg/DESCRIPTION @@ -1,6 +1,6 @@ Package: SparkR Type: Package -Version: 2.3.1 +Version: 2.3.0 Title: R Frontend for Apache Spark Description: Provides an R Frontend for Apache Spark.
Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"), http://git-wip-us.apache.org/repos/asf/spark/blob/489ecb0e/assembly/pom.xml -- diff --git a/assembly/pom.xml b/assembly/pom.xml index 5c5a8e9..2ca9ab6 100644 --- a/assembly/pom.xml +++ b/assembly/pom.xml @@ -21,7 +21,7 @@ org.apache.spark spark-parent_2.11 -2.3.1-SNAPSHOT +2.3.0 ../pom.xml http://git-wip-us.apache.org/repos/asf/spark/blob/489ecb0e/common/kvstore/pom.xml -- diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml index 2a625da..404c744 100644 --- a/common/kvstore/pom.xml +++ b/common/kvstore/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.11 -2.3.1-SNAPSHOT +2.3.0 ../../pom.xml http://git-wip-us.apache.org/repos/asf/spark/blob/489ecb0e/common/network-common/pom.xml -- diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml index adb1890..3c0b528 100644 --- a/common/network-common/pom.xml +++ b/common/network-common/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.11 -2.3.1-SNAPSHOT +2.3.0 ../../pom.xml http://git-wip-us.apache.org/repos/asf/spark/blob/489ecb0e/common/network-shuffle/pom.xml -- diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml index 4cdcfa2..fe3bcfd 100644 ---
[spark] Git Push Summary
Repository: spark Updated Tags: refs/tags/v2.3.0-rc2 [created] 489ecb0ef - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[2/2] spark git commit: Preparing development version 2.3.1-SNAPSHOT
Preparing development version 2.3.1-SNAPSHOT

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6facc7fb
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/6facc7fb
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/6facc7fb
Branch: refs/heads/branch-2.3
Commit: 6facc7fb2333cc61409149e2f896bf84dd085fa3
Parents: 489ecb0
Author: Sameer Agarwal
Authored: Mon Jan 22 10:49:29 2018 -0800
Committer: Sameer Agarwal
Committed: Mon Jan 22 10:49:29 2018 -0800
--
R/pkg/DESCRIPTION | 2 +-
assembly/pom.xml | 2 +-
common/kvstore/pom.xml | 2 +-
common/network-common/pom.xml | 2 +-
common/network-shuffle/pom.xml | 2 +-
common/network-yarn/pom.xml | 2 +-
common/sketch/pom.xml | 2 +-
common/tags/pom.xml | 2 +-
common/unsafe/pom.xml | 2 +-
core/pom.xml | 2 +-
docs/_config.yml | 4 ++--
examples/pom.xml | 2 +-
external/docker-integration-tests/pom.xml | 2 +-
external/flume-assembly/pom.xml | 2 +-
external/flume-sink/pom.xml | 2 +-
external/flume/pom.xml | 2 +-
external/kafka-0-10-assembly/pom.xml | 2 +-
external/kafka-0-10-sql/pom.xml | 2 +-
external/kafka-0-10/pom.xml | 2 +-
external/kafka-0-8-assembly/pom.xml | 2 +-
external/kafka-0-8/pom.xml | 2 +-
external/kinesis-asl-assembly/pom.xml | 2 +-
external/kinesis-asl/pom.xml | 2 +-
external/spark-ganglia-lgpl/pom.xml | 2 +-
graphx/pom.xml | 2 +-
hadoop-cloud/pom.xml | 2 +-
launcher/pom.xml | 2 +-
mllib-local/pom.xml | 2 +-
mllib/pom.xml | 2 +-
pom.xml | 2 +-
python/pyspark/version.py | 2 +-
repl/pom.xml | 2 +-
resource-managers/kubernetes/core/pom.xml | 2 +-
resource-managers/mesos/pom.xml | 2 +-
resource-managers/yarn/pom.xml | 2 +-
sql/catalyst/pom.xml | 2 +-
sql/core/pom.xml | 2 +-
sql/hive-thriftserver/pom.xml | 2 +-
sql/hive/pom.xml | 2 +-
streaming/pom.xml | 2 +-
tools/pom.xml | 2 +-
41 files changed, 42 insertions(+), 42 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/6facc7fb/R/pkg/DESCRIPTION
--
diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION index 6d46c31..29a8a00 100644 --- a/R/pkg/DESCRIPTION +++ b/R/pkg/DESCRIPTION @@ -1,6 +1,6 @@ Package: SparkR Type: Package -Version: 2.3.0 +Version: 2.3.1 Title: R Frontend for Apache Spark Description: Provides an R Frontend for Apache Spark.
Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"), http://git-wip-us.apache.org/repos/asf/spark/blob/6facc7fb/assembly/pom.xml -- diff --git a/assembly/pom.xml b/assembly/pom.xml index 2ca9ab6..5c5a8e9 100644 --- a/assembly/pom.xml +++ b/assembly/pom.xml @@ -21,7 +21,7 @@ org.apache.spark spark-parent_2.11 -2.3.0 +2.3.1-SNAPSHOT ../pom.xml http://git-wip-us.apache.org/repos/asf/spark/blob/6facc7fb/common/kvstore/pom.xml -- diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml index 404c744..2a625da 100644 --- a/common/kvstore/pom.xml +++ b/common/kvstore/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.11 -2.3.0 +2.3.1-SNAPSHOT ../../pom.xml http://git-wip-us.apache.org/repos/asf/spark/blob/6facc7fb/common/network-common/pom.xml -- diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml index 3c0b528..adb1890 100644 --- a/common/network-common/pom.xml +++ b/common/network-common/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.11 -2.3.0 +2.3.1-SNAPSHOT ../../pom.xml http://git-wip-us.apache.org/repos/asf/spark/blob/6facc7fb/common/network-shuffle/pom.xml -- diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml index fe3bcfd..4cdcfa2 100644 --- a/common/network-shuffle/pom.xml +++ b/common/network-shuffle/pom.xml @@ -22,7 +22,7 @@
spark git commit: [SPARK-23121][CORE] Fix for ui becoming unaccessible for long running streaming apps
Repository: spark
Updated Branches: refs/heads/master 4327ccf28 -> 446948af1

[SPARK-23121][CORE] Fix for ui becoming unaccessible for long running streaming apps

## What changes were proposed in this pull request?

The allJobs and the job pages attempt to use stage attempt and DAG visualization from the store, but for long running jobs they are not guaranteed to be retained, leading to exceptions when these pages are rendered. To fix it, `store.lastStageAttempt(stageId)` and `store.operationGraphForJob(jobId)` are wrapped in `store.asOption` and default values are used if the info is missing.

## How was this patch tested?

Manual testing of the UI, also using the test command reported in SPARK-23121:

./bin/spark-submit --class org.apache.spark.examples.streaming.HdfsWordCount ./examples/jars/spark-examples_2.11-2.4.0-SNAPSHOT.jar /spark

Closes #20287

Author: Sandor Murakozi

Closes #20330 from smurakozi/SPARK-23121.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/446948af
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/446948af
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/446948af
Branch: refs/heads/master
Commit: 446948af1d8dbc080a26a6eec6f743d338f1d12b
Parents: 4327ccf
Author: Sandor Murakozi
Authored: Mon Jan 22 10:36:28 2018 -0800
Committer: Marcelo Vanzin
Committed: Mon Jan 22 10:36:28 2018 -0800
--
.../org/apache/spark/ui/jobs/AllJobsPage.scala | 24 +++-
.../org/apache/spark/ui/jobs/JobPage.scala | 10 ++--
.../org/apache/spark/ui/jobs/StagePage.scala | 9 +---
3 files changed, 27 insertions(+), 16 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/446948af/core/src/main/scala/org/apache/spark/ui/jobs/AllJobsPage.scala
--
diff --git a/core/src/main/scala/org/apache/spark/ui/jobs/AllJobsPage.scala b/core/src/main/scala/org/apache/spark/ui/jobs/AllJobsPage.scala index e3b72f1..2b0f4ac 100644 --- a/core/src/main/scala/org/apache/spark/ui/jobs/AllJobsPage.scala +++ b/core/src/main/scala/org/apache/spark/ui/jobs/AllJobsPage.scala @@ -36,6 +36,9 @@ import org.apache.spark.util.Utils /** Page showing list of all ongoing and recently finished jobs */ private[ui] class AllJobsPage(parent: JobsTab, store: AppStatusStore) extends WebUIPage("") { + + import ApiHelper._ + private val JOBS_LEGEND = val jobId = job.jobId val status = job.status - val jobDescription = store.lastStageAttempt(job.stageIds.max).description - val displayJobDescription = jobDescription -.map(UIUtils.makeDescription(_, "", plainText = true).text) -.getOrElse("") + val (_, lastStageDescription) = lastStageNameAndDescription(store, job) + val jobDescription = UIUtils.makeDescription(lastStageDescription, "", plainText = true).text + val submissionTime = job.submissionTime.get.getTime() val completionTime = job.completionTime.map(_.getTime()).getOrElse(System.currentTimeMillis()) val classNameByStatus = status match { @@ -80,7 +82,7 @@ private[ui] class AllJobsPage(parent: JobsTab, store: AppStatusStore) extends We // The timeline library treats contents as HTML, so we have to escape them. We need to add // extra layers of escaping in order to embed this in a Javascript string literal.
- val escapedDesc = Utility.escape(displayJobDescription) + val escapedDesc = Utility.escape(jobDescription) val jsEscapedDesc = StringEscapeUtils.escapeEcmaScript(escapedDesc) val jobEventJsonAsStr = s""" @@ -430,6 +432,8 @@ private[ui] class JobDataSource( sortColumn: String, desc: Boolean) extends PagedDataSource[JobTableRowData](pageSize) { + import ApiHelper._ + // Convert JobUIData to JobTableRowData which contains the final contents to show in the table // so that we can avoid creating duplicate contents during sorting the data private val data = jobs.map(jobRow).sorted(ordering(sortColumn, desc)) @@ -454,23 +458,21 @@ private[ui] class JobDataSource( val formattedDuration = duration.map(d => UIUtils.formatDuration(d)).getOrElse("Unknown") val submissionTime = jobData.submissionTime val formattedSubmissionTime = submissionTime.map(UIUtils.formatDate).getOrElse("Unknown") -val lastStageAttempt = store.lastStageAttempt(jobData.stageIds.max) -val lastStageDescription = lastStageAttempt.description.getOrElse("") +val (lastStageName, lastStageDescription) = lastStageNameAndDescription(store, jobData) -val formattedJobDescription = - UIUtils.makeDescription(lastStageDescription, basePath, plainText = false) +val
spark git commit: [SPARK-23121][CORE] Fix for ui becoming unaccessible for long running streaming apps
Repository: spark
Updated Branches: refs/heads/branch-2.3 d963ba031 -> 4e75b0cb4

[SPARK-23121][CORE] Fix for ui becoming unaccessible for long running streaming apps

## What changes were proposed in this pull request?

The allJobs and the job pages attempt to use stage attempt and DAG visualization from the store, but for long running jobs they are not guaranteed to be retained, leading to exceptions when these pages are rendered. To fix it, `store.lastStageAttempt(stageId)` and `store.operationGraphForJob(jobId)` are wrapped in `store.asOption` and default values are used if the info is missing.

## How was this patch tested?

Manual testing of the UI, also using the test command reported in SPARK-23121:

./bin/spark-submit --class org.apache.spark.examples.streaming.HdfsWordCount ./examples/jars/spark-examples_2.11-2.4.0-SNAPSHOT.jar /spark

Closes #20287

Author: Sandor Murakozi

Closes #20330 from smurakozi/SPARK-23121.

(cherry picked from commit 446948af1d8dbc080a26a6eec6f743d338f1d12b)
Signed-off-by: Marcelo Vanzin

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/4e75b0cb
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/4e75b0cb
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/4e75b0cb
Branch: refs/heads/branch-2.3
Commit: 4e75b0cb4b575d4799c02455eed286fe971c6c50
Parents: d963ba0
Author: Sandor Murakozi
Authored: Mon Jan 22 10:36:28 2018 -0800
Committer: Marcelo Vanzin
Committed: Mon Jan 22 10:36:39 2018 -0800
--
.../org/apache/spark/ui/jobs/AllJobsPage.scala | 24 +++-
.../org/apache/spark/ui/jobs/JobPage.scala | 10 ++--
.../org/apache/spark/ui/jobs/StagePage.scala | 9 +---
3 files changed, 27 insertions(+), 16 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/4e75b0cb/core/src/main/scala/org/apache/spark/ui/jobs/AllJobsPage.scala
--
diff --git a/core/src/main/scala/org/apache/spark/ui/jobs/AllJobsPage.scala b/core/src/main/scala/org/apache/spark/ui/jobs/AllJobsPage.scala index ff916bb..c2668a7 100644 --- a/core/src/main/scala/org/apache/spark/ui/jobs/AllJobsPage.scala +++ b/core/src/main/scala/org/apache/spark/ui/jobs/AllJobsPage.scala @@ -36,6 +36,9 @@ import org.apache.spark.util.Utils /** Page showing list of all ongoing and recently finished jobs */ private[ui] class AllJobsPage(parent: JobsTab, store: AppStatusStore) extends WebUIPage("") { + + import ApiHelper._ + private val JOBS_LEGEND = val jobId = job.jobId val status = job.status - val jobDescription = store.lastStageAttempt(job.stageIds.max).description - val displayJobDescription = jobDescription -.map(UIUtils.makeDescription(_, "", plainText = true).text) -.getOrElse("") + val (_, lastStageDescription) = lastStageNameAndDescription(store, job) + val jobDescription = UIUtils.makeDescription(lastStageDescription, "", plainText = true).text + val submissionTime = job.submissionTime.get.getTime() val completionTime = job.completionTime.map(_.getTime()).getOrElse(System.currentTimeMillis()) val classNameByStatus = status match { @@ -80,7 +82,7 @@ private[ui] class AllJobsPage(parent: JobsTab, store: AppStatusStore) extends We // The timeline library treats contents as HTML, so we have to escape them. We need to add // extra layers of escaping in order to embed this in a Javascript string literal.
- val escapedDesc = Utility.escape(displayJobDescription) + val escapedDesc = Utility.escape(jobDescription) val jsEscapedDesc = StringEscapeUtils.escapeEcmaScript(escapedDesc) val jobEventJsonAsStr = s""" @@ -403,6 +405,8 @@ private[ui] class JobDataSource( sortColumn: String, desc: Boolean) extends PagedDataSource[JobTableRowData](pageSize) { + import ApiHelper._ + // Convert JobUIData to JobTableRowData which contains the final contents to show in the table // so that we can avoid creating duplicate contents during sorting the data private val data = jobs.map(jobRow).sorted(ordering(sortColumn, desc)) @@ -427,23 +431,21 @@ private[ui] class JobDataSource( val formattedDuration = duration.map(d => UIUtils.formatDuration(d)).getOrElse("Unknown") val submissionTime = jobData.submissionTime val formattedSubmissionTime = submissionTime.map(UIUtils.formatDate).getOrElse("Unknown") -val lastStageAttempt = store.lastStageAttempt(jobData.stageIds.max) -val lastStageDescription = lastStageAttempt.description.getOrElse("") +val (lastStageName, lastStageDescription) = lastStageNameAndDescription(store, jobData) -
svn commit: r24360 - in /dev/spark/2.4.0-SNAPSHOT-2018_01_22_08_01-4327ccf-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell Date: Mon Jan 22 16:22:00 2018 New Revision: 24360 Log: Apache Spark 2.4.0-SNAPSHOT-2018_01_22_08_01-4327ccf docs [This commit notification would consist of 1441 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.] - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-11630][CORE] ClosureCleaner moved from warning to debug
Repository: spark
Updated Branches: refs/heads/master 87ffe7add -> 4327ccf28

[SPARK-11630][CORE] ClosureCleaner moved from warning to debug

## What changes were proposed in this pull request?

ClosureCleaner moved from warning to debug

## How was this patch tested?

Existing tests

Author: Rekha Joshi
Author: rjoshi2

Closes #20337 from rekhajoshm/SPARK-11630-1.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/4327ccf2
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/4327ccf2
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/4327ccf2
Branch: refs/heads/master
Commit: 4327ccf289b5a0dc51f6294113d01af6eb52eea0
Parents: 87ffe7a
Author: Rekha Joshi
Authored: Mon Jan 22 08:36:17 2018 -0600
Committer: Sean Owen
Committed: Mon Jan 22 08:36:17 2018 -0600
--
core/src/main/scala/org/apache/spark/util/ClosureCleaner.scala | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/4327ccf2/core/src/main/scala/org/apache/spark/util/ClosureCleaner.scala
--
diff --git a/core/src/main/scala/org/apache/spark/util/ClosureCleaner.scala b/core/src/main/scala/org/apache/spark/util/ClosureCleaner.scala index 4061642..ad0c063 100644 --- a/core/src/main/scala/org/apache/spark/util/ClosureCleaner.scala +++ b/core/src/main/scala/org/apache/spark/util/ClosureCleaner.scala @@ -207,7 +207,7 @@ private[spark] object ClosureCleaner extends Logging { accessedFields: Map[Class[_], Set[String]]): Unit = { if (!isClosure(func.getClass)) { - logWarning("Expected a closure; got " + func.getClass.getName) + logDebug(s"Expected a closure; got ${func.getClass.getName}") return }

- To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
svn commit: r24354 - in /dev/spark/2.3.1-SNAPSHOT-2018_01_22_06_01-d963ba0-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell Date: Mon Jan 22 14:20:14 2018 New Revision: 24354 Log: Apache Spark 2.3.1-SNAPSHOT-2018_01_22_06_01-d963ba0 docs [This commit notification would consist of 1441 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.] - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-7721][PYTHON][TESTS] Adds PySpark coverage generation script
Repository: spark
Updated Branches: refs/heads/master 5d680cae4 -> 87ffe7add

[SPARK-7721][PYTHON][TESTS] Adds PySpark coverage generation script

## What changes were proposed in this pull request?

Note that this PR was made based on the top of https://github.com/apache/spark/pull/20151. So, it almost leaves the main codes intact. This PR proposes to add a script for the preparation of automatic PySpark coverage generation. Currently, it's difficult to check the actual coverage in the case of PySpark. This script allows running tests the way we did via the `run-tests` script before. The usage is exactly the same as the `run-tests` script, as this basically wraps it.

This script and PR alone should also be useful. I was asked about how to run this before, and it seems some reviewers (including me) need this. It would also be useful to run it manually.

It usually requires a small diff in normal Python projects, but the PySpark case is a bit different because apparently we are unable to track the coverage after the worker is forked. So, here, I made a custom worker that forces the coverage, based on the top of https://github.com/apache/spark/pull/20151.

I made a simple demo. Please take a look - https://spark-test.github.io/pyspark-coverage-site.

To show the structure, this PR adds the files as below:

```
python
├── .coveragerc                     # Runtime configuration when we run the script.
├── run-tests-with-coverage         # The script that has coverage support and wraps the run-tests script.
└── test_coverage                   # Directories that have files required when running coverage.
    ├── conf
    │   └── spark-defaults.conf     # Having the configuration 'spark.python.daemon.module'.
    ├── coverage_daemon.py          # A daemon having a custom fix and wrapping our daemon.py
    └── sitecustomize.py            # Initiate coverage with COVERAGE_PROCESS_START
```

Note that this PR has a minor nit: [This scope](https://github.com/apache/spark/blob/04e44b37cc04f62fbf9e08c7076349e0a4d12ea8/python/pyspark/daemon.py#L148-L169) in `daemon.py` is not in the coverage results, as basically I am producing the coverage results in `worker.py` separately and then merging them. I believe it's not a big deal.

In a followup, I might have a site that has a single up-to-date PySpark coverage from the master branch as the fallback / default, or have a site that has multiple PySpark coverages, with the site link left on each pull request.

## How was this patch tested?

Manually tested. Usage is the same as the existing Python test script - `./python/run-tests`. For example,

```
sh run-tests-with-coverage --python-executables=python3 --modules=pyspark-sql
```

Running this will generate HTMLs under `./python/test_coverage/htmlcov`.

Console output example:

```
sh run-tests-with-coverage --python-executables=python3,python --modules=pyspark-core
Running PySpark tests. Output is in /.../spark/python/unit-tests.log
Will test against the following Python executables: ['python3', 'python']
Will test the following Python modules: ['pyspark-core']
Starting test(python): pyspark.tests
Starting test(python3): pyspark.tests
...
Tests passed in 231 seconds
Combining collected coverage data under /.../spark/python/test_coverage/coverage_data
Reporting the coverage data at /...spark/python/test_coverage/coverage_data/coverage
Name                    Stmts   Miss Branch BrPart  Cover
--
pyspark/__init__.py        41      0      8      2    96%
...
pyspark/profiler.py        74     11     22      5    83%
pyspark/rdd.py            871     40    303     32    93%
pyspark/rddsampler.py      68     10     32      2    82%
...
-- TOTAL 8521 3077 274819159% Generating HTML files for PySpark coverage under /.../spark/python/test_coverage/htmlcov ``` Author: hyukjinkwonCloses #20204 from HyukjinKwon/python-coverage. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/87ffe7ad Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/87ffe7ad Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/87ffe7ad Branch: refs/heads/master Commit: 87ffe7adddf517541aac0d1e8536b02ad8881606 Parents: 5d680ca Author: hyukjinkwon Authored: Mon Jan 22 22:12:50 2018 +0900 Committer: hyukjinkwon Committed: Mon Jan 22 22:12:50 2018 +0900 -- .gitignore| 2 + python/.coveragerc| 21 +++ python/run-tests-with-coverage| 69 ++ python/run-tests.py | 5 +- python/test_coverage/conf/spark-defaults.conf | 21 +++
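The coverage bootstrap above relies on coverage.py's standard subprocess hook: when the `COVERAGE_PROCESS_START` environment variable points at a config file, any Python process that imports `sitecustomize` at startup begins measuring, which is what lets the forked workers be tracked. A minimal sketch of such a `sitecustomize.py` (illustrative; the exact file contents in the PR may differ):

```python
# sitecustomize.py -- sketch of bootstrapping coverage in every Python
# process (driver, daemon, forked workers). Assumes coverage.py is installed
# and COVERAGE_PROCESS_START points at a .coveragerc file.
import coverage

# This is a no-op unless COVERAGE_PROCESS_START is set in the environment,
# so regular (non-coverage) runs are unaffected.
coverage.process_startup()
```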
spark git commit: [SPARK-23090][SQL] polish ColumnVector
Repository: spark
Updated Branches:
  refs/heads/branch-2.3 1069fad41 -> d963ba031

[SPARK-23090][SQL] polish ColumnVector

## What changes were proposed in this pull request?

Several improvements:
* provide a default implementation for the batch get methods
* rename `getChildColumn` to `getChild`, which is more concise
* remove `getStruct(int, int)`; it's only used to simplify the codegen, which is an internal thing, so we should not add a public API for this purpose

## How was this patch tested?

existing tests

Author: Wenchen Fan

Closes #20277 from cloud-fan/column-vector.

(cherry picked from commit 5d680cae486c77cdb12dbe9e043710e49e8d51e4)
Signed-off-by: Wenchen Fan

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d963ba03
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d963ba03
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/d963ba03

Branch: refs/heads/branch-2.3
Commit: d963ba031748711ec7847ad0b702911eb7319c63
Parents: 1069fad
Author: Wenchen Fan
Authored: Mon Jan 22 20:56:38 2018 +0800
Committer: Wenchen Fan
Committed: Mon Jan 22 20:56:57 2018 +0800

--
 .../expressions/codegen/CodeGenerator.scala     | 18 ++--
 .../datasources/orc/OrcColumnVector.java        | 65 +
 .../datasources/orc/OrcColumnarBatchReader.java | 23 ++---
 .../execution/vectorized/ColumnVectorUtils.java | 10 +-
 .../vectorized/MutableColumnarRow.java          |  4 +-
 .../vectorized/WritableColumnVector.java        | 10 +-
 .../spark/sql/vectorized/ArrowColumnVector.java | 99 +---
 .../spark/sql/vectorized/ColumnVector.java      | 79 +++-
 .../spark/sql/vectorized/ColumnarArray.java     |  4 +-
 .../spark/sql/vectorized/ColumnarRow.java       | 46 -
 .../spark/sql/execution/ColumnarBatchScan.scala |  2 +-
 .../aggregate/VectorizedHashMapGenerator.scala  |  4 +-
 .../sql/execution/arrow/ArrowWriterSuite.scala  | 14 +--
 .../vectorized/ArrowColumnVectorSuite.scala     | 12 +-
 .../vectorized/ColumnVectorSuite.scala          | 12 +-
 .../vectorized/ColumnarBatchBenchmark.scala     | 38
 .../vectorized/ColumnarBatchSuite.scala         | 20 ++--
 17 files changed, 164 insertions(+), 296 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/d963ba03/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
--
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
index 2c714c2..f96ed76 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
@@ -688,17 +688,13 @@ class CodegenContext {
   /**
    * Returns the specialized code to access a value from a column vector for a given `DataType`.
    */
-  def getValue(vector: String, rowId: String, dataType: DataType): String = {
-    val jt = javaType(dataType)
-    dataType match {
-      case _ if isPrimitiveType(jt) =>
-        s"$vector.get${primitiveTypeName(jt)}($rowId)"
-      case t: DecimalType =>
-        s"$vector.getDecimal($rowId, ${t.precision}, ${t.scale})"
-      case StringType =>
-        s"$vector.getUTF8String($rowId)"
-      case _ =>
-        throw new IllegalArgumentException(s"cannot generate code for unsupported type: $dataType")
+  def getValueFromVector(vector: String, dataType: DataType, rowId: String): String = {
+    if (dataType.isInstanceOf[StructType]) {
+      // `ColumnVector.getStruct` is different from `InternalRow.getStruct`, it only takes an
+      // `ordinal` parameter.
+      s"$vector.getStruct($rowId)"
+    } else {
+      getValue(vector, dataType, rowId)
     }
   }

http://git-wip-us.apache.org/repos/asf/spark/blob/d963ba03/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/OrcColumnVector.java
--
diff --git a/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/OrcColumnVector.java b/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/OrcColumnVector.java
index b6e7922..aaf2a38 100644
--- a/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/OrcColumnVector.java
+++ b/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/OrcColumnVector.java
@@ -111,57 +111,21 @@ public class OrcColumnVector extends org.apache.spark.sql.vectorized.ColumnVecto
   }

   @Override
-
spark git commit: [SPARK-23090][SQL] polish ColumnVector
Repository: spark
Updated Branches:
  refs/heads/master 896e45af5 -> 5d680cae4

[SPARK-23090][SQL] polish ColumnVector

## What changes were proposed in this pull request?

Several improvements:
* provide a default implementation for the batch get methods
* rename `getChildColumn` to `getChild`, which is more concise
* remove `getStruct(int, int)`; it's only used to simplify the codegen, which is an internal thing, so we should not add a public API for this purpose

## How was this patch tested?

existing tests

Author: Wenchen Fan

Closes #20277 from cloud-fan/column-vector.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/5d680cae
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/5d680cae
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/5d680cae

Branch: refs/heads/master
Commit: 5d680cae486c77cdb12dbe9e043710e49e8d51e4
Parents: 896e45a
Author: Wenchen Fan
Authored: Mon Jan 22 20:56:38 2018 +0800
Committer: Wenchen Fan
Committed: Mon Jan 22 20:56:38 2018 +0800

--
 .../expressions/codegen/CodeGenerator.scala     | 18 ++--
 .../datasources/orc/OrcColumnVector.java        | 65 +
 .../datasources/orc/OrcColumnarBatchReader.java | 23 ++---
 .../execution/vectorized/ColumnVectorUtils.java | 10 +-
 .../vectorized/MutableColumnarRow.java          |  4 +-
 .../vectorized/WritableColumnVector.java        | 10 +-
 .../spark/sql/vectorized/ArrowColumnVector.java | 99 +---
 .../spark/sql/vectorized/ColumnVector.java      | 79 +++-
 .../spark/sql/vectorized/ColumnarArray.java     |  4 +-
 .../spark/sql/vectorized/ColumnarRow.java       | 46 -
 .../spark/sql/execution/ColumnarBatchScan.scala |  2 +-
 .../aggregate/VectorizedHashMapGenerator.scala  |  4 +-
 .../sql/execution/arrow/ArrowWriterSuite.scala  | 14 +--
 .../vectorized/ArrowColumnVectorSuite.scala     | 12 +-
 .../vectorized/ColumnVectorSuite.scala          | 12 +-
 .../vectorized/ColumnarBatchBenchmark.scala     | 38
 .../vectorized/ColumnarBatchSuite.scala         | 20 ++--
 17 files changed, 164 insertions(+), 296 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/5d680cae/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
--
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
index 2c714c2..f96ed76 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
@@ -688,17 +688,13 @@ class CodegenContext {
   /**
    * Returns the specialized code to access a value from a column vector for a given `DataType`.
    */
-  def getValue(vector: String, rowId: String, dataType: DataType): String = {
-    val jt = javaType(dataType)
-    dataType match {
-      case _ if isPrimitiveType(jt) =>
-        s"$vector.get${primitiveTypeName(jt)}($rowId)"
-      case t: DecimalType =>
-        s"$vector.getDecimal($rowId, ${t.precision}, ${t.scale})"
-      case StringType =>
-        s"$vector.getUTF8String($rowId)"
-      case _ =>
-        throw new IllegalArgumentException(s"cannot generate code for unsupported type: $dataType")
+  def getValueFromVector(vector: String, dataType: DataType, rowId: String): String = {
+    if (dataType.isInstanceOf[StructType]) {
+      // `ColumnVector.getStruct` is different from `InternalRow.getStruct`, it only takes an
+      // `ordinal` parameter.
+      s"$vector.getStruct($rowId)"
+    } else {
+      getValue(vector, dataType, rowId)
     }
   }

http://git-wip-us.apache.org/repos/asf/spark/blob/5d680cae/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/OrcColumnVector.java
--
diff --git a/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/OrcColumnVector.java b/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/OrcColumnVector.java
index b6e7922..aaf2a38 100644
--- a/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/OrcColumnVector.java
+++ b/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/OrcColumnVector.java
@@ -111,57 +111,21 @@ public class OrcColumnVector extends org.apache.spark.sql.vectorized.ColumnVecto
   }

   @Override
-  public boolean[] getBooleans(int rowId, int count) {
-    boolean[] res = new boolean[count];
-    for (int i = 0; i < count;
spark git commit: [MINOR][SQL][TEST] Test case cleanups for recent PRs
Repository: spark
Updated Branches:
  refs/heads/master 78801881c -> 896e45af5

[MINOR][SQL][TEST] Test case cleanups for recent PRs

## What changes were proposed in this pull request?

Reverts the unneeded test case changes we made in SPARK-23000. Also fixes the test suites that do not call `super.afterAll()` in their local `afterAll()`; the `afterAll()` of `TestHiveSingleton` actually resets the environments.

## How was this patch tested?

N/A

Author: gatorsmile

Closes #20341 from gatorsmile/testRelated.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/896e45af
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/896e45af
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/896e45af

Branch: refs/heads/master
Commit: 896e45af5fea264683b1d7d20a1711f33908a06f
Parents: 7880188
Author: gatorsmile
Authored: Mon Jan 22 04:32:59 2018 -0800
Committer: gatorsmile
Committed: Mon Jan 22 04:32:59 2018 -0800

--
 .../apache/spark/sql/DataFrameJoinSuite.scala   | 21 ++--
 .../apache/spark/sql/hive/test/TestHive.scala   |  3 +-
 .../sql/hive/HiveMetastoreCatalogSuite.scala    | 26 +++
 .../sql/hive/execution/HiveUDAFSuite.scala      |  8 +++--
 .../sql/hive/execution/Hive_2_1_DDLSuite.scala  |  6 +++-
 .../execution/ObjectHashAggregateSuite.scala    |  6 +++-
 .../apache/spark/sql/hive/parquetSuites.scala   | 35
 7 files changed, 60 insertions(+), 45 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/896e45af/sql/core/src/test/scala/org/apache/spark/sql/DataFrameJoinSuite.scala
--
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/DataFrameJoinSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/DataFrameJoinSuite.scala
index 1656f29..0d9eeab 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/DataFrameJoinSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/DataFrameJoinSuite.scala
@@ -21,6 +21,7 @@ import org.apache.spark.sql.catalyst.plans.{Inner, LeftOuter, RightOuter}
 import org.apache.spark.sql.catalyst.plans.logical.Join
 import org.apache.spark.sql.execution.joins.BroadcastHashJoinExec
 import org.apache.spark.sql.functions._
+import org.apache.spark.sql.internal.SQLConf
 import org.apache.spark.sql.test.SharedSQLContext

 class DataFrameJoinSuite extends QueryTest with SharedSQLContext {
@@ -276,16 +277,14 @@ class DataFrameJoinSuite extends QueryTest with SharedSQLContext {

   test("SPARK-23087: don't throw Analysis Exception in CheckCartesianProduct when join condition " +
     "is false or null") {
-    val df = spark.range(10)
-    val dfNull = spark.range(10).select(lit(null).as("b"))
-    val planNull = df.join(dfNull, $"id" === $"b", "left").queryExecution.analyzed
-
-    spark.sessionState.executePlan(planNull).optimizedPlan
-
-    val dfOne = df.select(lit(1).as("a"))
-    val dfTwo = spark.range(10).select(lit(2).as("b"))
-    val planFalse = dfOne.join(dfTwo, $"a" === $"b", "left").queryExecution.analyzed
-
-    spark.sessionState.executePlan(planFalse).optimizedPlan
+    withSQLConf(SQLConf.CROSS_JOINS_ENABLED.key -> "false") {
+      val df = spark.range(10)
+      val dfNull = spark.range(10).select(lit(null).as("b"))
+      df.join(dfNull, $"id" === $"b", "left").queryExecution.optimizedPlan
+
+      val dfOne = df.select(lit(1).as("a"))
+      val dfTwo = spark.range(10).select(lit(2).as("b"))
+      dfOne.join(dfTwo, $"a" === $"b", "left").queryExecution.optimizedPlan
+    }
   }
 }

http://git-wip-us.apache.org/repos/asf/spark/blob/896e45af/sql/hive/src/main/scala/org/apache/spark/sql/hive/test/TestHive.scala
--
diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/test/TestHive.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/test/TestHive.scala
index c84131f..7287e20 100644
--- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/test/TestHive.scala
+++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/test/TestHive.scala
@@ -492,8 +492,7 @@ private[hive] class TestHiveSparkSession(
   protected val originalUDFs: JavaSet[String] = FunctionRegistry.getFunctionNames

   /**
-   * Resets the test instance by deleting any tables that have been created.
-   * TODO: also clear out UDFs, views, etc.
+   * Resets the test instance by deleting any table, view, temp view, and UDF that have been created
    */
   def reset() {
     try {

http://git-wip-us.apache.org/repos/asf/spark/blob/896e45af/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveMetastoreCatalogSuite.scala
--
diff --git
spark git commit: [MINOR][SQL][TEST] Test case cleanups for recent PRs
Repository: spark
Updated Branches:
  refs/heads/branch-2.3 d933fcea6 -> 1069fad41

[MINOR][SQL][TEST] Test case cleanups for recent PRs

## What changes were proposed in this pull request?

Reverts the unneeded test case changes we made in SPARK-23000. Also fixes the test suites that do not call `super.afterAll()` in their local `afterAll()`; the `afterAll()` of `TestHiveSingleton` actually resets the environments.

## How was this patch tested?

N/A

Author: gatorsmile

Closes #20341 from gatorsmile/testRelated.

(cherry picked from commit 896e45af5fea264683b1d7d20a1711f33908a06f)
Signed-off-by: gatorsmile

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/1069fad4
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/1069fad4
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/1069fad4

Branch: refs/heads/branch-2.3
Commit: 1069fad41fb6896fef4245e6ae6b5ba36115ad68
Parents: d933fce
Author: gatorsmile
Authored: Mon Jan 22 04:32:59 2018 -0800
Committer: gatorsmile
Committed: Mon Jan 22 04:33:07 2018 -0800

--
 .../apache/spark/sql/DataFrameJoinSuite.scala   | 21 ++--
 .../apache/spark/sql/hive/test/TestHive.scala   |  3 +-
 .../sql/hive/HiveMetastoreCatalogSuite.scala    | 26 +++
 .../sql/hive/execution/HiveUDAFSuite.scala      |  8 +++--
 .../sql/hive/execution/Hive_2_1_DDLSuite.scala  |  6 +++-
 .../execution/ObjectHashAggregateSuite.scala    |  6 +++-
 .../apache/spark/sql/hive/parquetSuites.scala   | 35
 7 files changed, 60 insertions(+), 45 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/1069fad4/sql/core/src/test/scala/org/apache/spark/sql/DataFrameJoinSuite.scala
--
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/DataFrameJoinSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/DataFrameJoinSuite.scala
index 1656f29..0d9eeab 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/DataFrameJoinSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/DataFrameJoinSuite.scala
@@ -21,6 +21,7 @@ import org.apache.spark.sql.catalyst.plans.{Inner, LeftOuter, RightOuter}
 import org.apache.spark.sql.catalyst.plans.logical.Join
 import org.apache.spark.sql.execution.joins.BroadcastHashJoinExec
 import org.apache.spark.sql.functions._
+import org.apache.spark.sql.internal.SQLConf
 import org.apache.spark.sql.test.SharedSQLContext

 class DataFrameJoinSuite extends QueryTest with SharedSQLContext {
@@ -276,16 +277,14 @@ class DataFrameJoinSuite extends QueryTest with SharedSQLContext {

   test("SPARK-23087: don't throw Analysis Exception in CheckCartesianProduct when join condition " +
     "is false or null") {
-    val df = spark.range(10)
-    val dfNull = spark.range(10).select(lit(null).as("b"))
-    val planNull = df.join(dfNull, $"id" === $"b", "left").queryExecution.analyzed
-
-    spark.sessionState.executePlan(planNull).optimizedPlan
-
-    val dfOne = df.select(lit(1).as("a"))
-    val dfTwo = spark.range(10).select(lit(2).as("b"))
-    val planFalse = dfOne.join(dfTwo, $"a" === $"b", "left").queryExecution.analyzed
-
-    spark.sessionState.executePlan(planFalse).optimizedPlan
+    withSQLConf(SQLConf.CROSS_JOINS_ENABLED.key -> "false") {
+      val df = spark.range(10)
+      val dfNull = spark.range(10).select(lit(null).as("b"))
+      df.join(dfNull, $"id" === $"b", "left").queryExecution.optimizedPlan
+
+      val dfOne = df.select(lit(1).as("a"))
+      val dfTwo = spark.range(10).select(lit(2).as("b"))
+      dfOne.join(dfTwo, $"a" === $"b", "left").queryExecution.optimizedPlan
+    }
   }
 }

http://git-wip-us.apache.org/repos/asf/spark/blob/1069fad4/sql/hive/src/main/scala/org/apache/spark/sql/hive/test/TestHive.scala
--
diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/test/TestHive.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/test/TestHive.scala
index c84131f..7287e20 100644
--- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/test/TestHive.scala
+++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/test/TestHive.scala
@@ -492,8 +492,7 @@ private[hive] class TestHiveSparkSession(
   protected val originalUDFs: JavaSet[String] = FunctionRegistry.getFunctionNames

   /**
-   * Resets the test instance by deleting any tables that have been created.
-   * TODO: also clear out UDFs, views, etc.
+   * Resets the test instance by deleting any table, view, temp view, and UDF that have been created
    */
   def reset() {
     try {
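The `withSQLConf` helper used in the rewritten test above sets a SQL conf only for the scope of a block and restores the previous value afterwards. For readers who want the same pattern from PySpark, here is a hypothetical context-manager sketch (`with_sql_conf` is illustrative, not an existing PySpark API):

```python
from contextlib import contextmanager

@contextmanager
def with_sql_conf(spark, key, value):
    """Set a SQL conf for the duration of a block, then restore it --
    the same pattern as the Scala test helper withSQLConf."""
    old = spark.conf.get(key, None)  # None if the conf was not set
    spark.conf.set(key, value)
    try:
        yield
    finally:
        if old is None:
            spark.conf.unset(key)
        else:
            spark.conf.set(key, old)

# Usage, assuming an active SparkSession named `spark`:
# with with_sql_conf(spark, "spark.sql.crossJoin.enabled", "false"):
#     df1.join(df2, df1.a == df2.b, "left").explain()
```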
spark git commit: [SPARK-23170][SQL] Dump the statistics of effective runs of analyzer and optimizer rules
Repository: spark
Updated Branches:
  refs/heads/master 73281161f -> 78801881c

[SPARK-23170][SQL] Dump the statistics of effective runs of analyzer and optimizer rules

## What changes were proposed in this pull request?

Dump the statistics of effective runs of analyzer and optimizer rules.

## How was this patch tested?

Do a manual run of TPCDSQuerySuite

```
=== Metrics of Analyzer/Optimizer Rules ===
Total number of runs: 175899
Total time: 25.486559948 seconds

Rule                                                                            Effective Time / Total Time  Effective Runs / Total Runs

org.apache.spark.sql.catalyst.optimizer.ColumnPruning                           1603280450 / 2868461549      761 / 1877
org.apache.spark.sql.catalyst.analysis.Analyzer$CTESubstitution                 2045860009 / 2056602674      37 / 788
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveAggregateFunctions       440719059 / 1693110949       38 / 1982
org.apache.spark.sql.catalyst.optimizer.Optimizer$OptimizeSubqueries            1429834919 / 1446016225      39 / 285
org.apache.spark.sql.catalyst.optimizer.PruneFilters                            33273083 / 1389586938        3 / 1592
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences               821183615 / 128754           616 / 1982
org.apache.spark.sql.catalyst.optimizer.ReorderJoin                             775837028 / 866238225        132 / 1592
org.apache.spark.sql.catalyst.analysis.DecimalPrecision                         550683593 / 748854507        211 / 1982
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveSubquery                 513075345 / 634370596        49 / 1982
org.apache.spark.sql.catalyst.analysis.Analyzer$FixNullability                  33475731 / 606406532         12 / 742
org.apache.spark.sql.catalyst.analysis.TypeCoercion$ImplicitTypeCasts           193144298 / 545403925        86 / 1982
org.apache.spark.sql.catalyst.optimizer.BooleanSimplification                   18651497 / 495725004         7 / 1592
org.apache.spark.sql.catalyst.optimizer.PushPredicateThroughJoin                369257217 / 489934378        709 / 1592
org.apache.spark.sql.catalyst.optimizer.RemoveRedundantAliases                  3707000 / 468291609          9 / 1592
org.apache.spark.sql.catalyst.optimizer.InferFiltersFromConstraints             410155900 / 435254175        192 / 285
org.apache.spark.sql.execution.datasources.FindDataSourceTable                  348885539 / 371855866        233 / 1982
org.apache.spark.sql.catalyst.optimizer.NullPropagation                         11307645 / 307531225         26 / 1592
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions                120324545 / 304948785        294 / 1982
org.apache.spark.sql.catalyst.analysis.TypeCoercion$FunctionArgumentConversion  92323199 / 286695007         38 / 1982
org.apache.spark.sql.catalyst.optimizer.PushDownPredicate                       230084193 / 265845972        785 / 1592
org.apache.spark.sql.catalyst.analysis.TypeCoercion$PromoteStrings              45938401 / 265144009         40 / 1982
org.apache.spark.sql.catalyst.analysis.TypeCoercion$InConversion                14888776 / 261499450         1 / 1982
org.apache.spark.sql.catalyst.analysis.TypeCoercion$CaseWhenCoercion            113796384 / 244913861        29 / 1982
org.apache.spark.sql.catalyst.optimizer.ConstantFolding                         65008069 / 236548480         126 / 1592
org.apache.spark.sql.catalyst.analysis.Analyzer$ExtractGenerator                0 / 226338929                0 / 1982
org.apache.spark.sql.catalyst.analysis.ResolveTimeZone                          98134906 / 221323770         417 / 1982
org.apache.spark.sql.catalyst.optimizer.ReorderAssociativeOperator              0 / 208421703                0 / 1592
org.apache.spark.sql.catalyst.optimizer.OptimizeIn
spark git commit: [SPARK-23170][SQL] Dump the statistics of effective runs of analyzer and optimizer rules
Repository: spark
Updated Branches:
  refs/heads/branch-2.3 743b9173f -> d933fcea6

[SPARK-23170][SQL] Dump the statistics of effective runs of analyzer and optimizer rules

## What changes were proposed in this pull request?

Dump the statistics of effective runs of analyzer and optimizer rules.

## How was this patch tested?

Do a manual run of TPCDSQuerySuite

```
=== Metrics of Analyzer/Optimizer Rules ===
Total number of runs: 175899
Total time: 25.486559948 seconds

Rule                                                                            Effective Time / Total Time  Effective Runs / Total Runs

org.apache.spark.sql.catalyst.optimizer.ColumnPruning                           1603280450 / 2868461549      761 / 1877
org.apache.spark.sql.catalyst.analysis.Analyzer$CTESubstitution                 2045860009 / 2056602674      37 / 788
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveAggregateFunctions       440719059 / 1693110949       38 / 1982
org.apache.spark.sql.catalyst.optimizer.Optimizer$OptimizeSubqueries            1429834919 / 1446016225      39 / 285
org.apache.spark.sql.catalyst.optimizer.PruneFilters                            33273083 / 1389586938        3 / 1592
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences               821183615 / 128754           616 / 1982
org.apache.spark.sql.catalyst.optimizer.ReorderJoin                             775837028 / 866238225        132 / 1592
org.apache.spark.sql.catalyst.analysis.DecimalPrecision                         550683593 / 748854507        211 / 1982
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveSubquery                 513075345 / 634370596        49 / 1982
org.apache.spark.sql.catalyst.analysis.Analyzer$FixNullability                  33475731 / 606406532         12 / 742
org.apache.spark.sql.catalyst.analysis.TypeCoercion$ImplicitTypeCasts           193144298 / 545403925        86 / 1982
org.apache.spark.sql.catalyst.optimizer.BooleanSimplification                   18651497 / 495725004         7 / 1592
org.apache.spark.sql.catalyst.optimizer.PushPredicateThroughJoin                369257217 / 489934378        709 / 1592
org.apache.spark.sql.catalyst.optimizer.RemoveRedundantAliases                  3707000 / 468291609          9 / 1592
org.apache.spark.sql.catalyst.optimizer.InferFiltersFromConstraints             410155900 / 435254175        192 / 285
org.apache.spark.sql.execution.datasources.FindDataSourceTable                  348885539 / 371855866        233 / 1982
org.apache.spark.sql.catalyst.optimizer.NullPropagation                         11307645 / 307531225         26 / 1592
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions                120324545 / 304948785        294 / 1982
org.apache.spark.sql.catalyst.analysis.TypeCoercion$FunctionArgumentConversion  92323199 / 286695007         38 / 1982
org.apache.spark.sql.catalyst.optimizer.PushDownPredicate                       230084193 / 265845972        785 / 1592
org.apache.spark.sql.catalyst.analysis.TypeCoercion$PromoteStrings              45938401 / 265144009         40 / 1982
org.apache.spark.sql.catalyst.analysis.TypeCoercion$InConversion                14888776 / 261499450         1 / 1982
org.apache.spark.sql.catalyst.analysis.TypeCoercion$CaseWhenCoercion            113796384 / 244913861        29 / 1982
org.apache.spark.sql.catalyst.optimizer.ConstantFolding                         65008069 / 236548480         126 / 1592
org.apache.spark.sql.catalyst.analysis.Analyzer$ExtractGenerator                0 / 226338929                0 / 1982
org.apache.spark.sql.catalyst.analysis.ResolveTimeZone                          98134906 / 221323770         417 / 1982
org.apache.spark.sql.catalyst.optimizer.ReorderAssociativeOperator              0 / 208421703                0 / 1592
org.apache.spark.sql.catalyst.optimizer.OptimizeIn
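For intuition about what the dump measures: every rule application is timed and counted, and a run counts as "effective" only when the rule actually changed the plan. A hypothetical Python sketch of that bookkeeping follows (not Spark's actual `RuleExecutor` code):

```python
import time
from collections import defaultdict

# Per-rule accumulators: [effective_time, total_time, effective_runs, total_runs].
stats = defaultdict(lambda: [0, 0, 0, 0])

def apply_rule(name, rule, plan):
    start = time.perf_counter_ns()
    new_plan = rule(plan)
    elapsed = time.perf_counter_ns() - start
    s = stats[name]
    s[1] += elapsed   # total time always grows
    s[3] += 1         # total runs always grow
    if new_plan != plan:  # the rule rewrote the plan: an effective run
        s[0] += elapsed
        s[2] += 1
    return new_plan

def dump_stats():
    print("Rule\tEffective Time / Total Time\tEffective Runs / Total Runs")
    for name, (et, tt, er, tr) in sorted(stats.items(), key=lambda kv: -kv[1][1]):
        print(f"{name}\t{et} / {tt}\t{er} / {tr}")
```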
spark git commit: [SPARK-23122][PYSPARK][FOLLOW-UP] Update the docs for UDF Registration
Repository: spark
Updated Branches:
  refs/heads/branch-2.3 cf078a205 -> 743b9173f

[SPARK-23122][PYSPARK][FOLLOW-UP] Update the docs for UDF Registration

## What changes were proposed in this pull request?

This PR is to update the docs for UDF registration.

## How was this patch tested?

N/A

Author: gatorsmile

Closes #20348 from gatorsmile/testUpdateDoc.

(cherry picked from commit 73281161fc7fddd645c712986ec376ac2b1bd213)
Signed-off-by: gatorsmile

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/743b9173
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/743b9173
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/743b9173

Branch: refs/heads/branch-2.3
Commit: 743b9173f8feaed8e594961aa85d61fb3f8e5e70
Parents: cf078a2
Author: gatorsmile
Authored: Mon Jan 22 04:27:59 2018 -0800
Committer: gatorsmile
Committed: Mon Jan 22 04:28:08 2018 -0800

--
 python/pyspark/sql/udf.py | 12
 1 file changed, 8 insertions(+), 4 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/743b9173/python/pyspark/sql/udf.py
--
diff --git a/python/pyspark/sql/udf.py b/python/pyspark/sql/udf.py
index c77f19f8..134badb 100644
--- a/python/pyspark/sql/udf.py
+++ b/python/pyspark/sql/udf.py
@@ -199,8 +199,8 @@ class UDFRegistration(object):
     @ignore_unicode_prefix
     @since("1.3.1")
     def register(self, name, f, returnType=None):
-        """Registers a Python function (including lambda function) or a user-defined function
-        in SQL statements.
+        """Register a Python function (including lambda function) or a user-defined function
+        as a SQL function.

         :param name: name of the user-defined function in SQL statements.
         :param f: a Python function, or a user-defined function. The user-defined function can
@@ -210,6 +210,10 @@ class UDFRegistration(object):
             be either a :class:`pyspark.sql.types.DataType` object or a DDL-formatted type string.
         :return: a user-defined function.

+        To register a nondeterministic Python function, users need to first build
+        a nondeterministic user-defined function for the Python function and then register it
+        as a SQL function.
+
         `returnType` can be optionally specified when `f` is a Python function but not
         when `f` is a user-defined function. Please see below.

@@ -297,7 +301,7 @@ class UDFRegistration(object):
     @ignore_unicode_prefix
     @since(2.3)
     def registerJavaFunction(self, name, javaClassName, returnType=None):
-        """Register a Java user-defined function so it can be used in SQL statements.
+        """Register a Java user-defined function as a SQL function.

         In addition to a name and the function itself, the return type can be optionally specified.
         When the return type is not specified we would infer it via reflection.
@@ -334,7 +338,7 @@ class UDFRegistration(object):
     @ignore_unicode_prefix
     @since(2.3)
     def registerJavaUDAF(self, name, javaClassName):
-        """Register a Java user-defined aggregate function so it can be used in SQL statements.
+        """Register a Java user-defined aggregate function as a SQL function.

         :param name: name of the user-defined aggregate function
         :param javaClassName: fully qualified name of java class
spark git commit: [SPARK-23122][PYSPARK][FOLLOW-UP] Update the docs for UDF Registration
Repository: spark
Updated Branches:
  refs/heads/master 60175e959 -> 73281161f

[SPARK-23122][PYSPARK][FOLLOW-UP] Update the docs for UDF Registration

## What changes were proposed in this pull request?

This PR is to update the docs for UDF registration.

## How was this patch tested?

N/A

Author: gatorsmile

Closes #20348 from gatorsmile/testUpdateDoc.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/73281161
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/73281161
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/73281161

Branch: refs/heads/master
Commit: 73281161fc7fddd645c712986ec376ac2b1bd213
Parents: 60175e959
Author: gatorsmile
Authored: Mon Jan 22 04:27:59 2018 -0800
Committer: gatorsmile
Committed: Mon Jan 22 04:27:59 2018 -0800

--
 python/pyspark/sql/udf.py | 12
 1 file changed, 8 insertions(+), 4 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/73281161/python/pyspark/sql/udf.py
--
diff --git a/python/pyspark/sql/udf.py b/python/pyspark/sql/udf.py
index c77f19f8..134badb 100644
--- a/python/pyspark/sql/udf.py
+++ b/python/pyspark/sql/udf.py
@@ -199,8 +199,8 @@ class UDFRegistration(object):
     @ignore_unicode_prefix
     @since("1.3.1")
     def register(self, name, f, returnType=None):
-        """Registers a Python function (including lambda function) or a user-defined function
-        in SQL statements.
+        """Register a Python function (including lambda function) or a user-defined function
+        as a SQL function.

         :param name: name of the user-defined function in SQL statements.
         :param f: a Python function, or a user-defined function. The user-defined function can
@@ -210,6 +210,10 @@ class UDFRegistration(object):
             be either a :class:`pyspark.sql.types.DataType` object or a DDL-formatted type string.
         :return: a user-defined function.

+        To register a nondeterministic Python function, users need to first build
+        a nondeterministic user-defined function for the Python function and then register it
+        as a SQL function.
+
         `returnType` can be optionally specified when `f` is a Python function but not
         when `f` is a user-defined function. Please see below.

@@ -297,7 +301,7 @@ class UDFRegistration(object):
     @ignore_unicode_prefix
     @since(2.3)
     def registerJavaFunction(self, name, javaClassName, returnType=None):
-        """Register a Java user-defined function so it can be used in SQL statements.
+        """Register a Java user-defined function as a SQL function.

         In addition to a name and the function itself, the return type can be optionally specified.
         When the return type is not specified we would infer it via reflection.
@@ -334,7 +338,7 @@ class UDFRegistration(object):
     @ignore_unicode_prefix
     @since(2.3)
     def registerJavaUDAF(self, name, javaClassName):
-        """Register a Java user-defined aggregate function so it can be used in SQL statements.
+        """Register a Java user-defined aggregate function as a SQL function.

         :param name: name of the user-defined aggregate function
         :param javaClassName: fully qualified name of java class
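As a concrete example of the clarified wording, a nondeterministic user-defined function is built first and then registered as a SQL function (this sketch assumes an active `SparkSession` named `spark`):

```python
import random
from pyspark.sql.functions import udf

# Mark the Python function nondeterministic first, then register it by name.
random_udf = udf(lambda: random.randint(0, 100), "integer").asNondeterministic()
spark.udf.register("random_udf", random_udf)
spark.sql("SELECT random_udf()").show()
```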
spark git commit: [MINOR][DOC] Fix the path to the examples jar
Repository: spark
Updated Branches:
  refs/heads/branch-2.3 57c320a0d -> cf078a205

[MINOR][DOC] Fix the path to the examples jar

## What changes were proposed in this pull request?

The example jar file is now in the ./examples/jars directory of the Spark distribution.

Author: Arseniy Tashoyan

Closes #20349 from tashoyan/patch-1.

(cherry picked from commit 60175e959f275d2961798fbc5a9150dac9de51ff)
Signed-off-by: jerryshao

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/cf078a20
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/cf078a20
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/cf078a20

Branch: refs/heads/branch-2.3
Commit: cf078a205a14d8709e2c4a9d9f23f6efa20b4fe7
Parents: 57c320a
Author: Arseniy Tashoyan
Authored: Mon Jan 22 20:17:05 2018 +0800
Committer: jerryshao
Committed: Mon Jan 22 20:20:45 2018 +0800

--
 docs/running-on-yarn.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/cf078a20/docs/running-on-yarn.md
--
diff --git a/docs/running-on-yarn.md b/docs/running-on-yarn.md
index e4f5a0c..c010af3 100644
--- a/docs/running-on-yarn.md
+++ b/docs/running-on-yarn.md
@@ -35,7 +35,7 @@ For example:
         --executor-memory 2g \
         --executor-cores 1 \
         --queue thequeue \
-        lib/spark-examples*.jar \
+        examples/jars/spark-examples*.jar \
         10

 The above starts a YARN client program which starts the default Application Master. Then SparkPi will be run as a child thread of Application Master. The client will periodically poll the Application Master for status updates and display them in the console. The client will exit once your application has finished running. Refer to the "Debugging your Application" section below for how to see driver and executor logs.
spark git commit: [MINOR][DOC] Fix the path to the examples jar
Repository: spark
Updated Branches:
  refs/heads/master ec2289761 -> 60175e959

[MINOR][DOC] Fix the path to the examples jar

## What changes were proposed in this pull request?

The example jar file is now in the ./examples/jars directory of the Spark distribution.

Author: Arseniy Tashoyan

Closes #20349 from tashoyan/patch-1.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/60175e95
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/60175e95
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/60175e95

Branch: refs/heads/master
Commit: 60175e959f275d2961798fbc5a9150dac9de51ff
Parents: ec22897
Author: Arseniy Tashoyan
Authored: Mon Jan 22 20:17:05 2018 +0800
Committer: jerryshao
Committed: Mon Jan 22 20:17:05 2018 +0800

--
 docs/running-on-yarn.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/60175e95/docs/running-on-yarn.md
--
diff --git a/docs/running-on-yarn.md b/docs/running-on-yarn.md
index e4f5a0c..c010af3 100644
--- a/docs/running-on-yarn.md
+++ b/docs/running-on-yarn.md
@@ -35,7 +35,7 @@ For example:
        --executor-memory 2g \
        --executor-cores 1 \
        --queue thequeue \
-       lib/spark-examples*.jar \
+       examples/jars/spark-examples*.jar \
        10

 The above starts a YARN client program which starts the default Application Master. Then SparkPi will be run as a child thread of Application Master. The client will periodically poll the Application Master for status updates and display them in the console. The client will exit once your application has finished running. Refer to the "Debugging your Application" section below for how to see driver and executor logs.
svn commit: r24352 - in /dev/spark/2.3.1-SNAPSHOT-2018_01_22_02_01-57c320a-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell
Date: Mon Jan 22 10:15:16 2018
New Revision: 24352

Log:
Apache Spark 2.3.1-SNAPSHOT-2018_01_22_02_01-57c320a docs

[This commit notification would consist of 1441 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.]
svn commit: r24350 - in /dev/spark/2.4.0-SNAPSHOT-2018_01_22_00_01-ec22897-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell
Date: Mon Jan 22 08:16:21 2018
New Revision: 24350

Log:
Apache Spark 2.4.0-SNAPSHOT-2018_01_22_00_01-ec22897 docs

[This commit notification would consist of 1441 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.]