spark git commit: [SPARK-5875][SQL]logical.Project should not be resolved if it contains aggregates or generators

2015-02-17 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master a51fc7ef9 -> d5f12bfe8 [SPARK-5875][SQL]logical.Project should not be resolved if it contains aggregates or generators https://issues.apache.org/jira/browse/SPARK-5875 has a case to reproduce the bug and explain the root cause. Author:

spark git commit: [SPARK-5875][SQL]logical.Project should not be resolved if it contains aggregates or generators

2015-02-17 Thread marmbrus
Repository: spark Updated Branches: refs/heads/branch-1.3 7320605ad -> e8284b29d [SPARK-5875][SQL]logical.Project should not be resolved if it contains aggregates or generators https://issues.apache.org/jira/browse/SPARK-5875 has a case to reproduce the bug and explain the root cause.

spark git commit: [SPARK-4454] Properly synchronize accesses to DAGScheduler cacheLocs map

2015-02-17 Thread pwendell
Repository: spark Updated Branches: refs/heads/master ae6cfb3ac -> d46d6246d [SPARK-4454] Properly synchronize accesses to DAGScheduler cacheLocs map This patch addresses a race condition in DAGScheduler by properly synchronizing accesses to its `cacheLocs` map. This map is accessed by the
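The race described above is the classic unsynchronized-shared-map bug. Spark's actual fix lives in Scala inside DAGScheduler; the pattern it applies can be sketched in stdlib Python (class and method names here are illustrative, not Spark's API):

```python
import threading

class SchedulerState:
    """Toy scheduler state: every access to the shared map goes through one lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._cache_locs = {}  # rdd_id -> list of cached block locations

    def get_cache_locs(self, rdd_id):
        with self._lock:  # reads are synchronized too, not just writes
            return list(self._cache_locs.get(rdd_id, []))

    def update_cache_locs(self, rdd_id, locs):
        with self._lock:
            self._cache_locs[rdd_id] = list(locs)

    def clear_cache_locs(self):
        with self._lock:
            self._cache_locs.clear()
```

Returning a copy from the synchronized getter also prevents callers from mutating the map's values outside the lock.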

spark git commit: [SPARK-4454] Properly synchronize accesses to DAGScheduler cacheLocs map

2015-02-17 Thread pwendell
Repository: spark Updated Branches: refs/heads/branch-1.3 cb905841b -> 07a401a7b [SPARK-4454] Properly synchronize accesses to DAGScheduler cacheLocs map This patch addresses a race condition in DAGScheduler by properly synchronizing accesses to its `cacheLocs` map. This map is accessed by

spark git commit: [SPARK-5723][SQL]Change the default file format to Parquet for CTAS statements.

2015-02-17 Thread marmbrus
Repository: spark Updated Branches: refs/heads/branch-1.3 2ab0ba04f -> 6e82c46bf [SPARK-5723][SQL]Change the default file format to Parquet for CTAS statements. JIRA: https://issues.apache.org/jira/browse/SPARK-5723 Author: Yin Huai yh...@databricks.com This patch had conflicts when merged,

spark git commit: [SPARK-4454] Revert getOrElse() cleanup in DAGScheduler.getCacheLocs()

2015-02-17 Thread joshrosen
Repository: spark Updated Branches: refs/heads/branch-1.3 07a401a7b -> 7e5e4d82b [SPARK-4454] Revert getOrElse() cleanup in DAGScheduler.getCacheLocs() This method is performance-sensitive and this change wasn't necessary. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit:

spark git commit: [SPARK-4454] Revert getOrElse() cleanup in DAGScheduler.getCacheLocs()

2015-02-17 Thread joshrosen
Repository: spark Updated Branches: refs/heads/master d46d6246d -> a51fc7ef9 [SPARK-4454] Revert getOrElse() cleanup in DAGScheduler.getCacheLocs() This method is performance-sensitive and this change wasn't necessary. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit:

spark git commit: [SPARK-5785] [PySpark] narrow dependency for cogroup/join in PySpark

2015-02-17 Thread joshrosen
Repository: spark Updated Branches: refs/heads/branch-1.3 07d8ef9e7 -> 81202350a [SPARK-5785] [PySpark] narrow dependency for cogroup/join in PySpark Currently, PySpark does not support narrow dependency during cogroup/join when the two RDDs have the same partitioner; an unnecessary shuffle
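The optimization this patch enables can be illustrated with a toy, pure-Python model: when both datasets were hash-partitioned by the same partitioner into the same number of partitions, matching keys are guaranteed to sit at the same partition index, so a join can proceed partition-by-partition with no shuffle. (`partition_by` and `narrow_join` are hypothetical stand-ins for `RDD.partitionBy` and the join machinery, not PySpark's API.)

```python
def partition_by(pairs, num_partitions):
    """Hash-partition (key, value) pairs into num_partitions buckets."""
    parts = [[] for _ in range(num_partitions)]
    for k, v in pairs:
        parts[hash(k) % num_partitions].append((k, v))
    return parts

def narrow_join(parts_a, parts_b):
    """Join two co-partitioned datasets partition-by-partition.

    Because both sides used the same partitioner, a key can only match
    within the same partition index -- no data movement is needed.
    """
    out = []
    for pa, pb in zip(parts_a, parts_b):
        right = {}
        for k, v in pb:
            right.setdefault(k, []).append(v)
        for k, v in pa:
            for w in right.get(k, []):
                out.append((k, (v, w)))
    return out

a = partition_by([("x", 1), ("y", 2)], 4)
b = partition_by([("x", 10), ("z", 30)], 4)
print(sorted(narrow_join(a, b)))  # only "x" appears on both sides
```

If the two sides had used different partitioners (or partition counts), this per-index join would silently miss matches, which is why the real implementation only takes the narrow path when the partitioners are equal.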

spark git commit: [SPARK-5852][SQL]Fail to convert a newly created empty metastore parquet table to a data source parquet table.

2015-02-17 Thread marmbrus
Repository: spark Updated Branches: refs/heads/branch-1.3 0dba382ee -> 07d8ef9e7 [SPARK-5852][SQL]Fail to convert a newly created empty metastore parquet table to a data source parquet table. The problem is that after we create an empty hive metastore parquet table (e.g. `CREATE TABLE test

spark git commit: [SPARK-5852][SQL]Fail to convert a newly created empty metastore parquet table to a data source parquet table.

2015-02-17 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 4d4cc760f -> 117121a4e [SPARK-5852][SQL]Fail to convert a newly created empty metastore parquet table to a data source parquet table. The problem is that after we create an empty hive metastore parquet table (e.g. `CREATE TABLE test (a

spark git commit: [SPARK-5872] [SQL] create a sqlCtx in pyspark shell

2015-02-17 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 3df85dccb -> 4d4cc760f [SPARK-5872] [SQL] create a sqlCtx in pyspark shell The sqlCtx will be HiveContext if hive is built in assembly jar, or SQLContext if not. It also skips the Hive tests in pyspark.sql.tests if no hive is available.

spark git commit: [SPARK-5872] [SQL] create a sqlCtx in pyspark shell

2015-02-17 Thread marmbrus
Repository: spark Updated Branches: refs/heads/branch-1.3 cb061603c -> 0dba382ee [SPARK-5872] [SQL] create a sqlCtx in pyspark shell The sqlCtx will be HiveContext if hive is built in assembly jar, or SQLContext if not. It also skips the Hive tests in pyspark.sql.tests if no hive is

[2/2] spark git commit: Revert "Preparing Spark release v1.3.0-snapshot1"

2015-02-17 Thread pwendell
Revert "Preparing Spark release v1.3.0-snapshot1" This reverts commit d97bfc6f28ec4b7acfb36410c7c167d8d3c145ec. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/7320605a Tree:

[1/2] spark git commit: Revert "Preparing development version 1.3.1-SNAPSHOT"

2015-02-17 Thread pwendell
Repository: spark Updated Branches: refs/heads/branch-1.3 7e5e4d82b -> 7320605ad Revert "Preparing development version 1.3.1-SNAPSHOT" This reverts commit e57c81b8c1a6581c2588973eaf30d3c7ae90ed0c. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit:

[1/2] spark git commit: Preparing development version 1.3.1-SNAPSHOT

2015-02-17 Thread pwendell
Repository: spark Updated Branches: refs/heads/branch-1.3 e8284b29d -> 2ab0ba04f Preparing development version 1.3.1-SNAPSHOT Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/2ab0ba04 Tree:

spark git commit: [SPARK-5785] [PySpark] narrow dependency for cogroup/join in PySpark

2015-02-17 Thread joshrosen
Repository: spark Updated Branches: refs/heads/master 117121a4e -> c3d2b90bd [SPARK-5785] [PySpark] narrow dependency for cogroup/join in PySpark Currently, PySpark does not support narrow dependency during cogroup/join when the two RDDs have the same partitioner; an unnecessary shuffle stage

svn commit: r1660554 - /spark/news/_posts/2015-02-09-spark-1-2-1-released.md

2015-02-17 Thread pwendell
Author: pwendell Date: Wed Feb 18 01:07:50 2015 New Revision: 1660554 URL: http://svn.apache.org/r1660554 Log: Adding missing news file for Spark 1.2.1 release Added: spark/news/_posts/2015-02-09-spark-1-2-1-released.md Added: spark/news/_posts/2015-02-09-spark-1-2-1-released.md URL:

spark git commit: [SPARK-5731][Streaming][Test] Fix incorrect test in DirectKafkaStreamSuite

2015-02-17 Thread tdas
Repository: spark Updated Branches: refs/heads/master e50934f11 -> 3912d3324 [SPARK-5731][Streaming][Test] Fix incorrect test in DirectKafkaStreamSuite The test was incorrect. Instead of counting the number of records, it counted the number of partitions of RDD generated by DStream. Which is
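The bug class behind this fix is easy to reproduce outside Spark: asserting on the number of partitions instead of the number of records they contain. A minimal sketch, modeling a DStream batch as a plain list of partitions:

```python
# A "batch" here is just a list of partitions, each a list of records.
batch = [[1, 2, 3], [4, 5], []]  # 3 partitions holding 5 records total

num_partitions = len(batch)               # what the broken test counted
num_records = sum(len(p) for p in batch)  # what the test should count

assert num_partitions == 3
assert num_records == 5
```

The two numbers only coincide by accident (e.g. one record per partition), which is exactly how such a test can pass for a while and then break.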

spark git commit: [SPARK-5731][Streaming][Test] Fix incorrect test in DirectKafkaStreamSuite

2015-02-17 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.3 6e82c46bf -> f8f9a64eb [SPARK-5731][Streaming][Test] Fix incorrect test in DirectKafkaStreamSuite The test was incorrect. Instead of counting the number of records, it counted the number of partitions of RDD generated by DStream. Which

[2/2] spark git commit: [Minor] [SQL] Cleans up DataFrame variable names and toDF() calls

2015-02-17 Thread rxin
[Minor] [SQL] Cleans up DataFrame variable names and toDF() calls Although we've migrated to the DataFrame API, lots of code still uses `rdd` or `srdd` as local variable names. This PR tries to address these naming inconsistencies and some other minor DataFrame-related style issues.

[1/2] spark git commit: [Minor] [SQL] Cleans up DataFrame variable names and toDF() calls

2015-02-17 Thread rxin
Repository: spark Updated Branches: refs/heads/master 3912d3324 -> 61ab08549 http://git-wip-us.apache.org/repos/asf/spark/blob/61ab0854/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveUdfSuite.scala

[2/2] spark git commit: [Minor] [SQL] Cleans up DataFrame variable names and toDF() calls

2015-02-17 Thread rxin
[Minor] [SQL] Cleans up DataFrame variable names and toDF() calls Although we've migrated to the DataFrame API, lots of code still uses `rdd` or `srdd` as local variable names. This PR tries to address these naming inconsistencies and some other minor DataFrame-related style issues.

[1/2] spark git commit: [Minor] [SQL] Cleans up DataFrame variable names and toDF() calls

2015-02-17 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.3 f8f9a64eb -> 2bd33ce62 http://git-wip-us.apache.org/repos/asf/spark/blob/2bd33ce6/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveUdfSuite.scala

spark git commit: Revert "[SPARK-5363] [PySpark] check ending mark in non-block way"

2015-02-17 Thread joshrosen
Repository: spark Updated Branches: refs/heads/branch-1.3 b8da5c390 -> aeb85cdee Revert "[SPARK-5363] [PySpark] check ending mark in non-block way" This reverts commits ac6fe67e1d8bf01ee565f9cc09ad48d88a275829 and c06e42f2c1e5fcf123b466efd27ee4cb53bbed3f. Project:

spark git commit: Revert "[SPARK-5363] [PySpark] check ending mark in non-block way"

2015-02-17 Thread joshrosen
Repository: spark Updated Branches: refs/heads/branch-1.2 432ceca2a -> 6be36d5a8 Revert "[SPARK-5363] [PySpark] check ending mark in non-block way" This reverts commits ac6fe67e1d8bf01ee565f9cc09ad48d88a275829 and c06e42f2c1e5fcf123b466efd27ee4cb53bbed3f. Project:

spark git commit: SPARK-5841 [CORE] [HOTFIX] Memory leak in DiskBlockManager

2015-02-17 Thread srowen
Repository: spark Updated Branches: refs/heads/branch-1.3 420bc9b3a -> e64afcd84 SPARK-5841 [CORE] [HOTFIX] Memory leak in DiskBlockManager Avoid the call to remove a shutdown hook being made from within a shutdown hook CC pwendell JoshRosen MattWhelan Author: Sean Owen so...@cloudera.com Closes #4648

spark git commit: [Minor] fix typo in SQL document

2015-02-17 Thread marmbrus
Repository: spark Updated Branches: refs/heads/branch-1.3 71cf6e295 -> 5636c4a58 [Minor] fix typo in SQL document Author: CodingCat zhunans...@gmail.com Closes #4656 from CodingCat/fix_typo and squashes the following commits: b41d15c [CodingCat] recover 689fe46 [CodingCat] fix typo (cherry

spark git commit: [SPARK-5862][SQL] Only transformUp the given plan once in HiveMetastoreCatalog

2015-02-17 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 31efb39c1 -> 4611de1ce [SPARK-5862][SQL] Only transformUp the given plan once in HiveMetastoreCatalog Current `ParquetConversions` in `HiveMetastoreCatalog` will transformUp the given plan multiple times if there are many Metastore Parquet

spark git commit: [SPARK-5862][SQL] Only transformUp the given plan once in HiveMetastoreCatalog

2015-02-17 Thread marmbrus
Repository: spark Updated Branches: refs/heads/branch-1.3 5636c4a58 -> 62063b7a3 [SPARK-5862][SQL] Only transformUp the given plan once in HiveMetastoreCatalog Current `ParquetConversions` in `HiveMetastoreCatalog` will transformUp the given plan multiple times if there are many Metastore

spark git commit: [SPARK-5864] [PySpark] support .jar as python package

2015-02-17 Thread pwendell
Repository: spark Updated Branches: refs/heads/master 49c19fdba -> fc4eb9505 [SPARK-5864] [PySpark] support .jar as python package A jar file containing Python sources can be used as a Python package, just like a zip file. spark-submit already puts the jar file on the PYTHONPATH; this
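This works because a jar is an ordinary zip archive, and Python's zipimport machinery imports from any zip archive on `sys.path` regardless of its file extension. A self-contained demonstration (the archive and module names are made up):

```python
import os
import sys
import tempfile
import zipfile

# Build a zip-format archive with a .jar extension containing a Python package.
tmp = tempfile.mkdtemp()
jar_path = os.path.join(tmp, "deps.jar")
with zipfile.ZipFile(jar_path, "w") as jar:
    jar.writestr("mypkg/__init__.py", "")
    jar.writestr("mypkg/util.py", "def answer():\n    return 42\n")

# zipimport checks the zip magic bytes, not the extension, so the jar
# behaves exactly like a zip once it is on sys.path.
sys.path.insert(0, jar_path)
from mypkg.util import answer
print(answer())  # 42
```

This mirrors what spark-submit does for `--py-files`: it only has to place the jar on the PYTHONPATH for its Python contents to become importable.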

spark git commit: SPARK-5856: In Maven build script, launch Zinc with more memory

2015-02-17 Thread pwendell
Repository: spark Updated Branches: refs/heads/master ee6e3eff0 -> 3ce46e94f SPARK-5856: In Maven build script, launch Zinc with more memory I've seen out of memory exceptions when trying to run many parallel builds against the same Zinc server during packaging. We should use the same

spark git commit: [SPARK-5661]function hasShutdownDeleteTachyonDir should use shutdownDeleteTachyonPaths to determine whether it contains the file

2015-02-17 Thread srowen
Repository: spark Updated Branches: refs/heads/master d8f69cf78 -> b271c265b [SPARK-5661]function hasShutdownDeleteTachyonDir should use shutdownDeleteTachyonPaths to determine whether it contains the file hasShutdownDeleteTachyonDir(file: TachyonFile) should use shutdownDeleteTachyonPaths(not

spark git commit: [SPARK-5778] throw if nonexistent metrics config file provided

2015-02-17 Thread pwendell
Repository: spark Updated Branches: refs/heads/branch-1.3 4a581aa3f -> 2bf2b56ef [SPARK-5778] throw if nonexistent metrics config file provided previous behavior was to log an error; this is fine in the general case where no `spark.metrics.conf` parameter was specified, in which case a default
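The behavior change can be sketched with a hypothetical loader: silent defaults when no path was given, but a hard failure when an explicitly supplied path does not exist. This is an illustrative stand-in, not Spark's actual MetricsConfig (the default key below is invented):

```python
import os

def load_metrics_config(path=None):
    """Load a key=value metrics config; fail fast on an explicit bad path."""
    defaults = {"sink.class": "ConsoleSink"}  # hypothetical default
    if path is None:
        # Nothing explicitly configured: falling back to defaults is fine.
        return defaults
    if not os.path.isfile(path):
        # The user asked for this file by name, so a missing file is an
        # error worth surfacing, not just logging.
        raise FileNotFoundError(f"metrics config file not found: {path}")
    config = dict(defaults)
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                config[key.strip()] = value.strip()
    return config
```

The distinction matters because logging-and-continuing silently drops metrics the user explicitly asked for, which is much harder to diagnose than an immediate failure.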

spark git commit: [SPARK-5858][MLLIB] Remove unnecessary first() call in GLM

2015-02-17 Thread meng
Repository: spark Updated Branches: refs/heads/master 3ce46e94f -> c76da36c2 [SPARK-5858][MLLIB] Remove unnecessary first() call in GLM `numFeatures` is only used by multinomial logistic regression. Calling `.first()` for every GLM causes performance regression, especially in Python. Author:

[1/2] spark git commit: [SPARK-5166][SPARK-5247][SPARK-5258][SQL] API Cleanup / Documentation

2015-02-17 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master c76da36c2 -> c74b07fa9 http://git-wip-us.apache.org/repos/asf/spark/blob/c74b07fa/sql/core/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegration.scala

[2/2] spark git commit: [SPARK-5166][SPARK-5247][SPARK-5258][SQL] API Cleanup / Documentation

2015-02-17 Thread marmbrus
[SPARK-5166][SPARK-5247][SPARK-5258][SQL] API Cleanup / Documentation Author: Michael Armbrust mich...@databricks.com Closes #4642 from marmbrus/docs and squashes the following commits: d291c34 [Michael Armbrust] python tests 9be66e3 [Michael Armbrust] comments d56afc2 [Michael Armbrust] fix

[1/2] spark git commit: [SPARK-5166][SPARK-5247][SPARK-5258][SQL] API Cleanup / Documentation

2015-02-17 Thread marmbrus
Repository: spark Updated Branches: refs/heads/branch-1.3 97cb568a2 -> cd3d41587 http://git-wip-us.apache.org/repos/asf/spark/blob/cd3d4158/sql/core/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegration.scala

spark git commit: [SPARK-5858][MLLIB] Remove unnecessary first() call in GLM

2015-02-17 Thread meng
Repository: spark Updated Branches: refs/heads/branch-1.3 824062912 -> 97cb568a2 [SPARK-5858][MLLIB] Remove unnecessary first() call in GLM `numFeatures` is only used by multinomial logistic regression. Calling `.first()` for every GLM causes performance regression, especially in Python.

spark git commit: [SPARK-5859] [PySpark] [SQL] fix DataFrame Python API

2015-02-17 Thread marmbrus
Repository: spark Updated Branches: refs/heads/branch-1.3 cd3d41587 -> 4a581aa3f [SPARK-5859] [PySpark] [SQL] fix DataFrame Python API 1. added explain() 2. add isLocal() 3. do not call show() in __repr__ 4. add foreach() and foreachPartition() 5. add distinct() 6. fix

spark git commit: [SPARK-3381] [MLlib] Eliminate bins for unordered features in DecisionTrees

2015-02-17 Thread jkbradley
Repository: spark Updated Branches: refs/heads/master b271c265b -> 9b746f380 [SPARK-3381] [MLlib] Eliminate bins for unordered features in DecisionTrees For unordered features, it is sufficient to use splits since the threshold of the split corresponds to the threshold of the HighSplit of the

[2/2] spark git commit: [SPARK-5166][SPARK-5247][SPARK-5258][SQL] API Cleanup / Documentation

2015-02-17 Thread marmbrus
[SPARK-5166][SPARK-5247][SPARK-5258][SQL] API Cleanup / Documentation Author: Michael Armbrust mich...@databricks.com Closes #4642 from marmbrus/docs and squashes the following commits: d291c34 [Michael Armbrust] python tests 9be66e3 [Michael Armbrust] comments d56afc2 [Michael Armbrust] fix

spark git commit: [SPARK-5778] throw if nonexistent metrics config file provided

2015-02-17 Thread pwendell
Repository: spark Updated Branches: refs/heads/master d8adefefc -> d8f69cf78 [SPARK-5778] throw if nonexistent metrics config file provided previous behavior was to log an error; this is fine in the general case where no `spark.metrics.conf` parameter was specified, in which case a default

spark git commit: [SPARK-5859] [PySpark] [SQL] fix DataFrame Python API

2015-02-17 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master c74b07fa9 -> d8adefefc [SPARK-5859] [PySpark] [SQL] fix DataFrame Python API 1. added explain() 2. add isLocal() 3. do not call show() in __repr__ 4. add foreach() and foreachPartition() 5. add distinct() 6. fix

spark git commit: [SPARK-5826][Streaming] Fix Configuration not serializable problem

2015-02-17 Thread srowen
Repository: spark Updated Branches: refs/heads/master c06e42f2c -> a65766bf0 [SPARK-5826][Streaming] Fix Configuration not serializable problem Author: jerryshao saisai.s...@intel.com Closes #4612 from jerryshao/SPARK-5826 and squashes the following commits: 7ec71db [jerryshao] Remove
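The usual remedy for a "not serializable" member is to exclude it from serialization and rebuild it after deserialization; in Scala/Java that is a `@transient` field, and the same pattern in Python uses `__getstate__`/`__setstate__`. A sketch with a lock standing in for Hadoop's non-serializable Configuration (the `JobContext` class is hypothetical, not Spark's):

```python
import pickle
import threading

class JobContext:
    """Carries picklable settings plus one non-picklable member (a lock)."""

    def __init__(self, settings):
        self.settings = settings
        self._lock = threading.Lock()  # cannot be pickled

    def __getstate__(self):
        state = self.__dict__.copy()
        del state["_lock"]  # drop the non-serializable field before pickling
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self._lock = threading.Lock()  # rebuild it on the receiving side

ctx = JobContext({"path": "/data"})
clone = pickle.loads(pickle.dumps(ctx))
print(clone.settings["path"])  # /data
```

Without the two hooks, `pickle.dumps(ctx)` would raise a TypeError on the lock, which is the Python analogue of the Task-not-serializable failures this commit fixes.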

spark git commit: [SPARK-5826][Streaming] Fix Configuration not serializable problem

2015-02-17 Thread srowen
Repository: spark Updated Branches: refs/heads/branch-1.3 e9241fa70 -> b8da5c390 [SPARK-5826][Streaming] Fix Configuration not serializable problem Author: jerryshao saisai.s...@intel.com Closes #4612 from jerryshao/SPARK-5826 and squashes the following commits: 7ec71db [jerryshao] Remove

spark git commit: [SPARK-5661]function hasShutdownDeleteTachyonDir should use shutdownDeleteTachyonPaths to determine whether it contains the file

2015-02-17 Thread srowen
Repository: spark Updated Branches: refs/heads/branch-1.3 2bf2b56ef -> 420bc9b3a [SPARK-5661]function hasShutdownDeleteTachyonDir should use shutdownDeleteTachyonPaths to determine whether it contains the file hasShutdownDeleteTachyonDir(file: TachyonFile) should use shutdownDeleteTachyonPaths(not

spark git commit: [Minor][SQL] Use same function to check path parameter in JSONRelation

2015-02-17 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 4611de1ce -> ac506b7c2 [Minor][SQL] Use same function to check path parameter in JSONRelation Author: Liang-Chi Hsieh vii...@gmail.com Closes #4649 from viirya/use_checkpath and squashes the following commits: 0f9a1a1 [Liang-Chi Hsieh]

spark git commit: MAINTENANCE: Automated closing of pull requests.

2015-02-17 Thread pwendell
Repository: spark Updated Branches: refs/heads/master 9b746f380 -> 24f358b9d MAINTENANCE: Automated closing of pull requests. This commit exists to close the following pull requests on Github: Closes #3297 (close requested by 'andrewor14') Closes #3345 (close requested by 'pwendell') Closes

spark git commit: SPARK-5841 [CORE] [HOTFIX] Memory leak in DiskBlockManager

2015-02-17 Thread srowen
Repository: spark Updated Branches: refs/heads/master 24f358b9d -> 49c19fdba SPARK-5841 [CORE] [HOTFIX] Memory leak in DiskBlockManager Avoid the call to remove a shutdown hook being made from within a shutdown hook CC pwendell JoshRosen MattWhelan Author: Sean Owen so...@cloudera.com Closes #4648 from

spark git commit: [SPARK-5864] [PySpark] support .jar as python package

2015-02-17 Thread pwendell
Repository: spark Updated Branches: refs/heads/branch-1.3 e64afcd84 -> 71cf6e295 [SPARK-5864] [PySpark] support .jar as python package A jar file containing Python sources can be used as a Python package, just like a zip file. spark-submit already puts the jar file on the PYTHONPATH; this

spark git commit: [SQL] [Minor] Update the HiveContext Unittest

2015-02-17 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master ac506b7c2 -> 9d281fa56 [SQL] [Minor] Update the HiveContext Unittest In the unit test, the table src(key INT, value STRING) is not the same as HIVE src(key STRING, value STRING)

spark git commit: [SQL] [Minor] Update the HiveContext Unittest

2015-02-17 Thread marmbrus
Repository: spark Updated Branches: refs/heads/branch-1.3 d74d5e86a -> 01356514e [SQL] [Minor] Update the HiveContext Unittest In the unit test, the table src(key INT, value STRING) is not the same as HIVE src(key STRING, value STRING)

spark git commit: [Minor][SQL] Use same function to check path parameter in JSONRelation

2015-02-17 Thread marmbrus
Repository: spark Updated Branches: refs/heads/branch-1.3 62063b7a3 -> d74d5e86a [Minor][SQL] Use same function to check path parameter in JSONRelation Author: Liang-Chi Hsieh vii...@gmail.com Closes #4649 from viirya/use_checkpath and squashes the following commits: 0f9a1a1 [Liang-Chi

spark git commit: [SPARK-5871] output explain in Python

2015-02-17 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 445a755b8 -> 3df85dccb [SPARK-5871] output explain in Python Author: Davies Liu dav...@databricks.com Closes #4658 from davies/explain and squashes the following commits: db87ea2 [Davies Liu] output explain in Python Project:

spark git commit: [SPARK-5868][SQL] Fix python UDFs in HiveContext and checks in SQLContext

2015-02-17 Thread marmbrus
Repository: spark Updated Branches: refs/heads/branch-1.3 01356514e -> e65dc1fd5 [SPARK-5868][SQL] Fix python UDFs in HiveContext and checks in SQLContext Author: Michael Armbrust mich...@databricks.com Closes #4657 from marmbrus/pythonUdfs and squashes the following commits: a7823a8

spark git commit: [SPARK-5868][SQL] Fix python UDFs in HiveContext and checks in SQLContext

2015-02-17 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 9d281fa56 -> de4836f8f [SPARK-5868][SQL] Fix python UDFs in HiveContext and checks in SQLContext Author: Michael Armbrust mich...@databricks.com Closes #4657 from marmbrus/pythonUdfs and squashes the following commits: a7823a8 [Michael

spark git commit: [SPARK-4172] [PySpark] Progress API in Python

2015-02-17 Thread joshrosen
Repository: spark Updated Branches: refs/heads/branch-1.3 e65dc1fd5 -> 35e23ff14 [SPARK-4172] [PySpark] Progress API in Python This patch brings the pull-based progress API into Python, along with an example in Python. Author: Davies Liu dav...@databricks.com Closes #3027 from davies/progress_api

spark git commit: [SPARK-4172] [PySpark] Progress API in Python

2015-02-17 Thread joshrosen
Repository: spark Updated Branches: refs/heads/master de4836f8f -> 445a755b8 [SPARK-4172] [PySpark] Progress API in Python This patch brings the pull-based progress API into Python, along with an example in Python. Author: Davies Liu dav...@databricks.com Closes #3027 from davies/progress_api and