spark git commit: [SPARK-15997][DOC][ML] Update user guide for HashingTF, QuantileVectorizer and CountVectorizer

2016-06-24 Thread mlnick
Repository: spark Updated Branches: refs/heads/branch-2.0 201d5e8db -> 76741b570 [SPARK-15997][DOC][ML] Update user guide for HashingTF, QuantileVectorizer and CountVectorizer ## What changes were proposed in this pull request? Made changes to HashingTF, QuantileVectorizer and
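The user-guide update above documents feature extractors such as HashingTF. A toy sketch of the hashing trick that HashingTF is built on (illustrative only; Spark's real implementation uses MurmurHash3 and produces sparse vectors, and `hashing_tf` here is a made-up name, not the Spark API):

```python
def hashing_tf(tokens, num_features=16):
    """Map each token to a bucket via hash() and count term frequencies."""
    counts = [0] * num_features
    for token in tokens:
        # Identical tokens always land in the same bucket; distinct tokens
        # may collide, which is the accepted trade-off of the hashing trick.
        counts[hash(token) % num_features] += 1
    return counts
```

Unlike CountVectorizer, this needs no vocabulary pass over the data, at the cost of irreversible bucket collisions.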

spark git commit: [SPARK-15997][DOC][ML] Update user guide for HashingTF, QuantileVectorizer and CountVectorizer

2016-06-24 Thread mlnick
Repository: spark Updated Branches: refs/heads/master 158af162e -> be88383e1 [SPARK-15997][DOC][ML] Update user guide for HashingTF, QuantileVectorizer and CountVectorizer ## What changes were proposed in this pull request? Made changes to HashingTF, QuantileVectorizer and CountVectorizer

spark git commit: [SPARK-16129][CORE][SQL] Eliminate direct use of commons-lang classes in favor of commons-lang3

2016-06-24 Thread srowen
Repository: spark Updated Branches: refs/heads/master f4fd7432f -> 158af162e [SPARK-16129][CORE][SQL] Eliminate direct use of commons-lang classes in favor of commons-lang3 ## What changes were proposed in this pull request? Replace use of `commons-lang` in favor of `commons-lang3` and

spark git commit: [SPARK-16129][CORE][SQL] Eliminate direct use of commons-lang classes in favor of commons-lang3

2016-06-24 Thread srowen
Repository: spark Updated Branches: refs/heads/branch-2.0 b6420db9e -> 201d5e8db [SPARK-16129][CORE][SQL] Eliminate direct use of commons-lang classes in favor of commons-lang3 ## What changes were proposed in this pull request? Replace use of `commons-lang` in favor of `commons-lang3` and

spark git commit: [SPARK-15963][CORE] Catch `TaskKilledException` correctly in Executor.TaskRunner

2016-06-24 Thread irashid
Repository: spark Updated Branches: refs/heads/master be88383e1 -> a4851ed05 [SPARK-15963][CORE] Catch `TaskKilledException` correctly in Executor.TaskRunner ## The problem Before this change, if either of the following cases happened to a task, the task would be marked as `FAILED` instead
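A hedged sketch of the idea in SPARK-15963: a task runner must catch the kill signal's own exception type so that a deliberately killed task is reported as KILLED rather than FAILED. The names here (`TaskKilledError`, `run_task`) are illustrative stand-ins, not Spark's classes:

```python
class TaskKilledError(Exception):
    """Stand-in for the exception raised when a task is deliberately killed."""
    pass

def run_task(task):
    try:
        return ("FINISHED", task())
    except TaskKilledError:
        return ("KILLED", None)   # deliberate kill: report KILLED, not FAILED
    except Exception as exc:
        return ("FAILED", exc)    # genuine error in the task body
```

The ordering of the `except` clauses matters: the specific kill exception must be caught before the catch-all, or every kill looks like a failure.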

spark git commit: [SPARK-13709][SQL] Initialize deserializer with both table and partition properties when reading partitioned tables

2016-06-24 Thread yhuai
Repository: spark Updated Branches: refs/heads/master cc6778ee0 -> 2d2f607bf [SPARK-13709][SQL] Initialize deserializer with both table and partition properties when reading partitioned tables ## What changes were proposed in this pull request? When reading partitions of a partitioned Hive
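An illustrative sketch (plain Python, not Spark code) of what SPARK-13709 describes: when reading a partition of a partitioned Hive table, the deserializer should see the table-level properties overlaid with the partition-level ones, so partition settings win where both define a key. The function name and keys are hypothetical:

```python
def deserializer_props(table_props, partition_props):
    merged = dict(table_props)        # start from table-level defaults
    merged.update(partition_props)    # partition-level values take precedence
    return merged
```

Initializing with only one of the two property sets is exactly the bug class the patch addresses: a partition stored with a different serde setting than the table would be read incorrectly.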

spark git commit: [SPARK-13709][SQL] Initialize deserializer with both table and partition properties when reading partitioned tables

2016-06-24 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-2.0 3d8d95644 -> 3ccdd6b9c [SPARK-13709][SQL] Initialize deserializer with both table and partition properties when reading partitioned tables ## What changes were proposed in this pull request? When reading partitions of a partitioned

spark git commit: [SPARK-16125][YARN] Fix not test yarn cluster mode correctly in YarnClusterSuite

2016-06-24 Thread srowen
Repository: spark Updated Branches: refs/heads/master 2d2f607bf -> f4fd7432f [SPARK-16125][YARN] Fix not test yarn cluster mode correctly in YarnClusterSuite ## What changes were proposed in this pull request? Since SPARK-13220(Deprecate "yarn-client" and "yarn-cluster"), YarnClusterSuite

spark git commit: [SPARK-16125][YARN] Fix not test yarn cluster mode correctly in YarnClusterSuite

2016-06-24 Thread srowen
Repository: spark Updated Branches: refs/heads/branch-2.0 3ccdd6b9c -> b6420db9e [SPARK-16125][YARN] Fix not test yarn cluster mode correctly in YarnClusterSuite ## What changes were proposed in this pull request? Since SPARK-13220(Deprecate "yarn-client" and "yarn-cluster"),

spark git commit: [SPARK-16077] [PYSPARK] catch the exception from pickle.whichmodule()

2016-06-24 Thread davies
Repository: spark Updated Branches: refs/heads/branch-1.6 4fdac3c27 -> d7223bb9f [SPARK-16077] [PYSPARK] catch the exception from pickle.whichmodule() ## What changes were proposed in this pull request? In the case that we don't know which module an object came from, it will call
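A sketch of the defensive pattern behind SPARK-16077: `pickle.whichmodule()` searches `sys.modules` for the object's defining module, and an ill-behaved module can make that lookup raise, so the caller falls back rather than crashing serialization. The wrapper name and the `"__main__"` fallback are illustrative choices, not necessarily what Spark's cloudpickle does verbatim:

```python
import pickle

def safe_whichmodule(obj, name):
    try:
        return pickle.whichmodule(obj, name)
    except Exception:
        return "__main__"   # fallback when the module can't be determined
```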

spark git commit: [SPARK-16077] [PYSPARK] catch the exception from pickle.whichmodule()

2016-06-24 Thread davies
Repository: spark Updated Branches: refs/heads/branch-2.0 76741b570 -> 4bb8cca44 [SPARK-16077] [PYSPARK] catch the exception from pickle.whichmodule() ## What changes were proposed in this pull request? In the case that we don't know which module an object came from, it will call

spark git commit: [SQL][MINOR] Simplify data source predicate filter translation.

2016-06-24 Thread yhuai
Repository: spark Updated Branches: refs/heads/master d48935400 -> 5f8de2160 [SQL][MINOR] Simplify data source predicate filter translation. ## What changes were proposed in this pull request? This is a small patch to rewrite the predicate filter translation in DataSourceStrategy. The
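A hedged sketch of the translation pattern this minor patch simplifies: recursively map planner predicates to data source filters, returning a "not translatable" marker whenever any sub-expression is unsupported, so a compound predicate is only pushed down if all of its children translate. The tuple encoding below is a stand-in, not Catalyst's classes:

```python
def translate(expr):
    """Translate a predicate tree; return None if it can't be pushed down."""
    kind = expr[0]
    if kind == "eq":                       # ("eq", column, value)
        return expr
    if kind in ("and", "or"):              # ("and"/"or", left, right)
        left, right = translate(expr[1]), translate(expr[2])
        if left is None or right is None:
            return None                    # one untranslatable child poisons the branch
        return (kind, left, right)
    return None                            # unsupported expression kind
```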

spark git commit: [SPARK-16186] [SQL] Support partition batch pruning with `IN` predicate in InMemoryTableScanExec

2016-06-24 Thread davies
Repository: spark Updated Branches: refs/heads/master 4435de1bd -> a65bcbc27 [SPARK-16186] [SQL] Support partition batch pruning with `IN` predicate in InMemoryTableScanExec ## What changes were proposed in this pull request? One of the most frequent usage patterns for Spark SQL is using
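A toy model (not Spark internals) of partition-batch pruning with an `IN` predicate: each cached batch keeps min/max statistics per column, and a batch can be skipped entirely when no value in the `IN` list falls inside its range:

```python
def batch_may_match(batch_min, batch_max, in_values):
    # Conservative check: keep the batch if any sought value could be in it.
    return any(batch_min <= v <= batch_max for v in in_values)

batches = [(0, 9), (10, 19), (20, 29)]   # (min, max) stats per cached batch
wanted = {4, 25}                          # values from the IN predicate
scanned = [b for b in batches if batch_may_match(b[0], b[1], wanted)]
```

The middle batch is pruned without being decompressed, which is the whole win for the `column IN (...)` pattern the commit message calls out as frequent.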

spark git commit: [SPARK-16179][PYSPARK] fix bugs for Python udf in generate

2016-06-24 Thread rxin
Repository: spark Updated Branches: refs/heads/master 5f8de2160 -> 4435de1bd [SPARK-16179][PYSPARK] fix bugs for Python udf in generate ## What changes were proposed in this pull request? This PR fixes the bug when a Python UDF is used in explode (generator): GenerateExec requires that all the

spark git commit: [SPARK-16179][PYSPARK] fix bugs for Python udf in generate

2016-06-24 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-2.0 4bb8cca44 -> df137e3e0 [SPARK-16179][PYSPARK] fix bugs for Python udf in generate ## What changes were proposed in this pull request? This PR fixes the bug when a Python UDF is used in explode (generator): GenerateExec requires that all

spark git commit: [SPARK-16195][SQL] Allow users to specify empty over clause in window expressions through dataset API

2016-06-24 Thread hvanhovell
Repository: spark Updated Branches: refs/heads/branch-2.0 9de095513 -> 9e2384845 [SPARK-16195][SQL] Allow users to specify empty over clause in window expressions through dataset API ## What changes were proposed in this pull request? Allow to specify empty over clause in window expressions

spark git commit: Revert "[SPARK-16186] [SQL] Support partition batch pruning with `IN` predicate in InMemoryTableScanExec"

2016-06-24 Thread davies
Repository: spark Updated Branches: refs/heads/master a65bcbc27 -> 20768dade Revert "[SPARK-16186] [SQL] Support partition batch pruning with `IN` predicate in InMemoryTableScanExec" This reverts commit a65bcbc27dcd9b3053cb13c5d67251c8d48f4397. Project:

spark git commit: [SPARK-16173] [SQL] Can't join describe() of DataFrame in Scala 2.10

2016-06-24 Thread davies
Repository: spark Updated Branches: refs/heads/master 20768dade -> e5d0928e2 [SPARK-16173] [SQL] Can't join describe() of DataFrame in Scala 2.10 ## What changes were proposed in this pull request? This PR fixes `DataFrame.describe()` by forcing materialization to make the `Seq`

spark git commit: [SPARK-16195][SQL] Allow users to specify empty over clause in window expressions through dataset API

2016-06-24 Thread hvanhovell
Repository: spark Updated Branches: refs/heads/master e5d0928e2 -> 9053054c7 [SPARK-16195][SQL] Allow users to specify empty over clause in window expressions through dataset API ## What changes were proposed in this pull request? Allow to specify empty over clause in window expressions
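A plain-Python sketch of what an empty `OVER ()` clause means semantically (illustrative, not the Dataset API): with no partitioning or ordering specified, the window is the entire input, so every row is paired with one aggregate computed over all rows:

```python
def with_total(rows, value_of):
    total = sum(value_of(r) for r in rows)   # single window spanning all input
    return [(r, total) for r in rows]        # each row keeps the global aggregate
```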

spark git commit: [SPARK-16173] [SQL] Can't join describe() of DataFrame in Scala 2.10

2016-06-24 Thread davies
Repository: spark Updated Branches: refs/heads/branch-2.0 df137e3e0 -> 9de095513 [SPARK-16173] [SQL] Can't join describe() of DataFrame in Scala 2.10 ## What changes were proposed in this pull request? This PR fixes `DataFrame.describe()` by forcing materialization to make the `Seq`

spark git commit: [SPARK-16192][SQL] Add type checks in CollectSet

2016-06-24 Thread hvanhovell
Repository: spark Updated Branches: refs/heads/master 9053054c7 -> d2e44d7db [SPARK-16192][SQL] Add type checks in CollectSet ## What changes were proposed in this pull request? `CollectSet` cannot have map-typed data because MapTypeData does not implement `equals`. So, this PR is to add
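An illustrative parallel in plain Python (not Catalyst) to the SPARK-16192 check: map-typed values have no usable equality for set membership, so a collect-set style aggregate should reject them up front with a clear error instead of silently producing wrong results. The function below is a hypothetical stand-in using `dict` for "map-typed":

```python
def collect_set(values):
    if any(isinstance(v, dict) for v in values):
        # Fail fast, mirroring an analysis-time type check.
        raise TypeError("collect_set does not support map-typed data")
    return set(values)
```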

spark git commit: [SPARK-16192][SQL] Add type checks in CollectSet

2016-06-24 Thread hvanhovell
Repository: spark Updated Branches: refs/heads/branch-2.0 9e2384845 -> d079b5de7 [SPARK-16192][SQL] Add type checks in CollectSet ## What changes were proposed in this pull request? `CollectSet` cannot have map-typed data because MapTypeData does not implement `equals`. So, this PR is to add

spark git commit: [SPARK-16186] [SQL] Support partition batch pruning with `IN` predicate in InMemoryTableScanExec

2016-06-24 Thread davies
Repository: spark Updated Branches: refs/heads/master d2e44d7db -> a7d29499d [SPARK-16186] [SQL] Support partition batch pruning with `IN` predicate in InMemoryTableScanExec ## What changes were proposed in this pull request? One of the most frequent usage patterns for Spark SQL is using

spark git commit: [SPARK-16173] [SQL] Can't join describe() of DataFrame in Scala 2.10

2016-06-24 Thread davies
Repository: spark Updated Branches: refs/heads/branch-1.5 6001138fd -> 576265f83 [SPARK-16173] [SQL] Can't join describe() of DataFrame in Scala 2.10 ## What changes were proposed in this pull request? This PR fixes `DataFrame.describe()` by forcing materialization to make the `Seq`

spark git commit: [SPARK-16173] [SQL] Can't join describe() of DataFrame in Scala 2.10

2016-06-24 Thread davies
Repository: spark Updated Branches: refs/heads/branch-1.6 d7223bb9f -> b7acc1b71 [SPARK-16173] [SQL] Can't join describe() of DataFrame in Scala 2.10 ## What changes were proposed in this pull request? This PR fixes `DataFrame.describe()` by forcing materialization to make the `Seq`