spark git commit: [SPARK-14535][SQL] Remove buildInternalScan from FileFormat

2016-04-11 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 52a801124 -> 678b96e77 [SPARK-14535][SQL] Remove buildInternalScan from FileFormat ## What changes were proposed in this pull request? Now `HadoopFsRelation` with all kinds of file formats can be handled in `FileSourceStrategy`, we can re

spark git commit: [SPARK-14554][SQL] disable whole stage codegen if there are too many input columns

2016-04-11 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 2d81ba542 -> 52a801124 [SPARK-14554][SQL] disable whole stage codegen if there are too many input columns ## What changes were proposed in this pull request? In https://github.com/apache/spark/pull/12047/files#diff-94a1f59bcc9b6758c4ca87

spark git commit: [SPARK-14362][SPARK-14406][SQL][FOLLOW-UP] DDL Native Support: Drop View and Drop Table

2016-04-11 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 83fb96403 -> 2d81ba542 [SPARK-14362][SPARK-14406][SQL][FOLLOW-UP] DDL Native Support: Drop View and Drop Table What changes were proposed in this pull request? In this PR, we are trying to address the comment in the original PR: http

spark git commit: [SPARK-14132][SPARK-14133][SQL] Alter table partition DDLs

2016-04-11 Thread yhuai
Repository: spark Updated Branches: refs/heads/master e9e1adc03 -> 83fb96403 [SPARK-14132][SPARK-14133][SQL] Alter table partition DDLs ## What changes were proposed in this pull request? This implements a few alter table partition commands using the `SessionCatalog`. In particular: ``` ALTE

spark git commit: [MINOR][ML] Fixed MLlib build warnings

2016-04-11 Thread srowen
Repository: spark Updated Branches: refs/heads/master 26d7af911 -> e9e1adc03 [MINOR][ML] Fixed MLlib build warnings ## What changes were proposed in this pull request? Fixes to eliminate warnings during package and doc builds. ## How was this patch tested? Existing unit tests Author: Josep

spark git commit: [SPARK-14242][CORE][NETWORK] avoid copy in compositeBuffer for frame decoder

2016-04-11 Thread davies
Repository: spark Updated Branches: refs/heads/branch-1.6 05dbc2846 -> 663a492f0 [SPARK-14242][CORE][NETWORK] avoid copy in compositeBuffer for frame decoder ## What changes were proposed in this pull request? In this patch, we set the initial `maxNumComponents` to `Integer.MAX_VALUE` instead

spark git commit: [SPARK-14520][SQL] Use correct return type in VectorizedParquetInputFormat

2016-04-11 Thread rxin
Repository: spark Updated Branches: refs/heads/master 6f27027d9 -> 26d7af911 [SPARK-14520][SQL] Use correct return type in VectorizedParquetInputFormat ## What changes were proposed in this pull request? JIRA: https://issues.apache.org/jira/browse/SPARK-14520 `VectorizedParquetInputFormat` in

spark git commit: [SPARK-14475] Propagate user-defined context from driver to executors

2016-04-11 Thread rxin
Repository: spark Updated Branches: refs/heads/master 94de63053 -> 6f27027d9 [SPARK-14475] Propagate user-defined context from driver to executors ## What changes were proposed in this pull request? This adds a new API call `TaskContext.getLocalProperty` for getting properties set in the dri

spark git commit: [SPARK-10521][SQL] Utilize Docker for test DB2 JDBC Dialect support

2016-04-11 Thread joshrosen
Repository: spark Updated Branches: refs/heads/master 3f0f40800 -> 94de63053 [SPARK-10521][SQL] Utilize Docker for test DB2 JDBC Dialect support Add integration tests based on docker to test DB2 JDBC dialect support Author: Luciano Resende Closes #9893 from lresende/SPARK-10521. Project:

spark git commit: [SPARK-14298][ML][MLLIB] LDA should support disable checkpoint

2016-04-11 Thread jkbradley
Repository: spark Updated Branches: refs/heads/branch-1.5 1e61ff4ca -> cb7a90ad5 [SPARK-14298][ML][MLLIB] LDA should support disable checkpoint ## What changes were proposed in this pull request? In the doc of [```checkpointInterval```](https://github.com/apache/spark/blob/master/mllib/src/ma

spark git commit: [SPARK-14298][ML][MLLIB] LDA should support disable checkpoint

2016-04-11 Thread jkbradley
Repository: spark Updated Branches: refs/heads/branch-1.6 f4110cd3b -> 05dbc2846 [SPARK-14298][ML][MLLIB] LDA should support disable checkpoint ## What changes were proposed in this pull request? In the doc of [```checkpointInterval```](https://github.com/apache/spark/blob/master/mllib/src/ma

spark git commit: [BUILD][HOTFIX] Download Maven from regular mirror network rather than archive.apache.org

2016-04-11 Thread joshrosen
Repository: spark Updated Branches: refs/heads/branch-1.6 c12db0d33 -> f4110cd3b [BUILD][HOTFIX] Download Maven from regular mirror network rather than archive.apache.org [archive.apache.org](https://archive.apache.org/) is undergoing maintenance, breaking our `build/mvn` script: > We are i

spark git commit: [SPARK-14298][ML][MLLIB] Add unit test for EM LDA disable checkpointing

2016-04-11 Thread jkbradley
Repository: spark Updated Branches: refs/heads/master 89a41c5b7 -> 3f0f40800 [SPARK-14298][ML][MLLIB] Add unit test for EM LDA disable checkpointing ## What changes were proposed in this pull request? This is a follow-up to #12089; it adds a unit test for EM LDA that tests disabling checkpointing when

spark git commit: [SPARK-13600][MLLIB] Use approxQuantile from DataFrame stats in QuantileDiscretizer

2016-04-11 Thread meng
Repository: spark Updated Branches: refs/heads/master 2dacc81ec -> 89a41c5b7 [SPARK-13600][MLLIB] Use approxQuantile from DataFrame stats in QuantileDiscretizer ## What changes were proposed in this pull request? QuantileDiscretizer can return an unexpected number of buckets in certain cases

spark git commit: [SPARK-14494][SQL] Fix the race conditions in MemoryStream and MemorySink

2016-04-11 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 5de26194a -> 2dacc81ec [SPARK-14494][SQL] Fix the race conditions in MemoryStream and MemorySink ## What changes were proposed in this pull request? Make sure accesses to mutable variables in MemoryStream and MemorySink are protected by `sy

spark git commit: [SPARK-14454] [1.6] Better exception handling while marking tasks as failed

2016-04-11 Thread davies
Repository: spark Updated Branches: refs/heads/branch-1.6 baf29854e -> c12db0d33 [SPARK-14454] [1.6] Better exception handling while marking tasks as failed Backports https://github.com/apache/spark/pull/12234 to 1.6. Original description below: ## What changes were proposed in this pull req

spark git commit: [SPARK-14290] [SPARK-13352] [CORE] [BACKPORT-1.6] avoid significant memory copy in Netty's tran…

2016-04-11 Thread davies
Repository: spark Updated Branches: refs/heads/branch-1.6 7a02c446f -> baf29854e [SPARK-14290] [SPARK-13352] [CORE] [BACKPORT-1.6] avoid significant memory copy in Netty's tran… ## What changes were proposed in this pull request? When Netty transfers data that is not `FileRegion`, data will

spark git commit: [SPARK-14502] [SQL] Add optimization for Binary Comparison Simplification

2016-04-11 Thread davies
Repository: spark Updated Branches: refs/heads/master 652c47030 -> 5de26194a [SPARK-14502] [SQL] Add optimization for Binary Comparison Simplification ## What changes were proposed in this pull request? We can simplify binary comparisons with semantically-equal operands: 1. Replace '<=>' w

spark git commit: [SPARK-14528] [SQL] Fix same result of Union

2016-04-11 Thread davies
Repository: spark Updated Branches: refs/heads/master efaf7d182 -> 652c47030 [SPARK-14528] [SQL] Fix same result of Union ## What changes were proposed in this pull request? This PR fixes resultResult() for Union. ## How was this patch tested? Added regression test. Author: Davies Liu Clos

spark git commit: [SPARK-14462][ML][MLLIB] Add the mllib-local build to maven pom

2016-04-11 Thread meng
Repository: spark Updated Branches: refs/heads/master 643b4e225 -> efaf7d182 [SPARK-14462][ML][MLLIB] Add the mllib-local build to maven pom ## What changes were proposed in this pull request? In order to separate the linear algebra, and vector matrix classes into a standalone jar, we need t

spark git commit: [SPARK-14510][MLLIB] Add args-checking for LDA and StreamingKMeans

2016-04-11 Thread meng
Repository: spark Updated Branches: refs/heads/master 1c751fcf4 -> 643b4e225 [SPARK-14510][MLLIB] Add args-checking for LDA and StreamingKMeans ## What changes were proposed in this pull request? Add argument checking for LDA and StreamingKMeans ## How was this patch tested? Manual tests Author:

[2/2] spark git commit: [SPARK-14500] [ML] Accept Dataset[_] instead of DataFrame in MLlib APIs

2016-04-11 Thread meng
[SPARK-14500] [ML] Accept Dataset[_] instead of DataFrame in MLlib APIs ## What changes were proposed in this pull request? This PR updates MLlib APIs to accept `Dataset[_]` as input where `DataFrame` was the input type. This PR doesn't change the output type. In Java, `Dataset[_]` maps to `Dat

[1/2] spark git commit: [SPARK-14500] [ML] Accept Dataset[_] instead of DataFrame in MLlib APIs

2016-04-11 Thread meng
Repository: spark Updated Branches: refs/heads/master e82d95bf6 -> 1c751fcf4 http://git-wip-us.apache.org/repos/asf/spark/blob/1c751fcf/mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala -- diff --git a/mllib/src

spark git commit: [SPARK-14372][SQL] Dataset.randomSplit() needs a Java version

2016-04-11 Thread lian
Repository: spark Updated Branches: refs/heads/master 1a0cca1fc -> e82d95bf6 [SPARK-14372][SQL] Dataset.randomSplit() needs a Java version ## What changes were proposed in this pull request? 1. Added method randomSplitAsList() in Dataset for Java for https://issues.apache.org/jira/browse/SPARK

spark git commit: [MINOR][DOCS] Fix wrong data types in JSON Datasets example.

2016-04-11 Thread srowen
Repository: spark Updated Branches: refs/heads/master 9f838bd24 -> 1a0cca1fc [MINOR][DOCS] Fix wrong data types in JSON Datasets example. ## What changes were proposed in this pull request? This PR fixes the `age` data types from `integer` to `long` in `SQL Programming Guide: JSON Datasets`.