spark git commit: [SPARK-16907][SQL] Fix performance regression for parquet table when vectorized parquet record reader is not being used

2016-08-04 Thread wenchen
Repository: spark Updated Branches: refs/heads/branch-2.0 824d6268d -> dae08fb5a [SPARK-16907][SQL] Fix performance regression for parquet table when vectorized parquet record reader is not being used ## What changes were proposed in this pull request? For non-partitioned parquet table, if

spark git commit: [SPARK-16907][SQL] Fix performance regression for parquet table when vectorized parquet record reader is not being used

2016-08-04 Thread wenchen
Repository: spark Updated Branches: refs/heads/master 53e766cfe -> 1fa644497 [SPARK-16907][SQL] Fix performance regression for parquet table when vectorized parquet record reader is not being used ## What changes were proposed in this pull request? For non-partitioned parquet table, if the

spark git commit: MAINTENANCE. Cleaning up stale PRs.

2016-08-04 Thread vanzin
Repository: spark Updated Branches: refs/heads/master d91c6755a -> 53e766cfe MAINTENANCE. Cleaning up stale PRs. Closing the following PRs due to requests or unresponsive users. Closes #13923 Closes #14462 Closes #13123 Closes #14423 (requested by srowen) Closes #14424 (requested by srowen)

spark git commit: [HOTFIX] Remove unnecessary imports from #12944 that broke build

2016-08-04 Thread joshrosen
Repository: spark Updated Branches: refs/heads/master 9c15d079d -> d91c6755a [HOTFIX] Remove unnecessary imports from #12944 that broke build Author: Josh Rosen Closes #14499 from JoshRosen/hotfix. Project: http://git-wip-us.apache.org/repos/asf/spark/repo

spark git commit: [SPARK-15074][SHUFFLE] Cache shuffle index file to speedup shuffle fetch

2016-08-04 Thread joshrosen
Repository: spark Updated Branches: refs/heads/master 0e2e5d7d0 -> 9c15d079d [SPARK-15074][SHUFFLE] Cache shuffle index file to speedup shuffle fetch ## What changes were proposed in this pull request? Shuffle fetch on large intermediate dataset is slow because the shuffle service

spark git commit: [SPARK-16863][ML] ProbabilisticClassifier.fit check threshoulds' length

2016-08-04 Thread srowen
Repository: spark Updated Branches: refs/heads/branch-2.0 818ddcf98 -> 824d6268d [SPARK-16863][ML] ProbabilisticClassifier.fit check threshoulds' length ## What changes were proposed in this pull request? Add thresholds' length checking for Classifiers which extend ProbabilisticClassifier

spark git commit: [SPARK-16863][ML] ProbabilisticClassifier.fit check threshoulds' length

2016-08-04 Thread srowen
Repository: spark Updated Branches: refs/heads/master 1d781572e -> 0e2e5d7d0 [SPARK-16863][ML] ProbabilisticClassifier.fit check threshoulds' length ## What changes were proposed in this pull request? Add thresholds' length checking for Classifiers which extend ProbabilisticClassifier ##

spark git commit: [SPARK-16877][BUILD] Add rules for preventing to use Java annotations (Deprecated and Override)

2016-08-04 Thread srowen
Repository: spark Updated Branches: refs/heads/branch-2.0 c66338b3a -> 818ddcf98 [SPARK-16877][BUILD] Add rules for preventing to use Java annotations (Deprecated and Override) ## What changes were proposed in this pull request? This PR adds both rules to prevent the use of `Deprecated` and

spark git commit: [SPARK-16877][BUILD] Add rules for preventing to use Java annotations (Deprecated and Override)

2016-08-04 Thread srowen
Repository: spark Updated Branches: refs/heads/master 462784ffa -> 1d781572e [SPARK-16877][BUILD] Add rules for preventing to use Java annotations (Deprecated and Override) ## What changes were proposed in this pull request? This PR adds both rules to prevent the use of `Deprecated` and

spark git commit: [SPARK-16880][ML][MLLIB] make ann training data persisted if needed

2016-08-04 Thread srowen
Repository: spark Updated Branches: refs/heads/master be8ea4b2f -> 462784ffa [SPARK-16880][ML][MLLIB] make ann training data persisted if needed ## What changes were proposed in this pull request? Make sure the ANN layer input training data is persisted, so that it can avoid overhead cost

spark git commit: [SPARK-16880][ML][MLLIB] make ann training data persisted if needed

2016-08-04 Thread srowen
Repository: spark Updated Branches: refs/heads/branch-2.0 ddbff011e -> c66338b3a [SPARK-16880][ML][MLLIB] make ann training data persisted if needed ## What changes were proposed in this pull request? Make sure the ANN layer input training data is persisted, so that it can avoid overhead

spark git commit: [SPARK-16875][SQL] Add args checking for DataSet randomSplit and sample

2016-08-04 Thread srowen
Repository: spark Updated Branches: refs/heads/branch-2.0 182991edd -> ddbff011e [SPARK-16875][SQL] Add args checking for DataSet randomSplit and sample ## What changes were proposed in this pull request? Add the missing args-checking for randomSplit and sample ## How was this patch tested?

spark git commit: [SPARK-16875][SQL] Add args checking for DataSet randomSplit and sample

2016-08-04 Thread srowen
Repository: spark Updated Branches: refs/heads/master ac2a26d09 -> be8ea4b2f [SPARK-16875][SQL] Add args checking for DataSet randomSplit and sample ## What changes were proposed in this pull request? Add the missing args-checking for randomSplit and sample ## How was this patch tested?

spark git commit: [SPARK-16884] Move DataSourceScanExec out of ExistingRDD.scala file

2016-08-04 Thread davies
Repository: spark Updated Branches: refs/heads/master 9d4e6212f -> ac2a26d09 [SPARK-16884] Move DataSourceScanExec out of ExistingRDD.scala file ## What changes were proposed in this pull request? This moves DataSourceScanExec out so it's more discoverable, and now that it doesn't

spark git commit: [SPARK-16802] [SQL] fix overflow in LongToUnsafeRowMap

2016-08-04 Thread davies
Repository: spark Updated Branches: refs/heads/master 9d7a47406 -> 9d4e6212f [SPARK-16802] [SQL] fix overflow in LongToUnsafeRowMap ## What changes were proposed in this pull request? This patch fixes the overflow in LongToUnsafeRowMap when the range of keys is very wide (the key is much much

spark git commit: [SPARK-16802] [SQL] fix overflow in LongToUnsafeRowMap

2016-08-04 Thread davies
Repository: spark Updated Branches: refs/heads/branch-2.0 11854e5a1 -> 182991edd [SPARK-16802] [SQL] fix overflow in LongToUnsafeRowMap ## What changes were proposed in this pull request? This patch fixes the overflow in LongToUnsafeRowMap when the range of keys is very wide (the key is much

spark git commit: [SPARK-16853][SQL] fixes encoder error in DataSet typed select

2016-08-04 Thread wenchen
Repository: spark Updated Branches: refs/heads/master 43f4fd6f9 -> 9d7a47406 [SPARK-16853][SQL] fixes encoder error in DataSet typed select ## What changes were proposed in this pull request? For DataSet typed select: ``` def select[U1: Encoder](c1: TypedColumn[T, U1]): Dataset[U1] ``` If

spark git commit: [SPARK-16867][SQL] createTable and alterTable in ExternalCatalog should not take db

2016-08-04 Thread lian
Repository: spark Updated Branches: refs/heads/master 27e815c31 -> 43f4fd6f9 [SPARK-16867][SQL] createTable and alterTable in ExternalCatalog should not take db ## What changes were proposed in this pull request? These two methods take `CatalogTable` as a parameter, which already has the