spark git commit: [SPARK-7470] [SQL] Spark shell SQLContext crashes without hive

2015-05-07 Thread yhuai
sqlContext import sqlContext.sql ^ ``` yhuai marmbrus Author: Andrew Or and...@databricks.com Closes #5997 from andrewor14/sql-shell-crash and squashes the following commits: 61147e6 [Andrew Or] Also expect NoClassDefFoundError Project: http://git-wip-us.apache.org/repos/asf

spark git commit: [SPARK-7232] [SQL] Add a Substitution batch for spark sql analyzer

2015-05-07 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.4 1a3e9e982 - bb5872f2d [SPARK-7232] [SQL] Add a Substitution batch for spark sql analyzer Added a new batch named `Substitution` before the `Resolution` batch. The motivation is that there are kinds of cases where we want to do some

spark git commit: [SPARK-7232] [SQL] Add a Substitution batch for spark sql analyzer

2015-05-07 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 714db2ef5 - f496bf3c5 [SPARK-7232] [SQL] Add a Substitution batch for spark sql analyzer Added a new batch named `Substitution` before the `Resolution` batch. The motivation is that there are kinds of cases where we want to do some

[2/2] spark git commit: [SPARK-6908] [SQL] Use isolated Hive client

2015-05-07 Thread yhuai
[SPARK-6908] [SQL] Use isolated Hive client This PR switches Spark SQL's Hive support to use the isolated hive client interface introduced by #5851, instead of directly interacting with the client. By using this isolated client we can now allow users to dynamically configure the version of

[1/2] spark git commit: [SPARK-6908] [SQL] Use isolated Hive client

2015-05-07 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.4 2e8a141b5 - 05454fd8a http://git-wip-us.apache.org/repos/asf/spark/blob/05454fd8/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/ClientWrapper.scala -- diff

[1/2] spark git commit: [SPARK-6908] [SQL] Use isolated Hive client

2015-05-07 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 22ab70e06 - cd1d4110c http://git-wip-us.apache.org/repos/asf/spark/blob/cd1d4110/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/ClientWrapper.scala -- diff --git

spark git commit: [SPARK-6986] [SQL] Use Serializer2 in more cases.

2015-05-07 Thread yhuai
to determine whether it is handling a key-value pair, a key, or a value. It is safe to use `SparkSqlSerializer2` in more cases. Author: Yin Huai yh...@databricks.com Closes #5849 from yhuai/serializer2MoreCases and squashes the following commits: 53a5eaa [Yin Huai] Josh's comments. 487f540 [Yin Huai

spark git commit: [SPARK-6986] [SQL] Use Serializer2 in more cases.

2015-05-07 Thread yhuai
to determine whether it is handling a key-value pair, a key, or a value. It is safe to use `SparkSqlSerializer2` in more cases. Author: Yin Huai yh...@databricks.com Closes #5849 from yhuai/serializer2MoreCases and squashes the following commits: 53a5eaa [Yin Huai] Josh's comments. 487f540 [Yin Huai] Use

spark git commit: [SPARK-7470] [SQL] Spark shell SQLContext crashes without hive

2015-05-07 Thread yhuai
sqlContext import sqlContext.sql ^ ``` yhuai marmbrus Author: Andrew Or and...@databricks.com Closes #5997 from andrewor14/sql-shell-crash and squashes the following commits: 61147e6 [Andrew Or] Also expect NoClassDefFoundError (cherry picked from commit

spark git commit: [HOT-FIX] Move HiveWindowFunctionQuerySuite.scala to hive compatibility dir.

2015-05-06 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 845d1d4d0 - 774099670 [HOT-FIX] Move HiveWindowFunctionQuerySuite.scala to hive compatibility dir. Author: Yin Huai yh...@databricks.com Closes #5951 from yhuai/fixBuildMaven and squashes the following commits: fdde183 [Yin Huai] Move

spark git commit: [HOT-FIX] Move HiveWindowFunctionQuerySuite.scala to hive compatibility dir.

2015-05-06 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.4 2163367ea - 14bcb84e8 [HOT-FIX] Move HiveWindowFunctionQuerySuite.scala to hive compatibility dir. Author: Yin Huai yh...@databricks.com Closes #5951 from yhuai/fixBuildMaven and squashes the following commits: fdde183 [Yin Huai

spark git commit: [SPARK-6201] [SQL] promote string and do widen types for IN

2015-05-06 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 150f671c2 - c3eb441f5 [SPARK-6201] [SQL] promote string and do widen types for IN huangjs Actually spark sql will first go through the analysis phase, in which we widen types and promote strings, and then optimization, where constant IN
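The widening-and-promotion pass described in the SPARK-6201 entry above can be sketched standalone. A hedged illustration (plain Python, not Catalyst's actual coercion rules; the type names, widening table, and function names are assumptions for the example):

```python
# Illustrative sketch of analyzer-time coercion for `x IN (...)`: widen all
# operands to one common type before optimization, so the optimizer can fold
# the IN list safely. Not Catalyst's real code.

NUMERIC_WIDTH = {"byte": 1, "short": 2, "int": 3, "long": 4, "float": 5, "double": 6}

def widen(t1, t2):
    """Return a common type for two operand types: promote a string operand
    to the other side's type, and widen numerics to the larger one."""
    if t1 == t2:
        return t1
    if "string" in (t1, t2):
        return t2 if t1 == "string" else t1  # promote the string side
    if t1 in NUMERIC_WIDTH and t2 in NUMERIC_WIDTH:
        return t1 if NUMERIC_WIDTH[t1] >= NUMERIC_WIDTH[t2] else t2
    return None  # no common type: leave the expression unresolved

def in_common_type(value_type, list_types):
    """Fold widen() over the IN-list operand types."""
    common = value_type
    for t in list_types:
        common = widen(common, t)
        if common is None:
            break
    return common

print(in_common_type("int", ["long", "string"]))  # long
```

The point is ordering: this runs during analysis, so by the time the optimizer sees the IN list, every element already has the same type.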

spark git commit: [SPARK-7375] [SQL] Avoid row copying in exchange when sort.serializeMapOutputs takes effect

2015-05-08 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 0a901dd3a - cde548388 [SPARK-7375] [SQL] Avoid row copying in exchange when sort.serializeMapOutputs takes effect This patch refactors the SQL `Exchange` operator's logic for determining whether map outputs need to be copied before being

spark git commit: [SPARK-7375] [SQL] Avoid row copying in exchange when sort.serializeMapOutputs takes effect

2015-05-08 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.4 448ff333f - 21212a27c [SPARK-7375] [SQL] Avoid row copying in exchange when sort.serializeMapOutputs takes effect This patch refactors the SQL `Exchange` operator's logic for determining whether map outputs need to be copied before

spark git commit: [SPARK-7330] [SQL] avoid NPE at jdbc rdd

2015-05-07 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 4f87e9562 - ed9be06a4 [SPARK-7330] [SQL] avoid NPE at jdbc rdd Thanks to nadavoosh for pointing this out in #5590 Author: Daoyuan Wang daoyuan.w...@intel.com Closes #5877 from adrian-wang/jdbcrdd and squashes the following commits: cc11900

spark git commit: [SPARK-7330] [SQL] avoid NPE at jdbc rdd

2015-05-07 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.4 91ce13109 - 84ee348bc [SPARK-7330] [SQL] avoid NPE at jdbc rdd Thanks to nadavoosh for pointing this out in #5590 Author: Daoyuan Wang daoyuan.w...@intel.com Closes #5877 from adrian-wang/jdbcrdd and squashes the following commits: cc11900

spark git commit: [SPARK-7330] [SQL] avoid NPE at jdbc rdd

2015-05-07 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.3 cbf232daa - edcd3643a [SPARK-7330] [SQL] avoid NPE at jdbc rdd Thanks to nadavoosh for pointing this out in #5590 Author: Daoyuan Wang daoyuan.w...@intel.com Closes #5877 from adrian-wang/jdbcrdd and squashes the following commits: cc11900
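The class of bug fixed in the three SPARK-7330 commits above comes from JDBC's null convention: primitive getters return a default value (e.g. 0) for SQL NULL and expect a follow-up `wasNull()` check. A minimal sketch of the guard (illustrative Python with a stand-in result set, not Spark's JdbcRDD code):

```python
# Illustrative null guard, mirroring JDBC semantics: get_long() returns 0
# for SQL NULL, so the reader must consult was_null() before trusting it.

class FakeResultSet:
    """Stand-in for a JDBC ResultSet positioned on one row."""
    def __init__(self, row):
        self._row = row
        self._was_null = False

    def get_long(self, i):
        v = self._row[i]
        self._was_null = v is None
        return 0 if v is None else v  # JDBC returns 0 for NULL longs

    def was_null(self):
        return self._was_null

def read_nullable_long(rs, i):
    """Read column i, mapping SQL NULL to None instead of a bogus 0."""
    v = rs.get_long(i)
    return None if rs.was_null() else v

rs = FakeResultSet((42, None))
print(read_nullable_long(rs, 0), read_nullable_long(rs, 1))  # 42 None
```

Without the `was_null()` check, a NULL column silently becomes 0, and code that then dereferences a missing object-typed value throws the NPE the commit title refers to.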

spark git commit: [SQL] [MINOR] use catalyst type converter in ScalaUdf

2015-05-17 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.4 e0632ffaf - be66d1924 [SQL] [MINOR] use catalyst type converter in ScalaUdf It's a follow-up of https://github.com/apache/spark/pull/5154, we can speed up scala udf evaluation by create type converter in advance. Author: Wenchen Fan

spark git commit: [SQL] [MINOR] use catalyst type converter in ScalaUdf

2015-05-17 Thread yhuai
Repository: spark Updated Branches: refs/heads/master ca4257aec - 2f22424e9 [SQL] [MINOR] use catalyst type converter in ScalaUdf It's a follow-up of https://github.com/apache/spark/pull/5154, we can speed up scala udf evaluation by create type converter in advance. Author: Wenchen Fan

spark git commit: [SPARK-7673] [SQL] WIP: HadoopFsRelation and ParquetRelation2 performance optimizations

2015-05-18 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 530397ba2 - 9dadf019b [SPARK-7673] [SQL] WIP: HadoopFsRelation and ParquetRelation2 performance optimizations This PR introduces several performance optimizations to `HadoopFsRelation` and `ParquetRelation2`: 1. Moving `FileStatus`

spark git commit: [SPARK-7713] [SQL] Use shared broadcast hadoop conf for partitioned table scan.

2015-05-20 Thread yhuai
).explain(true) ``` In our master `explain` takes 40s on my laptop. With this PR, `explain` takes 14s. Author: Yin Huai yh...@databricks.com Closes #6252 from yhuai/broadcastHadoopConf and squashes the following commits: 6fa73df [Yin Huai] Address comments of Josh and Andrew. 807fbf9 [Yin Huai] Make

spark git commit: [SPARK-7713] [SQL] Use shared broadcast hadoop conf for partitioned table scan.

2015-05-20 Thread yhuai
).explain(true) ``` In our master `explain` takes 40s on my laptop. With this PR, `explain` takes 14s. Author: Yin Huai yh...@databricks.com Closes #6252 from yhuai/broadcastHadoopConf and squashes the following commits: 6fa73df [Yin Huai] Address comments of Josh and Andrew. 807fbf9 [Yin Huai

spark git commit: [SPARK-7320] [SQL] Add Cube / Rollup for dataframe

2015-05-20 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.4 b6182ce89 - 4fd674336 [SPARK-7320] [SQL] Add Cube / Rollup for dataframe This is a follow up for #6257, which broke the maven test. Add cube rollup for DataFrame For example: ```scala testData.rollup($a + $b, $b).agg(sum($a - $b))

spark git commit: [SPARK-7320] [SQL] Add Cube / Rollup for dataframe

2015-05-20 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 895baf8f7 - 42c592adb [SPARK-7320] [SQL] Add Cube / Rollup for dataframe This is a follow up for #6257, which broke the maven test. Add cube rollup for DataFrame For example: ```scala testData.rollup($a + $b, $b).agg(sum($a - $b))
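The rollup/cube semantics added in SPARK-7320 above can be shown by expanding the grouping sets they imply. A hedged sketch (plain Python; this mirrors the concept only, not DataFrame's implementation):

```python
# Grouping-set expansion: rollup(a, b) aggregates by (a, b), (a), and the
# grand total (); cube(a, b) aggregates by every subset of the columns.
from itertools import combinations

def rollup_sets(cols):
    """rollup(a, b) -> [('a', 'b'), ('a',), ()]: drop trailing columns."""
    return [tuple(cols[:i]) for i in range(len(cols), -1, -1)]

def cube_sets(cols):
    """cube(a, b) -> all subsets, largest first."""
    return [s for r in range(len(cols), -1, -1) for s in combinations(cols, r)]

print(rollup_sets(["a", "b"]))  # [('a', 'b'), ('a',), ()]
print(cube_sets(["a", "b"]))    # [('a', 'b'), ('a',), ('b',), ()]
```

So `testData.rollup($"a" + $"b", $"b").agg(...)` conceptually runs the aggregate once per grouping set and unions the results.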

spark git commit: [SPARK-8020] [SQL] Spark SQL conf in spark-defaults.conf make metadataHive get constructed too early

2015-06-02 Thread yhuai
Repository: spark Updated Branches: refs/heads/master bcb47ad77 - 7b7f7b6c6 [SPARK-8020] [SQL] Spark SQL conf in spark-defaults.conf make metadataHive get constructed too early https://issues.apache.org/jira/browse/SPARK-8020 Author: Yin Huai yh...@databricks.com Closes #6571 from yhuai

spark git commit: [SPARK-8020] [SQL] Spark SQL conf in spark-defaults.conf make metadataHive get constructed too early

2015-06-02 Thread yhuai
yhuai/SPARK-8020-1 and squashes the following commits: 0398f5b [Yin Huai] First populate the SQLConf and then construct executionHive and metadataHive. (cherry picked from commit 7b7f7b6c6fd903e2ecfc886d29eaa9df58adcfc3) Signed-off-by: Yin Huai yh...@databricks.com Project: http://git-wip

spark git commit: [SPARK-8121] [SQL] Fixes InsertIntoHadoopFsRelation job initialization for Hadoop 1.x

2015-06-08 Thread yhuai
Repository: spark Updated Branches: refs/heads/master ed5c2dccd - bbdfc0a40 [SPARK-8121] [SQL] Fixes InsertIntoHadoopFsRelation job initialization for Hadoop 1.x For Hadoop 1.x, `TaskAttemptContext` constructor clones the `Configuration` argument, thus configurations done in

spark git commit: [SPARK-8121] [SQL] Fixes InsertIntoHadoopFsRelation job initialization for Hadoop 1.x (branch 1.4 backport based on https://github.com/apache/spark/pull/6669)

2015-06-08 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.4 a3afc2cba - 69197c3e3 [SPARK-8121] [SQL] Fixes InsertIntoHadoopFsRelation job initialization for Hadoop 1.x (branch 1.4 backport based on https://github.com/apache/spark/pull/6669) Project:

spark git commit: [SPARK-7747] [SQL] [DOCS] spark.sql.planner.externalSort

2015-06-05 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 4f16d3fe2 - 4060526cd [SPARK-7747] [SQL] [DOCS] spark.sql.planner.externalSort Add documentation for spark.sql.planner.externalSort Author: Luca Martinetti l...@luca.io Closes #6272 from lucamartinetti/docs-externalsort and squashes the

spark git commit: [SPARK-7747] [SQL] [DOCS] spark.sql.planner.externalSort

2015-06-05 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.4 200c980a1 - 94f65bcce [SPARK-7747] [SQL] [DOCS] spark.sql.planner.externalSort Add documentation for spark.sql.planner.externalSort Author: Luca Martinetti l...@luca.io Closes #6272 from lucamartinetti/docs-externalsort and squashes

spark git commit: [SPARK-6964] [SQL] Support Cancellation in the Thrift Server

2015-06-05 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 6ebe419f3 - eb19d3f75 [SPARK-6964] [SQL] Support Cancellation in the Thrift Server Support runInBackground in SparkExecuteStatementOperation, and add cancellation Author: Dong Wang d...@databricks.com Closes #6207 from

spark git commit: [SPARK-8014] [SQL] Avoid premature metadata discovery when writing a HadoopFsRelation with a save mode other than Append

2015-06-02 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.4 815e05654 - cbaf59544 [SPARK-8014] [SQL] Avoid premature metadata discovery when writing a HadoopFsRelation with a save mode other than Append The current code references the schema of the DataFrame to be written before checking save

spark git commit: [SPARK-7973] [SQL] Increase the timeout of two CliSuite tests.

2015-06-03 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.4 ee7f365bd - 54a4ea407 [SPARK-7973] [SQL] Increase the timeout of two CliSuite tests. https://issues.apache.org/jira/browse/SPARK-7973 Author: Yin Huai yh...@databricks.com Closes #6525 from yhuai/SPARK-7973 and squashes the following

spark git commit: [SPARK-7973] [SQL] Increase the timeout of two CliSuite tests.

2015-06-03 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 28dbde387 - f1646e102 [SPARK-7973] [SQL] Increase the timeout of two CliSuite tests. https://issues.apache.org/jira/browse/SPARK-7973 Author: Yin Huai yh...@databricks.com Closes #6525 from yhuai/SPARK-7973 and squashes the following

spark git commit: [SPARK-8406] [SQL] Backports SPARK-8406 and PR #6864 to branch-1.4

2015-06-22 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.4 b836bac3f - 451c8722a [SPARK-8406] [SQL] Backports SPARK-8406 and PR #6864 to branch-1.4 Author: Cheng Lian l...@databricks.com Closes #6932 from liancheng/spark-8406-for-1.4 and squashes the following commits: a0168fe [Cheng Lian]

spark git commit: [SPARK-8406] [SQL] Adding UUID to output file name to avoid accidental overwriting

2015-06-22 Thread yhuai
://github.com/liancheng/spark/tree/spark-8513 Some background and a summary of offline discussion with yhuai about this issue for better understanding: In 1.4.0, we added `HadoopFsRelation` to abstract partition support of all data sources that are based on Hadoop `FileSystem` interface
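The collision-avoidance idea in the SPARK-8406 title above can be sketched simply: embed a per-write-job UUID in every task's output file name, so two jobs writing to the same directory with the same task ids cannot clobber each other's files. An illustrative Python sketch (the exact file-name pattern here is an assumption, not Spark's scheme):

```python
# Per-job UUID in output file names: same task id, different jobs,
# guaranteed-distinct names.
import uuid

def output_file_name(task_id, job_uuid=None, ext="parquet"):
    """Build a part-file name; each write job supplies (or generates) one UUID
    shared by all of its tasks."""
    job_uuid = job_uuid or uuid.uuid4().hex
    return f"part-r-{task_id:05d}-{job_uuid}.{ext}"

n1 = output_file_name(3)
n2 = output_file_name(3)
print(n1 != n2)  # True: distinct jobs never collide on a file name
```

Without the UUID, an appending job that restarts task numbering at zero would silently overwrite the previous job's `part-r-00000` file.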

spark git commit: [SPARK-8420] [SQL] Fix comparison of timestamps/dates with strings (branch-1.4)

2015-06-22 Thread yhuai
/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala Author: Michael Armbrust mich...@databricks.com Closes #6914 from yhuai/timeCompareString-1.4 and squashes the following commits: 9882915 [Michael Armbrust] [SPARK-8420] [SQL] Fix comparison of timestamps/dates

spark git commit: [SPARK-8532] [SQL] In Python's DataFrameWriter, save/saveAsTable/json/parquet/jdbc always override mode

2015-06-22 Thread yhuai
actions (i.e. `save/saveAsTable/json/parquet/jdbc`) always override mode. Second, it adds input argument `partitionBy` to `save/saveAsTable/parquet`. Author: Yin Huai yh...@databricks.com Closes #6937 from yhuai/SPARK-8532 and squashes the following commits: f972d5d [Yin Huai] davies's comment

spark git commit: [SPARK-8532] [SQL] In Python's DataFrameWriter, save/saveAsTable/json/parquet/jdbc always override mode

2015-06-22 Thread yhuai
actions (i.e. `save/saveAsTable/json/parquet/jdbc`) always override mode. Second, it adds input argument `partitionBy` to `save/saveAsTable/parquet`. Author: Yin Huai yh...@databricks.com Closes #6937 from yhuai/SPARK-8532 and squashes the following commits: f972d5d [Yin Huai] davies's comment

spark git commit: [SPARK-8420] [SQL] Fix comparison of timestamps/dates with strings

2015-06-19 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 9814b971f - a333a72e0 [SPARK-8420] [SQL] Fix comparison of timestamps/dates with strings In earlier versions of Spark SQL we cast `TimestampType` and `DateType` to `StringType` when they were involved in a binary comparison with a
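The hazard fixed by SPARK-8420 is easy to demonstrate: once a timestamp is cast to a string, the comparison becomes lexicographic and can disagree with chronological order, so the fix casts the string side to a timestamp instead. A small Python illustration (concept only, not Spark's cast logic):

```python
# String comparison of timestamps is character-by-character; an unpadded
# literal like "9:00" sorts after "09:30:00" even though 09:30 is later.
from datetime import datetime

ts = datetime(2015, 6, 19, 9, 30)

# Casting the timestamp to a string: wrong answer.
print(str(ts) < "2015-06-19 9:00")   # True -- lexicographic, not temporal

# Casting the string to a timestamp: correct answer.
literal = datetime.strptime("2015-06-19 9:00", "%Y-%m-%d %H:%M")
print(ts < literal)                  # False -- 09:30 is after 09:00
```

The two comparisons disagree on the same pair of values, which is exactly why the cast direction matters.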

spark git commit: [SPARK-8093] [SQL] Remove empty structs inferred from JSON documents

2015-06-19 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.4 1a6b51078 - 0131142d9 [SPARK-8093] [SQL] Remove empty structs inferred from JSON documents Author: Nathan Howell nhow...@godaddy.com Closes #6799 from NathanHowell/spark-8093 and squashes the following commits: 76ac3e8 [Nathan

spark git commit: [SPARK-8093] [SQL] Remove empty structs inferred from JSON documents

2015-06-19 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 1fa29c2df - 9814b971f [SPARK-8093] [SQL] Remove empty structs inferred from JSON documents Author: Nathan Howell nhow...@godaddy.com Closes #6799 from NathanHowell/spark-8093 and squashes the following commits: 76ac3e8 [Nathan Howell]
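The pruning in the SPARK-8093 commits above can be sketched as a recursive walk that drops struct fields with no surviving children. An illustrative Python sketch (schemas modeled as plain dicts; not Spark's JSON inference code):

```python
# Drop empty structs from an inferred schema: a field whose only observed
# value is {} carries no information and breaks downstream consumers.

def prune_empty_structs(schema):
    """schema: dict of field name -> primitive type name or nested dict."""
    pruned = {}
    for name, ftype in schema.items():
        if isinstance(ftype, dict):
            nested = prune_empty_structs(ftype)
            if nested:            # keep the struct only if any field survives
                pruned[name] = nested
        else:
            pruned[name] = ftype
    return pruned

schema = {"a": "long", "b": {}, "c": {"x": "string", "y": {}}}
print(prune_empty_structs(schema))  # {'a': 'long', 'c': {'x': 'string'}}
```

Note the recursion: a struct that becomes empty only after its own empty children are removed is dropped as well.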

spark git commit: [HOTFIX] Hotfix branch-1.4 building by removing avgMetrics in CrossValidatorSuite

2015-06-21 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.4 2a7ea31a9 - b836bac3f [HOTFIX] Hotfix branch-1.4 building by removing avgMetrics in CrossValidatorSuite Ref. #6905 ping yhuai Author: Liang-Chi Hsieh vii...@gmail.com Closes #6929 from viirya/hot_fix_cv_test and squashes

spark git commit: [HOT-FIX] Fix compilation (caused by 0131142d98b191f6cc112d383aa10582a3ac35bf)

2015-06-19 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.4 0131142d9 - 2510365fa [HOT-FIX] Fix compilation (caused by 0131142d98b191f6cc112d383aa10582a3ac35bf) Author: Yin Huai yh...@databricks.com Closes #6913 from yhuai/branch-1.4-hotfix and squashes the following commits: 7f91fa0 [Yin

spark git commit: [SPARK-8498] [SQL] Add regression test for SPARK-8470

2015-06-19 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.4 2510365fa - 2248ad8b7 [SPARK-8498] [SQL] Add regression test for SPARK-8470 **Summary of the problem in SPARK-8470.** When using `HiveContext` to create a data frame of a user case class, Spark throws

spark git commit: [SPARK-8498] [SQL] Add regression test for SPARK-8470

2015-06-19 Thread yhuai
Repository: spark Updated Branches: refs/heads/master b305e377f - 093c34838 [SPARK-8498] [SQL] Add regression test for SPARK-8470 **Summary of the problem in SPARK-8470.** When using `HiveContext` to create a data frame of a user case class, Spark throws

spark git commit: [SPARK-8567] [SQL] Add logs to record the progress of HiveSparkSubmitSuite.

2015-06-25 Thread yhuai
Repository: spark Updated Branches: refs/heads/master e988adb58 - f9b397f54 [SPARK-8567] [SQL] Add logs to record the progress of HiveSparkSubmitSuite. Author: Yin Huai yh...@databricks.com Closes #7009 from yhuai/SPARK-8567 and squashes the following commits: 62fb1f9 [Yin Huai] Add sc.stop

spark git commit: [SPARK-6749] [SQL] Make metastore client robust to underlying socket connection loss

2015-06-23 Thread yhuai
Repository: spark Updated Branches: refs/heads/master a458efc66 - 50c3a86f4 [SPARK-6749] [SQL] Make metastore client robust to underlying socket connection loss This works around a bug in the underlying RetryingMetaStoreClient (HIVE-10384) by refreshing the metastore client on thrift
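The refresh-on-failure workaround described in SPARK-6749 above follows a common pattern: catch the transport-level error, rebuild the client, and retry the call once. A hedged Python sketch (all class and method names here are assumptions for illustration, not Spark's ClientWrapper API):

```python
# Retry-with-reconnect wrapper: a client that caches a dead socket gets
# replaced by a fresh one before the call is retried.

class TransportError(Exception):
    pass

class RetryingClient:
    def __init__(self, connect):
        self._connect = connect          # factory returning a fresh client
        self._client = connect()

    def call(self, method, *args):
        try:
            return getattr(self._client, method)(*args)
        except TransportError:
            self._client = self._connect()   # refresh the broken connection
            return getattr(self._client, method)(*args)

# Demo: a client whose first call fails as if the socket dropped.
class Flaky:
    calls = 0
    def get_table(self, name):
        Flaky.calls += 1
        if Flaky.calls == 1:
            raise TransportError("socket closed")
        return f"table:{name}"

print(RetryingClient(Flaky).call("get_table", "t1"))  # table:t1
```

A single retry is deliberate: it handles a stale connection without masking a metastore that is genuinely down.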

spark git commit: [SPARK-8567] [SQL] Debugging flaky HiveSparkSubmitSuite

2015-06-24 Thread yhuai
. (This test suite only fails on Jenkins and doesn't spill out any log...) cc yhuai Author: Cheng Lian l...@databricks.com Closes #6978 from liancheng/debug-hive-spark-submit-suite and squashes the following commits: b031647 [Cheng Lian] Prints process stdout/stderr instead of logging them

spark git commit: [SPARK-8578] [SQL] Should ignore user defined output committer when appending data (branch 1.4)

2015-06-24 Thread yhuai
#6966 from yhuai/SPARK-8578-branch-1.4 and squashes the following commits: 9c3947b [Yin Huai] Do not use a custom output committer when appending data. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/7e53ff25 Tree: http://git

spark git commit: [SPARK-8578] [SQL] Should ignore user defined output committer when appending data

2015-06-24 Thread yhuai
to an existing dir. This change adds the logic to check if we are appending data, and if so, we use the output committer associated with the file output format. Author: Yin Huai yh...@databricks.com Closes #6964 from yhuai/SPARK-8578 and squashes the following commits: 43544c4 [Yin Huai] Do not use

spark git commit: [SPARK-7859] [SQL] Collect_set() behavior differences which fail the unit test under jdk8

2015-06-22 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.4 994abbaeb - d73900a90 [SPARK-7859] [SQL] Collect_set() behavior differences which fail the unit test under jdk8 To reproduce that: ``` JAVA_HOME=/home/hcheng/Java/jdk1.8.0_45 | build/sbt -Phadoop-2.3 -Phive 'test-only

spark git commit: [SPARK-7853] [SQL] Fix HiveContext in Spark Shell

2015-05-28 Thread yhuai
Context fails to create in spark shell because of the class loader issue. Author: Yin Huai yh...@databricks.com Closes #6459 from yhuai/SPARK-7853 and squashes the following commits: 37ad33e [Yin Huai] Do not use hiveQlTable at all. 47cdb6d [Yin Huai] Move hiveconf.set to the end of setConf

spark git commit: [SPARK-7853] [SQL] Fix HiveContext in Spark Shell

2015-05-28 Thread yhuai
that Hive Context fails to create in spark shell because of the class loader issue. Author: Yin Huai yh...@databricks.com Closes #6459 from yhuai/SPARK-7853 and squashes the following commits: 37ad33e [Yin Huai] Do not use hiveQlTable at all. 47cdb6d [Yin Huai] Move hiveconf.set to the end of setConf

spark git commit: [SPARK-7847] [SQL] Fixes dynamic partition directory escaping

2015-05-27 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.4 90525c9ba - a25ce91f9 [SPARK-7847] [SQL] Fixes dynamic partition directory escaping Please refer to [SPARK-7847] [1] for details. [1]: https://issues.apache.org/jira/browse/SPARK-7847 Author: Cheng Lian l...@databricks.com Closes

spark git commit: [SPARK-7847] [SQL] Fixes dynamic partition directory escaping

2015-05-27 Thread yhuai
Repository: spark Updated Branches: refs/heads/master ff0ddff46 - 15459db4f [SPARK-7847] [SQL] Fixes dynamic partition directory escaping Please refer to [SPARK-7847] [1] for details. [1]: https://issues.apache.org/jira/browse/SPARK-7847 Author: Cheng Lian l...@databricks.com Closes #6389

spark git commit: [SPARK-7790] [SQL] date and decimal conversion for dynamic partition key

2015-05-27 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 6fec1a940 - 8161562ea [SPARK-7790] [SQL] date and decimal conversion for dynamic partition key Author: Daoyuan Wang daoyuan.w...@intel.com Closes #6318 from adrian-wang/dynpart and squashes the following commits: ad73b61 [Daoyuan Wang]

spark git commit: [SPARK-7684] [SQL] Refactoring MetastoreDataSourcesSuite to workaround SPARK-7684

2015-05-27 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.4 d33142fd8 - 89fe93fc3 [SPARK-7684] [SQL] Refactoring MetastoreDataSourcesSuite to workaround SPARK-7684 As stated in SPARK-7684, currently `TestHive.reset` has some execution order specific bug, which makes running specific test

spark git commit: [SPARK-7853] [SQL] Fixes a class loader issue in Spark SQL

2015-05-27 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.4 89fe93fc3 - e07b71560 [SPARK-7853] [SQL] Fixes a class loader issue in Spark SQL This PR is based on PR #6396 authored by chenghao-intel. Essentially, Spark SQL should use context classloader to load SerDe classes. yhuai helped

spark git commit: [SPARK-7853] [SQL] Fixes a class loader issue in Spark SQL

2015-05-27 Thread yhuai
Repository: spark Updated Branches: refs/heads/master b97ddff00 - db3fd054f [SPARK-7853] [SQL] Fixes a class loader issue in Spark SQL This PR is based on PR #6396 authored by chenghao-intel. Essentially, Spark SQL should use context classloader to load SerDe classes. yhuai helped updating

spark git commit: [SPARK-7868] [SQL] Ignores _temporary directories in HadoopFsRelation

2015-05-26 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.4 faadbd4d9 - d0bd68ff8 [SPARK-7868] [SQL] Ignores _temporary directories in HadoopFsRelation So that potential partial/corrupted data files left by failed tasks/jobs won't affect normal data scan. Author: Cheng Lian

spark git commit: [SPARK-7950] [SQL] Sets spark.sql.hive.version in HiveThriftServer2.startWithContext()

2015-05-29 Thread yhuai
Repository: spark Updated Branches: refs/heads/master a51b133de - e7b617755 [SPARK-7950] [SQL] Sets spark.sql.hive.version in HiveThriftServer2.startWithContext() When starting `HiveThriftServer2` via `startWithContext`, property `spark.sql.hive.version` isn't set. This causes Simba ODBC

spark git commit: [SPARK-7950] [SQL] Sets spark.sql.hive.version in HiveThriftServer2.startWithContext()

2015-05-29 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.4 23bd05fff - caea7a618 [SPARK-7950] [SQL] Sets spark.sql.hive.version in HiveThriftServer2.startWithContext() When starting `HiveThriftServer2` via `startWithContext`, property `spark.sql.hive.version` isn't set. This causes Simba

spark git commit: [SPARK-7907] [SQL] [UI] Rename tab ThriftServer to SQL.

2015-05-27 Thread yhuai
`; and 3. Renaming the title of the session page from `ThriftServer` to `JDBC/ODBC Session`. https://issues.apache.org/jira/browse/SPARK-7907 Author: Yin Huai yh...@databricks.com Closes #6448 from yhuai/JDBCServer and squashes the following commits: eadcc3d [Yin Huai] Update test. 9168005 [Yin Huai

spark git commit: [SPARK-7907] [SQL] [UI] Rename tab ThriftServer to SQL.

2015-05-27 Thread yhuai
. Renaming the title of the session page from `ThriftServer` to `JDBC/ODBC Session`. https://issues.apache.org/jira/browse/SPARK-7907 Author: Yin Huai yh...@databricks.com Closes #6448 from yhuai/JDBCServer and squashes the following commits: eadcc3d [Yin Huai] Update test. 9168005 [Yin Huai] Use

spark git commit: [HOT-FIX] Add EvaluatedType back to RDG

2015-06-02 Thread yhuai
from yhuai/getBackEvaluatedType and squashes the following commits: 618c2eb [Yin Huai] Add EvaluatedType back. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/8c3fc3a6 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree

spark git commit: [SPARK-8776] Increase the default MaxPermSize

2015-07-02 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.4 de0802499 - f142867ec [SPARK-8776] Increase the default MaxPermSize I am increasing the perm gen size to 256m. https://issues.apache.org/jira/browse/SPARK-8776 Author: Yin Huai yh...@databricks.com Closes #7196 from yhuai/SPARK-8776

spark git commit: [SPARK-8776] Increase the default MaxPermSize

2015-07-02 Thread yhuai
Repository: spark Updated Branches: refs/heads/master a59d14f62 - f743c79ab [SPARK-8776] Increase the default MaxPermSize I am increasing the perm gen size to 256m. https://issues.apache.org/jira/browse/SPARK-8776 Author: Yin Huai yh...@databricks.com Closes #7196 from yhuai/SPARK-8776

spark git commit: [SPARK-7805] [SQL] Move SQLTestUtils.scala and ParquetTest.scala to src/test

2015-05-24 Thread yhuai
`SQLTestUtils` and `ParquetTest` in `src/main`. We should only add stuff that will be needed by `sql/console` or Python tests (for Python, we need it in `src/main`, right? davies). Author: Yin Huai yh...@databricks.com Closes #6334 from yhuai/SPARK-7805 and squashes the following commits

spark git commit: [SPARK-7845] [BUILD] Bump Hadoop 1 tests to version 1.2.1

2015-05-24 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.4 947d700ec - 11d998eb7 [SPARK-7845] [BUILD] Bump Hadoop 1 tests to version 1.2.1 https://issues.apache.org/jira/browse/SPARK-7845 Author: Yin Huai yh...@databricks.com Closes #6384 from yhuai/hadoop1Test and squashes the following

spark git commit: [SPARK-7845] [BUILD] Bump Hadoop 1 tests to version 1.2.1

2015-05-24 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 3c1a2d049 - bfbc0df72 [SPARK-7845] [BUILD] Bump Hadoop 1 tests to version 1.2.1 https://issues.apache.org/jira/browse/SPARK-7845 Author: Yin Huai yh...@databricks.com Closes #6384 from yhuai/hadoop1Test and squashes the following commits

spark git commit: [SPARK-7322, SPARK-7836, SPARK-7822][SQL] DataFrame window function related updates

2015-05-23 Thread yhuai
Repository: spark Updated Branches: refs/heads/master ad0badba1 - efe3bfdf4 [SPARK-7322, SPARK-7836, SPARK-7822][SQL] DataFrame window function related updates 1. ntile should take an integer as parameter. 2. Added Python API (based on #6364) 3. Update documentation of various DataFrame

spark git commit: [SPARK-7322, SPARK-7836, SPARK-7822][SQL] DataFrame window function related updates

2015-05-23 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.4 ea9db50bc - d1515381c [SPARK-7322, SPARK-7836, SPARK-7822][SQL] DataFrame window function related updates 1. ntile should take an integer as parameter. 2. Added Python API (based on #6364) 3. Update documentation of various DataFrame

spark git commit: [SPARK-7654] [SQL] Move insertInto into reader/writer interface.

2015-05-23 Thread yhuai
Closes #6366 from yhuai/insert and squashes the following commits: 3d717fb [Yin Huai] Use insertInto to handle the case when the table exists and Append is used for saveAsTable. 56d2540 [Yin Huai] Add PreWriteCheck to HiveContext's analyzer. c636e35 [Yin Huai] Remove unnecessary empty lines

spark git commit: [SPARK-7654] [SQL] Move insertInto into reader/writer interface.

2015-05-23 Thread yhuai
...@databricks.com Closes #6366 from yhuai/insert and squashes the following commits: 3d717fb [Yin Huai] Use insertInto to handle the case when the table exists and Append is used for saveAsTable. 56d2540 [Yin Huai] Add PreWriteCheck to HiveContext's analyzer. c636e35 [Yin Huai] Remove unnecessary empty lines

spark git commit: [SPARK-7749] [SQL] Fixes partition discovery for non-partitioned tables

2015-05-21 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 13348e21b - 8730fbb47 [SPARK-7749] [SQL] Fixes partition discovery for non-partitioned tables When no partition columns can be found, we should have an empty `PartitionSpec`, rather than a `PartitionSpec` with empty partition columns.

spark git commit: [SPARK-7749] [SQL] Fixes partition discovery for non-partitioned tables

2015-05-21 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.4 b97a8053a - 70d9839cf [SPARK-7749] [SQL] Fixes partition discovery for non-partitioned tables When no partition columns can be found, we should have an empty `PartitionSpec`, rather than a `PartitionSpec` with empty partition columns.

spark git commit: [SPARK-7763] [SPARK-7616] [SQL] Persists partition columns into metastore

2015-05-21 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.4 33e0e - 96c82515b [SPARK-7763] [SPARK-7616] [SQL] Persists partition columns into metastore Author: Yin Huai yh...@databricks.com Author: Cheng Lian l...@databricks.com Closes #6285 from liancheng/spark-7763 and squashes the

spark git commit: [SPARK-7763] [SPARK-7616] [SQL] Persists partition columns into metastore

2015-05-21 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 311fab6f1 - 30f3f556f [SPARK-7763] [SPARK-7616] [SQL] Persists partition columns into metastore Author: Yin Huai yh...@databricks.com Author: Cheng Lian l...@databricks.com Closes #6285 from liancheng/spark-7763 and squashes the following

spark git commit: [SPARK-7718] [SQL] Speed up partitioning by avoiding closure cleaning

2015-05-21 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.4 c9a80fc40 - ba04b5236 [SPARK-7718] [SQL] Speed up partitioning by avoiding closure cleaning According to yhuai we spent 6-7 seconds cleaning closures in a partitioning job that takes 12 seconds. Since we provide these closures

spark git commit: [SPARK-7718] [SQL] Speed up partitioning by avoiding closure cleaning

2015-05-21 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 6b18cdc1b - 5287eec5a [SPARK-7718] [SQL] Speed up partitioning by avoiding closure cleaning According to yhuai we spent 6-7 seconds cleaning closures in a partitioning job that takes 12 seconds. Since we provide these closures in Spark we
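The idea behind the commit above can be illustrated with a toy sketch (none of these names are Spark's real API): closure cleaning reflectively strips outer references before serializing a user-supplied closure, but closures constructed internally by the framework are already clean, so cleaning them is pure overhead that a flag can skip.

```scala
// Stand-in for Spark's ClosureCleaner: pretend the cleaning pass costs time.
def expensiveClean[T, U](f: T => U): T => U = {
  Thread.sleep(1) // simulate the reflective cleaning cost
  f
}

// Toy job runner: framework-constructed closures can opt out of cleaning.
def runJob[T, U](data: Seq[T], f: T => U, cleanClosure: Boolean): Seq[U] = {
  val g = if (cleanClosure) expensiveClean(f) else f
  data.map(g)
}
```

User code would still pass `cleanClosure = true`; only the internal call sites that construct their own (known-clean) closures skip the pass, which is where the reported 6-7 seconds were going.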

spark git commit: [SPARK-7565] [SQL] fix MapType in JsonRDD

2015-05-21 Thread yhuai
Repository: spark Updated Branches: refs/heads/master feb3a9d3f - a25c1ab8f [SPARK-7565] [SQL] fix MapType in JsonRDD The keys of Maps in JsonRDD should be converted into UTF8String (as should failed records). Thanks to yhuai and viirya. Closes #6084 Author: Davies Liu dav...@databricks.com Closes
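The conversion this fix performs can be sketched as follows. `UTF8String` here is a minimal stand-in wrapper, not Spark's `org.apache.spark.unsafe.types.UTF8String`: the point is that JSON object keys landing in a `MapType` must use Spark's internal string type rather than `java.lang.String`.

```scala
// Toy stand-in for Spark's internal UTF8String type.
case class UTF8String(bytes: Array[Byte]) {
  override def toString: String = new String(bytes, "UTF-8")
}
object UTF8String {
  def fromString(s: String): UTF8String = UTF8String(s.getBytes("UTF-8"))
}

// Convert a parsed JSON object so its map keys use the internal string type.
def toInternalMap(m: Map[String, Any]): Map[UTF8String, Any] =
  m.map { case (k, v) => UTF8String.fromString(k) -> v }
```

Without this conversion, internal operators that assume every string field is a `UTF8String` would hit `ClassCastException`s when reading map keys (and failed-record columns) produced by the JSON reader.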

spark git commit: [SPARK-7565] [SQL] fix MapType in JsonRDD

2015-05-21 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.4 f0e421351 - 3aa618510 [SPARK-7565] [SQL] fix MapType in JsonRDD The keys of Maps in JsonRDD should be converted into UTF8String (as should failed records). Thanks to yhuai and viirya. Closes #6084 Author: Davies Liu dav...@databricks.com

spark git commit: [SPARK-7320] [SQL] [Minor] Move the testData into beforeAll()

2015-05-21 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 1ee8eb431 - feb3a9d3f [SPARK-7320] [SQL] [Minor] Move the testData into beforeAll() Follow-up of #6340, to avoid the test report going missing when it fails. Author: Cheng Hao hao.ch...@intel.com Closes #6312 from chenghao-intel/rollup_minor

spark git commit: [SPARK-7320] [SQL] [Minor] Move the testData into beforeAll()

2015-05-21 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.4 f08c6f319 - f0e421351 [SPARK-7320] [SQL] [Minor] Move the testData into beforeAll() Follow-up of #6340, to avoid the test report going missing when it fails. Author: Cheng Hao hao.ch...@intel.com Closes #6312 from

spark git commit: [SPARK-2205] [SQL] Avoid unnecessary exchange operators in multi-way joins

2015-08-02 Thread yhuai
join t2 on (t1.x = t2.x) join t3 on (t2.x = t3.x)` will only have three Exchange operators (when shuffled joins are needed) instead of four. The code in this PR was authored by yhuai; I'm opening this PR to factor out this change from #7685, a larger pull request which contains two other
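The exchange-elimination idea can be sketched with a toy planner (these `Plan`/`Exchange` names are illustrative, not Spark's physical-plan classes): an `Exchange` is inserted only when a child's output partitioning does not already satisfy the join key, and a shuffled join's output stays partitioned on that key, so the second join of a chain reuses the first join's partitioning.

```scala
// Toy physical plan: a node and the column (if any) its output is partitioned by.
case class Plan(name: String, partitionedBy: Option[String])

var exchanges = 0

// Insert an Exchange only when the child's partitioning doesn't match the key.
def ensurePartitioned(p: Plan, key: String): Plan =
  if (p.partitionedBy.contains(key)) p
  else { exchanges += 1; Plan(s"Exchange(${p.name})", Some(key)) }

def join(left: Plan, right: Plan, key: String): Plan = {
  val l = ensurePartitioned(left, key)
  val r = ensurePartitioned(right, key)
  Plan(s"Join(${l.name}, ${r.name})", Some(key)) // output keeps partitioning on key
}

// t1 JOIN t2 JOIN t3, all on column x: three exchanges instead of four,
// because the first join's output is already partitioned by x.
val t1 = Plan("t1", None); val t2 = Plan("t2", None); val t3 = Plan("t3", None)
val result = join(join(t1, t2, "x"), t3, "x")
```

The base tables each need one shuffle, but the intermediate join result does not, which is exactly the saving the commit message describes.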

spark git commit: [SPARK-7289] [SPARK-9949] [SQL] Backport SPARK-7289 and SPARK-9949 to branch 1.4

2015-08-17 Thread yhuai
(https://github.com/apache/spark/pull/6780). Also, we need to backport the fix of `TakeOrderedAndProject` as well (https://github.com/apache/spark/pull/8179). Author: Wenchen Fan cloud0...@outlook.com Author: Yin Huai yh...@databricks.com Closes #8252 from yhuai/backport7289And9949. Project

spark git commit: [SPARK-10005] [SQL] Fixes schema merging for nested structs

2015-08-16 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.5 e2c6ef810 - 90245f65c [SPARK-10005] [SQL] Fixes schema merging for nested structs In case of schema merging, we only handled first level fields when converting Parquet groups to `InternalRow`s. Nested struct fields are not properly

spark git commit: [SPARK-10005] [SQL] Fixes schema merging for nested structs

2015-08-16 Thread yhuai
Repository: spark Updated Branches: refs/heads/master cf016075a - ae2370e72 [SPARK-10005] [SQL] Fixes schema merging for nested structs In case of schema merging, we only handled first level fields when converting Parquet groups to `InternalRow`s. Nested struct fields are not properly
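The essence of the fix can be sketched with a minimal recursive schema merge. The types below (`DataType`, `StructType`, etc.) are simplified stand-ins for Spark's, not its actual API: the point is that merging must recurse into nested structs rather than stopping at the first level, so fields present on only one side of a nested struct survive.

```scala
// Toy schema model.
sealed trait DataType
case object IntType extends DataType
case object StringType extends DataType
case class StructType(fields: Map[String, DataType]) extends DataType

// Recursive merge: nested struct fields from both sides are preserved.
def merge(a: DataType, b: DataType): DataType = (a, b) match {
  case (StructType(fa), StructType(fb)) =>
    val keys = fa.keySet ++ fb.keySet
    StructType(keys.map { k =>
      k -> ((fa.get(k), fb.get(k)) match {
        case (Some(x), Some(y)) => merge(x, y) // recurse into nested structs
        case (Some(x), None)    => x
        case (None, Some(y))    => y
        case _                  => throw new IllegalStateException("unreachable")
      })
    }.toMap)
  case (x, y) if x == y => x
  case _ => throw new IllegalArgumentException("incompatible types")
}
```

A first-level-only merge would pick one side's nested struct wholesale and silently drop the other side's nested fields, which is the bug class this commit addresses.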

spark git commit: [SPARK-10143] [SQL] Use parquet's block size (row group size) setting as the min split size if necessary.

2015-08-21 Thread yhuai
: Yin Huai yh...@databricks.com Closes #8346 from yhuai/parquetMinSplit. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/e3355090 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/e3355090 Diff: http://git-wip-us.apache.org
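The core of this change reduces to one comparison, sketched below with a hypothetical helper name: if the configured minimum split size is smaller than the Parquet row-group (block) size, input splits can straddle row groups and some tasks end up reading no rows, so the effective minimum split is raised to the block size.

```scala
// Sketch of the idea: never let the min split size drop below the
// Parquet row-group size, so each split covers at least one row group.
def effectiveMinSplitSize(parquetBlockSize: Long, configuredMinSplit: Long): Long =
  math.max(parquetBlockSize, configuredMinSplit)
```

With a 128 MB row group and a 32 MB configured minimum, the effective minimum becomes 128 MB; a configured minimum already larger than the block size is left alone.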

spark git commit: [SPARK-10143] [SQL] Use parquet's block size (row group size) setting as the min split size if necessary.

2015-08-21 Thread yhuai
. Author: Yin Huai yh...@databricks.com Closes #8346 from yhuai/parquetMinSplit. (cherry picked from commit e3355090d4030daffed5efb0959bf1d724c13c13) Signed-off-by: Yin Huai yh...@databricks.com Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf

spark git commit: [SPARK-10100] [SQL] Eliminate hash table lookup if there is no grouping key in aggregation.

2015-08-20 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.5 675e22494 - 5be517584 [SPARK-10100] [SQL] Eliminate hash table lookup if there is no grouping key in aggregation. This improves performance by ~20-30% in one of my local tests and should fix the performance regression from 1.4 to

spark git commit: [SPARK-10100] [SQL] Eliminate hash table lookup if there is no grouping key in aggregation.

2015-08-20 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 43e013542 - b4f4e91c3 [SPARK-10100] [SQL] Eliminate hash table lookup if there is no grouping key in aggregation. This improves performance by ~20-30% in one of my local tests and should fix the performance regression from 1.4 to 1.5
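The optimization can be illustrated with a toy sum aggregation (hypothetical code, not Spark's aggregate operator): with grouping keys, every input row pays a hash-map lookup to find its group's buffer; with no grouping keys there is exactly one group, so a single buffer can be updated directly and the hash table disappears from the inner loop.

```scala
// Toy sum aggregation over (key, value) rows.
def aggregate(rows: Seq[(String, Long)], groupByKey: Boolean): Map[String, Long] =
  if (groupByKey) {
    val buffers = scala.collection.mutable.Map.empty[String, Long]
    for ((k, v) <- rows) buffers(k) = buffers.getOrElse(k, 0L) + v // per-row lookup
    buffers.toMap
  } else {
    var buffer = 0L // one shared buffer, no hash table lookups at all
    for ((_, v) <- rows) buffer += v
    Map("" -> buffer)
  }
```

Queries like `SELECT sum(v) FROM t` (no `GROUP BY`) take the second path, which is where the reported 20-30% improvement comes from.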

spark git commit: [SPARK-8567] [SQL] Add logs to record the progress of HiveSparkSubmitSuite (1.4 branch)

2015-06-29 Thread yhuai
from yhuai/SPARK-8567-1.4 and squashes the following commits: 0ae2e14 [Yin Huai] [SPARK-8567] [SQL] Add logs to record the progress of HiveSparkSubmitSuite. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/0de1737a Tree: http

spark git commit: [SPARK-8715] ArrayOutOfBoundsException fixed for DataFrameStatSuite.crosstab

2015-06-29 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.4 80d53565a - ffc793a6c [SPARK-8715] ArrayOutOfBoundsException fixed for DataFrameStatSuite.crosstab cc yhuai Author: Burak Yavuz brk...@gmail.com Closes #7100 from brkyvz/ct-flakiness-fix and squashes the following commits: abc299a

spark git commit: [SPARK-8650] [SQL] Use the user-specified app name priority in SparkSQLCLIDriver or HiveThriftServer2

2015-06-29 Thread yhuai
Repository: spark Updated Branches: refs/heads/master f79410c49 - e6c3f7462 [SPARK-8650] [SQL] Use the user-specified app name priority in SparkSQLCLIDriver or HiveThriftServer2 When run `./bin/spark-sql --name query1.sql` [Before]

spark git commit: [SPARK-9422] [SQL] Remove the placeholder attributes used in the aggregation buffers

2015-07-28 Thread yhuai
Repository: spark Updated Branches: refs/heads/master e78ec1a8f - 3744b7fd4 [SPARK-9422] [SQL] Remove the placeholder attributes used in the aggregation buffers https://issues.apache.org/jira/browse/SPARK-9422 Author: Yin Huai yh...@databricks.com Closes #7737 from yhuai/removePlaceHolder

spark git commit: [SPARK-9466] [SQL] Increase two timeouts in CliSuite.

2015-07-31 Thread yhuai
# from yhuai/SPARK-9466 and squashes the following commits: e0e3a86 [Yin Huai] Increase the timeout. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/815c8245 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/815c8245

spark git commit: [SPARK-8640] [SQL] Enable Processing of Multiple Window Frames in a Single Window Operator

2015-07-31 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 0a1d2ca42 - 39ab199a3 [SPARK-8640] [SQL] Enable Processing of Multiple Window Frames in a Single Window Operator This PR enables the processing of multiple window frames in a single window operator. This should improve the performance of

spark git commit: [SPARK-9233] [SQL] Enable code-gen in window function unit tests

2015-07-31 Thread yhuai
Author: Yin Huai yh...@databricks.com Closes #7832 from yhuai/SPARK-9233 and squashes the following commits: 4e4e4cc [Yin Huai] style ca80e07 [Yin Huai] Test window function with codegen. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark
