spark git commit: [SPARK-11679][SQL] Invoking method " apply(fields: java.util.List[StructField])" in "StructType" gets ClassCastException

2015-11-17 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 21fac5434 -> e8833dd12 [SPARK-11679][SQL] Invoking method " apply(fields: java.util.List[StructField])" in "StructType" gets ClassCastException In the previous method, fields.toArray will cast java.util.List[StructField] into
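The root cause is a classic JVM interop pitfall: the no-arg `java.util.List#toArray()` returns `Object[]`, so casting the result to a typed array fails at runtime. A minimal pure-Scala sketch with a stand-in `StructField` (not Spark's class):

```scala
import java.util.{ArrayList => JArrayList}

// Stand-in for Spark's StructField; illustration only.
case class StructField(name: String)

val fields = new JArrayList[StructField]()
fields.add(StructField("a"))
fields.add(StructField("b"))

// Broken: the no-arg toArray() returns Array[AnyRef] (Object[] on the JVM),
// so the checkcast in
//   fields.toArray.asInstanceOf[Array[StructField]]
// throws ClassCastException at runtime.

// Fixed: pass a typed array so the JVM allocates a StructField[] directly.
val typed: Array[StructField] = fields.toArray(new Array[StructField](fields.size))
assert(typed.map(_.name).toSeq == Seq("a", "b"))
```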

spark git commit: [SPARK-10186][SQL] support Postgres array type in JDBCRDD

2015-11-17 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 0158ff773 -> d92514966 [SPARK-10186][SQL] support Postgres array type in JDBCRDD Add ARRAY support to `PostgresDialect`. Nested ARRAY is not allowed for now because it's hard to get the array dimension info. See
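For context, Postgres reports array columns through JDBC with an underscore-prefixed type name (e.g. `_int4` for `int4[]`). A hedged sketch of the kind of mapping a JDBC dialect performs; the type classes and function name below are illustrative, not Spark's exact `PostgresDialect` code:

```scala
// Illustrative Catalyst-like types; not Spark's classes.
sealed trait DataType
case object IntegerType extends DataType
case object StringType extends DataType
case class ArrayType(elementType: DataType) extends DataType

// Postgres reports array columns via JDBC with an underscore-prefixed
// type name, e.g. "_int4" for int4[].
def toCatalystType(typeName: String): Option[DataType] = typeName match {
  case "int4"             => Some(IntegerType)
  case "text" | "varchar" => Some(StringType)
  case t if t.startsWith("_") =>
    // One level only: nested arrays were rejected in the actual patch
    // because the dimension info is hard to obtain.
    toCatalystType(t.drop(1)).collect {
      case dt if !dt.isInstanceOf[ArrayType] => ArrayType(dt)
    }
  case _ => None
}

assert(toCatalystType("_int4") == Some(ArrayType(IntegerType)))
assert(toCatalystType("__int4") == None) // nested arrays unsupported
```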

spark git commit: [SPARK-11754][SQL] consolidate `ExpressionEncoder.tuple` and `Encoders.tuple`

2015-11-16 Thread marmbrus
Repository: spark Updated Branches: refs/heads/branch-1.6 38fe092ff -> fbe65c592 [SPARK-11754][SQL] consolidate `ExpressionEncoder.tuple` and `Encoders.tuple` These 2 are very similar, we can consolidate them into one. Also add tests for it and fix a bug. Author: Wenchen Fan

spark git commit: [SPARK-11390][SQL] Query plan with/without filterPushdown indistinguishable

2015-11-16 Thread marmbrus
Repository: spark Updated Branches: refs/heads/branch-1.6 fbe65c592 -> 90d71bff0 [SPARK-11390][SQL] Query plan with/without filterPushdown indistinguishable Propagate pushed filters to PhysicalRDD in DataSourceStrategy.apply Author: Zee Chen Closes #9679 from

spark git commit: [SPARK-11390][SQL] Query plan with/without filterPushdown indistinguishable

2015-11-16 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master b1a966262 -> 985b38dd2 [SPARK-11390][SQL] Query plan with/without filterPushdown indistinguishable Propagate pushed filters to PhysicalRDD in DataSourceStrategy.apply Author: Zee Chen Closes #9679 from

spark git commit: [SPARK-11754][SQL] consolidate `ExpressionEncoder.tuple` and `Encoders.tuple`

2015-11-16 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 24477d270 -> b1a966262 [SPARK-11754][SQL] consolidate `ExpressionEncoder.tuple` and `Encoders.tuple` These 2 are very similar, we can consolidate them into one. Also add tests for it and fix a bug. Author: Wenchen Fan

spark git commit: [SPARK-11625][SQL] add java test for typed aggregate

2015-11-16 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 75ee12f09 -> fd14936be [SPARK-11625][SQL] add java test for typed aggregate Author: Wenchen Fan Closes #9591 from cloud-fan/agg-test. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit:

spark git commit: [SPARK-11553][SQL] Primitive Row accessors should not convert null to default value

2015-11-16 Thread marmbrus
Repository: spark Updated Branches: refs/heads/branch-1.6 3bd72eafc -> 6c8e0c0ff [SPARK-11553][SQL] Primitive Row accessors should not convert null to default value Invocation of getters for types extending AnyVal returned the default value (if the field value was null) instead of throwing an NPE. Please
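The underlying Scala behavior: unboxing `null` to a primitive via `asInstanceOf` silently yields the type's default value rather than throwing. A sketch of the pitfall and the shape of the fix (the `getInt` helper is hypothetical, not Spark's exact code):

```scala
// Unboxing null through asInstanceOf yields the primitive default, not an NPE:
val v: Any = null
assert(v.asInstanceOf[Int] == 0)          // 0, not NullPointerException
assert(v.asInstanceOf[Boolean] == false)

// Shape of the fix (hypothetical helper): primitive getters check for null
// explicitly instead of silently unboxing it to a default value.
def getInt(values: Array[Any], i: Int): Int = {
  if (values(i) == null)
    throw new NullPointerException(s"Value at index $i is null")
  values(i).asInstanceOf[Int]
}

val threw =
  try { getInt(Array[Any](null), 0); false }
  catch { case _: NullPointerException => true }
assert(threw)
```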

spark git commit: [SPARK-8658][SQL] AttributeReference's equals method compares all the members

2015-11-16 Thread marmbrus
Repository: spark Updated Branches: refs/heads/branch-1.6 6c8e0c0ff -> e042780cd [SPARK-8658][SQL] AttributeReference's equals method compares all the members This fix changes the equals method to check all of AttributeReference's specified fields for equality. Author: gatorsmile

spark git commit: [SPARK-8658][SQL] AttributeReference's equals method compares all the members

2015-11-16 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 31296628a -> 75ee12f09 [SPARK-8658][SQL] AttributeReference's equals method compares all the members This fix changes the equals method to check all of AttributeReference's specified fields for equality. Author: gatorsmile

spark git commit: [SPARK-11625][SQL] add java test for typed aggregate

2015-11-16 Thread marmbrus
Repository: spark Updated Branches: refs/heads/branch-1.6 e042780cd -> 4f8c7e18f [SPARK-11625][SQL] add java test for typed aggregate Author: Wenchen Fan Closes #9591 from cloud-fan/agg-test. (cherry picked from commit fd14936be7beff543dbbcf270f2f9749f7a803c4)

spark git commit: [SPARK-11553][SQL] Primitive Row accessors should not convert null to default value

2015-11-16 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master bcea0bfda -> 31296628a [SPARK-11553][SQL] Primitive Row accessors should not convert null to default value Invocation of getters for type extending AnyVal returns default value (if field value is null) instead of throwing NPE. Please

spark git commit: [SPARK-11654][SQL][FOLLOW-UP] fix some mistakes and clean up

2015-11-13 Thread marmbrus
Repository: spark Updated Branches: refs/heads/branch-1.6 6459a6747 -> 3035e9d23 [SPARK-11654][SQL][FOLLOW-UP] fix some mistakes and clean up * rename `AppendColumn` to `AppendColumns` to be consistent with the physical plan name. * clean up stale comments. * always pass in resolved encoder

spark git commit: [SPARK-11727][SQL] Split ExpressionEncoder into FlatEncoder and ProductEncoder

2015-11-13 Thread marmbrus
Repository: spark Updated Branches: refs/heads/branch-1.6 3035e9d23 -> 8757221a3 [SPARK-11727][SQL] Split ExpressionEncoder into FlatEncoder and ProductEncoder also add more tests for encoders, and fix bugs that I found: * when convert array to catalyst array, we can only skip element

spark git commit: [SPARK-11654][SQL][FOLLOW-UP] fix some mistakes and clean up

2015-11-13 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master a24477996 -> 23b8188f7 [SPARK-11654][SQL][FOLLOW-UP] fix some mistakes and clean up * rename `AppendColumn` to `AppendColumns` to be consistent with the physical plan name. * clean up stale comments. * always pass in resolved encoder to

spark git commit: [SPARK-11191][SQL] Looks up temporary function using execution Hive client

2015-11-12 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master df0e31815 -> 4fe99c72c [SPARK-11191][SQL] Looks up temporary function using execution Hive client When looking up Hive temporary functions, we should always use the `SessionState` within the execution Hive client, since temporary

spark git commit: [SPARK-10113][SQL] Explicit error message for unsigned Parquet logical types

2015-11-12 Thread marmbrus
Repository: spark Updated Branches: refs/heads/branch-1.6 ecf027edd -> 68fa5c713 [SPARK-10113][SQL] Explicit error message for unsigned Parquet logical types Parquet supports some unsigned datatypes. However, Since Spark does not support unsigned datatypes, it needs to emit an exception with

spark git commit: [SPARK-11654][SQL] add reduce to GroupedDataset

2015-11-12 Thread marmbrus
…enforcing the above constraints in the type system (i.e. `fromRow` only exists on a `ResolvedEncoder`), but we should probably wait before spending too much time on this. Author: Michael Armbrust <mich...@databricks.com> Author: Wenchen Fan <wenc...@databricks.com> Closes #9673 from

spark git commit: [SPARK-11654][SQL] add reduce to GroupedDataset

2015-11-12 Thread marmbrus
…consider enforcing the above constraints in the type system (i.e. `fromRow` only exists on a `ResolvedEncoder`), but we should probably wait before spending too much time on this. Author: Michael Armbrust <mich...@databricks.com> Author: Wenchen Fan <wenc...@databrick

spark git commit: [SPARK-11191][SQL] Looks up temporary function using execution Hive client

2015-11-12 Thread marmbrus
Repository: spark Updated Branches: refs/heads/branch-1.6 4aacbe9e6 -> ecf027edd [SPARK-11191][SQL] Looks up temporary function using execution Hive client When looking up Hive temporary functions, we should always use the `SessionState` within the execution Hive client, since temporary

spark git commit: [SPARK-11656][SQL] support typed aggregate in project list

2015-11-11 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master c964fc101 -> 9c57bc0ef [SPARK-11656][SQL] support typed aggregate in project list insert `aEncoder` like we do in `agg` Author: Wenchen Fan Closes #9630 from cloud-fan/select. Project:

spark git commit: [SPARK-11564][SQL][FOLLOW-UP] clean up java tuple encoder

2015-11-11 Thread marmbrus
Repository: spark Updated Branches: refs/heads/branch-1.6 f9aeb961e -> 9bf988555 [SPARK-11564][SQL][FOLLOW-UP] clean up java tuple encoder We need to support custom classes like java beans and combine them into tuple, and it's very hard to do it with the TypeTag-based approach. We should

spark git commit: [SPARK-11564][SQL][FOLLOW-UP] clean up java tuple encoder

2015-11-11 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 9c57bc0ef -> ec2b80721 [SPARK-11564][SQL][FOLLOW-UP] clean up java tuple encoder We need to support custom classes like java beans and combine them into tuple, and it's very hard to do it with the TypeTag-based approach. We should keep

spark git commit: [SQL][MINOR] remove newLongEncoder in functions

2015-11-11 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master ec2b80721 -> e71ba5658 [SQL][MINOR] remove newLongEncoder in functions it may shadow the one from implicits in some cases. Author: Wenchen Fan Closes #9629 from cloud-fan/minor. Project:

spark git commit: [SQL][MINOR] remove newLongEncoder in functions

2015-11-11 Thread marmbrus
Repository: spark Updated Branches: refs/heads/branch-1.6 9bf988555 -> 47cc1fe06 [SQL][MINOR] remove newLongEncoder in functions it may shadow the one from implicits in some cases. Author: Wenchen Fan Closes #9629 from cloud-fan/minor. (cherry picked from commit

spark git commit: [SPARK-6152] Use shaded ASM5 to support closure cleaning of Java 8 compiled classes

2015-11-11 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master e71ba5658 -> 529a1d338 [SPARK-6152] Use shaded ASM5 to support closure cleaning of Java 8 compiled classes This patch modifies Spark's closure cleaner (and a few other places) to use ASM 5, which is necessary in order to support cleaning

spark git commit: [SPARK-6152] Use shaded ASM5 to support closure cleaning of Java 8 compiled classes

2015-11-11 Thread marmbrus
Repository: spark Updated Branches: refs/heads/branch-1.6 47cc1fe06 -> 1fbfc1b48 [SPARK-6152] Use shaded ASM5 to support closure cleaning of Java 8 compiled classes This patch modifies Spark's closure cleaner (and a few other places) to use ASM 5, which is necessary in order to support

spark git commit: [SPARK-11578][SQL][FOLLOW-UP] complete the user facing api for typed aggregation

2015-11-10 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 47735cdc2 -> dfcfcbcc0 [SPARK-11578][SQL][FOLLOW-UP] complete the user facing api for typed aggregation Currently the user facing api for typed aggregation has some limitations: * the customized typed aggregation must be the first of

spark git commit: [SPARK-11578][SQL][FOLLOW-UP] complete the user facing api for typed aggregation

2015-11-10 Thread marmbrus
Repository: spark Updated Branches: refs/heads/branch-1.6 d2405cb5e -> 6e2e84f3e [SPARK-11578][SQL][FOLLOW-UP] complete the user facing api for typed aggregation Currently the user facing api for typed aggregation has some limitations: * the customized typed aggregation must be the first of

spark git commit: [SPARK-10371][SQL] Implement subexpr elimination for UnsafeProjections

2015-11-10 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 53600854c -> 87aedc48c [SPARK-10371][SQL] Implement subexpr elimination for UnsafeProjections This patch adds the building blocks for codegening subexpr elimination and implements it end to end for UnsafeProjection. The building blocks
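The idea behind subexpression elimination in one picture: evaluate a repeated subexpression once and reuse the result. A plain-Scala illustration of the transformation, not the generated code:

```scala
// Both expressions share the subexpression (a + b).
val (a, b) = (3, 4)

// Naive evaluation computes a + b twice:
val naive = ((a + b) * 2, (a + b) + 1)

// With subexpression elimination it is computed once and reused:
val common = a + b
val optimized = (common * 2, common + 1)

assert(naive == optimized) // same results, fewer evaluations
assert(optimized == (14, 8))
```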

spark git commit: [SPARK-10371][SQL] Implement subexpr elimination for UnsafeProjections

2015-11-10 Thread marmbrus
Repository: spark Updated Branches: refs/heads/branch-1.6 5ccc1eb08 -> f38509a76 [SPARK-10371][SQL] Implement subexpr elimination for UnsafeProjections This patch adds the building blocks for codegening subexpr elimination and implements it end to end for UnsafeProjection. The building

spark git commit: [SPARK-7841][BUILD] Stop using retrieveManaged to retrieve dependencies in SBT

2015-11-10 Thread marmbrus
…folder as part of the `assembly` task. `dev/mima` also depended on `lib_managed` in a hacky way in order to set classpaths when generating MiMa excludes; I've updated this to obtain the classpaths directly from SBT instead. /cc dragos marmbrus pwendell srowen Author: Josh Rosen <joshro...@databric

[3/4] spark git commit: [SPARK-9830][SQL] Remove AggregateExpression1 and Aggregate Operator used to evaluate AggregateExpression1s

2015-11-10 Thread marmbrus
http://git-wip-us.apache.org/repos/asf/spark/blob/e0701c75/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Utils.scala -- diff --git

[2/4] spark git commit: [SPARK-9830][SQL] Remove AggregateExpression1 and Aggregate Operator used to evaluate AggregateExpression1s

2015-11-10 Thread marmbrus
http://git-wip-us.apache.org/repos/asf/spark/blob/e0701c75/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/FilterPushdownSuite.scala -- diff --git

[3/4] spark git commit: [SPARK-9830][SQL] Remove AggregateExpression1 and Aggregate Operator used to evaluate AggregateExpression1s

2015-11-10 Thread marmbrus
http://git-wip-us.apache.org/repos/asf/spark/blob/7c4ade0d/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Utils.scala -- diff --git

[2/4] spark git commit: [SPARK-9830][SQL] Remove AggregateExpression1 and Aggregate Operator used to evaluate AggregateExpression1s

2015-11-10 Thread marmbrus
http://git-wip-us.apache.org/repos/asf/spark/blob/7c4ade0d/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/FilterPushdownSuite.scala -- diff --git

[1/4] spark git commit: [SPARK-9830][SQL] Remove AggregateExpression1 and Aggregate Operator used to evaluate AggregateExpression1s

2015-11-10 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 6e5fc3788 -> e0701c756 http://git-wip-us.apache.org/repos/asf/spark/blob/e0701c75/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala -- diff --git

[4/4] spark git commit: [SPARK-9830][SQL] Remove AggregateExpression1 and Aggregate Operator used to evaluate AggregateExpression1s

2015-11-10 Thread marmbrus
[SPARK-9830][SQL] Remove AggregateExpression1 and Aggregate Operator used to evaluate AggregateExpression1s https://issues.apache.org/jira/browse/SPARK-9830 This PR contains the following main changes. * Removing `AggregateExpression1`. * Removing `Aggregate` operator, which is used to evaluate

[1/4] spark git commit: [SPARK-9830][SQL] Remove AggregateExpression1 and Aggregate Operator used to evaluate AggregateExpression1s

2015-11-10 Thread marmbrus
Repository: spark Updated Branches: refs/heads/branch-1.6 825e971d0 -> 7c4ade0d7 http://git-wip-us.apache.org/repos/asf/spark/blob/7c4ade0d/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala -- diff --git

[4/4] spark git commit: [SPARK-9830][SQL] Remove AggregateExpression1 and Aggregate Operator used to evaluate AggregateExpression1s

2015-11-10 Thread marmbrus
[SPARK-9830][SQL] Remove AggregateExpression1 and Aggregate Operator used to evaluate AggregateExpression1s https://issues.apache.org/jira/browse/SPARK-9830 This PR contains the following main changes. * Removing `AggregateExpression1`. * Removing `Aggregate` operator, which is used to evaluate

spark git commit: [SPARK-11616][SQL] Improve toString for Dataset

2015-11-10 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master dba1a62cf -> 724cf7a38 [SPARK-11616][SQL] Improve toString for Dataset Author: Michael Armbrust <mich...@databricks.com> Closes #9586 from marmbrus/dataset-toString. Project: http://git-wip-us.apache.org/repos/asf/spark/re

spark git commit: [SPARK-11616][SQL] Improve toString for Dataset

2015-11-10 Thread marmbrus
Repository: spark Updated Branches: refs/heads/branch-1.6 f0180106a -> 8fb7b8304 [SPARK-11616][SQL] Improve toString for Dataset Author: Michael Armbrust <mich...@databricks.com> Closes #9586 from marmbrus/dataset-toString. (cherry picked from commit 724cf7a38c551bf2a79b87a8158bbe1

spark git commit: [SPARK-11564][SQL][FOLLOW-UP] improve java api for GroupedDataset

2015-11-09 Thread marmbrus
Repository: spark Updated Branches: refs/heads/branch-1.6 1585f559d -> b9adfdf9c [SPARK-11564][SQL][FOLLOW-UP] improve java api for GroupedDataset created `MapGroupFunction`, `FlatMapGroupFunction`, `CoGroupFunction` Author: Wenchen Fan Closes #9564 from

spark git commit: [SPARK-9557][SQL] Refactor ParquetFilterSuite and remove old ParquetFilters code

2015-11-09 Thread marmbrus
Repository: spark Updated Branches: refs/heads/branch-1.6 b9adfdf9c -> c42433d02 [SPARK-9557][SQL] Refactor ParquetFilterSuite and remove old ParquetFilters code Actually this was resolved by https://github.com/apache/spark/pull/8275. But I found the JIRA issue for this is not marked as

spark git commit: [SPARK-9301][SQL] Add collect_set and collect_list aggregate functions

2015-11-09 Thread marmbrus
…aggregating primitive types. I chose snake_case here instead of camelCase because it seems to be used in the majority of the multi-word fns. Do we also want to add these to `functions.py`? This approach was recommended here: https://github.com/apache/spark/pull/8592#issuecomment-154247089 marmbrus r

spark git commit: [SPARK-9301][SQL] Add collect_set and collect_list aggregate functions

2015-11-09 Thread marmbrus
…support aggregating primitive types. I chose snake_case here instead of camelCase because it seems to be used in the majority of the multi-word fns. Do we also want to add these to `functions.py`? This approach was recommended here: https://github.com/apache/spark/pull/8592#issuecomment-154247089 marmbrus r

spark git commit: [SPARK-11578][SQL] User API for Typed Aggregation

2015-11-09 Thread marmbrus
…", 30, 30, 2L), ("b", 3, 3, 2L), ("c", 1, 1, 1L) ``` The current implementation focuses on integrating this into the typed API, but currently only supports running aggregations that return a single long value as explained in `TypedAggregateExpression`. This will be improved in a

spark git commit: [SPARK-11578][SQL] User API for Typed Aggregation

2015-11-09 Thread marmbrus
…", 30, 30, 2L), ("b", 3, 3, 2L), ("c", 1, 1, 1L) ``` The current implementation focuses on integrating this into the typed API, but currently only supports running aggregations that return a single long value as explained in `TypedAggregateExpression`. This will be improved in a

spark git commit: [SPARK-11554][SQL] add map/flatMap to GroupedDataset

2015-11-08 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 26739059b -> b2d195e13 [SPARK-11554][SQL] add map/flatMap to GroupedDataset Author: Wenchen Fan Closes #9521 from cloud-fan/map. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit:

spark git commit: [SPARK-11554][SQL] add map/flatMap to GroupedDataset

2015-11-08 Thread marmbrus
Repository: spark Updated Branches: refs/heads/branch-1.6 27161f59e -> 6ade67e5f [SPARK-11554][SQL] add map/flatMap to GroupedDataset Author: Wenchen Fan Closes #9521 from cloud-fan/map. (cherry picked from commit b2d195e137fad88d567974659fa7023ff4da96cd)

spark git commit: [SPARK-11546] Thrift server makes too many logs about result schema

2015-11-06 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 6d0ead322 -> 1c80d66e5 [SPARK-11546] Thrift server makes too many logs about result schema SparkExecuteStatementOperation logs the result schema for each getNextRowSet() call, which is by default every 1000 rows, overwhelming the whole log file.

spark git commit: [SPARK-11546] Thrift server makes too many logs about result schema

2015-11-06 Thread marmbrus
Repository: spark Updated Branches: refs/heads/branch-1.6 162a7704c -> 9bf77d5c3 [SPARK-11546] Thrift server makes too many logs about result schema SparkExecuteStatementOperation logs the result schema for each getNextRowSet() call, which is by default every 1000 rows, overwhelming the whole log

spark git commit: [SPARK-9241][SQL] Supporting multiple DISTINCT columns (2) - Rewriting Rule

2015-11-06 Thread marmbrus
Repository: spark Updated Branches: refs/heads/branch-1.6 40a5db561 -> 162a7704c [SPARK-9241][SQL] Supporting multiple DISTINCT columns (2) - Rewriting Rule The second PR for SPARK-9241, this adds support for multiple distinct columns to the new aggregation code path. This PR solves the

spark git commit: [SPARK-9241][SQL] Supporting multiple DISTINCT columns (2) - Rewriting Rule

2015-11-06 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 1ab72b086 -> 6d0ead322 [SPARK-9241][SQL] Supporting multiple DISTINCT columns (2) - Rewriting Rule The second PR for SPARK-9241, this adds support for multiple distinct columns to the new aggregation code path. This PR solves the

spark git commit: [SPARK-11450] [SQL] Add Unsafe Row processing to Expand

2015-11-06 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 49f1a8203 -> f328fedaf [SPARK-11450] [SQL] Add Unsafe Row processing to Expand This PR enables the Expand operator to process and produce Unsafe Rows. Author: Herman van Hovell Closes #9414 from

spark git commit: [SPARK-11450] [SQL] Add Unsafe Row processing to Expand

2015-11-06 Thread marmbrus
Repository: spark Updated Branches: refs/heads/branch-1.6 7755b50b4 -> efa1e4a25 [SPARK-11450] [SQL] Add Unsafe Row processing to Expand This PR enables the Expand operator to process and produce Unsafe Rows. Author: Herman van Hovell Closes #9414 from

Git Push Summary

2015-11-05 Thread marmbrus
Repository: spark Updated Branches: refs/heads/branch-1.6 [created] f80f7b69a

spark git commit: [SPARK-11188] [SQL] Elide stacktraces in bin/spark-sql for AnalysisExceptions

2015-11-05 Thread marmbrus
Repository: spark Updated Branches: refs/heads/branch-1.4 6c5e9a3a0 -> 4f98014b9 [SPARK-11188] [SQL] Elide stacktraces in bin/spark-sql for AnalysisExceptions Only print the error message to the console for Analysis Exceptions in sql-shell Author: Dilip Biswal Closes

spark git commit: [SPARK-11404] [SQL] Support for groupBy using column expressions

2015-11-03 Thread marmbrus
), ("c", 1)) ``` Author: Michael Armbrust <mich...@databricks.com> Closes #9359 from marmbrus/columnGroupBy and squashes the following commits: bbcb03b [Michael Armbrust] Update DatasetSuite.scala 8fd2908 [Michael Armbrust] Update DatasetSuite.scala 0b0e2f8 [Michael Armbrust] [SP

spark git commit: [SPARK-11436] [SQL] rebind right encoder when join 2 datasets

2015-11-03 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 67e23b39a -> 425ff03f5 [SPARK-11436] [SQL] rebind right encoder when join 2 datasets When we join 2 datasets, we will combine 2 encoders into a tupled one, and use it as the encoder for the joined dataset. Assume both of the 2 encoders

spark git commit: [SPARK-11477] [SQL] support create Dataset from RDD

2015-11-03 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 1d04dc95c -> f6fcb4874 [SPARK-11477] [SQL] support create Dataset from RDD Author: Wenchen Fan Closes #9434 from cloud-fan/rdd2ds and squashes the following commits: 0892d72 [Wenchen Fan] support create Dataset

spark git commit: [SPARK-11188] [SQL] Elide stacktraces in bin/spark-sql for AnalysisExceptions

2015-11-03 Thread marmbrus
Repository: spark Updated Branches: refs/heads/branch-1.5 b85bf8f49 -> 5604ce9c1 [SPARK-11188] [SQL] Elide stacktraces in bin/spark-sql for AnalysisExceptions Only print the error message to the console for Analysis Exceptions in sql-shell Author: Dilip Biswal Closes

spark git commit: [SPARK-11393] [SQL] CoGroupedIterator should respect the fact that GroupedIterator.hasNext is not idempotent

2015-10-30 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 59db9e9c3 -> 14d08b990 [SPARK-11393] [SQL] CoGroupedIterator should respect the fact that GroupedIterator.hasNext is not idempotent When we cogroup 2 `GroupedIterator`s in `CoGroupedIterator`, if the right side is smaller, we will

spark git commit: [SPARK-11188][SQL] Elide stacktraces in bin/spark-sql for AnalysisExceptions

2015-10-29 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master f7a51deeb -> 8185f038c [SPARK-11188][SQL] Elide stacktraces in bin/spark-sql for AnalysisExceptions Only print the error message to the console for Analysis Exceptions in sql-shell. Author: Dilip Biswal Closes #9194

spark git commit: [SPARK-11370] [SQL] fix a bug in GroupedIterator and create unit test for it

2015-10-29 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 87f28fc24 -> f79ebf2a9 [SPARK-11370] [SQL] fix a bug in GroupedIterator and create unit test for it Before this PR, a user had to consume the iterator of one group before processing the next group, or we would get into an infinite loop. Author:
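The hazard: a grouped iterator that hands out a lazy sub-iterator per group leaves its cursor stranded if the caller skips ahead without consuming it. One way to sidestep that failure mode is to drain each group eagerly — a minimal sketch of grouping a key-sorted iterator, not Spark's `GroupedIterator` implementation:

```scala
import scala.collection.mutable.ArrayBuffer

// Groups a key-sorted iterator. Advancing to the next group first drains the
// current group's rows, so the caller can never strand the cursor mid-group
// (the bug class fixed here was an infinite loop on unconsumed groups).
def groupSorted[K, V](it: Iterator[(K, V)]): Iterator[(K, Seq[V])] = {
  val buffered = it.buffered
  new Iterator[(K, Seq[V])] {
    def hasNext: Boolean = buffered.hasNext
    def next(): (K, Seq[V]) = {
      val key = buffered.head._1
      val group = ArrayBuffer[V]()
      // Drain the whole group eagerly before returning it.
      while (buffered.hasNext && buffered.head._1 == key) group += buffered.next()._2
      (key, group.toSeq)
    }
  }
}

val groups = groupSorted(Iterator(("a", 1), ("a", 2), ("b", 3))).toList
assert(groups == List(("a", Seq(1, 2)), ("b", Seq(3))))
```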

spark git commit: [SPARK-11379][SQL] ExpressionEncoder can't handle top level primitive type correctly

2015-10-29 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 3dfa4ea52 -> 87f28fc24 [SPARK-11379][SQL] ExpressionEncoder can't handle top level primitive type correctly For inner primitive type(e.g. inside `Product`), we use `schemaFor` to get the catalyst type for it,

spark git commit: [SPARK-11313][SQL] implement cogroup on DataSets (support 2 datasets)

2015-10-28 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 5f1cee6f1 -> 075ce4914 [SPARK-11313][SQL] implement cogroup on DataSets (support 2 datasets) A simpler version of https://github.com/apache/spark/pull/9279, only support 2 datasets. Author: Wenchen Fan Closes

spark git commit: [SPARK-11303][SQL] filter should not be pushed down into sample

2015-10-28 Thread marmbrus
Repository: spark Updated Branches: refs/heads/branch-1.5 86ee81e5c -> 3bd596de4 [SPARK-11303][SQL] filter should not be pushed down into sample When sampling and then filtering DataFrame, the SQL Optimizer will push down filter into sample and produce wrong result. This is due to the

spark git commit: [SPARK-11277][SQL] sort_array throws exception scala.MatchError

2015-10-27 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 17f499920 -> 958a0ec8f [SPARK-11277][SQL] sort_array throws exception scala.MatchError I'm new to spark. I was trying out the sort_array function then hit this exception. I looked into the spark source code. I found the root cause is that
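A `scala.MatchError` comes from a non-exhaustive pattern match, and the `sort_array` failure had exactly this shape: an element type the comparator match didn't cover. An illustrative sketch with made-up type objects, not Spark's actual code:

```scala
// Illustrative stand-ins for Catalyst data types.
sealed trait DataType
case object IntegerType extends DataType
case object NullType extends DataType

// A comparator factory whose match misses a case, mirroring the bug.
def lessThan(dt: Any): (Int, Int) => Boolean = dt match {
  case IntegerType => (a, b) => a < b
  // NullType (e.g. an array of nulls) is deliberately missing here.
}

assert(lessThan(IntegerType)(1, 2))

// Hitting the uncovered case throws scala.MatchError at runtime:
val threw =
  try { lessThan(NullType); false }
  catch { case _: MatchError => true }
assert(threw)
```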

spark git commit: [SPARK-11303][SQL] filter should not be pushed down into sample

2015-10-27 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 958a0ec8f -> 360ed832f [SPARK-11303][SQL] filter should not be pushed down into sample When sampling and then filtering DataFrame, the SQL Optimizer will push down filter into sample and produce wrong result. This is due to the sampler is
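Why pushdown through Sample is unsound: a sampler consumes one pseudo-random draw per input row, so filtering rows out first shifts every subsequent draw onto different rows. A self-contained sketch of the two evaluation orders (a toy sampler, not Spark's Sample operator):

```scala
import scala.util.Random

// Bernoulli-style sampler: one draw per incoming row.
def sample[T](rows: Seq[T], fraction: Double, seed: Long): Seq[T] = {
  val rng = new Random(seed)
  rows.filter(_ => rng.nextDouble() < fraction)
}

val data = 1 to 100

// Correct order: sample first, then filter.
val sampleThenFilter = sample(data, 0.5, seed = 42L).filter(_ % 2 == 0)
// Pushed-down order: filter first, then sample (the draws land on different rows).
val filterThenSample = sample(data.filter(_ % 2 == 0), 0.5, seed = 42L)

// The seeded sampler itself is deterministic...
assert(sample(data, 0.5, 42L) == sample(data, 0.5, 42L))
// ...but the two orders generally select different rows, which is why the
// optimizer must not push filters through Sample.
assert(sampleThenFilter.forall(_ % 2 == 0))
assert(filterThenSample.forall(_ % 2 == 0))
```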

spark git commit: [SPARK-11216][SQL][FOLLOW-UP] add encoder/decoder for external row

2015-10-22 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master f6d06adf0 -> 42d225f44 [SPARK-11216][SQL][FOLLOW-UP] add encoder/decoder for external row address comments in https://github.com/apache/spark/pull/9184 Author: Wenchen Fan Closes #9212 from cloud-fan/encoder.

spark git commit: [SPARK-11088][SQL] Merges partition values using UnsafeProjection

2015-10-19 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 16906ef23 -> 8b877cc4e [SPARK-11088][SQL] Merges partition values using UnsafeProjection `DataSourceStrategy.mergeWithPartitionValues` is essentially a projection implemented in a quite inefficient way. This PR optimizes this method with

spark git commit: [SPARK-11135] [SQL] Exchange incorrectly skips sorts when existing ordering is non-empty subset of required ordering

2015-10-15 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 6a2359ff1 -> eb0b4d6e2 [SPARK-11135] [SQL] Exchange incorrectly skips sorts when existing ordering is non-empty subset of required ordering In Spark SQL, the Exchange planner tries to avoid unnecessary sorts in cases where the data has
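The bug in one example: an existing ordering that is only a subset of the required ordering does not satisfy it — rows sorted by column `a` alone are not necessarily sorted by (`a`, `b`), so the sort cannot be skipped:

```scala
val rows = Seq((1, 2), (1, 1), (2, 0))

val sortedByA  = rows.sortBy(_._1)               // satisfies ordering on a
val sortedByAB = rows.sortBy(r => (r._1, r._2))  // required ordering on (a, b)

// Ordered on `a` alone...
assert(sortedByA.map(_._1) == sortedByA.map(_._1).sorted)
// ...but not on (`a`, `b`): (1, 2) still precedes (1, 1).
assert(sortedByA != sortedByAB)
```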

spark git commit: [SPARK-11080] [SQL] Incorporate per-JVM id into ExprId to prevent unsafe cross-JVM comparisons

2015-10-13 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 0d1b73b78 -> ef72673b2 [SPARK-11080] [SQL] Incorporate per-JVM id into ExprId to prevent unsafe cross-JVM comparisons In the current implementation of named expressions' `ExprIds`, we rely on a per-JVM AtomicLong to ensure that
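The fix described here pairs the per-JVM counter with a JVM-unique id, so two ids minted in different JVMs can never compare equal by accident. A minimal sketch of that scheme (simplified, not Spark's exact `ExprId` definition):

```scala
import java.util.UUID
import java.util.concurrent.atomic.AtomicLong

// An id is the pair (counter value, id of the JVM that minted it).
case class ExprId(id: Long, jvmId: UUID)

object ExprId {
  private val curId = new AtomicLong()
  private val jvmId = UUID.randomUUID() // unique per JVM instance
  def newExprId(): ExprId = ExprId(curId.getAndIncrement(), jvmId)
}

val a = ExprId.newExprId()
val b = ExprId.newExprId()
assert(a != b) // distinct counter values within one JVM

// Same counter value minted by a *different* JVM no longer compares equal:
assert(a != a.copy(jvmId = UUID.randomUUID()))
```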

spark git commit: [SPARK-10389] [SQL] [1.5] support order by non-attribute grouping expression on Aggregate

2015-10-13 Thread marmbrus
Repository: spark Updated Branches: refs/heads/branch-1.5 15d2736af -> 94e6d8f72 [SPARK-10389] [SQL] [1.5] support order by non-attribute grouping expression on Aggregate backport https://github.com/apache/spark/pull/8548 to 1.5 Author: Wenchen Fan Closes #9102 from

spark git commit: [SPARK-11090] [SQL] Constructor for Product types from InternalRow

2015-10-13 Thread marmbrus
…Closes #9100 from marmbrus/productContructor. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/328d1b3e Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/328d1b3e Diff: http://git-wip-us.apache.org/repos/asf/spark/diff

spark git commit: [SPARK-11032] [SQL] correctly handle having

2015-10-13 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 328d1b3e4 -> e170c2216 [SPARK-11032] [SQL] correctly handle having We should not stop resolving having when the having condition is resolved, or something like `count(1)` will crash. Author: Wenchen Fan Closes #9105

spark git commit: [SPARK-8654] [SQL] Fix Analysis exception when using NULL IN (...)

2015-10-08 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 5c9fdf74e -> dcbd58a92 [SPARK-8654] [SQL] Fix Analysis exception when using NULL IN (...) In the analysis phase , while processing the rules for IN predicate, we compare the in-list types to the lhs expression type and generate cast

spark git commit: [SPARK-10998] [SQL] Show non-children in default Expression.toString

2015-10-08 Thread marmbrus
t#1, intField, IntegerType)` Author: Michael Armbrust <mich...@databricks.com> Closes #9022 from marmbrus/expressionToString. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/5c9fdf74 Tree: http://git-wip-us.apache.org/repos/

spark git commit: Revert [SPARK-8654] [SQL] Fix Analysis exception when using NULL IN

2015-10-08 Thread marmbrus
9034 from marmbrus/revert8654. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a8226a9f Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a8226a9f Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a8226a9f Branch: r

[1/2] spark git commit: [SPARK-10966] [SQL] Codegen framework cleanup

2015-10-07 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 9672602c7 -> f5d154bc7 http://git-wip-us.apache.org/repos/asf/spark/blob/f5d154bc/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala

[2/2] spark git commit: [SPARK-10966] [SQL] Codegen framework cleanup

2015-10-07 Thread marmbrus
a variable for input row instead of hardcoding "i" everywhere - rename `primitive` -> `value` (since its often actually an object) Author: Michael Armbrust <mich...@databricks.com> Closes #9006 from marmbrus/codegen-cleanup. Project: http://git-wip-us.apache.org/repos/asf/spar

spark git commit: [SPARK-10403] Allow UnsafeRowSerializer to work with tungsten-sort ShuffleManager

2015-09-23 Thread marmbrus
Repository: spark Updated Branches: refs/heads/branch-1.5 6c6cadb8f -> 64cc62cb5 [SPARK-10403] Allow UnsafeRowSerializer to work with tungsten-sort ShuffleManager This patch attempts to fix an issue where Spark SQL's UnsafeRowSerializer was incompatible with the `tungsten-sort`

spark git commit: [SPARK-10403] Allow UnsafeRowSerializer to work with tungsten-sort ShuffleManager

2015-09-23 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 27bfa9ab3 -> a18208047 [SPARK-10403] Allow UnsafeRowSerializer to work with tungsten-sort ShuffleManager This patch attempts to fix an issue where Spark SQL's UnsafeRowSerializer was incompatible with the `tungsten-sort` ShuffleManager.

spark git commit: [SPARK-10650] Clean before building docs

2015-09-17 Thread marmbrus
uce this is to `test:compile` before running `unidoc`. To prevent this from happening again, I've added a clean before doc generation. Author: Michael Armbrust <mich...@databricks.com> Closes #8787 from marmbrus/testsInDocs. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Com

spark git commit: [SPARK-10650] Clean before building docs

2015-09-17 Thread marmbrus
can reproduce this is to `test:compile` before running `unidoc`. To prevent this from happening again, I've added a clean before doc generation. Author: Michael Armbrust <mich...@databricks.com> Closes #8787 from marmbrus/testsInDocs. (cherry picked from commit e0dc2bc232206d2f4da4278502c1f88

spark git commit: [SPARK-10639] [SQL] Need to convert UDAF's result from scala to sql type

2015-09-17 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master e0dc2bc23 -> aad644fbe [SPARK-10639] [SQL] Need to convert UDAF's result from scala to sql type https://issues.apache.org/jira/browse/SPARK-10639 Author: Yin Huai Closes #8788 from yhuai/udafConversion. Project:

spark git commit: [SPARK-10639] [SQL] Need to convert UDAF's result from scala to sql type

2015-09-17 Thread marmbrus
Repository: spark Updated Branches: refs/heads/branch-1.5 fd58ed48d -> 464d6e7d1 [SPARK-10639] [SQL] Need to convert UDAF's result from scala to sql type https://issues.apache.org/jira/browse/SPARK-10639 Author: Yin Huai Closes #8788 from yhuai/udafConversion.

spark git commit: [SPARK-10475] [SQL] improve column prunning for Project on Sort

2015-09-15 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 841972e22 -> 31a229aa7 [SPARK-10475] [SQL] improve column pruning for Project on Sort Sometimes we can't push down the whole `Project` through `Sort`, but we still have a chance to push down part of it. Author: Wenchen Fan

spark git commit: [SPARK-10437] [SQL] Support aggregation expressions in Order By

2015-09-15 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master b42059d2e -> 841972e22 [SPARK-10437] [SQL] Support aggregation expressions in Order By JIRA: https://issues.apache.org/jira/browse/SPARK-10437 If an expression in `SortOrder` is a resolved one, such as `count(1)`, the corresponding rule

spark git commit: [SPARK-6981] [SQL] Factor out SparkPlanner and QueryExecution from SQLContext

2015-09-14 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 7e32387ae -> 64f04154e [SPARK-6981] [SQL] Factor out SparkPlanner and QueryExecution from SQLContext Alternative to PR #6122; in this case the refactored out classes are replaced by inner classes with the same name for backwards binary

spark git commit: [SPARK-7142] [SQL] Minor enhancement to BooleanSimplification Optimizer rule. Incorporate review comments

2015-09-11 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master d5d647380 -> 1eede3b25 [SPARK-7142] [SQL] Minor enhancement to BooleanSimplification Optimizer rule. Incorporate review comments Adding changes suggested by cloud-fan in #5700 cc marmbrus Author: Yash Datta <yash.da...@guav

spark git commit: [SPARK-7142] [SQL] Minor enhancement to BooleanSimplification Optimizer rule

2015-09-10 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 4f1daa1ef -> f892d927d [SPARK-7142] [SQL] Minor enhancement to BooleanSimplification Optimizer rule Use these in the optimizer as well: A and (not(A) or B) => A and B; not(A and B) => not(A) or not(B)
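The two rewrite rules named in the commit can be sketched on a toy boolean-expression AST. This is a hypothetical illustration, not Catalyst's actual `BooleanSimplification` rule or its expression classes:

```python
# Toy AST + one-pass bottom-up rewriter for the two SPARK-7142 rules:
#   A AND (NOT(A) OR B)  =>  A AND B
#   NOT(A AND B)         =>  NOT(A) OR NOT(B)   (De Morgan)
from dataclasses import dataclass


@dataclass(frozen=True)
class Expr: ...

@dataclass(frozen=True)
class Var(Expr):
    name: str

@dataclass(frozen=True)
class Not(Expr):
    child: Expr

@dataclass(frozen=True)
class And(Expr):
    left: Expr
    right: Expr

@dataclass(frozen=True)
class Or(Expr):
    left: Expr
    right: Expr


def simplify(e: Expr) -> Expr:
    """Simplify children first, then try each rewrite at this node."""
    if isinstance(e, And):
        l, r = simplify(e.left), simplify(e.right)
        # A AND (NOT(A) OR B) => A AND B  (either operand order)
        if isinstance(r, Or) and r.left == Not(l):
            return And(l, r.right)
        if isinstance(l, Or) and l.left == Not(r):
            return And(r, l.right)
        return And(l, r)
    if isinstance(e, Not):
        c = simplify(e.child)
        # NOT(A AND B) => NOT(A) OR NOT(B)
        if isinstance(c, And):
            return Or(Not(c.left), Not(c.right))
        return Not(c)
    if isinstance(e, Or):
        return Or(simplify(e.left), simplify(e.right))
    return e
```

Frozen dataclasses give structural equality for free, which is what makes the `r.left == Not(l)` pattern check work.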

spark git commit: [SPARK-10316] [SQL] respect nondeterministic expressions in PhysicalOperation

2015-09-08 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 5b2192e84 -> 5fd57955e [SPARK-10316] [SQL] respect nondeterministic expressions in PhysicalOperation We did a lot of special handling for non-deterministic expressions in `Optimizer`. However, `PhysicalOperation` just collects all

spark git commit: [SPARK-10441] [SQL] Save data correctly to json.

2015-09-08 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master f7b55dbfc -> 7a9dcbc91 [SPARK-10441] [SQL] Save data correctly to json. https://issues.apache.org/jira/browse/SPARK-10441 Author: Yin Huai Closes #8597 from yhuai/timestampJson. Project:

spark git commit: [SPARK-10327] [SQL] Cache Table is not working while subquery has alias in its project list

2015-09-08 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 52b24a602 -> d637a666d [SPARK-10327] [SQL] Cache Table is not working while subquery has alias in its project list ```scala import org.apache.spark.sql.hive.execution.HiveTableScan sql("select key, value, key + 1 from

spark git commit: [HOTFIX] Fix build break caused by #8494

2015-09-08 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master d637a666d -> 2143d592c [HOTFIX] Fix build break caused by #8494 Author: Michael Armbrust <mich...@databricks.com> Closes #8659 from marmbrus/testBuildBreak. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: h

spark git commit: [SPARK-10034] [SQL] add regression test for Sort on Aggregate

2015-09-02 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master c3b881a7d -> 56c4c172e [SPARK-10034] [SQL] add regression test for Sort on Aggregate Before #8371, there was a bug for `Sort` on `Aggregate`: we couldn't use aggregate expressions named `_aggOrdering` and couldn't use more than one ordering

spark git commit: [SPARK-10389] [SQL] support order by non-attribute grouping expression on Aggregate

2015-09-02 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 56c4c172e -> fc4830779 [SPARK-10389] [SQL] support order by non-attribute grouping expression on Aggregate For example, we can write `SELECT MAX(value) FROM src GROUP BY key + 1 ORDER BY key + 1` in PostgreSQL, and we should support this
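The PostgreSQL behavior the commit cites can be reproduced with any engine that supports ordering by a non-attribute grouping expression; SQLite is used below as a stand-in, with a hypothetical `src(key, value)` table matching the query in the commit message:

```python
# Demonstrating ORDER BY on a grouping expression that is not a plain
# column attribute (SQLite stand-in for the PostgreSQL behavior that
# SPARK-10389 brings to Spark SQL).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (key INTEGER, value INTEGER)")
conn.executemany(
    "INSERT INTO src VALUES (?, ?)",
    [(1, 10), (1, 20), (2, 5), (3, 7)],
)
# Group on `key + 1` (an expression, not an attribute) and order by it
rows = conn.execute(
    "SELECT MAX(value) FROM src GROUP BY key + 1 ORDER BY key + 1"
).fetchall()
print(rows)  # [(20,), (5,), (7,)] — groups key+1 = 2, 3, 4 in order
```

The interesting part is that `key + 1` in the ORDER BY must be resolved against the grouping expressions, not against the post-aggregation output attributes.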

spark git commit: [SPARK-10289] [SQL] A direct write API for testing Parquet

2015-08-29 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 5369be806 -> 24ffa85c0 [SPARK-10289] [SQL] A direct write API for testing Parquet This PR introduces a direct write API for testing Parquet. It's a DSL flavored version of the [`writeDirect` method][1] that comes with parquet-avro testing
