GitHub user zhizu2018 opened a pull request:
https://github.com/apache/spark/pull/19291
Branch 2.1
## What changes were proposed in this pull request?
(Please fill in changes proposed in this fix)
## How was this patch tested?
(Please explain how this patch was tested. E.g. unit tests, integration
tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise,
remove this)
Please review http://spark.apache.org/contributing.html before opening a
pull request.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/apache/spark branch-2.1
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/19291.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #19291
commit a3d5300a030fb5f1c275e671603e0745b6466735
Author: Stan Zhai
Date: 2017-02-09T20:01:25Z
[SPARK-19509][SQL] Grouping Sets do not respect nullable grouping columns
## What changes were proposed in this pull request?
The analyzer currently does not check if a column used in grouping sets is
actually nullable itself. This can cause the nullability of the column to be
incorrect, which can lead to null pointer exceptions down the line. This PR
fixes that by also considering the nullability of the column.
This is only a problem for Spark 2.1 and below. The latest master uses a
different approach.
Closes https://github.com/apache/spark/pull/16874
## How was this patch tested?
Added a regression test to `SQLQueryTestSuite.grouping_set`.
Author: Herman van Hovell
Closes #16873 from hvanhovell/SPARK-19509.
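The rule the fix enforces can be sketched with a minimal, Spark-free example (this is not the analyzer's actual code; `Column` and `outputNullable` are hypothetical stand-ins): a grouping column must be nullable in the output when some grouping set omits it, and, after the fix, also when the input column is nullable to begin with.

```scala
// Conceptual sketch, not Spark's analyzer: decide output nullability
// of a grouping column under GROUPING SETS.
case class Column(name: String, nullable: Boolean)

def outputNullable(col: Column, groupingSets: Seq[Set[String]]): Boolean = {
  // A set that omits the column fills it with NULL in the output.
  val omittedSomewhere = groupingSets.exists(set => !set.contains(col.name))
  // The bug: only `omittedSomewhere` was considered. The fix also keeps
  // the column nullable when the input column itself is nullable.
  omittedSomewhere || col.nullable
}

val sets = Seq(Set("a", "b"), Set("a"))
assert(outputNullable(Column("b", nullable = false), sets))  // omitted in Set("a")
assert(outputNullable(Column("a", nullable = true), sets))   // nullable input
assert(!outputNullable(Column("a", nullable = false), sets)) // present everywhere, non-null
```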
commit ff5818b8cee7c718ef5bdef125c8d6971d64acde
Author: Bogdan Raducanu
Date: 2017-02-10T09:50:07Z
[SPARK-19512][BACKPORT-2.1][SQL] codegen for compare structs fails #16852
## What changes were proposed in this pull request?
Set currentVars to null in `GenerateOrdering.genComparisons` before genCode
is called. genCode ignores INPUT_ROW if currentVars is not null, but in
genComparisons we want it to use INPUT_ROW.
## How was this patch tested?
Added a test with two queries in `WholeStageCodegenSuite`
Author: Bogdan Raducanu
Closes #16875 from bogdanrdc/SPARK-19512-2.1.
commit 7b5ea000e246f7052e7324fd7f2e99f32aaece17
Author: Burak Yavuz
Date: 2017-02-10T11:55:06Z
[SPARK-19543] from_json fails when the input row is empty
## What changes were proposed in this pull request?
Using `from_json` on a column with an empty string results in:
`java.util.NoSuchElementException: head of empty list`.
This is because `parser.parse(input)` may return `Nil` when
`input.trim.isEmpty`.
## How was this patch tested?
Regression test in `JsonExpressionsSuite`
Author: Burak Yavuz
Closes #16881 from brkyvz/json-fix.
(cherry picked from commit d5593f7f5794bd0343e783ac4957864fed9d1b38)
Signed-off-by: Herman van Hovell
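The failure mode above can be sketched without Spark (`parse` and `firstRowOrNull` are hypothetical stand-ins, not Spark's actual parser): calling `.head` on an empty list throws, while `headOption` lets an empty parse result map to a null/absent row.

```scala
// Stand-in for a parser that returns no rows for empty input.
def parse(input: String): List[Map[String, String]] =
  if (input.trim.isEmpty) Nil else List(Map("parsed" -> input))

// Unsafe pattern analogous to the bug:
// parse("").head  // throws java.util.NoSuchElementException: head of empty list

// Defensive pattern: treat an empty parse result as an absent row.
def firstRowOrNull(input: String): Option[Map[String, String]] =
  parse(input).headOption

assert(firstRowOrNull("").isEmpty)          // empty input no longer throws
assert(firstRowOrNull("""{"a":1}""").isDefined)
```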
commit e580bb035236dd92ade126af6bb98288d88179c4
Author: Andrew Ray
Date: 2016-12-13T07:49:22Z
[SPARK-18717][SQL] Make code generation for Scala Map work with
immutable.Map also
## What changes were proposed in this pull request?
Fixes compile errors in generated code when a user has a case class with a
`scala.collection.immutable.Map` instead of a `scala.collection.Map`. Since
`ArrayBasedMapData.toScalaMap` returns the immutable version, we can make it
work with both.
## How was this patch tested?
Additional unit tests.
Author: Andrew Ray
Closes #16161 from aray/fix-map-codegen.
(cherry picked from commit 46d30ac4846b3ec94426cc482c42cff72ebd6d92)
Signed-off-by: Cheng Lian
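The type relationship that makes this work can be shown in plain Scala (the case classes and `toScalaMap` below are illustrative stand-ins, not Spark's code): `scala.collection.immutable.Map` is a subtype of `scala.collection.Map`, so a converter that returns the immutable variant satisfies fields declared with either type.

```scala
import scala.collection.immutable

// Hypothetical user case classes, one per Map flavor.
case class WithGeneral(m: scala.collection.Map[String, Int])
case class WithImmutable(m: immutable.Map[String, Int])

// Stand-in for a converter that, like ArrayBasedMapData.toScalaMap,
// returns the immutable variant.
def toScalaMap(keys: Seq[String], values: Seq[Int]): immutable.Map[String, Int] =
  keys.zip(values).toMap

val m = toScalaMap(Seq("a", "b"), Seq(1, 2))
val g = WithGeneral(m)   // immutable Map assigned to the general type (subtyping)
val i = WithImmutable(m) // and to the immutable type directly
assert(g.m("a") == 1 && i.m("b") == 2)
```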
commit 173c2387a38b260b46d7646b332e404f6ebe1a17
Author: titicaca
Date: 2017-02-12T18:42:15Z
[SPARK-19342][SPARKR] bug fixed in collect method for collecting timestamp
column
## What changes were proposed in this pull request?
Fix a bug in the collect method for collecting a timestamp column. The bug
can be reproduced with the following code and output:
```
library(SparkR)
sparkR.session(master = "local")
df <- data.frame(col1 = c(0, 1, 2),
                 col2 = c(as.POSIXct("2017-01-01 00:00:01"), NA,
                          as.POSIXct("2017-01-01 12:00:01")))
sdf1 <- createDataFrame(df)
print(dtypes(sdf1))
df1 <- collect(sdf1)
print(lapply(df1, class))
sdf2 <- filter(sdf1, "col1 > 0")
print(dtypes(sdf2))
df2 <-