[GitHub] spark pull request #20262: Branch 2.2

2018-02-01 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20262


---




[GitHub] spark pull request #20262: Branch 2.2

2018-01-13 Thread alanyeok
GitHub user alanyeok opened a pull request:

https://github.com/apache/spark/pull/20262

Branch 2.2

## What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration 
tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, 
remove this)

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apache/spark branch-2.2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20262.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20262


commit 8b08fd06c0e22e7967c05aee83654b6be446efb4
Author: Herman van Hovell 
Date:   2017-06-30T04:34:09Z

[SPARK-21258][SQL] Fix WindowExec complex object aggregation with spilling

## What changes were proposed in this pull request?
`WindowExec` currently improperly stores complex objects (UnsafeRow, 
UnsafeArrayData, UnsafeMapData, UTF8String) during aggregation by keeping a 
reference in the buffer used by `GeneratedMutableProjections` to the actual 
input data. Things go wrong when the input object (or its backing bytes) is 
reused for other purposes. This can happen in window functions once they start 
spilling to disk. When reading back the spill files, the 
`UnsafeSorterSpillReader` reuses the buffer to which the `UnsafeRow` points, 
leading to strange corruption scenarios. Note that this only happens for 
aggregate functions that preserve (parts of) their input, for example `FIRST`, 
`LAST`, `MIN` & `MAX`.
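
The object-reuse hazard described above can be illustrated with a minimal, hypothetical sketch in plain Python (not Spark code): an aggregator that keeps a *reference* into a buffer that a spill reader reuses sees its buffered "first" value silently change, while one that defensively copies the bytes is safe. All names here (`first_by_reference`, `reused_rows`, etc.) are illustrative, not Spark APIs.

```python
def first_by_reference(rows):
    """Buggy FIRST: keeps a reference to the (reusable) input buffer."""
    first = None
    for row in rows:
        if first is None:
            first = row          # aliasing: row's backing buffer may be reused
    return first

def first_by_copy(rows):
    """Fixed FIRST: defensively copies the input before buffering it."""
    first = None
    for row in rows:
        if first is None:
            first = bytes(row)   # snapshot the bytes, immune to later reuse
    return first

# Simulate a spill reader that reuses one buffer for every row it returns,
# analogous to UnsafeSorterSpillReader reusing the bytes an UnsafeRow points to.
buf = bytearray(3)
def reused_rows(values):
    for v in values:
        buf[:] = v               # overwrite the shared buffer in place
        yield buf

ref = first_by_reference(reused_rows([b"aaa", b"bbb", b"ccc"]))
cpy = first_by_copy(reused_rows([b"aaa", b"bbb", b"ccc"]))
print(bytes(ref))  # b'ccc' -- corrupted: the buffer was overwritten after buffering
print(cpy)         # b'aaa' -- correct first value
```

The surgical fix in this PR corresponds to the copying variant: always snapshot the input instead of assuming the backing memory stays stable.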

This was not seen before because the spilling logic rarely performed 
actual spills and instead used an in-memory page. This page was not cleaned up 
during window processing, which ensured that unsafe objects pointed to their 
own dedicated memory locations. This changed with 
https://github.com/apache/spark/pull/16909; after that PR, Spark spills more 
eagerly.

This PR provides a surgical fix because we are close to releasing Spark 
2.2. The change simply makes sure that there cannot be any object reuse, at the 
expense of a small amount of performance. We will follow up with a more subtle 
solution at a later point.

## How was this patch tested?
Added a regression test to `DataFrameWindowFunctionsSuite`.

Author: Herman van Hovell 

Closes #18470 from hvanhovell/SPARK-21258.

(cherry picked from commit e2f32ee45ac907f1f53fde7e412676a849a94872)
Signed-off-by: Wenchen Fan 

commit 29a0be2b3d42bfe991f47725f077892918731e08
Author: Xiao Li 
Date:   2017-06-30T21:23:56Z

[SPARK-21129][SQL] Arguments of SQL function call should not be named 
expressions

### What changes were proposed in this pull request?

Function arguments should not be named expressions. This could cause two 
issues:
- A misleading error message
- Unexpected query results when the column name is `distinct`, which is not 
a reserved word in our parser.

```
spark-sql> select count(distinct c1, distinct c2) from t1;
Error in query: cannot resolve '`distinct`' given input columns: [c1, c2]; 
line 1 pos 26;
'Project [unresolvedalias('count(c1#30, 'distinct), None)]
+- SubqueryAlias t1
   +- CatalogRelation `default`.`t1`, 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [c1#30, c2#31]
```

After the fix, the error message becomes
```
spark-sql> select count(distinct c1, distinct c2) from t1;
Error in query:
extraneous input 'c2' expecting {')', ',', '.', '[', 'OR', 'AND', 'IN', 
NOT, 'BETWEEN', 'LIKE', RLIKE, 'IS', EQ, '<=>', '<>', '!=', '<', LTE, '>', GTE, 
'+', '-', '*', '/', '%', 'DIV', '&', '|', '||', '^'}(line 1, pos 35)

== SQL ==
select count(distinct c1, distinct c2) from t1
---^^^
```
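
The contrast between the two parser behaviors can be sketched with a toy, hypothetical Python model (it elides the real grammar's handling of the DISTINCT set quantifier and of everything except whitespace-separated identifiers): if each function argument is parsed as a *named* expression (`expr [alias]`), a stray keyword like `distinct` is silently swallowed as an expression with an alias, producing the misleading "cannot resolve" error later; if arguments must be plain expressions, the extra token is rejected at parse time.

```python
def parse_arg_as_named_expression(raw):
    """Old behavior: an argument may be `expr [alias]`, so `distinct c2`
    becomes column `distinct` aliased as `c2` (misleading later failure)."""
    tokens = raw.split()
    if len(tokens) == 2:
        return {"expr": tokens[0], "alias": tokens[1]}
    return {"expr": tokens[0], "alias": None}

def parse_arg_as_plain_expression(raw):
    """Fixed behavior: an argument must be a single expression; any extra
    token is an immediate parse error, as in the corrected message above."""
    tokens = raw.split()
    if len(tokens) != 1:
        raise SyntaxError(f"extraneous input {tokens[1]!r}")
    return {"expr": tokens[0]}

print(parse_arg_as_named_expression("distinct c2"))
# {'expr': 'distinct', 'alias': 'c2'}  -- alias silently accepted
try:
    parse_arg_as_plain_expression("distinct c2")
except SyntaxError as e:
    print(e)  # extraneous input 'c2'
```

In the real fix, the analogous change is in the SQL grammar: the argument list of a function call accepts expressions rather than namedExpressions.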

### How was this patch tested?
Added a test case to parser suite.

Author: Xiao Li 
Author: gatorsmile 

Closes #18338 from gatorsmile/parserDistinctAggFunc.

(cherry picked from commit eed9c4ef859fdb75a816a3e0ce2d593b34b23444)
Signed-off-by: gatorsmile 

commit a2c7b2133cfee7fa9abfaa2bfbfb637155466783
Author: Patrick Wendell 
Date:   2017-06-30T22:54:34Z

Preparing Spark release v2.2.0-rc6

commit 85fddf406429dac00ddfb2e6c30870da450455bd
Author: Patrick Wendell 
Date:   2017-06-30T22:54:39Z

Preparing development version 2.2.1-SNAPSHOT

commit