[GitHub] spark pull request #19209: Branch 2.2 udf nullability

2017-09-12 Thread ptkool
Github user ptkool closed the pull request at:

https://github.com/apache/spark/pull/19209





GitHub user ptkool opened a pull request:

https://github.com/apache/spark/pull/19209

Branch 2.2 udf nullability

## What changes were proposed in this pull request?

When registering a Python UDF, a user may know whether the function can
return null values. PythonUDF and all related classes should therefore
carry this nullability information and respect it.
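To illustrate what declaring nullability buys, here is a minimal pure-Python sketch. `UdfSpec`, `evaluate` and `needs_null_check` are hypothetical names and are not PySpark API. The sketch only models the contract: a UDF declared non-nullable lets consumers skip null checks, and a violation of that declaration can be detected when the UDF is evaluated.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class UdfSpec:
    """Hypothetical UDF descriptor: a function plus a declared nullability flag."""
    func: Callable[..., Any]
    nullable: bool = True  # default: assume the UDF may return None


def evaluate(spec: UdfSpec, *args: Any) -> Optional[Any]:
    """Call the UDF and enforce the declared nullability contract."""
    result = spec.func(*args)
    if result is None and not spec.nullable:
        # A UDF declared non-nullable must never produce None; fail loudly.
        raise ValueError("UDF declared non-nullable returned None")
    return result


def needs_null_check(spec: UdfSpec) -> bool:
    """Downstream code only has to guard against None for nullable UDFs."""
    return spec.nullable


# Usage: a doubling UDF that can never return None.
double = UdfSpec(lambda x: x * 2, nullable=False)
print(evaluate(double, 21), needs_null_check(double))
```

Keeping the flag next to the function means a planner could consult it without running the UDF. The runtime check exists only so that an incorrect declaration surfaces as an error rather than as silently wrong results.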

## How was this patch tested?

Existing tests and a few new tests.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Shopify/spark branch-2.2-udf_nullability

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19209.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19209


commit cfa6bcbe83b9a4b9607e23ac889963b6aa02f0d9
Author: Ryan Blue 
Date:   2017-05-01T21:48:02Z

[SPARK-20540][CORE] Fix unstable executor requests.

There are two problems fixed in this commit. First, the
ExecutorAllocationManager sets a timeout to avoid requesting executors
too often. However, the next deadline is always computed from its previous
value plus the timeout, not from the current time. If the call is delayed
by locking for more than the ongoing scheduler timeout, the manager will
request more executors on every run. This seems to be the main cause of
SPARK-20540.
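The first problem can be reproduced with a toy model. The loop below is hypothetical and is not the ExecutorAllocationManager code. It requests executors whenever the current time reaches a deadline, and it compares two rules: advancing the deadline from its previous value (the buggy behavior described above) versus recomputing it from the current time. Times are integer milliseconds.

```python
from typing import List


def count_requests(run_times: List[int], timeout: int, advance_from_now: bool) -> int:
    """Count how many scheduler runs trigger an executor request.

    run_times: times (ms) at which the hypothetical manager loop runs.
    timeout: minimum spacing (ms) between requests.
    advance_from_now: True  -> next deadline = now + timeout (fixed rule)
                      False -> next deadline = deadline + timeout (buggy rule)
    """
    deadline = run_times[0]
    requests = 0
    for now in run_times:
        if now >= deadline:
            requests += 1
            if advance_from_now:
                deadline = now + timeout
            else:
                deadline = deadline + timeout
    return requests


# The loop runs at t=0, is then stalled (e.g. by locking) until t=10000,
# and afterwards runs every 100 ms. The timeout is 1000 ms.
runs = [0, 10000, 10100, 10200, 10300, 10400]
print(count_requests(runs, 1000, advance_from_now=False))  # buggy rule
print(count_requests(runs, 1000, advance_from_now=True))   # fixed rule
```

With the buggy rule the deadline lags far behind the clock after the stall, so every one of the six runs issues a request. With the fixed rule only the first run and the first run after the stall do.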

The second problem is that the total number of requested executors is
not tracked by the CoarseGrainedSchedulerBackend. Instead, it calculates
the value based on the current status of 3 variables: the number of
known executors, the number of executors that have been killed, and the
number of pending executors. However, the number of pending executors is
never less than 0, even though there may be more known executors than
requested. When executors are killed and not replaced, this can cause the
request sent to YARN to be incorrect: there were too many executors because
the scheduler's state was slightly out of date. This is fixed by tracking
the currently requested size explicitly.
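The effect of clamping the pending count can be shown with simple arithmetic. The following is a simplified, hypothetical model and is not the CoarseGrainedSchedulerBackend code. It assumes the derived request is "known minus killed plus pending" and that pending is "requested minus live executors", clamped at 0.

```python
def derived_request(known: int, killed: int, requested: int) -> int:
    """Buggy model: reconstruct the request from state, clamping pending at 0."""
    live = known - killed
    # Pending = executors still to arrive; it is never allowed to go negative,
    # so any surplus of live executors over the request is silently lost.
    pending = max(requested - live, 0)
    return live + pending


def tracked_request(requested: int) -> int:
    """Fixed model: the requested total is stored explicitly and sent as-is."""
    return requested


# 5 known executors, 1 being killed, but only 3 actually requested.
print(derived_request(known=5, killed=1, requested=3))  # overestimates
print(tracked_request(requested=3))
```

Whenever there are more live executors than requested, the clamp makes the derived value equal the live count rather than the real request (4 instead of 3 here). Storing the requested size directly avoids reconstructing it from state that may be out of date.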

## How was this patch tested?

Existing tests.

Author: Ryan Blue 

Closes #17813 from rdblue/SPARK-20540-fix-dynamic-allocation.

(cherry picked from commit 2b2dd08e975dd7fbf261436aa877f1d7497ed31f)
Signed-off-by: Marcelo Vanzin 

commit 5a0a8b0396df2feadb8333876cc08edf219fa177
Author: Sean Owen 
Date:   2017-05-02T00:01:05Z

[SPARK-20459][SQL] JdbcUtils throws IllegalStateException: Cause already initialized after getting SQLException

## What changes were proposed in this pull request?

Avoid failing in initCause on a JDBC exception whose cause was already
initialized to null.
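In Java, a Throwable's cause can be initialized only once, either by a constructor or by `initCause`. Passing `null` to a constructor's cause argument still counts as initialized, so a later `initCause` throws `IllegalStateException`. The Python sketch below emulates that rule. `JavaLikeError` and `attach_next_exception` are hypothetical names. The fallback of recording the exception as "suppressed" is an illustrative choice and may not match what the patch actually does.

```python
_UNSET = object()  # sentinel: the cause has never been initialized


class JavaLikeError(Exception):
    """Hypothetical emulation of java.lang.Throwable's cause rules."""

    def __init__(self, message, cause=_UNSET):
        super().__init__(message)
        self._cause = cause  # passing cause=None means "initialized to null"
        self.suppressed = []

    def get_cause(self):
        return None if self._cause is _UNSET else self._cause

    def init_cause(self, cause):
        if self._cause is not _UNSET:
            # Java throws IllegalStateException("Can't overwrite cause ...") here.
            raise RuntimeError("Cause already initialized")
        self._cause = cause


def attach_next_exception(err, nxt):
    """Attach nxt as err's cause if allowed, otherwise record it as suppressed."""
    if err.get_cause() is None:
        try:
            err.init_cause(nxt)
            return
        except RuntimeError:
            pass  # cause was explicitly initialized to null; cannot set it
    err.suppressed.append(nxt)
```

A `None` result from `get_cause()` is ambiguous: it can mean "never set" or "set to null". For that reason the sketch tries `init_cause` and handles the failure instead of relying on the getter alone.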

## How was this patch tested?

Existing tests

Author: Sean Owen 

Closes #17800 from srowen/SPARK-20459.

(cherry picked from commit af726cd6117de05c6e3b9616b8699d884a53651b)
Signed-off-by: Xiao Li 

commit b7c1c2f973635a2ec05aedd89456765d830dfdce
Author: Felix Cheung 
Date:   2017-05-02T04:03:48Z

[SPARK-20192][SPARKR][DOC] SparkR migration guide to 2.2.0

## What changes were proposed in this pull request?

Update the R Programming Guide.

## How was this patch tested?

manually

Author: Felix Cheung 

Closes #17816 from felixcheung/r22relnote.

(cherry picked from commit d20a976e8918ca8d607af452301e8014fe14e64a)
Signed-off-by: Felix Cheung 

commit b146481fff1ce529245f9c03b35c73ea604712d0
Author: Kazuaki Ishizaki 
Date:   2017-05-02T05:56:41Z

[SPARK-20537][CORE] Fixing OffHeapColumnVector reallocation

## What changes were proposed in this pull request?

As #17773 revealed, `OnHeapColumnVector` may copy only part of the original
storage.

`OffHeapColumnVector` reallocation likewise copies data to the new storage
only up to `elementsAppended`. This variable is only updated when using the
`ColumnVector.appendX` API, while `ColumnVector.putX` is more commonly used.
This PR copies data to the new storage up to the previously allocated size
in `OffHeapColumnVector`.
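The data loss is easy to reproduce in a toy model. `ToyColumnVector` below is a hypothetical Python stand-in, not the Java class. As the text describes, only `append` advances `elements_appended`, while `put` writes by row index. Reallocation copies either `elements_appended` values (the old behavior) or the whole previously allocated capacity (the fix).

```python
class ToyColumnVector:
    """Hypothetical column vector with put/append, modeled on the PR text."""

    def __init__(self, capacity: int, copy_up_to_capacity: bool):
        self.data = [0] * capacity
        self.capacity = capacity
        self.elements_appended = 0  # advanced only by append()
        self.copy_up_to_capacity = copy_up_to_capacity

    def put(self, row: int, value: int) -> None:
        # Random-access write: does NOT update elements_appended.
        self.data[row] = value

    def append(self, value: int) -> None:
        if self.elements_appended >= self.capacity:
            self.reserve(max(1, self.capacity * 2))
        self.data[self.elements_appended] = value
        self.elements_appended += 1

    def reserve(self, new_capacity: int) -> None:
        if new_capacity <= self.capacity:
            return
        new_data = [0] * new_capacity
        # Old behavior copies elements_appended values; the fix copies the
        # whole previously allocated capacity.
        n = self.capacity if self.copy_up_to_capacity else self.elements_appended
        new_data[:n] = self.data[:n]
        self.data = new_data
        self.capacity = new_capacity


for fixed in (False, True):
    vec = ToyColumnVector(4, copy_up_to_capacity=fixed)
    for row in range(4):
        vec.put(row, row + 1)  # putX-style writes leave elements_appended at 0
    vec.reserve(8)
    print(fixed, vec.data[:4])
```

With the old rule the four values written through `put` are lost on reallocation, because `elements_appended` is still 0. Copying up to the old capacity preserves them regardless of which API wrote the data.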

## How was this patch tested?

Existing test suites

Author: Kazuaki Ishizaki 

Closes #17811 from kiszk/SPARK-20537.

(cherry picked from commit afb21bf22a59c9416c04637412fb69d1442e6826)
Signed-off-by: Wenchen Fan 

commit ef5e2a0509801f6afced3bc80f8d700acf84e0dd
Author: Burak Yavuz 
Date:   2017-05-02T06:08:16Z

[SPARK-20549] java.io.CharConversionException: Invalid UTF-32' in JsonToStructs

## What changes were proposed in this pull request?

A fix for the same problem was made in #17693 but did not cover
`JsonToStructs`. This PR applies the same fix to `JsonToStructs`.

## How was this patch tested?

Regression test

Author: Burak Yavuz 

Closes #17826 from b