[GitHub] spark pull request #20587: Branch 2.2

2018-02-12 Thread zhuge134
GitHub user zhuge134 opened a pull request:

https://github.com/apache/spark/pull/20587

Branch 2.2



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apache/spark branch-2.2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20587.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20587


commit 4e53a4edd72e372583f243c660bbcc0572205716
Author: Tathagata Das 
Date:   2017-07-06T07:20:26Z

[SS][MINOR] Fix flaky test in DatastreamReaderWriterSuite. temp checkpoint 
dir should be deleted

## What changes were proposed in this pull request?

Stopping a query while it is being initialized can throw an interrupt exception, 
in which case temporary checkpoint directories will not be deleted, and the 
test will fail.
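A minimal sketch of the idea behind the fix (illustrative Python, not the suite's Scala code; `run_with_temp_checkpoint` is a hypothetical helper): the temporary checkpoint directory must be removed even when stopping the query raises an interrupt mid-initialization, which a `try`/`finally` guarantees.

```python
import os
import shutil
import tempfile

def run_with_temp_checkpoint(work):
    """Run `work` with a temporary checkpoint dir that is always cleaned up."""
    path = tempfile.mkdtemp(prefix="temporary-checkpoint-")
    try:
        work(path)  # may raise while the query is still initializing
    finally:
        shutil.rmtree(path, ignore_errors=True)  # cleanup runs even on interrupt
    return path

seen = {}

def interrupted(path):
    seen["path"] = path
    raise KeyboardInterrupt("query stopped during initialization")

try:
    run_with_temp_checkpoint(interrupted)
except KeyboardInterrupt:
    pass

print(os.path.exists(seen["path"]))  # False: the directory was cleaned up
```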

Author: Tathagata Das 

Closes #18442 from tdas/DatastreamReaderWriterSuite-fix.

(cherry picked from commit 60043f22458668ac7ecba94fa78953f23a6bdcec)
Signed-off-by: Tathagata Das 

commit 576fd4c3a67b4affc5ac50979e27ae929472f0d9
Author: Tathagata Das 
Date:   2017-07-07T00:28:20Z

[SPARK-21267][SS][DOCS] Update Structured Streaming Documentation

## What changes were proposed in this pull request?

Few changes to the Structured Streaming documentation
- Clarify that the entire stream input table is not materialized
- Add information for Ganglia
- Add Kafka Sink to the main docs
- Removed a couple of leftover experimental tags
- Added more associated reading material and talk videos.

In addition, https://github.com/apache/spark/pull/16856 broke the link to 
the RDD programming guide in several places while renaming the page. This PR 
fixes those links (cc sameeragarwal, cloud-fan).
- Added a redirection to avoid breaking internal and possible external 
links.
- Removed unnecessary redirection pages that were there since the separate 
scala, java, and python programming guides were merged together in 2013 or 2014.


Author: Tathagata Das 

Closes #18485 from tdas/SPARK-21267.

(cherry picked from commit 0217dfd26f89133f146197359b556c9bf5aca172)
Signed-off-by: Shixiong Zhu 

commit ab12848d624f6b74d401e924255c0b4fcc535231
Author: Prashant Sharma 
Date:   2017-07-08T06:33:12Z

[SPARK-21069][SS][DOCS] Add rate source to programming guide.

## What changes were proposed in this pull request?

SPARK-20979 added a new structured streaming source: Rate source. This 
patch adds the corresponding documentation to programming guide.
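As a rough sketch of the rate source's contract (illustrative Python, not Spark code; `rate_rows` is a hypothetical stand-in): it emits rows of (timestamp, value) where value is a monotonically increasing long, at a configured rows-per-second rate.

```python
def rate_rows(rows_per_second, seconds, start_ts=0.0):
    """Yield (timestamp, value) pairs the way a rate source would."""
    value = 0
    for s in range(seconds):
        for i in range(rows_per_second):
            # Timestamps advance evenly within each second; values never repeat.
            yield (start_ts + s + i / rows_per_second, value)
            value += 1

rows = list(rate_rows(rows_per_second=3, seconds=2))
print(len(rows))               # 6
print([v for _, v in rows])    # [0, 1, 2, 3, 4, 5]
```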

## How was this patch tested?

Tested by running jekyll locally.

Author: Prashant Sharma 
Author: Prashant Sharma 

Closes #18562 from ScrapCodes/spark-21069/rate-source-docs.

(cherry picked from commit d0bfc6733521709e453d643582df2bdd68f28de7)
Signed-off-by: Shixiong Zhu 

commit 7d0b1c927d92cc2a4932262514ffd12c47593b80
Author: Bogdan Raducanu 
Date:   2017-07-08T12:14:59Z

[SPARK-21228][SQL][BRANCH-2.2] InSet incorrect handling of structs

## What changes were proposed in this pull request?

This is backport of https://github.com/apache/spark/pull/18455
When data type is struct, InSet now uses TypeUtils.getInterpretedOrdering 
(similar to EqualTo) to build a TreeSet. In other cases it will use a HashSet 
as before (which should be faster). Similarly, In.eval uses Ordering.equiv 
instead of equals.
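A small sketch of why the change matters (illustrative Python, not Spark's Scala; `Struct` and `interpreted_key` are hypothetical stand-ins): a hash-based set relies on value-based equals/hashCode, which interpreted struct rows lack, so equal-valued structs are missed; an explicit field-by-field ordering, analogous to `TypeUtils.getInterpretedOrdering`, restores value semantics.

```python
class Struct:
    """A struct-like value with no value-based __eq__/__hash__,
    analogous to an interpreted row."""
    def __init__(self, a, b):
        self.a, self.b = a, b

def interpreted_key(s):
    # Stand-in for an interpreted ordering: compare field by field.
    return (s.a, s.b)

members = {Struct(1, 2)}       # hash-based membership
probe = Struct(1, 2)           # equal contents, different object
print(probe in members)        # False: identity-based hashing misses it

# Value-based comparison over an ordered collection finds the match.
ordered = sorted([Struct(1, 2)], key=interpreted_key)
found = any(interpreted_key(m) == interpreted_key(probe) for m in ordered)
print(found)                   # True
```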

## How was this patch tested?
New test in SQLQuerySuite.

Author: Bogdan Raducanu 

Closes #18563 from bogdanrdc/SPARK-21228-BRANCH2.2.

commit a64f10800244a8057f7f32c3d2f4a719c5080d05
Author: Dongjoon Hyun 
Date:   2017-07-08T12:16:47Z

[SPARK-21345][SQL][TEST][TEST-MAVEN] SparkSessionBuilderSuite should clean 
up stopped sessions.

`SparkSessionBuilderSuite` should clean up stopped sessions. Otherwise, it 
leaves behind some stopped `SparkContext`s interfering with other test suites 
using `SharedSQLContext`.

Recently, the master branch has been failing consecutively:
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/

Pass the Jenkins with an updated suite.
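The suite hygiene described above can be sketched like this (illustrative Python, not the ScalaTest code; `FakeSession` and `after_each` are hypothetical): track every session a test creates and stop them all afterwards, so nothing leaks into later suites.

```python
class FakeSession:
    """Minimal stand-in for a session holding a stoppable context."""
    def __init__(self):
        self.stopped = False

    def stop(self):
        self.stopped = True

class SessionSuite:
    def __init__(self):
        self.created = []

    def new_session(self):
        s = FakeSession()
        self.created.append(s)  # remember everything the test creates
        return s

    def after_each(self):
        # Teardown: stop all sessions so later suites see a clean slate.
        for s in self.created:
            s.stop()
        self.created.clear()

suite = SessionSuite()
s1, s2 = suite.new_session(), suite.new_session()
suite.after_each()
print(s1.stopped, s2.stopped)  # True True
```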

Author: Dongjoon Hyun 

Closes #18567 from dongjoon-hyun/SPARK-SESSION.

(cherry picked from commit 0b8dd2d08460f3e6eb578727d2c336b6f11959e7)
Signed-off-by: Wenchen Fan 

commit c8d7855b905742033b7588ce7ee28bc23de13709
Author: Marcelo Vanzin 
Date:   2017-07-08T16:24

[GitHub] spark pull request #20586: Branch 2.1

2018-02-12 Thread zhuge134
GitHub user zhuge134 opened a pull request:

https://github.com/apache/spark/pull/20586

Branch 2.1



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apache/spark branch-2.1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20586.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20586


commit 21afc4534f90e063330ad31033aa178b37ef8340
Author: Marcelo Vanzin 
Date:   2017-02-22T21:19:31Z

[SPARK-19652][UI] Do auth checks for REST API access (branch-2.1).

The REST API has a security filter that performs auth checks
based on the UI root's security manager. That works fine when
the UI root is the app's UI, but not when it's the history server.

In the SHS case, all users would be allowed to see all applications
through the REST API, even if the UI itself wouldn't be available
to them.

This change adds auth checks for each app access through the API
too, so that only authorized users can see the app's data.

The change also modifies the existing security filter to use
`HttpServletRequest.getRemoteUser()`, which is used in other
places. That is not necessarily the same as the principal's
name; for example, when using Hadoop's SPNEGO auth filter,
the remote user name has the realm information stripped, so it
matches the user name registered as the owner of the application.
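The realm-stripping and per-app check described above can be sketched as follows (illustrative Python, not the servlet filter's Java/Scala code; `remote_user` and `can_view` are hypothetical helpers): a SPNEGO principal such as "alice@EXAMPLE.COM" reduces to the short name "alice", which is what the application owner is registered as.

```python
def remote_user(principal):
    """Strip the Kerberos realm from a principal name (illustrative only)."""
    return principal.split("@", 1)[0]

def can_view(remote, app_owner, admins):
    # Per-app auth check: only the owner or an admin may see the app's data.
    return remote == app_owner or remote in admins

print(remote_user("alice@EXAMPLE.COM"))                            # alice
print(can_view(remote_user("alice@EXAMPLE.COM"), "alice", set()))  # True
print(can_view("bob", "alice", {"admin"}))                         # False
```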

I also renamed the UIRootFromServletContext trait to a more generic
name since I'm using it to store more context information now.

Tested manually with an authentication filter enabled.

Author: Marcelo Vanzin 

Closes #17019 from vanzin/SPARK-19652_2.1.

commit d30238f1b9096c9fd85527d95be639de9388fcc7
Author: actuaryzhang 
Date:   2017-02-23T19:12:02Z

[SPARK-19682][SPARKR] Issue warning (or error) when subset method "[[" 
takes vector index

## What changes were proposed in this pull request?
The `[[` method is supposed to take a single index and return a column. 
This is different from base R, which takes a vector index. We should check for 
this and issue a warning or error when a vector index is supplied (which is very 
likely given the behavior in base R).

Currently I'm issuing a warning message and just taking the first element of 
the vector index. We could change this to an error if that's better.
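The warn-and-take-first behavior can be sketched like this (illustrative Python, not SparkR's R code; `getitem` is a hypothetical stand-in): a single index selects a column, while a vector index triggers a warning and only its first element is used.

```python
import warnings

def getitem(columns, index):
    """Select one column; warn and use the first element if given a vector."""
    if isinstance(index, (list, tuple)):
        warnings.warn("[[ expects a single index; using only the first element")
        index = index[0]
    return columns[index]

cols = ["name", "age", "city"]
print(getitem(cols, 1))        # age
print(getitem(cols, [1, 2]))   # age (with a warning)
```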

## How was this patch tested?
new tests

Author: actuaryzhang 

Closes #17017 from actuaryzhang/sparkRSubsetter.

(cherry picked from commit 7bf09433f5c5e08154ba106be21fe24f17cd282b)
Signed-off-by: Felix Cheung 

commit 43084b3cc3918b720fe28053d2037fa22a71264e
Author: Herman van Hovell 
Date:   2017-02-23T22:58:02Z

[SPARK-19459][SQL][BRANCH-2.1] Support for nested char/varchar fields in ORC

## What changes were proposed in this pull request?
This is a backport of the two following commits:
https://github.com/apache/spark/commit/78eae7e67fd5dec0c2d5b1853ce86cd0f1ae & 
https://github.com/apache/spark/commit/de8a03e68202647555e30fffba551f65bc77608d

This PR adds support for ORC tables with (nested) char/varchar fields.

## How was this patch tested?
Added a regression test to `OrcSourceSuite`.

Author: Herman van Hovell 

Closes #17041 from hvanhovell/SPARK-19459-branch-2.1.

commit 66a7ca28a9de92e67ce24896a851a0c96c92aec6
Author: Takeshi Yamamuro 
Date:   2017-02-24T09:54:00Z

[SPARK-19691][SQL][BRANCH-2.1] Fix ClassCastException when calculating 
percentile of decimal column

## What changes were proposed in this pull request?
This is a backport of the following commit: 
https://github.com/apache/spark/commit/93aa4271596a30752dc5234d869c3ae2f6e8e723

This PR fixes the ClassCastException shown below:
```
scala> spark.range(10).selectExpr("cast (id as decimal) as x").selectExpr("percentile(x, 0.5)").collect()
java.lang.ClassCastException: org.apache.spark.sql.types.Decimal cannot be cast to java.lang.Number
  at org.apache.spark.sql.catalyst.expressions.aggregate.Percentile.update(Percentile.scala:141)
  at org.apache.spark.sql.catalyst.expressions.aggregate.Percentile.update(Percentile.scala:58)
  at org.apache.sp
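The shape of the fix can be sketched in a language-agnostic way (illustrative Python, not Spark's `Percentile` code; this `percentile` helper is hypothetical): instead of casting every input to a concrete numeric class, rely on the values' own ordering, which works for decimals and plain numbers alike.

```python
from decimal import Decimal

def percentile(values, p):
    """Nearest-rank percentile over any mutually comparable values."""
    s = sorted(values)                 # relies on ordering, not on a cast
    k = int(round(p * (len(s) - 1)))   # nearest-rank position
    return s[k]

# Decimal is not an int or float, but it orders correctly, so no cast is needed.
vals = [Decimal(i) for i in range(10)]
print(percentile(vals, 0.5))
```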