[GitHub] spark pull request #19541: ABCD

2017-10-20 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/19541


---




[GitHub] spark pull request #19541: ABCD

2017-10-20 Thread souravaswal
GitHub user souravaswal opened a pull request:

https://github.com/apache/spark/pull/19541

ABCD

## What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration 
tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, 
remove this)

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apache/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19541.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19541


commit 9e451bcf36151bf401f72dcd66001b9ceb079738
Author: Dongjoon Hyun 
Date:   2017-09-05T21:35:09Z

[MINOR][DOC] Update `Partition Discovery` section to enumerate all 
available file sources

## What changes were proposed in this pull request?

All built-in file-based data sources support `Partition Discovery`. We should 
update the document to state this clearly so that users can take advantage of it.
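
Partition discovery works the same way for any of these sources; a minimal sketch, assuming an illustrative Parquet layout such as `/data/events/year=2017/month=10/` (the path and column names are not from the PR):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("partition-discovery").getOrCreate()

// Reading the base path discovers `year` and `month` as partition columns
// from the directory names.
val df = spark.read.parquet("/data/events")
df.printSchema()
```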

**AFTER**

Screenshot: https://user-images.githubusercontent.com/9700541/30083628-14278908-9244-11e7-98dc-9ad45fe233a9.png

## How was this patch tested?

```
SKIP_API=1 jekyll serve --watch
```

Author: Dongjoon Hyun 

Closes #19139 from dongjoon-hyun/partitiondiscovery.

commit 6a2325448000ba431ba3b982d181c017559abfe3
Author: jerryshao 
Date:   2017-09-06T01:39:39Z

[SPARK-18061][THRIFTSERVER] Add spnego auth support for ThriftServer 
thrift/http protocol

Spark ThriftServer doesn't support SPNEGO authentication for the thrift/http 
protocol; this is mainly needed for the Knox + ThriftServer scenario. HiveServer2's 
CLIService already has code supporting this, so this change ports that code to the 
Spark ThriftServer.

Related Hive JIRA HIVE-6697.
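
A minimal configuration sketch, assuming HiveServer2-style keys read from `hive-site.xml`; the exact keys, principal, and keytab path are illustrative and should be checked against the Hive/Spark documentation for the deployed versions:

```scala
import org.apache.hadoop.hive.conf.HiveConf

// Illustrative SPNEGO settings for thrift-over-HTTP; principal and keytab are placeholders.
val conf = new HiveConf()
conf.set("hive.server2.transport.mode", "http")
conf.set("hive.server2.authentication", "KERBEROS")
conf.set("hive.server2.authentication.spnego.principal", "HTTP/_HOST@EXAMPLE.COM")
conf.set("hive.server2.authentication.spnego.keytab", "/etc/security/keytabs/spnego.service.keytab")
```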

Manual verification.

Author: jerryshao 

Closes #18628 from jerryshao/SPARK-21407.

Change-Id: I61ef0c09f6972bba982475084a6b0ae3a74e385e

commit 445f1790ade1c53cf7eee1f282395648e4d0992c
Author: jerryshao 
Date:   2017-09-06T04:28:54Z

[SPARK-9104][CORE] Expose Netty memory metrics in Spark

## What changes were proposed in this pull request?

This PR exposes Netty memory usage for Spark's `TransportClientFactory` and 
`TransportServer`, including the details of each direct arena and heap arena 
as well as aggregated metrics. The purpose of adding the Netty metrics is to better 
understand Netty's memory usage in Spark shuffle, RPC, and other network 
communication, and to guide the configuration of executor memory sizes.

This PR doesn't expose these metrics to any sink; to leverage this feature, they 
still need to be connected to the MetricsSystem or collected back to the driver 
for display.
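
For reference, Netty 4.1 already exposes per-allocator metrics; a minimal sketch of reading them directly (the default allocator here is only illustrative, whereas the PR wires up the allocators used inside `TransportClientFactory`/`TransportServer`):

```scala
import scala.collection.JavaConverters._
import io.netty.buffer.PooledByteBufAllocator

val metric = PooledByteBufAllocator.DEFAULT.metric()

// Aggregated usage across all arenas.
println(s"used direct memory: ${metric.usedDirectMemory()} bytes")
println(s"used heap memory:   ${metric.usedHeapMemory()} bytes")

// Per-arena details such as allocation/deallocation counts.
metric.directArenas().asScala.foreach { arena =>
  println(s"direct arena: allocations=${arena.numAllocations()}, deallocations=${arena.numDeallocations()}")
}
```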

## How was this patch tested?

Added a unit test to verify it; also manually verified in a real cluster.

Author: jerryshao 

Closes #18935 from jerryshao/SPARK-9104.

commit 4ee7dfe41b27abbd4c32074ecc8f268f6193c3f4
Author: Riccardo Corbella 
Date:   2017-09-06T07:22:57Z

[SPARK-21924][DOCS] Update structured streaming programming guide doc

## What changes were proposed in this pull request?

Update the sentence "For example, the data (12:09, cat) is out of order and 
late, and it falls in windows 12:05 - 12:15 and 12:10 - 12:20." to read "For 
example, the data (12:09, cat) is out of order and late, and it falls in 
windows 12:00 - 12:10 and 12:05 - 12:15." in the structured streaming 
programming guide.
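
For context, the sentence refers to the guide's sliding-window example; a minimal sketch of a 10-minute window sliding every 5 minutes (the DataFrame and column names are illustrative):

```scala
import org.apache.spark.sql.functions.{col, window}

// `events` is assumed to be a streaming DataFrame with `timestamp` and `word` columns.
val windowedCounts = events
  .groupBy(window(col("timestamp"), "10 minutes", "5 minutes"), col("word"))
  .count()

// With these windows, the late event (12:09, cat) falls into 12:00-12:10 and 12:05-12:15.
```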

Author: Riccardo Corbella 

Closes #19137 from riccardocorbella/bugfix.

commit 16c4c03c71394ab30c8edaf4418973e1a2c5ebfe
Author: Bryan Cutler 
Date:   2017-09-06T12:12:27Z

[SPARK-19357][ML] Adding parallel model evaluation in ML tuning

## What changes were proposed in this pull request?
Modified `CrossValidator` and `TrainValidationSplit` to be able to evaluate 
models in parallel for a given parameter grid. The level of parallelism is 
controlled by a parameter `numParallelEval`, which schedules a number of models 
to be trained/evaluated concurrently. This is a naive approach that does not 
check the cluster for needed resources, so care must be taken by the user to 
tune the parameter appropriately. The default value is `1`, which 
trains/evaluates in serial.
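
A minimal usage sketch; `setNumParallelEval` is a hypothetical setter mirroring the `numParallelEval` parameter described above (later Spark releases expose this as `parallelism`), so the commented line should be adapted to the actual method name:

```scala
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}

val lr = new LogisticRegression()
val grid = new ParamGridBuilder()
  .addGrid(lr.regParam, Array(0.01, 0.1, 1.0))
  .build()

val cv = new CrossValidator()
  .setEstimator(lr)
  .setEvaluator(new BinaryClassificationEvaluator())
  .setEstimatorParamMaps(grid)
  .setNumFolds(3)
  // .setNumParallelEval(4)   // hypothetical: evaluate up to 4 models concurrently

// val cvModel = cv.fit(trainingDF)   // trainingDF: labeled training data
```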

## How was this patch tested?