[GitHub] spark pull request #19187: Branch 2.1

2017-09-13 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/19187





[GitHub] spark pull request #19187: Branch 2.1

2017-09-11 Thread engineeyao
GitHub user engineeyao opened a pull request:

https://github.com/apache/spark/pull/19187

Branch 2.1

## What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration 
tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, 
remove this)

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apache/spark branch-2.1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19187.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19187


commit 62fab5beee147c90d8b7f8092b4ee76ba611ee8e
Author: uncleGen 
Date:   2017-02-07T05:03:20Z

[SPARK-19407][SS] defaultFS from FileSystem.get is used instead of resolving 
the filesystem from the URI scheme

## What changes were proposed in this pull request?

```
Caused by: java.lang.IllegalArgumentException: Wrong FS: s3a://**/checkpoint/7b2231a3-d845-4740-bfa3-681850e5987f/metadata, expected: file:///
  at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:649)
  at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:82)
  at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:606)
  at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:824)
  at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:601)
  at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:421)
  at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1426)
  at org.apache.spark.sql.execution.streaming.StreamMetadata$.read(StreamMetadata.scala:51)
  at org.apache.spark.sql.execution.streaming.StreamExecution.<init>(StreamExecution.scala:100)
  at org.apache.spark.sql.streaming.StreamingQueryManager.createQuery(StreamingQueryManager.scala:232)
  at org.apache.spark.sql.streaming.StreamingQueryManager.startQuery(StreamingQueryManager.scala:269)
  at org.apache.spark.sql.streaming.DataStreamWriter.start(DataStreamWriter.scala:262)
```

This can easily be replicated on a Spark standalone cluster by providing a 
checkpoint location URI with any scheme other than "file://" and not 
overriding the default filesystem in the config.

Workaround: pass `--conf spark.hadoop.fs.defaultFS=s3a://somebucket`, or set 
it in SparkConf or spark-defaults.conf.
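
The underlying issue, and the shape of the fix, can be sketched as follows 
(a minimal Scala sketch against Hadoop's FileSystem API; the checkpoint path 
is a placeholder, not taken from the patch):

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

val conf = new Configuration()
val checkpoint = new Path("s3a://somebucket/checkpoint/metadata") // placeholder path

// Buggy pattern: FileSystem.get(conf) returns the *default* filesystem
// (fs.defaultFS, typically file:///), which then rejects the s3a:// path
// with "Wrong FS: ... expected: file:///".
val wrongFs: FileSystem = FileSystem.get(conf)

// Correct pattern: resolve the filesystem from the path's own URI scheme,
// so an s3a:// checkpoint yields the S3A filesystem instance.
val rightFs: FileSystem = checkpoint.getFileSystem(conf)
```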

## How was this patch tested?

Existing unit tests.

Author: uncleGen 

Closes #16815 from uncleGen/SPARK-19407.

(cherry picked from commit 7a0a630e0f699017c7d0214923cd4aa0227e62ff)
Signed-off-by: Shixiong Zhu 

commit dd1abef138581f30ab7a8dfacb616fe7dd64b421
Author: Aseem Bansal 
Date:   2017-02-07T11:44:14Z

[SPARK-19444][ML][DOCUMENTATION] Fix missing imports in documentation

## What changes were proposed in this pull request?

SPARK-19444: add the imports that were missing from the documentation 
examples.
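
For illustration, a minimal sketch of the kind of fix involved, assuming a 
doc example that used ML Pipeline classes without showing their imports (the 
class names here are illustrative, not taken from the patch):

```scala
// Hypothetical doc example: without the imports below, the snippet would
// not compile when copied verbatim from the docs; the fix adds them.
import org.apache.spark.ml.{Pipeline, PipelineStage}
import org.apache.spark.ml.classification.LogisticRegression

val lr = new LogisticRegression().setMaxIter(10)
val pipeline = new Pipeline().setStages(Array[PipelineStage](lr))
```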

## How was this patch tested?

Manual

## Disclaimer

Contribution is original work and I license the work to the project under 
the project’s open source license

Author: Aseem Bansal 

Closes #16789 from anshbansal/patch-1.

(cherry picked from commit aee2bd2c7ee97a58f0adec82ec52e5625b39e804)
Signed-off-by: Sean Owen 

commit e642a07d57798f98b25ba08ed7ae3abe0f597941
Author: Tyson Condie 
Date:   2017-02-07T22:31:23Z

[SPARK-18682][SS] Batch Source for Kafka

Today you can start a stream that reads from Kafka. However, given Kafka's 
configurable retention period, sometimes you may just want to read all of the 
data that is available now, so a batch version that works with spark.read 
should be added as well.

The options should be the same as for the streaming Kafka source, with the 
following differences (see the sketch below):
- startingOffsets should default to earliest and should not allow latest 
(which would always be empty).
- endingOffsets should also be allowed and should default to latest; the same 
assign JSON format as startingOffsets should also be accepted.
- Ideally, things like .limit(n) would be enough to prevent all of the data 
from being read (this might just work).
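
As a usage illustration, a minimal sketch of such a batch read via spark.read 
(broker address and topic name are placeholders):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("kafka-batch").getOrCreate()

// Batch read from Kafka via spark.read rather than spark.readStream.
// startingOffsets defaults to "earliest", endingOffsets to "latest".
val df = spark.read
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092") // placeholder broker
  .option("subscribe", "topic1")                    // placeholder topic
  .option("startingOffsets", "earliest")
  .option("endingOffsets", "latest")
  .load()

df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)").show()
```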

KafkaRelationSuite was added for testing batch queries via KafkaUtils.

Author: Tyson Condie 

Closes