[GitHub] spark issue #19861: [SPARK-22387][SQL] Propagate session configs to data sou...

2017-12-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19861
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19861: [SPARK-22387][SQL] Propagate session configs to data sou...

2017-12-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19861
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84995/
Test FAILed.


---




[GitHub] spark issue #19861: [SPARK-22387][SQL] Propagate session configs to data sou...

2017-12-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19861
  
**[Test build #84995 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84995/testReport)**
 for PR 19861 at commit 
[`5292329`](https://github.com/apache/spark/commit/52923296a946ac734c988fe10725921ea3c2b313).
 * This patch **fails SparkR unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19998: [SPARK-22813][BUILD] Use lsof or /usr/sbin/lsof in run-t...

2017-12-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19998
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84993/
Test FAILed.


---




[GitHub] spark issue #18029: [SPARK-20168] [DStream] Add changes to use kinesis fetch...

2017-12-15 Thread yashs360
Github user yashs360 commented on the issue:

https://github.com/apache/spark/pull/18029
  
Hi @brkyvz. Thinking along these lines, adding them as Java objects adds more
complexity to our design: we again have to think about making the objects
singletons and thread safe. The Scala case classes were very simple and minimal.

This is how we would have to implement the Java classes for the initial
positions. It looks a bit unclean to me. Thoughts?

```java
import java.util.Date;
// InitialPositionInStream comes from the Kinesis Client Library.
import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream;

abstract class InitialPosition {
    public static final InitialPositionInStream initialPositionInStream =
        InitialPositionInStream.LATEST;
}

class Latest extends InitialPosition {
    private static final Latest instance = new Latest();
    static final InitialPositionInStream initialPositionInStream =
        InitialPositionInStream.LATEST;

    private Latest() {}

    public static InitialPosition getInstance() {
        return instance;
    }
}

class TrimHorizon extends InitialPosition {
    private static final TrimHorizon instance = new TrimHorizon();
    static final InitialPositionInStream initialPositionInStream =
        InitialPositionInStream.TRIM_HORIZON;

    private TrimHorizon() {}

    public static InitialPosition getInstance() {
        return instance;
    }
}

class AtTimestamp extends InitialPosition {
    static final InitialPositionInStream initialPositionInStream =
        InitialPositionInStream.AT_TIMESTAMP;
    private final Date timestamp;

    private AtTimestamp(Date timestamp) {
        this.timestamp = timestamp;
    }

    // Not a singleton: each call carries its own timestamp.
    public static InitialPosition getInstance(Date timestamp) {
        return new AtTimestamp(timestamp);
    }
}
```
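For comparison only, the pattern above — two stateless singleton positions plus one variant that carries a timestamp — can be sketched compactly in Python (an illustration of the design shape, not Spark or KCL code; note the lazy singletons below are not thread safe, which is part of the concern raised above):

```python
from datetime import datetime

class InitialPosition:
    """Base marker for an initial position in the stream."""

class Latest(InitialPosition):
    _instance = None
    def __new__(cls):
        # Classic lazy singleton: every construction returns the same object.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

class TrimHorizon(InitialPosition):
    _instance = None
    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

class AtTimestamp(InitialPosition):
    # Not a singleton: each instance carries its own timestamp.
    def __init__(self, timestamp: datetime):
        self.timestamp = timestamp
```

Usage: `Latest() is Latest()` holds, while two `AtTimestamp(...)` calls yield distinct objects.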


---




[GitHub] spark issue #19998: [SPARK-22813][BUILD] Use lsof or /usr/sbin/lsof in run-t...

2017-12-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19998
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #19998: [SPARK-22813][BUILD] Use lsof or /usr/sbin/lsof in run-t...

2017-12-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19998
  
**[Test build #84993 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84993/testReport)**
 for PR 19998 at commit 
[`964e5ff`](https://github.com/apache/spark/commit/964e5ff22cefe336cd47d3a9309a8d1428b476b6).
 * This patch **fails SparkR unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19999: JDBC support date/timestamp type as partitionColumn

2017-12-15 Thread morenn520
Github user morenn520 commented on the issue:

https://github.com/apache/spark/pull/19999
  
@gatorsmile we fixed this in Spark 1.6.2 and have been using it in production for
two months. For that reason, I have opened this PR against the master branch. I will test it next week.


---




[GitHub] spark issue #19999: JDBC support date/timestamp type as partitionColumn

2017-12-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19999
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #19999: JDBC support date/timestamp type as partitionColumn

2017-12-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19999
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84998/
Test FAILed.


---




[GitHub] spark issue #19999: JDBC support date/timestamp type as partitionColumn

2017-12-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19999
  
**[Test build #84998 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84998/testReport)**
 for PR 19999 at commit 
[`d1d310c`](https://github.com/apache/spark/commit/d1d310c6df782830378083d4bd80762591ba867e).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19999: JDBC support date/timestamp type as partitionColumn

2017-12-15 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19999
  
The support is interesting, but the current implementation is not clean. cc 
@dongjoon-hyun, could you help review this PR?




---




[GitHub] spark issue #19999: JDBC support date/timestamp type as partitionColumn

2017-12-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19999
  
**[Test build #84998 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84998/testReport)**
 for PR 19999 at commit 
[`d1d310c`](https://github.com/apache/spark/commit/d1d310c6df782830378083d4bd80762591ba867e).


---




[GitHub] spark issue #19999: JDBC support date/timestamp type as partitionColumn

2017-12-15 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19999
  
Please update the PR title 


---




[GitHub] spark issue #19998: [SPARK-22813][BUILD] Use lsof or /usr/sbin/lsof in run-t...

2017-12-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19998
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #19998: [SPARK-22813][BUILD] Use lsof or /usr/sbin/lsof in run-t...

2017-12-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19998
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84992/
Test FAILed.


---




[GitHub] spark issue #19999: JDBC support date/timestamp type as partitionColumn

2017-12-15 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19999
  
Could you write a test case? 


---




[GitHub] spark issue #19999: JDBC support date/timestamp type as partitionColumn

2017-12-15 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19999
  
ok to test


---




[GitHub] spark issue #19998: [SPARK-22813][BUILD] Use lsof or /usr/sbin/lsof in run-t...

2017-12-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19998
  
**[Test build #84992 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84992/testReport)**
 for PR 19998 at commit 
[`b384336`](https://github.com/apache/spark/commit/b384336d9b71b992ce6478b56378b7b1cabdbd3c).
 * This patch **fails SparkR unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19999: JDBC support date/timestamp type as partitionColumn

2017-12-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19999
  
Can one of the admins verify this patch?


---




[GitHub] spark pull request #19999: JDBC support date/timestamp type as partitionColu...

2017-12-15 Thread morenn520
GitHub user morenn520 opened a pull request:

https://github.com/apache/spark/pull/19999

JDBC support date/timestamp type as partitionColumn

Jira: https://issues.apache.org/jira/browse/SPARK-22814

Currently, partitionColumn must be a numeric column from the table.
However, there are many tables that have no primary key but do have
date/timestamp indexes.

This patch solves that problem by supporting date/timestamp columns as partitionColumn.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/morenn520/spark SPARK-22814

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19999.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19999


commit d1d310c6df782830378083d4bd80762591ba867e
Author: Chen Yuechen 
Date:   2017-12-16T06:26:57Z

JDBC support date/timestamp type as partitionColumn




---




[GitHub] spark issue #19594: [SPARK-21984] [SQL] Join estimation based on equi-height...

2017-12-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19594
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84991/
Test FAILed.


---




[GitHub] spark issue #19594: [SPARK-21984] [SQL] Join estimation based on equi-height...

2017-12-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19594
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #19594: [SPARK-21984] [SQL] Join estimation based on equi-height...

2017-12-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19594
  
**[Test build #84991 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84991/testReport)**
 for PR 19594 at commit 
[`2637429`](https://github.com/apache/spark/commit/263742914e21ba607904acb0ad35ced32aad48ab).
 * This patch **fails SparkR unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19998: [SPARK-22813][BUILD] Use lsof or /usr/sbin/lsof in run-t...

2017-12-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19998
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #19998: [SPARK-22813][BUILD] Use lsof or /usr/sbin/lsof in run-t...

2017-12-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19998
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84990/
Test FAILed.


---




[GitHub] spark issue #19998: [SPARK-22813][BUILD] Use lsof or /usr/sbin/lsof in run-t...

2017-12-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19998
  
**[Test build #84990 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84990/testReport)**
 for PR 19998 at commit 
[`969bc22`](https://github.com/apache/spark/commit/969bc227f255d721044e057da633c5f2becca2af).
 * This patch **fails SparkR unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19998: [SPARK-22813][BUILD] Use lsof or /usr/sbin/lsof in run-t...

2017-12-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19998
  
**[Test build #84997 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84997/testReport)**
 for PR 19998 at commit 
[`6c29a11`](https://github.com/apache/spark/commit/6c29a11e6f08a83cd10eaeda3240b49f15aea07b).


---




[GitHub] spark pull request #19998: [SPARK-22813][BUILD] Use lsof or /usr/sbin/lsof i...

2017-12-15 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/19998#discussion_r157334828
  
--- Diff: dev/run-tests.py ---
@@ -253,9 +253,11 @@ def kill_zinc_on_port(zinc_port):
 """
 Kill the Zinc process running on the given port, if one exists.
 """
-cmd = ("/usr/sbin/lsof -P |grep %s | grep LISTEN "
-   "| awk '{ print $2; }' | xargs kill") % zinc_port
-subprocess.check_call(cmd, shell=True)
+cmd = "%s -P |grep %s | grep LISTEN | awk '{ print $2; }' | xargs kill"
--- End diff --

Great catch, I also checked it.

```
>>> print(which("lsof"))
/usr/bin/lsof
>>> 
% ls /usr/bin/lsof /usr/sbin/lsof
ls: cannot access '/usr/sbin/lsof': No such file or directory
/usr/bin/lsof
```
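The fallback being discussed here can be sketched with the standard library's `shutil.which`, which also accepts an explicit path and checks it is executable (a minimal sketch under the assumption that preferring an `lsof` found on `PATH`, then probing the historical locations, is the desired behavior; the candidate paths are illustrative):

```python
import shutil

# Candidate locations to probe when lsof is not on PATH (assumed, platform-dependent).
FALLBACK_PATHS = ("/usr/sbin/lsof", "/usr/bin/lsof")

def find_lsof():
    """Prefer an lsof on PATH; otherwise probe known install locations."""
    found = shutil.which("lsof")
    if found:
        return found
    for candidate in FALLBACK_PATHS:
        # which() with a slash in the argument checks that exact path.
        if shutil.which(candidate):
            return candidate
    # Keep the original hard-coded default as a last resort.
    return "/usr/sbin/lsof"

cmd = "%s -P | grep %s | grep LISTEN | awk '{ print $2; }' | xargs kill"
print(cmd % (find_lsof(), 3030))
```

This keeps the command template in one place and resolves the binary once, instead of retrying the whole pipeline on failure.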


---




[GitHub] spark issue #19995: [SPARK-22807] [Scheduler] Remove config that says docker...

2017-12-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19995
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #19995: [SPARK-22807] [Scheduler] Remove config that says docker...

2017-12-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19995
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84986/
Test FAILed.


---




[GitHub] spark issue #19995: [SPARK-22807] [Scheduler] Remove config that says docker...

2017-12-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19995
  
**[Test build #84986 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84986/testReport)**
 for PR 19995 at commit 
[`b3e1af3`](https://github.com/apache/spark/commit/b3e1af3b3f4efad820dad9e989c580c74654390f).
 * This patch **fails SparkR unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19977: [SPARK-22771][SQL] Concatenate binary inputs into a bina...

2017-12-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19977
  
**[Test build #84996 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84996/testReport)**
 for PR 19977 at commit 
[`2d3926e`](https://github.com/apache/spark/commit/2d3926e546b3aa60c46449151706941ed17e2441).


---




[GitHub] spark issue #19977: [SPARK-22771][SQL] Concatenate binary inputs into a bina...

2017-12-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19977
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84987/
Test FAILed.


---




[GitHub] spark issue #19977: [SPARK-22771][SQL] Concatenate binary inputs into a bina...

2017-12-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19977
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #19977: [SPARK-22771][SQL] Concatenate binary inputs into a bina...

2017-12-15 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/19977
  
retest this please


---




[GitHub] spark issue #19977: [SPARK-22771][SQL] Concatenate binary inputs into a bina...

2017-12-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19977
  
**[Test build #84987 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84987/testReport)**
 for PR 19977 at commit 
[`2d3926e`](https://github.com/apache/spark/commit/2d3926e546b3aa60c46449151706941ed17e2441).
 * This patch **fails SparkR unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `  class FunctionArgumentConversion(conf: SQLConf) extends 
TypeCoercionRule `
  * `case class Concat(children: Seq[Expression], isBinaryMode: Boolean = 
false)`


---




[GitHub] spark issue #19594: [SPARK-21984] [SQL] Join estimation based on equi-height...

2017-12-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19594
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #19594: [SPARK-21984] [SQL] Join estimation based on equi-height...

2017-12-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19594
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84989/
Test FAILed.


---




[GitHub] spark issue #19594: [SPARK-21984] [SQL] Join estimation based on equi-height...

2017-12-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19594
  
**[Test build #84989 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84989/testReport)**
 for PR 19594 at commit 
[`2a4ee99`](https://github.com/apache/spark/commit/2a4ee99526c654834f3a50ef66e674bda673f926).
 * This patch **fails SparkR unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19861: [SPARK-22387][SQL] Propagate session configs to data sou...

2017-12-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19861
  
**[Test build #84995 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84995/testReport)**
 for PR 19861 at commit 
[`5292329`](https://github.com/apache/spark/commit/52923296a946ac734c988fe10725921ea3c2b313).


---




[GitHub] spark issue #19861: [SPARK-22387][SQL] Propagate session configs to data sou...

2017-12-15 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/19861
  
retest this please


---




[GitHub] spark issue #19861: [SPARK-22387][SQL] Propagate session configs to data sou...

2017-12-15 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/19861
  
Hm .. 


---




[GitHub] spark issue #19861: [SPARK-22387][SQL] Propagate session configs to data sou...

2017-12-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19861
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #19861: [SPARK-22387][SQL] Propagate session configs to data sou...

2017-12-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19861
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84988/
Test FAILed.


---




[GitHub] spark issue #19861: [SPARK-22387][SQL] Propagate session configs to data sou...

2017-12-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19861
  
**[Test build #84988 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84988/testReport)**
 for PR 19861 at commit 
[`5292329`](https://github.com/apache/spark/commit/52923296a946ac734c988fe10725921ea3c2b313).
 * This patch **fails SparkR unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #19998: [SPARK-22813][BUILD] Use lsof or /usr/sbin/lsof i...

2017-12-15 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/19998#discussion_r157333856
  
--- Diff: dev/run-tests.py ---
@@ -253,9 +253,11 @@ def kill_zinc_on_port(zinc_port):
 """
 Kill the Zinc process running on the given port, if one exists.
 """
-cmd = ("/usr/sbin/lsof -P |grep %s | grep LISTEN "
-   "| awk '{ print $2; }' | xargs kill") % zinc_port
-subprocess.check_call(cmd, shell=True)
+cmd = "%s -P |grep %s | grep LISTEN | awk '{ print $2; }' | xargs kill"
--- End diff --

Ah, @kiszk, I think we can actually use 
`sparktestsupport.shellutils.which("...")` too, like what we do for Java:


https://github.com/apache/spark/blob/964e5ff22cefe336cd47d3a9309a8d1428b476b6/dev/run-tests.py#L153

So, like ..

```python
cmd = "%s -P |grep %s | grep LISTEN | awk '{ print $2; }' | xargs kill"
lsof_exe = which("lsof") 
subprocess.check_call(cmd % (lsof_exe if lsof_exe else "/usr/sbin/lsof", 
zinc_port), shell=True)
```

I just double checked:
```
>>> lsof_exe = which("lsof")
>>> cmd % (lsof_exe if lsof_exe else "/usr/sbin/lsof", zinc_port)
"/usr/sbin/lsof -P |grep 1234 | grep LISTEN | awk '{ print $2; }' | xargs 
kill"
>>> lsof_exe = which("lsof")
>>> cmd % (lsof_exe if lsof_exe else "/usr/bin/lsof", zinc_port)
"/usr/sbin/lsof -P |grep 1234 | grep LISTEN | awk '{ print $2; }' | xargs 
kill"
```

```
>>> lsof_exe = which("foo")
>>> cmd % (lsof_exe if lsof_exe else "/usr/sbin/lsof", zinc_port)
"/usr/sbin/lsof -P |grep 1234 | grep LISTEN | awk '{ print $2; }' | xargs 
kill"
>>> lsof_exe = which("bar")
>>> cmd % (lsof_exe if lsof_exe else "/usr/bin/lsof", zinc_port)
"/usr/bin/lsof -P |grep 1234 | grep LISTEN | awk '{ print $2; }' | xargs 
kill"
```


---




[GitHub] spark issue #19981: [SPARK-22786][SQL] only use AppStatusPlugin in history s...

2017-12-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19981
  
**[Test build #84994 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84994/testReport)**
 for PR 19981 at commit 
[`bc300f9`](https://github.com/apache/spark/commit/bc300f9a31a351f8630c9b9b189f5b499fd858a1).


---




[GitHub] spark issue #19981: [SPARK-22786][SQL] only use AppStatusPlugin in history s...

2017-12-15 Thread gengliangwang
Github user gengliangwang commented on the issue:

https://github.com/apache/spark/pull/19981
  
retest this please


---




[GitHub] spark pull request #19981: [SPARK-22786][SQL] only use AppStatusPlugin in hi...

2017-12-15 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19981#discussion_r15724
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala ---
@@ -82,6 +82,19 @@ private[sql] class SharedState(val sparkContext: 
SparkContext) extends Logging {
*/
   val cacheManager: CacheManager = new CacheManager
 
+  /**
+   * A status store to query SQL status/metrics of this Spark application, 
based on SQL-specific
+   * [[org.apache.spark.scheduler.SparkListenerEvent]]s.
+   */
+  val statusStore: SQLAppStatusStore = {
--- End diff --

Sure, it's fine if you want to expose it. But I'm pointing out that it's 
pretty weird to expose a class in a ".internal" package through the API. Those 
are not documented nor go through mima checks, so there's absolutely zero 
guarantees about them.


---




[GitHub] spark issue #19998: [SPARK-22813][BUILD] Use lsof or /usr/sbin/lsof in run-t...

2017-12-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19998
  
**[Test build #84993 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84993/testReport)**
 for PR 19998 at commit 
[`964e5ff`](https://github.com/apache/spark/commit/964e5ff22cefe336cd47d3a9309a8d1428b476b6).


---




[GitHub] spark pull request #19998: [SPARK-22813][BUILD] Use lsof or /usr/sbin/lsof i...

2017-12-15 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/19998#discussion_r157333278
  
--- Diff: dev/run-tests.py ---
@@ -253,9 +253,11 @@ def kill_zinc_on_port(zinc_port):
 """
 Kill the Zinc process running on the given port, if one exists.
 """
-cmd = ("/usr/sbin/lsof -P |grep %s | grep LISTEN "
-   "| awk '{ print $2; }' | xargs kill") % zinc_port
-subprocess.check_call(cmd, shell=True)
+cmd = "%s -P |grep %s | grep LISTEN | awk '{ print $2; }' | xargs kill"
+try:
+subprocess.check_call(cmd % ("lsof", zinc_port), shell=True)
+except:
+subprocess.call(cmd % ("/usr/sbin/lsof", zinc_port), shell=True)
--- End diff --

I see. Since this change is not a strong preference, I will revert it to keep 
the original behavior.


---




[GitHub] spark issue #19954: [SPARK-22757][Kubernetes] Enable use of remote dependenc...

2017-12-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19954
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84985/
Test FAILed.


---




[GitHub] spark issue #19954: [SPARK-22757][Kubernetes] Enable use of remote dependenc...

2017-12-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19954
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #19954: [SPARK-22757][Kubernetes] Enable use of remote dependenc...

2017-12-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19954
  
**[Test build #84985 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84985/testReport)**
 for PR 19954 at commit 
[`46a8c99`](https://github.com/apache/spark/commit/46a8c9961312ee820743ddf893cc8666ce9360fa).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #19998: [SPARK-22813][BUILD] Use lsof or /usr/sbin/lsof i...

2017-12-15 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/19998#discussion_r157333189
  
--- Diff: dev/run-tests.py ---
@@ -253,9 +253,11 @@ def kill_zinc_on_port(zinc_port):
 """
 Kill the Zinc process running on the given port, if one exists.
 """
-cmd = ("/usr/sbin/lsof -P |grep %s | grep LISTEN "
-   "| awk '{ print $2; }' | xargs kill") % zinc_port
-subprocess.check_call(cmd, shell=True)
+cmd = "%s -P |grep %s | grep LISTEN | awk '{ print $2; }' | xargs kill"
+try:
+subprocess.check_call(cmd % ("lsof", zinc_port), shell=True)
+except:
+subprocess.call(cmd % ("/usr/sbin/lsof", zinc_port), shell=True)
--- End diff --

Hm, but it changes what `kill_zinc_on_port` originally does, because now it is not guaranteed to kill the process. I see the point, but let's stick to the original behaviour.


---




[GitHub] spark pull request #19998: [SPARK-22813][BUILD] Use lsof or /usr/sbin/lsof i...

2017-12-15 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/19998#discussion_r157333050
  
--- Diff: dev/run-tests.py ---
@@ -253,9 +253,11 @@ def kill_zinc_on_port(zinc_port):
 """
 Kill the Zinc process running on the given port, if one exists.
 """
-cmd = ("/usr/sbin/lsof -P |grep %s | grep LISTEN "
-   "| awk '{ print $2; }' | xargs kill") % zinc_port
-subprocess.check_call(cmd, shell=True)
+cmd = "%s -P |grep %s | grep LISTEN | awk '{ print $2; }' | xargs kill"
+try:
+subprocess.check_call(cmd % ("lsof", zinc_port), shell=True)
+except:
+subprocess.call(cmd % ("/usr/sbin/lsof", zinc_port), shell=True)
--- End diff --

I intentionally use `subprocess.call` so that execution continues even if neither `lsof` nor `/usr/sbin/lsof` exists. This is because the other steps are fine even if we fail to kill `zinc`.

WDYT?
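The trade-off under discussion can be sketched as follows. This is a hypothetical sketch, not the exact patch: the helper name `lsof_kill_cmd` is invented here for illustration, and the real change inlines this logic in `dev/run-tests.py`.

```python
import subprocess

def lsof_kill_cmd(lsof_path, zinc_port):
    """Build the shell pipeline that kills whatever listens on zinc_port."""
    return ("%s -P | grep %s | grep LISTEN | awk '{ print $2; }' | xargs kill"
            % (lsof_path, zinc_port))

def kill_zinc_on_port(zinc_port):
    """Try `lsof` on the PATH first, then fall back to the absolute path.

    The fallback uses call() instead of check_call(): call() only returns
    the exit status, so a failure to kill zinc is non-fatal to the rest
    of the build, which is the behaviour kiszk describes above.
    """
    try:
        # check_call raises CalledProcessError on a non-zero exit status.
        subprocess.check_call(lsof_kill_cmd("lsof", zinc_port), shell=True)
    except (subprocess.CalledProcessError, OSError):
        # Best effort only; ignore the result.
        subprocess.call(lsof_kill_cmd("/usr/sbin/lsof", zinc_port), shell=True)
```

Whether the second attempt should also be `check_call` (and thus abort the build) is exactly the question raised in the next comment.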


---




[GitHub] spark pull request #19998: [SPARK-22813][BUILD] Use lsof or /usr/sbin/lsof i...

2017-12-15 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/19998#discussion_r157332995
  
--- Diff: dev/run-tests.py ---
@@ -253,9 +253,11 @@ def kill_zinc_on_port(zinc_port):
 """
 Kill the Zinc process running on the given port, if one exists.
 """
-cmd = ("/usr/sbin/lsof -P |grep %s | grep LISTEN "
-   "| awk '{ print $2; }' | xargs kill") % zinc_port
-subprocess.check_call(cmd, shell=True)
+cmd = "%s -P |grep %s | grep LISTEN | awk '{ print $2; }' | xargs kill"
+try:
+subprocess.check_call(cmd % ("lsof", zinc_port), shell=True)
+except:
+subprocess.call(cmd % ("/usr/sbin/lsof", zinc_port), shell=True)
--- End diff --

Maybe, `subprocess.call` -> `subprocess.check_call`?


---




[GitHub] spark issue #19998: [SPARK-22813][BUILD] Use lsof or /usr/sbin/lsof in run-t...

2017-12-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19998
  
**[Test build #84992 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84992/testReport)**
 for PR 19998 at commit 
[`b384336`](https://github.com/apache/spark/commit/b384336d9b71b992ce6478b56378b7b1cabdbd3c).


---




[GitHub] spark issue #19955: [SPARK-21867][CORE] Support async spilling in UnsafeShuf...

2017-12-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19955
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #19955: [SPARK-21867][CORE] Support async spilling in UnsafeShuf...

2017-12-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19955
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84983/
Test FAILed.


---




[GitHub] spark issue #19955: [SPARK-21867][CORE] Support async spilling in UnsafeShuf...

2017-12-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19955
  
**[Test build #84983 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84983/testReport)**
 for PR 19955 at commit 
[`59e7720`](https://github.com/apache/spark/commit/59e7720cf0895d4359decdee57eec6fc11bc2fe0).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `public class MultiShuffleSorter extends ShuffleSorter `
  * `public class ShuffleExternalSorter extends ShuffleSorter `


---




[GitHub] spark issue #19594: [SPARK-21984] [SQL] Join estimation based on equi-height...

2017-12-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19594
  
**[Test build #84991 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84991/testReport)**
 for PR 19594 at commit 
[`2637429`](https://github.com/apache/spark/commit/263742914e21ba607904acb0ad35ced32aad48ab).


---




[GitHub] spark issue #19996: [MINOR][DOC] Fix the link of 'Getting Started'

2017-12-15 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/19996
  
Oh, maybe it's not quite related, but do you mind fixing the below too?


https://github.com/apache/spark/blob/ccdf21f56e4ff5497d7770dcbee2f7a60bb9e3a7/docs/sql-programming-guide.md#L501-L504

to (just adding a newline)

```
</div>
</div>

### Run SQL on files directly
```

because it currently breaks doc rendering as below:

<img src="https://user-images.githubusercontent.com/6477701/31481516-cd9ddb80-af5e-11e7-970b-d2c279f025d4.png" width="200" />



---




[GitHub] spark issue #19984: [SPARK-22789] Map-only continuous processing execution

2017-12-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19984
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84981/
Test FAILed.


---




[GitHub] spark issue #19984: [SPARK-22789] Map-only continuous processing execution

2017-12-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19984
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #19984: [SPARK-22789] Map-only continuous processing execution

2017-12-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19984
  
**[Test build #84981 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84981/testReport)**
 for PR 19984 at commit 
[`f50488c`](https://github.com/apache/spark/commit/f50488cf94ab015019e99d187b54ab922e4ca6c2).
 * This patch **fails SparkR unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #19998: [SPARK-22813][BUILD] Use lsof or /usr/sbin/lsof i...

2017-12-15 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/19998#discussion_r157332103
  
--- Diff: dev/run-tests.py ---
@@ -253,9 +253,14 @@ def kill_zinc_on_port(zinc_port):
 """
 Kill the Zinc process running on the given port, if one exists.
 """
-cmd = ("/usr/sbin/lsof -P |grep %s | grep LISTEN "
-   "| awk '{ print $2; }' | xargs kill") % zinc_port
-subprocess.check_call(cmd, shell=True)
+try:
+cmd = ("lsof -P |grep %s | grep LISTEN "
+   "| awk '{ print $2; }' | xargs kill") % zinc_port
+subprocess.check_call(cmd, shell=True)
+except:
--- End diff --

Yes, if the command does not exist, an exception occurs, so we end up executing one of the two cases.

Yeah, using a shared `cmd` is fine.


---




[GitHub] spark pull request #19998: [SPARK-22813][BUILD] Use lsof or /usr/sbin/lsof i...

2017-12-15 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/19998#discussion_r157331922
  
--- Diff: dev/run-tests.py ---
@@ -253,9 +253,14 @@ def kill_zinc_on_port(zinc_port):
 """
 Kill the Zinc process running on the given port, if one exists.
 """
-cmd = ("/usr/sbin/lsof -P |grep %s | grep LISTEN "
-   "| awk '{ print $2; }' | xargs kill") % zinc_port
-subprocess.check_call(cmd, shell=True)
+try:
+cmd = ("lsof -P |grep %s | grep LISTEN "
+   "| awk '{ print $2; }' | xargs kill") % zinc_port
+subprocess.check_call(cmd, shell=True)
+except:
--- End diff --

Could we catch the explicit exception?

Also, I think we could do this like:

```python
cmd = "%s -P |grep %s | grep LISTEN | awk '{ print $2; }' | xargs kill"
...
lsof = "lsof"
subprocess.check_call(cmd % (lsof, zinc_port), shell=True)
...
lsof = "/usr/sbin/lsof"
subprocess.check_call(cmd % (lsof, zinc_port), shell=True)
```


---




[GitHub] spark issue #19998: [SPARK-22813][BUILD] Use lsof or /usr/sbin/lsof in run-t...

2017-12-15 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/19998
  
@srowen @HyukjinKwon could you please review this?


---




[GitHub] spark issue #19998: [SPARK-22813][BUILD] Use lsof or /usr/sbin/lsof in run-t...

2017-12-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19998
  
**[Test build #84990 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84990/testReport)**
 for PR 19998 at commit 
[`969bc22`](https://github.com/apache/spark/commit/969bc227f255d721044e057da633c5f2becca2af).


---




[GitHub] spark pull request #19594: [SPARK-21984] [SQL] Join estimation based on equi...

2017-12-15 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/19594#discussion_r157331840
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/EstimationUtils.scala
 ---
@@ -114,4 +115,183 @@ object EstimationUtils {
 }
   }
 
+  /**
+   * Returns overlapped ranges between two histograms, in the given value 
range [newMin, newMax].
+   */
+  def getOverlappedRanges(
+  leftHistogram: Histogram,
+  rightHistogram: Histogram,
+  newMin: Double,
+  newMax: Double): Seq[OverlappedRange] = {
+val overlappedRanges = new ArrayBuffer[OverlappedRange]()
+// Only bins whose range intersect [newMin, newMax] have join 
possibility.
+val leftBins = leftHistogram.bins
+  .filter(b => b.lo <= newMax && b.hi >= newMin)
+val rightBins = rightHistogram.bins
+  .filter(b => b.lo <= newMax && b.hi >= newMin)
+
+leftBins.foreach { lb =>
+  rightBins.foreach { rb =>
--- End diff --

We only collect an `OverlappedRange` when the [left part and right part intersect](https://github.com/apache/spark/pull/19594/files#diff-56eed9f23127c954d9add0f6c5c93820R237), and that decision is based on some computation, so it is not convenient to express as guards. The `for`/`yield` form therefore does not fit this case well.
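A rough Python analogue of the bin-pair collection may make the point concrete. This is not the Scala code in the PR: the `(lo, hi)` tuple representation of a bin is a simplification assumed here for illustration, and the "computed intersection" is exactly the step that would be awkward to express as a comprehension guard.

```python
def overlapped_ranges(left_bins, right_bins, new_min, new_max):
    """Each bin is a (lo, hi) pair; return the intersections of bin pairs
    restricted to the join value range [new_min, new_max]."""
    # Only bins whose range intersects [new_min, new_max] can contribute.
    left = [b for b in left_bins if b[0] <= new_max and b[1] >= new_min]
    right = [b for b in right_bins if b[0] <= new_max and b[1] >= new_min]
    result = []
    for llo, lhi in left:
        for rlo, rhi in right:
            # The overlap test needs this computed intersection first,
            # which is why a flat nested loop reads better than for/yield
            # with a guard.
            lo, hi = max(llo, rlo), min(lhi, rhi)
            if lo <= hi:  # collect only pairs that actually overlap
                result.append((lo, hi))
    return result
```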


---




[GitHub] spark pull request #19998: [SPARK-22813][BUILD] Use lsof or /usr/sbin/lsof i...

2017-12-15 Thread kiszk
GitHub user kiszk opened a pull request:

https://github.com/apache/spark/pull/19998

[SPARK-22813][BUILD] Use lsof or /usr/sbin/lsof in run-tests.py

## What changes were proposed in this pull request?

In [the environment where `/usr/sbin/lsof` does not exist](https://github.com/apache/spark/pull/19695#issuecomment-342865001), `./dev/run-tests.py` for `maven` causes the following error. This is because the current `./dev/run-tests.py` checks for the existence of `/usr/sbin/lsof` only, and aborts immediately if it does not exist.

This PR changes the script as follows:

1. Check whether `lsof` or `/usr/sbin/lsof` exists
2. Proceed even if neither of them exists

```
/bin/sh: 1: /usr/sbin/lsof: not found

Usage:
 kill [options] <pid> [...]

Options:
 <pid> [...]            send signal to every <pid> listed
 -<signal>, -s, --signal <signal>
                        specify the <signal> to be sent
 -l, --list=[<signal>]  list all signal names, or convert one to a name
 -L, --table            list all signal names in a nice table

 -h, --help     display this help and exit
 -V, --version  output version information and exit

For more details see kill(1).
Traceback (most recent call last):
  File "./dev/run-tests.py", line 626, in <module>
    main()
  File "./dev/run-tests.py", line 597, in main
    build_apache_spark(build_tool, hadoop_version)
  File "./dev/run-tests.py", line 389, in build_apache_spark
    build_spark_maven(hadoop_version)
  File "./dev/run-tests.py", line 329, in build_spark_maven
    exec_maven(profiles_and_goals)
  File "./dev/run-tests.py", line 270, in exec_maven
    kill_zinc_on_port(zinc_port)
  File "./dev/run-tests.py", line 258, in kill_zinc_on_port
    subprocess.check_call(cmd, shell=True)
  File "/usr/lib/python2.7/subprocess.py", line 541, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '/usr/sbin/lsof -P |grep 3156 | grep LISTEN | awk '{ print $2; }' | xargs kill' returned non-zero exit status 123
```
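The existence check in step 1 could be sketched as below. This is a hypothetical helper, not the actual patch (which inlines a try/except fallback in `kill_zinc_on_port` instead of probing the PATH up front):

```python
import shutil

def find_lsof():
    """Return the first usable lsof binary, or None if neither exists.

    Returning None corresponds to step 2: proceed with the build even
    though we cannot kill the zinc process.
    """
    for candidate in ("lsof", "/usr/sbin/lsof"):
        if shutil.which(candidate):  # resolves against PATH or an absolute path
            return candidate
    return None
```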

## How was this patch tested?

manually tested

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kiszk/spark SPARK-22813

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19998.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19998


commit 969bc227f255d721044e057da633c5f2becca2af
Author: Kazuaki Ishizaki 
Date:   2017-12-16T02:14:14Z

initial commit




---




[GitHub] spark issue #19594: [SPARK-21984] [SQL] Join estimation based on equi-height...

2017-12-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19594
  
**[Test build #84989 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84989/testReport)**
 for PR 19594 at commit 
[`2a4ee99`](https://github.com/apache/spark/commit/2a4ee99526c654834f3a50ef66e674bda673f926).


---




[GitHub] spark pull request #19594: [SPARK-21984] [SQL] Join estimation based on equi...

2017-12-15 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/19594#discussion_r157331711
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/JoinEstimation.scala
 ---
@@ -191,8 +191,16 @@ case class JoinEstimation(join: Join) extends Logging {
   val rInterval = ValueInterval(rightKeyStat.min, rightKeyStat.max, 
rightKey.dataType)
   if (ValueInterval.isIntersected(lInterval, rInterval)) {
 val (newMin, newMax) = ValueInterval.intersect(lInterval, 
rInterval, leftKey.dataType)
-val (card, joinStat) = computeByNdv(leftKey, rightKey, newMin, 
newMax)
-keyStatsAfterJoin += (leftKey -> joinStat, rightKey -> joinStat)
+val (card, joinStat) = (leftKeyStat.histogram, 
rightKeyStat.histogram) match {
+  case (Some(l: Histogram), Some(r: Histogram)) =>
+computeByEquiHeightHistogram(leftKey, rightKey, l, r, newMin, 
newMax)
+  case _ =>
+computeByNdv(leftKey, rightKey, newMin, newMax)
+}
+keyStatsAfterJoin += (
+  leftKey -> joinStat.copy(histogram = leftKeyStat.histogram),
+  rightKey -> joinStat.copy(histogram = rightKeyStat.histogram)
--- End diff --

ah right, we can keep it.


---




[GitHub] spark pull request #19981: [SPARK-22786][SQL] only use AppStatusPlugin in hi...

2017-12-15 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19981#discussion_r157331435
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala ---
@@ -82,6 +82,19 @@ private[sql] class SharedState(val sparkContext: 
SparkContext) extends Logging {
*/
   val cacheManager: CacheManager = new CacheManager
 
+  /**
+   * A status store to query SQL status/metrics of this Spark application, 
based on SQL-specific
+   * [[org.apache.spark.scheduler.SparkListenerEvent]]s.
+   */
+  val statusStore: SQLAppStatusStore = {
--- End diff --

At least it's developer-facing. As a developer I don't care about the naming or the API changing, but I do want the same functionality.


---




[GitHub] spark issue #19995: [SPARK-22807] [Scheduler] Remove config that says docker...

2017-12-15 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/19995
  
It seems related to https://github.com/apache/spark/commit/e58f275678fb4f904124a4a2a1762f04c835eb0e somehow, and then it came back fine. I am not yet entirely sure how this change relates to the CRAN check. Will take a look soon.

Some related discussions - `https://github.com/apache/spark/pull/19721`, 
`https://github.com/apache/spark/pull/19944`, 
`https://github.com/apache/spark/pull/19957` and 
`https://github.com/apache/spark/pull/19961` in an order. 


---




[GitHub] spark pull request #19681: [SPARK-20652][sql] Store SQL UI data in the new a...

2017-12-15 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19681#discussion_r157331270
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/ui/SQLListenerSuite.scala
 ---
@@ -36,13 +36,14 @@ import org.apache.spark.sql.catalyst.util.quietly
 import org.apache.spark.sql.execution.{LeafExecNode, QueryExecution, 
SparkPlanInfo, SQLExecution}
 import org.apache.spark.sql.execution.metric.{SQLMetric, SQLMetrics}
 import org.apache.spark.sql.test.SharedSQLContext
-import org.apache.spark.ui.SparkUI
+import org.apache.spark.status.config._
 import org.apache.spark.util.{AccumulatorMetadata, JsonProtocol, 
LongAccumulator}
-
+import org.apache.spark.util.kvstore.InMemoryStore
 
 class SQLListenerSuite extends SparkFunSuite with SharedSQLContext with 
JsonTestUtils {
   import testImplicits._
-  import org.apache.spark.AccumulatorSuite.makeInfo
+
+  override protected def sparkConf = 
super.sparkConf.set(LIVE_ENTITY_UPDATE_PERIOD, 0L)
--- End diff --

ah you are right, it's only shared in hive tests


---




[GitHub] spark issue #19861: [SPARK-22387][SQL] Propagate session configs to data sou...

2017-12-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19861
  
**[Test build #84988 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84988/testReport)**
 for PR 19861 at commit 
[`5292329`](https://github.com/apache/spark/commit/52923296a946ac734c988fe10725921ea3c2b313).


---




[GitHub] spark issue #19977: [SPARK-22771][SQL] Concatenate binary inputs into a bina...

2017-12-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19977
  
**[Test build #84987 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84987/testReport)**
 for PR 19977 at commit 
[`2d3926e`](https://github.com/apache/spark/commit/2d3926e546b3aa60c46449151706941ed17e2441).


---




[GitHub] spark pull request #19997: [SPARK-22811][pyspark][ml] Fix pyspark.ml.tests f...

2017-12-15 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/19997


---




[GitHub] spark issue #19995: [SPARK-22807] [Scheduler] Remove config that says docker...

2017-12-15 Thread jiangxb1987
Github user jiangxb1987 commented on the issue:

https://github.com/apache/spark/pull/19995
  
The R tests have been pretty flaky recently, any ideas @HyukjinKwon ?


---




[GitHub] spark issue #19997: [SPARK-22811][pyspark][ml] Fix pyspark.ml.tests failure ...

2017-12-15 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/19997
  
Merged to master.


---




[GitHub] spark issue #19861: [SPARK-22387][SQL] Propagate session configs to data sou...

2017-12-15 Thread jiangxb1987
Github user jiangxb1987 commented on the issue:

https://github.com/apache/spark/pull/19861
  
retest this please


---




[GitHub] spark issue #19995: [SPARK-22807] [Scheduler] Remove config that says docker...

2017-12-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19995
  
**[Test build #84986 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84986/testReport)**
 for PR 19995 at commit 
[`b3e1af3`](https://github.com/apache/spark/commit/b3e1af3b3f4efad820dad9e989c580c74654390f).


---




[GitHub] spark issue #19995: [SPARK-22807] [Scheduler] Remove config that says docker...

2017-12-15 Thread jiangxb1987
Github user jiangxb1987 commented on the issue:

https://github.com/apache/spark/pull/19995
  
retest this please


---




[GitHub] spark pull request #19997: [SPARK-22811][pyspark][ml] Fix pyspark.ml.tests f...

2017-12-15 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/19997#discussion_r157330796
  
--- Diff: python/pyspark/ml/tests.py ---
@@ -44,6 +44,7 @@
 import numpy as np
 from numpy import abs, all, arange, array, array_equal, inf, ones, tile, 
zeros
 import inspect
+import py4j
--- End diff --

Ah, it was my bad. Yup, you are right.


---




[GitHub] spark issue #19995: [SPARK-22807] [Scheduler] Remove config that says docker...

2017-12-15 Thread foxish
Github user foxish commented on the issue:

https://github.com/apache/spark/pull/19995
  
@ueshin @vanzin SparkR failure seems unrelated to me. Any ideas? 


---




[GitHub] spark issue #19995: [SPARK-22807] [Scheduler] Remove config that says docker...

2017-12-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19995
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84979/
Test FAILed.


---




[GitHub] spark issue #19995: [SPARK-22807] [Scheduler] Remove config that says docker...

2017-12-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19995
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #19995: [SPARK-22807] [Scheduler] Remove config that says docker...

2017-12-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19995
  
**[Test build #84979 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84979/testReport)**
 for PR 19995 at commit 
[`b3e1af3`](https://github.com/apache/spark/commit/b3e1af3b3f4efad820dad9e989c580c74654390f).
 * This patch **fails SparkR unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19995: [SPARK-22807] [Scheduler] Remove config that says docker...

2017-12-15 Thread ueshin
Github user ueshin commented on the issue:

https://github.com/apache/spark/pull/19995
  
LGTM pending tests.


---




[GitHub] spark pull request #19954: [SPARK-22757][Kubernetes] Enable use of remote de...

2017-12-15 Thread liyinan926
Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19954#discussion_r157328425
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/rest/k8s/KubernetesSparkDependencyDownloadInitContainer.scala
 ---
@@ -0,0 +1,129 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.rest.k8s
+
+import java.io.File
+import java.util.concurrent.TimeUnit
+
+import scala.concurrent.{ExecutionContext, Future}
+import scala.concurrent.duration.Duration
+
+import org.apache.spark.{SecurityManager => SparkSecurityManager, 
SparkConf}
+import org.apache.spark.deploy.SparkHadoopUtil
+import org.apache.spark.deploy.k8s.Config._
+import org.apache.spark.internal.Logging
+import org.apache.spark.util.{ThreadUtils, Utils}
+
+/**
+ * Process that fetches files from a resource staging server and/or 
arbitrary remote locations.
+ *
+ * The init-container can handle fetching files from any of those sources, 
but not all of the
+ * sources need to be specified. This allows for composing multiple 
instances of this container
+ * with different configurations for different download sources, or using 
the same container to
+ * download everything at once.
+ */
+private[spark] class KubernetesSparkDependencyDownloadInitContainer(
+sparkConf: SparkConf,
+fileFetcher: FileFetcher) extends Logging {
+
+  private implicit val downloadExecutor = 
ExecutionContext.fromExecutorService(
+ThreadUtils.newDaemonCachedThreadPool("download-executor"))
+
+  private val jarsDownloadDir = new File(
+sparkConf.get(JARS_DOWNLOAD_LOCATION))
+  private val filesDownloadDir = new File(
+sparkConf.get(FILES_DOWNLOAD_LOCATION))
+
+  private val remoteJars = sparkConf.get(INIT_CONTAINER_REMOTE_JARS)
+  private val remoteFiles = sparkConf.get(INIT_CONTAINER_REMOTE_FILES)
+
+  private val downloadTimeoutMinutes = 
sparkConf.get(INIT_CONTAINER_MOUNT_TIMEOUT)
+
+  def run(): Unit = {
+val remoteJarsDownload = Future[Unit] {
+  logInfo(s"Downloading remote jars: $remoteJars")
+  downloadFiles(
+remoteJars,
+jarsDownloadDir,
+s"Remote jars download directory specified at $jarsDownloadDir 
does not exist " +
+  "or is not a directory.")
+}
+val remoteFilesDownload = Future[Unit] {
+  logInfo(s"Downloading remote files: $remoteFiles")
+  downloadFiles(
+remoteFiles,
+filesDownloadDir,
+s"Remote files download directory specified at $filesDownloadDir does not exist " +
+  "or is not a directory.")
+}
+waitForFutures(
--- End diff --

Got it, will address this.


---




[GitHub] spark pull request #19954: [SPARK-22757][Kubernetes] Enable use of remote de...

2017-12-15 Thread liyinan926
Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19954#discussion_r157328327
  
--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala ---
@@ -133,30 +132,78 @@ private[spark] object Config extends Logging {
 
   val JARS_DOWNLOAD_LOCATION =
 ConfigBuilder("spark.kubernetes.mountDependencies.jarsDownloadDir")
-  .doc("Location to download jars to in the driver and executors. When using" +
-" spark-submit, this directory must be empty and will be mounted as an empty directory" +
-" volume on the driver and executor pod.")
+  .doc("Location to download jars to in the driver and executors. When using " +
+"spark-submit, this directory must be empty and will be mounted as an empty directory " +
+"volume on the driver and executor pod.")
   .stringConf
   .createWithDefault("/var/spark-data/spark-jars")
 
   val FILES_DOWNLOAD_LOCATION =
 ConfigBuilder("spark.kubernetes.mountDependencies.filesDownloadDir")
-  .doc("Location to download files to in the driver and executors. When using" +
-" spark-submit, this directory must be empty and will be mounted as an empty directory" +
-" volume on the driver and executor pods.")
+  .doc("Location to download files to in the driver and executors. When using " +
+"spark-submit, this directory must be empty and will be mounted as an empty directory " +
+"volume on the driver and executor pods.")
   .stringConf
   .createWithDefault("/var/spark-data/spark-files")
 
+  val INIT_CONTAINER_DOCKER_IMAGE =
+ConfigBuilder("spark.kubernetes.initContainer.docker.image")
--- End diff --

Renamed to `spark.kubernetes.initContainer.image`.


---




[GitHub] spark issue #19813: [SPARK-22600][SQL] Fix 64kb limit for deeply nested expr...

2017-12-15 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/19813
  
@mgaido91 Thanks for the comment. I agree that making this a contract is the easiest way. 
Without that contract, it seems to me a significant change would be needed.


---




[GitHub] spark issue #19954: [SPARK-22757][Kubernetes] Enable use of remote dependenc...

2017-12-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19954
  
**[Test build #84985 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84985/testReport)** for PR 19954 at commit [`46a8c99`](https://github.com/apache/spark/commit/46a8c9961312ee820743ddf893cc8666ce9360fa).


---




[GitHub] spark issue #19954: [SPARK-22757][Kubernetes] Enable use of remote dependenc...

2017-12-15 Thread liyinan926
Github user liyinan926 commented on the issue:

https://github.com/apache/spark/pull/19954
  
@vanzin Addressed your comments so far. PTAL. Thanks!


---




[GitHub] spark pull request #19954: [SPARK-22757][Kubernetes] Enable use of remote de...

2017-12-15 Thread liyinan926
Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19954#discussion_r157327812
  
--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/rest/k8s/FileFetcher.scala ---
@@ -0,0 +1,27 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.rest.k8s
+
+import java.io.File
+
+/**
+ * Utility for fetching remote file dependencies.
+ */
+private[spark] trait FileFetcher {
--- End diff --

Yeah, removed the trait.


---




[GitHub] spark pull request #19954: [SPARK-22757][Kubernetes] Enable use of remote de...

2017-12-15 Thread foxish
Github user foxish commented on a diff in the pull request:

https://github.com/apache/spark/pull/19954#discussion_r157327721
  
--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala ---
@@ -133,30 +132,78 @@ private[spark] object Config extends Logging {
 
   val JARS_DOWNLOAD_LOCATION =
 ConfigBuilder("spark.kubernetes.mountDependencies.jarsDownloadDir")
-  .doc("Location to download jars to in the driver and executors. When using" +
-" spark-submit, this directory must be empty and will be mounted as an empty directory" +
-" volume on the driver and executor pod.")
+  .doc("Location to download jars to in the driver and executors. When using " +
+"spark-submit, this directory must be empty and will be mounted as an empty directory " +
+"volume on the driver and executor pod.")
   .stringConf
   .createWithDefault("/var/spark-data/spark-jars")
 
   val FILES_DOWNLOAD_LOCATION =
 ConfigBuilder("spark.kubernetes.mountDependencies.filesDownloadDir")
-  .doc("Location to download files to in the driver and executors. When using" +
-" spark-submit, this directory must be empty and will be mounted as an empty directory" +
-" volume on the driver and executor pods.")
+  .doc("Location to download files to in the driver and executors. When using " +
+"spark-submit, this directory must be empty and will be mounted as an empty directory " +
+"volume on the driver and executor pods.")
   .stringConf
   .createWithDefault("/var/spark-data/spark-files")
 
+  val INIT_CONTAINER_DOCKER_IMAGE =
+ConfigBuilder("spark.kubernetes.initContainer.docker.image")
--- End diff --

>  Is it a required config?

No, as one may forgo the init container if they're building the deps into 
the docker image itself and supplying it via `local:///` paths.
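To make the alternative concrete, here is a hedged sketch of such a submission (the registry, image name, main class, jar path, and API-server host are all hypothetical, and the exact image config key has varied across Spark versions): dependencies are baked into the image and referenced via `local:///`, so no init container is configured at all.

```shell
# Hypothetical: app.jar is already inside the container image, so it is
# referenced with the local:/// scheme and no init-container image is needed.
spark-submit \
  --master k8s://https://<api-server-host>:6443 \
  --deploy-mode cluster \
  --class org.example.Main \
  --conf spark.kubernetes.container.image=registry.example.com/spark-app:latest \
  local:///opt/spark/jars/app.jar
```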


---




[GitHub] spark pull request #19954: [SPARK-22757][Kubernetes] Enable use of remote de...

2017-12-15 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19954#discussion_r157327642
  
--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/rest/k8s/KubernetesSparkDependencyDownloadInitContainer.scala ---
@@ -0,0 +1,129 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.rest.k8s
+
+import java.io.File
+import java.util.concurrent.TimeUnit
+
+import scala.concurrent.{ExecutionContext, Future}
+import scala.concurrent.duration.Duration
+
+import org.apache.spark.{SecurityManager => SparkSecurityManager, SparkConf}
+import org.apache.spark.deploy.SparkHadoopUtil
+import org.apache.spark.deploy.k8s.Config._
+import org.apache.spark.internal.Logging
+import org.apache.spark.util.{ThreadUtils, Utils}
+
+/**
+ * Process that fetches files from a resource staging server and/or arbitrary remote locations.
+ *
+ * The init-container can handle fetching files from any of those sources, but not all of the
+ * sources need to be specified. This allows for composing multiple instances of this container
+ * with different configurations for different download sources, or using the same container to
+ * download everything at once.
+ */
+private[spark] class KubernetesSparkDependencyDownloadInitContainer(
+sparkConf: SparkConf,
+fileFetcher: FileFetcher) extends Logging {
+
+  private implicit val downloadExecutor = ExecutionContext.fromExecutorService(
+ThreadUtils.newDaemonCachedThreadPool("download-executor"))
+
+  private val jarsDownloadDir = new File(
+sparkConf.get(JARS_DOWNLOAD_LOCATION))
+  private val filesDownloadDir = new File(
+sparkConf.get(FILES_DOWNLOAD_LOCATION))
+
+  private val remoteJars = sparkConf.get(INIT_CONTAINER_REMOTE_JARS)
+  private val remoteFiles = sparkConf.get(INIT_CONTAINER_REMOTE_FILES)
+
+  private val downloadTimeoutMinutes = sparkConf.get(INIT_CONTAINER_MOUNT_TIMEOUT)
+
+  def run(): Unit = {
+val remoteJarsDownload = Future[Unit] {
+  logInfo(s"Downloading remote jars: $remoteJars")
+  downloadFiles(
+remoteJars,
+jarsDownloadDir,
+s"Remote jars download directory specified at $jarsDownloadDir does not exist " +
+  "or is not a directory.")
+}
+val remoteFilesDownload = Future[Unit] {
+  logInfo(s"Downloading remote files: $remoteFiles")
+  downloadFiles(
+remoteFiles,
+filesDownloadDir,
+s"Remote files download directory specified at $filesDownloadDir does not exist " +
+  "or is not a directory.")
+}
+waitForFutures(
--- End diff --

Sure, but that's not my point. If you have 10 jars and 10 files to 
download, the current code will only download 2 at a time. If you submit each 
jar / file separately, you'll download as many as your thread pool allows, and 
you can make that configurable.
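For illustration, a minimal standalone sketch of the per-file scheduling suggested above (not the actual init-container code; the URI list and `poolSize` value are hypothetical): one `Future` per dependency on a fixed-size pool, so the configurable pool size, rather than the jars/files split, bounds download concurrency.

```scala
import java.util.concurrent.Executors

import scala.concurrent.{Await, ExecutionContext, ExecutionContextExecutorService, Future}
import scala.concurrent.duration._

// Hypothetical dependency list; in the init-container these would come from
// the remote jars/files configuration entries.
val uris = Seq(
  "https://example.org/a.jar",
  "https://example.org/b.jar",
  "https://example.org/c.jar")

// Configurable pool size bounds how many downloads run concurrently.
val poolSize = 4
implicit val ec: ExecutionContextExecutorService =
  ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(poolSize))

// One Future per file, rather than one Future per category (all jars / all files).
val downloads = uris.map { uri =>
  Future {
    // A real implementation would fetch `uri` into the download directory here.
    uri
  }
}
val fetched = Await.result(Future.sequence(downloads), 1.minute)
println(s"fetched ${fetched.size} dependencies")
ec.shutdown()
```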


---



