[GitHub] spark issue #19861: [SPARK-22387][SQL] Propagate session configs to data sou...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19861 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19861: [SPARK-22387][SQL] Propagate session configs to data sou...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19861 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84995/ Test FAILed.
[GitHub] spark issue #19861: [SPARK-22387][SQL] Propagate session configs to data sou...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19861 **[Test build #84995 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84995/testReport)** for PR 19861 at commit [`5292329`](https://github.com/apache/spark/commit/52923296a946ac734c988fe10725921ea3c2b313). * This patch **fails SparkR unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19998: [SPARK-22813][BUILD] Use lsof or /usr/sbin/lsof in run-t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19998 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84993/ Test FAILed.
[GitHub] spark issue #18029: [SPARK-20168] [DStream] Add changes to use kinesis fetch...
Github user yashs360 commented on the issue: https://github.com/apache/spark/pull/18029 Hi @brkyvz, thinking along these lines, adding them as Java objects adds more complexity to our design: we again have to think about making the objects singletons and thread-safe. The Scala case classes were very simple and minimal. This is how we would have to implement the Java classes for the initial positions. It looks a bit unclean to me. Thoughts?

```java
// Imports assumed for this sketch (KCL 1.x):
import java.util.Date;
import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream;

abstract class InitialPosition {
    public static final InitialPositionInStream initialPositionInStream =
        InitialPositionInStream.LATEST;
}

class Latest extends InitialPosition {
    private static final Latest instance = new Latest();
    static final InitialPositionInStream initialPositionInStream =
        InitialPositionInStream.LATEST;

    private Latest() {}

    public static InitialPosition getInstance() {
        return instance;
    }
}

class TrimHorizon extends InitialPosition {
    private static final TrimHorizon instance = new TrimHorizon();
    static final InitialPositionInStream initialPositionInStream =
        InitialPositionInStream.TRIM_HORIZON;

    private TrimHorizon() {}

    public static InitialPosition getInstance() {
        return instance;
    }
}

class AtTimestamp extends InitialPosition {
    static final InitialPositionInStream initialPositionInStream =
        InitialPositionInStream.AT_TIMESTAMP;
    Date timestamp;

    private AtTimestamp(Date timestamp) {
        this.timestamp = timestamp;
    }

    public static InitialPosition getInstance(Date timestamp) {
        return new AtTimestamp(timestamp);
    }
}
```
[GitHub] spark issue #19998: [SPARK-22813][BUILD] Use lsof or /usr/sbin/lsof in run-t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19998 Merged build finished. Test FAILed.
[GitHub] spark issue #19998: [SPARK-22813][BUILD] Use lsof or /usr/sbin/lsof in run-t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19998 **[Test build #84993 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84993/testReport)** for PR 19998 at commit [`964e5ff`](https://github.com/apache/spark/commit/964e5ff22cefe336cd47d3a9309a8d1428b476b6). * This patch **fails SparkR unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19999: JDBC support date/timestamp type as partitionColumn
Github user morenn520 commented on the issue: https://github.com/apache/spark/pull/1 @gatorsmile We fixed this in Spark 1.6.2 and have been using it in production for two months. To move it forward, I have opened this PR against the master branch; I will test it next week.
[GitHub] spark issue #19999: JDBC support date/timestamp type as partitionColumn
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/1 Merged build finished. Test FAILed.
[GitHub] spark issue #19999: JDBC support date/timestamp type as partitionColumn
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/1 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84998/ Test FAILed.
[GitHub] spark issue #19999: JDBC support date/timestamp type as partitionColumn
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/1 **[Test build #84998 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84998/testReport)** for PR 1 at commit [`d1d310c`](https://github.com/apache/spark/commit/d1d310c6df782830378083d4bd80762591ba867e). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19999: JDBC support date/timestamp type as partitionColumn
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/1 The support is interesting, but the current implementation is not clean. cc @dongjoon-hyun Could you help review this PR?
[GitHub] spark issue #19999: JDBC support date/timestamp type as partitionColumn
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/1 **[Test build #84998 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84998/testReport)** for PR 1 at commit [`d1d310c`](https://github.com/apache/spark/commit/d1d310c6df782830378083d4bd80762591ba867e).
[GitHub] spark issue #19999: JDBC support date/timestamp type as partitionColumn
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/1 Please update the PR title
[GitHub] spark issue #19998: [SPARK-22813][BUILD] Use lsof or /usr/sbin/lsof in run-t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19998 Merged build finished. Test FAILed.
[GitHub] spark issue #19998: [SPARK-22813][BUILD] Use lsof or /usr/sbin/lsof in run-t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19998 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84992/ Test FAILed.
[GitHub] spark issue #19999: JDBC support date/timestamp type as partitionColumn
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/1 Could you write a test case?
[GitHub] spark issue #19999: JDBC support date/timestamp type as partitionColumn
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/1 ok to test
[GitHub] spark issue #19998: [SPARK-22813][BUILD] Use lsof or /usr/sbin/lsof in run-t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19998 **[Test build #84992 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84992/testReport)** for PR 19998 at commit [`b384336`](https://github.com/apache/spark/commit/b384336d9b71b992ce6478b56378b7b1cabdbd3c). * This patch **fails SparkR unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19999: JDBC support date/timestamp type as partitionColumn
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/1 Can one of the admins verify this patch?
[GitHub] spark pull request #19999: JDBC support date/timestamp type as partitionColu...
GitHub user morenn520 opened a pull request: https://github.com/apache/spark/pull/1 JDBC support date/timestamp type as partitionColumn Jira: https://issues.apache.org/jira/browse/SPARK-22814 Currently, partitionColumn must be a numeric column of the table. However, many tables have no numeric primary key but do have date/timestamp indexes. This patch solves that problem. You can merge this pull request into a Git repository by running: $ git pull https://github.com/morenn520/spark SPARK-22814 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/1.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1 commit d1d310c6df782830378083d4bd80762591ba867e Author: Chen Yuechen Date: 2017-12-16T06:26:57Z JDBC support date/timestamp type as partitionColumn
[GitHub] spark issue #19594: [SPARK-21984] [SQL] Join estimation based on equi-height...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19594 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84991/ Test FAILed.
[GitHub] spark issue #19594: [SPARK-21984] [SQL] Join estimation based on equi-height...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19594 Merged build finished. Test FAILed.
[GitHub] spark issue #19594: [SPARK-21984] [SQL] Join estimation based on equi-height...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19594 **[Test build #84991 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84991/testReport)** for PR 19594 at commit [`2637429`](https://github.com/apache/spark/commit/263742914e21ba607904acb0ad35ced32aad48ab). * This patch **fails SparkR unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19998: [SPARK-22813][BUILD] Use lsof or /usr/sbin/lsof in run-t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19998 Merged build finished. Test FAILed.
[GitHub] spark issue #19998: [SPARK-22813][BUILD] Use lsof or /usr/sbin/lsof in run-t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19998 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84990/ Test FAILed.
[GitHub] spark issue #19998: [SPARK-22813][BUILD] Use lsof or /usr/sbin/lsof in run-t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19998 **[Test build #84990 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84990/testReport)** for PR 19998 at commit [`969bc22`](https://github.com/apache/spark/commit/969bc227f255d721044e057da633c5f2becca2af). * This patch **fails SparkR unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19998: [SPARK-22813][BUILD] Use lsof or /usr/sbin/lsof in run-t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19998 **[Test build #84997 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84997/testReport)** for PR 19998 at commit [`6c29a11`](https://github.com/apache/spark/commit/6c29a11e6f08a83cd10eaeda3240b49f15aea07b).
[GitHub] spark pull request #19998: [SPARK-22813][BUILD] Use lsof or /usr/sbin/lsof i...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/19998#discussion_r157334828

--- Diff: dev/run-tests.py ---
@@ -253,9 +253,11 @@ def kill_zinc_on_port(zinc_port):
     """
     Kill the Zinc process running on the given port, if one exists.
     """
-    cmd = ("/usr/sbin/lsof -P |grep %s | grep LISTEN "
-           "| awk '{ print $2; }' | xargs kill") % zinc_port
-    subprocess.check_call(cmd, shell=True)
+    cmd = "%s -P |grep %s | grep LISTEN | awk '{ print $2; }' | xargs kill"
--- End diff --

Great catch, I also checked it.
```
>>> print(which("lsof"))
/usr/bin/lsof
>>>
% ls /usr/bin/lsof /usr/sbin/lsof
ls: cannot access '/usr/sbin/lsof': No such file or directory
/usr/bin/lsof
```
[GitHub] spark issue #19995: [SPARK-22807] [Scheduler] Remove config that says docker...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19995 Merged build finished. Test FAILed.
[GitHub] spark issue #19995: [SPARK-22807] [Scheduler] Remove config that says docker...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19995 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84986/ Test FAILed.
[GitHub] spark issue #19995: [SPARK-22807] [Scheduler] Remove config that says docker...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19995 **[Test build #84986 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84986/testReport)** for PR 19995 at commit [`b3e1af3`](https://github.com/apache/spark/commit/b3e1af3b3f4efad820dad9e989c580c74654390f). * This patch **fails SparkR unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19977: [SPARK-22771][SQL] Concatenate binary inputs into a bina...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19977 **[Test build #84996 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84996/testReport)** for PR 19977 at commit [`2d3926e`](https://github.com/apache/spark/commit/2d3926e546b3aa60c46449151706941ed17e2441).
[GitHub] spark issue #19977: [SPARK-22771][SQL] Concatenate binary inputs into a bina...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19977 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84987/ Test FAILed.
[GitHub] spark issue #19977: [SPARK-22771][SQL] Concatenate binary inputs into a bina...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19977 Merged build finished. Test FAILed.
[GitHub] spark issue #19977: [SPARK-22771][SQL] Concatenate binary inputs into a bina...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/19977 retest this please
[GitHub] spark issue #19977: [SPARK-22771][SQL] Concatenate binary inputs into a bina...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19977 **[Test build #84987 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84987/testReport)** for PR 19977 at commit [`2d3926e`](https://github.com/apache/spark/commit/2d3926e546b3aa60c46449151706941ed17e2441). * This patch **fails SparkR unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * ` class FunctionArgumentConversion(conf: SQLConf) extends TypeCoercionRule ` * `case class Concat(children: Seq[Expression], isBinaryMode: Boolean = false)`
[GitHub] spark issue #19594: [SPARK-21984] [SQL] Join estimation based on equi-height...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19594 Merged build finished. Test FAILed.
[GitHub] spark issue #19594: [SPARK-21984] [SQL] Join estimation based on equi-height...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19594 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84989/ Test FAILed.
[GitHub] spark issue #19594: [SPARK-21984] [SQL] Join estimation based on equi-height...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19594 **[Test build #84989 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84989/testReport)** for PR 19594 at commit [`2a4ee99`](https://github.com/apache/spark/commit/2a4ee99526c654834f3a50ef66e674bda673f926). * This patch **fails SparkR unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19861: [SPARK-22387][SQL] Propagate session configs to data sou...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19861 **[Test build #84995 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84995/testReport)** for PR 19861 at commit [`5292329`](https://github.com/apache/spark/commit/52923296a946ac734c988fe10725921ea3c2b313).
[GitHub] spark issue #19861: [SPARK-22387][SQL] Propagate session configs to data sou...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19861 retest this please
[GitHub] spark issue #19861: [SPARK-22387][SQL] Propagate session configs to data sou...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19861 Hm ..
[GitHub] spark issue #19861: [SPARK-22387][SQL] Propagate session configs to data sou...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19861 Merged build finished. Test FAILed.
[GitHub] spark issue #19861: [SPARK-22387][SQL] Propagate session configs to data sou...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19861 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84988/ Test FAILed.
[GitHub] spark issue #19861: [SPARK-22387][SQL] Propagate session configs to data sou...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19861 **[Test build #84988 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84988/testReport)** for PR 19861 at commit [`5292329`](https://github.com/apache/spark/commit/52923296a946ac734c988fe10725921ea3c2b313). * This patch **fails SparkR unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #19998: [SPARK-22813][BUILD] Use lsof or /usr/sbin/lsof i...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19998#discussion_r157333856

--- Diff: dev/run-tests.py ---
@@ -253,9 +253,11 @@ def kill_zinc_on_port(zinc_port):
     """
     Kill the Zinc process running on the given port, if one exists.
     """
-    cmd = ("/usr/sbin/lsof -P |grep %s | grep LISTEN "
-           "| awk '{ print $2; }' | xargs kill") % zinc_port
-    subprocess.check_call(cmd, shell=True)
+    cmd = "%s -P |grep %s | grep LISTEN | awk '{ print $2; }' | xargs kill"
--- End diff --

Ah, @kiszk, I think we can actually use `sparktestsupport.shellutils.which("...")` too, like what we do for java: https://github.com/apache/spark/blob/964e5ff22cefe336cd47d3a9309a8d1428b476b6/dev/run-tests.py#L153

So, like ..

```python
cmd = "%s -P |grep %s | grep LISTEN | awk '{ print $2; }' | xargs kill"
lsof_exe = which("lsof")
subprocess.check_call(cmd % (lsof_exe if lsof_exe else "/usr/sbin/lsof", zinc_port), shell=True)
```

I just double checked:

```
>>> lsof_exe = which("lsof")
>>> cmd % (lsof_exe if lsof_exe else "/usr/sbin/lsof", zinc_port)
"/usr/sbin/lsof -P |grep 1234 | grep LISTEN | awk '{ print $2; }' | xargs kill"
>>> lsof_exe = which("lsof")
>>> cmd % (lsof_exe if lsof_exe else "/usr/bin/lsof", zinc_port)
"/usr/sbin/lsof -P |grep 1234 | grep LISTEN | awk '{ print $2; }' | xargs kill"
```

```
>>> lsof_exe = which("foo")
>>> cmd % (lsof_exe if lsof_exe else "/usr/sbin/lsof", zinc_port)
"/usr/sbin/lsof -P |grep 1234 | grep LISTEN | awk '{ print $2; }' | xargs kill"
>>> lsof_exe = which("bar")
>>> cmd % (lsof_exe if lsof_exe else "/usr/bin/lsof", zinc_port)
"/usr/bin/lsof -P |grep 1234 | grep LISTEN | awk '{ print $2; }' | xargs kill"
```
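[Editor's note] The `which`-based fallback suggested in the comment above can be sketched with the standard library alone. This is a sketch, not the patch itself: the review proposes Spark's own `sparktestsupport.shellutils.which`, for which `shutil.which` stands in here, and `resolve_lsof`/`build_kill_cmd` are hypothetical helper names.

```python
import shutil

def resolve_lsof(fallback="/usr/sbin/lsof"):
    # Prefer an lsof found on PATH; otherwise fall back to the
    # historically hard-coded /usr/sbin/lsof path.
    found = shutil.which("lsof")
    return found if found else fallback

def build_kill_cmd(zinc_port, lsof_exe):
    # Same shell pipeline as in dev/run-tests.py: list processes
    # LISTENing on the Zinc port and kill them by PID.
    return ("%s -P |grep %s | grep LISTEN | awk '{ print $2; }' | xargs kill"
            % (lsof_exe, zinc_port))

print(build_kill_cmd(1234, "/usr/sbin/lsof"))
# → /usr/sbin/lsof -P |grep 1234 | grep LISTEN | awk '{ print $2; }' | xargs kill
```

The point of the design is that the command string is built once and only the executable path varies between hosts.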
[GitHub] spark issue #19981: [SPARK-22786][SQL] only use AppStatusPlugin in history s...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19981 **[Test build #84994 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84994/testReport)** for PR 19981 at commit [`bc300f9`](https://github.com/apache/spark/commit/bc300f9a31a351f8630c9b9b189f5b499fd858a1).
[GitHub] spark issue #19981: [SPARK-22786][SQL] only use AppStatusPlugin in history s...
Github user gengliangwang commented on the issue: https://github.com/apache/spark/pull/19981 retest this please
[GitHub] spark pull request #19981: [SPARK-22786][SQL] only use AppStatusPlugin in hi...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/19981#discussion_r15724

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala ---
@@ -82,6 +82,19 @@ private[sql] class SharedState(val sparkContext: SparkContext) extends Logging {
    */
   val cacheManager: CacheManager = new CacheManager

+  /**
+   * A status store to query SQL status/metrics of this Spark application, based on SQL-specific
+   * [[org.apache.spark.scheduler.SparkListenerEvent]]s.
+   */
+  val statusStore: SQLAppStatusStore = {
--- End diff --

Sure, it's fine if you want to expose it. But I'm pointing out that it's pretty weird to expose a class in a ".internal" package through the API. Those are not documented nor go through mima checks, so there's absolutely zero guarantees about them.
[GitHub] spark issue #19998: [SPARK-22813][BUILD] Use lsof or /usr/sbin/lsof in run-t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19998 **[Test build #84993 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84993/testReport)** for PR 19998 at commit [`964e5ff`](https://github.com/apache/spark/commit/964e5ff22cefe336cd47d3a9309a8d1428b476b6).
[GitHub] spark pull request #19998: [SPARK-22813][BUILD] Use lsof or /usr/sbin/lsof i...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/19998#discussion_r157333278

--- Diff: dev/run-tests.py ---
@@ -253,9 +253,11 @@ def kill_zinc_on_port(zinc_port):
     """
     Kill the Zinc process running on the given port, if one exists.
     """
-    cmd = ("/usr/sbin/lsof -P |grep %s | grep LISTEN "
-           "| awk '{ print $2; }' | xargs kill") % zinc_port
-    subprocess.check_call(cmd, shell=True)
+    cmd = "%s -P |grep %s | grep LISTEN | awk '{ print $2; }' | xargs kill"
+    try:
+        subprocess.check_call(cmd % ("lsof", zinc_port), shell=True)
+    except:
+        subprocess.call(cmd % ("/usr/sbin/lsof", zinc_port), shell=True)
--- End diff --

I see. Since this change is not a strong preference, I will revert it to keep the original behavior.
[GitHub] spark issue #19954: [SPARK-22757][Kubernetes] Enable use of remote dependenc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19954 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84985/ Test FAILed.
[GitHub] spark issue #19954: [SPARK-22757][Kubernetes] Enable use of remote dependenc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19954 Merged build finished. Test FAILed.
[GitHub] spark issue #19954: [SPARK-22757][Kubernetes] Enable use of remote dependenc...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19954 **[Test build #84985 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84985/testReport)** for PR 19954 at commit [`46a8c99`](https://github.com/apache/spark/commit/46a8c9961312ee820743ddf893cc8666ce9360fa). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #19998: [SPARK-22813][BUILD] Use lsof or /usr/sbin/lsof i...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19998#discussion_r157333189

--- Diff: dev/run-tests.py ---
@@ -253,9 +253,11 @@ def kill_zinc_on_port(zinc_port):
     """
     Kill the Zinc process running on the given port, if one exists.
     """
-    cmd = ("/usr/sbin/lsof -P |grep %s | grep LISTEN "
-           "| awk '{ print $2; }' | xargs kill") % zinc_port
-    subprocess.check_call(cmd, shell=True)
+    cmd = "%s -P |grep %s | grep LISTEN | awk '{ print $2; }' | xargs kill"
+    try:
+        subprocess.check_call(cmd % ("lsof", zinc_port), shell=True)
+    except:
+        subprocess.call(cmd % ("/usr/sbin/lsof", zinc_port), shell=True)
--- End diff --

Hm, but it changes what `kill_zinc_on_port` originally does, because now it is not guaranteed to kill it. I see the point, but let's stick to the original behaviour.
[GitHub] spark pull request #19998: [SPARK-22813][BUILD] Use lsof or /usr/sbin/lsof i...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/19998#discussion_r157333050
--- Diff: dev/run-tests.py ---
@@ -253,9 +253,11 @@ def kill_zinc_on_port(zinc_port):
     """
     Kill the Zinc process running on the given port, if one exists.
     """
-    cmd = ("/usr/sbin/lsof -P |grep %s | grep LISTEN "
-           "| awk '{ print $2; }' | xargs kill") % zinc_port
-    subprocess.check_call(cmd, shell=True)
+    cmd = "%s -P |grep %s | grep LISTEN | awk '{ print $2; }' | xargs kill"
+    try:
+        subprocess.check_call(cmd % ("lsof", zinc_port), shell=True)
+    except:
+        subprocess.call(cmd % ("/usr/sbin/lsof", zinc_port), shell=True)
--- End diff --
I intentionally use `subprocess.call` to continue the execution even if neither `lsof` nor `/usr/sbin/lsof` exists. This is because it is ok for the other steps if we fail to kill `zinc`. WDYT?
[GitHub] spark pull request #19998: [SPARK-22813][BUILD] Use lsof or /usr/sbin/lsof i...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19998#discussion_r157332995
--- Diff: dev/run-tests.py ---
@@ -253,9 +253,11 @@ def kill_zinc_on_port(zinc_port):
     """
     Kill the Zinc process running on the given port, if one exists.
     """
-    cmd = ("/usr/sbin/lsof -P |grep %s | grep LISTEN "
-           "| awk '{ print $2; }' | xargs kill") % zinc_port
-    subprocess.check_call(cmd, shell=True)
+    cmd = "%s -P |grep %s | grep LISTEN | awk '{ print $2; }' | xargs kill"
+    try:
+        subprocess.check_call(cmd % ("lsof", zinc_port), shell=True)
+    except:
+        subprocess.call(cmd % ("/usr/sbin/lsof", zinc_port), shell=True)
--- End diff --
Maybe, `subprocess.call` -> `subprocess.check_call`?
[GitHub] spark issue #19998: [SPARK-22813][BUILD] Use lsof or /usr/sbin/lsof in run-t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19998 **[Test build #84992 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84992/testReport)** for PR 19998 at commit [`b384336`](https://github.com/apache/spark/commit/b384336d9b71b992ce6478b56378b7b1cabdbd3c).
[GitHub] spark issue #19955: [SPARK-21867][CORE] Support async spilling in UnsafeShuf...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19955 Merged build finished. Test FAILed.
[GitHub] spark issue #19955: [SPARK-21867][CORE] Support async spilling in UnsafeShuf...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19955 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84983/ Test FAILed.
[GitHub] spark issue #19955: [SPARK-21867][CORE] Support async spilling in UnsafeShuf...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19955 **[Test build #84983 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84983/testReport)** for PR 19955 at commit [`59e7720`](https://github.com/apache/spark/commit/59e7720cf0895d4359decdee57eec6fc11bc2fe0).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `public class MultiShuffleSorter extends ShuffleSorter `
  * `public class ShuffleExternalSorter extends ShuffleSorter `
[GitHub] spark issue #19594: [SPARK-21984] [SQL] Join estimation based on equi-height...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19594 **[Test build #84991 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84991/testReport)** for PR 19594 at commit [`2637429`](https://github.com/apache/spark/commit/263742914e21ba607904acb0ad35ced32aad48ab).
[GitHub] spark issue #19996: [MINOR][DOC] Fix the link of 'Getting Started'
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19996 Oh, maybe it's not quite related but do you mind if I ask to fix the below too? https://github.com/apache/spark/blob/ccdf21f56e4ff5497d7770dcbee2f7a60bb9e3a7/docs/sql-programming-guide.md#L501-L504 to (just adding a newline)
```
### Run SQL on files directly
```
because it currently breaks doc rendering as below: https://user-images.githubusercontent.com/6477701/31481516-cd9ddb80-af5e-11e7-970b-d2c279f025d4.png
[GitHub] spark issue #19984: [SPARK-22789] Map-only continuous processing execution
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19984 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84981/ Test FAILed.
[GitHub] spark issue #19984: [SPARK-22789] Map-only continuous processing execution
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19984 Merged build finished. Test FAILed.
[GitHub] spark issue #19984: [SPARK-22789] Map-only continuous processing execution
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19984 **[Test build #84981 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84981/testReport)** for PR 19984 at commit [`f50488c`](https://github.com/apache/spark/commit/f50488cf94ab015019e99d187b54ab922e4ca6c2).
* This patch **fails SparkR unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #19998: [SPARK-22813][BUILD] Use lsof or /usr/sbin/lsof i...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/19998#discussion_r157332103
--- Diff: dev/run-tests.py ---
@@ -253,9 +253,14 @@ def kill_zinc_on_port(zinc_port):
     """
     Kill the Zinc process running on the given port, if one exists.
     """
-    cmd = ("/usr/sbin/lsof -P |grep %s | grep LISTEN "
-           "| awk '{ print $2; }' | xargs kill") % zinc_port
-    subprocess.check_call(cmd, shell=True)
+    try:
+        cmd = ("lsof -P |grep %s | grep LISTEN "
+               "| awk '{ print $2; }' | xargs kill") % zinc_port
+        subprocess.check_call(cmd, shell=True)
+    except:
--- End diff --
Yes, if the command does not exist, an exception occurs, so exactly one of the two cases is executed. Yea, using the shared `cmd` is fine.
[GitHub] spark pull request #19998: [SPARK-22813][BUILD] Use lsof or /usr/sbin/lsof i...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19998#discussion_r157331922
--- Diff: dev/run-tests.py ---
@@ -253,9 +253,14 @@ def kill_zinc_on_port(zinc_port):
     """
     Kill the Zinc process running on the given port, if one exists.
     """
-    cmd = ("/usr/sbin/lsof -P |grep %s | grep LISTEN "
-           "| awk '{ print $2; }' | xargs kill") % zinc_port
-    subprocess.check_call(cmd, shell=True)
+    try:
+        cmd = ("lsof -P |grep %s | grep LISTEN "
+               "| awk '{ print $2; }' | xargs kill") % zinc_port
+        subprocess.check_call(cmd, shell=True)
+    except:
--- End diff --
Could we catch the explicit exception? Also, I think we could do this like:
```python
cmd = "%s -P |grep %s | grep LISTEN | awk '{ print $2; }' | xargs kill"
...
lsof = "lsof"
subprocess.check_call(cmd % (lsof, zinc_port), shell=True)
...
lsof = "/usr/sbin/lsof"
subprocess.check_call(cmd % (lsof, zinc_port), shell=True)
```
[GitHub] spark issue #19998: [SPARK-22813][BUILD] Use lsof or /usr/sbin/lsof in run-t...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/19998 @srowen @HyukjinKwon could you please review this?
[GitHub] spark issue #19998: [SPARK-22813][BUILD] Use lsof or /usr/sbin/lsof in run-t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19998 **[Test build #84990 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84990/testReport)** for PR 19998 at commit [`969bc22`](https://github.com/apache/spark/commit/969bc227f255d721044e057da633c5f2becca2af).
[GitHub] spark pull request #19594: [SPARK-21984] [SQL] Join estimation based on equi...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/19594#discussion_r157331840 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/EstimationUtils.scala --- @@ -114,4 +115,183 @@ object EstimationUtils { } } + /** + * Returns overlapped ranges between two histograms, in the given value range [newMin, newMax]. + */ + def getOverlappedRanges( + leftHistogram: Histogram, + rightHistogram: Histogram, + newMin: Double, + newMax: Double): Seq[OverlappedRange] = { +val overlappedRanges = new ArrayBuffer[OverlappedRange]() +// Only bins whose range intersect [newMin, newMax] have join possibility. +val leftBins = leftHistogram.bins + .filter(b => b.lo <= newMax && b.hi >= newMin) +val rightBins = rightHistogram.bins + .filter(b => b.lo <= newMax && b.hi >= newMin) + +leftBins.foreach { lb => + rightBins.foreach { rb => --- End diff -- We only collect `OverlappedRange` when [left part and right part intersect](https://github.com/apache/spark/pull/19594/files#diff-56eed9f23127c954d9add0f6c5c93820R237), and the decision is based on some computation, it's not very convenient to use it as guards. So it seems `yield` form is not very suitable for this case. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
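The nested-loop structure being discussed — filter both histograms' bins to the join range, then collect an `OverlappedRange` only when the clipped left and right parts actually intersect — can be sketched in simplified form. This is a hypothetical illustration: Spark's real `Histogram` bins also carry NDV and height statistics, which are omitted here.

```python
from collections import namedtuple

# Simplified stand-ins for Spark's HistogramBin and OverlappedRange.
Bin = namedtuple("Bin", ["lo", "hi"])
OverlappedRange = namedtuple("OverlappedRange", ["lo", "hi"])


def get_overlapped_ranges(left_bins, right_bins, new_min, new_max):
    """Return the intersections of left/right bins restricted to [new_min, new_max]."""
    # Only bins whose range intersects [new_min, new_max] have join possibility.
    left = [b for b in left_bins if b.lo <= new_max and b.hi >= new_min]
    right = [b for b in right_bins if b.lo <= new_max and b.hi >= new_min]
    out = []
    for lb in left:
        for rb in right:
            lo = max(lb.lo, rb.lo, new_min)
            hi = min(lb.hi, rb.hi, new_max)
            if lo <= hi:  # collect only when the clipped parts intersect
                out.append(OverlappedRange(lo, hi))
    return out
```

The intersection test depends on the computed `lo`/`hi`, which is why — as the comment above notes — a `for`/`yield` comprehension with guards is awkward here: the guard needs values computed inside the loop body.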
[GitHub] spark pull request #19998: [SPARK-22813][BUILD] Use lsof or /usr/sbin/lsof i...
GitHub user kiszk opened a pull request: https://github.com/apache/spark/pull/19998 [SPARK-22813][BUILD] Use lsof or /usr/sbin/lsof in run-tests.py

## What changes were proposed in this pull request?

In [the environment where `/usr/sbin/lsof` does not exist](https://github.com/apache/spark/pull/19695#issuecomment-342865001), `./dev/run-tests.py` for `maven` causes the following error. This is because the current `./dev/run-tests.py` checks the existence of only `/usr/sbin/lsof` and aborts immediately if it does not exist. This PR changes it as follows:

1. Check whether `lsof` or `/usr/sbin/lsof` exists
2. Continue even if neither of them exists

```
/bin/sh: 1: /usr/sbin/lsof: not found

Usage:
 kill [options] <pid> [...]

Options:
 <pid> [...]            send signal to every <pid> listed
 -<signal>, -s, --signal <signal>
                        specify the <signal> to be sent
 -l, --list=[<signal>]  list all signal names, or convert one to a name
 -L, --table            list all signal names in a nice table

 -h, --help     display this help and exit
 -V, --version  output version information and exit

For more details see kill(1).

Traceback (most recent call last):
  File "./dev/run-tests.py", line 626, in <module>
    main()
  File "./dev/run-tests.py", line 597, in main
    build_apache_spark(build_tool, hadoop_version)
  File "./dev/run-tests.py", line 389, in build_apache_spark
    build_spark_maven(hadoop_version)
  File "./dev/run-tests.py", line 329, in build_spark_maven
    exec_maven(profiles_and_goals)
  File "./dev/run-tests.py", line 270, in exec_maven
    kill_zinc_on_port(zinc_port)
  File "./dev/run-tests.py", line 258, in kill_zinc_on_port
    subprocess.check_call(cmd, shell=True)
  File "/usr/lib/python2.7/subprocess.py", line 541, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '/usr/sbin/lsof -P |grep 3156 | grep LISTEN | awk '{ print $2; }' | xargs kill' returned non-zero exit status 123
```

## How was this patch tested?
manually tested

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/kiszk/spark SPARK-22813

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19998.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19998

commit 969bc227f255d721044e057da633c5f2becca2af
Author: Kazuaki Ishizaki
Date: 2017-12-16T02:14:14Z

    initial commit
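The first step of the proposed change — checking whether `lsof` or `/usr/sbin/lsof` exists — could also be expressed with the standard library's `shutil.which`, which handles both bare command names and absolute paths. This is a hypothetical alternative sketch, not the approach the merged patch takes (the patch uses a try/except fallback instead):

```python
import shutil


def find_lsof(candidates=("lsof", "/usr/sbin/lsof")):
    """Return the first usable lsof binary among candidates, or None if none exists.

    shutil.which resolves bare names against PATH and checks absolute
    paths directly, so both candidate forms work with one call.
    """
    for candidate in candidates:
        path = shutil.which(candidate)
        if path:
            return path
    return None
```

Returning `None` (rather than raising) matches the PR's intent of letting the build "continue even if neither of them exists": the caller can simply skip the kill step when no binary is found.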
[GitHub] spark issue #19594: [SPARK-21984] [SQL] Join estimation based on equi-height...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19594 **[Test build #84989 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84989/testReport)** for PR 19594 at commit [`2a4ee99`](https://github.com/apache/spark/commit/2a4ee99526c654834f3a50ef66e674bda673f926).
[GitHub] spark pull request #19594: [SPARK-21984] [SQL] Join estimation based on equi...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/19594#discussion_r157331711 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/JoinEstimation.scala --- @@ -191,8 +191,16 @@ case class JoinEstimation(join: Join) extends Logging { val rInterval = ValueInterval(rightKeyStat.min, rightKeyStat.max, rightKey.dataType) if (ValueInterval.isIntersected(lInterval, rInterval)) { val (newMin, newMax) = ValueInterval.intersect(lInterval, rInterval, leftKey.dataType) -val (card, joinStat) = computeByNdv(leftKey, rightKey, newMin, newMax) -keyStatsAfterJoin += (leftKey -> joinStat, rightKey -> joinStat) +val (card, joinStat) = (leftKeyStat.histogram, rightKeyStat.histogram) match { + case (Some(l: Histogram), Some(r: Histogram)) => +computeByEquiHeightHistogram(leftKey, rightKey, l, r, newMin, newMax) + case _ => +computeByNdv(leftKey, rightKey, newMin, newMax) +} +keyStatsAfterJoin += ( + leftKey -> joinStat.copy(histogram = leftKeyStat.histogram), + rightKey -> joinStat.copy(histogram = rightKeyStat.histogram) --- End diff -- ah right, we can keep it. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19981: [SPARK-22786][SQL] only use AppStatusPlugin in hi...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19981#discussion_r157331435 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala --- @@ -82,6 +82,19 @@ private[sql] class SharedState(val sparkContext: SparkContext) extends Logging { */ val cacheManager: CacheManager = new CacheManager + /** + * A status store to query SQL status/metrics of this Spark application, based on SQL-specific + * [[org.apache.spark.scheduler.SparkListenerEvent]]s. + */ + val statusStore: SQLAppStatusStore = { --- End diff -- at least it's developer-facing, as a developer I don't care about the naming changing, or API changing, but I just want the same functionality. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19995: [SPARK-22807] [Scheduler] Remove config that says docker...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19995 Seems it's somehow related to https://github.com/apache/spark/commit/e58f275678fb4f904124a4a2a1762f04c835eb0e, and it seems fine again now. I am not yet entirely sure how this change relates to the CRAN check. Will take a look soon. Some related discussions, in order: https://github.com/apache/spark/pull/19721, https://github.com/apache/spark/pull/19944, https://github.com/apache/spark/pull/19957 and https://github.com/apache/spark/pull/19961.
[GitHub] spark pull request #19681: [SPARK-20652][sql] Store SQL UI data in the new a...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19681#discussion_r157331270 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/ui/SQLListenerSuite.scala --- @@ -36,13 +36,14 @@ import org.apache.spark.sql.catalyst.util.quietly import org.apache.spark.sql.execution.{LeafExecNode, QueryExecution, SparkPlanInfo, SQLExecution} import org.apache.spark.sql.execution.metric.{SQLMetric, SQLMetrics} import org.apache.spark.sql.test.SharedSQLContext -import org.apache.spark.ui.SparkUI +import org.apache.spark.status.config._ import org.apache.spark.util.{AccumulatorMetadata, JsonProtocol, LongAccumulator} - +import org.apache.spark.util.kvstore.InMemoryStore class SQLListenerSuite extends SparkFunSuite with SharedSQLContext with JsonTestUtils { import testImplicits._ - import org.apache.spark.AccumulatorSuite.makeInfo + + override protected def sparkConf = super.sparkConf.set(LIVE_ENTITY_UPDATE_PERIOD, 0L) --- End diff -- ah you are right, it's only shared in hive tests --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19861: [SPARK-22387][SQL] Propagate session configs to data sou...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19861 **[Test build #84988 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84988/testReport)** for PR 19861 at commit [`5292329`](https://github.com/apache/spark/commit/52923296a946ac734c988fe10725921ea3c2b313).
[GitHub] spark issue #19977: [SPARK-22771][SQL] Concatenate binary inputs into a bina...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19977 **[Test build #84987 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84987/testReport)** for PR 19977 at commit [`2d3926e`](https://github.com/apache/spark/commit/2d3926e546b3aa60c46449151706941ed17e2441).
[GitHub] spark pull request #19997: [SPARK-22811][pyspark][ml] Fix pyspark.ml.tests f...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19997
[GitHub] spark issue #19995: [SPARK-22807] [Scheduler] Remove config that says docker...
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/19995 The R tests are pretty flaky recently, any ideas @HyukjinKwon ?
[GitHub] spark issue #19997: [SPARK-22811][pyspark][ml] Fix pyspark.ml.tests failure ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19997 Merged to master.
[GitHub] spark issue #19861: [SPARK-22387][SQL] Propagate session configs to data sou...
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/19861 retest this please
[GitHub] spark issue #19995: [SPARK-22807] [Scheduler] Remove config that says docker...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19995 **[Test build #84986 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84986/testReport)** for PR 19995 at commit [`b3e1af3`](https://github.com/apache/spark/commit/b3e1af3b3f4efad820dad9e989c580c74654390f).
[GitHub] spark issue #19995: [SPARK-22807] [Scheduler] Remove config that says docker...
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/19995 retest this please
[GitHub] spark pull request #19997: [SPARK-22811][pyspark][ml] Fix pyspark.ml.tests f...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19997#discussion_r157330796
--- Diff: python/pyspark/ml/tests.py ---
@@ -44,6 +44,7 @@
 import numpy as np
 from numpy import abs, all, arange, array, array_equal, inf, ones, tile, zeros
 import inspect
+import py4j
--- End diff --
Ah, it was my bad. Yup, you are right.
[GitHub] spark issue #19995: [SPARK-22807] [Scheduler] Remove config that says docker...
Github user foxish commented on the issue: https://github.com/apache/spark/pull/19995 @ueshin @vanzin SparkR failure seems unrelated to me. Any ideas?
[GitHub] spark issue #19995: [SPARK-22807] [Scheduler] Remove config that says docker...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19995 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84979/ Test FAILed.
[GitHub] spark issue #19995: [SPARK-22807] [Scheduler] Remove config that says docker...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19995 Merged build finished. Test FAILed.
[GitHub] spark issue #19995: [SPARK-22807] [Scheduler] Remove config that says docker...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19995 **[Test build #84979 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84979/testReport)** for PR 19995 at commit [`b3e1af3`](https://github.com/apache/spark/commit/b3e1af3b3f4efad820dad9e989c580c74654390f).
* This patch **fails SparkR unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #19995: [SPARK-22807] [Scheduler] Remove config that says docker...
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/19995 LGTM pending tests.
[GitHub] spark pull request #19954: [SPARK-22757][Kubernetes] Enable use of remote de...
Github user liyinan926 commented on a diff in the pull request: https://github.com/apache/spark/pull/19954#discussion_r157328425 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/rest/k8s/KubernetesSparkDependencyDownloadInitContainer.scala --- @@ -0,0 +1,129 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.deploy.rest.k8s + +import java.io.File +import java.util.concurrent.TimeUnit + +import scala.concurrent.{ExecutionContext, Future} +import scala.concurrent.duration.Duration + +import org.apache.spark.{SecurityManager => SparkSecurityManager, SparkConf} +import org.apache.spark.deploy.SparkHadoopUtil +import org.apache.spark.deploy.k8s.Config._ +import org.apache.spark.internal.Logging +import org.apache.spark.util.{ThreadUtils, Utils} + +/** + * Process that fetches files from a resource staging server and/or arbitrary remote locations. + * + * The init-container can handle fetching files from any of those sources, but not all of the + * sources need to be specified. 
This allows for composing multiple instances of this container + * with different configurations for different download sources, or using the same container to + * download everything at once. + */ +private[spark] class KubernetesSparkDependencyDownloadInitContainer( +sparkConf: SparkConf, +fileFetcher: FileFetcher) extends Logging { + + private implicit val downloadExecutor = ExecutionContext.fromExecutorService( +ThreadUtils.newDaemonCachedThreadPool("download-executor")) + + private val jarsDownloadDir = new File( +sparkConf.get(JARS_DOWNLOAD_LOCATION)) + private val filesDownloadDir = new File( +sparkConf.get(FILES_DOWNLOAD_LOCATION)) + + private val remoteJars = sparkConf.get(INIT_CONTAINER_REMOTE_JARS) + private val remoteFiles = sparkConf.get(INIT_CONTAINER_REMOTE_FILES) + + private val downloadTimeoutMinutes = sparkConf.get(INIT_CONTAINER_MOUNT_TIMEOUT) + + def run(): Unit = { +val remoteJarsDownload = Future[Unit] { + logInfo(s"Downloading remote jars: $remoteJars") + downloadFiles( +remoteJars, +jarsDownloadDir, +s"Remote jars download directory specified at $jarsDownloadDir does not exist " + + "or is not a directory.") +} +val remoteFilesDownload = Future[Unit] { + logInfo(s"Downloading remote files: $remoteFiles") + downloadFiles( +remoteFiles, +filesDownloadDir, +s"Remote files download directory specified at $filesDownloadDir does not exist " + + "or is not a directory.") +} +waitForFutures( --- End diff -- Got it, will address this. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
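The download flow above — two futures, one for remote jars and one for remote files, joined by a bounded wait — has a natural analogue in Python's `concurrent.futures`. This is a hypothetical simplified sketch (the `fetch` callable stands in for the real `FileFetcher`, and download directories and validation are omitted):

```python
from concurrent.futures import ThreadPoolExecutor, wait


def download_all(jar_urls, file_urls, fetch, timeout_s=60.0):
    """Download jars and files concurrently, waiting at most timeout_s.

    Mirrors the two-Future + waitForFutures structure of the
    init-container's run() method.
    """
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Each future downloads one whole category, as in the Scala code.
        jars_future = pool.submit(lambda: [fetch(u) for u in jar_urls])
        files_future = pool.submit(lambda: [fetch(u) for u in file_urls])
        done, not_done = wait([jars_future, files_future], timeout=timeout_s)
        if not_done:
            raise TimeoutError(
                "downloads did not finish within %ss" % timeout_s)
        # .result() re-raises any exception from the worker thread.
        return jars_future.result(), files_future.result()
```

As in the Scala version, a failure in either category surfaces when the future's result is collected, and the timeout bounds the total wait rather than each individual download.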
[GitHub] spark pull request #19954: [SPARK-22757][Kubernetes] Enable use of remote de...
Github user liyinan926 commented on a diff in the pull request: https://github.com/apache/spark/pull/19954#discussion_r157328327 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala --- @@ -133,30 +132,78 @@ private[spark] object Config extends Logging { val JARS_DOWNLOAD_LOCATION = ConfigBuilder("spark.kubernetes.mountDependencies.jarsDownloadDir") - .doc("Location to download jars to in the driver and executors. When using" + -" spark-submit, this directory must be empty and will be mounted as an empty directory" + -" volume on the driver and executor pod.") + .doc("Location to download jars to in the driver and executors. When using " + +"spark-submit, this directory must be empty and will be mounted as an empty directory " + +"volume on the driver and executor pod.") .stringConf .createWithDefault("/var/spark-data/spark-jars") val FILES_DOWNLOAD_LOCATION = ConfigBuilder("spark.kubernetes.mountDependencies.filesDownloadDir") - .doc("Location to download files to in the driver and executors. When using" + -" spark-submit, this directory must be empty and will be mounted as an empty directory" + -" volume on the driver and executor pods.") + .doc("Location to download files to in the driver and executors. When using " + +"spark-submit, this directory must be empty and will be mounted as an empty directory " + +"volume on the driver and executor pods.") .stringConf .createWithDefault("/var/spark-data/spark-files") + val INIT_CONTAINER_DOCKER_IMAGE = +ConfigBuilder("spark.kubernetes.initContainer.docker.image") --- End diff -- Renamed to `spark.kubernetes.initContainer.image`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19813: [SPARK-22600][SQL] Fix 64kb limit for deeply nested expr...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19813 @mgaido91 Thanks for the comment. I agree that making the contract is the easiest way. If we don't make this contract, it seems to me a significant change is needed.
[GitHub] spark issue #19954: [SPARK-22757][Kubernetes] Enable use of remote dependenc...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19954 **[Test build #84985 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84985/testReport)** for PR 19954 at commit [`46a8c99`](https://github.com/apache/spark/commit/46a8c9961312ee820743ddf893cc8666ce9360fa).
[GitHub] spark issue #19954: [SPARK-22757][Kubernetes] Enable use of remote dependenc...
Github user liyinan926 commented on the issue: https://github.com/apache/spark/pull/19954 @vanzin Addressed your comments so far. PTAL. Thanks!
[GitHub] spark pull request #19954: [SPARK-22757][Kubernetes] Enable use of remote de...
Github user liyinan926 commented on a diff in the pull request: https://github.com/apache/spark/pull/19954#discussion_r157327812 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/rest/k8s/FileFetcher.scala --- @@ -0,0 +1,27 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.deploy.rest.k8s + +import java.io.File + +/** + * Utility for fetching remote file dependencies. + */ +private[spark] trait FileFetcher { --- End diff -- Yeah, removed the trait. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19954: [SPARK-22757][Kubernetes] Enable use of remote de...
Github user foxish commented on a diff in the pull request: https://github.com/apache/spark/pull/19954#discussion_r157327721

--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala ---
@@ -133,30 +132,78 @@ private[spark] object Config extends Logging {

   val JARS_DOWNLOAD_LOCATION =
     ConfigBuilder("spark.kubernetes.mountDependencies.jarsDownloadDir")
-      .doc("Location to download jars to in the driver and executors. When using" +
-        " spark-submit, this directory must be empty and will be mounted as an empty directory" +
-        " volume on the driver and executor pod.")
+      .doc("Location to download jars to in the driver and executors. When using " +
+        "spark-submit, this directory must be empty and will be mounted as an empty directory " +
+        "volume on the driver and executor pod.")
       .stringConf
       .createWithDefault("/var/spark-data/spark-jars")

   val FILES_DOWNLOAD_LOCATION =
     ConfigBuilder("spark.kubernetes.mountDependencies.filesDownloadDir")
-      .doc("Location to download files to in the driver and executors. When using" +
-        " spark-submit, this directory must be empty and will be mounted as an empty directory" +
-        " volume on the driver and executor pods.")
+      .doc("Location to download files to in the driver and executors. When using " +
+        "spark-submit, this directory must be empty and will be mounted as an empty directory " +
+        "volume on the driver and executor pods.")
       .stringConf
       .createWithDefault("/var/spark-data/spark-files")

+  val INIT_CONTAINER_DOCKER_IMAGE =
+    ConfigBuilder("spark.kubernetes.initContainer.docker.image")
--- End diff --

> Is it a required config?

No, as one may forgo the init container if they're building the deps into the docker image itself and supplying it via `local:///` paths.
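The decision foxish describes (the init-container image is only required when some dependency actually has to be downloaded) can be sketched in a few lines. This is a hypothetical illustration with invented names, not Spark's ConfigBuilder or submission code: dependencies baked into the image and referenced via `local://` paths need no download step, so the image config may stay unset.

```scala
// Hypothetical helper (not Spark code): an init container is needed only when
// at least one dependency URI is remote, i.e. not a local:// path that is
// already present inside the driver/executor image.
object InitContainerNeedSketch {
  def needsInitContainer(dependencyUris: Seq[String]): Boolean =
    dependencyUris.exists(uri => !uri.startsWith("local://"))
}
```

Under this model, a job whose jars are all `local:///...` can omit `spark.kubernetes.initContainer.docker.image` entirely.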
[GitHub] spark pull request #19954: [SPARK-22757][Kubernetes] Enable use of remote de...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/19954#discussion_r157327642

--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/rest/k8s/KubernetesSparkDependencyDownloadInitContainer.scala ---
@@ -0,0 +1,129 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.rest.k8s
+
+import java.io.File
+import java.util.concurrent.TimeUnit
+
+import scala.concurrent.{ExecutionContext, Future}
+import scala.concurrent.duration.Duration
+
+import org.apache.spark.{SecurityManager => SparkSecurityManager, SparkConf}
+import org.apache.spark.deploy.SparkHadoopUtil
+import org.apache.spark.deploy.k8s.Config._
+import org.apache.spark.internal.Logging
+import org.apache.spark.util.{ThreadUtils, Utils}
+
+/**
+ * Process that fetches files from a resource staging server and/or arbitrary remote locations.
+ *
+ * The init-container can handle fetching files from any of those sources, but not all of the
+ * sources need to be specified. This allows for composing multiple instances of this container
+ * with different configurations for different download sources, or using the same container to
+ * download everything at once.
+ */
+private[spark] class KubernetesSparkDependencyDownloadInitContainer(
+    sparkConf: SparkConf,
+    fileFetcher: FileFetcher) extends Logging {
+
+  private implicit val downloadExecutor = ExecutionContext.fromExecutorService(
+    ThreadUtils.newDaemonCachedThreadPool("download-executor"))
+
+  private val jarsDownloadDir = new File(
+    sparkConf.get(JARS_DOWNLOAD_LOCATION))
+  private val filesDownloadDir = new File(
+    sparkConf.get(FILES_DOWNLOAD_LOCATION))
+
+  private val remoteJars = sparkConf.get(INIT_CONTAINER_REMOTE_JARS)
+  private val remoteFiles = sparkConf.get(INIT_CONTAINER_REMOTE_FILES)
+
+  private val downloadTimeoutMinutes = sparkConf.get(INIT_CONTAINER_MOUNT_TIMEOUT)
+
+  def run(): Unit = {
+    val remoteJarsDownload = Future[Unit] {
+      logInfo(s"Downloading remote jars: $remoteJars")
+      downloadFiles(
+        remoteJars,
+        jarsDownloadDir,
+        s"Remote jars download directory specified at $jarsDownloadDir does not exist " +
+          "or is not a directory.")
+    }
+    val remoteFilesDownload = Future[Unit] {
+      logInfo(s"Downloading remote files: $remoteFiles")
+      downloadFiles(
+        remoteFiles,
+        filesDownloadDir,
+        s"Remote files download directory specified at $filesDownloadDir does not exist " +
+          "or is not a directory.")
+    }
+    waitForFutures(
--- End diff --

Sure, but that's not my point. If you have 10 jars and 10 files to download, the current code will only download 2 at a time. If you submit each jar / file separately, you'll download as many as your thread pool allows, and you can make that configurable.
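The restructuring vanzin suggests, one Future per jar/file rather than one Future per category, can be sketched as below. This is a hypothetical illustration with invented names (`downloadAll`, `fetch`), not the PR's code: with a pool of size N, up to N downloads run concurrently regardless of how many files are in each category, and N can be made configurable.

```scala
import java.util.concurrent.Executors

import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Hypothetical sketch of per-file submission: each URI becomes its own Future,
// so the thread pool (not the number of categories) bounds the parallelism.
object PerFileDownloadSketch {
  def downloadAll(uris: Seq[String], poolSize: Int): Seq[String] = {
    val pool = Executors.newFixedThreadPool(poolSize)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutor(pool)
    // Stand-in for the real download; an init-container would invoke its
    // file fetcher here instead.
    def fetch(uri: String): String = s"downloaded $uri"
    try {
      val futures = uris.map(uri => Future(fetch(uri)))
      // Future.sequence preserves input order in the result.
      Await.result(Future.sequence(futures), 1.minute)
    } finally {
      pool.shutdown()
    }
  }
}
```

With `poolSize = 4`, ten jars and ten files would download four at a time instead of two, which is the behavior the review comment is asking for.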