[GitHub] spark pull request: Fix: 'Create table ..as select ..from..order b...
GitHub user guowei2 opened a pull request:

    https://github.com/apache/spark/pull/3821

    Fix: 'Create table ..as select ..from..order by .. limit 10' report error when one col is a Decimal

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/guowei2/spark SPARK-4988

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/3821.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #3821

----
commit 1bab9e4b782e62485f01f4f650a54c5ccb86f2a1
Author: guowei2 <guow...@asiainfo.com>
Date:   2014-12-29T07:57:51Z

    Fix: 'Create table ..as select ..from..order by .. limit 10' report error when one col is a Decimal

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.

---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: Fix: 'Create table ..as select ..from..order b...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3821#issuecomment-68238731

Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-4985] [SQL] parquet support for date ty...
GitHub user adrian-wang opened a pull request:

    https://github.com/apache/spark/pull/3822

    [SPARK-4985] [SQL] parquet support for date type

This PR might have some issues with #3732, and would have merge conflicts with #3820, so the review can be delayed until those two are merged.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/adrian-wang/spark parquetdate

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/3822.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #3822

----
commit 0ebe356bceff169fe89134bed603a17514dc1108
Author: Daoyuan Wang <daoyuan.w...@intel.com>
Date:   2014-12-29T07:59:37Z

    parquet support for date type
[GitHub] spark pull request: [SPARK-4985] [SQL] parquet support for date ty...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3822#issuecomment-68238990

[Test build #24856 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24856/consoleFull) for PR 3822 at commit [`0ebe356`](https://github.com/apache/spark/commit/0ebe356bceff169fe89134bed603a17514dc1108).
* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4984][CORE][UI] Adding a pop-up contain...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3819#issuecomment-68239441

[Test build #24854 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24854/consoleFull) for PR 3819 at commit [`8fe74b0`](https://github.com/apache/spark/commit/8fe74b03e63e36d370ba61946181194b1f0c84a2).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4984][CORE][UI] Adding a pop-up contain...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3819#issuecomment-68239445

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24854/
[GitHub] spark pull request: [SPARK-4990]to find default properties file, s...
GitHub user WangTaoTheTonic opened a pull request:

    https://github.com/apache/spark/pull/3823

    [SPARK-4990]to find default properties file, search SPARK_CONF_DIR first

https://issues.apache.org/jira/browse/SPARK-4990

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/WangTaoTheTonic/spark SPARK-4990

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/3823.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #3823

----
commit c5a85eb37389f3c849129267fcef0dfa608d09c6
Author: WangTaoTheTonic <barneystin...@aliyun.com>
Date:   2014-12-29T08:17:32Z

    to find default properties file, search SPARK_CONF_DIR first
[GitHub] spark pull request: [SPARK-4990][Deploy]to find default properties...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3823#issuecomment-68239625

[Test build #24857 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24857/consoleFull) for PR 3823 at commit [`c5a85eb`](https://github.com/apache/spark/commit/c5a85eb37389f3c849129267fcef0dfa608d09c6).
* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4989][CORE] avoid wrong eventlog conf c...
GitHub user liyezhang556520 opened a pull request:

    https://github.com/apache/spark/pull/3824

    [SPARK-4989][CORE] avoid wrong eventlog conf cause cluster down in standalone mode

When enabling the event log in standalone mode, a wrong configuration can bring the standalone cluster down (the Master restarts and loses its connection with the workers). How to reproduce: just give an invalid value to spark.eventLog.dir, for example: spark.eventLog.dir=hdfs://tmp/logdir1, hdfs://tmp/logdir2. This will throw an IllegalArgumentException, which will cause the Master to restart, leaving the whole cluster unavailable.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/liyezhang556520/spark wrongConf4Cluster

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/3824.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #3824

----
commit 5c1fa33799bc503ac1e2d5e9838e8e364bf1f61f
Author: Zhang, Liye <liye.zh...@intel.com>
Date:   2014-12-26T08:23:53Z

    cache exceptions when eventlog with wrong conf

commit 12eee8590fb9899c267b29d3a129a169b6cf6ec1
Author: Zhang, Liye <liye.zh...@intel.com>
Date:   2014-12-26T08:49:04Z

    add more message in log and on webUI
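The failure mode described in the PR above can be sketched outside Spark: a malformed directory value fails URI parsing, and the fix direction is to catch that failure and report it rather than let it propagate and restart the Master. This is a minimal sketch under that assumption; the method name and return convention are hypothetical, not Spark's actual API.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class EventLogConfSketch {
    // Hypothetical helper: validate an event-log directory the way a URI-based
    // client would, and return the failure as data instead of rethrowing, so a
    // bad spark.eventLog.dir value cannot take the whole Master down.
    static String validateEventLogDir(String dir) {
        try {
            URI uri = new URI(dir); // malformed values (e.g. embedded spaces) throw here
            return "ok: scheme=" + uri.getScheme();
        } catch (URISyntaxException | IllegalArgumentException e) {
            return "error: " + e.getMessage(); // surface in the log / web UI instead
        }
    }

    public static void main(String[] args) {
        System.out.println(validateEventLogDir("hdfs://tmp/logdir"));
        // The reproducer from the PR description: two comma-separated URIs.
        System.out.println(validateEventLogDir("hdfs://tmp/logdir1, hdfs://tmp/logdir2"));
    }
}
```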
[GitHub] spark pull request: [SPARK-4989][CORE] avoid wrong eventlog conf c...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3824#issuecomment-68239849

[Test build #24858 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24858/consoleFull) for PR 3824 at commit [`12eee85`](https://github.com/apache/spark/commit/12eee8590fb9899c267b29d3a129a169b6cf6ec1).
* This patch merges cleanly.
[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an Arti...
Github user bgreeven commented on the pull request: https://github.com/apache/spark/pull/1290#issuecomment-68241121

I have compared the ANN with Support Vector Machine (SVM) and Logistic Regression. I have tested using a master local(5) configuration, and applied the MNIST dataset, using 60,000 training examples and 10,000 test examples. Since SVM and Logistic Regression are binary classifiers, I applied two methods to convert them to a multinary classifier: majority vote and ad-hoc tree.

For the majority vote, I trained 10 different models, each to distinguish a single class from the rest. The classification was done by looking at which model gives the highest positive output. I performed 100 iterations per class, leading to 1000 iterations in total. For ANN, I used a single hidden layer with 32 nodes (not counting the bias nodes). I performed 100 iterations. For LBFGS I used tolerance 1e-5. Because of the poor performance of SVM+SGD, I re-ran it with 1000 iterations per class (10000 in total). The performance was similar.

I found the following results for the test set:

```
+------------------------------+----------+--------+-----------+-------------+
| Algorithm                    | Accuracy |   Time | # correct | # incorrect |
+------------------------------+----------+--------+-----------+-------------+
| ANN (LBFGS)                  |    95.1% |   665s |      9510 |         490 |
| Logistic Regression (SGD)    |    72.0% |  1325s |      7202 |        2798 |
| Logistic Regression (LBFGS)  |    86.6% |  1635s |      8658 |        1342 |
| SVM (SGD)                    |    18.6% |  1294s |      1860 |        8140 |
| (SVM (SGD) 1000 iterations)  |    18.5% | 12658s |      1850 |        8150 |
| SVM (LBFGS)                  |    86.2% |  1453s |      8622 |        1378 |
+------------------------------+----------+--------+-----------+-------------+
```

I also created an ad-hoc tree model. This separates the collection of training examples into two approximately equal-size partitions, where I tried to separate the numbers based on how different they look. I continued with the two separated partitions, until each output class corresponded to a single number.

The partitioning choice was made manually and intuitively, as follows:

```
0123456789 -> (04689, 12357)
04689 -> (068, 49)
068 -> (0, 68)
68 -> (6, 8)
49 -> (4, 9)
12357 -> (17, 235)
17 -> (1, 7)
235 -> (2, 35)
35 -> (3, 5)
```

Notice that this leads to only nine classification runs, not ten as in the voting scheme. After training, I used the trained models to classify the test set. I got the following results (same parameters as with the voting scheme):

```
+------------------------------+----------+--------+-----------+-------------+
| Algorithm                    | Accuracy |   Time | # correct | # incorrect |
+------------------------------+----------+--------+-----------+-------------+
| ANN (LBFGS)                  |    95.1% |   665s |      9510 |         490 |
| Logistic Regression (SGD)    |    82.3% |  1146s |      8228 |        1772 |
| Logistic Regression (LBFGS)  |    87.2% |  1273s |      8719 |        1281 |
| SVM (SGD)                    |    61.1% |  1148s |      6113 |        3887 |
| SVM (LBFGS)                  |    87.5% |  1182s |      8753 |        1247 |
+------------------------------+----------+--------+-----------+-------------+
```

Notice that I left ANN in the table because this is to compare ANN with other algorithms. Since ANN is a multinary classifier by nature, it didn't use the ad-hoc tree.

It would be great if someone could verify my results. I am particularly amazed by the low performance of SVM+SGD with voting, and the increase with the ad-hoc tree. I used the same code for SGD and LBFGS, and only changed the optimiser and related parameters.
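The "majority vote" scheme described in the comment above can be sketched generically: one binary scorer per class, trained to distinguish that class from the rest, with classification picking the class whose scorer gives the highest output. The scorer representation below is an assumption for illustration, not MLlib's API.

```java
import java.util.List;
import java.util.function.ToDoubleFunction;

public class OneVsRestSketch {
    // One-vs-rest majority vote: each entry in `scorers` is a binary model for
    // one class; the predicted class is the index of the highest-scoring model.
    static int classify(List<ToDoubleFunction<double[]>> scorers, double[] features) {
        int best = -1;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int c = 0; c < scorers.size(); c++) {
            double s = scorers.get(c).applyAsDouble(features);
            if (s > bestScore) {
                bestScore = s;
                best = c;
            }
        }
        return best;
    }
}
```

With real models, each scorer would be the raw margin of an SVM or the probability output of a logistic regression trained on "class c vs. rest".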
[GitHub] spark pull request: Added Java serialization util functions back i...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/3792#discussion_r22305806

--- Diff: network/common/src/main/java/org/apache/spark/network/util/JavaUtils.java ---
@@ -41,6 +41,34 @@ public class JavaUtils {
   private static final Logger logger = LoggerFactory.getLogger(JavaUtils.class);

+  /** Deserialize a byte array using Java serialization. */
+  public static <T> T deserialize(byte[] bytes) {
+    try {
+      ObjectInputStream is = new ObjectInputStream(new ByteArrayInputStream(bytes));
+      Object out = is.readObject();
+      is.close();
+      return (T) out;
+    } catch (ClassNotFoundException e) {
+      throw new RuntimeException("Could not deserialize object", e);
--- End diff --

I was thinking that you don't expect to not have the class on hand... But sure, IllegalArgumentException, because the bytes describe something invalid? The principle is to avoid RuntimeException since it is the superclass of all unchecked exceptions. If you ever wanted to catch this exception to deal with it, you'd have no hope of distinguishing it in a catch block. So reach for another standard and slightly more specific exception. A marginal argument here, but I think still common good practice in Java.
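The revision srowen is suggesting would look roughly like the sketch below: the same deserialize helper, but rethrowing ClassNotFoundException as the more specific IllegalArgumentException. This is an illustrative sketch, not the final patch; the class name and the serialize helper are added here only so the round trip is self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

public class JavaUtilsSketch {
    /** Serialize an object using Java serialization (for the round trip below). */
    public static byte[] serialize(Object o) {
        try (ByteArrayOutputStream baos = new ByteArrayOutputStream();
             ObjectOutputStream os = new ObjectOutputStream(baos)) {
            os.writeObject(o);
            os.flush();
            return baos.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException("Could not serialize object", e);
        }
    }

    /** Deserialize a byte array using Java serialization. */
    @SuppressWarnings("unchecked")
    public static <T> T deserialize(byte[] bytes) {
        try (ObjectInputStream is = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (T) is.readObject();
        } catch (ClassNotFoundException e) {
            // The bytes describe something we cannot resolve: an invalid argument.
            // A caller's catch block can now distinguish this from an arbitrary
            // RuntimeException, which was the point of the review comment.
            throw new IllegalArgumentException("Could not deserialize object", e);
        } catch (IOException e) {
            throw new IllegalStateException("Could not deserialize object", e);
        }
    }
}
```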
[GitHub] spark pull request: [SPARK-4991][CORE] Worker should reconnect to ...
GitHub user liyezhang556520 opened a pull request:

    https://github.com/apache/spark/pull/3825

    [SPARK-4991][CORE] Worker should reconnect to Master when Master actor restart

This is a follow-up JIRA of [SPARK-4989](https://issues.apache.org/jira/browse/SPARK-4989). When the Master akka actor encounters an exception, the Master will restart (an akka actor restart, not a JVM restart), and all old information on the Master is cleared (including workers, applications, etc.). However, the workers are not aware of this at all. The state of the cluster is then: the master is on, and all workers are also on, but the master is not aware of the existence of the workers and will ignore all workers' heartbeats because the workers are not registered. So the whole cluster is not available. In this PR, the master will tell the worker that the connection is disconnected, so that the worker will register with the master again.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/liyezhang556520/spark workerReconn

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/3825.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #3825

----
commit 107e5c58fdbe143fe6eabcfdb5d91d7b1184bb35
Author: Zhang, Liye <liye.zh...@intel.com>
Date:   2014-12-29T07:35:45Z

    worker reconnect to master when master restart for exception

commit e9c99e3969f6e058e46d65575d796d1289351318
Author: Zhang, Liye <liye.zh...@intel.com>
Date:   2014-12-29T08:51:50Z

    add log info
[GitHub] spark pull request: [SPARK-4991][CORE] Worker should reconnect to ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3825#issuecomment-68241590

[Test build #24859 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24859/consoleFull) for PR 3825 at commit [`e9c99e3`](https://github.com/apache/spark/commit/e9c99e3969f6e058e46d65575d796d1289351318).
* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3820#issuecomment-68241728

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24855/
[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3820#issuecomment-68241724

[Test build #24855 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24855/consoleFull) for PR 3820 at commit [`d44831a`](https://github.com/apache/spark/commit/d44831a2462b2c049b0222fbb7b8e08023d1f67c).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: SPARK-4921. TaskSetManager.dequeueTask returns...
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/3816#issuecomment-68241811

This seems like a fairly simple fix, but given that I don't 100% understand the discussion on SPARK-2294 / #1313, it might be good for @codingcat, @kayousterhout, @mridulm, or @mateiz to take a look.
[GitHub] spark pull request: [SQL] enable view test
GitHub user adrian-wang opened a pull request:

    https://github.com/apache/spark/pull/3826

    [SQL] enable view test

This is a follow-up of #3396; it just adds a test to the whitelist.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/adrian-wang/spark viewtest

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/3826.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #3826

----
commit f105f68ef33381e272985866fb63a7e7775b76bb
Author: Daoyuan Wang <daoyuan.w...@intel.com>
Date:   2014-12-29T09:04:24Z

    enable view test
[GitHub] spark pull request: [SPARK-4989][CORE] avoid wrong eventlog conf c...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/3824#discussion_r22306027

--- Diff: core/src/main/scala/org/apache/spark/deploy/master/Master.scala ---
@@ -758,13 +760,14 @@ private[spark] class Master(
     // Event logging is enabled for this application, but no event logs are found
     val title = s"Application history not found (${app.id})"
     var msg = s"No event logs found for application $appName in $eventLogFile."
-    logWarning(msg)
+    val exception = URLEncoder.encode(Utils.exceptionString(fnf), "UTF-8")
+    logWarning(msg, fnf)
     msg += " Did you specify the correct logging directory?"
     msg = URLEncoder.encode(msg, "UTF-8")
-    app.desc.appUiUrl = notFoundBasePath + s"?msg=$msg&title=$title"
+    app.desc.appUiUrl = notFoundBasePath + s"?msg=$msg&exception=$exception&title=$title"
     false
   case e: Exception =>
-    // Relay exception message to application UI page
+    // Replay exception message to application UI page
--- End diff --

The word `Relay` was correct here.
[GitHub] spark pull request: [SPARK-4989][CORE] avoid wrong eventlog conf c...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/3824#discussion_r22306063

--- Diff: core/src/main/scala/org/apache/spark/deploy/master/Master.scala ---
@@ -758,13 +760,14 @@ private[spark] class Master(
     // Event logging is enabled for this application, but no event logs are found
     val title = s"Application history not found (${app.id})"
     var msg = s"No event logs found for application $appName in $eventLogFile."
-    logWarning(msg)
+    val exception = URLEncoder.encode(Utils.exceptionString(fnf), "UTF-8")
+    logWarning(msg, fnf)
     msg += " Did you specify the correct logging directory?"
     msg = URLEncoder.encode(msg, "UTF-8")
-    app.desc.appUiUrl = notFoundBasePath + s"?msg=$msg&title=$title"
+    app.desc.appUiUrl = notFoundBasePath + s"?msg=$msg&exception=$exception&title=$title"
--- End diff --

This will likely be too long in general to put in a URL. Did you add this URL param elsewhere?
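The concern raised here is that URL-encoding a full exception string can blow past practical URL length limits. A defensive variant would truncate before encoding, as in this sketch; the 2000-character cap is an assumption for illustration, not a value from the patch.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class UrlParamSketch {
    // Truncate a (possibly very long) exception string before URL-encoding it,
    // so the resulting query parameter stays within a practical URL budget.
    static String encodeForUrl(String s, int maxChars) {
        try {
            String truncated = s.length() > maxChars ? s.substring(0, maxChars) : s;
            return URLEncoder.encode(truncated, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new AssertionError("UTF-8 is always supported", e);
        }
    }
}
```

Note that encoding can still expand each character up to three bytes (`%XX`), so the cap would be chosen with that expansion in mind.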
[GitHub] spark pull request: [SQL] enable view test
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3826#issuecomment-68242119

[Test build #24860 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24860/consoleFull) for PR 3826 at commit [`f105f68`](https://github.com/apache/spark/commit/f105f68ef33381e272985866fb63a7e7775b76bb).
* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4989][CORE] avoid wrong eventlog conf c...
Github user liyezhang556520 commented on a diff in the pull request: https://github.com/apache/spark/pull/3824#discussion_r22306099

--- Diff: core/src/main/scala/org/apache/spark/deploy/master/Master.scala ---
@@ -758,13 +760,14 @@ private[spark] class Master(
     // Event logging is enabled for this application, but no event logs are found
     val title = s"Application history not found (${app.id})"
     var msg = s"No event logs found for application $appName in $eventLogFile."
-    logWarning(msg)
+    val exception = URLEncoder.encode(Utils.exceptionString(fnf), "UTF-8")
+    logWarning(msg, fnf)
     msg += " Did you specify the correct logging directory?"
     msg = URLEncoder.encode(msg, "UTF-8")
-    app.desc.appUiUrl = notFoundBasePath + s"?msg=$msg&title=$title"
+    app.desc.appUiUrl = notFoundBasePath + s"?msg=$msg&exception=$exception&title=$title"
     false
   case e: Exception =>
-    // Relay exception message to application UI page
+    // Replay exception message to application UI page
--- End diff --

Yes, you are right: relay is correct, replay is not, thanks.
[GitHub] spark pull request: [SPARK-4990][Deploy]to find default properties...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/3823#discussion_r22306164

--- Diff: bin/spark-submit ---
@@ -42,7 +42,10 @@ while (($#)); do
   shift
 done

-DEFAULT_PROPERTIES_FILE="$SPARK_HOME/conf/spark-defaults.conf"
+DEFAULT_PROPERTIES_FILE="$SPARK_CONF_DIR/spark-defaults.conf"
--- End diff --

`SparkSubmitArguments` already ultimately handles this case, right? What does this fix?
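The lookup order this PR proposes can be sketched as a pure function over the environment: prefer SPARK_CONF_DIR when it is set, otherwise fall back to $SPARK_HOME/conf. This is an illustrative sketch of the intended behavior, not the actual shell or `SparkSubmitArguments` code; the method name is hypothetical.

```java
import java.util.Map;

public class ConfDirSketch {
    // Resolve the default properties file the way the PR describes:
    // SPARK_CONF_DIR wins when present, $SPARK_HOME/conf is the fallback.
    static String defaultPropertiesFile(Map<String, String> env) {
        String confDir = env.get("SPARK_CONF_DIR");
        if (confDir == null || confDir.isEmpty()) {
            confDir = env.getOrDefault("SPARK_HOME", ".") + "/conf";
        }
        return confDir + "/spark-defaults.conf";
    }
}
```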
[GitHub] spark pull request: [SPARK-4989][CORE] avoid wrong eventlog conf c...
Github user liyezhang556520 commented on a diff in the pull request: https://github.com/apache/spark/pull/3824#discussion_r22306172

--- Diff: core/src/main/scala/org/apache/spark/deploy/master/Master.scala ---
@@ -758,13 +760,14 @@ private[spark] class Master(
     // Event logging is enabled for this application, but no event logs are found
     val title = s"Application history not found (${app.id})"
     var msg = s"No event logs found for application $appName in $eventLogFile."
-    logWarning(msg)
+    val exception = URLEncoder.encode(Utils.exceptionString(fnf), "UTF-8")
+    logWarning(msg, fnf)
     msg += " Did you specify the correct logging directory?"
     msg = URLEncoder.encode(msg, "UTF-8")
-    app.desc.appUiUrl = notFoundBasePath + s"?msg=$msg&title=$title"
+    app.desc.appUiUrl = notFoundBasePath + s"?msg=$msg&exception=$exception&title=$title"
--- End diff --

No
[GitHub] spark pull request: [SPARK-4985] [SQL] parquet support for date ty...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3822#issuecomment-68242437

[Test build #24856 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24856/consoleFull) for PR 3822 at commit [`0ebe356`](https://github.com/apache/spark/commit/0ebe356bceff169fe89134bed603a17514dc1108).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4985] [SQL] parquet support for date ty...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3822#issuecomment-68242443

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24856/
[GitHub] spark pull request: [SPARK-4984][CORE][UI] Adding a pop-up contain...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/3819#issuecomment-68242474 This can be turned into a simple mouseover with much less work, no CSS or Javascript. Just display the shortened version, and make the long description the `title` attribute of an enclosing tag like `span` or `div`. I think that might even be more intuitive?
[GitHub] spark pull request: [SPARK-4982][DOC]The `spark.ui.retainedJobs` m...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/3818#issuecomment-68243274 Yes, looks like a simple copy-and-paste error. Doesn't even really need a JIRA, as there's nothing more to this.
[GitHub] spark pull request: [SPARK-4990][Deploy]to find default properties...
Github user WangTaoTheTonic commented on a diff in the pull request: https://github.com/apache/spark/pull/3823#discussion_r22306481

--- Diff: bin/spark-submit ---
```diff
@@ -42,7 +42,10 @@ while (($#)); do
   shift
 done
 
-DEFAULT_PROPERTIES_FILE="$SPARK_HOME/conf/spark-defaults.conf"
+DEFAULT_PROPERTIES_FILE="$SPARK_CONF_DIR/spark-defaults.conf"
```
--- End diff --

Used for:
```shell
if [[ "$SPARK_SUBMIT_DEPLOY_MODE" == "client" && -f "$SPARK_SUBMIT_PROPERTIES_FILE" ]]; then
  # Parse the properties file only if the special configs exist
  contains_special_configs=$(
    grep -e "spark.driver.extra*\|spark.driver.memory" "$SPARK_SUBMIT_PROPERTIES_FILE" | \
    grep -v "^[[:space:]]*#"
  )
  if [ -n "$contains_special_configs" ]; then
    export SPARK_SUBMIT_BOOTSTRAP_DRIVER=1
  fi
fi
```
[GitHub] spark pull request: [SPARK-4984][CORE][UI] Adding a pop-up contain...
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/3819#issuecomment-68243522 Hi @srowen, I am not an expert in CSS/JavaScript. My question is: if we do as you suggest (a mouseover), can we still copy the full SQL statement from the UI?
[GitHub] spark pull request: [SPARK-4989][CORE] avoid wrong eventlog conf c...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3824#issuecomment-68243544 [Test build #24861 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24861/consoleFull) for PR 3824 at commit [`a49c52f`](https://github.com/apache/spark/commit/a49c52fc995c7ac110d0ab07a4da2f87cf74de2d). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4508] [SQL] build native date type to c...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3732#issuecomment-68243880 [Test build #24862 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24862/consoleFull) for PR 3732 at commit [`3b4d5d8`](https://github.com/apache/spark/commit/3b4d5d80dc716a9fe2782115399a77f171d66cc7). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4984][CORE][UI] Adding a pop-up contain...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/3819#issuecomment-68243955 A mouseover can have whatever you like, whatever you can put in a `title` attribute. The browser will lay it out. You could try and see if it's an effective view.
[GitHub] spark pull request: HiveTableScan return mutable row with copy
GitHub user yanbohappy opened a pull request: https://github.com/apache/spark/pull/3827 HiveTableScan return mutable row with copy https://issues.apache.org/jira/browse/SPARK-4963 SchemaRDD.sample() returns wrong results because GapSamplingIterator operates on a mutable row. HiveTableScan builds an RDD of SpecificMutableRow, and SchemaRDD.sample() iterates it through a GapSamplingIterator:
```scala
override def next(): T = {
  val r = data.next()
  advance
  r
}
```
GapSamplingIterator.next() returns the current underlying element, assigning it to `r`. But if the underlying iterator yields a mutable row, as HiveTableScan's does, the underlying iterator and `r` point to the same object. The `advance` operation then drops some underlying elements, which also mutates `r` unexpectedly, so we return a value different from the one `r` initially held. The most direct fix is to have HiveTableScan return mutable rows with copy, as in the initial commit I have made. This solution means HiveTableScan cannot take full advantage of the reusable MutableRow, but it makes the sample operation return correct results. Further on, we could investigate making GapSamplingIterator.next() copy internally, but that would require every element an RDD can store to implement something like cloning, which would be a huge change.
You can merge this pull request into a Git repository by running: $ git pull https://github.com/yanbohappy/spark spark-4963 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3827.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3827 commit 6eaee5e7b1b5aca7f6abd16892f8312c7d6d7917 Author: Yanbo Liang yanboha...@gmail.com Date: 2014-12-29T09:00:44Z HiveTableScan return mutable row with copy
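The aliasing bug described in this PR can be reproduced outside Spark. Below is a hypothetical Python sketch (all class and function names are invented, not Spark classes) of a scan that reuses one mutable row, and a sampling-style `next` that advances the underlying iterator before the caller reads its result:

```python
# Hypothetical sketch, not Spark code: all names here are invented.
class MutableRow:
    """Stands in for a reusable row buffer like SpecificMutableRow."""
    def __init__(self):
        self.values = []

class ReusingScan:
    """Stands in for a scan that yields the *same* row object every time."""
    def __init__(self, data):
        self.row = MutableRow()
        self.it = iter(data)
    def __iter__(self):
        return self
    def __next__(self):
        self.row.values = next(self.it)  # overwrite the shared buffer
        return self.row

def sample_next(it):
    """Mimics the GapSamplingIterator pattern: read, then advance past a gap."""
    r = next(it)     # r aliases the shared mutable row
    next(it, None)   # advancing mutates the row that r points to
    return r.values

def sample_next_copy(it):
    """Same, but with the defensive copy the PR proposes."""
    r = list(next(it).values)
    next(it, None)
    return r

print(sample_next(ReusingScan([[1], [2], [3]])))       # [2] -- wrong, expected [1]
print(sample_next_copy(ReusingScan([[1], [2], [3]])))  # [1] -- correct
```

The copy forfeits buffer reuse, exactly as the PR description notes, but it decouples the returned value from the iterator's internal state.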
[GitHub] spark pull request: [SPARK-4984][CORE][UI] Adding a pop-up contain...
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/3819#issuecomment-68244071 Thanks, I will try it :)
[GitHub] spark pull request: [SPARK-4990][Deploy]to find default properties...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/3823#discussion_r22306764

--- Diff: bin/spark-submit ---
```diff
@@ -42,7 +42,10 @@ while (($#)); do
   shift
 done
 
-DEFAULT_PROPERTIES_FILE="$SPARK_HOME/conf/spark-defaults.conf"
+DEFAULT_PROPERTIES_FILE="$SPARK_CONF_DIR/spark-defaults.conf"
+if [ ! -f "$DEFAULT_PROPERTIES_FILE" ]; then
+  DEFAULT_PROPERTIES_FILE="$SPARK_HOME/conf/spark-defaults.conf"
+fi
 
 export SPARK_SUBMIT_DEPLOY_MODE="${SPARK_SUBMIT_DEPLOY_MODE:-client}"
 export SPARK_SUBMIT_PROPERTIES_FILE="${SPARK_SUBMIT_PROPERTIES_FILE:-$DEFAULT_PROPERTIES_FILE}"
```
--- End diff --

It's used here actually, but the purpose below seems to be detecting whether the user has set particular properties. Finding the default config doesn't matter here since it is a case where the user hasn't set these properties.
[GitHub] spark pull request: [SPARK-4990][Deploy]to find default properties...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3823#issuecomment-68244188 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24857/
[GitHub] spark pull request: [SPARK-4990][Deploy]to find default properties...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3823#issuecomment-68244186 [Test build #24857 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24857/consoleFull) for PR 3823 at commit [`c5a85eb`](https://github.com/apache/spark/commit/c5a85eb37389f3c849129267fcef0dfa608d09c6). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: SPARK-4963 [SQL] HiveTableScan return mutable ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3827#issuecomment-68244170 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-4989][CORE] avoid wrong eventlog conf c...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3824#issuecomment-68244528 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24858/
[GitHub] spark pull request: [SPARK-4989][CORE] avoid wrong eventlog conf c...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3824#issuecomment-68244526 [Test build #24858 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24858/consoleFull) for PR 3824 at commit [`12eee85`](https://github.com/apache/spark/commit/12eee8590fb9899c267b29d3a129a169b6cf6ec1). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4508] [SQL] build native date type to c...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3732#issuecomment-68246138 [Test build #24862 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24862/consoleFull) for PR 3732 at commit [`3b4d5d8`](https://github.com/apache/spark/commit/3b4d5d80dc716a9fe2782115399a77f171d66cc7). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `final class Date extends Ordered[Date] with Serializable `
[GitHub] spark pull request: [SPARK-4508] [SQL] build native date type to c...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3732#issuecomment-68246141 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24862/
[GitHub] spark pull request: [SQL] enable view test
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3826#issuecomment-68246366 [Test build #24860 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24860/consoleFull) for PR 3826 at commit [`f105f68`](https://github.com/apache/spark/commit/f105f68ef33381e272985866fb63a7e7775b76bb). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SQL] enable view test
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3826#issuecomment-68246371 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24860/
[GitHub] spark pull request: [SPARK-4991][CORE] Worker should reconnect to ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3825#issuecomment-68246470 [Test build #24859 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24859/consoleFull) for PR 3825 at commit [`e9c99e3`](https://github.com/apache/spark/commit/e9c99e3969f6e058e46d65575d796d1289351318). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * ` case class MasterDisconnected(masterUrl: String) extends DeployMessage`
[GitHub] spark pull request: [SPARK-4991][CORE] Worker should reconnect to ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3825#issuecomment-68246474 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24859/
[GitHub] spark pull request: [SPARK-4990][Deploy]to find default properties...
Github user WangTaoTheTonic commented on a diff in the pull request: https://github.com/apache/spark/pull/3823#discussion_r22307986

--- Diff: bin/spark-submit ---
```diff
@@ -42,7 +42,10 @@ while (($#)); do
   shift
 done
 
-DEFAULT_PROPERTIES_FILE="$SPARK_HOME/conf/spark-defaults.conf"
+DEFAULT_PROPERTIES_FILE="$SPARK_CONF_DIR/spark-defaults.conf"
+if [ ! -f "$DEFAULT_PROPERTIES_FILE" ]; then
+  DEFAULT_PROPERTIES_FILE="$SPARK_HOME/conf/spark-defaults.conf"
+fi
 
 export SPARK_SUBMIT_DEPLOY_MODE="${SPARK_SUBMIT_DEPLOY_MODE:-client}"
 export SPARK_SUBMIT_PROPERTIES_FILE="${SPARK_SUBMIT_PROPERTIES_FILE:-$DEFAULT_PROPERTIES_FILE}"
```
--- End diff --

If the user didn't pass `--properties-file` and defined SPARK_CONF_DIR, spark-submit would still look for `spark.driver.extra*` in the file given by `DEFAULT_PROPERTIES_FILE=$SPARK_HOME/conf/spark-defaults.conf`. Obviously it will make the wrong judgement whenever SPARK_CONF_DIR does not equal `$SPARK_HOME/conf`.
[GitHub] spark pull request: [SPARK-4990][Deploy]to find default properties...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/3823#discussion_r22308280

--- Diff: bin/spark-submit ---
```diff
@@ -42,7 +42,10 @@ while (($#)); do
   shift
 done
 
-DEFAULT_PROPERTIES_FILE="$SPARK_HOME/conf/spark-defaults.conf"
+DEFAULT_PROPERTIES_FILE="$SPARK_CONF_DIR/spark-defaults.conf"
+if [ ! -f "$DEFAULT_PROPERTIES_FILE" ]; then
+  DEFAULT_PROPERTIES_FILE="$SPARK_HOME/conf/spark-defaults.conf"
+fi
 
 export SPARK_SUBMIT_DEPLOY_MODE="${SPARK_SUBMIT_DEPLOY_MODE:-client}"
 export SPARK_SUBMIT_PROPERTIES_FILE="${SPARK_SUBMIT_PROPERTIES_FILE:-$DEFAULT_PROPERTIES_FILE}"
```
--- End diff --

OK, I see the scenario now where this matters. So the default config for an installation might in fact set these special properties, which the script needs to handle before `SparkSubmit` starts. Maybe someone else can double-check, but that makes sense to me.
[GitHub] spark pull request: [SPARK-4984][CORE][UI] Adding a pop-up contain...
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/3819#issuecomment-68248321 Hi @srowen, I tried it like this:
```
<div title="full sql">shortened sql</div>
```
but I cannot copy the full SQL. Or did I miss something?
[GitHub] spark pull request: [SPARK-4990][Deploy]to find default properties...
Github user WangTaoTheTonic commented on a diff in the pull request: https://github.com/apache/spark/pull/3823#discussion_r22308521

--- Diff: bin/spark-submit ---
```diff
@@ -42,7 +42,10 @@ while (($#)); do
   shift
 done
 
-DEFAULT_PROPERTIES_FILE="$SPARK_HOME/conf/spark-defaults.conf"
+DEFAULT_PROPERTIES_FILE="$SPARK_CONF_DIR/spark-defaults.conf"
+if [ ! -f "$DEFAULT_PROPERTIES_FILE" ]; then
+  DEFAULT_PROPERTIES_FILE="$SPARK_HOME/conf/spark-defaults.conf"
+fi
 
 export SPARK_SUBMIT_DEPLOY_MODE="${SPARK_SUBMIT_DEPLOY_MODE:-client}"
 export SPARK_SUBMIT_PROPERTIES_FILE="${SPARK_SUBMIT_PROPERTIES_FILE:-$DEFAULT_PROPERTIES_FILE}"
```
--- End diff --

I mean there is a possibility that the user does not use the default `conf` sub-directory of the installation but a specified one. For instance, I create a spark-defaults.conf under `/etc/my-spark/` and use it for submitting applications, so I set SPARK_CONF_DIR to `/etc/my-spark/` to make that properties file take effect. The properties file under `$SPARK_HOME/conf` could be unused, or used for submitting other applications.
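The fallback order being debated here, prefer `$SPARK_CONF_DIR/spark-defaults.conf` and otherwise use `$SPARK_HOME/conf/spark-defaults.conf`, can be sketched as follows (a hypothetical Python illustration; the function name and paths are invented, not part of Spark):

```python
import os
import tempfile

def default_properties_file(spark_conf_dir, spark_home):
    """Prefer the user conf dir's spark-defaults.conf; fall back to $SPARK_HOME/conf."""
    candidate = os.path.join(spark_conf_dir, "spark-defaults.conf")
    if os.path.isfile(candidate):
        return candidate
    return os.path.join(spark_home, "conf", "spark-defaults.conf")

with tempfile.TemporaryDirectory() as conf_dir:
    home = "/opt/spark"  # invented install dir
    # Conf dir has no defaults file yet: fall back to $SPARK_HOME/conf.
    assert default_properties_file(conf_dir, home) == \
        os.path.join(home, "conf", "spark-defaults.conf")
    # Once the user's conf dir provides the file, it takes precedence.
    path = os.path.join(conf_dir, "spark-defaults.conf")
    open(path, "w").close()
    assert default_properties_file(conf_dir, home) == path
```

WangTao's `/etc/my-spark/` scenario corresponds to the second assertion: the file in the user-specified conf dir wins over the one under the installation.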
[GitHub] spark pull request: [SPARK-4984][CORE][UI] Adding a pop-up contain...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/3819#issuecomment-68248476 @scwf You should be able to put whatever you want in there as the `title`. What's the issue?
[GitHub] spark pull request: [SPARK-4989][CORE] avoid wrong eventlog conf c...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3824#issuecomment-68248630 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24861/
[GitHub] spark pull request: [SPARK-4989][CORE] avoid wrong eventlog conf c...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3824#issuecomment-68248623 [Test build #24861 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24861/consoleFull) for PR 3824 at commit [`a49c52f`](https://github.com/apache/spark/commit/a49c52fc995c7ac110d0ab07a4da2f87cf74de2d). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4984][CORE][UI] Adding a pop-up contain...
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/3819#issuecomment-68249501 Sorry, I didn't get you. Do you mean I should put something else here (implementing an attribute similar to ```title```) instead of ```title```?
[GitHub] spark pull request: [SPARK-4984][CORE][UI] Adding a pop-up contain...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/3819#issuecomment-68250243 I'm suggesting something like replacing
```
<div><em>{lastStageDescription}</em></div>
```
with
```
<div title={lastStageDescription}><em>{shortLastStageDescription}</em></div>
```
... when the description is long. This should cause the short description to pop up the full description in a mouseover. Maybe I'm missing something as to why that won't work, but it is a lot simpler at least.
[GitHub] spark pull request: [SPARK-4994][network]Cleanup removed executors...
GitHub user lianhuiwang opened a pull request: https://github.com/apache/spark/pull/3828 [SPARK-4994][network]Cleanup removed executors' ShuffleInfo in yarn shuffle service When an application completes, YARN's NodeManager can remove the application's local dirs, but the metadata of the completed application's executors is not removed. As a result, the YARN shuffle service needs ever more memory to store executors' ShuffleInfo, so this metadata should be removed. You can merge this pull request into a Git repository by running: $ git pull https://github.com/lianhuiwang/spark SPARK-4994 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3828.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3828 commit f3ba1d283834b3583da829306a475781fb12ecb9 Author: lianhuiwang lianhuiwan...@gmail.com Date: 2014-12-29T12:34:38Z Cleanup removed executors' ShuffleInfo
[GitHub] spark pull request: [SPARK-4994][network]Cleanup removed executors...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3828#issuecomment-68254595 [Test build #24863 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24863/consoleFull) for PR 3828 at commit [`f3ba1d2`](https://github.com/apache/spark/commit/f3ba1d283834b3583da829306a475781fb12ecb9). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4984][CORE][UI] Adding a pop-up contain...
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/3819#issuecomment-68254852 Hi @srowen, this and ```<div title="full sql">shortened sql</div>``` both work for me. The reason I did not use that solution is that we may sometimes want to copy (Ctrl+C) the full description from the UI, and with `title` we cannot copy the full description :) But it is really much simpler, so if you think it is very unlikely that a user will copy the full description, I will change to that simpler solution :)
[GitHub] spark pull request: [SPARK-4990][Deploy]to find default properties...
Github user OopsOutOfMemory commented on the pull request: https://github.com/apache/spark/pull/3823#issuecomment-68256422 Hi, @WangTaoTheTonic : ) This makes sense to me; Hadoop likewise has HADOOP_CONF_DIR. But I would prefer to check first whether the `SPARK_CONF_DIR` directory exists, not only the configuration file. If in the future many files under SPARK_CONF_DIR need to be added in spark-submit, you will need to check whether each file exists. You can do it like:
```shell
if [ ! -d "$SPARK_CONF_DIR" ]; then
  export SPARK_CONF_DIR="$SPARK_HOME/conf"
fi
```
[GitHub] spark pull request: [SPARK-4990][Deploy]to find default properties...
Github user WangTaoTheTonic commented on the pull request: https://github.com/apache/spark/pull/3823#issuecomment-68257882 @OopsOutOfMemory Thanks for your comment; I understand your concern. Actually it does not matter if SPARK_CONF_DIR does not exist here, because we can use `$SPARK_HOME/conf` instead, and the checking logic for the properties file subsumes the check on SPARK_CONF_DIR. In other places in the Spark code we usually use `getOrElse` logic to handle configuration. It makes things easier to analyse when a user gets some specific config wrong, and we had better not break that tradition. :)
[GitHub] spark pull request: [YARN][SPARK-4929] Bug fix: fix the yarn-clien...
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/3771#issuecomment-68260017 Can you please be a bit more specific and detail exactly what happens here? Are you referring to when the RM has to fail over, or during a rolling upgrade? Is the container brought down and then back up again? Please just describe the scenario and what exactly is happening.
[GitHub] spark pull request: [SPARK-4994][network]Cleanup removed executors...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3828#issuecomment-68260130 [Test build #24863 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24863/consoleFull) for PR 3828 at commit [`f3ba1d2`](https://github.com/apache/spark/commit/f3ba1d283834b3583da829306a475781fb12ecb9). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4994][network]Cleanup removed executors...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3828#issuecomment-68260135 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24863/
[GitHub] spark pull request: [SPARK-4966][YARN]The MemoryOverhead value is ...
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/3797#issuecomment-68260624 Looks good. +1. Thanks @lianhuiwang
[GitHub] spark pull request: [SPARK-4966][YARN]The MemoryOverhead value is ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/3797
[GitHub] spark pull request: [SPARK-4990][Deploy]to find default properties...
Github user OopsOutOfMemory commented on the pull request: https://github.com/apache/spark/pull/3823#issuecomment-68261526 Sorry, maybe there is a misunderstanding here. What I mean is to **change the checking logic of the properties file** to check whether `SPARK_CONF_DIR` is user-specified or the default, not to add an extra directory check here. Let me give an example:
```shell
if [ ! -d "$SPARK_CONF_DIR" ]; then
  export SPARK_CONF_DIR="$SPARK_HOME/conf"
fi
DEFAULT_PROPERTIES_FILE="$SPARK_CONF_DIR/spark-defaults.conf"
XXX_PROPERTIES_FILE="$SPARK_CONF_DIR/xxx.conf"
```
Checking the conf directory is more reasonable, because the key point is `SPARK_CONF_DIR`; the original concern here is to change the path of `SPARK_CONF_DIR`, not only of `spark-defaults.conf`.

> analyse when a user gets some specific config wrong

We could add an extra warning for the key configuration file here, i.e. if `spark-defaults.conf` is missing, or is moved to a user-specific dir under the conf dir, we could log a warning to make the user aware of this before submitting. Currently, I think both solutions are OK! : )
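A minimal sketch of the getOrElse-style fallback discussed in this thread, combining both suggestions (the function name and warning text are hypothetical, not from either patch):

```shell
# Hypothetical helper: resolve the conf dir first, then warn (rather
# than fail) when the key configuration file is missing.
resolve_conf_dir() {
  # Fall back to the default conf dir when SPARK_CONF_DIR is unset
  # or does not exist -- the "getOrElse" logic described above.
  if [ ! -d "$SPARK_CONF_DIR" ]; then
    SPARK_CONF_DIR="$SPARK_HOME/conf"
  fi
  # Warn about a missing spark-defaults.conf before submitting.
  if [ ! -f "$SPARK_CONF_DIR/spark-defaults.conf" ]; then
    echo "Warning: no spark-defaults.conf in $SPARK_CONF_DIR" >&2
  fi
  echo "$SPARK_CONF_DIR"
}
```

Either way, the resolved directory is computed in one place, so any future file added under it needs no extra existence check of its own.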
[GitHub] spark pull request: [MLLIB][SPARK-3278] Monotone (Isotonic) regres...
Github user zapletal-martin commented on the pull request: https://github.com/apache/spark/pull/3519#issuecomment-68279092 1) Can you please clarify whether you are suggesting to use RDD[(Double, Double, Double)], i.e. label, feature, weight, or RDD[(Double, Double)], i.e. just label and weight, already expecting the data to be ordered? Also, I assume there should be an API with weight defaulting to 1 (so the user does not have to specify it). 2) IsotonicRegressionModel extends RegressionModel. It implements the methods predict(testData: RDD[Vector]) and predict(testData: Vector). Are these still relevant if we implement the changes in 1)? There would never be a Vector, just a Double. Also, we would need the feature from 1) to be able to predict the label. 3) How do you expect the Java API to look? Unfortunately the Java/Scala interop here is not very helpful. When the train method expects a tuple of scala.Double, then when called from Java you get: [error] IsotonicRegressionModel model = IsotonicRegression.train(testRDD.rdd(), true); [error] required: RDD<Tuple3<Object,Object,Object>>, boolean [error] found: RDD<Tuple3<Double,Double,Double>>, boolean [error] reason: actual argument RDD<Tuple3<Double,Double,Double>> cannot be converted to RDD<Tuple3<Object,Object,Object>> by method invocation conversion. There are solutions to this problem, but most of them are quite ugly. See for example http://stackoverflow.com/questions/17071061/scala-java-interoperability-how-to-deal-with-options-containing-int-long-primi or http://www.scala-notes.org/2011/04/specializing-for-primitive-types/. Is there another public Java API that uses a primitive type in a generic that I could use as a reference?
[GitHub] spark pull request: [MLlib]Vectors.sparse() add support to unsorte...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/3791#issuecomment-68279740 @hzlyx As @srowen mentioned, this is a contract to avoid the expensive check. You can use https://github.com/hzlyx/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala#L191 if the indices are not ordered.
[GitHub] spark pull request: Added setMinCount to Word2Vec.scala
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/3693#issuecomment-68279848 @ganonp Could you update the branch and remove the last commit?
[GitHub] spark pull request: SPARK-3955 part 2 [CORE] [HOTFIX] Different ve...
GitHub user srowen opened a pull request: https://github.com/apache/spark/pull/3829 SPARK-3955 part 2 [CORE] [HOTFIX] Different versions between jackson-mapper-asl and jackson-core-asl @pwendell https://github.com/apache/spark/commit/2483c1efb6429a7d8a20c96d18ce2fec93a1aff9 didn't actually add a reference to `jackson-core-asl` as intended, but a second redundant reference to `jackson-mapper-asl`, as @markhamstra picked up on (https://github.com/apache/spark/pull/3716#issuecomment-68180192) This just rectifies the typo. I missed it as well; the original PR https://github.com/apache/spark/pull/2818 had it correct and I also didn't see the problem. You can merge this pull request into a Git repository by running: $ git pull https://github.com/srowen/spark SPARK-3955 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3829.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3829 commit 6cfdc4e3dfe0a04e32a955bedffd5747fad9d70c Author: Sean Owen so...@cloudera.com Date: 2014-12-29T18:13:29Z Actually refer to jackson-core-asl
[GitHub] spark pull request: SPARK-3955 part 2 [CORE] [HOTFIX] Different ve...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3829#issuecomment-68282805 [Test build #24864 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24864/consoleFull) for PR 3829 at commit [`6cfdc4e`](https://github.com/apache/spark/commit/6cfdc4e3dfe0a04e32a955bedffd5747fad9d70c). * This patch merges cleanly.
[GitHub] spark pull request: SPARK-4921. TaskSetManager.dequeueTask returns...
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/3816#issuecomment-68282936 Why was this a problem? You need to make sure that this won't change the locality level the scheduler launches tasks at due to delay scheduling. For example, if a stage contained both process-local and no-pref tasks, and it was still able to launch tasks locally (without the delay expiring), this change might make it forget that and not wait long enough, thus not getting local tasks. Please write down something explaining why this was a problem and why the fix won't break other things.
[GitHub] spark pull request: [SPARK-4501][Core] - Create build/mvn to autom...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3707#issuecomment-68283433 @witgo commented on the actual commit: https://github.com/apache/spark/commit/a3e51cc990812c8099dcaf1f3bd6d5bae45cf8e6#commitcomment-9101060 "It seems that every time you run `./build/mvn` it has to re-download scala-2.10.4.tgz". Can you investigate?
[GitHub] spark pull request: SPARK-3955 part 2 [CORE] [HOTFIX] Different ve...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/3829#issuecomment-68283989 Gotcha - thanks Sean.
[GitHub] spark pull request: SPARK-4921. TaskSetManager.dequeueTask returns...
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/3816#issuecomment-68284050 @mateiz the JIRA claims that this results in extra unnecessary locality delay. I thought that the problem might have been an obvious typo, but it sounds like you're saying this may have been the intended behavior. I'll look deeper into it.
[GitHub] spark pull request: spark-core - [SPARK-4787] - Stop sparkcontext ...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3809#discussion_r22322821

--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -329,8 +329,11 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
     try {
       dagScheduler = new DAGScheduler(this)
     } catch {
-      case e: Exception => throw
-        new SparkException("DAGScheduler cannot be initialized due to %s".format(e.getMessage))
+      case e: Exception => {
+        stop()
--- End diff --

Also, do you think this should be in a try-finally block so that we don't swallow the useful "DAGScheduler cannot be initialized" exception if the stop() call somehow fails?
[GitHub] spark pull request: [SPARK-4991][CORE] Worker should reconnect to ...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3825#issuecomment-68284859 Wouldn't it be better to ensure that actors like Master and DAGScheduler never die due to uncaught exceptions?
[GitHub] spark pull request: [SPARK-4982][DOC]The `spark.ui.retainedJobs` m...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3818#issuecomment-68284924 LGTM, thanks. I agree with Sean that this doesn't need a JIRA issue.
[GitHub] spark pull request: [SPARK-4982][DOC]The `spark.ui.retainedJobs` m...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/3818
[GitHub] spark pull request: Adde LICENSE Header to build/mvn, build/sbt an...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3817#issuecomment-68285274 LGTM, thanks!
[GitHub] spark pull request: Added setMinCount to Word2Vec.scala
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/3693#issuecomment-68285349 ok to test
[GitHub] spark pull request: Adde LICENSE Header to build/mvn, build/sbt an...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/3817
[GitHub] spark pull request: Added setMinCount to Word2Vec.scala
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3693#issuecomment-68285678 [Test build #24865 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24865/consoleFull) for PR 3693 at commit [`ad534f2`](https://github.com/apache/spark/commit/ad534f26c44a7bdc8ee91f73d80a93bd13aa6805). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4501][Core] - Create build/mvn to autom...
Github user brennonyork commented on the pull request: https://github.com/apache/spark/pull/3707#issuecomment-68285768 @JoshRosen I can't reproduce this error and, after looking through the code, I'm not seeing where an issue like that could crop up :/ @witgo could you help me understand when you're seeing this and provide me the output of `bash -x ./build/mvn clean`? With that I can much better understand how to fix this.
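Since the symptom is a repeated download, the behavior under discussion presumably looks something like the following cache check (the function name, paths, and URL are illustrative assumptions, not the actual `build/mvn` code):

```shell
# Hypothetical sketch: fetch the tarball only when it is not already
# cached, which is what ./build/mvn is expected to do.
install_app() {
  local cache_dir="$1" tarball="$2" url="$3"
  mkdir -p "$cache_dir"
  if [ -f "$cache_dir/$tarball" ]; then
    # A cached copy exists; skip the network entirely.
    echo "using cached $tarball"
  else
    echo "downloading $tarball"
    curl --silent --location --output "$cache_dir/$tarball" "$url"
  fi
}
```

If the cache path computed on each run differs (for example because of an unset variable), the `-f` test always fails and the tarball is fetched every time, which would match the reported symptom.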
[GitHub] spark pull request: [SPARK-4989][CORE] avoid wrong eventlog conf c...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3824#discussion_r22323425

--- Diff: core/src/main/scala/org/apache/spark/deploy/master/Master.scala ---
@@ -758,13 +760,14 @@ private[spark] class Master(
         // Event logging is enabled for this application, but no event logs are found
         val title = s"Application history not found (${app.id})"
         var msg = s"No event logs found for application $appName in $eventLogFile."
-        logWarning(msg)
+        val exception = URLEncoder.encode(Utils.exceptionString(fnf), "UTF-8")
+        logWarning(msg, fnf)
         msg += " Did you specify the correct logging directory?"
         msg = URLEncoder.encode(msg, "UTF-8")
-        app.desc.appUiUrl = notFoundBasePath + s"?msg=$msg&title=$title"
+        app.desc.appUiUrl = notFoundBasePath + s"?msg=$msg&exception=$exception&title=$title"
--- End diff --

@srowen It looks like this same `exception` URL param is used in other exception-handling code in this same file (the first instance was added by @andrewor14 in 6afca2d1079bac6309a595b8e0ffc74ae93fa662).
[GitHub] spark pull request: SPARK-4921. TaskSetManager.dequeueTask returns...
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/3816#issuecomment-68285949 Well, what I'm saying is to look at how it affects the rest of the scheduler. That was set to PROCESS_LOCAL there for a reason; it wasn't a typo. It was to make sure that launching a no-pref task doesn't then cause you to increase your allowed locality level and miss waiting for other local ones. I'd also like to see what performance difference this makes in the original case, and why it was a problem there (e.g. was this an InputFormat with no locality info at all, or something). One fix, by the way, may be to not count NO_PREF launches at all when deciding how to update delay scheduling variables, but even then it's good to understand what this was doing and make sure it won't break it.
[GitHub] spark pull request: SPARK-4547 [MLLIB] OOM when making bins in Bin...
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/3702#issuecomment-68286086 @srowen Sorry for the delay! I'm really starting to wonder about this JIRA, though. The collect() should return one BinaryLabelCounter per partition. I'd assume people would have enough memory to store at least a few million BinaryLabelCounter instances on the driver. Does that mean they have more than a few million partitions? Sorry I didn't think about this earlier, and perhaps I'm just confusing myself now; let me know what you think. Is there an issue to solve here? Previously, I'd have said: "With the update, this LGTM." Also, I did think of one use case which may change things: we've been talking about people using these methods to make plots. Do you think people ever use them to choose thresholds? If so, then people might want much finer-grained ROC curves than we've been thinking, and it might be worthwhile to do a fancy implementation which avoids binning. At any rate, apologies for so much back-and-forth.
[GitHub] spark pull request: [SPARK-4989][CORE] avoid wrong eventlog conf c...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3824#discussion_r22323517

--- Diff: core/src/main/scala/org/apache/spark/deploy/master/Master.scala ---
@@ -719,26 +719,28 @@ private[spark] class Master(
   def rebuildSparkUI(app: ApplicationInfo): Boolean = {
     val appName = app.desc.name
     val notFoundBasePath = HistoryServer.UI_PATH_PREFIX + "/not-found"
-    val eventLogFile = app.desc.eventLogDir
-      .map { dir => EventLoggingListener.getLogPath(dir, app.id) }
-      .getOrElse {
-        // Event logging is not enabled for this application
-        app.desc.appUiUrl = notFoundBasePath
-        return false
-      }
-    val fs = Utils.getHadoopFileSystem(eventLogFile, hadoopConf)
+    var eventLogFile: String = null
--- End diff --

It looks like `eventLogFile` is only read from inside the `try` block on the following line, so why not move it inside and make it a `val` instead?
[GitHub] spark pull request: [SPARK-4991][CORE] Worker should reconnect to ...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3825#issuecomment-68286943 More specifically, I guess I'm suggesting that we wrap the `receive` and `receiveWithLogging` methods of our actors in try-catch blocks to log any exceptions that bubble up to the top of the actors.
[GitHub] spark pull request: [MLLIB][SPARK-3278] Monotone (Isotonic) regres...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/3519#issuecomment-68286932

2a) `(label: Double, feature: Double, weight: Double)` sounds good to me. We may add weight support to `LabeledPoint` as part of SPARK-3702, which should be orthogonal to this PR. We can update the API here (before 1.3) once that gets merged.

2b) Isotonic regression is a univariate regression algorithm, so it is not necessary to have its model extend `RegressionModel`. It should have `predict(RDD[Double])` and `predict(Double)`.

2c) Try `train(JavaPairRDD<java.lang.Double, java.lang.Double>)`.
[GitHub] spark pull request: [SPARK-4989][CORE] avoid wrong eventlog conf c...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3824#issuecomment-68287342 This change seems okay to me overall, aside from one minor nit. Most of the change is just broadening the scope of the `try` block to handle some cases that didn't seem like they could fail.
[GitHub] spark pull request: [SPARK-4893] Clean up uses of System.setProper...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3739#issuecomment-68287961 /cc @pwendell @andrewor14 @tdas Could one of you review this? It's blocking a couple of other PRs that I'd like to merge. This looks like a lot of changes, but they're isolated to test code and most cases are small, local changes to replace system property usage with SparkConf.
[GitHub] spark pull request: SPARK-4968: takeOrdered to skip reduce step in...
GitHub user saucam opened a pull request: https://github.com/apache/spark/pull/3830

SPARK-4968: takeOrdered to skip reduce step in case mappers return no partitions

`takeOrdered` should skip the reduce step when the mapped RDD has no partitions. This prevents the following exception, hit when running a query such as:

```
SELECT * FROM testTable WHERE market = 'market2' ORDER BY End_Time DESC LIMIT 100;
```

Error trace:

```
java.lang.UnsupportedOperationException: empty collection
    at org.apache.spark.rdd.RDD$$anonfun$reduce$1.apply(RDD.scala:863)
    at org.apache.spark.rdd.RDD$$anonfun$reduce$1.apply(RDD.scala:863)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.reduce(RDD.scala:863)
    at org.apache.spark.rdd.RDD.takeOrdered(RDD.scala:1136)
```

You can merge this pull request into a Git repository by running: $ git pull https://github.com/saucam/spark fix_takeorder Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3830.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3830 commit 5974d10c619dac2ca2433d331e43ed48e6822f90 Author: Yash Datta yash.da...@guavus.com Date: 2014-12-29T19:06:32Z SPARK-4968: takeOrdered to skip reduce step in case mappers return no partitions
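The failure mode and the fix's intent can be modelled outside Spark: `takeOrdered` has each partition contribute its own sorted top-k elements, then merges those contributions with `reduce`; if zero partitions contribute anything, a bare `reduce` throws on the empty collection, so the guard returns an empty result instead. This is a simplified sketch of that shape, not Spark's actual implementation:

```scala
object TakeOrderedSketch {
  // Each "partition" contributes its sorted top-k elements; the contributions
  // are merged pairwise. With zero non-empty partitions, calling reduce would
  // throw UnsupportedOperationException("empty collection"), so we check
  // first and short-circuit to an empty result.
  def takeOrdered(partitions: Seq[Seq[Int]], k: Int): Seq[Int] = {
    val tops = partitions.filter(_.nonEmpty).map(_.sorted.take(k))
    if (tops.isEmpty) Seq.empty // skip the reduce step entirely
    else tops.reduce((a, b) => (a ++ b).sorted.take(k))
  }
}
```

In the reported query, the `WHERE` filter leaves no matching rows, which is exactly the zero-contributions case the guard handles.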
[GitHub] spark pull request: SPARK-4968: takeOrdered to skip reduce step in...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3830#issuecomment-68292137 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-4946] [CORE] Using AkkaUtils.askWithRep...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3785#issuecomment-68292416 On the surface, this seems like an okay change. I wonder whether this retry logic could have unexpected consequences. Let me try to reason it out:

- `askTracker` is only called with `GetMapOutputStatuses`.
- In the master actor, this calls `getSerializedMapOutputStatuses`. That method never throws exceptions: if a shuffle is missing, it just stores an empty array and serializes it.
- It's possible that the serialized map statuses could exceed the Akka frame size (although that is extremely unlikely, and perhaps impossible with the new output status compression techniques). In that case the master would throw an exception and fail to send a reply back to the asker, so with this patch we'd end up performing a bunch of retries for an operation that will ultimately fail, and we'd take longer to detect the failure.

In the common cases, though, this seems fine, even if the map output statuses are missing (since it won't introduce a bunch of futile retries). Therefore, I think we should pull this in; I don't know if this fixes an actual bug, but it seems like it could make things more robust.
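The trade-off in this reasoning is generic to any ask-with-retries helper: a transient failure is papered over, but a deterministic failure just costs `maxAttempts` times the single-try latency before it surfaces. A minimal generic sketch of that shape (this is not Spark's `AkkaUtils.askWithReply`, just an illustration of its semantics):

```scala
object RetrySketch {
  // Retries op up to maxAttempts times, returning the first successful
  // result. If every attempt fails, the last exception is rethrown -- so a
  // deterministic failure is detected maxAttempts times slower than with a
  // single try, which is the downside discussed above.
  def withRetries[T](maxAttempts: Int)(op: () => T): T = {
    var lastError: Throwable = null
    var attempt = 0
    while (attempt < maxAttempts) {
      try return op()
      catch { case e: Exception => lastError = e }
      attempt += 1
    }
    throw lastError
  }
}
```

The conclusion above follows from this shape: since the master-side handler essentially never fails, the retries are almost always a no-op on the first attempt, and the slow-failure path is practically unreachable.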
[GitHub] spark pull request: [SPARK-4946] [CORE] Using AkkaUtils.askWithRep...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3785#issuecomment-68293600 Alright, I'm going to merge this into `master` (1.3.0). Thanks!
[GitHub] spark pull request: [SPARK-4946] [CORE] Using AkkaUtils.askWithRep...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/3785
[GitHub] spark pull request: [SPARK-4961] [CORE] Put HadoopRDD.getPartition...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3794#issuecomment-68294158 To reformat the PR description to make it a little easier to read:

`HadoopRDD.getPartitions` is lazy, so it runs while DAGScheduler is processing a JobSubmitted event. If the input directory is large, getPartitions may take a long time; for example, in our cluster it takes anywhere from 0.029s to 766.699s. While one JobSubmitted event is being processed, the others must wait. Thus, we want to move HadoopRDD.getPartitions earlier to reduce the JobSubmitted processing time, so that other JobSubmitted events don't need to wait as long. A HadoopRDD object could compute its partitions when it is instantiated. We can analyse and compare the execution time before and after the optimization:

```
TaskScheduler.start execution time: [time1__]
DAGScheduler.JobSubmitted (excluding HadoopRDD.getPartitions or TaskScheduler.start) execution time: [time2_]
HadoopRDD.getPartitions execution time: [time3___]
Stages execution time: [time4_]
```

(1) The app has only one job:

```
The execution time of the job before optimization is [time1__][time2_][time3___][time4_].
The execution time of the job after optimization is  [time1__][time3___][time2_][time4_].
```

In summary, if the app has only one job, the total execution time is the same before and after the optimization.

(2) The app has 4 jobs. Before optimization:

```
job1 execution time is [time2_][time3___][time4_],
job2 execution time is [time2__][time3___][time4_],
job3 execution time is [time2][time3___][time4_],
job4 execution time is [time2_][time3___][time4_].
```

After optimization:

```
job1 execution time is [time3___][time2_][time4_],
job2 execution time is [time3___][time2__][time4_],
job3 execution time is [time3___][time2_][time4_],
job4 execution time is [time3___][time2__][time4_].
```

In summary, if the app has multiple jobs, the average execution time after the optimization is less than before.
[GitHub] spark pull request: SPARK-4963 [SQL] HiveTableScan return mutable ...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/3827#issuecomment-68294587 ok to test