[jira] [Updated] (SPARK-2456) Scheduler refactoring
[ https://issues.apache.org/jira/browse/SPARK-2456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-2456: --- Description: This is an umbrella ticket to track scheduler refactoring. We want to clearly define semantics and responsibilities of each component, and define explicit public interfaces for them so it is easier to understand and to contribute (also less buggy). was: This is an umbrella ticket to track scheduler refactoring. We want to clearly define semantics and responsibilities of each component, and define explicit public interfaces for them so it is easier to understand and to contribute (also less buggy). [~kayousterhout] Scheduler refactoring - Key: SPARK-2456 URL: https://issues.apache.org/jira/browse/SPARK-2456 Project: Spark Issue Type: Improvement Reporter: Reynold Xin Assignee: Reynold Xin This is an umbrella ticket to track scheduler refactoring. We want to clearly define semantics and responsibilities of each component, and define explicit public interfaces for them so it is easier to understand and to contribute (also less buggy). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2456) Scheduler refactoring
[ https://issues.apache.org/jira/browse/SPARK-2456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14072860#comment-14072860 ] Reynold Xin commented on SPARK-2456: One related PR: https://github.com/apache/spark/pull/1561 Scheduler refactoring - Key: SPARK-2456 URL: https://issues.apache.org/jira/browse/SPARK-2456 Project: Spark Issue Type: Improvement Reporter: Reynold Xin Assignee: Reynold Xin This is an umbrella ticket to track scheduler refactoring. We want to clearly define semantics and responsibilities of each component, and define explicit public interfaces for them so it is easier to understand and to contribute (also less buggy). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-2310) Support arbitrary options on the command line with spark-submit
[ https://issues.apache.org/jira/browse/SPARK-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-2310: --- Assignee: Sandy Ryza Support arbitrary options on the command line with spark-submit --- Key: SPARK-2310 URL: https://issues.apache.org/jira/browse/SPARK-2310 Project: Spark Issue Type: Improvement Components: Deploy Affects Versions: 1.0.0 Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 1.1.0 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (SPARK-2310) Support arbitrary options on the command line with spark-submit
[ https://issues.apache.org/jira/browse/SPARK-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-2310. Resolution: Fixed Fix Version/s: 1.1.0 Issue resolved by pull request 1253 [https://github.com/apache/spark/pull/1253] Support arbitrary options on the command line with spark-submit --- Key: SPARK-2310 URL: https://issues.apache.org/jira/browse/SPARK-2310 Project: Spark Issue Type: Improvement Components: Deploy Affects Versions: 1.0.0 Reporter: Sandy Ryza Fix For: 1.1.0 -- This message was sent by Atlassian JIRA (v6.2#6252)
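For reference, the feature resolved above allows arbitrary Spark properties to be passed on the spark-submit command line as --conf key=value pairs; a minimal invocation might look like the following sketch (the application class and jar are placeholders, not from the ticket):
{code}
./bin/spark-submit \
  --class org.example.MyApp \
  --master local[2] \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
  --conf spark.executor.memory=2g \
  myapp.jar
{code}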
[jira] [Created] (SPARK-2664) Deal with `--conf` options in spark-submit that relate to flags
Patrick Wendell created SPARK-2664: -- Summary: Deal with `--conf` options in spark-submit that relate to flags Key: SPARK-2664 URL: https://issues.apache.org/jira/browse/SPARK-2664 Project: Spark Issue Type: Bug Reporter: Patrick Wendell Assignee: Sandy Ryza Priority: Blocker If someone sets a spark conf that relates to an existing flag `--master`, we should set it correctly like we do with the defaults file. Otherwise it can have confusing semantics. I noticed this after merging it, otherwise I would have mentioned it in the review. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-2664) Deal with `--conf` options in spark-submit that relate to flags
[ https://issues.apache.org/jira/browse/SPARK-2664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-2664: --- Description: If someone sets a spark conf that relates to an existing flag `--master`, we should set it correctly like we do with the defaults file. Otherwise it can have confusing semantics. I noticed this after merging it, otherwise I would have mentioned it in the review. I think it's as simple as modifying loadDefaults to check the user-supplied options also. We might change it to loadUserProperties since it's no longer just the defaults file. was:If someone sets a spark conf that relates to an existing flag `--master`, we should set it correctly like we do with the defaults file. Otherwise it can have confusing semantics. I noticed this after merging it, otherwise I would have mentioned it in the review. Deal with `--conf` options in spark-submit that relate to flags --- Key: SPARK-2664 URL: https://issues.apache.org/jira/browse/SPARK-2664 Project: Spark Issue Type: Bug Reporter: Patrick Wendell Assignee: Sandy Ryza Priority: Blocker If someone sets a spark conf that relates to an existing flag `--master`, we should set it correctly like we do with the defaults file. Otherwise it can have confusing semantics. I noticed this after merging it, otherwise I would have mentioned it in the review. I think it's as simple as modifying loadDefaults to check the user-supplied options also. We might change it to loadUserProperties since it's no longer just the defaults file. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2652) Tuning default configurations for PySpark
[ https://issues.apache.org/jira/browse/SPARK-2652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14072909#comment-14072909 ] Apache Spark commented on SPARK-2652: - User 'davies' has created a pull request for this issue: https://github.com/apache/spark/pull/1568 Tuning default configurations for PySpark -- Key: SPARK-2652 URL: https://issues.apache.org/jira/browse/SPARK-2652 Project: Spark Issue Type: Improvement Components: PySpark Affects Versions: 1.0.0 Reporter: Davies Liu Assignee: Davies Liu Labels: Configuration, Python Fix For: 1.1.0 Original Estimate: 48h Remaining Estimate: 48h Some default configuration values do not make sense for PySpark; change them to reasonable ones, such as spark.serializer and spark.kryo.referenceTracking -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (SPARK-2661) Unpersist last RDD in bagel iteration
[ https://issues.apache.org/jira/browse/SPARK-2661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia resolved SPARK-2661. -- Resolution: Fixed Unpersist last RDD in bagel iteration - Key: SPARK-2661 URL: https://issues.apache.org/jira/browse/SPARK-2661 Project: Spark Issue Type: Improvement Reporter: Adrian Wang Assignee: Adrian Wang Fix For: 1.1.0 In bagel iteration, we only depend on RDD[n] to get RDD[n+1], so we can unpersist RDD[n-1] after we get RDD[n]. -- This message was sent by Atlassian JIRA (v6.2#6252)
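A minimal Scala sketch of the iteration pattern being optimized, assuming a SparkContext sc is in scope (this illustrates the idea; it is not Bagel's actual code):
{code}
import org.apache.spark.rdd.RDD

var prev: RDD[Int] = sc.parallelize(1 to 100).cache()  // RDD[n]
for (i <- 1 to 10) {
  val next = prev.map(_ + 1).cache()  // RDD[n+1] depends only on RDD[n]
  next.count()                        // materialize RDD[n+1]
  prev.unpersist()                    // the previous iteration's RDD is no longer needed
  prev = next
}
{code}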
[jira] [Commented] (SPARK-2664) Deal with `--conf` options in spark-submit that relate to flags
[ https://issues.apache.org/jira/browse/SPARK-2664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14072925#comment-14072925 ] Sandy Ryza commented on SPARK-2664: --- I think the right behavior here is worth a little thought. What's the mental model we expect the user to have about the relationship between properties specified through --conf and properties that get their own flag? My first thought is - if we're ok with taking properties like master through --conf, is there a point (beyond compatibility) in having flags for these properties at all? Flags that aren't Spark confs are there because they impact what happens before the SparkContext is created. These fall into a couple of categories: 1. Flags that have no Spark conf equivalent, like --executor-cores 2. Flags that have a direct Spark conf equivalent, like --executor-memory (spark.executor.memory) 3. Flags that impact a Spark conf, like --deploy-mode (which can mean we set spark.master to yarn-cluster) I think the two ways to look at it are: 1. We're OK with taking properties that have related flags. In the case of a property in the 2nd category, we have a policy over which takes precedence. In the case of a property in the 3rd category, we have some (possibly complex) resolution logic. This approach would be the most accepting, but requires the user to have a model of how these conflicts get resolved. 2. We're not OK with taking properties that have related flags. --conf specifies a property that gets passed to the SparkContext and has no effect on anything that happens before it's created. To save users from themselves, if someone passes spark.master or spark.app.name through --conf, we ignore it or throw an error. I'm a little more partial to approach 2 because I think the mental model is a little simpler. Either way, we should probably enforce the same behavior when a config comes from the defaults file. Lastly, how do we allow setting a default for one of these special flags? E.g. make it so that all jobs run on YARN or Mesos by default. With approach 1, this is relatively straightforward - we use the same logic we'd use on a property that comes in through --conf for making defaults take effect. We might need to add Spark properties for flags that don't have them already, like --executor-cores. With approach 2, we'd need to add support in the defaults file or somewhere else for specifying flag defaults. Deal with `--conf` options in spark-submit that relate to flags --- Key: SPARK-2664 URL: https://issues.apache.org/jira/browse/SPARK-2664 Project: Spark Issue Type: Bug Reporter: Patrick Wendell Assignee: Sandy Ryza Priority: Blocker If someone sets a spark conf that relates to an existing flag `--master`, we should set it correctly like we do with the defaults file. Otherwise it can have confusing semantics. I noticed this after merging it, otherwise I would have mentioned it in the review. I think it's as simple as modifying loadDefaults to check the user-supplied options also. We might change it to loadUserProperties since it's no longer just the defaults file. -- This message was sent by Atlassian JIRA (v6.2#6252)
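To make approach 1 concrete, the precedence policy under discussion can be reduced to a small resolution function; a hedged Scala sketch (all names are hypothetical, not spark-submit's actual code):
{code}
// Resolution order under approach 1: explicit flag > --conf on the command line > defaults file
def resolve(flag: Option[String],
            confOpts: Map[String, String],
            defaults: Map[String, String],
            key: String): Option[String] =
  flag.orElse(confOpts.get(key)).orElse(defaults.get(key))

// Example: the --master flag beats a conflicting --conf spark.master=local
resolve(Some("yarn-cluster"), Map("spark.master" -> "local"), Map.empty, "spark.master")
// => Some("yarn-cluster")
{code}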
[jira] [Comment Edited] (SPARK-2664) Deal with `--conf` options in spark-submit that relate to flags
[ https://issues.apache.org/jira/browse/SPARK-2664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14072925#comment-14072925 ] Sandy Ryza edited comment on SPARK-2664 at 7/24/14 7:18 AM: I think the right behavior here is worth a little thought. What's the mental model we expect the user to have about the relationship between properties specified through --conf and properties that get their own flag? My first thought is - if we're ok with taking properties like master through --conf, is there a point (beyond compatibility) in having flags for these properties at all? Flags that aren't Spark confs are there because they impact what happens before the SparkContext is created. These fall into a couple of categories: 1. Flags that have no Spark conf equivalent, like --executor-cores 2. Flags that have a direct Spark conf equivalent, like --executor-memory (spark.executor.memory) 3. Flags that impact a Spark conf, like --deploy-mode (which can mean we set spark.master to yarn-cluster) I think the two ways to look at it are: 1. We're OK with taking properties that have related flags. In the case of a property in the 2nd category, we have a policy over which takes precedence. In the case of a property in the 3rd category, we have some (possibly complex) resolution logic. This approach would be the most accepting, but requires the user to have a model of how these conflicts get resolved. 2. We're not OK with taking properties that have related flags. --conf specifies a property that gets passed to the SparkContext and has no effect on anything that happens before it's created. To save users from themselves, if someone passes spark.master or spark.app.name through --conf, we ignore it or throw an error. I'm a little more partial to approach 2 because I think the mental model is a little simpler. Either way, we should probably enforce the same behavior when a config comes from the defaults file. Lastly, how do we allow setting a default for one of these special flags? E.g. make it so that all jobs run on YARN or Mesos by default. With approach 1, this is relatively straightforward - we use the same logic we'd use on a property that comes in through --conf for making defaults take effect. We might need to add Spark properties for flags that don't have them already, like --executor-cores. With approach 2, we'd need to add support in the defaults file or somewhere else for specifying flag defaults. was (Author: sandyr): I think the right behavior here is worth a little thought. What's the mental model we expect the user to have about the relationship between properties specified through --conf and properties that get their own flag? My first thought is - if we're ok with taking properties like master through --conf, is there a point (beyond compatibility) in having flags for these properties at all? Flags that aren't conf are there because they impact what happens before the SparkContext is created. These fall into a couple categories: 1. Flags that have no property Spark conf equivalent like --executor-cores 2. Flags that have a direct Spark conf equivalent like --executor-cores (spark.executor.memory) 3. Flags that impact a Spark conf like --deploy-mode (which can mean we set spark.master to yarn-cluster) I think the two ways to look at it are: 1. We're OK with taking properties that have related flags. In the case of a property in the 2nd category, we have a policy over which takes precedence. 
In the case of a property in the 3rd category, we have some (possibly complex) resolution logic. This approach would be the most accepting, but requires the user to have a model of how these conflicts get resolved. 2. We're not OK with taking properties that have related flags. --conf specifies property that gets passed to the SparkContext and has no effect on anything that happens before it's created. To save users from themselves, if someone passes spark.master or spark.app.name through --conf, we ignore it or throw an error. I'm a little more partial to approach 2 because I think the mental model is a little simpler. Either way, we should probably enforce the same behavior when a config comes from the defaults file. Lastly, how do we allow setting a default for one of these special flags? E.g. make it so that all jobs run on YARN or Mesos by default. With approach 1, this is relatively straightforward - we use the same logic we'd use on a property that comes in through --conf for making defaults take effect. We might need to add spark properties for flags that don't have them already like --executor-cores. With approach 2, we'd need to add support in the defaults file or somewhere else for specifying flag defaults. Deal with `--conf` options in spark-submit that relate to flags
[jira] [Created] (SPARK-2665) Add EqualNS support for HiveQL
Cheng Hao created SPARK-2665: Summary: Add EqualNS support for HiveQL Key: SPARK-2665 URL: https://issues.apache.org/jira/browse/SPARK-2665 Project: Spark Issue Type: New Feature Components: SQL Reporter: Cheng Hao Hive supports the operator <=>, which returns the same result as the EQUAL (=) operator for non-null operands, but returns TRUE if both are NULL and FALSE if one of them is NULL. -- This message was sent by Atlassian JIRA (v6.2#6252)
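The null-safe semantics can be stated precisely with Scala's Option standing in for SQL NULL; a minimal sketch of the intended behavior (an illustration, not Spark SQL's implementation):
{code}
def equalNS[A](a: Option[A], b: Option[A]): Boolean = (a, b) match {
  case (None, None)          => true   // NULL <=> NULL is TRUE
  case (None, _) | (_, None) => false  // NULL <=> x is FALSE
  case (Some(x), Some(y))    => x == y // otherwise identical to =
}

equalNS(None, None)       // true
equalNS(Some(1), None)    // false
equalNS(Some(1), Some(1)) // true
{code}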
[jira] [Updated] (SPARK-2414) Remove jquery
[ https://issues.apache.org/jira/browse/SPARK-2414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-2414: --- Assignee: (was: Reynold Xin) Remove jquery - Key: SPARK-2414 URL: https://issues.apache.org/jira/browse/SPARK-2414 Project: Spark Issue Type: Improvement Components: Web UI Reporter: Reynold Xin Priority: Minor Labels: starter SPARK-2384 introduces jquery for tooltip display. We can probably just create a very simple javascript for tooltip instead of pulling in jquery. https://github.com/apache/spark/pull/1314 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (SPARK-1405) parallel Latent Dirichlet Allocation (LDA) atop of spark in MLlib
[ https://issues.apache.org/jira/browse/SPARK-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073013#comment-14073013 ] lukovnikov edited comment on SPARK-1405 at 7/24/14 9:10 AM: @Isaac, I think it's at https://github.com/yinxusen/spark/blob/lda/mllib/src/main/scala/org/apache/spark/mllib/clustering/LDA.scala and here (https://github.com/apache/spark/pull/476/files) for the other changed files as well was (Author: lukovnikov): @Isaac, I think it's at https://github.com/yinxusen/spark/blob/lda/mllib/src/main/scala/org/apache/spark/mllib/clustering/LDA.scala parallel Latent Dirichlet Allocation (LDA) atop of spark in MLlib - Key: SPARK-1405 URL: https://issues.apache.org/jira/browse/SPARK-1405 Project: Spark Issue Type: Improvement Components: MLlib Affects Versions: 1.1.0 Reporter: Xusen Yin Assignee: Xusen Yin Labels: features Fix For: 0.9.0 Original Estimate: 336h Remaining Estimate: 336h Latent Dirichlet Allocation (a.k.a. LDA) is a topic model which extracts topics from a text corpus. Different from current machine learning algorithms in MLlib, instead of using optimization algorithms such as gradient descent, LDA uses expectation algorithms such as Gibbs sampling. In this PR, I prepare an LDA implementation based on Gibbs sampling, with a wholeTextFiles API (solved yet), a word segmentation (imported from Lucene), and a Gibbs sampling core. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-1405) parallel Latent Dirichlet Allocation (LDA) atop of spark in MLlib
[ https://issues.apache.org/jira/browse/SPARK-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073020#comment-14073020 ] lukovnikov commented on SPARK-1405: --- Btw, could this please be merged with the main branch? There are some conflicts. parallel Latent Dirichlet Allocation (LDA) atop of spark in MLlib - Key: SPARK-1405 URL: https://issues.apache.org/jira/browse/SPARK-1405 Project: Spark Issue Type: Improvement Components: MLlib Affects Versions: 1.1.0 Reporter: Xusen Yin Assignee: Xusen Yin Labels: features Fix For: 0.9.0 Original Estimate: 336h Remaining Estimate: 336h Latent Dirichlet Allocation (a.k.a. LDA) is a topic model which extracts topics from a text corpus. Different from current machine learning algorithms in MLlib, instead of using optimization algorithms such as gradient descent, LDA uses expectation algorithms such as Gibbs sampling. In this PR, I prepare an LDA implementation based on Gibbs sampling, with a wholeTextFiles API (solved yet), a word segmentation (imported from Lucene), and a Gibbs sampling core. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2604) Spark Application hangs on yarn in edge case scenario of executor memory requirement
[ https://issues.apache.org/jira/browse/SPARK-2604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073085#comment-14073085 ] Apache Spark commented on SPARK-2604: - User 'twinkle-sachdeva' has created a pull request for this issue: https://github.com/apache/spark/pull/1571 Spark Application hangs on yarn in edge case scenario of executor memory requirement Key: SPARK-2604 URL: https://issues.apache.org/jira/browse/SPARK-2604 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.0.0 Reporter: Twinkle Sachdeva In a YARN environment, let's say: MaxAM = maximum allocatable memory, ExecMem = executor's memory. If (MaxAM >= ExecMem && (MaxAM - ExecMem) < 384m), then maximum resource validation fails w.r.t. executor memory, and the application master gets launched; but when resources are allocated and validated again, they are returned and the application appears to be hung. A typical use case is to ask for executor memory = maximum allowed memory as per the YARN config -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2604) Spark Application hangs on yarn in edge case scenario of executor memory requirement
[ https://issues.apache.org/jira/browse/SPARK-2604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073086#comment-14073086 ] Twinkle Sachdeva commented on SPARK-2604: - Please review the pull request: https://github.com/apache/spark/pull/1571 Spark Application hangs on yarn in edge case scenario of executor memory requirement Key: SPARK-2604 URL: https://issues.apache.org/jira/browse/SPARK-2604 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.0.0 Reporter: Twinkle Sachdeva In a YARN environment, let's say: MaxAM = maximum allocatable memory, ExecMem = executor's memory. If (MaxAM >= ExecMem && (MaxAM - ExecMem) < 384m), then maximum resource validation fails w.r.t. executor memory, and the application master gets launched; but when resources are allocated and validated again, they are returned and the application appears to be hung. A typical use case is to ask for executor memory = maximum allowed memory as per the YARN config -- This message was sent by Atlassian JIRA (v6.2#6252)
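The arithmetic behind the hang, as a hedged illustration in Scala (the 384 MB figure is the per-container memory overhead the description refers to; the other values are examples, not taken from the report):
{code}
val maxAllocatable   = 8192                    // yarn.scheduler.maximum-allocation-mb, in MB
val executorMemory   = 8192                    // typical case: user asks for the maximum allowed
val overhead         = 384                     // MB added to every container request
val containerRequest = executorMemory + overhead

// 8576 > 8192: the application master still launches, but every allocated
// executor container fails re-validation and is returned, so the application
// appears hung instead of failing fast.
println(containerRequest > maxAllocatable)     // true
{code}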
[jira] [Commented] (SPARK-2575) SVMWithSGD throwing Input Validation failed
[ https://issues.apache.org/jira/browse/SPARK-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073111#comment-14073111 ] navanee commented on SPARK-2575: Does Spark SVM support multinomial or binomial classification? SVMWithSGD throwing Input Validation failed Key: SPARK-2575 URL: https://issues.apache.org/jira/browse/SPARK-2575 Project: Spark Issue Type: Bug Components: MLlib Affects Versions: 1.0.1 Reporter: navanee SVMWithSGD throws Input validation failed while using a sparse array as input, though SVMWithSGD accepts the LibSVM format. Exception trace:
{code}
org.apache.spark.SparkException: Input validation failed.
	at org.apache.spark.mllib.regression.GeneralizedLinearAlgorithm.run(GeneralizedLinearAlgorithm.scala:145)
	at org.apache.spark.mllib.regression.GeneralizedLinearAlgorithm.run(GeneralizedLinearAlgorithm.scala:124)
	at org.apache.spark.mllib.classification.SVMWithSGD$.train(SVM.scala:154)
	at org.apache.spark.mllib.classification.SVMWithSGD$.train(SVM.scala:188)
	at org.apache.spark.mllib.classification.SVMWithSGD.train(SVM.scala)
	at com.xurmo.ai.hades.classification.algo.Svm.train(Svm.java:143)
	at com.xurmo.ai.hades.classification.algo.SimpleSVMTest.generateModelFile(SimpleSVMTest.java:172)
	at com.xurmo.ai.hades.classification.algo.SimpleSVMTest.trainSampleDataTest(SimpleSVMTest.java:65)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
	at org.junit.runners.BlockJUnit4ClassRunner.runNotIgnored(BlockJUnit4ClassRunner.java:79)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:71)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:49)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
	at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50)
	at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (SPARK-2666) when task is FetchFailed cancel running tasks of failedStage
Lianhui Wang created SPARK-2666: --- Summary: when task is FetchFailed cancel running tasks of failedStage Key: SPARK-2666 URL: https://issues.apache.org/jira/browse/SPARK-2666 Project: Spark Issue Type: Bug Reporter: Lianhui Wang In DAGScheduler's handleTaskCompletion, when the reason for the failed task is FetchFailed, cancel the running tasks of the failedStage before adding the failedStage to the failedStages queue. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2576) slave node throws NoClassDefFoundError $line11.$read$ when executing a Spark QL query on HDFS CSV file
[ https://issues.apache.org/jira/browse/SPARK-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073114#comment-14073114 ] Prashant Sharma commented on SPARK-2576: Looking at it. slave node throws NoClassDefFoundError $line11.$read$ when executing a Spark QL query on HDFS CSV file -- Key: SPARK-2576 URL: https://issues.apache.org/jira/browse/SPARK-2576 Project: Spark Issue Type: Bug Components: Spark Core, SQL Affects Versions: 1.0.1 Environment: One Mesos 0.19 master without zookeeper and 4 mesos slaves. JDK 1.7.51 and Scala 2.10.4 on all nodes. HDFS from CDH5.0.3 Spark version: I tried both with the pre-built CDH5 spark package available from http://spark.apache.org/downloads.html and by packaging spark with sbt 0.13.2, JDK 1.7.51 and scala 2.10.4 as explained here http://mesosphere.io/learn/run-spark-on-mesos/ All nodes are running Debian 3.2.51-1 x86_64 GNU/Linux Reporter: Svend Vanderveken Assignee: Yin Huai Priority: Blocker Execution of a SQL query against HDFS systematically throws a class-not-found exception on slave nodes. (This was originally reported on the user list: http://apache-spark-user-list.1001560.n3.nabble.com/spark1-0-1-spark-sql-error-java-lang-NoClassDefFoundError-Could-not-initialize-class-line11-read-tc10135.html) Sample code (run from spark-shell):
{code}
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.createSchemaRDD

case class Car(timestamp: Long, objectid: String, isGreen: Boolean)

// I get the same error when pointing to the folder hdfs://vm28:8020/test/cardata
val data = sc.textFile("hdfs://vm28:8020/test/cardata/part-0")
val cars = data.map(_.split(",")).map(ar => Car(ar(0).toLong, ar(1), ar(2).toBoolean))
cars.registerAsTable("mcars")
val allgreens = sqlContext.sql("SELECT objectid from mcars where isGreen = true")
allgreens.collect.take(10).foreach(println)
{code}
Stack trace on the slave nodes:
{code}
I0716 13:01:16.215158 13631 exec.cpp:131] Version: 0.19.0
I0716 13:01:16.219285 13656 exec.cpp:205] Executor registered on slave 20140714-142853-485682442-5050-25487-2
14/07/16 13:01:16 INFO MesosExecutorBackend: Registered with Mesos as executor ID 20140714-142853-485682442-5050-25487-2
14/07/16 13:01:16 INFO SecurityManager: Changing view acls to: mesos,mnubohadoop
14/07/16 13:01:16 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(mesos, mnubohadoop)
14/07/16 13:01:17 INFO Slf4jLogger: Slf4jLogger started
14/07/16 13:01:17 INFO Remoting: Starting remoting
14/07/16 13:01:17 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://spark@vm23:38230]
14/07/16 13:01:17 INFO Remoting: Remoting now listens on addresses: [akka.tcp://spark@vm23:38230]
14/07/16 13:01:17 INFO SparkEnv: Connecting to MapOutputTracker: akka.tcp://spark@vm28:41632/user/MapOutputTracker
14/07/16 13:01:17 INFO SparkEnv: Connecting to BlockManagerMaster: akka.tcp://spark@vm28:41632/user/BlockManagerMaster
14/07/16 13:01:17 INFO DiskBlockManager: Created local directory at /tmp/spark-local-20140716130117-8ea0
14/07/16 13:01:17 INFO MemoryStore: MemoryStore started with capacity 294.9 MB.
14/07/16 13:01:17 INFO ConnectionManager: Bound socket to port 44501 with id = ConnectionManagerId(vm23-hulk-priv.mtl.mnubo.com,44501)
14/07/16 13:01:17 INFO BlockManagerMaster: Trying to register BlockManager
14/07/16 13:01:17 INFO BlockManagerMaster: Registered BlockManager
14/07/16 13:01:17 INFO HttpFileServer: HTTP File server directory is /tmp/spark-ccf6f36c-2541-4a25-8fe4-bb4ba00ee633
14/07/16 13:01:17 INFO HttpServer: Starting HTTP Server
14/07/16 13:01:18 INFO Executor: Using REPL class URI: http://vm28:33973
14/07/16 13:01:18 INFO Executor: Running task ID 2
14/07/16 13:01:18 INFO HttpBroadcast: Started reading broadcast variable 0
14/07/16 13:01:18 INFO MemoryStore: ensureFreeSpace(125590) called with curMem=0, maxMem=309225062
14/07/16 13:01:18 INFO MemoryStore: Block broadcast_0 stored as values to memory (estimated size 122.6 KB, free 294.8 MB)
14/07/16 13:01:18 INFO HttpBroadcast: Reading broadcast variable 0 took 0.294602722 s
14/07/16 13:01:19 INFO HadoopRDD: Input split: hdfs://vm28:8020/test/cardata/part-0:23960450+23960451
I0716 13:01:19.905113 13657 exec.cpp:378] Executor asked to shutdown
14/07/16 13:01:20 ERROR Executor: Exception in task ID 2
java.lang.NoClassDefFoundError: $line11/$read$
	at $line12.$read$$iwC$$iwC$$iwC$$iwC$$anonfun$2.apply(<console>:19)
	at $line12.$read$$iwC$$iwC$$iwC$$iwC$$anonfun$2.apply(<console>:19)
	at
{code}
[jira] [Commented] (SPARK-2666) when task is FetchFailed cancel running tasks of failedStage
[ https://issues.apache.org/jira/browse/SPARK-2666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073116#comment-14073116 ] Apache Spark commented on SPARK-2666: - User 'lianhuiwang' has created a pull request for this issue: https://github.com/apache/spark/pull/1572 when task is FetchFailed cancel running tasks of failedStage Key: SPARK-2666 URL: https://issues.apache.org/jira/browse/SPARK-2666 Project: Spark Issue Type: Bug Reporter: Lianhui Wang In DAGScheduler's handleTaskCompletion, when the reason for the failed task is FetchFailed, cancel the running tasks of the failedStage before adding the failedStage to the failedStages queue. -- This message was sent by Atlassian JIRA (v6.2#6252)
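To make the proposed ordering concrete, here is a toy Scala model of the idea (all types and names are simplified stand-ins, not DAGScheduler's actual code):
{code}
import scala.collection.mutable

case class Stage(id: Int)

// Proposed ordering: cancel the stage's still-running tasks *before* queuing the
// stage for resubmission, so stragglers don't keep fetching the lost map output.
def handleFetchFailed(failedStage: Stage,
                      cancelTasks: Int => Unit,
                      failedStages: mutable.Set[Stage]): Unit = {
  cancelTasks(failedStage.id)
  failedStages += failedStage
}
{code}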
[jira] [Commented] (SPARK-2456) Scheduler refactoring
[ https://issues.apache.org/jira/browse/SPARK-2456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073122#comment-14073122 ] Nan Zhu commented on SPARK-2456: maybe it's also related: https://github.com/apache/spark/pull/637 Scheduler refactoring - Key: SPARK-2456 URL: https://issues.apache.org/jira/browse/SPARK-2456 Project: Spark Issue Type: Improvement Reporter: Reynold Xin Assignee: Reynold Xin This is an umbrella ticket to track scheduler refactoring. We want to clearly define semantics and responsibilities of each component, and define explicit public interfaces for them so it is easier to understand and to contribute (also less buggy). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-2667) getCallSiteInfo doesn't take into account that graphx is part of spark.
[ https://issues.apache.org/jira/browse/SPARK-2667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrian Budau updated SPARK-2667: Description: getCallSiteInfo from org.apache.spark.util.Utils uses a regex pattern to determine whether a function is part of Spark or not. At the moment this does not include GraphX. getCallSiteInfo doesn't take into account that graphx is part of spark. --- Key: SPARK-2667 URL: https://issues.apache.org/jira/browse/SPARK-2667 Project: Spark Issue Type: Bug Components: GraphX Affects Versions: 1.0.0 Environment: Mac OS X, although it's on all versions Reporter: Adrian Budau Priority: Trivial getCallSiteInfo from org.apache.spark.util.Utils uses a regex pattern to determine whether a function is part of Spark or not. At the moment this does not include GraphX. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (SPARK-2667) getCallSiteInfo doesn't take into account that graphx is part of spark.
Adrian Budau created SPARK-2667: --- Summary: getCallSiteInfo doesn't take into account that graphx is part of spark. Key: SPARK-2667 URL: https://issues.apache.org/jira/browse/SPARK-2667 Project: Spark Issue Type: Bug Components: GraphX Affects Versions: 1.0.0 Environment: Mac OS X, although it's on all versions Reporter: Adrian Budau Priority: Trivial -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (SPARK-2668) Support log4j log to yarn container log directory
Peng Zhang created SPARK-2668: - Summary: Support log4j log to yarn container log directory Key: SPARK-2668 URL: https://issues.apache.org/jira/browse/SPARK-2668 Project: Spark Issue Type: Improvement Components: YARN Reporter: Peng Zhang Fix For: 1.0.0 Assign the value of the YARN container log directory to the Java opt spark.yarn.log.dir, so a user-defined log4j.properties can reference this value and write logs to the YARN container log directory. Otherwise, a user-defined file appender will log to the CWD, and the files will neither be displayed on the YARN UI nor be aggregated to the HDFS log directory after the job finishes. User-defined log4j.properties reference example:
{code}
log4j.appender.rolling_file.File=${spark.yarn.log.dir}/spark.log
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2668) Support log4j log to yarn container log directory
[ https://issues.apache.org/jira/browse/SPARK-2668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073199#comment-14073199 ] Apache Spark commented on SPARK-2668: - User 'renozhang' has created a pull request for this issue: https://github.com/apache/spark/pull/1573 Support log4j log to yarn container log directory - Key: SPARK-2668 URL: https://issues.apache.org/jira/browse/SPARK-2668 Project: Spark Issue Type: Improvement Components: YARN Reporter: Peng Zhang Fix For: 1.0.0 Assign the value of the YARN container log directory to the Java opt spark.yarn.log.dir, so a user-defined log4j.properties can reference this value and write logs to the YARN container log directory. Otherwise, a user-defined file appender will log to the CWD, and the files will neither be displayed on the YARN UI nor be aggregated to the HDFS log directory after the job finishes. User-defined log4j.properties reference example:
{code}
log4j.appender.rolling_file.File=${spark.yarn.log.dir}/spark.log
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
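A hedged Scala sketch of what the proposal amounts to on the launch side (LOG_DIRS is the environment variable the YARN NodeManager sets for containers; exactly where this wiring lands in the YARN client code is an assumption):
{code}
// When building the container's JVM options, surface YARN's container log
// directory as a system property that log4j.properties can expand.
val yarnLogDir = sys.env.getOrElse("LOG_DIRS", ".")
val javaOpts   = Seq(s"-Dspark.yarn.log.dir=$yarnLogDir")
// log4j.properties can then write to ${spark.yarn.log.dir}/spark.log, as in the example above.
{code}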
[jira] [Updated] (SPARK-2150) Provide direct link to finished application UI in yarn resource manager UI
[ https://issues.apache.org/jira/browse/SPARK-2150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated SPARK-2150: - Assignee: Rahul Singhal Provide direct link to finished application UI in yarn resource manager UI -- Key: SPARK-2150 URL: https://issues.apache.org/jira/browse/SPARK-2150 Project: Spark Issue Type: Improvement Components: YARN Affects Versions: 1.0.0 Reporter: Rahul Singhal Assignee: Rahul Singhal Priority: Minor Fix For: 1.1.0 Currently the link that is provide as the tracking URL for a finished application in yarn resource manager UI is of the Spark history server home page. We should provide a direct link to the application UI so that the user does not have to figure out the correspondence between yarn application ID and the link on the Spark history server home page. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-1112) When spark.akka.frameSize > 10, task results bigger than 10MiB block execution
[ https://issues.apache.org/jira/browse/SPARK-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073244#comment-14073244 ] DjvuLee commented on SPARK-1112: Has anyone tested this in version 0.9.2? I found it also fails there, while v1.0.1 and v1.1.0 are OK. When spark.akka.frameSize > 10, task results bigger than 10MiB block execution -- Key: SPARK-1112 URL: https://issues.apache.org/jira/browse/SPARK-1112 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 0.9.0, 1.0.0 Reporter: Guillaume Pitel Assignee: Xiangrui Meng Priority: Blocker Fix For: 0.9.2 When I set the spark.akka.frameSize to something over 10, the messages sent from the executors to the driver completely block the execution if the message is bigger than 10MiB and smaller than the frameSize (if it's above the frameSize, it's ok) Workaround is to set the spark.akka.frameSize to 10. In this case, since 0.8.1, the blockManager deals with the data to be sent. It seems slower than akka direct messages though. The configuration seems to be correctly read (see actorSystemConfig.txt), so I don't see where the 10MiB could come from -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2668) Support log4j log to yarn container log directory
[ https://issues.apache.org/jira/browse/SPARK-2668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073246#comment-14073246 ] Thomas Graves commented on SPARK-2668: -- Sorry, I don't follow what you are saying here. Spark on YARN uses the YARN-approved logging directories, and aggregation works fine. Perhaps YARN is misconfigured? Support log4j log to yarn container log directory - Key: SPARK-2668 URL: https://issues.apache.org/jira/browse/SPARK-2668 Project: Spark Issue Type: Improvement Components: YARN Reporter: Peng Zhang Fix For: 1.0.0 Assign the value of the YARN container log directory to the Java opt spark.yarn.log.dir, so a user-defined log4j.properties can reference this value and write logs to the YARN container log directory. Otherwise, a user-defined file appender will log to the CWD, and the files will neither be displayed on the YARN UI nor be aggregated to the HDFS log directory after the job finishes. User-defined log4j.properties reference example:
{code}
log4j.appender.rolling_file.File=${spark.yarn.log.dir}/spark.log
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2668) Support log4j log to yarn container log directory
[ https://issues.apache.org/jira/browse/SPARK-2668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073278#comment-14073278 ] Peng Zhang commented on SPARK-2668: --- [~tgraves] The default logging works fine: logs are written to the YARN container log directory and named stderr. But when I want to define my own log4j configuration, for example using a RollingAppender to keep the log file from growing too big (especially for Spark Streaming, which runs 7 x 24 hours), I need to specify the base directory for the logs. So adding spark.yarn.log.dir will help: it can be referenced in log4j.properties, as in the example in the description. Otherwise, log files will be located in the container's working directory. Support log4j log to yarn container log directory - Key: SPARK-2668 URL: https://issues.apache.org/jira/browse/SPARK-2668 Project: Spark Issue Type: Improvement Components: YARN Reporter: Peng Zhang Fix For: 1.0.0 Assign the value of the YARN container log directory to the Java opt spark.yarn.log.dir, so a user-defined log4j.properties can reference this value and write logs to the YARN container log directory. Otherwise, a user-defined file appender will log to the CWD, and the files will neither be displayed on the YARN UI nor be aggregated to the HDFS log directory after the job finishes. User-defined log4j.properties reference example:
{code}
log4j.appender.rolling_file.File=${spark.yarn.log.dir}/spark.log
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (SPARK-2668) Support log4j log to yarn container log directory
[ https://issues.apache.org/jira/browse/SPARK-2668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073278#comment-14073278 ] Peng Zhang edited comment on SPARK-2668 at 7/24/14 3:12 PM: [~tgraves] The default logging works fine: logs are written to the YARN container log directory and named stderr. But when I want to define my own log4j configuration, for example using a RollingAppender to keep the log file from growing too big (especially for Spark Streaming, which runs 7 x 24 hours), I need to specify the base directory for the logs. So adding spark.yarn.log.dir will help: it can be referenced in log4j.properties, as in the example in the description. Otherwise, log files will be located in the container's working directory. was (Author: peng.zhang): [~tgraves] Original log works fine, and log will be written to yarn container log directory and named as stderr. But when I want to define my own log4j configuration, for example using RollingAppender to avoid log file too big, especially for spark Streaming(7 x 24 hours), I should can't specify the base directory for log. So adding spark.yarn.log.dir will help for reference in log4j.properties, like the example in description. Otherwise, log files will be located in container's working directory. Support log4j log to yarn container log directory - Key: SPARK-2668 URL: https://issues.apache.org/jira/browse/SPARK-2668 Project: Spark Issue Type: Improvement Components: YARN Reporter: Peng Zhang Fix For: 1.0.0 Assign the value of the YARN container log directory to the Java opt spark.yarn.log.dir, so a user-defined log4j.properties can reference this value and write logs to the YARN container log directory. Otherwise, a user-defined file appender will log to the CWD, and the files will neither be displayed on the YARN UI nor be aggregated to the HDFS log directory after the job finishes. User-defined log4j.properties reference example:
{code}
log4j.appender.rolling_file.File=${spark.yarn.log.dir}/spark.log
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2668) Support log4j log to yarn container log directory
[ https://issues.apache.org/jira/browse/SPARK-2668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073310#comment-14073310 ] Thomas Graves commented on SPARK-2668: -- Oh, I see, you just want a variable to reference from the log4j config. I understand the use case, and really YARN should solve this for you. There are JIRAs out there to support long-running tasks on YARN; the one for logs is https://issues.apache.org/jira/browse/YARN-1104. This might be OK as a short-term workaround for that, since it's just reading the value and not allowing the user to set it. I need to look at it a bit closer. Support log4j log to yarn container log directory - Key: SPARK-2668 URL: https://issues.apache.org/jira/browse/SPARK-2668 Project: Spark Issue Type: Improvement Components: YARN Reporter: Peng Zhang Fix For: 1.0.0 Assign the value of the YARN container log directory to the Java opt spark.yarn.log.dir, so a user-defined log4j.properties can reference this value and write logs to the YARN container log directory. Otherwise, a user-defined file appender will log to the CWD, and the files will neither be displayed on the YARN UI nor be aggregated to the HDFS log directory after the job finishes. User-defined log4j.properties reference example:
{code}
log4j.appender.rolling_file.File=${spark.yarn.log.dir}/spark.log
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2575) SVMWithSGD throwing Input Validation failed
[ https://issues.apache.org/jira/browse/SPARK-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073329#comment-14073329 ] Xiangrui Meng commented on SPARK-2575: -- [~dbtsai] sent a PR for multinomial logistic regression: https://github.com/apache/spark/pull/1379 Btw, is your problem solved? SVMWithSGD throwing Input Validation failed Key: SPARK-2575 URL: https://issues.apache.org/jira/browse/SPARK-2575 Project: Spark Issue Type: Bug Components: MLlib Affects Versions: 1.0.1 Reporter: navanee SVMWithSGD throws Input validation failed while using a sparse array as input, though SVMWithSGD accepts the LibSVM format. Exception trace:
{code}
org.apache.spark.SparkException: Input validation failed.
	at org.apache.spark.mllib.regression.GeneralizedLinearAlgorithm.run(GeneralizedLinearAlgorithm.scala:145)
	at org.apache.spark.mllib.regression.GeneralizedLinearAlgorithm.run(GeneralizedLinearAlgorithm.scala:124)
	at org.apache.spark.mllib.classification.SVMWithSGD$.train(SVM.scala:154)
	at org.apache.spark.mllib.classification.SVMWithSGD$.train(SVM.scala:188)
	at org.apache.spark.mllib.classification.SVMWithSGD.train(SVM.scala)
	at com.xurmo.ai.hades.classification.algo.Svm.train(Svm.java:143)
	at com.xurmo.ai.hades.classification.algo.SimpleSVMTest.generateModelFile(SimpleSVMTest.java:172)
	at com.xurmo.ai.hades.classification.algo.SimpleSVMTest.trainSampleDataTest(SimpleSVMTest.java:65)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
	at org.junit.runners.BlockJUnit4ClassRunner.runNotIgnored(BlockJUnit4ClassRunner.java:79)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:71)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:49)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
	at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50)
	at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
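For context on the question above: at this point SVMWithSGD in MLlib is a binary classifier, and its input validation requires labels in {0, 1}; sparse feature vectors are fine. A minimal training sketch, assuming a SparkContext sc is in scope (the data is illustrative):
{code}
import org.apache.spark.mllib.classification.SVMWithSGD
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint

val points = sc.parallelize(Seq(
  LabeledPoint(0.0, Vectors.sparse(3, Array(0), Array(1.0))),  // labels must be 0.0 or 1.0,
  LabeledPoint(1.0, Vectors.sparse(3, Array(2), Array(1.0)))   // or input validation fails
))
val model = SVMWithSGD.train(points, 10)  // 10 iterations
{code}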
[jira] [Created] (SPARK-2669) Hadoop configuration is not localised when submitting job in yarn-cluster mode
Maxim Ivanov created SPARK-2669: --- Summary: Hadoop configuration is not localised when submitting job in yarn-cluster mode Key: SPARK-2669 URL: https://issues.apache.org/jira/browse/SPARK-2669 Project: Spark Issue Type: Bug Reporter: Maxim Ivanov I'd like to propose a fix for a problem where the Hadoop configuration is not localized when a job is submitted in yarn-cluster mode. Here is the description from the GitHub pull request https://github.com/apache/spark/pull/1574 This patch fixes a problem where, when the Spark driver is run in a container managed by the YARN ResourceManager, it inherits the configuration of the NodeManager process, which can be different from the Hadoop configuration present on the client (the submitting machine). The problem is most visible when the fs.defaultFS property differs between the two. Hadoop MR solves it by serializing the client's Hadoop configuration into job.xml in the application staging directory and then making the Application Master use it. That guarantees that, regardless of the execution nodes' configurations, all application containers use the same config, identical to the one on the client side. This patch uses a similar approach. YARN ClientBase serializes the configuration and adds it to ClientDistributedCacheManager under the job.xml link name. ClientDistributedCacheManager then utilizes the Hadoop localizer to deliver it to whatever container is started by this application, including the one running the Spark driver. YARN ClientBase also adds the SPARK_LOCAL_HADOOPCONF env variable to the AM container request, which is then used by SparkHadoopUtil.newConfiguration to trigger the new behavior in which the machine-wide Hadoop configuration is merged with the application-specific job.xml (exactly how it is done in Hadoop MR). SparkContext then follows the same approach, adding the SPARK_LOCAL_HADOOPCONF env to all spawned containers to make them use the client-side Hadoop configuration. Also, all the references to new Configuration() which might be executed on the YARN cluster side are changed to use SparkHadoopUtil.get.conf. Please note that it fixes only core Spark, the part which I am comfortable testing and verifying. I didn't descend into the streaming/shark directories, so things might need to be changed there too. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2669) Hadoop configuration is not localised when submitting job in yarn-cluster mode
[ https://issues.apache.org/jira/browse/SPARK-2669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073338#comment-14073338 ] Apache Spark commented on SPARK-2669: - User 'redbaron' has created a pull request for this issue: https://github.com/apache/spark/pull/1574 Hadoop configuration is not localised when submitting job in yarn-cluster mode -- Key: SPARK-2669 URL: https://issues.apache.org/jira/browse/SPARK-2669 Project: Spark Issue Type: Bug Reporter: Maxim Ivanov I'd like to propose a fix for a problem where the Hadoop configuration is not localized when a job is submitted in yarn-cluster mode. Here is the description from the GitHub pull request https://github.com/apache/spark/pull/1574 This patch fixes a problem where, when the Spark driver is run in a container managed by the YARN ResourceManager, it inherits the configuration of the NodeManager process, which can be different from the Hadoop configuration present on the client (the submitting machine). The problem is most visible when the fs.defaultFS property differs between the two. Hadoop MR solves it by serializing the client's Hadoop configuration into job.xml in the application staging directory and then making the Application Master use it. That guarantees that, regardless of the execution nodes' configurations, all application containers use the same config, identical to the one on the client side. This patch uses a similar approach. YARN ClientBase serializes the configuration and adds it to ClientDistributedCacheManager under the job.xml link name. ClientDistributedCacheManager then utilizes the Hadoop localizer to deliver it to whatever container is started by this application, including the one running the Spark driver. YARN ClientBase also adds the SPARK_LOCAL_HADOOPCONF env variable to the AM container request, which is then used by SparkHadoopUtil.newConfiguration to trigger the new behavior in which the machine-wide Hadoop configuration is merged with the application-specific job.xml (exactly how it is done in Hadoop MR). SparkContext then follows the same approach, adding the SPARK_LOCAL_HADOOPCONF env to all spawned containers to make them use the client-side Hadoop configuration. Also, all the references to new Configuration() which might be executed on the YARN cluster side are changed to use SparkHadoopUtil.get.conf. Please note that it fixes only core Spark, the part which I am comfortable testing and verifying. I didn't descend into the streaming/shark directories, so things might need to be changed there too. -- This message was sent by Atlassian JIRA (v6.2#6252)
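A hedged Scala sketch of the merge step the description outlines (SPARK_LOCAL_HADOOPCONF and the job.xml link name come from the description; Configuration.addResource is standard Hadoop API, but the exact wiring here is an assumption, not the patch itself):
{code}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

def newConfiguration(): Configuration = {
  val conf = new Configuration()  // starts from the machine-wide *-site.xml files
  if (sys.env.contains("SPARK_LOCAL_HADOOPCONF")) {
    // Overlay the client-side configuration that was localized into the
    // container's working directory, mirroring what Hadoop MR does with job.xml.
    conf.addResource(new Path("job.xml"))
  }
  conf
}
{code}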
[jira] [Updated] (SPARK-1264) Documentation for setting heap sizes across all configurations
[ https://issues.apache.org/jira/browse/SPARK-1264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Davidson updated SPARK-1264: -- Assignee: (was: Aaron Davidson) Documentation for setting heap sizes across all configurations -- Key: SPARK-1264 URL: https://issues.apache.org/jira/browse/SPARK-1264 Project: Spark Issue Type: Improvement Components: Documentation Affects Versions: 1.0.0 Reporter: Andrew Ash As a user, there are lots of places to configure heap sizes, and it takes a bit of trial and error to figure out how to configure what you want. We need some more clear documentation on how to set these for the cross product of Spark components (master, worker, executor, driver, shell) and deployment modes (Standalone, YARN, Mesos, EC2?). I'm happy to do the authoring if someone can help pull together the relevant details. Here's the best I've got so far:
{noformat}
# Standalone cluster
Master - SPARK_DAEMON_MEMORY - default: 512mb
Worker - SPARK_DAEMON_MEMORY vs SPARK_WORKER_MEMORY? - default: ? See WorkerArguments.inferDefaultMemory()
Executor - spark.executor.memory
Driver - SPARK_DRIVER_MEMORY - default: 512mb
Shell - A pre-built driver so SPARK_DRIVER_MEMORY - default: 512mb

# EC2 cluster
Master - ?
Worker - ?
Executor - ?
Driver - ?
Shell - ?

# Mesos cluster
Master - SPARK_DAEMON_MEMORY
Worker - SPARK_DAEMON_MEMORY
Executor - SPARK_EXECUTOR_MEMORY
Driver - SPARK_DRIVER_MEMORY
Shell - A pre-built driver so SPARK_DRIVER_MEMORY

# YARN cluster
Master - SPARK_MASTER_MEMORY ?
Worker - SPARK_WORKER_MEMORY ?
Executor - SPARK_EXECUTOR_MEMORY
Driver - SPARK_DRIVER_MEMORY
Shell - A pre-built driver so SPARK_DRIVER_MEMORY
{noformat}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2583) ConnectionManager cannot distinguish whether error occurred or not
[ https://issues.apache.org/jira/browse/SPARK-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073419#comment-14073419 ] Kousuke Saruta commented on SPARK-2583: --- I have added some test cases to my PR for this issue. ConnectionManager cannot distinguish whether error occurred or not -- Key: SPARK-2583 URL: https://issues.apache.org/jira/browse/SPARK-2583 Project: Spark Issue Type: Bug Components: Spark Core Reporter: Kousuke Saruta Assignee: Kousuke Saruta Priority: Critical ConnectionManager#handleMessage sends an empty message to the peer whether or not an error occurred in onReceiveCallback.
{code}
val ackMessage = if (onReceiveCallback != null) {
  logDebug("Calling back")
  onReceiveCallback(bufferMessage, connectionManagerId)
} else {
  logDebug("Not calling back as callback is null")
  None
}

if (ackMessage.isDefined) {
  if (!ackMessage.get.isInstanceOf[BufferMessage]) {
    logDebug("Response to " + bufferMessage + " is not a buffer message, it is of type " + ackMessage.get.getClass)
  } else if (!ackMessage.get.asInstanceOf[BufferMessage].hasAckId) {
    logDebug("Response to " + bufferMessage + " does not have ack id set")
    ackMessage.get.asInstanceOf[BufferMessage].ackId = bufferMessage.id
  }
}

// We have no way to tell the peer whether an error occurred or not
sendMessage(connectionManagerId, ackMessage.getOrElse {
  Message.createBufferMessage(bufferMessage.id)
})
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
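A hedged toy model of the direction a fix could take: give the ack a way to say "processing failed" instead of sending an indistinguishable empty message (the classes and the hasError field below are stand-ins, not Spark's ConnectionManager code):
{code}
// Toy ack that can carry failure information
case class Ack(id: Int, hasError: Boolean = false)

def handle(id: Int, callback: Int => Option[Ack]): Ack =
  try callback(id).getOrElse(Ack(id))              // an empty ack still means success
  catch {
    case e: Exception => Ack(id, hasError = true)  // the peer can now react to failure
  }
{code}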
[jira] [Commented] (SPARK-2479) Comparing floating-point numbers using relative error in UnitTests
[ https://issues.apache.org/jira/browse/SPARK-2479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073430#comment-14073430 ] Apache Spark commented on SPARK-2479: - User 'mengxr' has created a pull request for this issue: https://github.com/apache/spark/pull/1576 Comparing floating-point numbers using relative error in UnitTests -- Key: SPARK-2479 URL: https://issues.apache.org/jira/browse/SPARK-2479 Project: Spark Issue Type: Improvement Reporter: DB Tsai Assignee: DB Tsai Floating point math is not exact, and most floating-point numbers end up being slightly imprecise due to rounding errors. Simple values like 0.1 cannot be precisely represented using binary floating point numbers, and the limited precision of floating point numbers means that slight changes in the order of operations or the precision of intermediates can change the result. That means that comparing two floats to see if they are equal is usually not what we want. As long as this imprecision stays small, it can usually be ignored. See the following famous article for detail. http://randomascii.wordpress.com/2012/02/25/comparing-floating-point-numbers-2012-edition/ For example:
{code}
float a = 0.15 + 0.15
float b = 0.1 + 0.2
if (a == b) // can be false!
if (a >= b) // can also be false!
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
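A minimal Scala sketch of the relative-error comparison the ticket asks for (the helper name and default tolerance are illustrative, not the PR's actual test API):
{code}
def approxEqual(a: Double, b: Double, relTol: Double = 1e-8): Boolean = {
  val max = math.max(math.abs(a), math.abs(b))
  max == 0.0 || math.abs(a - b) / max < relTol
}

approxEqual(0.15 + 0.15, 0.1 + 0.2)  // true, even though (0.15 + 0.15) == (0.1 + 0.2) may be false
{code}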
[jira] [Updated] (SPARK-2538) External aggregation in Python
[ https://issues.apache.org/jira/browse/SPARK-2538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-2538: - Priority: Critical (was: Major) External aggregation in Python -- Key: SPARK-2538 URL: https://issues.apache.org/jira/browse/SPARK-2538 Project: Spark Issue Type: Improvement Components: PySpark Affects Versions: 1.0.0, 1.0.1 Reporter: Davies Liu Assignee: Davies Liu Priority: Critical Labels: pyspark Fix For: 1.0.0, 1.0.1 Original Estimate: 72h Remaining Estimate: 72h For huge reduce tasks, users will get an out-of-memory exception when all the data cannot fit in memory. PySpark should spill some of the data to disk and then merge it back together, just like what we do in Scala. -- This message was sent by Atlassian JIRA (v6.2#6252)
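For context, a minimal sketch of the spill-and-merge idea in Scala (illustrative only, with a hypothetical in-memory threshold; not PySpark's actual implementation):
{code}
import java.io._
import scala.collection.mutable

val spills = mutable.ArrayBuffer[File]()
var current = mutable.HashMap[String, Long]()
val maxEntriesInMemory = 100000  // hypothetical memory threshold

// Fold a (key, count) pair into the in-memory map, spilling to disk when full.
def insert(key: String, count: Long): Unit = {
  current(key) = current.getOrElse(key, 0L) + count
  if (current.size >= maxEntriesInMemory) {
    val f = File.createTempFile("agg-spill", ".bin")
    val out = new DataOutputStream(new BufferedOutputStream(new FileOutputStream(f)))
    try current.foreach { case (k, v) => out.writeUTF(k); out.writeLong(v) }
    finally out.close()
    spills += f
    current = mutable.HashMap[String, Long]()
  }
}

// Combine the spills with the in-memory map. (A real implementation would sort
// the spills and stream-merge them to keep memory bounded; this simplification
// re-reads everything.)
def merged(): Map[String, Long] = {
  val result = mutable.HashMap[String, Long]() ++= current
  for (f <- spills) {
    val in = new DataInputStream(new BufferedInputStream(new FileInputStream(f)))
    try {
      while (in.available() > 0) {
        val k = in.readUTF(); val v = in.readLong()
        result(k) = result.getOrElse(k, 0L) + v
      }
    } finally in.close()
  }
  result.toMap
}
{code}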
[jira] [Created] (SPARK-2670) FetchFailedException should be thrown when local fetch has failed
Kousuke Saruta created SPARK-2670: - Summary: FetchFailedException should be thrown when local fetch has failed Key: SPARK-2670 URL: https://issues.apache.org/jira/browse/SPARK-2670 Project: Spark Issue Type: Bug Affects Versions: 1.0.0 Reporter: Kousuke Saruta In BasicBlockFetchIterator, when a remote fetch fails, a FetchResult whose size is -1 is put into results:
{code}
case None => {
  logError("Could not get block(s) from " + cmId)
  for ((blockId, size) <- req.blocks) {
    results.put(new FetchResult(blockId, -1, null))
  }
}
{code}
The size -1 means the fetch failed, and BlockStoreShuffleFetcher#unpackBlock throws FetchFailedException so that we can retry. But when a local fetch fails, no failed FetchResult is put into results, so we cannot retry that fetch. -- This message was sent by Atlassian JIRA (v6.2#6252)
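A sketch of the symmetric failure path for local fetches (hypothetical; getLocalFromDisk stands in for whatever local accessor the real code uses, and the actual patch may differ):
{code}
// Mirror the remote-fetch failure handling: on any exception while reading a
// local block, enqueue a FetchResult with size -1 so that unpackBlock throws
// FetchFailedException and the task can be retried.
try {
  val iter = blockManager.getLocalFromDisk(blockId, serializer)  // hypothetical accessor
  results.put(new FetchResult(blockId, 0, () => iter.get))
} catch {
  case e: Exception =>
    logError("Error occurred while fetching local block " + blockId, e)
    results.put(new FetchResult(blockId, -1, null))
}
{code}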
[jira] [Updated] (SPARK-2619) Configurable file-mode for spark/bin folder in the .deb package.
[ https://issues.apache.org/jira/browse/SPARK-2619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-2619: --- Assignee: Christian Tzolov Configurable file-mode for spark/bin folder in the .deb package. - Key: SPARK-2619 URL: https://issues.apache.org/jira/browse/SPARK-2619 Project: Spark Issue Type: Improvement Components: Build, Deploy Reporter: Christian Tzolov Assignee: Christian Tzolov Currently the /bin folder in the .deb package is hardcoded to mode 744, so only the root user (deb.user defaults to root) can run Spark jobs. If we make the /bin file mode a configurable Maven property, we can easily generate a package with less restrictive execution rights. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (SPARK-2619) Configurable file-mode for spark/bin folder in the .deb package.
[ https://issues.apache.org/jira/browse/SPARK-2619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-2619. Resolution: Fixed Fix Version/s: 1.1.0 Issue resolved by pull request 1531 [https://github.com/apache/spark/pull/1531] Configurable file-mode for spark/bin folder in the .deb package. - Key: SPARK-2619 URL: https://issues.apache.org/jira/browse/SPARK-2619 Project: Spark Issue Type: Improvement Components: Build, Deploy Reporter: Christian Tzolov Assignee: Christian Tzolov Fix For: 1.1.0 Currently the /bin folder in the .deb package is hardcoded to mode 744, so only the root user (deb.user defaults to root) can run Spark jobs. If we make the /bin file mode a configurable Maven property, we can easily generate a package with less restrictive execution rights. -- This message was sent by Atlassian JIRA (v6.2#6252)
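A hedged usage sketch: if the file mode became a Maven property, the build might be invoked like this (the property name deb.bin.filemode is an assumption, not confirmed by this ticket):
{code}
# Hypothetical: build the .deb with a less restrictive /bin file mode
mvn -Pdeb -Ddeb.bin.filemode=755 package
{code}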
[jira] [Created] (SPARK-2671) BlockObjectWriter should create parent directory when the directory doesn't exist
Kousuke Saruta created SPARK-2671: - Summary: BlockObjectWriter should create parent directory when the directory doesn't exist Key: SPARK-2671 URL: https://issues.apache.org/jira/browse/SPARK-2671 Project: Spark Issue Type: Bug Affects Versions: 1.0.0 Reporter: Kousuke Saruta Priority: Minor BlockObjectWriter#open expects the parent directory to be present:
{code}
override def open(): BlockObjectWriter = {
  fos = new FileOutputStream(file, true)
  ts = new TimeTrackingOutputStream(fos)
  channel = fos.getChannel()
  lastValidPosition = initialPosition
  bs = compressStream(new BufferedOutputStream(ts, bufferSize))
  objOut = serializer.newInstance().serializeStream(bs)
  initialized = true
  this
}
{code}
Normally the parent directory is created by DiskBlockManager#createLocalDirs, but just in case, BlockObjectWriter#open should check for the directory and create it if it does not exist. I think a recoverable error should be recovered. -- This message was sent by Atlassian JIRA (v6.2#6252)
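A minimal sketch of the proposed guard, assuming plain java.io.File (not necessarily the actual patch):
{code}
import java.io.{File, FileOutputStream, IOException}

def openWithParentCheck(file: File): FileOutputStream = {
  val parent = file.getParentFile
  // Recover from a missing directory instead of failing the write outright;
  // the second exists() check tolerates a concurrent mkdirs by another thread.
  if (parent != null && !parent.exists() && !parent.mkdirs() && !parent.exists()) {
    throw new IOException("Failed to create parent directory " + parent)
  }
  new FileOutputStream(file, true)
}
{code}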
[jira] [Updated] (SPARK-2603) Remove unnecessary toMap and toList in converting Java collections to Scala collections JsonRDD.scala
[ https://issues.apache.org/jira/browse/SPARK-2603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-2603: Fix Version/s: 1.0.2 1.1.0 Remove unnecessary toMap and toList in converting Java collections to Scala collections JsonRDD.scala - Key: SPARK-2603 URL: https://issues.apache.org/jira/browse/SPARK-2603 Project: Spark Issue Type: Improvement Components: SQL Reporter: Yin Huai Assignee: Yin Huai Priority: Minor Fix For: 1.1.0, 1.0.2 In JsonRDD.scalafy, we are using toMap/toList to convert a Java Map/List to a Scala one. These two operations are pretty expensive because they read every element from the Java Map/List and then load them into a Scala Map/List. We can use Scala wrappers to wrap those Java collections instead of using toMap/toList. -- This message was sent by Atlassian JIRA (v6.2#6252)
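For illustration, the wrapper approach with the standard scala.collection.JavaConverters, which produces constant-time views instead of element-by-element copies:
{code}
import scala.collection.JavaConverters._
import java.util.{ArrayList => JArrayList, HashMap => JHashMap}

val jlist = new JArrayList[String]()
jlist.add("a"); jlist.add("b")

val jmap = new JHashMap[String, Int]()
jmap.put("a", 1)

// asScala wraps the Java collection in O(1); toList/toMap would copy every element.
val wrappedSeq: Seq[String] = jlist.asScala
val wrappedMap: scala.collection.mutable.Map[String, Int] = jmap.asScala
{code}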
[jira] [Resolved] (SPARK-2603) Remove unnecessary toMap and toList in converting Java collections to Scala collections JsonRDD.scala
[ https://issues.apache.org/jira/browse/SPARK-2603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-2603. - Resolution: Fixed Remove unnecessary toMap and toList in converting Java collections to Scala collections JsonRDD.scala - Key: SPARK-2603 URL: https://issues.apache.org/jira/browse/SPARK-2603 Project: Spark Issue Type: Improvement Components: SQL Reporter: Yin Huai Assignee: Yin Huai Priority: Minor In JsonRDD.scalafy, we are using toMap/toList to convert a Java Map/List to a Scala one. These two operations are pretty expensive because they read every element from the Java Map/List and then load them into a Scala Map/List. We can use Scala wrappers to wrap those Java collections instead of using toMap/toList. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (SPARK-2672) support compressed file in wholeFile()
Davies Liu created SPARK-2672: - Summary: support compressed file in wholeFile() Key: SPARK-2672 URL: https://issues.apache.org/jira/browse/SPARK-2672 Project: Spark Issue Type: Improvement Components: PySpark, Spark Core Affects Versions: 1.0.0, 1.0.1 Reporter: Davies Liu Fix For: 1.1.0 wholeFile() cannot read compressed files; it should be able to, just like textFile(). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (SPARK-2673) Improve Spark so that we can attach Debugger to Executors easily
Kousuke Saruta created SPARK-2673: - Summary: Improve Spark so that we can attach Debugger to Executors easily Key: SPARK-2673 URL: https://issues.apache.org/jira/browse/SPARK-2673 Project: Spark Issue Type: Improvement Reporter: Kousuke Saruta With the current implementation, it is difficult to attach a debugger to each Executor in the cluster, for the following reasons: 1) It's difficult for Executors running on the same machine to open a debug port, because we can only pass the same JVM options to all executors. 2) Even if we could open a unique debug port for each Executor running on the same machine, it's a bother to check the debug port of each executor. To solve these problems, I think the following two improvements are needed: 1) Enable each executor to open a unique debug port on a machine. 2) Expand the WebUI to show the debug port opened by each executor. -- This message was sent by Atlassian JIRA (v6.2#6252)
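For illustration only, per-executor debug options might look like the following, assuming standard JDWP; address=0 asks the JVM to pick a free ephemeral port, which is exactly the port that would then need to be surfaced in the WebUI as proposed:
{code}
# Hypothetical: a fixed debug address fails when several executors share a
# machine, which is problem 1) above; address=0 avoids the collision.
--conf spark.executor.extraJavaOptions=-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=0
{code}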
[jira] [Commented] (SPARK-2464) Twitter Receiver does not stop correctly when streamingContext.stop is called
[ https://issues.apache.org/jira/browse/SPARK-2464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073518#comment-14073518 ] Apache Spark commented on SPARK-2464: - User 'tdas' has created a pull request for this issue: https://github.com/apache/spark/pull/1577 Twitter Receiver does not stop correctly when streamingContext.stop is called - Key: SPARK-2464 URL: https://issues.apache.org/jira/browse/SPARK-2464 Project: Spark Issue Type: Bug Components: Streaming Affects Versions: 1.0.0, 1.0.1 Reporter: Tathagata Das Assignee: Tathagata Das Priority: Critical -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-1154) Spark fills up disk with app-* folders
[ https://issues.apache.org/jira/browse/SPARK-1154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073521#comment-14073521 ] Andrew Ash commented on SPARK-1154: --- For the record, this is Evan's PR that closed this ticket: https://github.com/apache/spark/pull/288 Spark fills up disk with app-* folders -- Key: SPARK-1154 URL: https://issues.apache.org/jira/browse/SPARK-1154 Project: Spark Issue Type: Improvement Components: Deploy Reporter: Evan Chan Assignee: Mingyu Kim Priority: Critical Labels: starter Fix For: 1.0.0 Current version of Spark fills up the disk with many app-* folders: $ ls /var/lib/spark app-20140210022347-0597 app-20140212173327-0627 app-20140218154110-0657 app-20140225232537-0017 app-20140225233548-0047 app-20140210022407-0598 app-20140212173347-0628 app-20140218154130-0658 app-20140225232551-0018 app-20140225233556-0048 app-20140210022427-0599 app-20140212173754-0629 app-20140218164232-0659 app-20140225232611-0019 app-20140225233603-0049 app-20140210022447-0600 app-20140212182235-0630 app-20140218165133-0660 app-20140225232802-0020 app-20140225233610-0050 app-20140210022508-0601 app-20140212182256-0631 app-20140218165148-0661 app-20140225232822-0021 app-20140225233617-0051 app-20140210022528-0602 app-2014021314-0632 app-20140218165225-0662 app-20140225232940-0022 app-20140225233624-0052 app-20140211024356-0603 app-20140213002026-0633 app-20140218165249-0663 app-20140225233002-0023 app-20140225233631-0053 app-20140211024417-0604 app-20140213154948-0634 app-20140218172030-0664 app-20140225233056-0024 app-20140225233725-0054 app-20140211024437-0605 app-20140213171810-0635 app-20140218193853-0665 app-20140225233108-0025 app-20140225233731-0055 app-20140211024457-0606 app-20140213193637-0636 app-20140218194442-0666 app-20140225233124-0026 app-20140225233733-0056 app-20140211024517-0607 app-20140214011513-0637 app-20140218194746-0667 app-20140225233133-0027 app-20140225233734-0057 app-20140211024538-0608 app-20140214012151-0638 app-20140218194822-0668 app-20140225233147-0028 app-20140225233749-0058 app-20140211193443-0609 app-20140214013134-0639 app-20140218212317-0669 app-20140225233208-0029 app-20140225233759-0059 app-20140211195210-0610 app-20140214013332-0640 app-20140225180142- app-20140225233215-0030 app-20140225233809-0060 app-20140211213935-0611 app-20140214013642-0641 app-20140225180411-0001 app-20140225233224-0031 app-20140225233828-0061 app-20140211214227-0612 app-20140214014246-0642 app-20140225180431-0002 app-20140225233232-0032 app-20140225234719-0062 app-20140211215317-0613 app-20140214014607-0643 app-20140225180452-0003 app-20140225233239-0033 app-20140226032845-0063 app-20140211224601-0614 app-20140214184943-0644 app-20140225180512-0004 app-20140225233320-0034 app-20140226033004-0064 app-20140212022206-0615 app-20140214185118-0645 app-20140225180533-0005 app-20140225233328-0035 app-20140226033119-0065 app-2014021206-0616 app-20140214185851-0646 app-20140225180553-0006 app-20140225233354-0036 app-2014022604-0066 app-20140212022246-0617 app-20140214222856-0647 app-20140225181115-0007 app-20140225233402-0037 app-20140226033354-0067 app-20140212043704-0618 app-20140214231312-0648 app-20140225181244-0008 app-20140225233409-0038 app-20140226033538-0068 app-20140212043724-0619 app-20140214231434-0649 app-20140225182051-0009 app-20140225233416-0039 app-20140226033826-0069 app-20140212043745-0620 app-20140214231542-0650 app-20140225183009-0010 app-20140225233426-0040 app-20140226034002-0070 app-20140212044016-0621 
app-20140214231616-0651 app-20140225184133-0011 app-20140225233432-0041 app-20140226034053-0071 app-20140212044203-0622 app-20140214233016-0652 app-20140225184318-0012 app-20140225233439-0042 app-20140226034234-0072 app-20140212044224-0623 app-20140214233037-0653 app-20140225184709-0013 app-20140225233447-0043 app-20140226034426-0073 app-20140212045034-0624 app-20140218153242-0654 app-20140225184844-0014 app-20140225233526-0044 app-20140226034447-0074 app-20140212045119-0625 app-20140218153341-0655 app-20140225190051-0015 app-20140225233534-0045 app-20140212173310-0626 app-20140218153442-0656 app-20140225232516-0016 app-20140225233540-0046 This problem is particularly bad if you have a whole bunch of fast jobs. Also what makes the problem worse is that any jars for jobs are downloaded into the app-* folder, so that fills up the disk particularly fast. I would like to propose two things: 1) Spark should have a cleanup thread (or actor) which periodically removes old app-* folders; This
[jira] [Commented] (SPARK-2670) FetchFailedException should be thrown when local fetch has failed
[ https://issues.apache.org/jira/browse/SPARK-2670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073539#comment-14073539 ] Apache Spark commented on SPARK-2670: - User 'sarutak' has created a pull request for this issue: https://github.com/apache/spark/pull/1578 FetchFailedException should be thrown when local fetch has failed - Key: SPARK-2670 URL: https://issues.apache.org/jira/browse/SPARK-2670 Project: Spark Issue Type: Bug Affects Versions: 1.0.0 Reporter: Kousuke Saruta In BasicBlockFetchIterator, when a remote fetch fails, a FetchResult whose size is -1 is put into results:
{code}
case None => {
  logError("Could not get block(s) from " + cmId)
  for ((blockId, size) <- req.blocks) {
    results.put(new FetchResult(blockId, -1, null))
  }
}
{code}
The size -1 means the fetch failed, and BlockStoreShuffleFetcher#unpackBlock throws FetchFailedException so that we can retry. But when a local fetch fails, no failed FetchResult is put into results, so we cannot retry that fetch. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (SPARK-2674) Add date and time types to inferSchema
Hossein Falaki created SPARK-2674: - Summary: Add date and time types to inferSchema Key: SPARK-2674 URL: https://issues.apache.org/jira/browse/SPARK-2674 Project: Spark Issue Type: New Feature Components: SQL Affects Versions: 1.0.0 Reporter: Hossein Falaki When I try inferSchema in PySpark on an RDD of dictionaries that contain a datetime.datetime object, I get the following exception: {code} Object of type java.util.GregorianCalendar[time=?,areFieldsSet=false,areAllFieldsSet=false,lenient=true,zone=sun.util.calendar.ZoneInfo[id=Etc/UTC,offset=0,dstSavings=0,useDaylight=false,transitions=0,lastRule=null],firstDayOfWeek=1,minimalDaysInFirstWeek=1,ERA=?,YEAR=2014,MONTH=3,WEEK_OF_YEAR=?,WEEK_OF_MONTH=?,DAY_OF_MONTH=22,DAY_OF_YEAR=?,DAY_OF_WEEK=?,DAY_OF_WEEK_IN_MONTH=?,AM_PM=0,HOUR=0,HOUR_OF_DAY=0,MINUTE=0,SECOND=0,MILLISECOND=4,ZONE_OFFSET=?,DST_OFFSET=?] cannot be used {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Closed] (SPARK-2676) CLONE - LiveListenerBus should set higher capacity for its event queue
[ https://issues.apache.org/jira/browse/SPARK-2676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zongheng Yang closed SPARK-2676. Resolution: Duplicate CLONE - LiveListenerBus should set higher capacity for its event queue --- Key: SPARK-2676 URL: https://issues.apache.org/jira/browse/SPARK-2676 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 1.0.0, 1.0.1 Reporter: Zongheng Yang Assignee: Zongheng Yang -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2675) LiveListenerBus should set higher capacity for its event queue
[ https://issues.apache.org/jira/browse/SPARK-2675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073546#comment-14073546 ] Apache Spark commented on SPARK-2675: - User 'concretevitamin' has created a pull request for this issue: https://github.com/apache/spark/pull/1579 LiveListenerBus should set higher capacity for its event queue --- Key: SPARK-2675 URL: https://issues.apache.org/jira/browse/SPARK-2675 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 1.0.0, 1.0.1 Reporter: Zongheng Yang Assignee: Zongheng Yang -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2671) BlockObjectWriter should create parent directory when the directory doesn't exist
[ https://issues.apache.org/jira/browse/SPARK-2671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073548#comment-14073548 ] Apache Spark commented on SPARK-2671: - User 'sarutak' has created a pull request for this issue: https://github.com/apache/spark/pull/1580 BlockObjectWriter should create parent directory when the directory doesn't exist - Key: SPARK-2671 URL: https://issues.apache.org/jira/browse/SPARK-2671 Project: Spark Issue Type: Bug Affects Versions: 1.0.0 Reporter: Kousuke Saruta Priority: Minor BlockObjectWriter#open expects the parent directory to be present:
{code}
override def open(): BlockObjectWriter = {
  fos = new FileOutputStream(file, true)
  ts = new TimeTrackingOutputStream(fos)
  channel = fos.getChannel()
  lastValidPosition = initialPosition
  bs = compressStream(new BufferedOutputStream(ts, bufferSize))
  objOut = serializer.newInstance().serializeStream(bs)
  initialized = true
  this
}
{code}
Normally the parent directory is created by DiskBlockManager#createLocalDirs, but just in case, BlockObjectWriter#open should check for the directory and create it if it does not exist. I think a recoverable error should be recovered. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (SPARK-2037) yarn client mode doesn't support spark.yarn.max.executor.failures
[ https://issues.apache.org/jira/browse/SPARK-2037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves resolved SPARK-2037. -- Resolution: Fixed Fix Version/s: 1.1.0 yarn client mode doesn't support spark.yarn.max.executor.failures - Key: SPARK-2037 URL: https://issues.apache.org/jira/browse/SPARK-2037 Project: Spark Issue Type: Bug Components: YARN Affects Versions: 1.0.0 Reporter: Thomas Graves Assignee: Guoqiang Li Fix For: 1.1.0 yarn client mode doesn't support the config spark.yarn.max.executor.failures. We should investigate if we need it. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-2674) Add date and time types to inferSchema
[ https://issues.apache.org/jira/browse/SPARK-2674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-2674: Assignee: Davies Liu (was: Michael Armbrust) Add date and time types to inferSchema -- Key: SPARK-2674 URL: https://issues.apache.org/jira/browse/SPARK-2674 Project: Spark Issue Type: New Feature Components: SQL Affects Versions: 1.0.0 Reporter: Hossein Falaki Assignee: Davies Liu When I try inferSchema in PySpark on an RDD of dictionaries that contain a datetime.datetime object, I get the following exception: {code} Object of type java.util.GregorianCalendar[time=?,areFieldsSet=false,areAllFieldsSet=false,lenient=true,zone=sun.util.calendar.ZoneInfo[id=Etc/UTC,offset=0,dstSavings=0,useDaylight=false,transitions=0,lastRule=null],firstDayOfWeek=1,minimalDaysInFirstWeek=1,ERA=?,YEAR=2014,MONTH=3,WEEK_OF_YEAR=?,WEEK_OF_MONTH=?,DAY_OF_MONTH=22,DAY_OF_YEAR=?,DAY_OF_WEEK=?,DAY_OF_WEEK_IN_MONTH=?,AM_PM=0,HOUR=0,HOUR_OF_DAY=0,MINUTE=0,SECOND=0,MILLISECOND=4,ZONE_OFFSET=?,DST_OFFSET=?] cannot be used {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (SPARK-2674) Add date and time types to inferSchema
[ https://issues.apache.org/jira/browse/SPARK-2674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust reassigned SPARK-2674: --- Assignee: Michael Armbrust Add date and time types to inferSchema -- Key: SPARK-2674 URL: https://issues.apache.org/jira/browse/SPARK-2674 Project: Spark Issue Type: New Feature Components: SQL Affects Versions: 1.0.0 Reporter: Hossein Falaki Assignee: Michael Armbrust When I try inferSchema in PySpark on an RDD of dictionaries that contain a datetime.datetime object, I get the following exception: {code} Object of type java.util.GregorianCalendar[time=?,areFieldsSet=false,areAllFieldsSet=false,lenient=true,zone=sun.util.calendar.ZoneInfo[id=Etc/UTC,offset=0,dstSavings=0,useDaylight=false,transitions=0,lastRule=null],firstDayOfWeek=1,minimalDaysInFirstWeek=1,ERA=?,YEAR=2014,MONTH=3,WEEK_OF_YEAR=?,WEEK_OF_MONTH=?,DAY_OF_MONTH=22,DAY_OF_YEAR=?,DAY_OF_WEEK=?,DAY_OF_WEEK_IN_MONTH=?,AM_PM=0,HOUR=0,HOUR_OF_DAY=0,MINUTE=0,SECOND=0,MILLISECOND=4,ZONE_OFFSET=?,DST_OFFSET=?] cannot be used {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-2674) Add date and time types to inferSchema
[ https://issues.apache.org/jira/browse/SPARK-2674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-2674: Target Version/s: 1.1.0 Add date and time types to inferSchema -- Key: SPARK-2674 URL: https://issues.apache.org/jira/browse/SPARK-2674 Project: Spark Issue Type: New Feature Components: SQL Affects Versions: 1.0.0 Reporter: Hossein Falaki Assignee: Davies Liu When I try inferSchema in PySpark on an RDD of dictionaries that contain a datetime.datetime object, I get the following exception: {code} Object of type java.util.GregorianCalendar[time=?,areFieldsSet=false,areAllFieldsSet=false,lenient=true,zone=sun.util.calendar.ZoneInfo[id=Etc/UTC,offset=0,dstSavings=0,useDaylight=false,transitions=0,lastRule=null],firstDayOfWeek=1,minimalDaysInFirstWeek=1,ERA=?,YEAR=2014,MONTH=3,WEEK_OF_YEAR=?,WEEK_OF_MONTH=?,DAY_OF_MONTH=22,DAY_OF_YEAR=?,DAY_OF_WEEK=?,DAY_OF_WEEK_IN_MONTH=?,AM_PM=0,HOUR=0,HOUR_OF_DAY=0,MINUTE=0,SECOND=0,MILLISECOND=4,ZONE_OFFSET=?,DST_OFFSET=?] cannot be used {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2387) Remove the stage barrier for better resource utilization
[ https://issues.apache.org/jira/browse/SPARK-2387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073597#comment-14073597 ] Kay Ousterhout commented on SPARK-2387: --- Have you done experiments to understand how much this improves performance? With Hadoop MapReduce, I've seen this behavior significantly worsen performance for a few reasons. Ultimately, the problem is that the reduce stage (the one that depends on the shuffle map stage) can't finish until all of the map tasks finish. So, if there is a long map straggler, the reduce tasks can't finish anyway -- and now many more slots are hogged by the early reducers, preventing other jobs from making progress. Even worse, if reduce tasks are launched before all map tasks have been launched, the early reducers keep map tasks from being launched, but can end up stalled waiting for input from mappers that haven't completed yet. (Although I didn't look closely at PR1328 so I'm not sure if the latter issue was explicitly prevented in your pull request.) As a result of the above issues, I've heard that many places (I think Facebook, for example) disable this behavior in Hadoop. So, we should make sure this will not hurt performance (and will significantly help!) before adding a lot of complexity to Spark in order to implement it. Remove the stage barrier for better resource utilization Key: SPARK-2387 URL: https://issues.apache.org/jira/browse/SPARK-2387 Project: Spark Issue Type: New Feature Components: Spark Core Reporter: Rui Li DAGScheduler divides a Spark job into multiple stages according to RDD dependencies. Whenever there’s a shuffle dependency, DAGScheduler creates a shuffle map stage on the map side, and another stage depending on that stage. Currently, the downstream stage cannot start until all the stages it depends on have finished. This barrier between stages leads to idle slots when waiting for the last few upstream tasks to finish, thus wasting cluster resources. Therefore we propose to remove the barrier and pre-start the reduce stage once there're free slots. This can achieve better resource utilization and improve the overall job performance, especially when there're lots of executors granted to the application. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (SPARK-2387) Remove the stage barrier for better resource utilization
[ https://issues.apache.org/jira/browse/SPARK-2387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073597#comment-14073597 ] Kay Ousterhout edited comment on SPARK-2387 at 7/24/14 8:23 PM: Have you done experiments to understand how much this improves performance? With Hadoop MapReduce, I've seen this behavior significantly worsen performance for a few reasons. Ultimately, the problem is that the reduce stage (the one that depends on the shuffle map stage) can't finish until all of the map tasks finish. So, if there is a long map straggler, the reduce tasks can't finish anyway -- and now many more slots are hogged by the early reducers, preventing other jobs from making progress. Even worse, if reduce tasks are launched before all map tasks have been launched, the early reducers keep map tasks from being launched, but can end up stalled waiting for input from mappers that haven't completed yet. (Although it looks like your pull request is done in a way that tries to avoid the latter problem.) As a result of the above issues, I've heard that many places (I think Facebook, for example) disable this behavior in Hadoop. So, we should make sure this will not hurt performance (and will significantly help!) before adding a lot of complexity to Spark in order to implement it. was (Author: kayousterhout): Have you done experiments to understand how much this improves performance? With Hadoop MapReduce, I've seen this behavior significantly worsen performance for a few reasons. Ultimately, the problem is that the reduce stage (the one that depends on the shuffle map stage) can't finish until all of the map tasks finish. So, if there is a long map straggler, the reduce tasks can't finish anyway -- and now many more slots are hogged by the early reducers, preventing other jobs from making progress. Even worse, if reduce tasks are launched before all map tasks have been launched, the early reducers keep map tasks from being launched, but can end up stalled waiting for input from mappers that haven't completed yet. (Although I didn't look closely at PR1328 so I'm not sure if the latter issue was explicitly prevented in your pull request.) As a result of the above issues, I've heard that many places (I think Facebook, for example) disable this behavior in Hadoop. So, we should make sure this will not hurt performance (and will significantly help!) before adding a lot of complexity to Spark in order to implement it. Remove the stage barrier for better resource utilization Key: SPARK-2387 URL: https://issues.apache.org/jira/browse/SPARK-2387 Project: Spark Issue Type: New Feature Components: Spark Core Reporter: Rui Li DAGScheduler divides a Spark job into multiple stages according to RDD dependencies. Whenever there’s a shuffle dependency, DAGScheduler creates a shuffle map stage on the map side, and another stage depending on that stage. Currently, the downstream stage cannot start until all the stages it depends on have finished. This barrier between stages leads to idle slots when waiting for the last few upstream tasks to finish, thus wasting cluster resources. Therefore we propose to remove the barrier and pre-start the reduce stage once there're free slots. This can achieve better resource utilization and improve the overall job performance, especially when there're lots of executors granted to the application. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-2250) show stage RDDs in UI
[ https://issues.apache.org/jira/browse/SPARK-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-2250: --- Assignee: Neville Li show stage RDDs in UI - Key: SPARK-2250 URL: https://issues.apache.org/jira/browse/SPARK-2250 Project: Spark Issue Type: New Feature Components: Web UI Reporter: Neville Li Assignee: Neville Li Priority: Minor Fix For: 1.1.0 RDDs of each stage can be accessed from StageInfo#rddInfos. It'd be nice to show them in the UI. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (SPARK-2250) show stage RDDs in UI
[ https://issues.apache.org/jira/browse/SPARK-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-2250. Resolution: Fixed Fix Version/s: 1.1.0 Issue resolved by pull request 1188 [https://github.com/apache/spark/pull/1188] show stage RDDs in UI - Key: SPARK-2250 URL: https://issues.apache.org/jira/browse/SPARK-2250 Project: Spark Issue Type: New Feature Components: Web UI Reporter: Neville Li Priority: Minor Fix For: 1.1.0 RDDs of each stage can be accessed from StageInfo#rddInfos. It'd be nice to show them in the UI. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (SPARK-2677) BasicBlockFetchIterator#next can be wait forever
Kousuke Saruta created SPARK-2677: - Summary: BasicBlockFetchIterator#next can be wait forever Key: SPARK-2677 URL: https://issues.apache.org/jira/browse/SPARK-2677 Project: Spark Issue Type: Bug Affects Versions: 1.0.0 Reporter: Kousuke Saruta Priority: Critical In BasicBlockFetchIterator#next, it waits for a fetch result on results.take():
{code}
override def next(): (BlockId, Option[Iterator[Any]]) = {
  resultsGotten += 1
  val startFetchWait = System.currentTimeMillis()
  val result = results.take()
  val stopFetchWait = System.currentTimeMillis()
  _fetchWaitTime += (stopFetchWait - startFetchWait)
  if (!result.failed) bytesInFlight -= result.size
  while (!fetchRequests.isEmpty &&
      (bytesInFlight == 0 || bytesInFlight + fetchRequests.front.size <= maxBytesInFlight)) {
    sendRequest(fetchRequests.dequeue())
  }
  (result.blockId, if (result.failed) None else Some(result.deserialize()))
}
{code}
But results is implemented as a LinkedBlockingQueue, so if the remote executor hangs up, the fetching executor waits forever. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-2677) BasicBlockFetchIterator#next can wait forever
[ https://issues.apache.org/jira/browse/SPARK-2677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-2677: -- Summary: BasicBlockFetchIterator#next can wait forever (was: BasicBlockFetchIterator#next can be wait forever) BasicBlockFetchIterator#next can wait forever - Key: SPARK-2677 URL: https://issues.apache.org/jira/browse/SPARK-2677 Project: Spark Issue Type: Bug Affects Versions: 1.0.0 Reporter: Kousuke Saruta Priority: Critical In BasicBlockFetchIterator#next, it waits for a fetch result on results.take():
{code}
override def next(): (BlockId, Option[Iterator[Any]]) = {
  resultsGotten += 1
  val startFetchWait = System.currentTimeMillis()
  val result = results.take()
  val stopFetchWait = System.currentTimeMillis()
  _fetchWaitTime += (stopFetchWait - startFetchWait)
  if (!result.failed) bytesInFlight -= result.size
  while (!fetchRequests.isEmpty &&
      (bytesInFlight == 0 || bytesInFlight + fetchRequests.front.size <= maxBytesInFlight)) {
    sendRequest(fetchRequests.dequeue())
  }
  (result.blockId, if (result.failed) None else Some(result.deserialize()))
}
{code}
But results is implemented as a LinkedBlockingQueue, so if the remote executor hangs up, the fetching executor waits forever. -- This message was sent by Atlassian JIRA (v6.2#6252)
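A minimal sketch of a bounded wait (a hypothetical fix with an illustrative timeout value; the actual patch may differ):
{code}
import java.util.concurrent.{TimeUnit, TimeoutException}

val fetchTimeoutMs = 60000L  // hypothetical timeout value
val result = results.poll(fetchTimeoutMs, TimeUnit.MILLISECONDS)
if (result == null) {
  // Surface the stall instead of blocking forever; a real fix would throw
  // something the scheduler treats as a fetch failure so the task is retried.
  throw new TimeoutException("Timed out waiting for fetch result")
}
{code}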
[jira] [Commented] (SPARK-1855) Provide memory-and-local-disk RDD checkpointing
[ https://issues.apache.org/jira/browse/SPARK-1855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073784#comment-14073784 ] koert kuipers commented on SPARK-1855: -- I think this makes sense. We have iterative queries that should be very quick. In case of machine failure I am OK if the query fails; we will simply repeat it. So I do not care about checkpointing to disk in this situation. But I do care about checkpointing to memory to cut my dependencies, which means they get garbage collected and cached RDDs get cleaned up. Provide memory-and-local-disk RDD checkpointing --- Key: SPARK-1855 URL: https://issues.apache.org/jira/browse/SPARK-1855 Project: Spark Issue Type: New Feature Components: MLlib, Spark Core Affects Versions: 1.0.0 Reporter: Xiangrui Meng Checkpointing is used to cut long lineage while maintaining fault tolerance. The current implementation is HDFS-based. Using the BlockRDD we can create in-memory-and-local-disk (with replication) checkpoints that are not as reliable as the HDFS-based solution but faster. It can help applications that require many iterations. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-2678) `Spark-submit` overrides user application options
[ https://issues.apache.org/jira/browse/SPARK-2678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-2678: -- Priority: Major (was: Minor) `Spark-submit` overrides user application options - Key: SPARK-2678 URL: https://issues.apache.org/jira/browse/SPARK-2678 Project: Spark Issue Type: Bug Components: Deploy Affects Versions: 1.0.1, 1.0.2 Reporter: Cheng Lian Here is an example:
{code}
./bin/spark-submit --class Foo some.jar --help
{code}
Since {{--help}} appears behind the primary resource (i.e. {{some.jar}}), it should be recognized as a user application option. But it's actually overridden by {{spark-submit}} and will show the {{spark-submit}} help message. When directly invoking {{spark-submit}}, the constraints here are: # Options before the primary resource should be recognized as {{spark-submit}} options # Options after the primary resource should be recognized as user application options The tricky part is how to handle scripts like {{spark-shell}} that delegate to {{spark-submit}}. These scripts allow users to specify both {{spark-submit}} options like {{--master}} and user-defined application options together. For example, say we'd like to write a new script {{start-thriftserver.sh}} to start the Hive Thrift server; basically we may do this:
{code}
$SPARK_HOME/bin/spark-submit --class org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 spark-internal $@
{code}
Then users may call this script like:
{code}
./sbin/start-thriftserver.sh --master spark://some-host:7077 --hiveconf key=value
{code}
Notice that all options are captured by {{$@}}. If we put it before {{spark-internal}}, they are all recognized as {{spark-submit}} options, thus {{--hiveconf}} won't be passed to {{HiveThriftServer2}}; if we put it after {{spark-internal}}, they *should* all be recognized as options of {{HiveThriftServer2}}, but because of this bug, {{--master}} is still recognized as a {{spark-submit}} option and leads to the right behavior. Although currently all scripts using {{spark-submit}} work correctly, we still should fix this bug, because it causes option name collisions between {{spark-submit}} and user applications, and every time we add a new option to {{spark-submit}}, some existing user applications may break. However, solving this bug may cause some incompatible changes. The suggested solution here is using {{--}} as the separator between {{spark-submit}} options and user application options. For the Hive Thrift server example above, users should call it in this way:
{code}
./sbin/start-thriftserver.sh --master spark://some-host:7077 -- --hiveconf key=value
{code}
And {{SparkSubmitArguments}} should be responsible for splitting the two sets of options and passing them correctly. -- This message was sent by Atlassian JIRA (v6.2#6252)
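A minimal sketch of the suggested separator handling, assuming args: Array[String] (illustrative only, not the actual SparkSubmitArguments change):
{code}
// Split argv at the first "--": everything before it goes to spark-submit,
// everything after it goes to the user application.
val (submitArgs, rest) = args.span(_ != "--")
val appArgs = rest.drop(1)  // drop the "--" separator itself
{code}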
[jira] [Created] (SPARK-2679) Ser/De for Double to enable calling Java API from python in MLlib
Doris Xin created SPARK-2679: Summary: Ser/De for Double to enable calling Java API from python in MLlib Key: SPARK-2679 URL: https://issues.apache.org/jira/browse/SPARK-2679 Project: Spark Issue Type: Sub-task Reporter: Doris Xin In order to enable Java/Scala APIs to be reused in the Python implementation of RandomRDD and Correlations, we need a set of ser/de for the type Double in _common.py and PythonMLLibAPI. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2679) Ser/De for Double to enable calling Java API from python in MLlib
[ https://issues.apache.org/jira/browse/SPARK-2679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073833#comment-14073833 ] Apache Spark commented on SPARK-2679: - User 'dorx' has created a pull request for this issue: https://github.com/apache/spark/pull/1581 Ser/De for Double to enable calling Java API from python in MLlib - Key: SPARK-2679 URL: https://issues.apache.org/jira/browse/SPARK-2679 Project: Spark Issue Type: Sub-task Components: MLlib Reporter: Doris Xin In order to enable Java/Scala APIs to be reused in the Python implementation of RandomRDD and Correlations, we need a set of ser/de for the type Double in _common.py and PythonMLLibAPI. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-2298) Show stage attempt in UI
[ https://issues.apache.org/jira/browse/SPARK-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-2298: --- Priority: Critical (was: Major) Show stage attempt in UI Key: SPARK-2298 URL: https://issues.apache.org/jira/browse/SPARK-2298 Project: Spark Issue Type: Improvement Components: Web UI Reporter: Reynold Xin Assignee: Masayoshi TSUZUKI Priority: Critical Attachments: Screen Shot 2014-06-25 at 4.54.46 PM.png We should add a column to the web ui to show stage attempt id. Then tasks should be grouped by (stageId, stageAttempt) tuple. When a stage is resubmitted (e.g. due to fetch failures), we should get a different entry in the web ui and tasks for the resubmission go there. See the attached screenshot for the confusing status quo. We currently show the same stage entry twice, and then tasks appear in both. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2515) Hypothesis testing
[ https://issues.apache.org/jira/browse/SPARK-2515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073879#comment-14073879 ] Doris Xin commented on SPARK-2515: -- Here's the proposed API for chi-squared tests (lives in org.apache.spark.mllib.stat.Statistics):
{code}
def chiSquare(X: RDD[Vector], method: String = "pearson"): ChiSquareTestResult
def chiSquare(x: RDD[Double], y: RDD[Double], method: String = "pearson"): ChiSquareTestResult
{code}
where ChiSquareTestResult <: TestResult looks like:
{code}
pValue: Double
df: Array[Int] // normally a single value, but needs to be more for ANOVA
statistic: Double
ChiSquareSummary <: Summary
{code}
So a couple points of discussion: 1. Of the many variants of the chi-squared test, what methods in addition to pearson do we want to support (hopefully based on popular demand)? http://en.wikipedia.org/wiki/Chi-squared_test 2. What special fields should ChiSquareSummary have? Hypothesis testing -- Key: SPARK-2515 URL: https://issues.apache.org/jira/browse/SPARK-2515 Project: Spark Issue Type: Sub-task Components: MLlib Reporter: Xiangrui Meng Assignee: Doris Xin Support common statistical tests in Spark MLlib. -- This message was sent by Atlassian JIRA (v6.2#6252)
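For concreteness, Pearson's chi-squared statistic for observed vs. expected counts (an illustrative local computation, not the proposed MLlib implementation):
{code}
// statistic = sum over categories of (observed - expected)^2 / expected
def pearsonChiSquared(observed: Array[Double], expected: Array[Double]): Double =
  observed.zip(expected).map { case (o, e) => (o - e) * (o - e) / e }.sum

// Goodness-of-fit degrees of freedom: number of categories - 1.
val stat = pearsonChiSquared(Array(10.0, 20.0, 30.0), Array(15.0, 20.0, 25.0))
{code}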
[jira] [Resolved] (SPARK-2464) Twitter Receiver does not stop correctly when streamingContext.stop is called
[ https://issues.apache.org/jira/browse/SPARK-2464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das resolved SPARK-2464. -- Resolution: Fixed Twitter Receiver does not stop correctly when streamingContext.stop is called - Key: SPARK-2464 URL: https://issues.apache.org/jira/browse/SPARK-2464 Project: Spark Issue Type: Bug Components: Streaming Affects Versions: 1.0.0, 1.0.1 Reporter: Tathagata Das Assignee: Tathagata Das Priority: Critical -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (SPARK-2014) Make PySpark store RDDs in MEMORY_ONLY_SER with compression by default
[ https://issues.apache.org/jira/browse/SPARK-2014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia resolved SPARK-2014. -- Resolution: Fixed Fix Version/s: 1.1.0 Make PySpark store RDDs in MEMORY_ONLY_SER with compression by default -- Key: SPARK-2014 URL: https://issues.apache.org/jira/browse/SPARK-2014 Project: Spark Issue Type: Improvement Components: PySpark Reporter: Matei Zaharia Assignee: Prashant Sharma Fix For: 1.1.0 Since the data is serialized on the Python side, there's not much point in keeping it as byte arrays in Java, or even in skipping compression. We should make cache() in PySpark use MEMORY_ONLY_SER and turn on spark.rdd.compress for it. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-1044) Default spark logs location in EC2 AMI leads to out-of-disk space pretty soon
[ https://issues.apache.org/jira/browse/SPARK-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073930#comment-14073930 ] Andrew Ash commented on SPARK-1044: --- Filling up the work dir could be alleviated by fixing https://issues.apache.org/jira/browse/SPARK-1860 so we could enable worker dir cleanup automatically. If we had automatic worker dir cleanup, would you still want to move the work directory to somewhere else? Default spark logs location in EC2 AMI leads to out-of-disk space pretty soon - Key: SPARK-1044 URL: https://issues.apache.org/jira/browse/SPARK-1044 Project: Spark Issue Type: Improvement Components: EC2 Reporter: Tathagata Das Priority: Minor The default log location is SPARK_HOME/work/ and this leads to disk space running out pretty quickly. The spark-ec2 scripts should configure the cluster to automatically set the logging directory to /mnt/spark-work/ or something like that on the mounted disks. The SPARK_HOME/work may also be symlinked to that directory to maintain the existing setup. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-786) Clean up old work directories in standalone worker
[ https://issues.apache.org/jira/browse/SPARK-786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073932#comment-14073932 ] Andrew Ash commented on SPARK-786: -- Agreed. With SPARK-1860 we could re-enable the features from that PR by default and be good here (it was disabled after it had negative effects with long-running transactions). I think this ticket can be closed as a dupe of that one. Clean up old work directories in standalone worker -- Key: SPARK-786 URL: https://issues.apache.org/jira/browse/SPARK-786 Project: Spark Issue Type: New Feature Components: Deploy Affects Versions: 0.7.2 Reporter: Matei Zaharia We should add a setting to clean old work directories after X days. Otherwise, the directory gets filled forever with shuffle files and logs. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-1044) Default spark logs location in EC2 AMI leads to out-of-disk space pretty soon
[ https://issues.apache.org/jira/browse/SPARK-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073938#comment-14073938 ] Allan Douglas R. de Oliveira commented on SPARK-1044: - I think it is still a good idea even if the automatic cleanup is implemented. One large job or many small jobs can fill many gigabytes before the cleanup can kick in. Default spark logs location in EC2 AMI leads to out-of-disk space pretty soon - Key: SPARK-1044 URL: https://issues.apache.org/jira/browse/SPARK-1044 Project: Spark Issue Type: Improvement Components: EC2 Reporter: Tathagata Das Priority: Minor The default log location is SPARK_HOME/work/ and this leads to disk space running out pretty quickly. The spark-ec2 scripts should configure the cluster to automatically set the logging directory to /mnt/spark-work/ or something like that on the mounted disks. The SPARK_HOME/work may also be symlinked to that directory to maintain the existing setup. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (SPARK-1030) unneeded file required when running pyspark program using yarn-client
[ https://issues.apache.org/jira/browse/SPARK-1030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen resolved SPARK-1030. --- Resolution: Fixed Fix Version/s: 1.0.0 Closing this now, since it was addressed as part of Spark 1.0's PySpark on YARN patches (including SPARK-1004). unneeded file required when running pyspark program using yarn-client - Key: SPARK-1030 URL: https://issues.apache.org/jira/browse/SPARK-1030 Project: Spark Issue Type: Bug Components: Deploy, PySpark, YARN Affects Versions: 0.8.1 Reporter: Diana Carroll Assignee: Josh Rosen Fix For: 1.0.0 I can successfully run a pyspark program using the yarn-client master using the following command:
{code}
SPARK_JAR=$SPARK_HOME/assembly/target/scala-2.9.3/spark-assembly_2.9.3-0.8.1-incubating-hadoop2.2.0.jar \
SPARK_YARN_APP_JAR=~/testdata.txt pyspark \
test1.py
{code}
However, the SPARK_YARN_APP_JAR doesn't make any sense; it's a Python program, and therefore there's no JAR. If I don't set the value, or if I set the value to a non-existent file, Spark gives me an error message.
{code}
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: org.apache.spark.SparkException: env SPARK_YARN_APP_JAR is not set
	at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:46)
{code}
or
{code}
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.io.FileNotFoundException: File file:dummy.txt does not exist
	at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:520)
{code}
My program is very simple:
{code}
from pyspark import SparkContext

def main():
    sc = SparkContext("yarn-client", "Simple App")
    logData = sc.textFile("hdfs://localhost/user/training/weblogs/2013-09-15.log")
    numjpgs = logData.filter(lambda s: '.jpg' in s).count()
    print "Number of JPG requests: " + str(numjpgs)
{code}
Although it reads the SPARK_YARN_APP_JAR file, it doesn't use the file at all; I can point it at anything, as long as it's a valid, accessible file, and it works the same. Although there's an obvious workaround for this bug, it's high priority from my perspective because I'm working on a course to teach people how to do this, and it's really hard to explain why this variable is needed! -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (SPARK-2680) Lower spark.shuffle.memoryFraction to 0.2 by default
Matei Zaharia created SPARK-2680: Summary: Lower spark.shuffle.memoryFraction to 0.2 by default Key: SPARK-2680 URL: https://issues.apache.org/jira/browse/SPARK-2680 Project: Spark Issue Type: Improvement Components: Spark Core Reporter: Matei Zaharia Priority: Minor Fix For: 1.1.0 Seems like it's good to be more conservative on this, in particular to try to fit all the data in the young gen. People can always increase it for performance. -- This message was sent by Atlassian JIRA (v6.2#6252)
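For reference, spark.shuffle.memoryFraction can be set in conf/spark-defaults.conf (or via --conf), so users who see heavy shuffle spilling after the change can raise it back; the value shown is illustrative:
{code}
# Illustrative: restore a larger shuffle fraction for memory-heavy aggregations
spark.shuffle.memoryFraction   0.3
{code}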
[jira] [Updated] (SPARK-2529) Clean the closure in foreach and foreachPartition
[ https://issues.apache.org/jira/browse/SPARK-2529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das updated SPARK-2529: - Target Version/s: 1.1.0, 1.0.3 (was: 1.1.0, 1.0.2) Clean the closure in foreach and foreachPartition - Key: SPARK-2529 URL: https://issues.apache.org/jira/browse/SPARK-2529 Project: Spark Issue Type: Bug Reporter: Reynold Xin Somehow we didn't clean the closure for foreach and foreachPartition. Should do that. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-2531) Make BroadcastNestedLoopJoin take into account a BuildSide
[ https://issues.apache.org/jira/browse/SPARK-2531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das updated SPARK-2531: - Target Version/s: 1.1.0, 1.0.3 (was: 1.1.0, 1.0.2) Make BroadcastNestedLoopJoin take into account a BuildSide -- Key: SPARK-2531 URL: https://issues.apache.org/jira/browse/SPARK-2531 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.0.1 Reporter: Zongheng Yang Assignee: Zongheng Yang Priority: Minor -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-2548) JavaRecoverableWordCount is missing
[ https://issues.apache.org/jira/browse/SPARK-2548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das updated SPARK-2548: - Target Version/s: 1.1.0, 0.9.3, 1.0.3 (was: 1.1.0, 1.0.2, 0.9.3) JavaRecoverableWordCount is missing --- Key: SPARK-2548 URL: https://issues.apache.org/jira/browse/SPARK-2548 Project: Spark Issue Type: Bug Components: Documentation, Streaming Affects Versions: 0.9.2, 1.0.1 Reporter: Xiangrui Meng Priority: Minor JavaRecoverableWordCount was mentioned in the doc but not in the codebase. We need to rewrite the example because the code was lost during the migration from spark/spark-incubating to apache/spark. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-2506) In yarn-cluster mode, ApplicationMaster does not clean up correctly at the end of the job if users call sc.stop manually
[ https://issues.apache.org/jira/browse/SPARK-2506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das updated SPARK-2506: - Target Version/s: 1.0.3 (was: 1.0.2) In yarn-cluster mode, ApplicationMaster does not clean up correctly at the end of the job if users call sc.stop manually Key: SPARK-2506 URL: https://issues.apache.org/jira/browse/SPARK-2506 Project: Spark Issue Type: Bug Components: Block Manager, Spark Core, YARN Affects Versions: 1.0.1 Reporter: uncleGen Priority: Minor when i call sc.stop manually, some strange ERRORs will appear: 1. in driver log: INFO [Thread-116] YarnAllocationHandler: Completed container container_1400565786114_79510_01_41 (state: COMPLETE, exit status: 0) WARN [Thread-4] BlockManagerMaster: Error sending message to BlockManagerMaster in 3 attempts akka.pattern.AskTimeoutException: Recipient[Actor[akka://spark/user/BlockManagerMaster#1994513092]] had already been terminated. at akka.pattern.AskableActorRef$.ask$extension(AskSupport.scala:134) at org.apache.spark.storage.BlockManagerMaster.askDriverWithReply(BlockManagerMaster.scala:236) at org.apache.spark.storage.BlockManagerMaster.tell(BlockManagerMaster.scala:216) at org.apache.spark.storage.BlockManagerMaster.stop(BlockManagerMaster.scala:208) at org.apache.spark.SparkEnv.stop(SparkEnv.scala:86) at org.apache.spark.SparkContext.stop(SparkContext.scala:993) at TestWeibo$.main(TestWeibo.scala:46) at TestWeibo.main(TestWeibo.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$1.run(ApplicationMaster.scala:192) INFO [Thread-116] ApplicationMaster: Allocating 1 containers to make up for (potentially) lost containers INFO [Thread-116] YarnAllocationHandler: Will Allocate 1 executor containers, each with 9600 memory 2: in executor log: WARN [Connection manager future execution context-13] BlockManagerMaster: Error sending message to BlockManagerMaster in 1 attempts java.util.concurrent.TimeoutException: Futures timed out after [30 seconds] at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219) at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223) at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107) at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53) at scala.concurrent.Await$.result(package.scala:107) at org.apache.spark.storage.BlockManagerMaster.askDriverWithReply(BlockManagerMaster.scala:237) at org.apache.spark.storage.BlockManagerMaster.sendHeartBeat(BlockManagerMaster.scala:51) at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$heartBeat(BlockManager.scala:113) at org.apache.spark.storage.BlockManager$$anonfun$initialize$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(BlockManager.scala:158) at org.apache.spark.util.Utils$.tryOrExit(Utils.scala:790) at org.apache.spark.storage.BlockManager$$anonfun$initialize$1.apply$mcV$sp(BlockManager.scala:158) at akka.actor.Scheduler$$anon$9.run(Scheduler.scala:80) at akka.actor.LightArrayRevolverScheduler$$anon$3$$anon$2.run(Scheduler.scala:241) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at 
java.lang.Thread.run(Thread.java:662) WARN [Connection manager future execution context-13] BlockManagerMaster: Error sending message to BlockManagerMaster in 2 attempts java.util.concurrent.TimeoutException: Futures timed out after [30 seconds] at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219) at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223) at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107) at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53) at scala.concurrent.Await$.result(package.scala:107) at org.apache.spark.storage.BlockManagerMaster.askDriverWithReply(BlockManagerMaster.scala:237) at org.apache.spark.storage.BlockManagerMaster.sendHeartBeat(BlockManagerMaster.scala:51) at
[jira] [Updated] (SPARK-1667) Jobs never finish successfully once bucket file missing occurred
[ https://issues.apache.org/jira/browse/SPARK-1667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das updated SPARK-1667: - Target Version/s: 1.1.0, 1.0.3 (was: 1.1.0, 1.0.2) Jobs never finish successfully once bucket file missing occurred Key: SPARK-1667 URL: https://issues.apache.org/jira/browse/SPARK-1667 Project: Spark Issue Type: Bug Components: Shuffle Affects Versions: 1.0.0 Reporter: Kousuke Saruta If jobs execute a shuffle, bucket files are created in a temporary directory (named like spark-local-*). When the bucket files go missing, caused by disk failure or any other reason, jobs cannot execute any shuffle that has the same shuffle id as the missing bucket files. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-2558) Mention --queue argument in YARN documentation
[ https://issues.apache.org/jira/browse/SPARK-2558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das updated SPARK-2558: - Target Version/s: 1.1.0, 1.0.3 (was: 1.1.0, 1.0.2) Mention --queue argument in YARN documentation --- Key: SPARK-2558 URL: https://issues.apache.org/jira/browse/SPARK-2558 Project: Spark Issue Type: Documentation Components: YARN Reporter: Matei Zaharia Priority: Trivial Labels: Starter The docs about it went away when we updated the page to spark-submit. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-2576) slave node throws NoClassDefFoundError $line11.$read$ when executing a Spark QL query on HDFS CSV file
[ https://issues.apache.org/jira/browse/SPARK-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das updated SPARK-2576: - Target Version/s: 1.1.0, 1.0.3 (was: 1.1.0, 1.0.2) slave node throws NoClassDefFoundError $line11.$read$ when executing a Spark QL query on HDFS CSV file -- Key: SPARK-2576 URL: https://issues.apache.org/jira/browse/SPARK-2576 Project: Spark Issue Type: Bug Components: Spark Core, SQL Affects Versions: 1.0.1 Environment: One Mesos 0.19 master without zookeeper and 4 mesos slaves. JDK 1.7.51 and Scala 2.10.4 on all nodes. HDFS from CDH5.0.3 Spark version: I tried both with the pre-built CDH5 spark package available from http://spark.apache.org/downloads.html and by packaging spark with sbt 0.13.2, JDK 1.7.51 and scala 2.10.4 as explained here http://mesosphere.io/learn/run-spark-on-mesos/ All nodes are running Debian 3.2.51-1 x86_64 GNU/Linux and have Reporter: Svend Vanderveken Assignee: Yin Huai Priority: Blocker Execution of a SQL query against HDFS systematically throws a class-not-found exception on slave nodes. (This was originally reported on the user list: http://apache-spark-user-list.1001560.n3.nabble.com/spark1-0-1-spark-sql-error-java-lang-NoClassDefFoundError-Could-not-initialize-class-line11-read-tc10135.html) Sample code (run from spark-shell):
{code}
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.createSchemaRDD

case class Car(timestamp: Long, objectid: String, isGreen: Boolean)

// I get the same error when pointing to the folder hdfs://vm28:8020/test/cardata
val data = sc.textFile("hdfs://vm28:8020/test/cardata/part-0")

val cars = data.map(_.split(",")).map(ar => Car(ar(0).toLong, ar(1), ar(2).toBoolean))
cars.registerAsTable("mcars")

val allgreens = sqlContext.sql("SELECT objectid from mcars where isGreen = true")
allgreens.collect.take(10).foreach(println)
{code}
Stack trace on the slave nodes:
{code}
I0716 13:01:16.215158 13631 exec.cpp:131] Version: 0.19.0
I0716 13:01:16.219285 13656 exec.cpp:205] Executor registered on slave 20140714-142853-485682442-5050-25487-2
14/07/16 13:01:16 INFO MesosExecutorBackend: Registered with Mesos as executor ID 20140714-142853-485682442-5050-25487-2
14/07/16 13:01:16 INFO SecurityManager: Changing view acls to: mesos,mnubohadoop
14/07/16 13:01:16 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(mesos, mnubohadoop)
14/07/16 13:01:17 INFO Slf4jLogger: Slf4jLogger started
14/07/16 13:01:17 INFO Remoting: Starting remoting
14/07/16 13:01:17 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://spark@vm23:38230]
14/07/16 13:01:17 INFO Remoting: Remoting now listens on addresses: [akka.tcp://spark@vm23:38230]
14/07/16 13:01:17 INFO SparkEnv: Connecting to MapOutputTracker: akka.tcp://spark@vm28:41632/user/MapOutputTracker
14/07/16 13:01:17 INFO SparkEnv: Connecting to BlockManagerMaster: akka.tcp://spark@vm28:41632/user/BlockManagerMaster
14/07/16 13:01:17 INFO DiskBlockManager: Created local directory at /tmp/spark-local-20140716130117-8ea0
14/07/16 13:01:17 INFO MemoryStore: MemoryStore started with capacity 294.9 MB.
14/07/16 13:01:17 INFO ConnectionManager: Bound socket to port 44501 with id = ConnectionManagerId(vm23-hulk-priv.mtl.mnubo.com,44501)
14/07/16 13:01:17 INFO BlockManagerMaster: Trying to register BlockManager
14/07/16 13:01:17 INFO BlockManagerMaster: Registered BlockManager
14/07/16 13:01:17 INFO HttpFileServer: HTTP File server directory is /tmp/spark-ccf6f36c-2541-4a25-8fe4-bb4ba00ee633
14/07/16 13:01:17 INFO HttpServer: Starting HTTP Server
14/07/16 13:01:18 INFO Executor: Using REPL class URI: http://vm28:33973
14/07/16 13:01:18 INFO Executor: Running task ID 2
14/07/16 13:01:18 INFO HttpBroadcast: Started reading broadcast variable 0
14/07/16 13:01:18 INFO MemoryStore: ensureFreeSpace(125590) called with curMem=0, maxMem=309225062
14/07/16 13:01:18 INFO MemoryStore: Block broadcast_0 stored as values to memory (estimated size 122.6 KB, free 294.8 MB)
14/07/16 13:01:18 INFO HttpBroadcast: Reading broadcast variable 0 took 0.294602722 s
14/07/16 13:01:19 INFO HadoopRDD: Input split: hdfs://vm28:8020/test/cardata/part-0:23960450+23960451
I0716 13:01:19.905113 13657 exec.cpp:378] Executor asked to shutdown
14/07/16 13:01:20 ERROR Executor: Exception in task ID 2
java.lang.NoClassDefFoundError: $line11/$read$
	at $line12.$read$$iwC$$iwC$$iwC$$iwC$$anonfun$2.apply(<console>:19)
	at $line12.$read$$iwC$$iwC$$iwC$$iwC$$anonfun$2.apply(<console>:19)
	at
[jira] [Updated] (SPARK-2425) Standalone Master is too aggressive in removing Applications
[ https://issues.apache.org/jira/browse/SPARK-2425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das updated SPARK-2425: - Target Version/s: 1.0.3 (was: 1.0.2) Standalone Master is too aggressive in removing Applications Key: SPARK-2425 URL: https://issues.apache.org/jira/browse/SPARK-2425 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.0.0 Reporter: Mark Hamstra Assignee: Mark Hamstra When the standalone Executors trying to run a particular Application fail a cumulative ApplicationState.MAX_NUM_RETRY times, the Master will remove the Application. This is true even if a number of Executors are still successfully running the Application. This makes long-running standalone-mode Applications in particular unnecessarily vulnerable to limited failures in the cluster -- e.g., a single bad node on which Executors repeatedly fail for any reason can prevent an Application from starting, or can result in a running Application being removed even though it could continue to run successfully (just not making use of all potential Workers and Executors). -- This message was sent by Atlassian JIRA (v6.2#6252)
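A hedged sketch of the removal policy being described (names modeled on the description above, not the actual standalone Master source): a single cumulative counter per Application trips the removal threshold even while healthy Executors are running.
{code}
// Sketch only -- illustrative names, not the real Master code.
def onExecutorFailure(app: ApplicationInfo): Unit = {
  app.retryCount += 1  // cumulative across all nodes in the cluster
  if (app.retryCount >= ApplicationState.MAX_NUM_RETRY) {
    // Fires even if other Executors are running the Application successfully;
    // a less aggressive policy could first check for any healthy Executor.
    removeApplication(app)
  }
}
{code}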
[jira] [Updated] (SPARK-2541) Standalone mode can't access secure HDFS anymore
[ https://issues.apache.org/jira/browse/SPARK-2541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das updated SPARK-2541: - Target Version/s: 1.1.0, 1.0.3 (was: 1.1.0, 1.0.2) Standalone mode can't access secure HDFS anymore Key: SPARK-2541 URL: https://issues.apache.org/jira/browse/SPARK-2541 Project: Spark Issue Type: Bug Components: Deploy Affects Versions: 1.0.0, 1.0.1 Reporter: Thomas Graves In Spark 0.9.x you could access secure HDFS from a standalone deployment; that no longer works in 1.x. It looks like the issue is in SparkHadoopUtil.runAsSparkUser: previously it wouldn't do the doAs if the currentUser == user. It's not clear how this behaves when the daemons run as a superuser but SPARK_USER is set to someone else. -- This message was sent by Atlassian JIRA (v6.2#6252)
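A minimal sketch of the guard being described, using the standard Hadoop UserGroupInformation API (the surrounding method shape is assumed, not taken from SparkHadoopUtil): skipping doAs when the target user already matches the current user preserves the login user's Kerberos credentials.
{code}
import java.security.PrivilegedExceptionAction
import org.apache.hadoop.security.UserGroupInformation

def runAsSparkUser(func: () => Unit): Unit = {
  val currentUser = UserGroupInformation.getCurrentUser.getShortUserName
  val user = Option(System.getenv("SPARK_USER")).getOrElse(currentUser)
  if (user == currentUser) {
    func()  // keep the current UGI, and with it any Kerberos tickets
  } else {
    UserGroupInformation.createRemoteUser(user).doAs(
      new PrivilegedExceptionAction[Unit] { override def run(): Unit = func() })
  }
}
{code}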
[jira] [Commented] (SPARK-2529) Clean the closure in foreach and foreachPartition
[ https://issues.apache.org/jira/browse/SPARK-2529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073964#comment-14073964 ] Apache Spark commented on SPARK-2529: - User 'rxin' has created a pull request for this issue: https://github.com/apache/spark/pull/1583 Clean the closure in foreach and foreachPartition - Key: SPARK-2529 URL: https://issues.apache.org/jira/browse/SPARK-2529 Project: Spark Issue Type: Bug Reporter: Reynold Xin Somehow we didn't clean the closure for foreach and foreachPartition. Should do that. -- This message was sent by Atlassian JIRA (v6.2#6252)
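For context, a sketch of what such a fix plausibly looks like (simplified; the actual change is in the PR above): pass the closure through SparkContext's closure cleaner before shipping it to executors, as other RDD actions already do.
{code}
// Inside RDD[T] (sketch):
def foreach(f: T => Unit): Unit = {
  val cleanF = sc.clean(f)  // null out unused outer references so f serializes
  sc.runJob(this, (iter: Iterator[T]) => iter.foreach(cleanF))
}
{code}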
[jira] [Updated] (SPARK-2668) Support log4j log to yarn container log directory
[ https://issues.apache.org/jira/browse/SPARK-2668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peng Zhang updated SPARK-2668: -- Affects Version/s: 1.0.0 Support log4j log to yarn container log directory - Key: SPARK-2668 URL: https://issues.apache.org/jira/browse/SPARK-2668 Project: Spark Issue Type: Improvement Components: YARN Affects Versions: 1.0.0 Reporter: Peng Zhang Assign the value of the YARN container log directory to the Java opt spark.yarn.log.dir, so a user-defined log4j.properties can reference this value and write logs to the YARN container directory. Otherwise, a user-defined file appender will log to the CWD, and those files will not be displayed on the YARN UI, nor can they be aggregated to the HDFS log directory after the job finishes. Example reference in a user-defined log4j.properties: {code} log4j.appender.rolling_file.File = ${spark.yarn.log.dir}/spark.log {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-2668) Add variable of yarn log directory for reference from the log4j configuration
[ https://issues.apache.org/jira/browse/SPARK-2668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peng Zhang updated SPARK-2668: -- Description: Assign the value of the YARN container log directory to the Java opt spark.yarn.log.dir, so a user-defined log4j.properties can reference this value and write logs to the YARN container's log directory. Otherwise, a user-defined file appender will only write to the container's CWD, and log files in the CWD will not be displayed on the YARN UI, nor can they be aggregated to the HDFS log directory after the job finishes. Example reference in a user-defined log4j.properties: {code} log4j.appender.rolling_file.File = ${spark.yarn.log.dir}/spark.log {code} was: Assign value of yarn container log directory to java opts spark.yarn.log.dir, So user defined log4j.properties can reference this value and write log to YARN container directory. Otherwise, user defined file appender will only write to container's CWD, and log files in CWD will not be displayed on YARN UI,and either cannot be aggregated to HDFS log directory after job finished. User defined log4j.properties reference example: {code} log4j.appender.rolling_file.File = ${spark.yarn.log.dir}/spark.log {code} Add variable of yarn log directory for reference from the log4j configuration - Key: SPARK-2668 URL: https://issues.apache.org/jira/browse/SPARK-2668 Project: Spark Issue Type: Improvement Components: YARN Affects Versions: 1.0.0 Reporter: Peng Zhang Assign the value of the YARN container log directory to the Java opt spark.yarn.log.dir, so a user-defined log4j.properties can reference this value and write logs to the YARN container's log directory. Otherwise, a user-defined file appender will only write to the container's CWD, and log files in the CWD will not be displayed on the YARN UI, nor can they be aggregated to the HDFS log directory after the job finishes. Example reference in a user-defined log4j.properties: {code} log4j.appender.rolling_file.File = ${spark.yarn.log.dir}/spark.log {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
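For illustration, YARN substitutes the container's log directory into launch commands via the <LOG_DIR> expansion, so the launcher could plausibly forward it as the property named above (the exact wiring is an assumption, not taken from the patch):
{code}
# Appended to the container launch command by the AM/executor launcher (sketch):
-Dspark.yarn.log.dir=<LOG_DIR>
{code}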
[jira] [Commented] (SPARK-2681) With low probability, the Spark inexplicable hang
[ https://issues.apache.org/jira/browse/SPARK-2681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14074031#comment-14074031 ] Patrick Wendell commented on SPARK-2681: Can you do a jstack of the executor when this happens? With low probability, the Spark inexplicable hang - Key: SPARK-2681 URL: https://issues.apache.org/jira/browse/SPARK-2681 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.0.0 Reporter: Guoqiang Li Priority: Blocker executor log : {noformat} 14/07/24 22:56:52 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 53628 14/07/24 22:56:52 INFO executor.Executor: Running task ID 53628 14/07/24 22:56:52 INFO storage.BlockManager: Found block broadcast_3 locally 14/07/24 22:56:52 INFO storage.BlockManager: Found block broadcast_18 locally 14/07/24 22:56:52 INFO storage.BlockManager: Found block broadcast_16 locally 14/07/24 22:56:52 INFO storage.BlockManager: Found block broadcast_19 locally 14/07/24 22:56:52 INFO storage.BlockManager: Found block broadcast_20 locally 14/07/24 22:56:52 INFO storage.BlockManager: Found block broadcast_21 locally 14/07/24 22:56:52 INFO storage.BlockManager: Found block broadcast_22 locally 14/07/24 22:56:52 INFO storage.BlockManager: Found block broadcast_3 locally 14/07/24 22:56:52 INFO storage.BlockManager: Found block broadcast_18 locally 14/07/24 22:56:52 INFO storage.BlockManager: Found block broadcast_16 locally 14/07/24 22:56:52 INFO storage.BlockManager: Found block broadcast_19 locally 14/07/24 22:56:52 INFO storage.BlockManager: Found block broadcast_20 locally 14/07/24 22:56:52 INFO storage.BlockManager: Found block broadcast_21 locally 14/07/24 22:56:52 INFO storage.BlockManager: Found block broadcast_22 locally 14/07/24 22:56:52 INFO spark.MapOutputTrackerWorker: Updating epoch to 236 and clearing cache 14/07/24 22:56:52 INFO spark.CacheManager: Partition rdd_51_83 not found, computing it 14/07/24 22:56:52 INFO spark.MapOutputTrackerWorker: Don't have map outputs for shuffle 9, fetching them 14/07/24 22:56:52 INFO spark.MapOutputTrackerWorker: Doing the fetch; tracker actor = Actor[akka.tcp://spark@tuan202:49488/user/MapOutputTracker#-1031481395] 14/07/24 22:56:53 INFO spark.MapOutputTrackerWorker: Got the output locations 14/07/24 22:56:53 INFO storage.BlockFetcherIterator$BasicBlockFetcherIterator: maxBytesInFlight: 50331648, targetRequestSize: 10066329 14/07/24 22:56:53 INFO storage.BlockFetcherIterator$BasicBlockFetcherIterator: Getting 1024 non-empty blocks out of 1024 blocks 14/07/24 22:56:53 INFO storage.BlockFetcherIterator$BasicBlockFetcherIterator: Started 58 remote fetches in 8 ms 14/07/24 22:56:55 INFO storage.MemoryStore: ensureFreeSpace(28728) called with curMem=920109320, maxMem=4322230272 14/07/24 22:56:55 INFO storage.MemoryStore: Block rdd_51_83 stored as values to memory (estimated size 28.1 KB, free 3.2 GB) 14/07/24 22:56:55 INFO storage.BlockManagerMaster: Updated info of block rdd_51_83 14/07/24 22:56:55 INFO spark.CacheManager: Partition rdd_189_83 not found, computing it 14/07/24 22:56:55 INFO spark.MapOutputTrackerWorker: Don't have map outputs for shuffle 28, fetching them 14/07/24 22:56:55 INFO spark.MapOutputTrackerWorker: Doing the fetch; tracker actor = Actor[akka.tcp://spark@tuan202:49488/user/MapOutputTracker#-1031481395] 14/07/24 22:56:55 INFO spark.MapOutputTrackerWorker: Got the output locations 14/07/24 22:56:55 INFO storage.BlockFetcherIterator$BasicBlockFetcherIterator: maxBytesInFlight: 50331648, targetRequestSize: 10066329 
14/07/24 22:56:55 INFO storage.BlockFetcherIterator$BasicBlockFetcherIterator: Getting 1 non-empty blocks out of 1024 blocks 14/07/24 22:56:55 INFO storage.BlockFetcherIterator$BasicBlockFetcherIterator: Started 1 remote fetches in 0 ms 14/07/24 22:56:55 INFO spark.CacheManager: Partition rdd_50_83 not found, computing it 14/07/24 22:56:55 INFO storage.BlockFetcherIterator$BasicBlockFetcherIterator: maxBytesInFlight: 50331648, targetRequestSize: 10066329 14/07/24 22:56:55 INFO storage.BlockFetcherIterator$BasicBlockFetcherIterator: Getting 1024 non-empty blocks out of 1024 blocks 14/07/24 22:56:55 INFO storage.BlockFetcherIterator$BasicBlockFetcherIterator: Started 58 remote fetches in 4 ms 14/07/24 22:57:09 INFO network.ConnectionManager: Removing ReceivingConnection to ConnectionManagerId(tuan221,51153) 14/07/24 22:57:09 INFO network.ConnectionManager: Removing SendingConnection to ConnectionManagerId(tuan221,51153) 14/07/24 22:57:09 INFO network.ConnectionManager: Removing SendingConnection to ConnectionManagerId(tuan221,51153) 14/07/24 23:05:07 INFO network.ConnectionManager: Key not valid ?
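For reference, the thread dump requested above can be captured with the standard JDK tools on the executor host (the process ID is a placeholder):
{code}
jps -lm                           # find the executor backend's pid
jstack <pid> > executor-stack.txt # snapshot all thread stacks
{code}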
[jira] [Commented] (SPARK-2681) Spark can hang when fetching shuffle blocks
[ https://issues.apache.org/jira/browse/SPARK-2681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14074044#comment-14074044 ] Guoqiang Li commented on SPARK-2681: OK, but have some time. Spark can hang when fetching shuffle blocks --- Key: SPARK-2681 URL: https://issues.apache.org/jira/browse/SPARK-2681 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.0.0 Reporter: Guoqiang Li Priority: Blocker executor log : {noformat} 14/07/24 22:56:52 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 53628 14/07/24 22:56:52 INFO executor.Executor: Running task ID 53628 14/07/24 22:56:52 INFO storage.BlockManager: Found block broadcast_3 locally 14/07/24 22:56:52 INFO storage.BlockManager: Found block broadcast_18 locally 14/07/24 22:56:52 INFO storage.BlockManager: Found block broadcast_16 locally 14/07/24 22:56:52 INFO storage.BlockManager: Found block broadcast_19 locally 14/07/24 22:56:52 INFO storage.BlockManager: Found block broadcast_20 locally 14/07/24 22:56:52 INFO storage.BlockManager: Found block broadcast_21 locally 14/07/24 22:56:52 INFO storage.BlockManager: Found block broadcast_22 locally 14/07/24 22:56:52 INFO storage.BlockManager: Found block broadcast_3 locally 14/07/24 22:56:52 INFO storage.BlockManager: Found block broadcast_18 locally 14/07/24 22:56:52 INFO storage.BlockManager: Found block broadcast_16 locally 14/07/24 22:56:52 INFO storage.BlockManager: Found block broadcast_19 locally 14/07/24 22:56:52 INFO storage.BlockManager: Found block broadcast_20 locally 14/07/24 22:56:52 INFO storage.BlockManager: Found block broadcast_21 locally 14/07/24 22:56:52 INFO storage.BlockManager: Found block broadcast_22 locally 14/07/24 22:56:52 INFO spark.MapOutputTrackerWorker: Updating epoch to 236 and clearing cache 14/07/24 22:56:52 INFO spark.CacheManager: Partition rdd_51_83 not found, computing it 14/07/24 22:56:52 INFO spark.MapOutputTrackerWorker: Don't have map outputs for shuffle 9, fetching them 14/07/24 22:56:52 INFO spark.MapOutputTrackerWorker: Doing the fetch; tracker actor = Actor[akka.tcp://spark@tuan202:49488/user/MapOutputTracker#-1031481395] 14/07/24 22:56:53 INFO spark.MapOutputTrackerWorker: Got the output locations 14/07/24 22:56:53 INFO storage.BlockFetcherIterator$BasicBlockFetcherIterator: maxBytesInFlight: 50331648, targetRequestSize: 10066329 14/07/24 22:56:53 INFO storage.BlockFetcherIterator$BasicBlockFetcherIterator: Getting 1024 non-empty blocks out of 1024 blocks 14/07/24 22:56:53 INFO storage.BlockFetcherIterator$BasicBlockFetcherIterator: Started 58 remote fetches in 8 ms 14/07/24 22:56:55 INFO storage.MemoryStore: ensureFreeSpace(28728) called with curMem=920109320, maxMem=4322230272 14/07/24 22:56:55 INFO storage.MemoryStore: Block rdd_51_83 stored as values to memory (estimated size 28.1 KB, free 3.2 GB) 14/07/24 22:56:55 INFO storage.BlockManagerMaster: Updated info of block rdd_51_83 14/07/24 22:56:55 INFO spark.CacheManager: Partition rdd_189_83 not found, computing it 14/07/24 22:56:55 INFO spark.MapOutputTrackerWorker: Don't have map outputs for shuffle 28, fetching them 14/07/24 22:56:55 INFO spark.MapOutputTrackerWorker: Doing the fetch; tracker actor = Actor[akka.tcp://spark@tuan202:49488/user/MapOutputTracker#-1031481395] 14/07/24 22:56:55 INFO spark.MapOutputTrackerWorker: Got the output locations 14/07/24 22:56:55 INFO storage.BlockFetcherIterator$BasicBlockFetcherIterator: maxBytesInFlight: 50331648, targetRequestSize: 10066329 14/07/24 22:56:55 INFO 
storage.BlockFetcherIterator$BasicBlockFetcherIterator: Getting 1 non-empty blocks out of 1024 blocks 14/07/24 22:56:55 INFO storage.BlockFetcherIterator$BasicBlockFetcherIterator: Started 1 remote fetches in 0 ms 14/07/24 22:56:55 INFO spark.CacheManager: Partition rdd_50_83 not found, computing it 14/07/24 22:56:55 INFO storage.BlockFetcherIterator$BasicBlockFetcherIterator: maxBytesInFlight: 50331648, targetRequestSize: 10066329 14/07/24 22:56:55 INFO storage.BlockFetcherIterator$BasicBlockFetcherIterator: Getting 1024 non-empty blocks out of 1024 blocks 14/07/24 22:56:55 INFO storage.BlockFetcherIterator$BasicBlockFetcherIterator: Started 58 remote fetches in 4 ms 14/07/24 22:57:09 INFO network.ConnectionManager: Removing ReceivingConnection to ConnectionManagerId(tuan221,51153) 14/07/24 22:57:09 INFO network.ConnectionManager: Removing SendingConnection to ConnectionManagerId(tuan221,51153) 14/07/24 22:57:09 INFO network.ConnectionManager: Removing SendingConnection to ConnectionManagerId(tuan221,51153) 14/07/24 23:05:07 INFO network.ConnectionManager: Key not valid ? sun.nio.ch.SelectionKeyImpl@3dcc1da1 14/07/24 23:05:07 INFO
[jira] [Commented] (SPARK-2618) use config spark.scheduler.priority for specifying TaskSet's priority on DAGScheduler
[ https://issues.apache.org/jira/browse/SPARK-2618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14074045#comment-14074045 ] Patrick Wendell commented on SPARK-2618: We shouldn't expose these types of hooks into the scheduler internals. The TaskSet, for instance, is an implementation detail we don't want to be part of a public API, and the priority is an internal concept. The public API of Spark for scheduling policies is the Fair Scheduler. Many different types of policies can be achieved within Fair Scheduling, including having a high-priority pool to which tasks are submitted. use config spark.scheduler.priority for specifying TaskSet's priority on DAGScheduler - Key: SPARK-2618 URL: https://issues.apache.org/jira/browse/SPARK-2618 Project: Spark Issue Type: Improvement Reporter: Lianhui Wang We use Shark server for interactive queries; every SQL statement runs as a job. Sometimes we want a query submitted later to the Shark server to run immediately, so we need to let users define a job's priority and ensure that high-priority jobs are launched first. I have created a pull request: https://github.com/apache/spark/pull/1528 -- This message was sent by Atlassian JIRA (v6.2#6252)
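As a concrete illustration of the pool-based approach described above, a job can be routed to a dedicated pool via a thread-local property; the pool name is a placeholder, its weight would be declared in the file referenced by spark.scheduler.allocation.file, and urgentQuery stands in for any RDD.
{code}
// Assumes spark.scheduler.mode=FAIR and a "high-priority" pool declared in
// fairscheduler.xml with a large weight.
sc.setLocalProperty("spark.scheduler.pool", "high-priority")
val result = urgentQuery.collect()                 // runs in that pool
sc.setLocalProperty("spark.scheduler.pool", null)  // revert to the default pool
{code}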
[jira] [Comment Edited] (SPARK-2681) Spark can hang when fetching shuffle blocks
[ https://issues.apache.org/jira/browse/SPARK-2681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14074044#comment-14074044 ] Guoqiang Li edited comment on SPARK-2681 at 7/25/14 4:39 AM: - OK, but may take some time. was (Author: gq): OK, but have some time. Spark can hang when fetching shuffle blocks --- Key: SPARK-2681 URL: https://issues.apache.org/jira/browse/SPARK-2681 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.0.0 Reporter: Guoqiang Li Priority: Blocker executor log : {noformat} 14/07/24 22:56:52 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 53628 14/07/24 22:56:52 INFO executor.Executor: Running task ID 53628 14/07/24 22:56:52 INFO storage.BlockManager: Found block broadcast_3 locally 14/07/24 22:56:52 INFO storage.BlockManager: Found block broadcast_18 locally 14/07/24 22:56:52 INFO storage.BlockManager: Found block broadcast_16 locally 14/07/24 22:56:52 INFO storage.BlockManager: Found block broadcast_19 locally 14/07/24 22:56:52 INFO storage.BlockManager: Found block broadcast_20 locally 14/07/24 22:56:52 INFO storage.BlockManager: Found block broadcast_21 locally 14/07/24 22:56:52 INFO storage.BlockManager: Found block broadcast_22 locally 14/07/24 22:56:52 INFO storage.BlockManager: Found block broadcast_3 locally 14/07/24 22:56:52 INFO storage.BlockManager: Found block broadcast_18 locally 14/07/24 22:56:52 INFO storage.BlockManager: Found block broadcast_16 locally 14/07/24 22:56:52 INFO storage.BlockManager: Found block broadcast_19 locally 14/07/24 22:56:52 INFO storage.BlockManager: Found block broadcast_20 locally 14/07/24 22:56:52 INFO storage.BlockManager: Found block broadcast_21 locally 14/07/24 22:56:52 INFO storage.BlockManager: Found block broadcast_22 locally 14/07/24 22:56:52 INFO spark.MapOutputTrackerWorker: Updating epoch to 236 and clearing cache 14/07/24 22:56:52 INFO spark.CacheManager: Partition rdd_51_83 not found, computing it 14/07/24 22:56:52 INFO spark.MapOutputTrackerWorker: Don't have map outputs for shuffle 9, fetching them 14/07/24 22:56:52 INFO spark.MapOutputTrackerWorker: Doing the fetch; tracker actor = Actor[akka.tcp://spark@tuan202:49488/user/MapOutputTracker#-1031481395] 14/07/24 22:56:53 INFO spark.MapOutputTrackerWorker: Got the output locations 14/07/24 22:56:53 INFO storage.BlockFetcherIterator$BasicBlockFetcherIterator: maxBytesInFlight: 50331648, targetRequestSize: 10066329 14/07/24 22:56:53 INFO storage.BlockFetcherIterator$BasicBlockFetcherIterator: Getting 1024 non-empty blocks out of 1024 blocks 14/07/24 22:56:53 INFO storage.BlockFetcherIterator$BasicBlockFetcherIterator: Started 58 remote fetches in 8 ms 14/07/24 22:56:55 INFO storage.MemoryStore: ensureFreeSpace(28728) called with curMem=920109320, maxMem=4322230272 14/07/24 22:56:55 INFO storage.MemoryStore: Block rdd_51_83 stored as values to memory (estimated size 28.1 KB, free 3.2 GB) 14/07/24 22:56:55 INFO storage.BlockManagerMaster: Updated info of block rdd_51_83 14/07/24 22:56:55 INFO spark.CacheManager: Partition rdd_189_83 not found, computing it 14/07/24 22:56:55 INFO spark.MapOutputTrackerWorker: Don't have map outputs for shuffle 28, fetching them 14/07/24 22:56:55 INFO spark.MapOutputTrackerWorker: Doing the fetch; tracker actor = Actor[akka.tcp://spark@tuan202:49488/user/MapOutputTracker#-1031481395] 14/07/24 22:56:55 INFO spark.MapOutputTrackerWorker: Got the output locations 14/07/24 22:56:55 INFO storage.BlockFetcherIterator$BasicBlockFetcherIterator: maxBytesInFlight: 
50331648, targetRequestSize: 10066329 14/07/24 22:56:55 INFO storage.BlockFetcherIterator$BasicBlockFetcherIterator: Getting 1 non-empty blocks out of 1024 blocks 14/07/24 22:56:55 INFO storage.BlockFetcherIterator$BasicBlockFetcherIterator: Started 1 remote fetches in 0 ms 14/07/24 22:56:55 INFO spark.CacheManager: Partition rdd_50_83 not found, computing it 14/07/24 22:56:55 INFO storage.BlockFetcherIterator$BasicBlockFetcherIterator: maxBytesInFlight: 50331648, targetRequestSize: 10066329 14/07/24 22:56:55 INFO storage.BlockFetcherIterator$BasicBlockFetcherIterator: Getting 1024 non-empty blocks out of 1024 blocks 14/07/24 22:56:55 INFO storage.BlockFetcherIterator$BasicBlockFetcherIterator: Started 58 remote fetches in 4 ms 14/07/24 22:57:09 INFO network.ConnectionManager: Removing ReceivingConnection to ConnectionManagerId(tuan221,51153) 14/07/24 22:57:09 INFO network.ConnectionManager: Removing SendingConnection to ConnectionManagerId(tuan221,51153) 14/07/24 22:57:09 INFO network.ConnectionManager: Removing SendingConnection to ConnectionManagerId(tuan221,51153) 14/07/24 23:05:07 INFO
[jira] [Updated] (SPARK-2670) FetchFailedException should be thrown when local fetch has failed
[ https://issues.apache.org/jira/browse/SPARK-2670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-2670: --- Priority: Critical (was: Major) FetchFailedException should be thrown when local fetch has failed - Key: SPARK-2670 URL: https://issues.apache.org/jira/browse/SPARK-2670 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.0.0 Reporter: Kousuke Saruta Priority: Critical In BasicBlockFetcherIterator, when a remote fetch has failed, a FetchResult whose size is -1 is put into results:
{code}
case None => {
  logError("Could not get block(s) from " + cmId)
  for ((blockId, size) <- req.blocks) {
    results.put(new FetchResult(blockId, -1, null))
  }
}
{code}
The size -1 means the fetch failed, and BlockStoreShuffleFetcher#unpackBlock throws a FetchFailedException so that we can retry. But when a local fetch has failed, the failed FetchResult is not set, so we cannot retry for that FetchResult. -- This message was sent by Atlassian JIRA (v6.2#6252)
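A hedged sketch of the fix this implies (variable and method names approximate): mirror the remote path by catching local read failures and enqueueing a size -1 FetchResult, so that unpackBlock raises FetchFailedException and the task can be retried.
{code}
// Sketch of the local fetch loop with the same failure signalling as remote:
for (id <- localBlocksToFetch) {
  try {
    val iter = getLocalFromDisk(id, serializer).get
    results.put(new FetchResult(id, 0, () => iter))
  } catch {
    case e: Exception =>
      logError("Error occurred while fetching local blocks", e)
      results.put(new FetchResult(id, -1, null))  // -1 marks the fetch as failed
  }
}
{code}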
[jira] [Updated] (SPARK-2670) FetchFailedException should be thrown when local fetch has failed
[ https://issues.apache.org/jira/browse/SPARK-2670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-2670: --- Component/s: Spark Core FetchFailedException should be thrown when local fetch has failed - Key: SPARK-2670 URL: https://issues.apache.org/jira/browse/SPARK-2670 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.0.0 Reporter: Kousuke Saruta Priority: Critical In BasicBlockFetcherIterator, when a remote fetch has failed, a FetchResult whose size is -1 is put into results:
{code}
case None => {
  logError("Could not get block(s) from " + cmId)
  for ((blockId, size) <- req.blocks) {
    results.put(new FetchResult(blockId, -1, null))
  }
}
{code}
The size -1 means the fetch failed, and BlockStoreShuffleFetcher#unpackBlock throws a FetchFailedException so that we can retry. But when a local fetch has failed, the failed FetchResult is not set, so we cannot retry for that FetchResult. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-2670) FetchFailedException should be thrown when local fetch has failed
[ https://issues.apache.org/jira/browse/SPARK-2670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-2670: --- Target Version/s: 1.1.0 FetchFailedException should be thrown when local fetch has failed - Key: SPARK-2670 URL: https://issues.apache.org/jira/browse/SPARK-2670 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.0.0 Reporter: Kousuke Saruta In BasicBlockFetcherIterator, when a remote fetch has failed, a FetchResult whose size is -1 is put into results:
{code}
case None => {
  logError("Could not get block(s) from " + cmId)
  for ((blockId, size) <- req.blocks) {
    results.put(new FetchResult(blockId, -1, null))
  }
}
{code}
The size -1 means the fetch failed, and BlockStoreShuffleFetcher#unpackBlock throws a FetchFailedException so that we can retry. But when a local fetch has failed, the failed FetchResult is not set, so we cannot retry for that FetchResult. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (SPARK-2682) Javadoc generated from Scala source code is not in javadoc's index
Yin Huai created SPARK-2682: --- Summary: Javadoc generated from Scala source code is not in javadoc's index Key: SPARK-2682 URL: https://issues.apache.org/jira/browse/SPARK-2682 Project: Spark Issue Type: Bug Reporter: Yin Huai It seems genjavadocSettings was deleted from SparkBuild. We need to add it back so that unidoc generates Javadoc for our Scala code. -- This message was sent by Atlassian JIRA (v6.2#6252)
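For reference, a sketch of the sbt settings to restore (plugin version assumed; this mirrors the usual genjavadoc setup rather than the exact deleted code):
{code}
lazy val genjavadocSettings: Seq[sbt.Def.Setting[_]] = Seq(
  libraryDependencies += compilerPlugin(
    "com.typesafe.genjavadoc" %% "genjavadoc-plugin" % "0.8" cross CrossVersion.full),
  // Emit Java stubs for Scala sources so unidoc can index them:
  scalacOptions <+= target.map(t => "-P:genjavadoc:out=" + (t / "java"))
)
{code}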
[jira] [Updated] (SPARK-2682) Javadoc generated from Scala source code is not in javadoc's index
[ https://issues.apache.org/jira/browse/SPARK-2682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai updated SPARK-2682: Component/s: Documentation Javadoc generated from Scala source code is not in javadoc's index -- Key: SPARK-2682 URL: https://issues.apache.org/jira/browse/SPARK-2682 Project: Spark Issue Type: Bug Components: Documentation Reporter: Yin Huai Assignee: Yin Huai It seems genjavadocSettings was deleted from SparkBuild. We need to add it back so that unidoc generates Javadoc for our Scala code. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2682) Javadoc generated from Scala source code is not in javadoc's index
[ https://issues.apache.org/jira/browse/SPARK-2682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14074109#comment-14074109 ] Apache Spark commented on SPARK-2682: - User 'yhuai' has created a pull request for this issue: https://github.com/apache/spark/pull/1584 Javadoc generated from Scala source code is not in javadoc's index -- Key: SPARK-2682 URL: https://issues.apache.org/jira/browse/SPARK-2682 Project: Spark Issue Type: Bug Components: Documentation Reporter: Yin Huai Assignee: Yin Huai It seems genjavadocSettings was deleted from SparkBuild. We need to add it back so that unidoc generates Javadoc for our Scala code. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2664) Deal with `--conf` options in spark-submit that relate to flags
[ https://issues.apache.org/jira/browse/SPARK-2664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14074108#comment-14074108 ] Patrick Wendell commented on SPARK-2664: Hey Sandy, The reason we originally allowed spark-defaults.conf to support Spark options that have corresponding flags (which is the case here) is that there was no other way for users to set things like the master in a configuration file. This seems like something we need to support (at least, IIRC it was one reason some packagers wanted a configuration file in the first place). So my proposal here was to just treat --conf the same way. I'd also be okay with just throwing an exception if users try to set one of these via --conf when it corresponds to a flag, but then it deviates a bit from the behavior of the config file, which could be confusing. As for adding new configs for flags that don't currently have a corresponding conf: personally, I'm open to doing that too. It would certainly simplify the fact that we have two different concepts at the moment for which there is not a 1:1 mapping. In my experience users have been most confused by the fact that those flags and Spark conf properties are partially-but-not-completely overlapping. They haven't been confused about precedence order as much, since we state it clearly. In the shorter term, given that we can't revert the behavior of spark-defaults.conf, I'd prefer to either use that same behavior for flags (the original proposal) or to throw an exception if any of the reserved properties are set via the --conf flag instead of their dedicated flag (we can always make it more liberal later). Otherwise it's super confusing what happens if the user sets both --conf spark.master and --master. Deal with `--conf` options in spark-submit that relate to flags --- Key: SPARK-2664 URL: https://issues.apache.org/jira/browse/SPARK-2664 Project: Spark Issue Type: Bug Reporter: Patrick Wendell Assignee: Sandy Ryza Priority: Blocker If someone sets a Spark conf that relates to an existing flag such as `--master`, we should set it correctly, like we do with the defaults file. Otherwise it can have confusing semantics. I noticed this after merging it, otherwise I would have mentioned it in the review. I think it's as simple as modifying loadDefaults to check the user-supplied options also. We might rename it to loadUserProperties since it's no longer just the defaults file. -- This message was sent by Atlassian JIRA (v6.2#6252)
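A hedged sketch of the direction proposed above (class and method names hypothetical, not the actual SparkSubmitArguments code): fold --conf pairs into the same resolution step as spark-defaults.conf, so a reserved property fills its flag only when the flag was not given explicitly.
{code}
// Sketch only -- illustrative names.
class SubmitArgs(var master: String, defaultsFile: Map[String, String],
                 confFlags: Map[String, String]) {
  def mergeSparkProperties(): Unit = {
    // --conf entries override the defaults file; an explicit --master
    // (master != null) overrides both.
    val combined = defaultsFile ++ confFlags
    if (master == null) master = combined.getOrElse("spark.master", null)
    // ...repeat for the other reserved properties with dedicated flags.
  }
}
{code}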
[jira] [Resolved] (SPARK-2538) External aggregation in Python
[ https://issues.apache.org/jira/browse/SPARK-2538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia resolved SPARK-2538. -- Resolution: Fixed Fix Version/s: (was: 1.0.1) (was: 1.0.0) 1.1.0 External aggregation in Python -- Key: SPARK-2538 URL: https://issues.apache.org/jira/browse/SPARK-2538 Project: Spark Issue Type: Improvement Components: PySpark Affects Versions: 1.0.0, 1.0.1 Reporter: Davies Liu Assignee: Davies Liu Priority: Critical Labels: pyspark Fix For: 1.1.0 Original Estimate: 72h Remaining Estimate: 72h For huge reduce tasks, users will get an out-of-memory exception when all the data cannot fit in memory. PySpark should spill some of the data to disk and then merge it back together, just like what we do in Scala. -- This message was sent by Atlassian JIRA (v6.2#6252)
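For context, the Scala-side mechanism alluded to is a spillable combiner map. ExternalAppendOnlyMap is internal (private[spark]), so the snippet below only illustrates its semantics, not a supported public API:
{code}
import org.apache.spark.util.collection.ExternalAppendOnlyMap

// Word-count-style combine: spills sorted segments to disk when the map
// outgrows its memory threshold, then merge-sorts them on iteration.
val map = new ExternalAppendOnlyMap[String, Int, Int](
  createCombiner = (v: Int) => v,
  mergeValue = (c: Int, v: Int) => c + v,
  mergeCombiners = (c1: Int, c2: Int) => c1 + c2)
map.insertAll(Iterator(("a", 1), ("b", 2), ("a", 3)))
val merged = map.iterator  // yields ("a", 4) and ("b", 2), in some key order
{code}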