[jira] [Commented] (SPARK-3324) YARN module has nonstandard structure which causes compile errors in IntelliJ
[ https://issues.apache.org/jira/browse/SPARK-3324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116667#comment-14116667 ] Sean Owen commented on SPARK-3324: -- Let me try to sketch what's funky about the structure. We have yarn/alpha, yarn/common, and yarn/stable. Given their purpose, I would expect each to be a module, each with its own src/ directory, with alpha and stable depending on common, and the Spark parent activating either yarn/alpha or yarn/stable depending on profiles. IntelliJ is fine with that. However, what we actually have is that yarn/ is a module, but its source is in yarn/common, and it's a pom-only module. yarn/alpha and yarn/stable list it as their parent and inherit all of their source-directory info and dependencies from yarn/, which is not itself a code module. So each compiles two source directories defined in different places. This, plus profiles, confused IntelliJ and required manual intervention. Maybe I'm overlooking a reason this had to be done, but rejiggering this as three simple modules should work. Again, I imagine the question is whether it's worth it versus removing yarn/alpha at some point in the future, because it's trivial to fix how IntelliJ reads the POMs once by hand in the IDE.
> YARN module has nonstandard structure which causes compile errors in IntelliJ > --- > > Key: SPARK-3324 > URL: https://issues.apache.org/jira/browse/SPARK-3324 > Project: Spark > Issue Type: Bug > Components: YARN >Affects Versions: 1.1.0 > Environment: Mac OS: 10.9.4 > IntelliJ IDEA: 13.1.4 > Scala Plugins: 0.41.2 > Maven: 3.0.5 >Reporter: Yi Tian >Priority: Minor > Labels: intellij, maven, yarn > > The YARN module has a nonstandard path structure like: > {code} > ${SPARK_HOME} > |--yarn > |--alpha (contains yarn api support for 0.23 and 2.0.x) > |--stable (contains yarn api support for 2.2 and later) > | |--pom.xml (spark-yarn) > |--common (Common code not depending on a specific version of Hadoop) > |--pom.xml (yarn-parent) > {code} > When we use Maven to compile the yarn module, Maven imports the 'alpha' or > 'stable' module according to the profile setting. > A submodule like 'stable' uses the build properties defined in > yarn/pom.xml to add the common code to its source path. > As a result, IntelliJ can't directly recognize the sources in the common directory > as a source path. > I think we should change the yarn module to a unified Maven jar project > and select the different versions of the YARN API via Maven profile settings. > That would resolve the compile error in IntelliJ and make the yarn module > simpler and clearer. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
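The three-sibling-module layout Sean describes could be expressed as a plain aggregator POM whose profiles pick the right submodule. This is a hypothetical sketch, not the actual Spark build file; the profile ids are assumptions:

```xml
<!-- Hypothetical sketch of yarn/pom.xml as a plain aggregator: three real
     modules with their own src/ trees, profiles choosing alpha or stable.
     Not the actual Spark POM; profile ids are illustrative. -->
<profiles>
  <profile>
    <id>yarn-alpha</id>
    <modules>
      <module>common</module>
      <module>alpha</module>
    </modules>
  </profile>
  <profile>
    <id>yarn</id>
    <modules>
      <module>common</module>
      <module>stable</module>
    </modules>
  </profile>
</profiles>
```

With this shape, alpha and stable would each declare an ordinary dependency on the common artifact, and IntelliJ can import the layout without any manual intervention.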
[jira] [Created] (SPARK-3328) --with-tachyon build is broken
Elijah Epifanov created SPARK-3328: -- Summary: --with-tachyon build is broken Key: SPARK-3328 URL: https://issues.apache.org/jira/browse/SPARK-3328 Project: Spark Issue Type: Bug Components: Build Affects Versions: 1.1.0 Reporter: Elijah Epifanov cp: tachyon-0.5.0/core/src/main/java/tachyon/web/resources: No such file or directory
[jira] [Updated] (SPARK-3328) --with-tachyon build is broken
[ https://issues.apache.org/jira/browse/SPARK-3328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elijah Epifanov updated SPARK-3328: --- Description: cp: tachyon-0.5.0/target/tachyon-0.5.0-jar-with-dependencies.jar: No such file or directory was: cp: tachyon-0.5.0/core/src/main/java/tachyon/web/resources: No such file or directory
[jira] [Commented] (SPARK-3324) YARN module has nonstandard structure which causes compile errors in IntelliJ
[ https://issues.apache.org/jira/browse/SPARK-3324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116782#comment-14116782 ] Yi Tian commented on SPARK-3324: Thanks [~srowen] for explaining that. My initial idea was to remove the pom.xml files from yarn/stable and yarn/alpha and change the packaging property in yarn/pom.xml from pom to jar. When building this module, the pom.xml would dynamically add common/src and either yarn/alpha/src or yarn/stable/src to the source path.
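Yi Tian's single-jar proposal maps onto a well-known Maven pattern: one module, with profiles attaching the right source roots through the build-helper-maven-plugin's add-source goal. A hypothetical sketch (not the actual Spark POM; the profile id and paths are assumptions):

```xml
<!-- Hypothetical profile in a single-jar yarn/pom.xml (packaging: jar).
     A matching yarn-alpha profile would add alpha/src instead.
     Illustrative only; not the actual Spark build file. -->
<profile>
  <id>yarn-stable</id>
  <build>
    <plugins>
      <plugin>
        <groupId>org.codehaus.mojo</groupId>
        <artifactId>build-helper-maven-plugin</artifactId>
        <executions>
          <execution>
            <id>add-yarn-sources</id>
            <phase>generate-sources</phase>
            <goals><goal>add-source</goal></goals>
            <configuration>
              <sources>
                <source>common/src/main/scala</source>
                <source>stable/src/main/scala</source>
              </sources>
            </configuration>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</profile>
```

Because add-source registers the extra roots with the Maven model itself, IDEs that import the POM (rather than scraping directory conventions) would see both source trees.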
[jira] [Commented] (SPARK-3324) YARN module has nonstandard structure which causes compile errors in IntelliJ
[ https://issues.apache.org/jira/browse/SPARK-3324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116785#comment-14116785 ] Yi Tian commented on SPARK-3324: BTW, [~pwendell], there is another problem for IntelliJ. The spark-streaming-flume-sink module needs the Avro plugin to compile some Avro files, but the outputDirectory for the generated Scala sources is under the target path, which means IntelliJ can't recognize them and throws errors while compiling the Spark project. May I make a PR to fix these problems?
[jira] [Commented] (SPARK-3324) YARN module has nonstandard structure which causes compile errors in IntelliJ
[ https://issues.apache.org/jira/browse/SPARK-3324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116801#comment-14116801 ] Sean Owen commented on SPARK-3324: -- [~tianyi] I seem to remember having a similar problem. I think that is straightforward to fix. It's a separate issue. But FWIW I would like to see that improved.
[jira] [Created] (SPARK-3329) HiveQuerySuite SET tests depend on map orderings
William Benton created SPARK-3329: - Summary: HiveQuerySuite SET tests depend on map orderings Key: SPARK-3329 URL: https://issues.apache.org/jira/browse/SPARK-3329 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.0.2, 1.1.0 Reporter: William Benton Priority: Trivial The SET tests in HiveQuerySuite that return multiple values depend on the ordering in which map pairs are returned from Hive and can fail spuriously if this changes due to environment or library changes.
[jira] [Commented] (SPARK-3329) HiveQuerySuite SET tests depend on map orderings
[ https://issues.apache.org/jira/browse/SPARK-3329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116821#comment-14116821 ] Apache Spark commented on SPARK-3329: - User 'willb' has created a pull request for this issue: https://github.com/apache/spark/pull/2220
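The general shape of the fix such a flaky assertion needs (illustrative only; this is not the actual HiveQuerySuite patch, which lives in the PR above) is to compare multi-line SET output as an unordered collection of key=value pairs rather than as an ordered sequence:

```python
# Illustrative sketch, not the actual patch: comparing multi-value "SET"
# output without depending on the map iteration order Hive happens to use.
def set_output_matches(expected, actual):
    """True if both outputs contain the same key=value lines, in any order."""
    return sorted(expected) == sorted(actual)

# Two different Hive orderings of the same configuration compare equal.
run_a = ["spark.sql.shuffle.partitions=10", "spark.app.name=test"]
run_b = ["spark.app.name=test", "spark.sql.shuffle.partitions=10"]
print(set_output_matches(run_a, run_b))  # True
```

The test then stays stable across environment and library changes that merely reshuffle the map.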
[jira] [Created] (SPARK-3330) Successive test runs with different profiles fail SparkSubmitSuite
Sean Owen created SPARK-3330: Summary: Successive test runs with different profiles fail SparkSubmitSuite Key: SPARK-3330 URL: https://issues.apache.org/jira/browse/SPARK-3330 Project: Spark Issue Type: Bug Components: Build Affects Versions: 1.0.2 Reporter: Sean Owen Maven-based Jenkins builds have been failing for a while: https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-Master-Maven-with-YARN/480/HADOOP_PROFILE=hadoop-2.4,label=centos/console One common cause is that on the second and subsequent runs of "mvn clean test", at least two assembly JARs will exist in assembly/target. Because assembly is not a submodule of parent, "mvn clean" is not invoked for assembly. The presence of two assembly jars causes spark-submit to fail.
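The failure mode is easy to reproduce locally: spark-submit locates the assembly by globbing under assembly/target, and a stale jar left over from a previous profile's build makes that glob ambiguous. A sketch with hypothetical jar names:

```python
# Sketch of the failure mode (hypothetical jar names): a leftover assembly
# from an earlier profile makes the spark-assembly-* glob ambiguous.
import glob
import os
import tempfile

target = tempfile.mkdtemp()  # stands in for assembly/target
for name in ("spark-assembly-1.1.0-SNAPSHOT-hadoop2.4.0.jar",   # current build
             "spark-assembly-1.1.0-SNAPSHOT-hadoop1.0.4.jar"):  # stale build
    open(os.path.join(target, name), "w").close()

jars = glob.glob(os.path.join(target, "spark-assembly-*.jar"))
print(len(jars))  # 2 -- more than one match, so the jar lookup fails
```

Since "mvn clean" never touches assembly/target (assembly not being a submodule of the parent), only deleting the directory by hand, or making clean reach it, breaks the cycle.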
[jira] [Created] (SPARK-3331) PEP8 tests fail in release because they check unzipped py4j code
Sean Owen created SPARK-3331: Summary: PEP8 tests fail in release because they check unzipped py4j code Key: SPARK-3331 URL: https://issues.apache.org/jira/browse/SPARK-3331 Project: Spark Issue Type: Bug Components: Build Affects Versions: 1.0.2 Reporter: Sean Owen Priority: Minor PEP8 tests run on files under "./python", but in the release packaging, py4j code is present in "./python/build/py4j". The Py4J code fails style checks, so the release now fails ./dev/run-tests.
[jira] [Updated] (SPARK-3331) PEP8 tests fail because they check unzipped py4j code
[ https://issues.apache.org/jira/browse/SPARK-3331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-3331: - Summary: PEP8 tests fail because they check unzipped py4j code (was: PEP8 tests fail in release because they check unzipped py4j code)
[jira] [Updated] (SPARK-3331) PEP8 tests fail because they check unzipped py4j code
[ https://issues.apache.org/jira/browse/SPARK-3331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-3331: - Description: PEP8 tests run on files under "./python", but unzipped py4j code is found at "./python/build/py4j". Py4J code fails style checks and can fail ./dev/run-tests if this code is present locally. (was: PEP8 tests run on files under "./python", but in the release packaging, py4j code is present in "./python/build/py4j". Py4J code fails style checks and thus release fails ./dev/run-tests now.)
[jira] [Commented] (SPARK-3330) Successive test runs with different profiles fail SparkSubmitSuite
[ https://issues.apache.org/jira/browse/SPARK-3330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116829#comment-14116829 ] Apache Spark commented on SPARK-3330: - User 'srowen' has created a pull request for this issue: https://github.com/apache/spark/pull/2221
[jira] [Commented] (SPARK-3331) PEP8 tests fail because they check unzipped py4j code
[ https://issues.apache.org/jira/browse/SPARK-3331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116830#comment-14116830 ] Apache Spark commented on SPARK-3331: - User 'srowen' has created a pull request for this issue: https://github.com/apache/spark/pull/
[jira] [Commented] (SPARK-2870) Thorough schema inference directly on RDDs of Python dictionaries
[ https://issues.apache.org/jira/browse/SPARK-2870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116868#comment-14116868 ] Nicholas Chammas commented on SPARK-2870: - [~marmbrus], [~davies], [~yhuai] - We discussed this feature on the user list some weeks back. Just pinging you here to make sure this feature request is on your radar. > Thorough schema inference directly on RDDs of Python dictionaries > - > > Key: SPARK-2870 > URL: https://issues.apache.org/jira/browse/SPARK-2870 > Project: Spark > Issue Type: Improvement > Components: PySpark, SQL >Reporter: Nicholas Chammas > > h4. Background > I love the {{SQLContext.jsonRDD()}} and {{SQLContext.jsonFile()}} methods. > They process JSON text directly and infer a schema that covers the entire > source data set. > This is very important with semi-structured data like JSON since individual > elements in the data set are free to have different structures. Matching > fields across elements may even have different value types. > For example: > {code} > {"a": 5} > {"a": "cow"} > {code} > To get a queryable schema that covers the whole data set, you need to infer a > schema by looking at the whole data set. The aforementioned > {{SQLContext.json...()}} methods do this very well. > h4. Feature Request > What we need is for {{SQLContext.inferSchema()}} to do this, too. > Alternatively, we need a new {{SQLContext}} method that works on RDDs of > Python dictionaries and does something functionally equivalent to this: > {code} > SQLContext.jsonRDD(RDD[dict].map(lambda x: json.dumps(x))) > {code} > As of 1.0.2, > [{{inferSchema()}}|http://spark.apache.org/docs/latest/api/python/pyspark.sql.SQLContext-class.html#inferSchema] > just looks at the first element in the data set. This won't help much when > the structure of the elements in the target RDD is variable. > h4. Example Use Case > * You have some JSON text data that you want to analyze using Spark SQL.
> * You would use one of the {{SQLContext.json...()}} methods, but you need to > do some filtering on the data first to remove bad elements--basically, some > minimal schema validation. > * You deserialize the JSON objects to Python {{dict}}s and filter out the > bad ones. You now have an RDD of dictionaries. > * From this RDD, you want a SchemaRDD that captures the schema for the whole > data set.
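The "thorough" inference requested here amounts to folding a schema over every element instead of sampling the first one. A toy local sketch of the idea (plain Python, not PySpark's implementation; the widening-to-string rule mimics what jsonRDD does with conflicting types):

```python
# Toy illustration of whole-dataset schema inference over dicts (not the
# PySpark implementation): union all keys seen anywhere in the data set,
# and widen a field whose value types conflict to a generic "string".
def infer_schema(records):
    schema = {}
    for rec in records:
        for key, value in rec.items():
            t = type(value).__name__
            if key not in schema:
                schema[key] = t
            elif schema[key] != t:
                schema[key] = "string"  # conflicting types widen
    return schema

data = [{"a": 5}, {"a": "cow", "b": 1.5}]
print(infer_schema(data))  # {'a': 'string', 'b': 'float'}
```

Note that a single-element sample ({"a": 5} alone) would have produced {'a': 'int'} and missed 'b' entirely, which is exactly the failure mode the ticket describes for the 1.0.2 inferSchema().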
[jira] [Created] (SPARK-3332) Tags shouldn't be the main strategy for machine membership on clusters
Allan Douglas R. de Oliveira created SPARK-3332: --- Summary: Tags shouldn't be the main strategy for machine membership on clusters Key: SPARK-3332 URL: https://issues.apache.org/jira/browse/SPARK-3332 Project: Spark Issue Type: Bug Components: EC2 Reporter: Allan Douglas R. de Oliveira The implementation for SPARK-2333 changed the machine membership mechanism from security groups to tags. This is a fundamentally flawed strategy, as there is no guarantee at all that the machines will have a tag (even with a retry mechanism). For instance, if the script is killed after launching the instances but before setting the tags, the machines will be "invisible" to a destroy command, leaving an unmanageable cluster behind. The initial proposal is to go back to the previous behavior in all cases except when the new flag (--security-group-prefix) is used. It's also worth mentioning that SPARK-3180 introduced the --additional-security-group flag, which is a reasonable solution to SPARK-2333 (but isn't a full replacement for all use cases of --security-group-prefix).
[jira] [Commented] (SPARK-3332) Tags shouldn't be the main strategy for machine membership on clusters
[ https://issues.apache.org/jira/browse/SPARK-3332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116890#comment-14116890 ] Apache Spark commented on SPARK-3332: - User 'douglaz' has created a pull request for this issue: https://github.com/apache/spark/pull/2223
[jira] [Updated] (SPARK-3010) fix redundant conditional
[ https://issues.apache.org/jira/browse/SPARK-3010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-3010: - Assignee: wangfei > fix redundant conditional > - > > Key: SPARK-3010 > URL: https://issues.apache.org/jira/browse/SPARK-3010 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 1.0.2 >Reporter: wangfei >Assignee: wangfei > Fix For: 1.1.0 > > > There are some redundant conditionals in Spark, such as: > 1. > private[spark] def codegenEnabled: Boolean = > if (getConf(CODEGEN_ENABLED, "false") == "true") true else false > 2. > x => if (x == 2) true else false > ... etc
[jira] [Resolved] (SPARK-3010) fix redundant conditional
[ https://issues.apache.org/jira/browse/SPARK-3010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia resolved SPARK-3010. -- Resolution: Fixed Fix Version/s: (was: 1.1.0) 1.2.0 Target Version/s: 1.2.0 (was: 1.1.0)
[jira] [Updated] (SPARK-3010) fix redundant conditional
[ https://issues.apache.org/jira/browse/SPARK-3010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-3010: - Priority: Trivial (was: Major)
[jira] [Created] (SPARK-3333) Large number of partitions causes OOM
Nicholas Chammas created SPARK-3333: --- Summary: Large number of partitions causes OOM Key: SPARK-3333 URL: https://issues.apache.org/jira/browse/SPARK-3333 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 1.1.0 Environment: EC2; 1 master; 1 slave; {{m3.xlarge}} instances Reporter: Nicholas Chammas Here’s a repro for PySpark: {code} a = sc.parallelize(["Nick", "John", "Bob"]) a = a.repartition(24000) a.keyBy(lambda x: len(x)).reduceByKey(lambda x,y: x + y).take(1) {code} This code runs fine on 1.0.2. It returns the following result in just over a minute: {code} [(4, 'NickJohn')] {code} However, when I try this with 1.1.0-rc3 on an identically-sized cluster, it runs for a very, very long time and then fails with {{java.lang.OutOfMemoryError: Java heap space}}. Here is a stack trace taken from a run on 1.1.0-rc2: {code} >>> a = sc.parallelize(["Nick", "John", "Bob"]) >>> a = a.repartition(24000) >>> a.keyBy(lambda x: len(x)).reduceByKey(lambda x,y: x + y).take(1) 14/08/29 21:53:40 WARN BlockManagerMasterActor: Removing BlockManager BlockManagerId(0, ip-10-138-29-167.ec2.internal, 46252, 0) with no recent heart beats: 175143ms exceeds 45000ms 14/08/29 21:53:50 WARN BlockManagerMasterActor: Removing BlockManager BlockManagerId(10, ip-10-138-18-106.ec2.internal, 33711, 0) with no recent heart beats: 175359ms exceeds 45000ms 14/08/29 21:54:02 WARN BlockManagerMasterActor: Removing BlockManager BlockManagerId(19, ip-10-139-36-207.ec2.internal, 52208, 0) with no recent heart beats: 173061ms exceeds 45000ms 14/08/29 21:54:13 WARN BlockManagerMasterActor: Removing BlockManager BlockManagerId(5, ip-10-73-142-70.ec2.internal, 56162, 0) with no recent heart beats: 176816ms exceeds 45000ms 14/08/29 21:54:22 WARN BlockManagerMasterActor: Removing BlockManager BlockManagerId(7, ip-10-236-145-200.ec2.internal, 40959, 0) with no recent heart beats: 182241ms exceeds 45000ms 14/08/29 21:54:40 WARN BlockManagerMasterActor: Removing BlockManager BlockManagerId(4,
ip-10-139-1-195.ec2.internal, 49221, 0) with no recent heart beats: 178406ms exceeds 45000ms 14/08/29 21:54:41 ERROR Utils: Uncaught exception in thread Result resolver thread-3 java.lang.OutOfMemoryError: Java heap space at com.esotericsoftware.kryo.io.Input.readBytes(Input.java:296) at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.read(DefaultArraySerializers.java:35) at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.read(DefaultArraySerializers.java:18) at com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:699) at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:611) at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221) at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729) at org.apache.spark.serializer.KryoSerializerInstance.deserialize(KryoSerializer.scala:162) at org.apache.spark.scheduler.DirectTaskResult.value(TaskResult.scala:79) at org.apache.spark.scheduler.TaskSetManager.handleSuccessfulTask(TaskSetManager.scala:514) at org.apache.spark.scheduler.TaskSchedulerImpl.handleSuccessfulTask(TaskSchedulerImpl.scala:355) at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:68) at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47) at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47) at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1311) at org.apache.spark.scheduler.TaskResultGetter$$anon$2.run(TaskResultGetter.scala:46) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Exception in thread "Result resolver thread-3" 14/08/29 21:56:26 ERROR SendingConnection: Exception while 
reading SendingConnection to ConnectionManagerId(ip-10-73-142-223.ec2.internal,54014) java.nio.channels.ClosedChannelException at sun.nio.ch.SocketChannelImpl.ensureReadOpen(SocketChannelImpl.java:252) at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:295) at org.apache.spark.network.SendingConnection.read(Connection.scala:390) at org.apache.spark.network.ConnectionManager$$anon$7.run(ConnectionManager.scala:199) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) java.lang.OutOfMemoryError: Java heap space at com.esotericsoftware.kryo.io.Inp
[jira] [Commented] (SPARK-3333) Large number of partitions causes OOM
[ https://issues.apache.org/jira/browse/SPARK-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116895#comment-14116895 ] Nicholas Chammas commented on SPARK-3333: - Note: I have not yet confirmed that 1.1.0-rc3 yields the exact same stack trace as the one provided above (which is for 1.1.0-rc2), though I expect them to be the same. I _can_ confirm that it takes a very, very long time to run, as it is running right now on rc3 and has been for about 45 minutes. Since I have to be offline for a bit, I thought I'd report this issue ASAP with the rc2 stack trace and update it later with a stack trace from rc3. > Large number of partitions causes OOM > - > > Key: SPARK-3333 > URL: https://issues.apache.org/jira/browse/SPARK-3333 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 1.1.0 > Environment: EC2; 1 master; 1 slave; {{m3.xlarge}} instances >Reporter: Nicholas Chammas > > Here’s a repro for PySpark: > {code} > a = sc.parallelize(["Nick", "John", "Bob"]) > a = a.repartition(24000) > a.keyBy(lambda x: len(x)).reduceByKey(lambda x,y: x + y).take(1) > {code} > This code runs fine on 1.0.2. It returns the following result in just over a > minute: > {code} > [(4, 'NickJohn')] > {code} > However, when I try this with 1.1.0-rc3 on an identically-sized cluster, it > runs for a very, very long time (at least > 45 min) and then fails with > {{java.lang.OutOfMemoryError: Java heap space}}. 
> Here is a stack trace taken from a run on 1.1.0-rc2: > {code} > >>> a = sc.parallelize(["Nick", "John", "Bob"]) > >>> a = a.repartition(24000) > >>> a.keyBy(lambda x: len(x)).reduceByKey(lambda x,y: x + y).take(1) > 14/08/29 21:53:40 WARN BlockManagerMasterActor: Removing BlockManager > BlockManagerId(0, ip-10-138-29-167.ec2.internal, 46252, 0) with no recent > heart beats: 175143ms exceeds 45000ms > 14/08/29 21:53:50 WARN BlockManagerMasterActor: Removing BlockManager > BlockManagerId(10, ip-10-138-18-106.ec2.internal, 33711, 0) with no recent > heart beats: 175359ms exceeds 45000ms > 14/08/29 21:54:02 WARN BlockManagerMasterActor: Removing BlockManager > BlockManagerId(19, ip-10-139-36-207.ec2.internal, 52208, 0) with no recent > heart beats: 173061ms exceeds 45000ms > 14/08/29 21:54:13 WARN BlockManagerMasterActor: Removing BlockManager > BlockManagerId(5, ip-10-73-142-70.ec2.internal, 56162, 0) with no recent > heart beats: 176816ms exceeds 45000ms > 14/08/29 21:54:22 WARN BlockManagerMasterActor: Removing BlockManager > BlockManagerId(7, ip-10-236-145-200.ec2.internal, 40959, 0) with no recent > heart beats: 182241ms exceeds 45000ms > 14/08/29 21:54:40 WARN BlockManagerMasterActor: Removing BlockManager > BlockManagerId(4, ip-10-139-1-195.ec2.internal, 49221, 0) with no recent > heart beats: 178406ms exceeds 45000ms > 14/08/29 21:54:41 ERROR Utils: Uncaught exception in thread Result resolver > thread-3 > java.lang.OutOfMemoryError: Java heap space > at com.esotericsoftware.kryo.io.Input.readBytes(Input.java:296) > at > com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.read(DefaultArraySerializers.java:35) > at > com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.read(DefaultArraySerializers.java:18) > at com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:699) > at > com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:611) > at > 
com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221) > at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729) > at > org.apache.spark.serializer.KryoSerializerInstance.deserialize(KryoSerializer.scala:162) > at org.apache.spark.scheduler.DirectTaskResult.value(TaskResult.scala:79) > at > org.apache.spark.scheduler.TaskSetManager.handleSuccessfulTask(TaskSetManager.scala:514) > at > org.apache.spark.scheduler.TaskSchedulerImpl.handleSuccessfulTask(TaskSchedulerImpl.scala:355) > at > org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:68) > at > org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47) > at > org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47) > at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1311) > at > org.apache.spark.scheduler.TaskResultGetter$$anon$2.run(TaskResultGetter.scala:46) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > Exception in thread "Result re
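The trace above shows the driver's result resolver thread running out of heap while deserializing task results. A back-of-the-envelope sketch of why a very large partition count can do this (the per-result size here is an assumed illustrative figure, not a measured one):

```python
# Hypothetical arithmetic: the driver must hold one deserialized task result
# per partition, so even a modest per-result footprint multiplies quickly.
partitions = 24000
bytes_per_result = 50 * 1024  # assumed ~50 KiB per deserialized result (illustrative)

total_bytes = partitions * bytes_per_result
assert total_bytes == 1_228_800_000  # ~1.2 GB, comparable to a default driver heap
```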
[jira] [Updated] (SPARK-3333) Large number of partitions causes OOM
[ https://issues.apache.org/jira/browse/SPARK-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-3333: Description: Here’s a repro for PySpark: {code} a = sc.parallelize(["Nick", "John", "Bob"]) a = a.repartition(24000) a.keyBy(lambda x: len(x)).reduceByKey(lambda x,y: x + y).take(1) {code} This code runs fine on 1.0.2. It returns the following result in just over a minute: {code} [(4, 'NickJohn')] {code} However, when I try this with 1.1.0-rc3 on an identically-sized cluster, it runs for a very, very long time (at least > 45 min) and then fails with {{java.lang.OutOfMemoryError: Java heap space}}. Here is a stack trace taken from a run on 1.1.0-rc2: {code} (identical to the 1.1.0-rc2 stack trace quoted in the comment above) {code}
[jira] [Commented] (SPARK-3333) Large number of partitions causes OOM
[ https://issues.apache.org/jira/browse/SPARK-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116902#comment-14116902 ] Patrick Wendell commented on SPARK-3333: Hey [~nchammas] - I don't think anything relevant to this issue has changed between RC2 and RC3, so the RC2 trace is probably sufficient.
[jira] [Updated] (SPARK-3332) Tagging is not atomic with launching instances on EC2
[ https://issues.apache.org/jira/browse/SPARK-3332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-3332: --- Summary: Tagging is not atomic with launching instances on EC2 (was: Tags shouldn't be the main strategy for machine membership on clusters) > Tagging is not atomic with launching instances on EC2 > - > > Key: SPARK-3332 > URL: https://issues.apache.org/jira/browse/SPARK-3332 > Project: Spark > Issue Type: Bug > Components: EC2 >Reporter: Allan Douglas R. de Oliveira > > The implementation for SPARK-2333 changed the machine membership mechanism > from security groups to tags. > This is a fundamentally flawed strategy, as there are no guarantees at all that the > machines will have a tag (even with a retry mechanism). > For instance, if the script is killed after launching the instances but > before setting the tags, the machines will be "invisible" to a destroy > command, leaving an unmanageable cluster behind. > The initial proposal is to go back to the previous behavior for all cases except > when the new flag (--security-group-prefix) is used. > Also, it's worthwhile to mention that SPARK-3180 introduced the > --additional-security-group flag, which is a reasonable solution to SPARK-2333 > (but isn't a full replacement for all use cases of --security-group-prefix). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3332) Tagging is not atomic with launching instances on EC2
[ https://issues.apache.org/jira/browse/SPARK-3332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116906#comment-14116906 ] Patrick Wendell commented on SPARK-3332: I changed the title slightly - I think the underlying problem here is that tagging is not atomic with launching instances. Removing the use of tagging entirely is one potential solution. Another is to print better errors if tagging does not succeed and explain that there might be orphaned instances. The reason SPARK-2333 was added is that the current approach can lead to too many security groups, so we can't just revert it without any cost. In the meantime I'd like to revert SPARK-2333 in branch-1.1 so we can defer the design decision to the 1.2 release timeframe. Thanks [~douglaz] for highlighting this issue. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
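The race described above can be made concrete with a small simulation. This is a hypothetical sketch (the `FakeEC2` class and its method names are illustrative, not the spark-ec2 or boto API): if the script dies after launching but before tagging, a tag-based destroy sees nothing, while security-group membership still finds the instances.

```python
# Illustrative simulation of the non-atomic launch-then-tag window.
class FakeEC2:
    def __init__(self):
        self.instances = {}  # instance id -> {"group": ..., "tags": {...}}
        self._next = 0

    def launch(self, group, count):
        ids = []
        for _ in range(count):
            iid = "i-%04d" % self._next
            self._next += 1
            self.instances[iid] = {"group": group, "tags": {}}
            ids.append(iid)
        return ids

    def tag(self, ids, key, value):
        for iid in ids:
            self.instances[iid]["tags"][key] = value

    def find_by_tag(self, key, value):
        return [i for i, d in self.instances.items() if d["tags"].get(key) == value]

    def find_by_group(self, group):
        return [i for i, d in self.instances.items() if d["group"] == group]

ec2 = FakeEC2()

# Happy path: launch, then tag -- a destroy-by-tag sees the cluster.
ids = ec2.launch("my-cluster-slaves", 2)
ec2.tag(ids, "spark-cluster", "my-cluster")
assert ec2.find_by_tag("spark-cluster", "my-cluster") == ids

# Failure window: the script is killed after launch() but before tag().
orphans = ec2.launch("other-cluster-slaves", 2)
# ... crash here: tag() never runs ...
assert ec2.find_by_tag("spark-cluster", "other-cluster") == []  # invisible to destroy
assert ec2.find_by_group("other-cluster-slaves") == orphans     # groups still find them
```

This is why group-based membership was robust: the group is assigned atomically at launch, whereas tags are applied in a separate, failable API call afterward.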
[jira] [Updated] (SPARK-3332) Tagging is not atomic with launching instances on EC2
[ https://issues.apache.org/jira/browse/SPARK-3332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-3332: --- Target Version/s: 1.2.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3332) Tagging is not atomic with launching instances on EC2
[ https://issues.apache.org/jira/browse/SPARK-3332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116910#comment-14116910 ] Apache Spark commented on SPARK-3332: - User 'JoshRosen' has created a pull request for this issue: https://github.com/apache/spark/pull/2225 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-2870) Thorough schema inference directly on RDDs of Python dictionaries
[ https://issues.apache.org/jira/browse/SPARK-2870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-2870: Target Version/s: 1.2.0 > Thorough schema inference directly on RDDs of Python dictionaries > - > > Key: SPARK-2870 > URL: https://issues.apache.org/jira/browse/SPARK-2870 > Project: Spark > Issue Type: Improvement > Components: PySpark, SQL >Reporter: Nicholas Chammas > > h4. Background > I love the {{SQLContext.jsonRDD()}} and {{SQLContext.jsonFile()}} methods. > They process JSON text directly and infer a schema that covers the entire > source data set. > This is very important with semi-structured data like JSON since individual > elements in the data set are free to have different structures. Matching > fields across elements may even have different value types. > For example: > {code} > {"a": 5} > {"a": "cow"} > {code} > To get a queryable schema that covers the whole data set, you need to infer a > schema by looking at the whole data set. The aforementioned > {{SQLContext.json...()}} methods do this very well. > h4. Feature Request > What we need is for {{SQLContext.inferSchema()}} to do this, too. > Alternatively, we need a new {{SQLContext}} method that works on RDDs of > Python dictionaries and does something functionally equivalent to this: > {code} > SQLContext.jsonRDD(RDD[dict].map(lambda x: json.dumps(x))) > {code} > As of 1.0.2, > [{{inferSchema()}}|http://spark.apache.org/docs/latest/api/python/pyspark.sql.SQLContext-class.html#inferSchema] > just looks at the first element in the data set. This won't help much when > the structure of the elements in the target RDD is variable. > h4. Example Use Case > * You have some JSON text data that you want to analyze using Spark SQL. > * You would use one of the {{SQLContext.json...()}} methods, but you need to > do some filtering on the data first to remove bad elements--basically, some > minimal schema validation. 
> * You deserialize the JSON objects to Python {{dict}} s and filter out the > bad ones. You now have an RDD of dictionaries. > * From this RDD, you want a SchemaRDD that captures the schema for the whole > data set. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
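The whole-dataset inference requested above can be sketched in plain Python (this is an illustrative toy, not the PySpark implementation; the `infer_schema` function and its `union(...)` notation are made up for the example). It merges the observed value types per field across every record, instead of looking only at the first one:

```python
# Toy whole-dataset schema inference over an iterable of dicts.
def infer_schema(records):
    schema = {}
    for rec in records:
        for key, value in rec.items():
            schema.setdefault(key, set()).add(type(value).__name__)
    # A field seen with multiple types would need a union (or widened) type.
    return {k: (v.pop() if len(v) == 1 else "union(" + ",".join(sorted(v)) + ")")
            for k, v in schema.items()}

data = [{"a": 5}, {"a": "cow", "b": 1.0}]

# First-element-only inference (the 1.0.2 inferSchema behavior described
# above) misses the "b" field and the string values of "a":
first_only = {k: type(v).__name__ for k, v in data[0].items()}
assert first_only == {"a": "int"}

# Whole-dataset inference covers every field and every conflicting type:
assert infer_schema(data) == {"a": "union(int,str)", "b": "float"}
```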
[jira] [Commented] (SPARK-2870) Thorough schema inference directly on RDDs of Python dictionaries
[ https://issues.apache.org/jira/browse/SPARK-2870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116915#comment-14116915 ] Michael Armbrust commented on SPARK-2870: - Yeah, thanks for pinging me. I've targeted this JIRA for 1.2. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-3317) The loss of regularization in Updater should use the oldWeights
[ https://issues.apache.org/jira/browse/SPARK-3317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DB Tsai closed SPARK-3317. -- Resolution: Won't Fix > The loss of regularization in Updater should use the oldWeights > --- > > Key: SPARK-3317 > URL: https://issues.apache.org/jira/browse/SPARK-3317 > Project: Spark > Issue Type: Bug > Components: MLlib >Reporter: DB Tsai > > The regularization loss is currently computed from the newWeights, which > is not correct. The loss R(w) = 1/2 ||w||^2 should be computed from the > oldWeights. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
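The distinction in the report above is worth a small numeric example. In a gradient step w_new = w_old - step * gradient, the objective value associated with the current iterate is f(w_old) + R(w_old), so the L2 term R(w) = 1/2 ||w||^2 should be evaluated at the old weights; evaluating it at the just-updated weights reports a different number (the weights and step size below are arbitrary illustrative values):

```python
# R(w) = 1/2 * ||w||^2, the L2 regularization term from the issue above.
def l2_reg(w):
    return 0.5 * sum(x * x for x in w)

w_old = [3.0, 4.0]
gradient = [1.0, 2.0]
step = 1.0
w_new = [x - step * g for x, g in zip(w_old, gradient)]  # [2.0, 2.0]

assert l2_reg(w_old) == 12.5  # 0.5 * (9 + 16): the loss term at the current iterate
assert l2_reg(w_new) == 4.0   # 0.5 * (4 + 4): what evaluating at newWeights reports
assert l2_reg(w_old) != l2_reg(w_new)
```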
[jira] [Commented] (SPARK-3333) Large number of partitions causes OOM
[ https://issues.apache.org/jira/browse/SPARK-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116918#comment-14116918 ] Josh Rosen commented on SPARK-3333: --- Even on my laptop, I notice a huge speed difference between 1.0.2 and 1.1.0 here. I haven't hit the OOM yet, though.
[jira] [Commented] (SPARK-3333) Large number of partitions causes OOM
[ https://issues.apache.org/jira/browse/SPARK-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116923#comment-14116923 ] Matei Zaharia commented on SPARK-: -- The slowdown might be partly due to adding external spilling in Python, but it's weird that this would crash the driver. > Large number of partitions causes OOM > - > > Key: SPARK- > URL: https://issues.apache.org/jira/browse/SPARK- > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 1.1.0 > Environment: EC2; 1 master; 1 slave; {{m3.xlarge}} instances >Reporter: Nicholas Chammas > > Here’s a repro for PySpark: > {code} > a = sc.parallelize(["Nick", "John", "Bob"]) > a = a.repartition(24000) > a.keyBy(lambda x: len(x)).reduceByKey(lambda x,y: x + y).take(1) > {code} > This code runs fine on 1.0.2. It returns the following result in just over a > minute: > {code} > [(4, 'NickJohn')] > {code} > However, when I try this with 1.1.0-rc3 on an identically-sized cluster, it > runs for a very, very long time (at least > 45 min) and then fails with > {{java.lang.OutOfMemoryError: Java heap space}}. 
> Here is a stack trace taken from a run on 1.1.0-rc2: > {code} > >>> a = sc.parallelize(["Nick", "John", "Bob"]) > >>> a = a.repartition(24000) > >>> a.keyBy(lambda x: len(x)).reduceByKey(lambda x,y: x + y).take(1) > 14/08/29 21:53:40 WARN BlockManagerMasterActor: Removing BlockManager > BlockManagerId(0, ip-10-138-29-167.ec2.internal, 46252, 0) with no recent > heart beats: 175143ms exceeds 45000ms > 14/08/29 21:53:50 WARN BlockManagerMasterActor: Removing BlockManager > BlockManagerId(10, ip-10-138-18-106.ec2.internal, 33711, 0) with no recent > heart beats: 175359ms exceeds 45000ms > 14/08/29 21:54:02 WARN BlockManagerMasterActor: Removing BlockManager > BlockManagerId(19, ip-10-139-36-207.ec2.internal, 52208, 0) with no recent > heart beats: 173061ms exceeds 45000ms > 14/08/29 21:54:13 WARN BlockManagerMasterActor: Removing BlockManager > BlockManagerId(5, ip-10-73-142-70.ec2.internal, 56162, 0) with no recent > heart beats: 176816ms exceeds 45000ms > 14/08/29 21:54:22 WARN BlockManagerMasterActor: Removing BlockManager > BlockManagerId(7, ip-10-236-145-200.ec2.internal, 40959, 0) with no recent > heart beats: 182241ms exceeds 45000ms > 14/08/29 21:54:40 WARN BlockManagerMasterActor: Removing BlockManager > BlockManagerId(4, ip-10-139-1-195.ec2.internal, 49221, 0) with no recent > heart beats: 178406ms exceeds 45000ms > 14/08/29 21:54:41 ERROR Utils: Uncaught exception in thread Result resolver > thread-3 > java.lang.OutOfMemoryError: Java heap space > at com.esotericsoftware.kryo.io.Input.readBytes(Input.java:296) > at > com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.read(DefaultArraySerializers.java:35) > at > com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.read(DefaultArraySerializers.java:18) > at com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:699) > at > com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:611) > at > 
com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221) > at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729) > at > org.apache.spark.serializer.KryoSerializerInstance.deserialize(KryoSerializer.scala:162) > at org.apache.spark.scheduler.DirectTaskResult.value(TaskResult.scala:79) > at > org.apache.spark.scheduler.TaskSetManager.handleSuccessfulTask(TaskSetManager.scala:514) > at > org.apache.spark.scheduler.TaskSchedulerImpl.handleSuccessfulTask(TaskSchedulerImpl.scala:355) > at > org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:68) > at > org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47) > at > org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47) > at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1311) > at > org.apache.spark.scheduler.TaskResultGetter$$anon$2.run(TaskResultGetter.scala:46) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > Exception in thread "Result resolver thread-3" 14/08/29 21:56:26 ERROR > SendingConnection: Exception while reading SendingConnection to > ConnectionManagerId(ip-10-73-142-223.ec2.internal,54014) > java.nio.channels.ClosedChannelException > at sun.nio.ch.SocketChannelImpl.ensureReadOpen(SocketChannelImpl.java:252) > at sun.nio.ch.SocketChannelImpl.read(So
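For context on why 24,000 partitions hurts here: the repro's RDD has only three records, so after {{repartition(24000)}} virtually every partition is empty and the job's cost is pure per-task scheduling and result-shipping overhead. A minimal pure-Python sketch (no Spark involved; the hash-partitioning helper is hypothetical, just mimicking what a hash partitioner does) of how sparse the partitions are:

```python
# Pure-Python sketch (not Spark code): distribute 3 records across 24000
# partitions the way a hash partitioner would, then count how many
# partitions actually hold data. Almost all of the 24000 tasks do no work.
def hash_partition(records, num_partitions):
    partitions = [[] for _ in range(num_partitions)]
    for r in records:
        partitions[hash(r) % num_partitions].append(r)
    return partitions

parts = hash_partition(["Nick", "John", "Bob"], 24000)
non_empty = sum(1 for p in parts if p)
print(non_empty)  # at most 3 of the 24000 partitions carry any data
```

Each empty partition still costs a task launch and a task result sent back to the driver, which is consistent with the driver-side OOM in the result-resolver thread above.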
[jira] [Resolved] (SPARK-3320) Batched in-memory column buffer building doesn't work for SchemaRDDs with empty partitions
[ https://issues.apache.org/jira/browse/SPARK-3320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-3320. - Resolution: Fixed Fix Version/s: 1.1.0 Assignee: Cheng Lian > Batched in-memory column buffer building doesn't work for SchemaRDDs with > empty partitions > -- > > Key: SPARK-3320 > URL: https://issues.apache.org/jira/browse/SPARK-3320 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.0.2 >Reporter: Cheng Lian >Assignee: Cheng Lian >Priority: Blocker > Fix For: 1.1.0 > > > The empty partition iterator is not properly handled in > [#1880|https://github.com/apache/spark/pull/1880/files#diff-b47dac3d98014877d5879f5cf37ab0d1R115] > and throws an exception when accessing an empty partition of the target SchemaRDD. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
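The failure mode in SPARK-3320 is a common one: a batching routine that pulls the first row from a partition iterator without checking for emptiness. The sketch below is illustrative Python, not Spark's actual column-buffer code; it contrasts the unguarded pattern with a guarded one that tolerates empty partitions:

```python
# Sketch of the bug class described above (illustrative, not Spark's code).
def build_batches_unguarded(it, batch_size=2):
    batches = []
    batch = [next(it)]  # raises StopIteration if the partition is empty
    for row in it:
        if len(batch) == batch_size:
            batches.append(batch)
            batch = []
        batch.append(row)
    batches.append(batch)
    return batches

def build_batches_guarded(it, batch_size=2):
    batches, batch = [], []
    for row in it:  # a for-loop handles the empty iterator naturally
        batch.append(row)
        if len(batch) == batch_size:
            batches.append(batch)
            batch = []
    if batch:
        batches.append(batch)
    return batches

print(build_batches_guarded(iter([])))  # [] -- an empty partition is fine
```

The guarded version never assumes a first element exists, so an empty partition simply yields zero batches instead of an exception.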
[jira] [Commented] (SPARK-3333) Large number of partitions causes OOM
[ https://issues.apache.org/jira/browse/SPARK-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116935#comment-14116935 ] Davies Liu commented on SPARK-3333: --- [~matei] I think this is not related to external spilling in Python, because the dataset is so small that it will not trigger spilling in Python. Also, this slowdown can be reproduced in Scala, for example: {code} val rdd = sc.parallelize(1 to 10) rdd.repartition(24000).keyBy(x => x).reduceByKey(_ + _).collect() {code} The second stage (24000 tasks) will take 105 mins (not finished yet).
[jira] [Comment Edited] (SPARK-3333) Large number of partitions causes OOM
[ https://issues.apache.org/jira/browse/SPARK-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116935#comment-14116935 ] Davies Liu edited comment on SPARK-3333 at 8/31/14 11:26 PM: - [~matei] I think this is not related to external spilling in Python, because the dataset is so small that it will not trigger spilling in Python. Also, this slowdown can be reproduced in Scala, for example: {code} val rdd = sc.parallelize(1 to 10) rdd.repartition(24000).keyBy(x => x).reduceByKey(_ + _).collect() {code} The second stage (24000 tasks) will take 105 mins (not finished yet). PS: I am running on master.
[jira] [Comment Edited] (SPARK-3333) Large number of partitions causes OOM
[ https://issues.apache.org/jira/browse/SPARK-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116935#comment-14116935 ] Davies Liu edited comment on SPARK-3333 at 8/31/14 11:27 PM: - [~matei] I think this is not related to external spilling in Python, because the dataset is so small that it will not trigger spilling in Python. Also, this slowdown can be reproduced in Scala, for example: {code} val rdd = sc.parallelize(1 to 10) rdd.repartition(24000).keyBy(x => x).reduceByKey(_ + _).collect() {code} The second stage (24000 tasks) will take 105 mins (not finished yet). The JVM's CPU usage is 250% and its memory is about 655 MB; it may trigger an OOM somewhere. PS: I am running on master.
[jira] [Commented] (SPARK-3333) Large number of partitions causes OOM
[ https://issues.apache.org/jira/browse/SPARK-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116936#comment-14116936 ] Josh Rosen commented on SPARK-3333: --- I agree with Davies; I think this is a more general Spark issue, perhaps related to {{repartition()}}. I just tried testing this locally with commit eff9714e1c88e39e28317358ca9ec87677f121dc, the commit immediately prior to [14174abd421318e71c16edd24224fd5094bdfed4|https://github.com/apache/spark/commit/14174abd421318e71c16edd24224fd5094bdfed4] (Davies' patch that adds hash-based disk-spilling aggregation to PySpark), and I still saw the same slowdown there.
[jira] [Commented] (SPARK-3333) Large number of partitions causes OOM
[ https://issues.apache.org/jira/browse/SPARK-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116942#comment-14116942 ] Matei Zaharia commented on SPARK-3333: -- I see, that makes sense.
[jira] [Commented] (SPARK-3333) Large number of partitions causes OOM
[ https://issues.apache.org/jira/browse/SPARK-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116948#comment-14116948 ] Josh Rosen commented on SPARK-3333: --- I'm doing some manual bisecting to find the patch that introduced the slowdown. It's still slow as early as 8d338f64c4eda45d22ae33f61ef7928011cc2846.
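The manual bisecting described above amounts to a binary search over a linear commit history for the first commit where a check starts failing (which is what {{git bisect}} automates). A sketch in Python, with a hypothetical commit list and badness predicate standing in for a real checkout-and-test step:

```python
# Binary-search a linear commit history for the first "bad" commit,
# assuming badness is monotonic (good commits, then bad commits).
# The commit names and predicate here are hypothetical, not Spark's history.
def first_bad(commits, is_bad):
    lo, hi = 0, len(commits)
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid          # culprit is at mid or earlier
        else:
            lo = mid + 1      # culprit is strictly after mid
    return commits[lo] if lo < len(commits) else None

commits = [f"c{i}" for i in range(16)]
culprit = first_bad(commits, lambda c: int(c[1:]) >= 11)
print(culprit)  # c11
```

With this approach, O(log n) builds are enough to pin down the offending patch among n candidate commits.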
[jira] [Commented] (SPARK-3332) Tagging is not atomic with launching instances on EC2
[ https://issues.apache.org/jira/browse/SPARK-3332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116950#comment-14116950 ] Allan Douglas R. de Oliveira commented on SPARK-3332: - [~pwendell], yes, this is a good reword and I agree to revert it for now. You mentioned the two potential solutions, but I think a good compromise is the one implemented by the PR, which keeps the flag allowing reuse of the same security group while still allowing machines to be matched by security group in the other cases. > Tagging is not atomic with launching instances on EC2 > - > > Key: SPARK-3332 > URL: https://issues.apache.org/jira/browse/SPARK-3332 > Project: Spark > Issue Type: Bug > Components: EC2 >Reporter: Allan Douglas R. de Oliveira > > The implementation for SPARK-2333 changed the machine membership mechanism > from security groups to tags. > This is a fundamentally flawed strategy, as there are no guarantees at all that the > machines will have a tag (even with a retry mechanism). > For instance, if the script is killed after launching the instances but > before setting the tags, the machines will be "invisible" to a destroy > command, leaving an unmanageable cluster behind. > The initial proposal is to go back to the previous behavior for all cases except > when the new flag (--security-group-prefix) is used. > It's also worth mentioning that SPARK-3180 introduced the > --additional-security-group flag, which is a reasonable solution to SPARK-2333 > (but isn't a full replacement for all use cases of --security-group-prefix). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-3332) Tagging is not atomic with launching instances on EC2
[ https://issues.apache.org/jira/browse/SPARK-3332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116950#comment-14116950 ] Allan Douglas R. de Oliveira edited comment on SPARK-3332 at 9/1/14 12:49 AM: -- [~pwendell], yes, this is a good reword and I agree to revert it for now. You mentioned the two potential solutions, but I think a good compromise is the one implemented by the PR, which keeps the flag allowing reuse of the same security group while still allowing machines to be matched by security group in the other cases. Perhaps more messages could be added when the flag has been used and the tagging failed. was (Author: douglaz): [~pwendell], yes this is a good reword and I agree to revert it for now. You mentioned the two potential solutions but I think a good compromise is the one implemented by the PR which keeps the flag allowing reuse of the same security group but also allowing to match the machines by the security group in the other cases. > Tagging is not atomic with launching instances on EC2 > - > > Key: SPARK-3332 > URL: https://issues.apache.org/jira/browse/SPARK-3332 > Project: Spark > Issue Type: Bug > Components: EC2 >Reporter: Allan Douglas R. de Oliveira > > The implementation for SPARK-2333 changed the machine membership mechanism > from security groups to tags. > This is a fundamentally flawed strategy, as there are no guarantees at all that the > machines will have a tag (even with a retry mechanism). > For instance, if the script is killed after launching the instances but > before setting the tags, the machines will be "invisible" to a destroy > command, leaving an unmanageable cluster behind. > The initial proposal is to go back to the previous behavior for all cases except > when the new flag (--security-group-prefix) is used.
> It's also worth mentioning that SPARK-3180 introduced the > --additional-security-group flag, which is a reasonable solution to SPARK-2333 > (but isn't a full replacement for all use cases of --security-group-prefix). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
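The race described above — instances launched first, tags applied in a separate call — can be narrowed but never fully closed by retrying the tagging step, which is why falling back to security-group membership is the safer identity mechanism. A minimal sketch of such a retry helper (hypothetical names, not spark-ec2's actual code):

```python
import time


def tag_with_retries(apply_tags, max_attempts=5, delay=1.0, sleep=time.sleep):
    """Try to apply tags to freshly launched instances, retrying on failure.

    `apply_tags` is any callable that raises on failure (e.g. a wrapper
    around boto's instance.add_tag). Returns the attempt number that
    succeeded. Even with retries, the window between run_instances and
    the first successful tagging call means a killed script can leave
    untagged ("invisible") instances behind.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            apply_tags()
            return attempt
        except Exception:
            if attempt == max_attempts:
                raise  # give up; caller must clean up by other means
            sleep(delay * attempt)  # simple linear backoff
```

This only shrinks the unprotected window; it does not make launch-plus-tag atomic, which is the core point of the ticket.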
[jira] [Commented] (SPARK-3333) Large number of partitions causes OOM
[ https://issues.apache.org/jira/browse/SPARK-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116969#comment-14116969 ] Josh Rosen commented on SPARK-3333: --- Looks like the issue was introduced somewhere between 273afcb and 62d4a0f. > Large number of partitions causes OOM > - > > Key: SPARK-3333 > URL: https://issues.apache.org/jira/browse/SPARK-3333 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 1.1.0 > Environment: EC2; 1 master; 1 slave; {{m3.xlarge}} instances >Reporter: Nicholas Chammas > > Here’s a repro for PySpark: > {code} > a = sc.parallelize(["Nick", "John", "Bob"]) > a = a.repartition(24000) > a.keyBy(lambda x: len(x)).reduceByKey(lambda x,y: x + y).take(1) > {code} > This code runs fine on 1.0.2. It returns the following result in just over a > minute: > {code} > [(4, 'NickJohn')] > {code} > However, when I try this with 1.1.0-rc3 on an identically-sized cluster, it > runs for a very, very long time (at least > 45 min) and then fails with > {{java.lang.OutOfMemoryError: Java heap space}}.
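Narrowing a regression to a commit range like this is a bisection problem, which `git bisect` automates. A minimal sketch of the underlying search, assuming the regression persists once introduced (the commit values here are placeholders, not Spark's history):

```python
def first_bad_commit(commits, is_bad):
    """Binary-search an ordered list of commits for the first one where
    `is_bad` returns True (e.g. "the repro runs > 45 min"). Assumes
    commits are ordered oldest-to-newest and that once the regression
    appears it stays -- the same invariant git bisect relies on.
    """
    lo, hi = 0, len(commits) - 1
    # Precondition: the oldest commit is good and the newest is bad.
    assert not is_bad(commits[lo]) and is_bad(commits[hi])
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # regression is at mid or earlier
        else:
            lo = mid  # regression is after mid
    return commits[hi]
```

Each probe halves the range, so a span of N commits needs only about log2(N) test runs — useful when each run is a 30-minute cluster job.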
> Here is a stack trace taken from a run on 1.1.0-rc2: > {code} > >>> a = sc.parallelize(["Nick", "John", "Bob"]) > >>> a = a.repartition(24000) > >>> a.keyBy(lambda x: len(x)).reduceByKey(lambda x,y: x + y).take(1) > 14/08/29 21:53:40 WARN BlockManagerMasterActor: Removing BlockManager > BlockManagerId(0, ip-10-138-29-167.ec2.internal, 46252, 0) with no recent > heart beats: 175143ms exceeds 45000ms > 14/08/29 21:53:50 WARN BlockManagerMasterActor: Removing BlockManager > BlockManagerId(10, ip-10-138-18-106.ec2.internal, 33711, 0) with no recent > heart beats: 175359ms exceeds 45000ms > 14/08/29 21:54:02 WARN BlockManagerMasterActor: Removing BlockManager > BlockManagerId(19, ip-10-139-36-207.ec2.internal, 52208, 0) with no recent > heart beats: 173061ms exceeds 45000ms > 14/08/29 21:54:13 WARN BlockManagerMasterActor: Removing BlockManager > BlockManagerId(5, ip-10-73-142-70.ec2.internal, 56162, 0) with no recent > heart beats: 176816ms exceeds 45000ms > 14/08/29 21:54:22 WARN BlockManagerMasterActor: Removing BlockManager > BlockManagerId(7, ip-10-236-145-200.ec2.internal, 40959, 0) with no recent > heart beats: 182241ms exceeds 45000ms > 14/08/29 21:54:40 WARN BlockManagerMasterActor: Removing BlockManager > BlockManagerId(4, ip-10-139-1-195.ec2.internal, 49221, 0) with no recent > heart beats: 178406ms exceeds 45000ms > 14/08/29 21:54:41 ERROR Utils: Uncaught exception in thread Result resolver > thread-3 > java.lang.OutOfMemoryError: Java heap space > at com.esotericsoftware.kryo.io.Input.readBytes(Input.java:296) > at > com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.read(DefaultArraySerializers.java:35) > at > com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.read(DefaultArraySerializers.java:18) > at com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:699) > at > com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:611) > at > 
com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221) > at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729) > at > org.apache.spark.serializer.KryoSerializerInstance.deserialize(KryoSerializer.scala:162) > at org.apache.spark.scheduler.DirectTaskResult.value(TaskResult.scala:79) > at > org.apache.spark.scheduler.TaskSetManager.handleSuccessfulTask(TaskSetManager.scala:514) > at > org.apache.spark.scheduler.TaskSchedulerImpl.handleSuccessfulTask(TaskSchedulerImpl.scala:355) > at > org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:68) > at > org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47) > at > org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47) > at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1311) > at > org.apache.spark.scheduler.TaskResultGetter$$anon$2.run(TaskResultGetter.scala:46) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > Exception in thread "Result resolver thread-3" 14/08/29 21:56:26 ERROR > SendingConnection: Exception while reading SendingConnection to > ConnectionManagerId(ip-10-73-142-223.ec2.internal,54014) > java.nio.channels.ClosedChannelException > at sun.nio.ch.SocketChannelImpl.ensureReadOpen(SocketChannelImpl.java:252) > at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:295) > at org.apache.spark.n
[jira] [Commented] (SPARK-3168) The ServletContextHandler of webui lacks a SessionManager
[ https://issues.apache.org/jira/browse/SPARK-3168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116974#comment-14116974 ] meiyoula commented on SPARK-3168: - To use CAS for single sign-on, I added some web UI filters in the configuration. For the History Server, I set the following configuration: export SPARK_HISTORY_OPTS=$SPARK_HISTORY_OPTS" -Dspark.ui.filters=org.jasig.cas.client.authentication.Saml11AuthenticationFilter,org.jasig.cas.client.validation.Saml11TicketValidationFilter,org.jasig.cas.client.util.HttpServletRequestWrapperFilter -Dspark.org.jasig.cas.client.authentication.Saml11AuthenticationFilter.params=casServerLoginUrl=https://9.91.11.120:8443/cas/login,serverName=http://9.91.11.171:18080 -Dspark.org.jasig.cas.client.validation.Saml11TicketValidationFilter.params=casServerUrlPrefix=https://9.91.11.120:8443/cas/,serverName=http://9.91.11.171:18080,hostnameVerifier=org.jasig.cas.client.ssl.AnyHostnameVerifier " > The ServletContextHandler of webui lacks a SessionManager > - > > Key: SPARK-3168 > URL: https://issues.apache.org/jira/browse/SPARK-3168 > Project: Spark > Issue Type: Bug > Components: Spark Core > Environment: CAS >Reporter: meiyoula > > When I use CAS to implement single sign-on for the web UI, an exception occurs: > {code} > WARN [qtp1076146544-24] / > org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:561) > java.lang.IllegalStateException: No SessionManager > at org.eclipse.jetty.server.Request.getSession(Request.java:1269) > at org.eclipse.jetty.server.Request.getSession(Request.java:1248) > at > org.jasig.cas.client.validation.AbstractTicketValidationFilter.doFilter(AbstractTicketValidationFilter.java:178) > at > org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1467) > at > org.jasig.cas.client.authentication.AuthenticationFilter.doFilter(AuthenticationFilter.java:116) > at > 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1467) > at > org.jasig.cas.client.session.SingleSignOutFilter.doFilter(SingleSignOutFilter.java:76) > at > org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1467) > at > org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:499) > at > org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1086) > at > org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:428) > at > org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1020) > at > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135) > at > org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255) > at > org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116) > at org.eclipse.jetty.server.Server.handle(Server.java:370) > at > org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:494) > at > org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:971) > at > org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1033) > at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:644) > at > org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235) > at > org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82) > at > org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:667) > at > org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:52) > at > org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608) > at > org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543) > at java.lang.Thread.run(Thread.java:744) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To 
unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3293) yarn's web show "SUCCEEDED" when the driver throw a exception in yarn-client
[ https://issues.apache.org/jira/browse/SPARK-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116973#comment-14116973 ] Guoqiang Li commented on SPARK-3293: Here is a related PR: https://github.com/apache/spark/pull/1788/files > yarn's web show "SUCCEEDED" when the driver throw a exception in yarn-client > > > Key: SPARK-3293 > URL: https://issues.apache.org/jira/browse/SPARK-3293 > Project: Spark > Issue Type: Bug > Components: YARN >Affects Versions: 1.0.2 >Reporter: wangfei > Fix For: 1.1.0 > > > If an exception occurs, YARN's web UI (Applications -> FinalStatus) will show > "SUCCEEDED" instead of the expected "FAILED". > In the spark-1.0.2 release, only yarn-client mode showed this. > But recently yarn-cluster mode has the problem as well. > To reproduce this: > create a new SparkContext and then throw an exception, > then watch the applications page on the YARN web UI. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3333) Large number of partitions causes OOM
[ https://issues.apache.org/jira/browse/SPARK-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116978#comment-14116978 ] Nicholas Chammas commented on SPARK-3333: - For the record, I got the OOM in a relatively short amount of time (less than 30 min) on a 1.1.0-rc2 EC2 cluster with 20 {{m1.xlarge}} slaves. Perhaps one of y'all can replicate the OOM with that kind of environment.
[jira] [Comment Edited] (SPARK-3333) Large number of partitions causes OOM
[ https://issues.apache.org/jira/browse/SPARK-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116978#comment-14116978 ] Nicholas Chammas edited comment on SPARK-3333 at 9/1/14 2:09 AM: - For the record, I got the OOM in [my original report|http://apache-spark-user-list.1001560.n3.nabble.com/PySpark-large-of-partitions-causes-OOM-td13155.html] (which I duplicated here in this JIRA) in a relatively short amount of time (less than 30 min) on a 1.1.0-rc2 EC2 cluster with 20 {{m1.xlarge}} slaves. Perhaps one of y'all can replicate the OOM with that kind of environment. was (Author: nchammas): For the record, I got the OOM in a relatively short amount of time (less than 30 min) on a 1.1.0-rc2 EC2 cluster with 20 {{m1.xlarge}} slaves. Perhaps one of y'all can replicate the OOM with that kind of environment.
[jira] [Commented] (SPARK-3333) Large number of partitions causes OOM
[ https://issues.apache.org/jira/browse/SPARK-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116985#comment-14116985 ] Josh Rosen commented on SPARK-3333: --- I'll resume work on this later tonight, but just wanted to note that things run fast as recently as commit 5ad5e34 and slow down as long ago as 6587ef7.
[jira] [Updated] (SPARK-3334) Spark causes mesos-master memory leak
[ https://issues.apache.org/jira/browse/SPARK-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Iven Hsu updated SPARK-3334: Description: The {{akkaFrameSize}} is set to {{Long.MaxValue}} in MesosBackend to workaround SPARK-1112, this causes all serialized task result is sent using Mesos TaskStatus. mesos-master stores TaskStatus in memory, and when running Spark, its memory grows very fast, and will be OOM killed. See MESOS-1746 for more. I've tryed to set {{akkaFrameSize}} to 0, mesos-master won't be killed, however, the driver will block after success unless I use {{sc.stop()}} to quit it manually. Not sure if it's related to SPARK-1112. was: The {{akkaFrameSize}} is set to {{Long.MaxValue}} in MesosBackend to workaround SPARK-1112, this causes all serialized task result is sent using Mesos TaskStatus. mesos-master stores TaskStatus in memory, and when running Spark, it's memory grows very fast, and will be OOM killed. See MESOS-1746 for more. I've tryed to set {{akkaFrameSize}} to 0, mesos-master won't be killed, however, the driver will block after success unless I use {{sc.stop()}} to quit it manually. Not sure if it's related to SPARK-1112. > Spark causes mesos-master memory leak > - > > Key: SPARK-3334 > URL: https://issues.apache.org/jira/browse/SPARK-3334 > Project: Spark > Issue Type: Bug > Components: Mesos >Affects Versions: 1.0.2 > Environment: Mesos 0.16.0/0.19.0 > CentOS 6.4 >Reporter: Iven Hsu > > The {{akkaFrameSize}} is set to {{Long.MaxValue}} in MesosBackend to > workaround SPARK-1112, this causes all serialized task result is sent using > Mesos TaskStatus. > mesos-master stores TaskStatus in memory, and when running Spark, its memory > grows very fast, and will be OOM killed. > See MESOS-1746 for more. > I've tryed to set {{akkaFrameSize}} to 0, mesos-master won't be killed, > however, the driver will block after success unless I use {{sc.stop()}} to > quit it manually. Not sure if it's related to SPARK-1112. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-3334) Spark causes mesos-master memory leak
Iven Hsu created SPARK-3334: --- Summary: Spark causes mesos-master memory leak Key: SPARK-3334 URL: https://issues.apache.org/jira/browse/SPARK-3334 Project: Spark Issue Type: Bug Components: Mesos Affects Versions: 1.0.2 Environment: Mesos 0.16.0/0.19.0 CentOS 6.4 Reporter: Iven Hsu The {{akkaFrameSize}} is set to {{Long.MaxValue}} in MesosBackend to work around SPARK-1112, which causes all serialized task results to be sent using Mesos TaskStatus. mesos-master stores TaskStatus in memory, and when running Spark, its memory grows very fast and it gets OOM killed. See MESOS-1746 for more. I've tried setting {{akkaFrameSize}} to 0; mesos-master won't be killed, but the driver will block after success unless I use {{sc.stop()}} to quit it manually. Not sure if it's related to SPARK-1112. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
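For context, whether a task result travels inline in a status message or indirectly through the block manager comes down to a size check against the frame limit; pinning the limit to Long.MaxValue forces every result down the inline path, which on Mesos means serialized results ride along in TaskStatus updates that mesos-master keeps in memory. An illustrative sketch of that decision, with hypothetical names that are not Spark's actual API:

```python
LONG_MAX = 2**63 - 1  # Long.MaxValue, the workaround value from SPARK-1112


def plan_result_delivery(result_size, frame_size):
    """Illustrative only: choose how a serialized task result is returned.

    'direct'   -> result bytes are inlined in the status message
    'indirect' -> result is stored (e.g. in the block manager) and only
                  a small reference is sent back
    """
    if result_size <= frame_size:
        return "direct"
    return "indirect"
```

With `frame_size = LONG_MAX`, `plan_result_delivery` always returns "direct" regardless of result size, so nothing ever takes the indirect path — matching the leak behavior described in the report.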
[jira] [Updated] (SPARK-3334) Spark causes mesos-master memory leak
[ https://issues.apache.org/jira/browse/SPARK-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Iven Hsu updated SPARK-3334: Description: The {{akkaFrameSize}} is set to {{Long.MaxValue}} in MesosBackend to workaround SPARK-1112, this causes all serialized task result is sent using Mesos TaskStatus. mesos-master stores TaskStatus in memory, and when running Spark, its memory grows very fast, and will be OOM killed. See MESOS-1746 for more. I've tried to set {{akkaFrameSize}} to 0, mesos-master won't be killed, however, the driver will block after success unless I use {{sc.stop()}} to quit it manually. Not sure if it's related to SPARK-1112. was: The {{akkaFrameSize}} is set to {{Long.MaxValue}} in MesosBackend to workaround SPARK-1112, this causes all serialized task result is sent using Mesos TaskStatus. mesos-master stores TaskStatus in memory, and when running Spark, its memory grows very fast, and will be OOM killed. See MESOS-1746 for more. I've tryed to set {{akkaFrameSize}} to 0, mesos-master won't be killed, however, the driver will block after success unless I use {{sc.stop()}} to quit it manually. Not sure if it's related to SPARK-1112. > Spark causes mesos-master memory leak > - > > Key: SPARK-3334 > URL: https://issues.apache.org/jira/browse/SPARK-3334 > Project: Spark > Issue Type: Bug > Components: Mesos >Affects Versions: 1.0.2 > Environment: Mesos 0.16.0/0.19.0 > CentOS 6.4 >Reporter: Iven Hsu > > The {{akkaFrameSize}} is set to {{Long.MaxValue}} in MesosBackend to > workaround SPARK-1112, this causes all serialized task result is sent using > Mesos TaskStatus. > mesos-master stores TaskStatus in memory, and when running Spark, its memory > grows very fast, and will be OOM killed. > See MESOS-1746 for more. > I've tried to set {{akkaFrameSize}} to 0, mesos-master won't be killed, > however, the driver will block after success unless I use {{sc.stop()}} to > quit it manually. Not sure if it's related to SPARK-1112. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-3324) YARN module has nonstandard structure which cause compile error In IntelliJ
[ https://issues.apache.org/jira/browse/SPARK-3324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell reassigned SPARK-3324: -- Assignee: Patrick Wendell > YARN module has nonstandard structure which cause compile error In IntelliJ > --- > > Key: SPARK-3324 > URL: https://issues.apache.org/jira/browse/SPARK-3324 > Project: Spark > Issue Type: Bug > Components: YARN >Affects Versions: 1.1.0 > Environment: Mac OS: 10.9.4 > IntelliJ IDEA: 13.1.4 > Scala Plugins: 0.41.2 > Maven: 3.0.5 >Reporter: Yi Tian >Assignee: Patrick Wendell >Priority: Minor > Labels: intellij, maven, yarn
[jira] [Commented] (SPARK-3333) Large number of partitions causes OOM
[ https://issues.apache.org/jira/browse/SPARK-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14117016#comment-14117016 ] Josh Rosen commented on SPARK-3333: --- It looks like https://github.com/apache/spark/pull/1138 may be the culprit, since this job runs quickly immediately prior to that commit. > Large number of partitions causes OOM > - > > Key: SPARK-3333 > URL: https://issues.apache.org/jira/browse/SPARK-3333 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 1.1.0 > Environment: EC2; 1 master; 1 slave; {{m3.xlarge}} instances >Reporter: Nicholas Chammas > > Here's a repro for PySpark: > {code} > a = sc.parallelize(["Nick", "John", "Bob"]) > a = a.repartition(24000) > a.keyBy(lambda x: len(x)).reduceByKey(lambda x,y: x + y).take(1) > {code} > This code runs fine on 1.0.2. It returns the following result in just over a > minute: > {code} > [(4, 'NickJohn')] > {code} > However, when I try this with 1.1.0-rc3 on an identically-sized cluster, it > runs for a very, very long time (at least > 45 min) and then fails with > {{java.lang.OutOfMemoryError: Java heap space}}. 
> Here is a stack trace taken from a run on 1.1.0-rc2: > {code} > >>> a = sc.parallelize(["Nick", "John", "Bob"]) > >>> a = a.repartition(24000) > >>> a.keyBy(lambda x: len(x)).reduceByKey(lambda x,y: x + y).take(1) > 14/08/29 21:53:40 WARN BlockManagerMasterActor: Removing BlockManager > BlockManagerId(0, ip-10-138-29-167.ec2.internal, 46252, 0) with no recent > heart beats: 175143ms exceeds 45000ms > 14/08/29 21:53:50 WARN BlockManagerMasterActor: Removing BlockManager > BlockManagerId(10, ip-10-138-18-106.ec2.internal, 33711, 0) with no recent > heart beats: 175359ms exceeds 45000ms > 14/08/29 21:54:02 WARN BlockManagerMasterActor: Removing BlockManager > BlockManagerId(19, ip-10-139-36-207.ec2.internal, 52208, 0) with no recent > heart beats: 173061ms exceeds 45000ms > 14/08/29 21:54:13 WARN BlockManagerMasterActor: Removing BlockManager > BlockManagerId(5, ip-10-73-142-70.ec2.internal, 56162, 0) with no recent > heart beats: 176816ms exceeds 45000ms > 14/08/29 21:54:22 WARN BlockManagerMasterActor: Removing BlockManager > BlockManagerId(7, ip-10-236-145-200.ec2.internal, 40959, 0) with no recent > heart beats: 182241ms exceeds 45000ms > 14/08/29 21:54:40 WARN BlockManagerMasterActor: Removing BlockManager > BlockManagerId(4, ip-10-139-1-195.ec2.internal, 49221, 0) with no recent > heart beats: 178406ms exceeds 45000ms > 14/08/29 21:54:41 ERROR Utils: Uncaught exception in thread Result resolver > thread-3 > java.lang.OutOfMemoryError: Java heap space > at com.esotericsoftware.kryo.io.Input.readBytes(Input.java:296) > at > com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.read(DefaultArraySerializers.java:35) > at > com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.read(DefaultArraySerializers.java:18) > at com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:699) > at > com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:611) > at > 
com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221) > at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729) > at > org.apache.spark.serializer.KryoSerializerInstance.deserialize(KryoSerializer.scala:162) > at org.apache.spark.scheduler.DirectTaskResult.value(TaskResult.scala:79) > at > org.apache.spark.scheduler.TaskSetManager.handleSuccessfulTask(TaskSetManager.scala:514) > at > org.apache.spark.scheduler.TaskSchedulerImpl.handleSuccessfulTask(TaskSchedulerImpl.scala:355) > at > org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:68) > at > org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47) > at > org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47) > at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1311) > at > org.apache.spark.scheduler.TaskResultGetter$$anon$2.run(TaskResultGetter.scala:46) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > Exception in thread "Result resolver thread-3" 14/08/29 21:56:26 ERROR > SendingConnection: Exception while reading SendingConnection to > ConnectionManagerId(ip-10-73-142-223.ec2.internal,54014) > java.nio.channels.ClosedChannelException > at sun.nio.ch.SocketChannelImpl.ensureReadOpen(SocketChannelImpl.java:252) > at sun.nio.ch.SocketChannelI
[jira] [Commented] (SPARK-3333) Large number of partitions causes OOM
[ https://issues.apache.org/jira/browse/SPARK-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14117019#comment-14117019 ] Davies Liu commented on SPARK-3333: --- @joserosen This should not be the culprit; it just surfaces the problem in PySpark. Before that change, the default number of partitions for reduceByKey() could be much smaller, such as 4. The root cause should be inside the Scala code, so you should test with the Scala API.
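Davies' point is that the partition count, not the PR itself, drives the blow-up; in real PySpark the mitigation is to pass an explicit count as the second argument, e.g. {{reduceByKey(lambda x, y: x + y, 4)}}. A pure-Python sketch of those semantics (hash-partition, then reduce per partition), so the behavior can be seen without a cluster; `reduce_by_key` is a hypothetical stand-in, not Spark code:

```python
from collections import defaultdict

def reduce_by_key(pairs, func, num_partitions):
    """Sketch of reduceByKey(func, numPartitions): hash-partition the
    keys, then reduce values within each partition."""
    partitions = [defaultdict(list) for _ in range(num_partitions)]
    for k, v in pairs:
        partitions[hash(k) % num_partitions][k].append(v)
    out = []
    for part in partitions:
        for k, vs in part.items():
            acc = vs[0]
            for v in vs[1:]:
                acc = func(acc, v)
            out.append((k, acc))
    return out

# keyBy(len) over the repro's data, reduced into an explicit 4 partitions
data = [(len(x), x) for x in ["Nick", "John", "Bob"]]
print(sorted(reduce_by_key(data, lambda x, y: x + y, 4)))
```

The per-partition bookkeeping here is why a huge `num_partitions` costs memory even when most partitions end up empty.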
[jira] [Commented] (SPARK-3333) Large number of partitions causes OOM
[ https://issues.apache.org/jira/browse/SPARK-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14117021#comment-14117021 ] Josh Rosen commented on SPARK-3333: --- Good point. I guess "culprit" was the wrong word, but that commit helps to narrow down the problem.
[jira] [Resolved] (SPARK-2536) Update the MLlib page of Spark website
[ https://issues.apache.org/jira/browse/SPARK-2536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng resolved SPARK-2536. -- Resolution: Done > Update the MLlib page of Spark website > -- > > Key: SPARK-2536 > URL: https://issues.apache.org/jira/browse/SPARK-2536 > Project: Spark > Issue Type: Sub-task > Components: Documentation, MLlib >Reporter: Xiangrui Meng >Assignee: Xiangrui Meng > > It still shows v0.9.
[jira] [Resolved] (SPARK-3205) input format for text records saved with in-record delimiter and newline characters escaped
[ https://issues.apache.org/jira/browse/SPARK-3205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng resolved SPARK-3205. -- Resolution: Later Moved the implementation to https://github.com/mengxr/redshift-input-format. If people feel this input format is common, we can move it to Spark Core later. > input format for text records saved with in-record delimiter and newline > characters escaped > --- > > Key: SPARK-3205 > URL: https://issues.apache.org/jira/browse/SPARK-3205 > Project: Spark > Issue Type: New Feature > Components: Spark Core, SQL >Reporter: Xiangrui Meng >Assignee: Xiangrui Meng > > Text records may contain in-record delimiters or newline characters. In such > cases, we can either encode them or escape them. The latter is simpler and is > used by Redshift's UNLOAD with the ESCAPE option. The problem is that a > record can then span multiple lines, so we need an input format for it.
[jira] [Commented] (SPARK-3090) Avoid not stopping SparkContext with YARN Client mode
[ https://issues.apache.org/jira/browse/SPARK-3090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14117036#comment-14117036 ] Kousuke Saruta commented on SPARK-3090: --- It sounds like a good idea for SparkContext to register the shutdown hook itself. > Avoid not stopping SparkContext with YARN Client mode > -- > > Key: SPARK-3090 > URL: https://issues.apache.org/jira/browse/SPARK-3090 > Project: Spark > Issue Type: Bug > Components: Spark Core, YARN >Affects Versions: 1.1.0 >Reporter: Kousuke Saruta > > When we use YARN Cluster mode, ApplicationMaster registers a shutdown hook > that stops the SparkContext. > Thanks to this, the SparkContext can be stopped even if the application forgets > to stop it itself. > But, unfortunately, YARN Client mode doesn't have such a mechanism.
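The idea above, having the context register its own shutdown hook, can be sketched in miniature with Python's {{atexit}}; `FakeContext` is a hypothetical stand-in for SparkContext, not Spark's actual implementation:

```python
import atexit

class FakeContext:
    """Hypothetical stand-in for SparkContext: registers its own
    shutdown hook so stop() runs even if the application forgets."""
    def __init__(self):
        self.stopped = False
        atexit.register(self.stop)  # the proposed self-registered hook

    def stop(self):
        if not self.stopped:
            self.stopped = True  # release resources exactly once

ctx = FakeContext()
# the application exits without calling ctx.stop(); the hook covers it.
# atexit._run_exitfuncs() forces the hooks to fire now, for demonstration
# (normally the interpreter calls them at exit).
atexit._run_exitfuncs()
print(ctx.stopped)  # True
```

Making {{stop()}} idempotent, as sketched, matters because a well-behaved application may still call it explicitly before the hook fires.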
[jira] [Commented] (SPARK-3333) Large number of partitions causes OOM
[ https://issues.apache.org/jira/browse/SPARK-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14117062#comment-14117062 ] Josh Rosen commented on SPARK-3333: --- [~shivaram] and I discussed this; we have a few ideas about what might be happening. I tried running {{sc.parallelize(1 to 10).repartition(24000).keyBy(x=>x).reduceByKey(_+_).collect()}} in 1.0.2 and observed speed similar to what I saw in the current 1.1.0 release candidate. When I modified my job to use fewer reducers ({{reduceByKey(_+_, 4)}}), the job completed quickly. You can see similar behavior in Python by explicitly specifying a smaller number of reducers. I think the issue here is that the overhead of sending and processing task completions is O(numReducers). Specifically, the uncompressed size of ShuffleMapTask results is roughly O(numReducers), and there's an O(numReducers) processing cost for task completions within DAGScheduler (since mapOutputLocations is O(numReducers)). This normally isn't a problem, but it can impact performance for jobs with large numbers of extremely small map tasks (like this job, where nearly all of the map tasks are effectively no-ops). For larger tasks, this cost should be masked by larger overheads (such as task processing time). I'm not sure where the OOM is coming from, but the slow performance that you're observing here is probably due to the new default number of reducers (https://github.com/apache/spark/pull/1138 exposed this in Python by changing its defaults to match Scala Spark). As a result, I'm not sure that this is a regression from 1.0.2, since it behaves similarly for Scala jobs. I think we already do some compression of the task results, and there are probably other improvements that we can make to lower these overheads, but I think we should postpone that to 1.2.0. 
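The O(numMaps × numReducers) cost Josh describes can be checked with back-of-envelope arithmetic; the one-byte-per-entry figure below is an assumption for illustration, not Spark's actual map-status encoding:

```python
# Each of the numMaps ShuffleMapTasks reports one entry per reducer,
# so the driver handles roughly numMaps * numReducers entries in total.
num_maps = 24000       # repartition(24000) produces this many map tasks
num_reducers = 24000   # default reducer count now matches it (post-#1138)
bytes_per_entry = 1    # assumed size of one per-reducer entry (illustrative)

total = num_maps * num_reducers * bytes_per_entry
print(total)  # 576000000 entries, hundreds of MB even at 1 byte each

# With only 4 reducers, the same job's bookkeeping shrinks 6000-fold:
print(num_maps * 4 * bytes_per_entry)  # 96000
```

Even under this deliberately tiny per-entry assumption, the quadratic blow-up makes a driver-side OOM plausible at 24000 × 24000.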
[jira] [Updated] (SPARK-3330) Successive test runs with different profiles fail SparkSubmitSuite
[ https://issues.apache.org/jira/browse/SPARK-3330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prashant Sharma updated SPARK-3330: --- Assignee: Sean Owen > Successive test runs with different profiles fail SparkSubmitSuite > -- > > Key: SPARK-3330 > URL: https://issues.apache.org/jira/browse/SPARK-3330 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 1.0.2 >Reporter: Sean Owen >Assignee: Sean Owen > > Maven-based Jenkins builds have been failing for a while: > https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-Master-Maven-with-YARN/480/HADOOP_PROFILE=hadoop-2.4,label=centos/console > One common cause is that on the second and subsequent runs of "mvn clean > test", at least two assembly JARs will exist in assembly/target. Because > assembly is not a submodule of parent, "mvn clean" is not invoked for > assembly. The presence of two assembly jars causes spark-submit to fail.
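The failure mode above, multiple assembly JARs left behind by profile switches, is detectable with a simple glob; this sketch flags all but the newest jar, with the `spark-assembly*.jar` pattern assumed from the report rather than taken from Spark's build scripts:

```python
import glob
import os
import tempfile

def stale_assemblies(target_dir):
    """Return assembly jars other than the most recently built one.
    The path pattern is an assumption modeled on assembly/target."""
    jars = sorted(glob.glob(os.path.join(target_dir, "spark-assembly*.jar")),
                  key=os.path.getmtime)
    return jars[:-1]  # everything but the newest build

# demo with a throwaway directory standing in for assembly/target,
# holding jars from two different Hadoop profiles
d = tempfile.mkdtemp()
for name in ("spark-assembly-1.1.0-hadoop2.4.0.jar",
             "spark-assembly-1.1.0-hadoop1.0.4.jar"):
    open(os.path.join(d, name), "w").close()
print(len(stale_assemblies(d)))  # 1 leftover jar flagged
```

A check like this could run before SparkSubmitSuite to fail fast with a clear message instead of letting spark-submit trip over the ambiguity.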
[jira] [Updated] (SPARK-3330) Successive test runs with different profiles fail SparkSubmitSuite
[ https://issues.apache.org/jira/browse/SPARK-3330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prashant Sharma updated SPARK-3330: --- Assignee: (was: Sean Owen)