[jira] [Commented] (SPARK-3324) YARN module has nonstandard structure which cause compile error In IntelliJ

2014-08-31 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116667#comment-14116667
 ] 

Sean Owen commented on SPARK-3324:
--

Let me try to sketch what's funky about the structure. We have yarn/alpha, 
yarn/common, and yarn/stable. Given their purpose, I would expect each to be a 
module with its own src/ directory, with alpha and stable depending on common, 
and with the Spark parent activating either yarn/alpha or yarn/stable 
depending on profiles. IntelliJ is fine with that.

However, what we have is that yarn/ is a module, but its source is in 
yarn/common, and it is a pom-only module. yarn/alpha and yarn/stable list it 
as their parent and inherit all of their source directory info and 
dependencies from yarn/, which is not itself a code module. So each compiles 
two source directories defined in different places. This, plus profiles, 
confused IntelliJ and required manual intervention.

Maybe I'm overlooking a reason this had to be done, but rejiggering this into 
three simple modules should work.
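
For illustration, here is roughly what that three-module layout could look 
like (my own sketch of the suggestion above, not a settled design):

{code}
${SPARK_HOME}
  |--yarn
     |--common  (module with its own src/; code shared by both versions)
     |--alpha   (module with its own src/; depends on common)
     |--stable  (module with its own src/; depends on common)
{code}

The parent would activate either yarn/alpha or yarn/stable via the existing 
profiles, and each module would compile only the one src/ directory it owns.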
Again, I imagine the question is whether it's worth it versus removing 
yarn/alpha at some point in the future, since it's trivial to fix how IntelliJ 
reads the POMs once by hand in the IDE.

> YARN module has nonstandard structure which cause compile error In IntelliJ
> ---
>
> Key: SPARK-3324
> URL: https://issues.apache.org/jira/browse/SPARK-3324
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 1.1.0
> Environment: Mac OS: 10.9.4
> IntelliJ IDEA: 13.1.4
> Scala Plugins: 0.41.2
> Maven: 3.0.5
>Reporter: Yi Tian
>Priority: Minor
>  Labels: intellij, maven, yarn
>
> The YARN module has nonstandard path structure like:
> {code}
> ${SPARK_HOME}
>   |--yarn
>  |--alpha (contains yarn api support for 0.23 and 2.0.x)
>  |--stable (contains yarn api support for 2.2 and later)
>  | |--pom.xml (spark-yarn)
>  |--common (Common codes not depending on specific version of Hadoop)
>  |--pom.xml (yarn-parent)
> {code}
> When we use maven to compile the yarn module, maven will import the 'alpha' 
> or 'stable' module according to the profile setting.
> A submodule like 'stable' uses the build properties defined in yarn/pom.xml 
> to import the common code into its sourcePath.
> As a result, IntelliJ can't directly recognize the sources in the common 
> directory as a sourcePath. 
> I think we should change the yarn module to a unified maven jar project and 
> specify the different versions of the yarn api via maven profile settings.
> That would resolve the compile error in IntelliJ and make the yarn module 
> simpler and clearer.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-3328) --with-tachyon build is broken

2014-08-31 Thread Elijah Epifanov (JIRA)
Elijah Epifanov created SPARK-3328:
--

 Summary: --with-tachyon build is broken
 Key: SPARK-3328
 URL: https://issues.apache.org/jira/browse/SPARK-3328
 Project: Spark
  Issue Type: Bug
  Components: Build
Affects Versions: 1.1.0
Reporter: Elijah Epifanov


cp: tachyon-0.5.0/core/src/main/java/tachyon/web/resources: No such file or 
directory




--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-3328) --with-tachyon build is broken

2014-08-31 Thread Elijah Epifanov (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elijah Epifanov updated SPARK-3328:
---

Description: 
cp: tachyon-0.5.0/target/tachyon-0.5.0-jar-with-dependencies.jar: No such file 
or directory


  was:
cp: tachyon-0.5.0/core/src/main/java/tachyon/web/resources: No such file or 
directory



> --with-tachyon build is broken
> --
>
> Key: SPARK-3328
> URL: https://issues.apache.org/jira/browse/SPARK-3328
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 1.1.0
>Reporter: Elijah Epifanov
>
> cp: tachyon-0.5.0/target/tachyon-0.5.0-jar-with-dependencies.jar: No such 
> file or directory



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3324) YARN module has nonstandard structure which cause compile error In IntelliJ

2014-08-31 Thread Yi Tian (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116782#comment-14116782
 ] 

Yi Tian commented on SPARK-3324:


Thanks [~srowen] for explaining that. 
My initial idea was to remove the pom.xml from yarn/stable and yarn/alpha, and 
to change the packaging property in yarn/pom.xml from pom to jar. 
When building this module, the pom.xml would dynamically add common/src and 
either yarn/alpha/src or yarn/stable/src to the sourcePath. 

> YARN module has nonstandard structure which cause compile error In IntelliJ
> ---
>
> Key: SPARK-3324
> URL: https://issues.apache.org/jira/browse/SPARK-3324
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 1.1.0
> Environment: Mac OS: 10.9.4
> IntelliJ IDEA: 13.1.4
> Scala Plugins: 0.41.2
> Maven: 3.0.5
>Reporter: Yi Tian
>Priority: Minor
>  Labels: intellij, maven, yarn
>
> The YARN module has nonstandard path structure like:
> {code}
> ${SPARK_HOME}
>   |--yarn
>  |--alpha (contains yarn api support for 0.23 and 2.0.x)
>  |--stable (contains yarn api support for 2.2 and later)
>  | |--pom.xml (spark-yarn)
>  |--common (Common codes not depending on specific version of Hadoop)
>  |--pom.xml (yarn-parent)
> {code}
> When we use maven to compile the yarn module, maven will import the 'alpha' 
> or 'stable' module according to the profile setting.
> A submodule like 'stable' uses the build properties defined in yarn/pom.xml 
> to import the common code into its sourcePath.
> As a result, IntelliJ can't directly recognize the sources in the common 
> directory as a sourcePath. 
> I think we should change the yarn module to a unified maven jar project and 
> specify the different versions of the yarn api via maven profile settings.
> That would resolve the compile error in IntelliJ and make the yarn module 
> simpler and clearer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3324) YARN module has nonstandard structure which cause compile error In IntelliJ

2014-08-31 Thread Yi Tian (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116785#comment-14116785
 ] 

Yi Tian commented on SPARK-3324:


BTW, [~pwendell], there is another problem for IntelliJ. 
The spark-streaming-flume-sink module needs the avro plugin to compile some 
avro files, but the outputDirectory for the generated scala sources is under 
the target path, which means IntelliJ can't recognize them and throws errors 
while compiling the spark project.

May I make a PR to fix these problems?

> YARN module has nonstandard structure which cause compile error In IntelliJ
> ---
>
> Key: SPARK-3324
> URL: https://issues.apache.org/jira/browse/SPARK-3324
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 1.1.0
> Environment: Mac OS: 10.9.4
> IntelliJ IDEA: 13.1.4
> Scala Plugins: 0.41.2
> Maven: 3.0.5
>Reporter: Yi Tian
>Priority: Minor
>  Labels: intellij, maven, yarn
>
> The YARN module has nonstandard path structure like:
> {code}
> ${SPARK_HOME}
>   |--yarn
>  |--alpha (contains yarn api support for 0.23 and 2.0.x)
>  |--stable (contains yarn api support for 2.2 and later)
>  | |--pom.xml (spark-yarn)
>  |--common (Common codes not depending on specific version of Hadoop)
>  |--pom.xml (yarn-parent)
> {code}
> When we use maven to compile the yarn module, maven will import the 'alpha' 
> or 'stable' module according to the profile setting.
> A submodule like 'stable' uses the build properties defined in yarn/pom.xml 
> to import the common code into its sourcePath.
> As a result, IntelliJ can't directly recognize the sources in the common 
> directory as a sourcePath. 
> I think we should change the yarn module to a unified maven jar project and 
> specify the different versions of the yarn api via maven profile settings.
> That would resolve the compile error in IntelliJ and make the yarn module 
> simpler and clearer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3324) YARN module has nonstandard structure which cause compile error In IntelliJ

2014-08-31 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116801#comment-14116801
 ] 

Sean Owen commented on SPARK-3324:
--

[~tianyi] I seem to remember having a similar problem. I think that is 
straightforward to fix. It's a separate issue. But FWIW I would like to see 
that improved.

> YARN module has nonstandard structure which cause compile error In IntelliJ
> ---
>
> Key: SPARK-3324
> URL: https://issues.apache.org/jira/browse/SPARK-3324
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 1.1.0
> Environment: Mac OS: 10.9.4
> IntelliJ IDEA: 13.1.4
> Scala Plugins: 0.41.2
> Maven: 3.0.5
>Reporter: Yi Tian
>Priority: Minor
>  Labels: intellij, maven, yarn
>
> The YARN module has nonstandard path structure like:
> {code}
> ${SPARK_HOME}
>   |--yarn
>  |--alpha (contains yarn api support for 0.23 and 2.0.x)
>  |--stable (contains yarn api support for 2.2 and later)
>  | |--pom.xml (spark-yarn)
>  |--common (Common codes not depending on specific version of Hadoop)
>  |--pom.xml (yarn-parent)
> {code}
> When we use maven to compile the yarn module, maven will import the 'alpha' 
> or 'stable' module according to the profile setting.
> A submodule like 'stable' uses the build properties defined in yarn/pom.xml 
> to import the common code into its sourcePath.
> As a result, IntelliJ can't directly recognize the sources in the common 
> directory as a sourcePath. 
> I think we should change the yarn module to a unified maven jar project and 
> specify the different versions of the yarn api via maven profile settings.
> That would resolve the compile error in IntelliJ and make the yarn module 
> simpler and clearer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-3329) HiveQuerySuite SET tests depend on map orderings

2014-08-31 Thread William Benton (JIRA)
William Benton created SPARK-3329:
-

 Summary: HiveQuerySuite SET tests depend on map orderings
 Key: SPARK-3329
 URL: https://issues.apache.org/jira/browse/SPARK-3329
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.0.2, 1.1.0
Reporter: William Benton
Priority: Trivial


The SET tests in HiveQuerySuite that return multiple values depend on the 
ordering in which map pairs are returned from Hive and can fail spuriously if 
this changes due to environment or library changes.
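
One way to make such tests robust is to compare the returned "key=value" pairs 
order-insensitively. A minimal sketch of the idea (a hypothetical helper, not 
the actual HiveQuerySuite code):

{code}
// Compares SET command output without depending on map iteration order.
// `expected` and `actual` are the "key=value" strings a SET query returns.
def assertSetResultsMatch(expected: Seq[String], actual: Seq[String]): Unit = {
  // Converting to Set discards the unspecified ordering of map pairs.
  assert(expected.toSet == actual.toSet,
    s"SET output mismatch: expected ${expected.toSet}, got ${actual.toSet}")
}
{code}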



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3329) HiveQuerySuite SET tests depend on map orderings

2014-08-31 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116821#comment-14116821
 ] 

Apache Spark commented on SPARK-3329:
-

User 'willb' has created a pull request for this issue:
https://github.com/apache/spark/pull/2220

> HiveQuerySuite SET tests depend on map orderings
> 
>
> Key: SPARK-3329
> URL: https://issues.apache.org/jira/browse/SPARK-3329
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.0.2, 1.1.0
>Reporter: William Benton
>Priority: Trivial
>
> The SET tests in HiveQuerySuite that return multiple values depend on the 
> ordering in which map pairs are returned from Hive and can fail spuriously if 
> this changes due to environment or library changes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-3330) Successive test runs with different profiles fail SparkSubmitSuite

2014-08-31 Thread Sean Owen (JIRA)
Sean Owen created SPARK-3330:


 Summary: Successive test runs with different profiles fail 
SparkSubmitSuite
 Key: SPARK-3330
 URL: https://issues.apache.org/jira/browse/SPARK-3330
 Project: Spark
  Issue Type: Bug
  Components: Build
Affects Versions: 1.0.2
Reporter: Sean Owen


Maven-based Jenkins builds have been failing for a while:
https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-Master-Maven-with-YARN/480/HADOOP_PROFILE=hadoop-2.4,label=centos/console

One common cause is that on the second and subsequent runs of "mvn clean test", 
at least two assembly JARs will exist in assembly/target. Because assembly is 
not a submodule of parent, "mvn clean" is not invoked for assembly. The 
presence of two assembly jars causes spark-submit to fail.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-3331) PEP8 tests fail in release because they check unzipped py4j code

2014-08-31 Thread Sean Owen (JIRA)
Sean Owen created SPARK-3331:


 Summary: PEP8 tests fail in release because they check unzipped 
py4j code
 Key: SPARK-3331
 URL: https://issues.apache.org/jira/browse/SPARK-3331
 Project: Spark
  Issue Type: Bug
  Components: Build
Affects Versions: 1.0.2
Reporter: Sean Owen
Priority: Minor


PEP8 tests run on files under "./python", but in the release packaging, py4j 
code is present in "./python/build/py4j". The Py4J code fails style checks, 
and thus the release now fails ./dev/run-tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-3331) PEP8 tests fail because they check unzipped py4j code

2014-08-31 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-3331:
-
Summary: PEP8 tests fail because they check unzipped py4j code  (was: PEP8 
tests fail in release because they check unzipped py4j code)

> PEP8 tests fail because they check unzipped py4j code
> -
>
> Key: SPARK-3331
> URL: https://issues.apache.org/jira/browse/SPARK-3331
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 1.0.2
>Reporter: Sean Owen
>Priority: Minor
>
> PEP8 tests run on files under "./python", but in the release packaging, py4j 
> code is present in "./python/build/py4j". The Py4J code fails style checks, 
> and thus the release now fails ./dev/run-tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-3331) PEP8 tests fail because they check unzipped py4j code

2014-08-31 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-3331:
-
Description: PEP8 tests run on files under "./python", but unzipped py4j 
code is found at "./python/build/py4j". Py4J code fails style checks and can 
fail ./dev/run-tests if this code is present locally.  (was: PEP8 tests run on 
files under "./python", but in the release packaging, py4j code is present in 
"./python/build/py4j". The Py4J code fails style checks, and thus the release 
now fails ./dev/run-tests.)

> PEP8 tests fail because they check unzipped py4j code
> -
>
> Key: SPARK-3331
> URL: https://issues.apache.org/jira/browse/SPARK-3331
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 1.0.2
>Reporter: Sean Owen
>Priority: Minor
>
> PEP8 tests run on files under "./python", but unzipped py4j code is found at 
> "./python/build/py4j". Py4J code fails style checks and can fail 
> ./dev/run-tests if this code is present locally.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3330) Successive test runs with different profiles fail SparkSubmitSuite

2014-08-31 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116829#comment-14116829
 ] 

Apache Spark commented on SPARK-3330:
-

User 'srowen' has created a pull request for this issue:
https://github.com/apache/spark/pull/2221

> Successive test runs with different profiles fail SparkSubmitSuite
> --
>
> Key: SPARK-3330
> URL: https://issues.apache.org/jira/browse/SPARK-3330
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 1.0.2
>Reporter: Sean Owen
>
> Maven-based Jenkins builds have been failing for a while:
> https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-Master-Maven-with-YARN/480/HADOOP_PROFILE=hadoop-2.4,label=centos/console
> One common cause is that on the second and subsequent runs of "mvn clean 
> test", at least two assembly JARs will exist in assembly/target. Because 
> assembly is not a submodule of parent, "mvn clean" is not invoked for 
> assembly. The presence of two assembly jars causes spark-submit to fail.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3331) PEP8 tests fail because they check unzipped py4j code

2014-08-31 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116830#comment-14116830
 ] 

Apache Spark commented on SPARK-3331:
-

User 'srowen' has created a pull request for this issue:
https://github.com/apache/spark/pull/

> PEP8 tests fail because they check unzipped py4j code
> -
>
> Key: SPARK-3331
> URL: https://issues.apache.org/jira/browse/SPARK-3331
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 1.0.2
>Reporter: Sean Owen
>Priority: Minor
>
> PEP8 tests run on files under "./python", but unzipped py4j code is found at 
> "./python/build/py4j". Py4J code fails style checks and can fail 
> ./dev/run-tests if this code is present locally.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-2870) Thorough schema inference directly on RDDs of Python dictionaries

2014-08-31 Thread Nicholas Chammas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116868#comment-14116868
 ] 

Nicholas Chammas commented on SPARK-2870:
-

[~marmbrus], [~davies], [~yhuai] - We discussed this feature on the user list 
some weeks back. Just pinging you here to make sure this feature request is on 
your radar.

> Thorough schema inference directly on RDDs of Python dictionaries
> -
>
> Key: SPARK-2870
> URL: https://issues.apache.org/jira/browse/SPARK-2870
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, SQL
>Reporter: Nicholas Chammas
>
> h4. Background
> I love the {{SQLContext.jsonRDD()}} and {{SQLContext.jsonFile()}} methods. 
> They process JSON text directly and infer a schema that covers the entire 
> source data set. 
> This is very important with semi-structured data like JSON since individual 
> elements in the data set are free to have different structures. Matching 
> fields across elements may even have different value types.
> For example:
> {code}
> {"a": 5}
> {"a": "cow"}
> {code}
> To get a queryable schema that covers the whole data set, you need to infer a 
> schema by looking at the whole data set. The aforementioned 
> {{SQLContext.json...()}} methods do this very well. 
> h4. Feature Request
> What we need is for {{SQLContext.inferSchema()}} to do this, too. 
> Alternatively, we need a new {{SQLContext}} method that works on RDDs of 
> Python dictionaries and does something functionally equivalent to this:
> {code}
> SQLContext.jsonRDD(RDD[dict].map(lambda x: json.dumps(x)))
> {code}
> As of 1.0.2, 
> [{{inferSchema()}}|http://spark.apache.org/docs/latest/api/python/pyspark.sql.SQLContext-class.html#inferSchema]
>  just looks at the first element in the data set. This won't help much when 
> the structure of the elements in the target RDD is variable.
> h4. Example Use Case
> * You have some JSON text data that you want to analyze using Spark SQL. 
> * You would use one of the {{SQLContext.json...()}} methods, but you need to 
> do some filtering on the data first to remove bad elements--basically, some 
> minimal schema validation.
> * You deserialize the JSON objects to Python {{dict}} s and filter out the 
> bad ones. You now have an RDD of dictionaries.
> * From this RDD, you want a SchemaRDD that captures the schema for the whole 
> data set.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-3332) Tags shouldn't be the main strategy for machine membership on clusters

2014-08-31 Thread Allan Douglas R. de Oliveira (JIRA)
Allan Douglas R. de Oliveira created SPARK-3332:
---

 Summary: Tags shouldn't be the main strategy for machine 
membership on clusters
 Key: SPARK-3332
 URL: https://issues.apache.org/jira/browse/SPARK-3332
 Project: Spark
  Issue Type: Bug
  Components: EC2
Reporter: Allan Douglas R. de Oliveira


The implementation for SPARK-2333 changed the machine membership mechanism from 
security groups to tags.

This is a fundamentally flawed strategy, as there are no guarantees at all 
that the machines will have a tag (even with a retry mechanism).

For instance, if the script is killed after launching the instances but before 
setting the tags, the machines will be "invisible" to a destroy command, 
leaving an unmanageable cluster behind.

The initial proposal is to go back to the previous behavior in all cases 
except when the new flag (--security-group-prefix) is used.

It's also worth mentioning that SPARK-3180 introduced the 
--additional-security-group flag, which is a reasonable solution to SPARK-2333 
(but isn't a full replacement for all use cases of --security-group-prefix).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3332) Tags shouldn't be the main strategy for machine membership on clusters

2014-08-31 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116890#comment-14116890
 ] 

Apache Spark commented on SPARK-3332:
-

User 'douglaz' has created a pull request for this issue:
https://github.com/apache/spark/pull/2223

> Tags shouldn't be the main strategy for machine membership on clusters
> --
>
> Key: SPARK-3332
> URL: https://issues.apache.org/jira/browse/SPARK-3332
> Project: Spark
>  Issue Type: Bug
>  Components: EC2
>Reporter: Allan Douglas R. de Oliveira
>
> The implementation for SPARK-2333 changed the machine membership mechanism 
> from security groups to tags.
> This is a fundamentally flawed strategy, as there are no guarantees at all 
> that the machines will have a tag (even with a retry mechanism).
> For instance, if the script is killed after launching the instances but 
> before setting the tags, the machines will be "invisible" to a destroy 
> command, leaving an unmanageable cluster behind.
> The initial proposal is to go back to the previous behavior in all cases 
> except when the new flag (--security-group-prefix) is used.
> It's also worth mentioning that SPARK-3180 introduced the 
> --additional-security-group flag, which is a reasonable solution to SPARK-2333 
> (but isn't a full replacement for all use cases of --security-group-prefix).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-3010) fix redundant conditional

2014-08-31 Thread Matei Zaharia (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matei Zaharia updated SPARK-3010:
-
Assignee: wangfei

> fix redundant conditional
> -
>
> Key: SPARK-3010
> URL: https://issues.apache.org/jira/browse/SPARK-3010
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.0.2
>Reporter: wangfei
>Assignee: wangfei
> Fix For: 1.1.0
>
>
> there are some redundant conditionals in Spark, such as 
> 1.
> private[spark] def codegenEnabled: Boolean =
>   if (getConf(CODEGEN_ENABLED, "false") == "true") true else false
> 2.
> x => if (x == 2) true else false
> ... etc
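
The cleanup is mechanical: the comparison itself already yields a Boolean, so 
the surrounding conditional can simply be dropped. A sketch of the simplified 
forms (the stubs below are illustrative, not the real SQLConf plumbing):

{code}
object RedundantConditionalExamples {
  // Hypothetical stand-ins for the real configuration accessors.
  private val CODEGEN_ENABLED = "spark.sql.codegen"
  private def getConf(key: String, default: String): String = default

  // Before: if (getConf(CODEGEN_ENABLED, "false") == "true") true else false
  def codegenEnabled: Boolean = getConf(CODEGEN_ENABLED, "false") == "true"

  // Before: x => if (x == 2) true else false
  val isTwo: Int => Boolean = x => x == 2
}
{code}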



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-3010) fix redundant conditional

2014-08-31 Thread Matei Zaharia (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matei Zaharia resolved SPARK-3010.
--
      Resolution: Fixed
   Fix Version/s: 1.2.0  (was: 1.1.0)
Target Version/s: 1.2.0  (was: 1.1.0)

> fix redundant conditional
> -
>
> Key: SPARK-3010
> URL: https://issues.apache.org/jira/browse/SPARK-3010
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.0.2
>Reporter: wangfei
>Assignee: wangfei
> Fix For: 1.2.0
>
>
> there are some redundant conditionals in Spark, such as 
> 1.
> private[spark] def codegenEnabled: Boolean =
>   if (getConf(CODEGEN_ENABLED, "false") == "true") true else false
> 2.
> x => if (x == 2) true else false
> ... etc



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-3010) fix redundant conditional

2014-08-31 Thread Matei Zaharia (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matei Zaharia updated SPARK-3010:
-
Priority: Trivial  (was: Major)

> fix redundant conditional
> -
>
> Key: SPARK-3010
> URL: https://issues.apache.org/jira/browse/SPARK-3010
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.0.2
>Reporter: wangfei
>Assignee: wangfei
>Priority: Trivial
> Fix For: 1.2.0
>
>
> there are some redundant conditionals in Spark, such as 
> 1.
> private[spark] def codegenEnabled: Boolean =
>   if (getConf(CODEGEN_ENABLED, "false") == "true") true else false
> 2.
> x => if (x == 2) true else false
> ... etc



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-3333) Large number of partitions causes OOM

2014-08-31 Thread Nicholas Chammas (JIRA)
Nicholas Chammas created SPARK-3333:
---

 Summary: Large number of partitions causes OOM
 Key: SPARK-3333
 URL: https://issues.apache.org/jira/browse/SPARK-3333
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 1.1.0
 Environment: EC2; 1 master; 1 slave; {{m3.xlarge}} instances
Reporter: Nicholas Chammas


Here’s a repro for PySpark:

{code}
a = sc.parallelize(["Nick", "John", "Bob"])
a = a.repartition(24000)
a.keyBy(lambda x: len(x)).reduceByKey(lambda x,y: x + y).take(1)
{code}

This code runs fine on 1.0.2. It returns the following result in just over a 
minute:

{code}
[(4, 'NickJohn')]
{code}

However, when I try this with 1.1.0-rc3 on an identically-sized cluster, it 
runs for a very, very long time and then fails with 
{{java.lang.OutOfMemoryError: Java heap space}}.

Here is a stack trace taken from a run on 1.1.0-rc2:

{code}
>>> a = sc.parallelize(["Nick", "John", "Bob"])
>>> a = a.repartition(24000)
>>> a.keyBy(lambda x: len(x)).reduceByKey(lambda x,y: x + y).take(1)
14/08/29 21:53:40 WARN BlockManagerMasterActor: Removing BlockManager 
BlockManagerId(0, ip-10-138-29-167.ec2.internal, 46252, 0) with no recent heart 
beats: 175143ms exceeds 45000ms
14/08/29 21:53:50 WARN BlockManagerMasterActor: Removing BlockManager 
BlockManagerId(10, ip-10-138-18-106.ec2.internal, 33711, 0) with no recent 
heart beats: 175359ms exceeds 45000ms
14/08/29 21:54:02 WARN BlockManagerMasterActor: Removing BlockManager 
BlockManagerId(19, ip-10-139-36-207.ec2.internal, 52208, 0) with no recent 
heart beats: 173061ms exceeds 45000ms
14/08/29 21:54:13 WARN BlockManagerMasterActor: Removing BlockManager 
BlockManagerId(5, ip-10-73-142-70.ec2.internal, 56162, 0) with no recent heart 
beats: 176816ms exceeds 45000ms
14/08/29 21:54:22 WARN BlockManagerMasterActor: Removing BlockManager 
BlockManagerId(7, ip-10-236-145-200.ec2.internal, 40959, 0) with no recent 
heart beats: 182241ms exceeds 45000ms
14/08/29 21:54:40 WARN BlockManagerMasterActor: Removing BlockManager 
BlockManagerId(4, ip-10-139-1-195.ec2.internal, 49221, 0) with no recent heart 
beats: 178406ms exceeds 45000ms
14/08/29 21:54:41 ERROR Utils: Uncaught exception in thread Result resolver 
thread-3
java.lang.OutOfMemoryError: Java heap space
at com.esotericsoftware.kryo.io.Input.readBytes(Input.java:296)
at 
com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.read(DefaultArraySerializers.java:35)
at 
com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.read(DefaultArraySerializers.java:18)
at com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:699)
at 
com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:611)
at 
com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
at 
org.apache.spark.serializer.KryoSerializerInstance.deserialize(KryoSerializer.scala:162)
at org.apache.spark.scheduler.DirectTaskResult.value(TaskResult.scala:79)
at 
org.apache.spark.scheduler.TaskSetManager.handleSuccessfulTask(TaskSetManager.scala:514)
at 
org.apache.spark.scheduler.TaskSchedulerImpl.handleSuccessfulTask(TaskSchedulerImpl.scala:355)
at 
org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:68)
at 
org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47)
at 
org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1311)
at 
org.apache.spark.scheduler.TaskResultGetter$$anon$2.run(TaskResultGetter.scala:46)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Exception in thread "Result resolver thread-3" 14/08/29 21:56:26 ERROR 
SendingConnection: Exception while reading SendingConnection to 
ConnectionManagerId(ip-10-73-142-223.ec2.internal,54014)
java.nio.channels.ClosedChannelException
at sun.nio.ch.SocketChannelImpl.ensureReadOpen(SocketChannelImpl.java:252)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:295)
at org.apache.spark.network.SendingConnection.read(Connection.scala:390)
at 
org.apache.spark.network.ConnectionManager$$anon$7.run(ConnectionManager.scala:199)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
java.lang.OutOfMemoryError: Java heap space
at com.esotericsoftware.kryo.io.Inp

[jira] [Commented] (SPARK-3333) Large number of partitions causes OOM

2014-08-31 Thread Nicholas Chammas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116895#comment-14116895
 ] 

Nicholas Chammas commented on SPARK-3333:
-

Note: I have not yet confirmed that 1.1.0-rc3 yields the exact same stack trace 
as the one provided above (which is for 1.1.0-rc2), though I expect them to be 
the same. I _can_ confirm that it takes a very, very long time to run, as it is 
running right now on rc3 and has been for about 45 minutes. Since I have to be 
offline for a bit, I thought I'd report this issue ASAP with the rc2 stack 
trace and update it later with a stack trace from rc3.

> Large number of partitions causes OOM
> -
>
> Key: SPARK-3333
> URL: https://issues.apache.org/jira/browse/SPARK-3333
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 1.1.0
> Environment: EC2; 1 master; 1 slave; {{m3.xlarge}} instances
>Reporter: Nicholas Chammas
>
> Here’s a repro for PySpark:
> {code}
> a = sc.parallelize(["Nick", "John", "Bob"])
> a = a.repartition(24000)
> a.keyBy(lambda x: len(x)).reduceByKey(lambda x,y: x + y).take(1)
> {code}
> This code runs fine on 1.0.2. It returns the following result in just over a 
> minute:
> {code}
> [(4, 'NickJohn')]
> {code}
> However, when I try this with 1.1.0-rc3 on an identically-sized cluster, it 
> runs for a very, very long time (at least > 45 min) and then fails with 
> {{java.lang.OutOfMemoryError: Java heap space}}.
> Here is a stack trace taken from a run on 1.1.0-rc2:
> {code}
> >>> a = sc.parallelize(["Nick", "John", "Bob"])
> >>> a = a.repartition(24000)
> >>> a.keyBy(lambda x: len(x)).reduceByKey(lambda x,y: x + y).take(1)
> 14/08/29 21:53:40 WARN BlockManagerMasterActor: Removing BlockManager 
> BlockManagerId(0, ip-10-138-29-167.ec2.internal, 46252, 0) with no recent 
> heart beats: 175143ms exceeds 45000ms
> 14/08/29 21:53:50 WARN BlockManagerMasterActor: Removing BlockManager 
> BlockManagerId(10, ip-10-138-18-106.ec2.internal, 33711, 0) with no recent 
> heart beats: 175359ms exceeds 45000ms
> 14/08/29 21:54:02 WARN BlockManagerMasterActor: Removing BlockManager 
> BlockManagerId(19, ip-10-139-36-207.ec2.internal, 52208, 0) with no recent 
> heart beats: 173061ms exceeds 45000ms
> 14/08/29 21:54:13 WARN BlockManagerMasterActor: Removing BlockManager 
> BlockManagerId(5, ip-10-73-142-70.ec2.internal, 56162, 0) with no recent 
> heart beats: 176816ms exceeds 45000ms
> 14/08/29 21:54:22 WARN BlockManagerMasterActor: Removing BlockManager 
> BlockManagerId(7, ip-10-236-145-200.ec2.internal, 40959, 0) with no recent 
> heart beats: 182241ms exceeds 45000ms
> 14/08/29 21:54:40 WARN BlockManagerMasterActor: Removing BlockManager 
> BlockManagerId(4, ip-10-139-1-195.ec2.internal, 49221, 0) with no recent 
> heart beats: 178406ms exceeds 45000ms
> 14/08/29 21:54:41 ERROR Utils: Uncaught exception in thread Result resolver 
> thread-3
> java.lang.OutOfMemoryError: Java heap space
> at com.esotericsoftware.kryo.io.Input.readBytes(Input.java:296)
> at 
> com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.read(DefaultArraySerializers.java:35)
> at 
> com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.read(DefaultArraySerializers.java:18)
> at com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:699)
> at 
> com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:611)
> at 
> com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
> at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
> at 
> org.apache.spark.serializer.KryoSerializerInstance.deserialize(KryoSerializer.scala:162)
> at org.apache.spark.scheduler.DirectTaskResult.value(TaskResult.scala:79)
> at 
> org.apache.spark.scheduler.TaskSetManager.handleSuccessfulTask(TaskSetManager.scala:514)
> at 
> org.apache.spark.scheduler.TaskSchedulerImpl.handleSuccessfulTask(TaskSchedulerImpl.scala:355)
> at 
> org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:68)
> at 
> org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47)
> at 
> org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47)
> at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1311)
> at 
> org.apache.spark.scheduler.TaskResultGetter$$anon$2.run(TaskResultGetter.scala:46)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Exception in thread "Result re

[jira] [Updated] (SPARK-3333) Large number of partitions causes OOM

2014-08-31 Thread Nicholas Chammas (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicholas Chammas updated SPARK-3333:

Description: 
Here’s a repro for PySpark:

{code}
a = sc.parallelize(["Nick", "John", "Bob"])
a = a.repartition(24000)
a.keyBy(lambda x: len(x)).reduceByKey(lambda x,y: x + y).take(1)
{code}

This code runs fine on 1.0.2. It returns the following result in just over a 
minute:

{code}
[(4, 'NickJohn')]
{code}

However, when I try this with 1.1.0-rc3 on an identically-sized cluster, it 
runs for a very, very long time (at least > 45 min) and then fails with 
{{java.lang.OutOfMemoryError: Java heap space}}.

Here is a stack trace taken from a run on 1.1.0-rc2:

{code}
>>> a = sc.parallelize(["Nick", "John", "Bob"])
>>> a = a.repartition(24000)
>>> a.keyBy(lambda x: len(x)).reduceByKey(lambda x,y: x + y).take(1)
14/08/29 21:53:40 WARN BlockManagerMasterActor: Removing BlockManager 
BlockManagerId(0, ip-10-138-29-167.ec2.internal, 46252, 0) with no recent heart 
beats: 175143ms exceeds 45000ms
14/08/29 21:53:50 WARN BlockManagerMasterActor: Removing BlockManager 
BlockManagerId(10, ip-10-138-18-106.ec2.internal, 33711, 0) with no recent 
heart beats: 175359ms exceeds 45000ms
14/08/29 21:54:02 WARN BlockManagerMasterActor: Removing BlockManager 
BlockManagerId(19, ip-10-139-36-207.ec2.internal, 52208, 0) with no recent 
heart beats: 173061ms exceeds 45000ms
14/08/29 21:54:13 WARN BlockManagerMasterActor: Removing BlockManager 
BlockManagerId(5, ip-10-73-142-70.ec2.internal, 56162, 0) with no recent heart 
beats: 176816ms exceeds 45000ms
14/08/29 21:54:22 WARN BlockManagerMasterActor: Removing BlockManager 
BlockManagerId(7, ip-10-236-145-200.ec2.internal, 40959, 0) with no recent 
heart beats: 182241ms exceeds 45000ms
14/08/29 21:54:40 WARN BlockManagerMasterActor: Removing BlockManager 
BlockManagerId(4, ip-10-139-1-195.ec2.internal, 49221, 0) with no recent heart 
beats: 178406ms exceeds 45000ms
14/08/29 21:54:41 ERROR Utils: Uncaught exception in thread Result resolver 
thread-3
java.lang.OutOfMemoryError: Java heap space
at com.esotericsoftware.kryo.io.Input.readBytes(Input.java:296)
at 
com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.read(DefaultArraySerializers.java:35)
at 
com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.read(DefaultArraySerializers.java:18)
at com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:699)
at 
com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:611)
at 
com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
at 
org.apache.spark.serializer.KryoSerializerInstance.deserialize(KryoSerializer.scala:162)
at org.apache.spark.scheduler.DirectTaskResult.value(TaskResult.scala:79)
at 
org.apache.spark.scheduler.TaskSetManager.handleSuccessfulTask(TaskSetManager.scala:514)
at 
org.apache.spark.scheduler.TaskSchedulerImpl.handleSuccessfulTask(TaskSchedulerImpl.scala:355)
at 
org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:68)
at 
org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47)
at 
org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1311)
at 
org.apache.spark.scheduler.TaskResultGetter$$anon$2.run(TaskResultGetter.scala:46)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Exception in thread "Result resolver thread-3" 14/08/29 21:56:26 ERROR 
SendingConnection: Exception while reading SendingConnection to 
ConnectionManagerId(ip-10-73-142-223.ec2.internal,54014)
java.nio.channels.ClosedChannelException
at sun.nio.ch.SocketChannelImpl.ensureReadOpen(SocketChannelImpl.java:252)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:295)
at org.apache.spark.network.SendingConnection.read(Connection.scala:390)
at 
org.apache.spark.network.ConnectionManager$$anon$7.run(ConnectionManager.scala:199)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
java.lang.OutOfMemoryError: Java heap space
at com.esotericsoftware.kryo.io.Input.readBytes(Input.java:296)
at 
com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.read(DefaultArraySerializers.java:35)
at 
com.esotericsoftware.kryo.serializers.DefaultArraySerializer

[jira] [Commented] (SPARK-3333) Large number of partitions causes OOM

2014-08-31 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116902#comment-14116902
 ] 

Patrick Wendell commented on SPARK-3333:


Hey [~nchammas] - I don't think anything relevant to this issue has changed 
between RC2 and RC3, so the RC2 trace is probably sufficient.

> Large number of partitions causes OOM
> -
>
> Key: SPARK-3333
> URL: https://issues.apache.org/jira/browse/SPARK-3333
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 1.1.0
> Environment: EC2; 1 master; 1 slave; {{m3.xlarge}} instances
>Reporter: Nicholas Chammas
>
> Here’s a repro for PySpark:
> {code}
> a = sc.parallelize(["Nick", "John", "Bob"])
> a = a.repartition(24000)
> a.keyBy(lambda x: len(x)).reduceByKey(lambda x,y: x + y).take(1)
> {code}
> This code runs fine on 1.0.2. It returns the following result in just over a 
> minute:
> {code}
> [(4, 'NickJohn')]
> {code}
> However, when I try this with 1.1.0-rc3 on an identically-sized cluster, it 
> runs for a very, very long time (at least > 45 min) and then fails with 
> {{java.lang.OutOfMemoryError: Java heap space}}.
> Here is a stack trace taken from a run on 1.1.0-rc2:
> {code}
> >>> a = sc.parallelize(["Nick", "John", "Bob"])
> >>> a = a.repartition(24000)
> >>> a.keyBy(lambda x: len(x)).reduceByKey(lambda x,y: x + y).take(1)
> 14/08/29 21:53:40 WARN BlockManagerMasterActor: Removing BlockManager 
> BlockManagerId(0, ip-10-138-29-167.ec2.internal, 46252, 0) with no recent 
> heart beats: 175143ms exceeds 45000ms
> 14/08/29 21:53:50 WARN BlockManagerMasterActor: Removing BlockManager 
> BlockManagerId(10, ip-10-138-18-106.ec2.internal, 33711, 0) with no recent 
> heart beats: 175359ms exceeds 45000ms
> 14/08/29 21:54:02 WARN BlockManagerMasterActor: Removing BlockManager 
> BlockManagerId(19, ip-10-139-36-207.ec2.internal, 52208, 0) with no recent 
> heart beats: 173061ms exceeds 45000ms
> 14/08/29 21:54:13 WARN BlockManagerMasterActor: Removing BlockManager 
> BlockManagerId(5, ip-10-73-142-70.ec2.internal, 56162, 0) with no recent 
> heart beats: 176816ms exceeds 45000ms
> 14/08/29 21:54:22 WARN BlockManagerMasterActor: Removing BlockManager 
> BlockManagerId(7, ip-10-236-145-200.ec2.internal, 40959, 0) with no recent 
> heart beats: 182241ms exceeds 45000ms
> 14/08/29 21:54:40 WARN BlockManagerMasterActor: Removing BlockManager 
> BlockManagerId(4, ip-10-139-1-195.ec2.internal, 49221, 0) with no recent 
> heart beats: 178406ms exceeds 45000ms
> 14/08/29 21:54:41 ERROR Utils: Uncaught exception in thread Result resolver 
> thread-3
> java.lang.OutOfMemoryError: Java heap space
> at com.esotericsoftware.kryo.io.Input.readBytes(Input.java:296)
> at 
> com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.read(DefaultArraySerializers.java:35)
> at 
> com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.read(DefaultArraySerializers.java:18)
> at com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:699)
> at 
> com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:611)
> at 
> com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
> at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
> at 
> org.apache.spark.serializer.KryoSerializerInstance.deserialize(KryoSerializer.scala:162)
> at org.apache.spark.scheduler.DirectTaskResult.value(TaskResult.scala:79)
> at 
> org.apache.spark.scheduler.TaskSetManager.handleSuccessfulTask(TaskSetManager.scala:514)
> at 
> org.apache.spark.scheduler.TaskSchedulerImpl.handleSuccessfulTask(TaskSchedulerImpl.scala:355)
> at 
> org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:68)
> at 
> org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47)
> at 
> org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47)
> at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1311)
> at 
> org.apache.spark.scheduler.TaskResultGetter$$anon$2.run(TaskResultGetter.scala:46)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Exception in thread "Result resolver thread-3" 14/08/29 21:56:26 ERROR 
> SendingConnection: Exception while reading SendingConnection to 
> ConnectionManagerId(ip-10-73-142-223.ec2.internal,54014)
> java.nio.channels.ClosedChannelException
> at sun.nio.ch.SocketChannelImpl.ensureReadOpen(SocketChannelImpl.java:252)
> at sun.nio.ch.Sock

[jira] [Updated] (SPARK-3332) Tagging is not atomic with launching instances on EC2

2014-08-31 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-3332:
---
Summary: Tagging is not atomic with launching instances on EC2  (was: Tags 
shouldn't be the main strategy for machine membership on clusters)

> Tagging is not atomic with launching instances on EC2
> -
>
> Key: SPARK-3332
> URL: https://issues.apache.org/jira/browse/SPARK-3332
> Project: Spark
>  Issue Type: Bug
>  Components: EC2
>Reporter: Allan Douglas R. de Oliveira
>
> The implementation for SPARK-2333 changed the machine membership mechanism 
> from security groups to tags.
> This is a fundamentally flawed strategy, as there are no guarantees at all 
> that the machines will have a tag (even with a retry mechanism).
> For instance, if the script is killed after launching the instances but 
> before setting the tags, the machines will be "invisible" to a destroy 
> command, leaving an unmanageable cluster behind.
> The initial proposal is to go back to the previous behavior in all cases 
> except when the new flag (--security-group-prefix) is used.
> It's also worth mentioning that SPARK-3180 introduced the 
> --additional-security-group flag, which is a reasonable solution to SPARK-2333 
> (but isn't a full replacement for all use cases of --security-group-prefix).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3332) Tagging is not atomic with launching instances on EC2

2014-08-31 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116906#comment-14116906
 ] 

Patrick Wendell commented on SPARK-3332:


I changed the title slightly - I think the underlying problem here is that 
tagging is not atomic with launching instances. Removing the use of tagging 
entirely is one potential solution for this. Another is that we just print 
better errors if tagging does not succeed and explain there might be orphaned 
instances.

The reason why SPARK-2333 was added is that the current approach can lead to 
too many security groups, so we can't just revert this without any cost.

In the meantime I'd like to revert SPARK-2333 in branch-1.1 so we can defer 
the design decision to the 1.2 release timeframe. Thanks [~douglaz] for 
highlighting this issue.

> Tagging is not atomic with launching instances on EC2
> -
>
> Key: SPARK-3332
> URL: https://issues.apache.org/jira/browse/SPARK-3332
> Project: Spark
>  Issue Type: Bug
>  Components: EC2
>Reporter: Allan Douglas R. de Oliveira
>
> The implementation for SPARK-2333 changed the machine membership mechanism 
> from security groups to tags.
> This is a fundamentally flawed strategy, as there are no guarantees at all 
> that the machines will have a tag (even with a retry mechanism).
> For instance, if the script is killed after launching the instances but 
> before setting the tags, the machines will be "invisible" to a destroy 
> command, leaving an unmanageable cluster behind.
> The initial proposal is to go back to the previous behavior in all cases 
> except when the new flag (--security-group-prefix) is used.
> It's also worth mentioning that SPARK-3180 introduced the 
> --additional-security-group flag, which is a reasonable solution to SPARK-2333 
> (but isn't a full replacement for all use cases of --security-group-prefix).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-3332) Tagging is not atomic with launching instances on EC2

2014-08-31 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-3332:
---
Target Version/s: 1.2.0

> Tagging is not atomic with launching instances on EC2
> -
>
> Key: SPARK-3332
> URL: https://issues.apache.org/jira/browse/SPARK-3332
> Project: Spark
>  Issue Type: Bug
>  Components: EC2
>Reporter: Allan Douglas R. de Oliveira
>
> The implementation for SPARK-2333 changed the machine membership mechanism 
> from security groups to tags.
> This is a fundamentally flawed strategy, as there are no guarantees at all 
> that the machines will have a tag (even with a retry mechanism).
> For instance, if the script is killed after launching the instances but 
> before setting the tags, the machines will be "invisible" to a destroy 
> command, leaving an unmanageable cluster behind.
> The initial proposal is to go back to the previous behavior in all cases 
> except when the new flag (--security-group-prefix) is used.
> It's also worth mentioning that SPARK-3180 introduced the 
> --additional-security-group flag, which is a reasonable solution to SPARK-2333 
> (but isn't a full replacement for all use cases of --security-group-prefix).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3332) Tagging is not atomic with launching instances on EC2

2014-08-31 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116910#comment-14116910
 ] 

Apache Spark commented on SPARK-3332:
-

User 'JoshRosen' has created a pull request for this issue:
https://github.com/apache/spark/pull/2225

> Tagging is not atomic with launching instances on EC2
> -
>
> Key: SPARK-3332
> URL: https://issues.apache.org/jira/browse/SPARK-3332
> Project: Spark
>  Issue Type: Bug
>  Components: EC2
>Reporter: Allan Douglas R. de Oliveira
>
> The implementation for SPARK-2333 changed the machine membership mechanism 
> from security groups to tags.
> This is a fundamentally flawed strategy, as there are no guarantees at all 
> that the machines will have a tag (even with a retry mechanism).
> For instance, if the script is killed after launching the instances but 
> before setting the tags, the machines will be "invisible" to a destroy 
> command, leaving an unmanageable cluster behind.
> The initial proposal is to go back to the previous behavior in all cases 
> except when the new flag (--security-group-prefix) is used.
> It's also worth mentioning that SPARK-3180 introduced the 
> --additional-security-group flag, which is a reasonable solution to SPARK-2333 
> (but isn't a full replacement for all use cases of --security-group-prefix).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-2870) Thorough schema inference directly on RDDs of Python dictionaries

2014-08-31 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust updated SPARK-2870:

Target Version/s: 1.2.0

> Thorough schema inference directly on RDDs of Python dictionaries
> -
>
> Key: SPARK-2870
> URL: https://issues.apache.org/jira/browse/SPARK-2870
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, SQL
>Reporter: Nicholas Chammas
>
> h4. Background
> I love the {{SQLContext.jsonRDD()}} and {{SQLContext.jsonFile()}} methods. 
> They process JSON text directly and infer a schema that covers the entire 
> source data set. 
> This is very important with semi-structured data like JSON since individual 
> elements in the data set are free to have different structures. Matching 
> fields across elements may even have different value types.
> For example:
> {code}
> {"a": 5}
> {"a": "cow"}
> {code}
> To get a queryable schema that covers the whole data set, you need to infer a 
> schema by looking at the whole data set. The aforementioned 
> {{SQLContext.json...()}} methods do this very well. 
> h4. Feature Request
> What we need is for {{SQLContext.inferSchema()}} to do this, too. 
> Alternatively, we need a new {{SQLContext}} method that works on RDDs of 
> Python dictionaries and does something functionally equivalent to this:
> {code}
> SQLContext.jsonRDD(RDD[dict].map(lambda x: json.dumps(x)))
> {code}
> As of 1.0.2, 
> [{{inferSchema()}}|http://spark.apache.org/docs/latest/api/python/pyspark.sql.SQLContext-class.html#inferSchema]
>  just looks at the first element in the data set. This won't help much when 
> the structure of the elements in the target RDD is variable.
> h4. Example Use Case
> * You have some JSON text data that you want to analyze using Spark SQL. 
> * You would use one of the {{SQLContext.json...()}} methods, but you need to 
> do some filtering on the data first to remove bad elements--basically, some 
> minimal schema validation.
> * You deserialize the JSON objects to Python {{dict}} s and filter out the 
> bad ones. You now have an RDD of dictionaries.
> * From this RDD, you want a SchemaRDD that captures the schema for the whole 
> data set.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-2870) Thorough schema inference directly on RDDs of Python dictionaries

2014-08-31 Thread Michael Armbrust (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116915#comment-14116915
 ] 

Michael Armbrust commented on SPARK-2870:
-

Yeah, thanks for pinging me.  I've targeted this JIRA for 1.2.

> Thorough schema inference directly on RDDs of Python dictionaries
> -
>
> Key: SPARK-2870
> URL: https://issues.apache.org/jira/browse/SPARK-2870
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, SQL
>Reporter: Nicholas Chammas
>
> h4. Background
> I love the {{SQLContext.jsonRDD()}} and {{SQLContext.jsonFile()}} methods. 
> They process JSON text directly and infer a schema that covers the entire 
> source data set. 
> This is very important with semi-structured data like JSON since individual 
> elements in the data set are free to have different structures. Matching 
> fields across elements may even have different value types.
> For example:
> {code}
> {"a": 5}
> {"a": "cow"}
> {code}
> To get a queryable schema that covers the whole data set, you need to infer a 
> schema by looking at the whole data set. The aforementioned 
> {{SQLContext.json...()}} methods do this very well. 
> h4. Feature Request
> What we need is for {{SQLContext.inferSchema()}} to do this, too. 
> Alternatively, we need a new {{SQLContext}} method that works on RDDs of 
> Python dictionaries and does something functionally equivalent to this:
> {code}
> SQLContext.jsonRDD(RDD[dict].map(lambda x: json.dumps(x)))
> {code}
> As of 1.0.2, 
> [{{inferSchema()}}|http://spark.apache.org/docs/latest/api/python/pyspark.sql.SQLContext-class.html#inferSchema]
>  just looks at the first element in the data set. This won't help much when 
> the structure of the elements in the target RDD is variable.
> h4. Example Use Case
> * You have some JSON text data that you want to analyze using Spark SQL. 
> * You would use one of the {{SQLContext.json...()}} methods, but you need to 
> do some filtering on the data first to remove bad elements--basically, some 
> minimal schema validation.
> * You deserialize the JSON objects to Python {{dict}} s and filter out the 
> bad ones. You now have an RDD of dictionaries.
> * From this RDD, you want a SchemaRDD that captures the schema for the whole 
> data set.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-3317) The loss of regularization in Updater should use the oldWeights

2014-08-31 Thread DB Tsai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

DB Tsai closed SPARK-3317.
--
Resolution: Won't Fix

> The loss of regularization in Updater should use the oldWeights
> ---
>
> Key: SPARK-3317
> URL: https://issues.apache.org/jira/browse/SPARK-3317
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib
>Reporter: DB Tsai
>
> The regularization part of the loss is currently computed from the newWeights, 
> which is not correct. The loss, R(w) = 1/2 ||w||^2, should be computed with the 
> oldWeights.
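For illustration, a minimal self-contained sketch of the intended behavior (this is not MLlib's actual {{Updater}} code; {{L2UpdaterSketch}} is a hypothetical name): the update step produces the new weights, while the regularization term of the returned loss is evaluated at the old weights, i.e. at the point where the gradient was computed.

{code}
object L2UpdaterSketch {
  // Simplified gradient-descent step with L2 regularization.
  def compute(oldWeights: Array[Double], gradient: Array[Double],
              stepSize: Double, regParam: Double): (Array[Double], Double) = {
    // w' = w * (1 - stepSize * regParam) - stepSize * gradient
    val newWeights = oldWeights.zip(gradient).map { case (w, g) =>
      w * (1.0 - stepSize * regParam) - stepSize * g
    }
    // The regularization loss (regParam / 2) * ||w||^2 uses oldWeights,
    // the point at which the gradient was evaluated.
    val regLoss = 0.5 * regParam * oldWeights.map(w => w * w).sum
    (newWeights, regLoss)
  }
}
{code}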



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3333) Large number of partitions causes OOM

2014-08-31 Thread Josh Rosen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116918#comment-14116918
 ] 

Josh Rosen commented on SPARK-3333:
---

Even on my laptop, I notice a huge speed difference between 1.0.2 and 1.1.0 
here.  I haven't hit the OOM yet, though.

> Large number of partitions causes OOM
> -
>
> Key: SPARK-3333
> URL: https://issues.apache.org/jira/browse/SPARK-3333
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 1.1.0
> Environment: EC2; 1 master; 1 slave; {{m3.xlarge}} instances
>Reporter: Nicholas Chammas
>
> Here’s a repro for PySpark:
> {code}
> a = sc.parallelize(["Nick", "John", "Bob"])
> a = a.repartition(24000)
> a.keyBy(lambda x: len(x)).reduceByKey(lambda x,y: x + y).take(1)
> {code}
> This code runs fine on 1.0.2. It returns the following result in just over a 
> minute:
> {code}
> [(4, 'NickJohn')]
> {code}
> However, when I try this with 1.1.0-rc3 on an identically-sized cluster, it 
> runs for a very, very long time (at least 45 min) and then fails with 
> {{java.lang.OutOfMemoryError: Java heap space}}.
> Here is a stack trace taken from a run on 1.1.0-rc2:
> {code}
> >>> a = sc.parallelize(["Nick", "John", "Bob"])
> >>> a = a.repartition(24000)
> >>> a.keyBy(lambda x: len(x)).reduceByKey(lambda x,y: x + y).take(1)
> 14/08/29 21:53:40 WARN BlockManagerMasterActor: Removing BlockManager 
> BlockManagerId(0, ip-10-138-29-167.ec2.internal, 46252, 0) with no recent 
> heart beats: 175143ms exceeds 45000ms
> 14/08/29 21:53:50 WARN BlockManagerMasterActor: Removing BlockManager 
> BlockManagerId(10, ip-10-138-18-106.ec2.internal, 33711, 0) with no recent 
> heart beats: 175359ms exceeds 45000ms
> 14/08/29 21:54:02 WARN BlockManagerMasterActor: Removing BlockManager 
> BlockManagerId(19, ip-10-139-36-207.ec2.internal, 52208, 0) with no recent 
> heart beats: 173061ms exceeds 45000ms
> 14/08/29 21:54:13 WARN BlockManagerMasterActor: Removing BlockManager 
> BlockManagerId(5, ip-10-73-142-70.ec2.internal, 56162, 0) with no recent 
> heart beats: 176816ms exceeds 45000ms
> 14/08/29 21:54:22 WARN BlockManagerMasterActor: Removing BlockManager 
> BlockManagerId(7, ip-10-236-145-200.ec2.internal, 40959, 0) with no recent 
> heart beats: 182241ms exceeds 45000ms
> 14/08/29 21:54:40 WARN BlockManagerMasterActor: Removing BlockManager 
> BlockManagerId(4, ip-10-139-1-195.ec2.internal, 49221, 0) with no recent 
> heart beats: 178406ms exceeds 45000ms
> 14/08/29 21:54:41 ERROR Utils: Uncaught exception in thread Result resolver 
> thread-3
> java.lang.OutOfMemoryError: Java heap space
> at com.esotericsoftware.kryo.io.Input.readBytes(Input.java:296)
> at 
> com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.read(DefaultArraySerializers.java:35)
> at 
> com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.read(DefaultArraySerializers.java:18)
> at com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:699)
> at 
> com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:611)
> at 
> com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
> at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
> at 
> org.apache.spark.serializer.KryoSerializerInstance.deserialize(KryoSerializer.scala:162)
> at org.apache.spark.scheduler.DirectTaskResult.value(TaskResult.scala:79)
> at 
> org.apache.spark.scheduler.TaskSetManager.handleSuccessfulTask(TaskSetManager.scala:514)
> at 
> org.apache.spark.scheduler.TaskSchedulerImpl.handleSuccessfulTask(TaskSchedulerImpl.scala:355)
> at 
> org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:68)
> at 
> org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47)
> at 
> org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47)
> at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1311)
> at 
> org.apache.spark.scheduler.TaskResultGetter$$anon$2.run(TaskResultGetter.scala:46)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Exception in thread "Result resolver thread-3" 14/08/29 21:56:26 ERROR 
> SendingConnection: Exception while reading SendingConnection to 
> ConnectionManagerId(ip-10-73-142-223.ec2.internal,54014)
> java.nio.channels.ClosedChannelException
> at sun.nio.ch.SocketChannelImpl.ensureReadOpen(SocketChannelImpl.java:252)
> at sun.nio.ch.SocketChannelImpl.read(SocketChann

[jira] [Commented] (SPARK-3333) Large number of partitions causes OOM

2014-08-31 Thread Matei Zaharia (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116923#comment-14116923
 ] 

Matei Zaharia commented on SPARK-3333:
--

The slowdown might be partly due to adding external spilling in Python, but 
it's weird that this would crash the driver.

> Large number of partitions causes OOM
> -
>
> Key: SPARK-3333
> URL: https://issues.apache.org/jira/browse/SPARK-3333
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 1.1.0
> Environment: EC2; 1 master; 1 slave; {{m3.xlarge}} instances
>Reporter: Nicholas Chammas
>
> Here’s a repro for PySpark:
> {code}
> a = sc.parallelize(["Nick", "John", "Bob"])
> a = a.repartition(24000)
> a.keyBy(lambda x: len(x)).reduceByKey(lambda x,y: x + y).take(1)
> {code}
> This code runs fine on 1.0.2. It returns the following result in just over a 
> minute:
> {code}
> [(4, 'NickJohn')]
> {code}
> However, when I try this with 1.1.0-rc3 on an identically-sized cluster, it 
> runs for a very, very long time (at least 45 min) and then fails with 
> {{java.lang.OutOfMemoryError: Java heap space}}.
> Here is a stack trace taken from a run on 1.1.0-rc2:
> {code}
> >>> a = sc.parallelize(["Nick", "John", "Bob"])
> >>> a = a.repartition(24000)
> >>> a.keyBy(lambda x: len(x)).reduceByKey(lambda x,y: x + y).take(1)
> 14/08/29 21:53:40 WARN BlockManagerMasterActor: Removing BlockManager 
> BlockManagerId(0, ip-10-138-29-167.ec2.internal, 46252, 0) with no recent 
> heart beats: 175143ms exceeds 45000ms
> 14/08/29 21:53:50 WARN BlockManagerMasterActor: Removing BlockManager 
> BlockManagerId(10, ip-10-138-18-106.ec2.internal, 33711, 0) with no recent 
> heart beats: 175359ms exceeds 45000ms
> 14/08/29 21:54:02 WARN BlockManagerMasterActor: Removing BlockManager 
> BlockManagerId(19, ip-10-139-36-207.ec2.internal, 52208, 0) with no recent 
> heart beats: 173061ms exceeds 45000ms
> 14/08/29 21:54:13 WARN BlockManagerMasterActor: Removing BlockManager 
> BlockManagerId(5, ip-10-73-142-70.ec2.internal, 56162, 0) with no recent 
> heart beats: 176816ms exceeds 45000ms
> 14/08/29 21:54:22 WARN BlockManagerMasterActor: Removing BlockManager 
> BlockManagerId(7, ip-10-236-145-200.ec2.internal, 40959, 0) with no recent 
> heart beats: 182241ms exceeds 45000ms
> 14/08/29 21:54:40 WARN BlockManagerMasterActor: Removing BlockManager 
> BlockManagerId(4, ip-10-139-1-195.ec2.internal, 49221, 0) with no recent 
> heart beats: 178406ms exceeds 45000ms
> 14/08/29 21:54:41 ERROR Utils: Uncaught exception in thread Result resolver 
> thread-3
> java.lang.OutOfMemoryError: Java heap space
> at com.esotericsoftware.kryo.io.Input.readBytes(Input.java:296)
> at 
> com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.read(DefaultArraySerializers.java:35)
> at 
> com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.read(DefaultArraySerializers.java:18)
> at com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:699)
> at 
> com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:611)
> at 
> com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
> at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
> at 
> org.apache.spark.serializer.KryoSerializerInstance.deserialize(KryoSerializer.scala:162)
> at org.apache.spark.scheduler.DirectTaskResult.value(TaskResult.scala:79)
> at 
> org.apache.spark.scheduler.TaskSetManager.handleSuccessfulTask(TaskSetManager.scala:514)
> at 
> org.apache.spark.scheduler.TaskSchedulerImpl.handleSuccessfulTask(TaskSchedulerImpl.scala:355)
> at 
> org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:68)
> at 
> org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47)
> at 
> org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47)
> at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1311)
> at 
> org.apache.spark.scheduler.TaskResultGetter$$anon$2.run(TaskResultGetter.scala:46)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Exception in thread "Result resolver thread-3" 14/08/29 21:56:26 ERROR 
> SendingConnection: Exception while reading SendingConnection to 
> ConnectionManagerId(ip-10-73-142-223.ec2.internal,54014)
> java.nio.channels.ClosedChannelException
> at sun.nio.ch.SocketChannelImpl.ensureReadOpen(SocketChannelImpl.java:252)
> at sun.nio.ch.SocketChannelImpl.read(So

[jira] [Resolved] (SPARK-3320) Batched in-memory column buffer building doesn't work for SchemaRDDs with empty partitions

2014-08-31 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust resolved SPARK-3320.
-
   Resolution: Fixed
Fix Version/s: 1.1.0
 Assignee: Cheng Lian

> Batched in-memory column buffer building doesn't work for SchemaRDDs with 
> empty partitions
> --
>
> Key: SPARK-3320
> URL: https://issues.apache.org/jira/browse/SPARK-3320
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.0.2
>Reporter: Cheng Lian
>Assignee: Cheng Lian
>Priority: Blocker
> Fix For: 1.1.0
>
>
> The empty partition iterator is not properly handled in 
> [#1880|https://github.com/apache/spark/pull/1880/files#diff-b47dac3d98014877d5879f5cf37ab0d1R115],
> which throws an exception when accessing an empty partition of the target SchemaRDD.
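A hypothetical sketch of the guard being described (not the actual in-memory columnar code; {{grouped}} stands in for building batched column buffers, and an existing SparkContext {{sc}} is assumed):

{code}
val batchSize = 1000
// 10 partitions but only 3 rows, so most partitions are empty.
val rows = sc.parallelize(Seq(1, 2, 3), 10)
val batches = rows.mapPartitions { iter =>
  // An empty partition must yield zero batches instead of having
  // its iterator accessed unconditionally.
  if (iter.isEmpty) Iterator.empty
  else iter.grouped(batchSize)
}
batches.collect()
{code}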



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3333) Large number of partitions causes OOM

2014-08-31 Thread Davies Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116935#comment-14116935
 ] 

Davies Liu commented on SPARK-3333:
---

[~matei] I think this is not related to external spilling in Python, because 
the dataset is so small that it will not trigger spilling in Python. 

Also, this slowdown can be reproduced in Scala, for example:

{code}
val rdd = sc.parallelize(1 to 10)
rdd.repartition(24000).keyBy(x => x).reduceByKey(_ + _).collect()
{code}

The second stage (24000 tasks) has already taken 105 mins (not finished yet). 


> Large number of partitions causes OOM
> -
>
> Key: SPARK-3333
> URL: https://issues.apache.org/jira/browse/SPARK-3333
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 1.1.0
> Environment: EC2; 1 master; 1 slave; {{m3.xlarge}} instances
>Reporter: Nicholas Chammas
>
> Here’s a repro for PySpark:
> {code}
> a = sc.parallelize(["Nick", "John", "Bob"])
> a = a.repartition(24000)
> a.keyBy(lambda x: len(x)).reduceByKey(lambda x,y: x + y).take(1)
> {code}
> This code runs fine on 1.0.2. It returns the following result in just over a 
> minute:
> {code}
> [(4, 'NickJohn')]
> {code}
> However, when I try this with 1.1.0-rc3 on an identically-sized cluster, it 
> runs for a very, very long time (at least 45 min) and then fails with 
> {{java.lang.OutOfMemoryError: Java heap space}}.
> Here is a stack trace taken from a run on 1.1.0-rc2:
> {code}
> >>> a = sc.parallelize(["Nick", "John", "Bob"])
> >>> a = a.repartition(24000)
> >>> a.keyBy(lambda x: len(x)).reduceByKey(lambda x,y: x + y).take(1)
> 14/08/29 21:53:40 WARN BlockManagerMasterActor: Removing BlockManager 
> BlockManagerId(0, ip-10-138-29-167.ec2.internal, 46252, 0) with no recent 
> heart beats: 175143ms exceeds 45000ms
> 14/08/29 21:53:50 WARN BlockManagerMasterActor: Removing BlockManager 
> BlockManagerId(10, ip-10-138-18-106.ec2.internal, 33711, 0) with no recent 
> heart beats: 175359ms exceeds 45000ms
> 14/08/29 21:54:02 WARN BlockManagerMasterActor: Removing BlockManager 
> BlockManagerId(19, ip-10-139-36-207.ec2.internal, 52208, 0) with no recent 
> heart beats: 173061ms exceeds 45000ms
> 14/08/29 21:54:13 WARN BlockManagerMasterActor: Removing BlockManager 
> BlockManagerId(5, ip-10-73-142-70.ec2.internal, 56162, 0) with no recent 
> heart beats: 176816ms exceeds 45000ms
> 14/08/29 21:54:22 WARN BlockManagerMasterActor: Removing BlockManager 
> BlockManagerId(7, ip-10-236-145-200.ec2.internal, 40959, 0) with no recent 
> heart beats: 182241ms exceeds 45000ms
> 14/08/29 21:54:40 WARN BlockManagerMasterActor: Removing BlockManager 
> BlockManagerId(4, ip-10-139-1-195.ec2.internal, 49221, 0) with no recent 
> heart beats: 178406ms exceeds 45000ms
> 14/08/29 21:54:41 ERROR Utils: Uncaught exception in thread Result resolver 
> thread-3
> java.lang.OutOfMemoryError: Java heap space
> at com.esotericsoftware.kryo.io.Input.readBytes(Input.java:296)
> at 
> com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.read(DefaultArraySerializers.java:35)
> at 
> com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.read(DefaultArraySerializers.java:18)
> at com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:699)
> at 
> com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:611)
> at 
> com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
> at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
> at 
> org.apache.spark.serializer.KryoSerializerInstance.deserialize(KryoSerializer.scala:162)
> at org.apache.spark.scheduler.DirectTaskResult.value(TaskResult.scala:79)
> at 
> org.apache.spark.scheduler.TaskSetManager.handleSuccessfulTask(TaskSetManager.scala:514)
> at 
> org.apache.spark.scheduler.TaskSchedulerImpl.handleSuccessfulTask(TaskSchedulerImpl.scala:355)
> at 
> org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:68)
> at 
> org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47)
> at 
> org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47)
> at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1311)
> at 
> org.apache.spark.scheduler.TaskResultGetter$$anon$2.run(TaskResultGetter.scala:46)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Exception in thread "Result resolver thread-3" 14/08/29 21:56:26 ERROR 
> SendingConnection: Exception while reading SendingCon

[jira] [Comment Edited] (SPARK-3333) Large number of partitions causes OOM

2014-08-31 Thread Davies Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116935#comment-14116935
 ] 

Davies Liu edited comment on SPARK-3333 at 8/31/14 11:26 PM:
-

[~matei] I think this is not related to external spilling in Python, because 
the dataset is so small that it will not trigger spilling in Python. 

Also, this slowdown can be reproduced in Scala, for example:

{code}
val rdd = sc.parallelize(1 to 10)
rdd.repartition(24000).keyBy(x => x).reduceByKey(_ + _).collect()
{code}

The second stage (24000 tasks) has already taken 105 mins (not finished yet). 

PS: I am running on master.



was (Author: davies):
[~matei] I think this is not related to external spilling in Python, because 
the dataset is so small that it will not trigger spilling in Python. 

Also, this slowdown can be reproduced in Scala, for example:

{code}
val rdd = sc.parallelize(1 to 10)
rdd.repartition(24000).keyBy(x => x).reduceByKey(_ + _).collect()
{code}

The second stage (24000 tasks) has already taken 105 mins (not finished yet). 


> Large number of partitions causes OOM
> -
>
> Key: SPARK-3333
> URL: https://issues.apache.org/jira/browse/SPARK-3333
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 1.1.0
> Environment: EC2; 1 master; 1 slave; {{m3.xlarge}} instances
>Reporter: Nicholas Chammas
>
> Here’s a repro for PySpark:
> {code}
> a = sc.parallelize(["Nick", "John", "Bob"])
> a = a.repartition(24000)
> a.keyBy(lambda x: len(x)).reduceByKey(lambda x,y: x + y).take(1)
> {code}
> This code runs fine on 1.0.2. It returns the following result in just over a 
> minute:
> {code}
> [(4, 'NickJohn')]
> {code}
> However, when I try this with 1.1.0-rc3 on an identically-sized cluster, it 
> runs for a very, very long time (at least 45 min) and then fails with 
> {{java.lang.OutOfMemoryError: Java heap space}}.
> Here is a stack trace taken from a run on 1.1.0-rc2:
> {code}
> >>> a = sc.parallelize(["Nick", "John", "Bob"])
> >>> a = a.repartition(24000)
> >>> a.keyBy(lambda x: len(x)).reduceByKey(lambda x,y: x + y).take(1)
> 14/08/29 21:53:40 WARN BlockManagerMasterActor: Removing BlockManager 
> BlockManagerId(0, ip-10-138-29-167.ec2.internal, 46252, 0) with no recent 
> heart beats: 175143ms exceeds 45000ms
> 14/08/29 21:53:50 WARN BlockManagerMasterActor: Removing BlockManager 
> BlockManagerId(10, ip-10-138-18-106.ec2.internal, 33711, 0) with no recent 
> heart beats: 175359ms exceeds 45000ms
> 14/08/29 21:54:02 WARN BlockManagerMasterActor: Removing BlockManager 
> BlockManagerId(19, ip-10-139-36-207.ec2.internal, 52208, 0) with no recent 
> heart beats: 173061ms exceeds 45000ms
> 14/08/29 21:54:13 WARN BlockManagerMasterActor: Removing BlockManager 
> BlockManagerId(5, ip-10-73-142-70.ec2.internal, 56162, 0) with no recent 
> heart beats: 176816ms exceeds 45000ms
> 14/08/29 21:54:22 WARN BlockManagerMasterActor: Removing BlockManager 
> BlockManagerId(7, ip-10-236-145-200.ec2.internal, 40959, 0) with no recent 
> heart beats: 182241ms exceeds 45000ms
> 14/08/29 21:54:40 WARN BlockManagerMasterActor: Removing BlockManager 
> BlockManagerId(4, ip-10-139-1-195.ec2.internal, 49221, 0) with no recent 
> heart beats: 178406ms exceeds 45000ms
> 14/08/29 21:54:41 ERROR Utils: Uncaught exception in thread Result resolver 
> thread-3
> java.lang.OutOfMemoryError: Java heap space
> at com.esotericsoftware.kryo.io.Input.readBytes(Input.java:296)
> at 
> com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.read(DefaultArraySerializers.java:35)
> at 
> com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.read(DefaultArraySerializers.java:18)
> at com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:699)
> at 
> com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:611)
> at 
> com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
> at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
> at 
> org.apache.spark.serializer.KryoSerializerInstance.deserialize(KryoSerializer.scala:162)
> at org.apache.spark.scheduler.DirectTaskResult.value(TaskResult.scala:79)
> at 
> org.apache.spark.scheduler.TaskSetManager.handleSuccessfulTask(TaskSetManager.scala:514)
> at 
> org.apache.spark.scheduler.TaskSchedulerImpl.handleSuccessfulTask(TaskSchedulerImpl.scala:355)
> at 
> org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:68)
> at 
> org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47)
> at 
> org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47)
> at org.apache.spark.util.Utils$.logUncaughtExceptions(U

[jira] [Comment Edited] (SPARK-3333) Large number of partitions causes OOM

2014-08-31 Thread Davies Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116935#comment-14116935
 ] 

Davies Liu edited comment on SPARK-3333 at 8/31/14 11:27 PM:
-

[~matei] I think this is not related to external spilling in Python, because 
the dataset is so small that it will not trigger spilling in Python. 

Also, this slowdown can be reproduced in Scala, for example:

{code}
val rdd = sc.parallelize(1 to 10)
rdd.repartition(24000).keyBy(x => x).reduceByKey(_ + _).collect()
{code}

The second stage (24000 tasks) has already taken 105 mins (not finished yet). 

The JVM's CPU usage is around 250% and its memory usage is about 655 MB, so it 
may trigger an OOM somewhere. 

PS: I am running on master.



was (Author: davies):
[~matei] I think this is not related to external spilling in Python, because 
the dataset is so small that it will not trigger spilling in Python. 

Also, this slowdown can be reproduced in Scala, for example:

{code}
val rdd = sc.parallelize(1 to 10)
rdd.repartition(24000).keyBy(x => x).reduceByKey(_ + _).collect()
{code}

The second stage (24000 tasks) has already taken 105 mins (not finished yet). 

PS: I am running on master.


> Large number of partitions causes OOM
> -
>
> Key: SPARK-3333
> URL: https://issues.apache.org/jira/browse/SPARK-3333
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 1.1.0
> Environment: EC2; 1 master; 1 slave; {{m3.xlarge}} instances
>Reporter: Nicholas Chammas
>
> Here’s a repro for PySpark:
> {code}
> a = sc.parallelize(["Nick", "John", "Bob"])
> a = a.repartition(24000)
> a.keyBy(lambda x: len(x)).reduceByKey(lambda x,y: x + y).take(1)
> {code}
> This code runs fine on 1.0.2. It returns the following result in just over a 
> minute:
> {code}
> [(4, 'NickJohn')]
> {code}
> However, when I try this with 1.1.0-rc3 on an identically-sized cluster, it 
> runs for a very, very long time (at least 45 min) and then fails with 
> {{java.lang.OutOfMemoryError: Java heap space}}.
> Here is a stack trace taken from a run on 1.1.0-rc2:
> {code}
> >>> a = sc.parallelize(["Nick", "John", "Bob"])
> >>> a = a.repartition(24000)
> >>> a.keyBy(lambda x: len(x)).reduceByKey(lambda x,y: x + y).take(1)
> 14/08/29 21:53:40 WARN BlockManagerMasterActor: Removing BlockManager 
> BlockManagerId(0, ip-10-138-29-167.ec2.internal, 46252, 0) with no recent 
> heart beats: 175143ms exceeds 45000ms
> 14/08/29 21:53:50 WARN BlockManagerMasterActor: Removing BlockManager 
> BlockManagerId(10, ip-10-138-18-106.ec2.internal, 33711, 0) with no recent 
> heart beats: 175359ms exceeds 45000ms
> 14/08/29 21:54:02 WARN BlockManagerMasterActor: Removing BlockManager 
> BlockManagerId(19, ip-10-139-36-207.ec2.internal, 52208, 0) with no recent 
> heart beats: 173061ms exceeds 45000ms
> 14/08/29 21:54:13 WARN BlockManagerMasterActor: Removing BlockManager 
> BlockManagerId(5, ip-10-73-142-70.ec2.internal, 56162, 0) with no recent 
> heart beats: 176816ms exceeds 45000ms
> 14/08/29 21:54:22 WARN BlockManagerMasterActor: Removing BlockManager 
> BlockManagerId(7, ip-10-236-145-200.ec2.internal, 40959, 0) with no recent 
> heart beats: 182241ms exceeds 45000ms
> 14/08/29 21:54:40 WARN BlockManagerMasterActor: Removing BlockManager 
> BlockManagerId(4, ip-10-139-1-195.ec2.internal, 49221, 0) with no recent 
> heart beats: 178406ms exceeds 45000ms
> 14/08/29 21:54:41 ERROR Utils: Uncaught exception in thread Result resolver 
> thread-3
> java.lang.OutOfMemoryError: Java heap space
> at com.esotericsoftware.kryo.io.Input.readBytes(Input.java:296)
> at 
> com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.read(DefaultArraySerializers.java:35)
> at 
> com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.read(DefaultArraySerializers.java:18)
> at com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:699)
> at 
> com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:611)
> at 
> com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
> at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
> at 
> org.apache.spark.serializer.KryoSerializerInstance.deserialize(KryoSerializer.scala:162)
> at org.apache.spark.scheduler.DirectTaskResult.value(TaskResult.scala:79)
> at 
> org.apache.spark.scheduler.TaskSetManager.handleSuccessfulTask(TaskSetManager.scala:514)
> at 
> org.apache.spark.scheduler.TaskSchedulerImpl.handleSuccessfulTask(TaskSchedulerImpl.scala:355)
> at 
> org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:68)
> at 
> org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47)
> at 
> org.apache.spark.scheduler.TaskResultGetter$$an

[jira] [Commented] (SPARK-3333) Large number of partitions causes OOM

2014-08-31 Thread Josh Rosen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116936#comment-14116936
 ] 

Josh Rosen commented on SPARK-3333:
---

I agree with Davies; I think this is a more general Spark issue, perhaps 
related to {{repartition()}}.

I just tried testing this locally with commit 
eff9714e1c88e39e28317358ca9ec87677f121dc, which is the commit immediately prior 
to 
[14174abd421318e71c16edd24224fd5094bdfed4|https://github.com/apache/spark/commit/14174abd421318e71c16edd24224fd5094bdfed4],
 Davies' patch that adds hash-based disk spilling aggregation to PySpark, and I 
still saw the same slowdown there.



> Large number of partitions causes OOM
> -
>
> Key: SPARK-3333
> URL: https://issues.apache.org/jira/browse/SPARK-3333
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 1.1.0
> Environment: EC2; 1 master; 1 slave; {{m3.xlarge}} instances
>Reporter: Nicholas Chammas
>
> Here’s a repro for PySpark:
> {code}
> a = sc.parallelize(["Nick", "John", "Bob"])
> a = a.repartition(24000)
> a.keyBy(lambda x: len(x)).reduceByKey(lambda x,y: x + y).take(1)
> {code}
> This code runs fine on 1.0.2. It returns the following result in just over a 
> minute:
> {code}
> [(4, 'NickJohn')]
> {code}
> However, when I try this with 1.1.0-rc3 on an identically-sized cluster, it 
> runs for a very, very long time (at least 45 min) and then fails with 
> {{java.lang.OutOfMemoryError: Java heap space}}.
> Here is a stack trace taken from a run on 1.1.0-rc2:
> {code}
> >>> a = sc.parallelize(["Nick", "John", "Bob"])
> >>> a = a.repartition(24000)
> >>> a.keyBy(lambda x: len(x)).reduceByKey(lambda x,y: x + y).take(1)
> 14/08/29 21:53:40 WARN BlockManagerMasterActor: Removing BlockManager 
> BlockManagerId(0, ip-10-138-29-167.ec2.internal, 46252, 0) with no recent 
> heart beats: 175143ms exceeds 45000ms
> 14/08/29 21:53:50 WARN BlockManagerMasterActor: Removing BlockManager 
> BlockManagerId(10, ip-10-138-18-106.ec2.internal, 33711, 0) with no recent 
> heart beats: 175359ms exceeds 45000ms
> 14/08/29 21:54:02 WARN BlockManagerMasterActor: Removing BlockManager 
> BlockManagerId(19, ip-10-139-36-207.ec2.internal, 52208, 0) with no recent 
> heart beats: 173061ms exceeds 45000ms
> 14/08/29 21:54:13 WARN BlockManagerMasterActor: Removing BlockManager 
> BlockManagerId(5, ip-10-73-142-70.ec2.internal, 56162, 0) with no recent 
> heart beats: 176816ms exceeds 45000ms
> 14/08/29 21:54:22 WARN BlockManagerMasterActor: Removing BlockManager 
> BlockManagerId(7, ip-10-236-145-200.ec2.internal, 40959, 0) with no recent 
> heart beats: 182241ms exceeds 45000ms
> 14/08/29 21:54:40 WARN BlockManagerMasterActor: Removing BlockManager 
> BlockManagerId(4, ip-10-139-1-195.ec2.internal, 49221, 0) with no recent 
> heart beats: 178406ms exceeds 45000ms
> 14/08/29 21:54:41 ERROR Utils: Uncaught exception in thread Result resolver 
> thread-3
> java.lang.OutOfMemoryError: Java heap space
> at com.esotericsoftware.kryo.io.Input.readBytes(Input.java:296)
> at 
> com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.read(DefaultArraySerializers.java:35)
> at 
> com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.read(DefaultArraySerializers.java:18)
> at com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:699)
> at 
> com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:611)
> at 
> com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
> at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
> at 
> org.apache.spark.serializer.KryoSerializerInstance.deserialize(KryoSerializer.scala:162)
> at org.apache.spark.scheduler.DirectTaskResult.value(TaskResult.scala:79)
> at 
> org.apache.spark.scheduler.TaskSetManager.handleSuccessfulTask(TaskSetManager.scala:514)
> at 
> org.apache.spark.scheduler.TaskSchedulerImpl.handleSuccessfulTask(TaskSchedulerImpl.scala:355)
> at 
> org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:68)
> at 
> org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47)
> at 
> org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47)
> at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1311)
> at 
> org.apache.spark.scheduler.TaskResultGetter$$anon$2.run(TaskResultGetter.scala:46)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Exception in thr

[jira] [Commented] (SPARK-3333) Large number of partitions causes OOM

2014-08-31 Thread Matei Zaharia (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116942#comment-14116942
 ] 

Matei Zaharia commented on SPARK-3333:
--

I see, that makes sense.

> Large number of partitions causes OOM
> -
>
> Key: SPARK-3333
> URL: https://issues.apache.org/jira/browse/SPARK-3333
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 1.1.0
> Environment: EC2; 1 master; 1 slave; {{m3.xlarge}} instances
>Reporter: Nicholas Chammas
>
> Here’s a repro for PySpark:
> {code}
> a = sc.parallelize(["Nick", "John", "Bob"])
> a = a.repartition(24000)
> a.keyBy(lambda x: len(x)).reduceByKey(lambda x,y: x + y).take(1)
> {code}
> This code runs fine on 1.0.2. It returns the following result in just over a 
> minute:
> {code}
> [(4, 'NickJohn')]
> {code}
> However, when I try this with 1.1.0-rc3 on an identically-sized cluster, it 
> runs for a very, very long time (at least 45 min) and then fails with 
> {{java.lang.OutOfMemoryError: Java heap space}}.
> Here is a stack trace taken from a run on 1.1.0-rc2:
> {code}
> >>> a = sc.parallelize(["Nick", "John", "Bob"])
> >>> a = a.repartition(24000)
> >>> a.keyBy(lambda x: len(x)).reduceByKey(lambda x,y: x + y).take(1)
> 14/08/29 21:53:40 WARN BlockManagerMasterActor: Removing BlockManager 
> BlockManagerId(0, ip-10-138-29-167.ec2.internal, 46252, 0) with no recent 
> heart beats: 175143ms exceeds 45000ms
> 14/08/29 21:53:50 WARN BlockManagerMasterActor: Removing BlockManager 
> BlockManagerId(10, ip-10-138-18-106.ec2.internal, 33711, 0) with no recent 
> heart beats: 175359ms exceeds 45000ms
> 14/08/29 21:54:02 WARN BlockManagerMasterActor: Removing BlockManager 
> BlockManagerId(19, ip-10-139-36-207.ec2.internal, 52208, 0) with no recent 
> heart beats: 173061ms exceeds 45000ms
> 14/08/29 21:54:13 WARN BlockManagerMasterActor: Removing BlockManager 
> BlockManagerId(5, ip-10-73-142-70.ec2.internal, 56162, 0) with no recent 
> heart beats: 176816ms exceeds 45000ms
> 14/08/29 21:54:22 WARN BlockManagerMasterActor: Removing BlockManager 
> BlockManagerId(7, ip-10-236-145-200.ec2.internal, 40959, 0) with no recent 
> heart beats: 182241ms exceeds 45000ms
> 14/08/29 21:54:40 WARN BlockManagerMasterActor: Removing BlockManager 
> BlockManagerId(4, ip-10-139-1-195.ec2.internal, 49221, 0) with no recent 
> heart beats: 178406ms exceeds 45000ms
> 14/08/29 21:54:41 ERROR Utils: Uncaught exception in thread Result resolver 
> thread-3
> java.lang.OutOfMemoryError: Java heap space
> at com.esotericsoftware.kryo.io.Input.readBytes(Input.java:296)
> at 
> com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.read(DefaultArraySerializers.java:35)
> at 
> com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.read(DefaultArraySerializers.java:18)
> at com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:699)
> at 
> com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:611)
> at 
> com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
> at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
> at 
> org.apache.spark.serializer.KryoSerializerInstance.deserialize(KryoSerializer.scala:162)
> at org.apache.spark.scheduler.DirectTaskResult.value(TaskResult.scala:79)
> at 
> org.apache.spark.scheduler.TaskSetManager.handleSuccessfulTask(TaskSetManager.scala:514)
> at 
> org.apache.spark.scheduler.TaskSchedulerImpl.handleSuccessfulTask(TaskSchedulerImpl.scala:355)
> at 
> org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:68)
> at 
> org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47)
> at 
> org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47)
> at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1311)
> at 
> org.apache.spark.scheduler.TaskResultGetter$$anon$2.run(TaskResultGetter.scala:46)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Exception in thread "Result resolver thread-3" 14/08/29 21:56:26 ERROR 
> SendingConnection: Exception while reading SendingConnection to 
> ConnectionManagerId(ip-10-73-142-223.ec2.internal,54014)
> java.nio.channels.ClosedChannelException
> at sun.nio.ch.SocketChannelImpl.ensureReadOpen(SocketChannelImpl.java:252)
> at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:295)
> at org.apache.spark.network.SendingConnection.read(Connection.sca

[jira] [Commented] (SPARK-3333) Large number of partitions causes OOM

2014-08-31 Thread Josh Rosen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116948#comment-14116948
 ] 

Josh Rosen commented on SPARK-3333:
---

Working on some manual bisecting to find the patch that introduced the 
slowdown.  It's still slow as early as 8d338f64c4eda45d22ae33f61ef7928011cc2846.

> Large number of partitions causes OOM
> -
>
> Key: SPARK-3333
> URL: https://issues.apache.org/jira/browse/SPARK-3333
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 1.1.0
> Environment: EC2; 1 master; 1 slave; {{m3.xlarge}} instances
>Reporter: Nicholas Chammas
>
> Here’s a repro for PySpark:
> {code}
> a = sc.parallelize(["Nick", "John", "Bob"])
> a = a.repartition(24000)
> a.keyBy(lambda x: len(x)).reduceByKey(lambda x,y: x + y).take(1)
> {code}
> This code runs fine on 1.0.2. It returns the following result in just over a 
> minute:
> {code}
> [(4, 'NickJohn')]
> {code}
> However, when I try this with 1.1.0-rc3 on an identically-sized cluster, it 
> runs for a very, very long time (at least 45 min) and then fails with 
> {{java.lang.OutOfMemoryError: Java heap space}}.
> Here is a stack trace taken from a run on 1.1.0-rc2:
> {code}
> >>> a = sc.parallelize(["Nick", "John", "Bob"])
> >>> a = a.repartition(24000)
> >>> a.keyBy(lambda x: len(x)).reduceByKey(lambda x,y: x + y).take(1)
> 14/08/29 21:53:40 WARN BlockManagerMasterActor: Removing BlockManager 
> BlockManagerId(0, ip-10-138-29-167.ec2.internal, 46252, 0) with no recent 
> heart beats: 175143ms exceeds 45000ms
> 14/08/29 21:53:50 WARN BlockManagerMasterActor: Removing BlockManager 
> BlockManagerId(10, ip-10-138-18-106.ec2.internal, 33711, 0) with no recent 
> heart beats: 175359ms exceeds 45000ms
> 14/08/29 21:54:02 WARN BlockManagerMasterActor: Removing BlockManager 
> BlockManagerId(19, ip-10-139-36-207.ec2.internal, 52208, 0) with no recent 
> heart beats: 173061ms exceeds 45000ms
> 14/08/29 21:54:13 WARN BlockManagerMasterActor: Removing BlockManager 
> BlockManagerId(5, ip-10-73-142-70.ec2.internal, 56162, 0) with no recent 
> heart beats: 176816ms exceeds 45000ms
> 14/08/29 21:54:22 WARN BlockManagerMasterActor: Removing BlockManager 
> BlockManagerId(7, ip-10-236-145-200.ec2.internal, 40959, 0) with no recent 
> heart beats: 182241ms exceeds 45000ms
> 14/08/29 21:54:40 WARN BlockManagerMasterActor: Removing BlockManager 
> BlockManagerId(4, ip-10-139-1-195.ec2.internal, 49221, 0) with no recent 
> heart beats: 178406ms exceeds 45000ms
> 14/08/29 21:54:41 ERROR Utils: Uncaught exception in thread Result resolver 
> thread-3
> java.lang.OutOfMemoryError: Java heap space
> at com.esotericsoftware.kryo.io.Input.readBytes(Input.java:296)
> at 
> com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.read(DefaultArraySerializers.java:35)
> at 
> com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.read(DefaultArraySerializers.java:18)
> at com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:699)
> at 
> com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:611)
> at 
> com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
> at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
> at 
> org.apache.spark.serializer.KryoSerializerInstance.deserialize(KryoSerializer.scala:162)
> at org.apache.spark.scheduler.DirectTaskResult.value(TaskResult.scala:79)
> at 
> org.apache.spark.scheduler.TaskSetManager.handleSuccessfulTask(TaskSetManager.scala:514)
> at 
> org.apache.spark.scheduler.TaskSchedulerImpl.handleSuccessfulTask(TaskSchedulerImpl.scala:355)
> at 
> org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:68)
> at 
> org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47)
> at 
> org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47)
> at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1311)
> at 
> org.apache.spark.scheduler.TaskResultGetter$$anon$2.run(TaskResultGetter.scala:46)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Exception in thread "Result resolver thread-3" 14/08/29 21:56:26 ERROR 
> SendingConnection: Exception while reading SendingConnection to 
> ConnectionManagerId(ip-10-73-142-223.ec2.internal,54014)
> java.nio.channels.ClosedChannelException
> at sun.nio.ch.SocketChannelImpl.ensureReadOpen(SocketChannelImpl.java:252)
> at sun.n

[jira] [Commented] (SPARK-3332) Tagging is not atomic with launching instances on EC2

2014-08-31 Thread Allan Douglas R. de Oliveira (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116950#comment-14116950
 ] 

Allan Douglas R. de Oliveira commented on SPARK-3332:
-

[~pwendell], yes, this is a good rewording, and I agree to revert it for now. You 
mentioned the two potential solutions, but I think a good compromise is the one 
implemented by the PR, which keeps the flag (allowing reuse of the same security 
group) while still allowing the machines to be matched by security group in the 
other cases.

> Tagging is not atomic with launching instances on EC2
> -
>
> Key: SPARK-3332
> URL: https://issues.apache.org/jira/browse/SPARK-3332
> Project: Spark
>  Issue Type: Bug
>  Components: EC2
>Reporter: Allan Douglas R. de Oliveira
>
> The implementation for SPARK-2333 changed the machine membership mechanism 
> from security groups to tags.
> This is a fundamentally flawed strategy, as there is no guarantee at all that the 
> machines will have a tag (even with a retry mechanism).
> For instance, if the script is killed after launching the instances but 
> before setting the tags, the machines will be "invisible" to a destroy 
> command, leaving an unmanageable cluster behind.
> The initial proposal is to go back to the previous behavior in all cases except 
> when the new flag (--security-group-prefix) is used.
> It's also worth mentioning that SPARK-3180 introduced the 
> --additional-security-group flag, which is a reasonable solution to SPARK-2333 
> (but isn't a full replacement for all use cases of --security-group-prefix).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-3332) Tagging is not atomic with launching instances on EC2

2014-08-31 Thread Allan Douglas R. de Oliveira (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116950#comment-14116950
 ] 

Allan Douglas R. de Oliveira edited comment on SPARK-3332 at 9/1/14 12:49 AM:
--

[~pwendell], yes, this is a good rewording, and I agree to revert it for now. You 
mentioned the two potential solutions, but I think a good compromise is the one 
implemented by the PR, which keeps the flag (allowing reuse of the same security 
group) while still allowing the machines to be matched by security group in the 
other cases.

Perhaps more log messages could be added for the case where the flag has been 
used but the tagging failed.


was (Author: douglaz):
[~pwendell], yes, this is a good rewording, and I agree to revert it for now. You 
mentioned the two potential solutions, but I think a good compromise is the one 
implemented by the PR, which keeps the flag (allowing reuse of the same security 
group) while still allowing the machines to be matched by security group in the 
other cases.

> Tagging is not atomic with launching instances on EC2
> -
>
> Key: SPARK-3332
> URL: https://issues.apache.org/jira/browse/SPARK-3332
> Project: Spark
>  Issue Type: Bug
>  Components: EC2
>Reporter: Allan Douglas R. de Oliveira
>
> The implementation for SPARK-2333 changed the machine membership mechanism 
> from security groups to tags.
> This is a fundamentally flawed strategy, as there is no guarantee at all that the 
> machines will have a tag (even with a retry mechanism).
> For instance, if the script is killed after launching the instances but 
> before setting the tags, the machines will be "invisible" to a destroy 
> command, leaving an unmanageable cluster behind.
> The initial proposal is to go back to the previous behavior in all cases except 
> when the new flag (--security-group-prefix) is used.
> It's also worth mentioning that SPARK-3180 introduced the 
> --additional-security-group flag, which is a reasonable solution to SPARK-2333 
> (but isn't a full replacement for all use cases of --security-group-prefix).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3333) Large number of partitions causes OOM

2014-08-31 Thread Josh Rosen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116969#comment-14116969
 ] 

Josh Rosen commented on SPARK-3333:
---

Looks like the issue was introduced somewhere between 273afcb and 62d4a0f.

> Large number of partitions causes OOM
> -
>
> Key: SPARK-3333
> URL: https://issues.apache.org/jira/browse/SPARK-3333
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 1.1.0
> Environment: EC2; 1 master; 1 slave; {{m3.xlarge}} instances
>Reporter: Nicholas Chammas
>
> Here’s a repro for PySpark:
> {code}
> a = sc.parallelize(["Nick", "John", "Bob"])
> a = a.repartition(24000)
> a.keyBy(lambda x: len(x)).reduceByKey(lambda x,y: x + y).take(1)
> {code}
> This code runs fine on 1.0.2. It returns the following result in just over a 
> minute:
> {code}
> [(4, 'NickJohn')]
> {code}
> However, when I try this with 1.1.0-rc3 on an identically-sized cluster, it 
> runs for a very, very long time (at least 45 min) and then fails with 
> {{java.lang.OutOfMemoryError: Java heap space}}.
> Here is a stack trace taken from a run on 1.1.0-rc2:
> {code}
> >>> a = sc.parallelize(["Nick", "John", "Bob"])
> >>> a = a.repartition(24000)
> >>> a.keyBy(lambda x: len(x)).reduceByKey(lambda x,y: x + y).take(1)
> 14/08/29 21:53:40 WARN BlockManagerMasterActor: Removing BlockManager 
> BlockManagerId(0, ip-10-138-29-167.ec2.internal, 46252, 0) with no recent 
> heart beats: 175143ms exceeds 45000ms
> 14/08/29 21:53:50 WARN BlockManagerMasterActor: Removing BlockManager 
> BlockManagerId(10, ip-10-138-18-106.ec2.internal, 33711, 0) with no recent 
> heart beats: 175359ms exceeds 45000ms
> 14/08/29 21:54:02 WARN BlockManagerMasterActor: Removing BlockManager 
> BlockManagerId(19, ip-10-139-36-207.ec2.internal, 52208, 0) with no recent 
> heart beats: 173061ms exceeds 45000ms
> 14/08/29 21:54:13 WARN BlockManagerMasterActor: Removing BlockManager 
> BlockManagerId(5, ip-10-73-142-70.ec2.internal, 56162, 0) with no recent 
> heart beats: 176816ms exceeds 45000ms
> 14/08/29 21:54:22 WARN BlockManagerMasterActor: Removing BlockManager 
> BlockManagerId(7, ip-10-236-145-200.ec2.internal, 40959, 0) with no recent 
> heart beats: 182241ms exceeds 45000ms
> 14/08/29 21:54:40 WARN BlockManagerMasterActor: Removing BlockManager 
> BlockManagerId(4, ip-10-139-1-195.ec2.internal, 49221, 0) with no recent 
> heart beats: 178406ms exceeds 45000ms
> 14/08/29 21:54:41 ERROR Utils: Uncaught exception in thread Result resolver 
> thread-3
> java.lang.OutOfMemoryError: Java heap space
> at com.esotericsoftware.kryo.io.Input.readBytes(Input.java:296)
> at 
> com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.read(DefaultArraySerializers.java:35)
> at 
> com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.read(DefaultArraySerializers.java:18)
> at com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:699)
> at 
> com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:611)
> at 
> com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
> at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
> at 
> org.apache.spark.serializer.KryoSerializerInstance.deserialize(KryoSerializer.scala:162)
> at org.apache.spark.scheduler.DirectTaskResult.value(TaskResult.scala:79)
> at 
> org.apache.spark.scheduler.TaskSetManager.handleSuccessfulTask(TaskSetManager.scala:514)
> at 
> org.apache.spark.scheduler.TaskSchedulerImpl.handleSuccessfulTask(TaskSchedulerImpl.scala:355)
> at 
> org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:68)
> at 
> org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47)
> at 
> org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47)
> at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1311)
> at 
> org.apache.spark.scheduler.TaskResultGetter$$anon$2.run(TaskResultGetter.scala:46)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Exception in thread "Result resolver thread-3" 14/08/29 21:56:26 ERROR 
> SendingConnection: Exception while reading SendingConnection to 
> ConnectionManagerId(ip-10-73-142-223.ec2.internal,54014)
> java.nio.channels.ClosedChannelException
> at sun.nio.ch.SocketChannelImpl.ensureReadOpen(SocketChannelImpl.java:252)
> at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:295)
> at org.apache.spark.n

[jira] [Commented] (SPARK-3168) The ServletContextHandler of webui lacks a SessionManager

2014-08-31 Thread meiyoula (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116974#comment-14116974
 ] 

meiyoula commented on SPARK-3168:
-

To use CAS for single sign-on, I added some web UI filters to the configuration. 
For the History Server, I set the following:
export SPARK_HISTORY_OPTS=$SPARK_HISTORY_OPTS" 
-Dspark.ui.filters=org.jasig.cas.client.authentication.Saml11AuthenticationFilter,org.jasig.cas.client.validation.Saml11TicketValidationFilter,org.jasig.cas.client.util.HttpServletRequestWrapperFilter
 
-Dspark.org.jasig.cas.client.authentication.Saml11AuthenticationFilter.params=casServerLoginUrl=https://9.91.11.120:8443/cas/login,serverName=http://9.91.11.171:18080
 
-Dspark.org.jasig.cas.client.validation.Saml11TicketValidationFilter.params=casServerUrlPrefix=https://9.91.11.120:8443/cas/,serverName=http://9.91.11.171:18080,hostnameVerifier=org.jasig.cas.client.ssl.AnyHostnameVerifier
 "

> The ServletContextHandler of webui lacks a SessionManager
> -
>
> Key: SPARK-3168
> URL: https://issues.apache.org/jira/browse/SPARK-3168
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
> Environment: CAS
>Reporter: meiyoula
>
> When I use CAS to implement single sign-on for the web UI, an exception occurs:
> {code}
> WARN  [qtp1076146544-24] / 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:561)
> java.lang.IllegalStateException: No SessionManager
> at org.eclipse.jetty.server.Request.getSession(Request.java:1269)
> at org.eclipse.jetty.server.Request.getSession(Request.java:1248)
> at 
> org.jasig.cas.client.validation.AbstractTicketValidationFilter.doFilter(AbstractTicketValidationFilter.java:178)
> at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1467)
> at 
> org.jasig.cas.client.authentication.AuthenticationFilter.doFilter(AuthenticationFilter.java:116)
> at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1467)
> at 
> org.jasig.cas.client.session.SingleSignOutFilter.doFilter(SingleSignOutFilter.java:76)
> at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1467)
> at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:499)
> at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1086)
> at 
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:428)
> at 
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1020)
> at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
> at 
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
> at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
> at org.eclipse.jetty.server.Server.handle(Server.java:370)
> at 
> org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:494)
> at 
> org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:971)
> at 
> org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1033)
> at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:644)
> at 
> org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
> at 
> org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
> at 
> org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:667)
> at 
> org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:52)
> at 
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
> at 
> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
> at java.lang.Thread.run(Thread.java:744)
> {code}
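
A minimal sketch of one possible fix, assuming the Jetty 8 API that Spark 1.x 
bundles (this is not Spark's actual code): give the web UI's 
ServletContextHandler session support, so session-dependent filters such as the 
CAS ones can call request.getSession() without hitting "No SessionManager":

{code}
import org.eclipse.jetty.server.session.{HashSessionManager, SessionHandler}
import org.eclipse.jetty.servlet.ServletContextHandler

// Hypothetical sketch: create the context with session support so that
// request.getSession() no longer throws IllegalStateException.
val context = new ServletContextHandler(ServletContextHandler.SESSIONS)
// Equivalently, attach a session manager to an existing context:
context.setSessionHandler(new SessionHandler(new HashSessionManager()))
{code}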



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3293) yarn's web show "SUCCEEDED" when the driver throw a exception in yarn-client

2014-08-31 Thread Guoqiang Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116973#comment-14116973
 ] 

Guoqiang Li commented on SPARK-3293:


Here is a related PR:
https://github.com/apache/spark/pull/1788/files

> yarn's web show "SUCCEEDED" when the driver throw a exception in yarn-client
> 
>
> Key: SPARK-3293
> URL: https://issues.apache.org/jira/browse/SPARK-3293
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 1.0.2
>Reporter: wangfei
> Fix For: 1.1.0
>
>
> If an exception occurs, the YARN web UI (Applications -> FinalStatus) still 
> shows "SUCCEEDED" instead of the expected "FAILED".
> In the spark-1.0.2 release, only yarn-client mode showed this.
> But recently yarn-cluster mode has the problem as well.
> To reproduce this:
> just create a SparkContext and then throw an exception,
> then watch the applications page of the YARN web UI.
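
Following the reproduction steps quoted above, a minimal driver sketch (object 
and message names are hypothetical):

{code}
import org.apache.spark.{SparkConf, SparkContext}

// Sketch: start a SparkContext in yarn-client mode, then fail.
// Per this report, YARN's web UI still records FinalStatus: SUCCEEDED.
object CrashingApp {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("CrashingApp"))
    throw new RuntimeException("driver failure that should mark the app FAILED")
  }
}
{code}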



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3333) Large number of partitions causes OOM

2014-08-31 Thread Nicholas Chammas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116978#comment-14116978
 ] 

Nicholas Chammas commented on SPARK-3333:
-

For the record, I got the OOM in a relatively short amount of time (less than 
30 min) on a 1.1.0-rc2 EC2 cluster with 20 {{m1.xlarge}} slaves. Perhaps one of 
y'all can replicate the OOM with that kind of environment.

> Large number of partitions causes OOM
> -
>
> Key: SPARK-3333
> URL: https://issues.apache.org/jira/browse/SPARK-3333
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 1.1.0
> Environment: EC2; 1 master; 1 slave; {{m3.xlarge}} instances
>Reporter: Nicholas Chammas
>
> Here’s a repro for PySpark:
> {code}
> a = sc.parallelize(["Nick", "John", "Bob"])
> a = a.repartition(24000)
> a.keyBy(lambda x: len(x)).reduceByKey(lambda x,y: x + y).take(1)
> {code}
> This code runs fine on 1.0.2. It returns the following result in just over a 
> minute:
> {code}
> [(4, 'NickJohn')]
> {code}
> However, when I try this with 1.1.0-rc3 on an identically-sized cluster, it 
> runs for a very, very long time (at least > 45 min) and then fails with 
> {{java.lang.OutOfMemoryError: Java heap space}}.
> Here is a stack trace taken from a run on 1.1.0-rc2:
> {code}
> >>> a = sc.parallelize(["Nick", "John", "Bob"])
> >>> a = a.repartition(24000)
> >>> a.keyBy(lambda x: len(x)).reduceByKey(lambda x,y: x + y).take(1)
> 14/08/29 21:53:40 WARN BlockManagerMasterActor: Removing BlockManager 
> BlockManagerId(0, ip-10-138-29-167.ec2.internal, 46252, 0) with no recent 
> heart beats: 175143ms exceeds 45000ms
> 14/08/29 21:53:50 WARN BlockManagerMasterActor: Removing BlockManager 
> BlockManagerId(10, ip-10-138-18-106.ec2.internal, 33711, 0) with no recent 
> heart beats: 175359ms exceeds 45000ms
> 14/08/29 21:54:02 WARN BlockManagerMasterActor: Removing BlockManager 
> BlockManagerId(19, ip-10-139-36-207.ec2.internal, 52208, 0) with no recent 
> heart beats: 173061ms exceeds 45000ms
> 14/08/29 21:54:13 WARN BlockManagerMasterActor: Removing BlockManager 
> BlockManagerId(5, ip-10-73-142-70.ec2.internal, 56162, 0) with no recent 
> heart beats: 176816ms exceeds 45000ms
> 14/08/29 21:54:22 WARN BlockManagerMasterActor: Removing BlockManager 
> BlockManagerId(7, ip-10-236-145-200.ec2.internal, 40959, 0) with no recent 
> heart beats: 182241ms exceeds 45000ms
> 14/08/29 21:54:40 WARN BlockManagerMasterActor: Removing BlockManager 
> BlockManagerId(4, ip-10-139-1-195.ec2.internal, 49221, 0) with no recent 
> heart beats: 178406ms exceeds 45000ms
> 14/08/29 21:54:41 ERROR Utils: Uncaught exception in thread Result resolver 
> thread-3
> java.lang.OutOfMemoryError: Java heap space
> at com.esotericsoftware.kryo.io.Input.readBytes(Input.java:296)
> at 
> com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.read(DefaultArraySerializers.java:35)
> at 
> com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.read(DefaultArraySerializers.java:18)
> at com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:699)
> at 
> com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:611)
> at 
> com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
> at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
> at 
> org.apache.spark.serializer.KryoSerializerInstance.deserialize(KryoSerializer.scala:162)
> at org.apache.spark.scheduler.DirectTaskResult.value(TaskResult.scala:79)
> at 
> org.apache.spark.scheduler.TaskSetManager.handleSuccessfulTask(TaskSetManager.scala:514)
> at 
> org.apache.spark.scheduler.TaskSchedulerImpl.handleSuccessfulTask(TaskSchedulerImpl.scala:355)
> at 
> org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:68)
> at 
> org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47)
> at 
> org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47)
> at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1311)
> at 
> org.apache.spark.scheduler.TaskResultGetter$$anon$2.run(TaskResultGetter.scala:46)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Exception in thread "Result resolver thread-3" 14/08/29 21:56:26 ERROR 
> SendingConnection: Exception while reading SendingConnection to 
> ConnectionManagerId(ip-10-73-142-223.ec2.internal,54014)
> java.nio.channels.ClosedChannelException
> at sun.nio.ch.Soc

[jira] [Comment Edited] (SPARK-3333) Large number of partitions causes OOM

2014-08-31 Thread Nicholas Chammas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116978#comment-14116978
 ] 

Nicholas Chammas edited comment on SPARK-3333 at 9/1/14 2:09 AM:
-

For the record, I got the OOM in [my original 
report|http://apache-spark-user-list.1001560.n3.nabble.com/PySpark-large-of-partitions-causes-OOM-td13155.html]
 (which I duplicated here in this JIRA) in a relatively short amount of time 
(less than 30 min) on a 1.1.0-rc2 EC2 cluster with 20 {{m1.xlarge}} slaves. 
Perhaps one of y'all can replicate the OOM with that kind of environment.


was (Author: nchammas):
For the record, I got the OOM in a relatively short amount of time (less than 
30 min) on a 1.1.0-rc2 EC2 cluster with 20 {{m1.xlarge}} slaves. Perhaps one of 
y'all can replicate the OOM with that kind of environment.


[jira] [Commented] (SPARK-3333) Large number of partitions causes OOM

2014-08-31 Thread Josh Rosen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116985#comment-14116985
 ] 

Josh Rosen commented on SPARK-3333:
---

I'll resume work on this later tonight, but just wanted to note that things run 
fast as recently as commit 5ad5e34 and slow down as long ago as 6587ef7.


[jira] [Updated] (SPARK-3334) Spark causes mesos-master memory leak

2014-08-31 Thread Iven Hsu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Iven Hsu updated SPARK-3334:

Description: 
The {{akkaFrameSize}} is set to {{Long.MaxValue}} in MesosBackend to workaround 
SPARK-1112, this causes all serialized task result is sent using Mesos 
TaskStatus.

mesos-master stores TaskStatus in memory, and when running Spark, its memory 
grows very fast, and will be OOM killed.

See MESOS-1746 for more.

I've tryed to set {{akkaFrameSize}} to 0, mesos-master won't be killed, 
however, the driver will block after success unless I use {{sc.stop()}} to quit 
it manually. Not sure if it's related to SPARK-1112.

  was:
The {{akkaFrameSize}} is set to {{Long.MaxValue}} in MesosBackend to workaround 
SPARK-1112, this causes all serialized task result is sent using Mesos 
TaskStatus.

mesos-master stores TaskStatus in memory, and when running Spark, it's memory 
grows very fast, and will be OOM killed.

See MESOS-1746 for more.

I've tryed to set {{akkaFrameSize}} to 0, mesos-master won't be killed, 
however, the driver will block after success unless I use {{sc.stop()}} to quit 
it manually. Not sure if it's related to SPARK-1112.


> Spark causes mesos-master memory leak
> -
>
> Key: SPARK-3334
> URL: https://issues.apache.org/jira/browse/SPARK-3334
> Project: Spark
>  Issue Type: Bug
>  Components: Mesos
>Affects Versions: 1.0.2
> Environment: Mesos 0.16.0/0.19.0
> CentOS 6.4
>Reporter: Iven Hsu
>
> The {{akkaFrameSize}} is set to {{Long.MaxValue}} in MesosBackend to 
> workaround SPARK-1112, this causes all serialized task result is sent using 
> Mesos TaskStatus.
> mesos-master stores TaskStatus in memory, and when running Spark, its memory 
> grows very fast, and will be OOM killed.
> See MESOS-1746 for more.
> I've tryed to set {{akkaFrameSize}} to 0, mesos-master won't be killed, 
> however, the driver will block after success unless I use {{sc.stop()}} to 
> quit it manually. Not sure if it's related to SPARK-1112.
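
Since the report above notes the driver blocks after success unless the context 
is stopped manually, here is a sketch of that workaround, assuming a plain 
Scala driver (app name hypothetical):

{code}
import org.apache.spark.{SparkConf, SparkContext}

// Workaround sketch: always stop the context explicitly so the driver
// exits instead of blocking after the job completes.
val sc = new SparkContext(new SparkConf().setAppName("MesosJob"))
try {
  // ... run the job ...
} finally {
  sc.stop()
}
{code}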



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-3334) Spark causes mesos-master memory leak

2014-08-31 Thread Iven Hsu (JIRA)
Iven Hsu created SPARK-3334:
---

 Summary: Spark causes mesos-master memory leak
 Key: SPARK-3334
 URL: https://issues.apache.org/jira/browse/SPARK-3334
 Project: Spark
  Issue Type: Bug
  Components: Mesos
Affects Versions: 1.0.2
 Environment: Mesos 0.16.0/0.19.0
CentOS 6.4
Reporter: Iven Hsu


The {{akkaFrameSize}} is set to {{Long.MaxValue}} in MesosBackend to workaround 
SPARK-1112, this causes all serialized task result is sent using Mesos 
TaskStatus.

mesos-master stores TaskStatus in memory, and when running Spark, it's memory 
grows very fast, and will be OOM killed.

See MESOS-1746 for more.

I've tryed to set {{akkaFrameSize}} to 0, mesos-master won't be killed, 
however, the driver will block after success unless I use {{sc.stop()}} to quit 
it manually. Not sure if it's related to SPARK-1112.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-3334) Spark causes mesos-master memory leak

2014-08-31 Thread Iven Hsu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Iven Hsu updated SPARK-3334:

Description: 
The {{akkaFrameSize}} is set to {{Long.MaxValue}} in MesosBackend to workaround 
SPARK-1112, this causes all serialized task result is sent using Mesos 
TaskStatus.

mesos-master stores TaskStatus in memory, and when running Spark, its memory 
grows very fast, and will be OOM killed.

See MESOS-1746 for more.

I've tried to set {{akkaFrameSize}} to 0, mesos-master won't be killed, 
however, the driver will block after success unless I use {{sc.stop()}} to quit 
it manually. Not sure if it's related to SPARK-1112.

  was:
The {{akkaFrameSize}} is set to {{Long.MaxValue}} in MesosBackend to workaround 
SPARK-1112, this causes all serialized task result is sent using Mesos 
TaskStatus.

mesos-master stores TaskStatus in memory, and when running Spark, its memory 
grows very fast, and will be OOM killed.

See MESOS-1746 for more.

I've tryed to set {{akkaFrameSize}} to 0, mesos-master won't be killed, 
however, the driver will block after success unless I use {{sc.stop()}} to quit 
it manually. Not sure if it's related to SPARK-1112.


> Spark causes mesos-master memory leak
> -
>
> Key: SPARK-3334
> URL: https://issues.apache.org/jira/browse/SPARK-3334
> Project: Spark
>  Issue Type: Bug
>  Components: Mesos
>Affects Versions: 1.0.2
> Environment: Mesos 0.16.0/0.19.0
> CentOS 6.4
>Reporter: Iven Hsu
>
> The {{akkaFrameSize}} is set to {{Long.MaxValue}} in MesosBackend to 
> workaround SPARK-1112, this causes all serialized task result is sent using 
> Mesos TaskStatus.
> mesos-master stores TaskStatus in memory, and when running Spark, its memory 
> grows very fast, and will be OOM killed.
> See MESOS-1746 for more.
> I've tried to set {{akkaFrameSize}} to 0, mesos-master won't be killed, 
> however, the driver will block after success unless I use {{sc.stop()}} to 
> quit it manually. Not sure if it's related to SPARK-1112.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-3324) YARN module has nonstandard structure which cause compile error In IntelliJ

2014-08-31 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell reassigned SPARK-3324:
--

Assignee: Patrick Wendell

> YARN module has nonstandard structure which cause compile error In IntelliJ
> ---
>
> Key: SPARK-3324
> URL: https://issues.apache.org/jira/browse/SPARK-3324
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 1.1.0
> Environment: Mac OS: 10.9.4
> IntelliJ IDEA: 13.1.4
> Scala Plugins: 0.41.2
> Maven: 3.0.5
>Reporter: Yi Tian
>Assignee: Patrick Wendell
>Priority: Minor
>  Labels: intellij, maven, yarn
>
> The YARN module has nonstandard path structure like:
> {code}
> ${SPARK_HOME}
>   |--yarn
>  |--alpha (contains yarn api support for 0.23 and 2.0.x)
>  |--stable (contains yarn api support for 2.2 and later)
>  | |--pom.xml (spark-yarn)
>  |--common (Common codes not depending on specific version of Hadoop)
>  |--pom.xml (yarn-parent)
> {code}
> When we use Maven to compile the YARN module, Maven imports the 'alpha' or 
> 'stable' module according to the profile setting.
> A submodule like 'stable' uses the build properties defined in yarn/pom.xml 
> to add the common code to its source path.
> As a result, IntelliJ can't directly recognize the sources in the common 
> directory as a source path.
> I think we should change the YARN module to a unified Maven jar project and 
> select the different versions of the YARN API via Maven profile settings.
> That would resolve the compile error in IntelliJ and make the YARN module 
> simpler and clearer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3333) Large number of partitions causes OOM

2014-08-31 Thread Josh Rosen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14117016#comment-14117016
 ] 

Josh Rosen commented on SPARK-3333:
---

It looks like https://github.com/apache/spark/pull/1138 may be the culprit, 
since this job runs quickly immediately prior to that commit.


[jira] [Commented] (SPARK-3333) Large number of partitions causes OOM

2014-08-31 Thread Davies Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14117019#comment-14117019
 ] 

Davies Liu commented on SPARK-3333:
---

@joshrosen This should not be the culprit; it just surfaces the bad behavior in 
PySpark. Before that change, the default number of partitions for reduceByKey() 
could be much smaller, such as 4.

The root cause should be inside the Scala code; you should use the Scala API to 
test it.


[jira] [Commented] (SPARK-3333) Large number of partitions causes OOM

2014-08-31 Thread Josh Rosen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14117021#comment-14117021
 ] 

Josh Rosen commented on SPARK-3333:
---

Good point.  I guess "culprit" was the wrong word, but that commit helps to 
narrow down the problem.


[jira] [Resolved] (SPARK-2536) Update the MLlib page of Spark website

2014-08-31 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng resolved SPARK-2536.
--
Resolution: Done

> Update the MLlib page of Spark website
> --
>
> Key: SPARK-2536
> URL: https://issues.apache.org/jira/browse/SPARK-2536
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, MLlib
>Reporter: Xiangrui Meng
>Assignee: Xiangrui Meng
>
> It still shows v0.9.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-3205) input format for text records saved with in-record delimiter and newline characters escaped

2014-08-31 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng resolved SPARK-3205.
--
Resolution: Later

Moved the implementation to https://github.com/mengxr/redshift-input-format. If 
people feel this input format is common, we can move it to Spark Core later.

> input format for text records saved with in-record delimiter and newline 
> characters escaped
> ---
>
> Key: SPARK-3205
> URL: https://issues.apache.org/jira/browse/SPARK-3205
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core, SQL
>Reporter: Xiangrui Meng
>Assignee: Xiangrui Meng
>
> Text records may contain in-record delimiter or newline characters. In such 
> cases, we can either encode them or escape them. The latter is simpler and 
> used by Redshift's UNLOAD with the ESCAPE option. The problem is that a 
> record will span multiple lines. We need an input format for it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3090) Avoid not stopping SparkContext with YARN Client mode

2014-08-31 Thread Kousuke Saruta (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14117036#comment-14117036
 ] 

Kousuke Saruta commented on SPARK-3090:
---

It sounds like a good idea for SparkContext to register the shutdown hook itself.

>  Avoid not stopping SparkContext with YARN Client mode
> --
>
> Key: SPARK-3090
> URL: https://issues.apache.org/jira/browse/SPARK-3090
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, YARN
>Affects Versions: 1.1.0
>Reporter: Kousuke Saruta
>
> When we use YARN cluster mode, the ApplicationMaster registers a shutdown 
> hook that stops the SparkContext.
> Thanks to this, the SparkContext is stopped even if the application forgets 
> to stop it itself.
> But, unfortunately, YARN client mode has no such mechanism.
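
A minimal sketch of the idea discussed here (hypothetical, not an actual Spark 
patch): have the client-mode driver register a JVM shutdown hook that stops 
the SparkContext, mirroring what the ApplicationMaster does in cluster mode:

{code}
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical sketch: register a JVM shutdown hook so the context is
// stopped even if the application forgets to call sc.stop() itself.
val sc = new SparkContext(new SparkConf().setAppName("ClientModeApp"))
Runtime.getRuntime.addShutdownHook(new Thread {
  override def run(): Unit = sc.stop()
})
{code}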



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3333) Large number of partitions causes OOM

2014-08-31 Thread Josh Rosen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14117062#comment-14117062
 ] 

Josh Rosen commented on SPARK-3333:
---

[~shivaram] and I discussed this; we have a few ideas about what might be 
happening.

I tried running {{sc.parallelize(1 to 
10).repartition(24000).keyBy(x=>x).reduceByKey(_+_).collect()}} in 1.0.2 and 
observed similarly slow speed to what I saw in the current 1.1.0 release 
candidate.  When I modified my job to use fewer reducers ({{reduceByKey(_+_, 
4)}}), the job completed quickly.  You can see similar behavior in Python 
by explicitly specifying a smaller number of reducers.

I think the issue here is that the overhead of sending and processing task 
completions is proportional to O(numReducers).  Specifically, the uncompressed 
size of ShuffleMapTask results is roughly O(numReducers), and there's a 
O(numReducers) processing cost for task completions within DAGScheduler (since 
mapOutputLocations is O(numReducers)).

This normally isn't a problem, but it can impact performance for jobs with 
large numbers of extremely small map tasks (like this job, where nearly all of 
the map tasks are effectively no-ops).  For larger tasks, this cost should be 
masked by larger overheads (such as task processing time).

I'm not sure where the OOM is coming from, but the slow performance that you're 
observing here is probably due to the new default number of reducers 
(https://github.com/apache/spark/pull/1138 exposed this in Python by changing 
its defaults to match Scala Spark).

As a result, I'm not sure that this is a regression from 1.0.2, since it 
behaves similarly for Scala jobs.  I think we already do some compression of 
the task results and there are probably other improvements that we can make to 
lower these overheads, but I think we should postpone that to 1.2.0.


[jira] [Updated] (SPARK-3330) Successive test runs with different profiles fail SparkSubmitSuite

2014-08-31 Thread Prashant Sharma (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Sharma updated SPARK-3330:
---
Assignee: Sean Owen

> Successive test runs with different profiles fail SparkSubmitSuite
> --
>
> Key: SPARK-3330
> URL: https://issues.apache.org/jira/browse/SPARK-3330
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 1.0.2
>Reporter: Sean Owen
>Assignee: Sean Owen
>
> Maven-based Jenkins builds have been failing for a while:
> https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-Master-Maven-with-YARN/480/HADOOP_PROFILE=hadoop-2.4,label=centos/console
> One common cause is that on the second and subsequent runs of "mvn clean 
> test", at least two assembly JARs will exist in assembly/target. Because 
> assembly is not a submodule of parent, "mvn clean" is not invoked for 
> assembly. The presence of two assembly jars causes spark-submit to fail.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-3330) Successive test runs with different profiles fail SparkSubmitSuite

2014-08-31 Thread Prashant Sharma (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Sharma updated SPARK-3330:
---
Assignee: (was: Sean Owen)

> Successive test runs with different profiles fail SparkSubmitSuite
> --
>
> Key: SPARK-3330
> URL: https://issues.apache.org/jira/browse/SPARK-3330
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 1.0.2
>Reporter: Sean Owen
>
> Maven-based Jenkins builds have been failing for a while:
> https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-Master-Maven-with-YARN/480/HADOOP_PROFILE=hadoop-2.4,label=centos/console
> One common cause is that on the second and subsequent runs of "mvn clean 
> test", at least two assembly JARs will exist in assembly/target. Because 
> assembly is not a submodule of parent, "mvn clean" is not invoked for 
> assembly. The presence of two assembly jars causes spark-submit to fail.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org