[jira] [Commented] (SPARK-1821) Document History Server

2014-05-15 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13996755#comment-13996755
 ] 

Andrew Or commented on SPARK-1821:
--

Yes, it should be documented under "monitoring.html" in the latest branch.

> Document History Server
> ---
>
> Key: SPARK-1821
> URL: https://issues.apache.org/jira/browse/SPARK-1821
> Project: Spark
>  Issue Type: Improvement
>  Components: Deploy
>Affects Versions: 1.0.0
>Reporter: Nan Zhu
>
> In 1.0, there is a new component, history server, which is not mentioned in 
> http://people.apache.org/~pwendell/spark-1.0.0-rc3-docs/
> I think we had better add the missing documentation.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (SPARK-1770) repartition and coalesce(shuffle=true) put objects with the same key in the same bucket

2014-05-15 Thread Aaron Davidson (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13993908#comment-13993908
 ] 

Aaron Davidson commented on SPARK-1770:
---

Ah, that PR seems unrelated.

> repartition and coalesce(shuffle=true) put objects with the same key in the 
> same bucket
> ---
>
> Key: SPARK-1770
> URL: https://issues.apache.org/jira/browse/SPARK-1770
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 0.9.0, 1.0.0, 0.9.1
>Reporter: Matei Zaharia
>Priority: Blocker
>  Labels: Starter
> Fix For: 1.0.0
>
>
> This is bad when you have many identical objects. We should assign each one a 
> random key.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (SPARK-1664) spark-submit --name doesn't work in yarn-client mode

2014-05-15 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-1664.


Resolution: Duplicate

> spark-submit --name doesn't work in yarn-client mode
> 
>
> Key: SPARK-1664
> URL: https://issues.apache.org/jira/browse/SPARK-1664
> Project: Spark
>  Issue Type: Sub-task
>  Components: YARN
>Affects Versions: 1.0.0
>Reporter: Thomas Graves
>Priority: Blocker
>
> When using spark-submit in yarn-client mode, the --name option doesn't 
> properly set the application name in the ResourceManager UI.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (SPARK-1760) mvn -Dsuites=* test throws a ClassNotFoundException

2014-05-15 Thread Guoqiang Li (JIRA)
Guoqiang Li created SPARK-1760:
--

 Summary: mvn -Dsuites=* test throws a ClassNotFoundException
 Key: SPARK-1760
 URL: https://issues.apache.org/jira/browse/SPARK-1760
 Project: Spark
  Issue Type: Bug
Reporter: Guoqiang Li


{{mvn -Dhadoop.version=0.23.9 -Phadoop-0.23 
-Dsuites=org.apache.spark.repl.ReplSuite test}} => 
{code}
*** RUN ABORTED ***
  java.lang.ClassNotFoundException: org.apache.spark.repl.ReplSuite
  at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
  at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
  at java.security.AccessController.doPrivileged(Native Method)
  at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
  at org.scalatest.tools.Runner$$anonfun$21.apply(Runner.scala:1470)
  at org.scalatest.tools.Runner$$anonfun$21.apply(Runner.scala:1469)
  at 
scala.collection.TraversableLike$$anonfun$filter$1.apply(TraversableLike.scala:264)
  at scala.collection.immutable.List.foreach(List.scala:318)
  ...
{code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (SPARK-1755) Spark-submit --name does not resolve to application name on YARN

2014-05-15 Thread Andrew Or (JIRA)
Andrew Or created SPARK-1755:


 Summary: Spark-submit --name does not resolve to application name 
on YARN
 Key: SPARK-1755
 URL: https://issues.apache.org/jira/browse/SPARK-1755
 Project: Spark
  Issue Type: Bug
Affects Versions: 1.0.0
Reporter: Andrew Or
 Fix For: 1.0.1


In YARN client mode, --name is ignored because the deploy mode is client, and 
the name is for some reason a cluster config. (See 
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L170)

In YARN cluster mode, --name is passed to the 
org.apache.spark.deploy.yarn.Client as a command line argument. The Client 
class, however, uses this name only as the app name for the RM, but not for 
Spark. In other words, when SparkConf attempts to load default configs, 
application name is not set.

In both cases, passing --name to SparkSubmit does not actually cause Spark to 
adopt it as its application name, despite what the usage promises.
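A hedged workaround sketch (not the fix itself): since --name is not propagated, the 
driver can set the application name explicitly through SparkConf, which takes effect 
regardless of how spark-submit handles the flag. The object name below is illustrative.

{code:scala}
import org.apache.spark.{SparkConf, SparkContext}

object ExplicitAppName {
  def main(args: Array[String]): Unit = {
    // Setting the name in code sidesteps the --name handling described above.
    val conf = new SparkConf().setAppName("my-yarn-app")
    val sc = new SparkContext(conf)
    // ... job logic ...
    sc.stop()
  }
}
{code}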



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Comment Edited] (SPARK-1326) make-distribution.sh's Tachyon support relies on GNU sed

2014-05-15 Thread Sandeep Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13992612#comment-13992612
 ] 

Sandeep Singh edited comment on SPARK-1326 at 5/8/14 9:16 AM:
--

fixed by PR: https://github.com/apache/spark/pull/264


was (Author: techaddict):
https://github.com/apache/spark/pull/264

> make-distribution.sh's Tachyon support relies on GNU sed
> 
>
> Key: SPARK-1326
> URL: https://issues.apache.org/jira/browse/SPARK-1326
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy
>Reporter: Matei Zaharia
>Priority: Minor
> Fix For: 1.0.0
>
>
> It fails on Mac OS X, with {{sed: 1: "/Users/matei/ ...": invalid command 
> code m}}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (SPARK-1500) add with-hive argument to make-distribution.sh

2014-05-15 Thread Guoqiang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guoqiang Li resolved SPARK-1500.


Resolution: Fixed

>  add with-hive argument to make-distribution.sh
> ---
>
> Key: SPARK-1500
> URL: https://issues.apache.org/jira/browse/SPARK-1500
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 1.0.0
>Reporter: Guoqiang Li
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (SPARK-1827) LICENSE and NOTICE files need a refresh to contain transitive dependency info

2014-05-15 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-1827.


Resolution: Fixed
  Assignee: Sean Owen

Fixed by:
https://github.com/apache/spark/pull/770

> LICENSE and NOTICE files need a refresh to contain transitive dependency info
> -
>
> Key: SPARK-1827
> URL: https://issues.apache.org/jira/browse/SPARK-1827
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 0.9.1
>Reporter: Sean Owen
>Assignee: Sean Owen
>Priority: Blocker
> Fix For: 1.0.0
>
>
> (Pardon marking it a blocker, but think it needs doing before 1.0 per chat 
> with [~pwendell])
> The LICENSE and NOTICE files need to cover all transitive dependencies, since 
> these are all distributed in the assembly jar. (c.f. 
> http://www.apache.org/dev/licensing-howto.html )
> I don't believe the current files cover everything. It's possible to 
> mostly-automatically generate these. I will generate this and propose a patch 
> to both today.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (SPARK-1818) Freshen Mesos docs

2014-05-15 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-1818.


   Resolution: Fixed
Fix Version/s: 1.0.0

Issue resolved by pull request 756
[https://github.com/apache/spark/pull/756]

> Freshen Mesos docs
> --
>
> Key: SPARK-1818
> URL: https://issues.apache.org/jira/browse/SPARK-1818
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, Mesos
>Affects Versions: 1.0.0
>Reporter: Andrew Ash
> Fix For: 1.0.0
>
>
> They haven't been updated since 0.6.0 and encourage compiling both Mesos and 
> Spark from scratch.  Include mention of the precompiled binary versions of 
> both projects available and otherwise generally freshen the documentation for 
> Mesos newcomers.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (SPARK-1838) On a YARN cluster, Spark doesn't run on local mode

2014-05-15 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-1838:
-

Fix Version/s: 1.0.1

> On a YARN cluster, Spark doesn't run on local mode
> --
>
> Key: SPARK-1838
> URL: https://issues.apache.org/jira/browse/SPARK-1838
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.0.0
>Reporter: Andrew Or
> Fix For: 1.0.1
>
>
> Right now we throw an exception if YARN_LOCAL_DIRS is not set. However, we 
> may want to just run Spark in local mode, which doesn't even use this 
> environment variable.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (SPARK-1838) On a YARN cluster, Spark doesn't run on local mode

2014-05-15 Thread Andrew Or (JIRA)
Andrew Or created SPARK-1838:


 Summary: On a YARN cluster, Spark doesn't run on local mode
 Key: SPARK-1838
 URL: https://issues.apache.org/jira/browse/SPARK-1838
 Project: Spark
  Issue Type: Bug
Reporter: Andrew Or


Right now we throw an exception if YARN_LOCAL_DIRS is not set. However, we may 
want to just run Spark in local mode, which doesn't even use this environment 
variable.
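A minimal sketch of the behaviour the issue suggests, assuming a helper that resolves 
scratch directories (the names here are illustrative, not Spark's actual code): fall 
back to a local temp dir instead of throwing when YARN_LOCAL_DIRS is absent, as it is 
in local mode.

{code:scala}
// Illustrative only: resolve local dirs without requiring the YARN env variable.
def resolveLocalDirs(env: Map[String, String] = sys.env): Seq[String] =
  env.get("YARN_LOCAL_DIRS")
    .orElse(env.get("LOCAL_DIRS"))
    .map(_.split(",").toSeq)
    .getOrElse(Seq(System.getProperty("java.io.tmpdir")))  // local mode: no YARN env needed
{code}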



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (SPARK-1840) SparkListenerBus prints out scary error message when terminating normally

2014-05-15 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-1840:
---

Assignee: Tathagata Das

> SparkListenerBus prints out scary error message when terminating normally
> -
>
> Key: SPARK-1840
> URL: https://issues.apache.org/jira/browse/SPARK-1840
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.0.0
>Reporter: Andrew Or
>Assignee: Tathagata Das
> Fix For: 1.0.0
>
>
> This is because Scala's NonLocalReturnControl (which extends 
> ControlThrowable) is being logged. However, this exception is expected when the 
> SparkContext terminates.
> (OP is TD)
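A minimal sketch of the general pattern (not the actual patch): let ControlThrowable, 
which NonLocalReturnControl extends, propagate instead of logging it as an error, so 
normal termination stays quiet.

{code:scala}
import scala.util.control.{ControlThrowable, NonFatal}

// Illustrative listener-loop wrapper: rethrow control-flow throwables, log real failures.
def runListenerLoop(body: => Unit, logError: String => Unit): Unit = {
  try {
    body
  } catch {
    case ct: ControlThrowable => throw ct                     // expected on normal shutdown
    case NonFatal(e)          => logError("Listener loop failed: " + e.getMessage)
  }
}
{code}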



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (SPARK-1473) Feature selection for high dimensional datasets

2014-05-15 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998494#comment-13998494
 ] 

Sean Owen commented on SPARK-1473:
--

I believe these types of things were more the goals of the MLI and MLbase 
projects rather than MLlib? I don't know the status of those. For what it's 
worth, I think these are very useful things, but they belong in a separate 
'layer' above something like MLlib.

> Feature selection for high dimensional datasets
> ---
>
> Key: SPARK-1473
> URL: https://issues.apache.org/jira/browse/SPARK-1473
> Project: Spark
>  Issue Type: New Feature
>  Components: MLlib
>Reporter: Ignacio Zendejas
>Priority: Minor
>  Labels: features
> Fix For: 1.1.0
>
>
> For classification tasks involving large feature spaces on the order of tens 
> of thousands of features or more (e.g., text classification with n-grams, where n > 1), 
> it is often useful to rank and filter out irrelevant features, thereby 
> reducing the feature space by at least one or two orders of magnitude without 
> impacting performance on key evaluation metrics (accuracy/precision/recall).
> A flexible feature evaluation interface needs to be designed, and at 
> least two methods should be implemented, with Information Gain being a 
> priority as it has been shown to be amongst the most reliable.
> Special consideration should be taken in the design to account for wrapper 
> methods (see research papers below) which are more practical for lower 
> dimensional data.
> Relevant research:
> * Brown, G., Pocock, A., Zhao, M. J., & Luján, M. (2012). Conditional 
> likelihood maximisation: a unifying framework for information theoretic 
> feature selection. The Journal of Machine Learning Research, 13, 27-66.
> * Forman, George. "An extensive empirical study of feature selection metrics 
> for text classification." The Journal of Machine Learning Research 3 (2003): 
> 1289-1305.
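A hedged, self-contained sketch of the information-gain ranking idea mentioned above 
(plain Scala, not an MLlib API): score a binary feature by how much it reduces label 
entropy.

{code:scala}
// Entropy of a label distribution given class counts.
def entropy(counts: Seq[Int]): Double = {
  val total = counts.sum.toDouble
  counts.filter(_ > 0).map { c =>
    val p = c / total
    -p * math.log(p) / math.log(2)
  }.sum
}

// labels and feature are parallel sequences of 0/1 values.
def informationGain(labels: Seq[Int], feature: Seq[Int]): Double = {
  val n = labels.size.toDouble
  val base = entropy(labels.groupBy(identity).values.map(_.size).toSeq)
  val conditional = feature.zip(labels).groupBy(_._1).values.map { group =>
    (group.size / n) * entropy(group.map(_._2).groupBy(identity).values.map(_.size).toSeq)
  }.sum
  base - conditional  // larger gain => more informative feature
}
{code}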



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (SPARK-1840) SparkListenerBus prints out scary error message when terminating normally

2014-05-15 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-1840.


   Resolution: Fixed
Fix Version/s: 1.0.0

Issue resolved by pull request 783
[https://github.com/apache/spark/pull/783]

> SparkListenerBus prints out scary error message when terminating normally
> -
>
> Key: SPARK-1840
> URL: https://issues.apache.org/jira/browse/SPARK-1840
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.0.0
>Reporter: Andrew Or
> Fix For: 1.0.0
>
>
> This is because Scala's NonLocalReturnControl (which extends 
> ControlThrowable) is being logged. However, this exception is expected when the 
> SparkContext terminates.
> (OP is TD)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (SPARK-1646) ALS micro-optimisation

2014-05-15 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng resolved SPARK-1646.
--

   Resolution: Implemented
Fix Version/s: 1.0.0

PR: https://github.com/apache/spark/pull/568

> ALS micro-optimisation
> --
>
> Key: SPARK-1646
> URL: https://issues.apache.org/jira/browse/SPARK-1646
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Reporter: Tor Myklebust
>Assignee: Tor Myklebust
>Priority: Trivial
> Fix For: 1.0.0
>
>
> Scala "for" loop bodies turn into methods and the loops themselves into 
> repeated invocations of the body method.  This may make Hotspot make poor 
> optimisation decisions.  (Xiangrui mentioned that there was a speed 
> improvement from doing similar transformations elsewhere.)
> The loops on i and p in the ALS training code are prime candidates for this 
> transformation, as is the "foreach" loop doing regularisation.
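An illustrative sketch of the transformation (not the ALS code itself): a Scala for 
over a range compiles into a closure invoked once per iteration, whereas a while loop 
keeps the body inline, which tends to be friendlier to HotSpot in tight numeric kernels.

{code:scala}
def dotFor(a: Array[Double], b: Array[Double]): Double = {
  var sum = 0.0
  for (i <- a.indices) sum += a(i) * b(i)  // body becomes an anonymous function
  sum
}

def dotWhile(a: Array[Double], b: Array[Double]): Double = {
  var sum = 0.0
  var i = 0
  while (i < a.length) { sum += a(i) * b(i); i += 1 }  // plain loop, no closure
  sum
}
{code}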



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (SPARK-1605) Improve mllib.linalg.Vector

2014-05-15 Thread Xiangrui Meng (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998106#comment-13998106
 ] 

Xiangrui Meng commented on SPARK-1605:
--

`toBreeze` exposes a breeze type. We might want to mark it DeveloperApi and 
make it public, but I'm not sure whether we should do that in v1.0. Given a 
`mllib.linalg.Vector`, you can call `toArray` to get the values or operate 
directly on DenseVector/SparseVector.
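A small illustrative sketch of that suggestion, using the existing mllib.linalg API:

{code:scala}
import org.apache.spark.mllib.linalg.{Vector, Vectors, DenseVector, SparseVector}

val v: Vector = Vectors.sparse(5, Array(1, 3), Array(2.0, 4.0))

val values: Array[Double] = v.toArray  // densified copy of the values

val nonZeros = v match {               // or work with the concrete types directly
  case d: DenseVector  => d.values.count(_ != 0.0)
  case s: SparseVector => s.values.length
}
{code}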

> Improve mllib.linalg.Vector
> ---
>
> Key: SPARK-1605
> URL: https://issues.apache.org/jira/browse/SPARK-1605
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Reporter: Sandeep Singh
>
> Can we make the current Vector a wrapper around breeze.linalg.Vector?



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (SPARK-1835) sbt gen-idea includes both mesos and mesos with shaded-protobuf into dependencies

2014-05-15 Thread Xiangrui Meng (JIRA)
Xiangrui Meng created SPARK-1835:


 Summary: sbt gen-idea includes both mesos and mesos with 
shaded-protobuf into dependencies
 Key: SPARK-1835
 URL: https://issues.apache.org/jira/browse/SPARK-1835
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.0.0
Reporter: Xiangrui Meng
Priority: Minor


gen-idea includes both mesos-0.18.1 and mesos-0.18.1-shaded-protobuf in the 
dependencies. This generates a compile error because mesos-0.18.1 comes first and 
there is no protobuf jar in the dependencies.

A workaround is to delete mesos-0.18.1.jar manually from IntelliJ IDEA. Another 
solution is to publish the shaded jar as a separate version instead of using a 
classifier.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (SPARK-1833) Have an empty SparkContext constructor instead of relying on new SparkContext(new SparkConf())

2014-05-15 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-1833.


   Resolution: Fixed
Fix Version/s: 1.0.0

Issue resolved by pull request 774
[https://github.com/apache/spark/pull/774]

> Have an empty SparkContext constructor instead of relying on new 
> SparkContext(new SparkConf())
> --
>
> Key: SPARK-1833
> URL: https://issues.apache.org/jira/browse/SPARK-1833
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Reporter: Patrick Wendell
>Assignee: Patrick Wendell
> Fix For: 1.0.0
>
>
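A small sketch of the convenience this adds, assuming the no-arg constructor simply 
defaults to new SparkConf():

{code:scala}
import org.apache.spark.{SparkConf, SparkContext}

// Before: the conf had to be spelled out even when empty.
// val sc = new SparkContext(new SparkConf())

// After this change: the empty constructor supplies the default conf itself.
val sc = new SparkContext()
{code}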




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (SPARK-1841) update scalatest to version 2.1.5

2014-05-15 Thread Guoqiang Li (JIRA)
Guoqiang Li created SPARK-1841:
--

 Summary: update scalatest to version 2.1.5
 Key: SPARK-1841
 URL: https://issues.apache.org/jira/browse/SPARK-1841
 Project: Spark
  Issue Type: Sub-task
Reporter: Guoqiang Li


scalatest 1.9.* does not support Scala 2.11



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (SPARK-1838) On a YARN cluster, Spark doesn't run on local mode

2014-05-15 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-1838:
-

Affects Version/s: 1.0.0

> On a YARN cluster, Spark doesn't run on local mode
> --
>
> Key: SPARK-1838
> URL: https://issues.apache.org/jira/browse/SPARK-1838
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.0.0
>Reporter: Andrew Or
> Fix For: 1.0.1
>
>
> Right now we throw an exception if YARN_LOCAL_DIRS is not set. However, we 
> may want to just run Spark in local mode, which doesn't even use this 
> environment variable.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (SPARK-1789) Multiple versions of Netty dependencies cause FlumeStreamSuite failure

2014-05-15 Thread William Benton (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13997610#comment-13997610
 ] 

William Benton commented on SPARK-1789:
---

Yes, this is absolutely a post-1.0 thing. I'm just saying that by updating the 
version of Akka to 2.3 we'd eliminate one of Spark's dependencies that can't 
work with Netty 4. The issue of only transitively depending on at most one 
version of Netty 3 and at most one version of Netty 4 (and choosing ones that 
can work at different coordinates) is orthogonal, but still an issue.

> Multiple versions of Netty dependencies cause FlumeStreamSuite failure
> --
>
> Key: SPARK-1789
> URL: https://issues.apache.org/jira/browse/SPARK-1789
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 0.9.1
>Reporter: Sean Owen
>Assignee: Sean Owen
>  Labels: flume, netty, test
> Fix For: 1.0.0
>
>
> TL;DR: there is a bit of JAR hell trouble with Netty that can be mostly 
> resolved and will fix a test failure.
> I hit the error described at 
> http://apache-spark-user-list.1001560.n3.nabble.com/SparkContext-startup-time-out-td1753.html
>  while running FlumeStreamSuite, and have for a short while (is it just 
> me?)
> velvia notes:
> "I have found a workaround.  If you add akka 2.2.4 to your dependencies, then 
> everything works, probably because akka 2.2.4 brings in newer version of 
> Jetty." 
> There are at least 3 versions of Netty in play in the build:
> - the new Flume 1.4.0 dependency brings in io.netty:netty:3.4.0.Final, and 
> that is the immediate problem
> - the custom version of akka 2.2.3 depends on io.netty:netty:3.6.6.
> - but, Spark Core directly uses io.netty:netty-all:4.0.17.Final
> The POMs try to exclude other versions of netty, but are excluding 
> org.jboss.netty:netty, when in fact older versions of io.netty:netty (not 
> netty-all) are also an issue.
> The org.jboss.netty:netty excludes are largely unnecessary. I replaced many 
> of them with io.netty:netty exclusions until everything agreed on 
> io.netty:netty-all:4.0.17.Final.
> But this didn't work, since Akka 2.2.3 doesn't work with Netty 4.x. 
> Down-grading to 3.6.6.Final across the board made some Spark code not compile.
> If the build *keeps* io.netty:netty:3.6.6.Final as well, everything seems to 
> work. Part of the reason seems to be that Netty 3.x used the old 
> `org.jboss.netty` packages. This is less than ideal, but is no worse than the 
> current situation. 
> So this PR resolves the issue and improves the JAR hell, even if it leaves 
> the existing theoretical Netty 3-vs-4 conflict:
> - Remove org.jboss.netty excludes where possible, for clarity; they're not 
> needed except with Hadoop artifacts
> - Add io.netty:netty excludes where needed -- except, let akka keep its 
> io.netty:netty
> - Change a bit of test code that actually depended on Netty 3.x, to use 4.x 
> equivalent
> - Update SBT build accordingly
> A better change would be to update Akka far enough such that it agrees on 
> Netty 4.x, but I don't know if that's feasible.
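A hedged sbt sketch of the exclusion approach described above (the coordinates are 
illustrative, not the exact build change): prefer io.netty:netty excludes so everything 
converges on netty-all 4.x.

{code:scala}
libraryDependencies ++= Seq(
  "org.apache.flume" % "flume-ng-sdk" % "1.4.0"
    exclude("io.netty", "netty"),               // drop Flume's old io.netty:netty 3.4.0.Final
  "io.netty" % "netty-all" % "4.0.17.Final"     // the version Spark core uses directly
)
{code}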



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (SPARK-1644) The org.datanucleus:* should not be packaged into spark-assembly-*.jar

2014-05-15 Thread Guoqiang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guoqiang Li updated SPARK-1644:
---

Fix Version/s: (was: 1.1.0)
   1.0.0

> The org.datanucleus:*  should not be packaged into spark-assembly-*.jar
> ---
>
> Key: SPARK-1644
> URL: https://issues.apache.org/jira/browse/SPARK-1644
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Guoqiang Li
>Assignee: Guoqiang Li
> Fix For: 1.0.0
>
> Attachments: spark.log
>
>
> cat conf/hive-site.xml
> {code:xml}
> <configuration>
>   <property>
>     <name>javax.jdo.option.ConnectionURL</name>
>     <value>jdbc:postgresql://bj-java-hugedata1:7432/hive</value>
>   </property>
>   <property>
>     <name>javax.jdo.option.ConnectionDriverName</name>
>     <value>org.postgresql.Driver</value>
>   </property>
>   <property>
>     <name>javax.jdo.option.ConnectionUserName</name>
>     <value>hive</value>
>   </property>
>   <property>
>     <name>javax.jdo.option.ConnectionPassword</name>
>     <value>passwd</value>
>   </property>
>   <property>
>     <name>hive.metastore.local</name>
>     <value>false</value>
>   </property>
>   <property>
>     <name>hive.metastore.warehouse.dir</name>
>     <value>hdfs://host:8020/user/hive/warehouse</value>
>   </property>
> </configuration>
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (SPARK-1755) Spark-submit --name does not resolve to application name on YARN

2014-05-15 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-1755:
-

Description: 
In YARN client mode, --name is ignored because the deploy mode is client, and 
the name is for some reason a [cluster 
config|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L170)].

In YARN cluster mode, --name is passed to the 
org.apache.spark.deploy.yarn.Client as a command line argument. The Client 
class, however, uses this name only as the app name for the RM, but not for 
Spark. In other words, when SparkConf attempts to load default configs, 
application name is not set.

In both cases, passing --name to SparkSubmit does not actually cause Spark to 
adopt it as its application name, despite what the usage promises.

  was:
In YARN client mode, --name is ignored because the deploy mode is client, and 
the name is for some reason a cluster config. (See 
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L170)

In YARN cluster mode, --name is passed to the 
org.apache.spark.deploy.yarn.Client as a command line argument. The Client 
class, however, uses this name only as the app name for the RM, but not for 
Spark. In other words, when SparkConf attempts to load default configs, 
application name is not set.

In both cases, passing --name to SparkSubmit does not actually cause Spark to 
adopt it as its application name, despite what the usage promises.


> Spark-submit --name does not resolve to application name on YARN
> 
>
> Key: SPARK-1755
> URL: https://issues.apache.org/jira/browse/SPARK-1755
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.0.0
>Reporter: Andrew Or
> Fix For: 1.0.1
>
>
> In YARN client mode, --name is ignored because the deploy mode is client, and 
> the name is for some reason a [cluster 
> config|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L170)].
> In YARN cluster mode, --name is passed to the 
> org.apache.spark.deploy.yarn.Client as a command line argument. The Client 
> class, however, uses this name only as the app name for the RM, but not for 
> Spark. In other words, when SparkConf attempts to load default configs, 
> application name is not set.
> In both cases, passing --name to SparkSubmit does not actually cause Spark to 
> adopt it as its application name, despite what the usage promises.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (SPARK-1823) ExternalAppendOnlyMap can still OOM if one key is very large

2014-05-15 Thread Andrew Or (JIRA)
Andrew Or created SPARK-1823:


 Summary: ExternalAppendOnlyMap can still OOM if one key is very 
large
 Key: SPARK-1823
 URL: https://issues.apache.org/jira/browse/SPARK-1823
 Project: Spark
  Issue Type: Bug
Reporter: Andrew Or


If the values for one key do not collectively fit into memory, then the map 
will still OOM when you merge the spilled contents back in.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (SPARK-1764) EOF reached before Python server acknowledged

2014-05-15 Thread Bouke van der Bijl (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bouke van der Bijl updated SPARK-1764:
--

Description: 
I'm getting "EOF reached before Python server acknowledged" while using PySpark 
on Mesos. The error manifests itself in multiple ways. One is:

14/05/08 18:10:40 ERROR DAGSchedulerActorSupervisor: eventProcesserActor failed 
due to the error EOF reached before Python server acknowledged; shutting down 
SparkContext

And the other has a full stacktrace:

14/05/08 18:03:06 ERROR OneForOneStrategy: EOF reached before Python server 
acknowledged
org.apache.spark.SparkException: EOF reached before Python server acknowledged
at 
org.apache.spark.api.python.PythonAccumulatorParam.addInPlace(PythonRDD.scala:416)
at 
org.apache.spark.api.python.PythonAccumulatorParam.addInPlace(PythonRDD.scala:387)
at org.apache.spark.Accumulable.$plus$plus$eq(Accumulators.scala:71)
at 
org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:279)
at 
org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:277)
at 
scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
at 
scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
at 
scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
at 
scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
at 
scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
at org.apache.spark.Accumulators$.add(Accumulators.scala:277)
at 
org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:818)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1204)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
at akka.actor.ActorCell.invoke(ActorCell.scala:456)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
at akka.dispatch.Mailbox.run(Mailbox.scala:219)
at 
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

This error causes the SparkContext to shut down. I have not been able to 
reliably reproduce this bug; it seems to happen randomly.

  was:
I'm getting "EOF reached before Python server acknowledged" while using PySpark 
on Mesos. The full error is:

14/05/08 18:10:40 ERROR DAGSchedulerActorSupervisor: eventProcesserActor failed 
due to the error EOF reached before Python server acknowledged; shutting down 
SparkContext

This error causes the SparkContext to shutdown. I have not been able to 
reliably reproduce this bug, it seems to happen randomly.


> EOF reached before Python server acknowledged
> -
>
> Key: SPARK-1764
> URL: https://issues.apache.org/jira/browse/SPARK-1764
> Project: Spark
>  Issue Type: Bug
>  Components: Mesos, PySpark
>Affects Versions: 1.0.0
>Reporter: Bouke van der Bijl
>Priority: Critical
>  Labels: mesos, pyspark
>
> I'm getting "EOF reached before Python server acknowledged" while using 
> PySpark on Mesos. The error manifests itself in multiple ways. One is:
> 14/05/08 18:10:40 ERROR DAGSchedulerActorSupervisor: eventProcesserActor 
> failed due to the error EOF reached before Python server acknowledged; 
> shutting down SparkContext
> And the other has a full stacktrace:
> 14/05/08 18:03:06 ERROR OneForOneStrategy: EOF reached before Python server 
> acknowledged
> org.apache.spark.SparkException: EOF reached before Python server acknowledged
>   at 
> org.apache.spark.api.python.PythonAccumulatorParam.addInPlace(PythonRDD.scala:416)
>   at 
> org.apache.spark.api.python.PythonAccumulatorParam.addInPlace(PythonRDD.scala:387)
>   at org.apache.spark.Accumulable.$plus$plus$eq(Accumulators.scala:71)
>   at 
> org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:279)
>   at 
> org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:277)
>   at 
> scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
>   at 
> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
>   a

[jira] [Updated] (SPARK-1825) Windows Spark fails to work with Linux YARN

2014-05-15 Thread Taeyun Kim (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Taeyun Kim updated SPARK-1825:
--

Affects Version/s: 1.0.0

> Windows Spark fails to work with Linux YARN
> ---
>
> Key: SPARK-1825
> URL: https://issues.apache.org/jira/browse/SPARK-1825
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.0.0
>Reporter: Taeyun Kim
>
> Windows Spark fails to work with Linux YARN.
> This is a cross-platform problem.
> On the YARN side, Hadoop 2.4.0 resolved the issue as follows:
> https://issues.apache.org/jira/browse/YARN-1824
> But the Spark YARN module does not incorporate the new YARN API yet, so the 
> problem persists for Spark.
> First, the following source files should be changed:
> - /yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala
> - 
> /yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ExecutorRunnableUtil.scala
> Change is as follows:
> - Replace .$() to .$$()
> - Replace File.pathSeparator for Environment.CLASSPATH.name to 
> ApplicationConstants.CLASS_PATH_SEPARATOR (import 
> org.apache.hadoop.yarn.api.ApplicationConstants is required for this)
> Unless the above changes are applied, launch_container.sh will contain invalid shell 
> script statements (since they will contain Windows-specific separators), and the 
> job will fail.
> Also, the following symptoms should be fixed (I could not find the 
> relevant source code):
> - The SPARK_HOME environment variable is copied straight to launch_container.sh. 
> It should be changed to the path format of the server OS, or, better, a 
> separate environment variable or a configuration variable should be created.
> - The '%HADOOP_MAPRED_HOME%' string still exists in launch_container.sh after 
> the above change is applied. Maybe I missed a few lines.
> I'm not sure whether this is all, since I'm new to both Spark and YARN.
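A hedged sketch of the cross-platform pattern the description asks for, assuming the 
Hadoop 2.4 APIs it references (Environment.$$() and ApplicationConstants.CLASS_PATH_SEPARATOR); 
the helper name is illustrative:

{code:scala}
import org.apache.hadoop.yarn.api.ApplicationConstants
import org.apache.hadoop.yarn.api.ApplicationConstants.Environment

// $$() and CLASS_PATH_SEPARATOR are tokens the NodeManager expands with the
// *target* cluster's syntax, unlike $() and File.pathSeparator, which use the
// submitting client's platform.
def classpathEntry(extra: String): String =
  Environment.CLASSPATH.$$() + ApplicationConstants.CLASS_PATH_SEPARATOR + extra
{code}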



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (SPARK-1771) CoarseGrainedSchedulerBackend is not resilient to Akka restarts

2014-05-15 Thread Aaron Davidson (JIRA)
Aaron Davidson created SPARK-1771:
-

 Summary: CoarseGrainedSchedulerBackend is not resilient to Akka 
restarts
 Key: SPARK-1771
 URL: https://issues.apache.org/jira/browse/SPARK-1771
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Reporter: Aaron Davidson


The exception reported in SPARK-1769 was propagated through the 
CoarseGrainedSchedulerBackend, and caused an Actor restart of the DriverActor. 
Unfortunately, this actor does not seem to have been written with Akka 
restartability in mind. For instance, the new DriverActor has lost all state 
about the prior Executors without cleanly disconnecting them. This means that 
the driver actually has executors attached to it, but doesn't think it does, 
which leads to mayhem of various sorts.
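A hedged Akka sketch of one possible mitigation (not Spark's fix): have the supervisor 
resume the actor instead of restarting it, so in-memory executor bookkeeping is not 
silently wiped on an uncaught exception.

{code:scala}
import akka.actor.{Actor, OneForOneStrategy, SupervisorStrategy}
import akka.actor.SupervisorStrategy.Resume

class DriverSupervisor extends Actor {
  // Resume keeps the child's state; a Restart would recreate it and lose
  // everything it knew about registered executors.
  override val supervisorStrategy: SupervisorStrategy = OneForOneStrategy() {
    case _: Exception => Resume
  }
  def receive: Receive = { case _ => () }  // forward to the scheduler actor in practice
}
{code}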



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (SPARK-1494) Hive Dependencies being checked by MIMA

2014-05-15 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust resolved SPARK-1494.
-

Resolution: Fixed

> Hive Dependencies being checked by MIMA
> ---
>
> Key: SPARK-1494
> URL: https://issues.apache.org/jira/browse/SPARK-1494
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra, SQL
>Affects Versions: 1.0.0
>Reporter: Ahir Reddy
>Assignee: Michael Armbrust
>Priority: Minor
> Fix For: 1.0.0
>
>
> It looks like code in companion objects is being invoked by the MIMA checker, 
> as it uses Scala reflection to check all of the interfaces. As a result it 
> starts a Spark context and eventually hits out-of-memory errors. As a temporary 
> fix, all classes that contain "hive" or "Hive" are excluded from the check.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (SPARK-1836) REPL $outer type mismatch causes lookup() and equals() problems

2014-05-15 Thread Michael Armbrust (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998312#comment-13998312
 ] 

Michael Armbrust commented on SPARK-1836:
-

This sounds like it could be related to [SPARK-1199]

> REPL $outer type mismatch causes lookup() and equals() problems
> ---
>
> Key: SPARK-1836
> URL: https://issues.apache.org/jira/browse/SPARK-1836
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 0.9.0
>Reporter: Michael Malak
>
> Anand Avati partially traced the cause to REPL wrapping classes in $outer 
> classes. There are at least two major symptoms:
> 1. equals()
> =
> In the REPL, equals() (required in custom classes used as a key for groupByKey) 
> seems to have to be written using isInstanceOf[] instead of the canonical 
> match{}
> Spark Shell (equals uses match{}):
> {noformat}
> class C(val s:String) extends Serializable {
>   override def equals(o: Any) = o match {
> case that: C => that.s == s
> case _ => false
>   }
> }
> val x = new C("a")
> val bos = new java.io.ByteArrayOutputStream()
> val out = new java.io.ObjectOutputStream(bos)
> out.writeObject(x);
> val b = bos.toByteArray();
> out.close
> bos.close
> val y = new java.io.ObjectInputStream(new 
> java.io.ByteArrayInputStream(b)).readObject().asInstanceOf[C]
> x.equals(y)
> res: Boolean = false
> {noformat}
> Spark Shell (equals uses isInstanceOf[]):
> {noformat}
> class C(val s:String) extends Serializable {
>   override def equals(o: Any) = if (o.isInstanceOf[C]) (o.asInstanceOf[C].s == 
> s) else false
> }
> val x = new C("a")
> val bos = new java.io.ByteArrayOutputStream()
> val out = new java.io.ObjectOutputStream(bos)
> out.writeObject(x);
> val b = bos.toByteArray();
> out.close
> bos.close
> val y = new java.io.ObjectInputStream(new 
> java.io.ByteArrayInputStream(b)).readObject().asInstanceOf[C]
> x.equals(y)
> res: Boolean = true
> {noformat}
> Scala Shell (equals uses match{}):
> {noformat}
> class C(val s:String) extends Serializable {
>   override def equals(o: Any) = o match {
> case that: C => that.s == s
> case _ => false
>   }
> }
> val x = new C("a")
> val bos = new java.io.ByteArrayOutputStream()
> val out = new java.io.ObjectOutputStream(bos)
> out.writeObject(x);
> val b = bos.toByteArray();
> out.close
> bos.close
> val y = new java.io.ObjectInputStream(new 
> java.io.ByteArrayInputStream(b)).readObject().asInstanceOf[C]
> x.equals(y)
> res: Boolean = true
> {noformat}
> 2. lookup()
> =
> {noformat}
> class C(val s:String) extends Serializable {
>   override def equals(o: Any) = if (o.isInstanceOf[C]) o.asInstanceOf[C].s == 
> s else false
>   override def hashCode = s.hashCode
>   override def toString = s
> }
> val r = sc.parallelize(Array((new C("a"),11),(new C("a"),12)))
> r.lookup(new C("a"))
> :17: error: type mismatch;
>  found   : C
>  required: C
>   r.lookup(new C("a"))
>^
> {noformat}
> See
> http://mail-archives.apache.org/mod_mbox/spark-dev/201405.mbox/%3C1400019424.80629.YahooMailNeo%40web160801.mail.bf1.yahoo.com%3E



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (SPARK-1696) RowMatrix.dspr is not using parameter alpha for DenseVector

2014-05-15 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng resolved SPARK-1696.
--

   Resolution: Fixed
Fix Version/s: 1.0.0

> RowMatrix.dspr is not using parameter alpha for DenseVector
> ---
>
> Key: SPARK-1696
> URL: https://issues.apache.org/jira/browse/SPARK-1696
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib
>Reporter: Anish Patel
>Assignee: Xiangrui Meng
>Priority: Minor
> Fix For: 1.0.0
>
>
> In the master branch, method dspr of RowMatrix takes parameter alpha, but 
> does not use it when given a DenseVector.
> This probably slid by because when method computeGramianMatrix calls dspr, it 
> provides an alpha value of 1.0.
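An illustrative sketch (not MLlib's code) of what honouring alpha means for the rank-1 
update A := A + alpha * v * v^T on a packed upper-triangular matrix:

{code:scala}
def dsprDense(n: Int, alpha: Double, v: Array[Double], packedU: Array[Double]): Unit = {
  var col = 0
  var offset = 0
  while (col < n) {
    var row = 0
    while (row <= col) {
      packedU(offset + row) += alpha * v(row) * v(col)  // alpha must scale every term
      row += 1
    }
    offset += col + 1
    col += 1
  }
}
{code}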



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (SPARK-1843) Provide a simpler alternative to assemble-deps

2014-05-15 Thread Patrick Wendell (JIRA)
Patrick Wendell created SPARK-1843:
--

 Summary: Provide a simpler alternative to assemble-deps
 Key: SPARK-1843
 URL: https://issues.apache.org/jira/browse/SPARK-1843
 Project: Spark
  Issue Type: Improvement
  Components: Build
Reporter: Patrick Wendell
Assignee: Prashant Sharma
 Fix For: 1.1.0


Right now we have the assemble-deps tool for speeding up local development.

I was thinking about a simpler solution to this problem: instead of 
creating a fancy assembly jar, we just add an environment variable, 
USE_COMPILED_SPARK, and, if that variable is present, we simply add the Spark 
classes to the classpath before the assembly jar. Since the compiled classes 
are on the classpath first, they will take precedence.

This would allow us to remove the entire assemble-deps build and associated 
logic in the bash scripts. We'd need to make sure it's propagated correctly 
during tests (like SPARK_TESTING) but other than that I think it should work.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (SPARK-1839) PySpark take() does not launch a Spark job when it has to

2014-05-15 Thread Hossein Falaki (JIRA)
Hossein Falaki created SPARK-1839:
-

 Summary: PySpark take() does not launch a Spark job when it has to
 Key: SPARK-1839
 URL: https://issues.apache.org/jira/browse/SPARK-1839
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 1.0.0
Reporter: Hossein Falaki


If you call take() or first() on a large FilteredRDD, the driver attempts to 
scan all partitions to find the first valid item. If the RDD is large this 
would fail or hang.
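A hedged, Spark-free sketch of the strategy the issue asks for: evaluate a few 
partitions per job and widen the batch until enough rows are found, instead of scanning 
everything up front. Here `partitions` stands in for per-partition iterators.

{code:scala}
def takeIncrementally[T](partitions: Seq[() => Iterator[T]], num: Int): Seq[T] = {
  val buf = scala.collection.mutable.ArrayBuffer.empty[T]
  var scanned = 0
  var batch = 1
  while (buf.size < num && scanned < partitions.length) {
    val upTo = math.min(scanned + batch, partitions.length)
    (scanned until upTo).foreach { i =>
      if (buf.size < num) buf ++= partitions(i)().take(num - buf.size)
    }
    scanned = upTo
    batch *= 4  // widen the next batch if early partitions were mostly filtered out
  }
  buf.take(num).toSeq
}
{code}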



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (SPARK-1791) SVM implementation does not use threshold parameter

2014-05-15 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng resolved SPARK-1791.
--

   Resolution: Fixed
Fix Version/s: 1.0.0

PR: https://github.com/apache/spark/pull/725

> SVM implementation does not use threshold parameter
> ---
>
> Key: SPARK-1791
> URL: https://issues.apache.org/jira/browse/SPARK-1791
> Project: Spark
>  Issue Type: New Feature
>  Components: MLlib
>Reporter: Andrew Tulloch
> Fix For: 1.0.0
>
>   Original Estimate: 10m
>  Remaining Estimate: 10m
>
> The key error is in SVM.scala, in `predictPoint`
> ```
> threshold match {
>   case Some(t) => if (margin < 0.0) 0.0 else 1.0
>   case None => margin
> }
>  ```
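A hedged sketch of the corrected logic: compare the margin against the configured 
threshold t rather than the hard-coded 0.0.

{code:scala}
def predictWithThreshold(margin: Double, threshold: Option[Double]): Double =
  threshold match {
    case Some(t) => if (margin < t) 0.0 else 1.0  // the threshold is actually used
    case None    => margin
  }
{code}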



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (SPARK-1754) Add missing arithmetic DSL operations.

2014-05-15 Thread Takuya Ueshin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13992478#comment-13992478
 ] 

Takuya Ueshin commented on SPARK-1754:
--

Pull-requested: https://github.com/apache/spark/pull/689

> Add missing arithmetic DSL operations.
> --
>
> Key: SPARK-1754
> URL: https://issues.apache.org/jira/browse/SPARK-1754
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Takuya Ueshin
>
> Add missing arithmetic DSL operations: {{unary_-}}, {{%}}.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (SPARK-1787) Build failure on JDK8 :: SBT fails to load build configuration file

2014-05-15 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998492#comment-13998492
 ] 

Sean Owen commented on SPARK-1787:
--

Duplicate of https://issues.apache.org/jira/browse/SPARK-1444 it appears

> Build failure on JDK8 :: SBT fails to load build configuration file
> ---
>
> Key: SPARK-1787
> URL: https://issues.apache.org/jira/browse/SPARK-1787
> Project: Spark
>  Issue Type: New Feature
>  Components: Build
>Affects Versions: 0.9.0
> Environment: JDK8
> Scala 2.10.X
> SBT 0.12.X
>Reporter: Richard Gomes
>Priority: Minor
>
> SBT fails to build under JDK8.
> Please find steps to reproduce the error below:
> (j8s10)rgomes@terra:~/workspace/spark-0.9.1$ uname -a
> Linux terra 3.13-1-amd64 #1 SMP Debian 3.13.10-1 (2014-04-15) x86_64 GNU/Linux
> (j8s10)rgomes@terra:~/workspace/spark-0.9.1$ java -version
> java version "1.8.0_05"
> Java(TM) SE Runtime Environment (build 1.8.0_05-b13)
> Java HotSpot(TM) 64-Bit Server VM (build 25.5-b02, mixed mode)
> (j8s10)rgomes@terra:~/workspace/spark-0.9.1$ scala -version
> Scala code runner version 2.10.3 -- Copyright 2002-2013, LAMP/EPFL
> (j8s10)rgomes@terra:~/workspace/spark-0.9.1$ sbt/sbt clean
> Launching sbt from sbt/sbt-launch-0.12.4.jar
> Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=350m; 
> support was removed in 8.0
> [info] Loading project definition from 
> /home/rgomes/workspace/spark-0.9.1/project/project
> [info] Compiling 1 Scala source to 
> /home/rgomes/workspace/spark-0.9.1/project/project/target/scala-2.9.2/sbt-0.12/classes...
> [error] error while loading CharSequence, class file 
> '/opt/developer/jdk1.8.0_05/jre/lib/rt.jar(java/lang/CharSequence.class)' is 
> broken
> [error] (bad constant pool tag 15 at byte 1501)
> [error] error while loading Comparator, class file 
> '/opt/developer/jdk1.8.0_05/jre/lib/rt.jar(java/util/Comparator.class)' is 
> broken
> [error] (bad constant pool tag 15 at byte 5003)
> [error] two errors found
> [error] (compile:compile) Compilation failed
> Project loading failed: (r)etry, (q)uit, (l)ast, or (i)gnore? q



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Closed] (SPARK-1838) On a YARN cluster, Spark doesn't run on local mode

2014-05-15 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or closed SPARK-1838.


Resolution: Not a Problem

Looks like I accidentally set SPARK_YARN_MODE to true manually, which directly 
conflicts with the master being local mode. This isn't documented, so users 
shouldn't be setting this variable anyway. No problem.

> On a YARN cluster, Spark doesn't run on local mode
> --
>
> Key: SPARK-1838
> URL: https://issues.apache.org/jira/browse/SPARK-1838
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.0.0
>Reporter: Andrew Or
> Fix For: 1.0.1
>
>
> Right now we throw an exception if YARN_LOCAL_DIRS is not set. However, we 
> may want to just run Spark in local mode, which doesn't even use this 
> environment variable.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (SPARK-1770) repartition and coalesce(shuffle=true) put objects with the same key in the same bucket

2014-05-15 Thread Matei Zaharia (JIRA)
Matei Zaharia created SPARK-1770:


 Summary: repartition and coalesce(shuffle=true) put objects with 
the same key in the same bucket
 Key: SPARK-1770
 URL: https://issues.apache.org/jira/browse/SPARK-1770
 Project: Spark
  Issue Type: Bug
Affects Versions: 0.9.0, 1.0.0, 0.9.1
Reporter: Matei Zaharia
Priority: Blocker


This is bad when you have many identical objects. We should assign each one a 
random key.
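A minimal sketch of the proposed approach (not the actual patch): pair each element 
with a random key before the shuffle so identical values no longer hash into the same 
bucket.

{code:scala}
import scala.util.Random
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}
import org.apache.spark.SparkContext._  // pair-RDD implicits in 1.x

object RandomKeyRepartition {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("random-key").setMaster("local[*]"))
    val data = sc.parallelize(Seq.fill(1000)("same-value"))
    // Without the random key, hashing identical objects sends them all to one bucket.
    val spread = data
      .map(x => (Random.nextInt(100), x))  // random key decouples the bucket from the value
      .partitionBy(new HashPartitioner(100))
      .values
    println(spread.glom().map(_.length).collect().mkString(","))  // roughly even sizes
    sc.stop()
  }
}
{code}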



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (SPARK-1326) make-distribution.sh's Tachyon support relies on GNU sed

2014-05-15 Thread Sandeep Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13992612#comment-13992612
 ] 

Sandeep Singh commented on SPARK-1326:
--

https://github.com/apache/spark/pull/264

> make-distribution.sh's Tachyon support relies on GNU sed
> 
>
> Key: SPARK-1326
> URL: https://issues.apache.org/jira/browse/SPARK-1326
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy
>Reporter: Matei Zaharia
>Priority: Minor
> Fix For: 1.0.0
>
>
> It fails on Mac OS X, with {{sed: 1: "/Users/matei/ ...": invalid command 
> code m}}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (SPARK-1778) Add 'limit' transformation to SchemaRDD.

2014-05-15 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-1778:
---

Assignee: Takuya Ueshin

> Add 'limit' transformation to SchemaRDD.
> 
>
> Key: SPARK-1778
> URL: https://issues.apache.org/jira/browse/SPARK-1778
> Project: Spark
>  Issue Type: Improvement
>Reporter: Takuya Ueshin
>Assignee: Takuya Ueshin
> Fix For: 1.0.0
>
>
> Add {{limit}} transformation to {{SchemaRDD}}.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (SPARK-1841) update scalatest to version 2.1.5

2014-05-15 Thread Guoqiang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guoqiang Li updated SPARK-1841:
---

Description: scalatest 1.9.* does not support Scala 2.11  (was: scalatest 1.9.* 
not Scala 2.11)

> update scalatest to version 2.1.5
> -
>
> Key: SPARK-1841
> URL: https://issues.apache.org/jira/browse/SPARK-1841
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Reporter: Guoqiang Li
>Assignee: Guoqiang Li
>
> scalatest 1.9.* does not support Scala 2.11



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (SPARK-1575) failing tests with master branch

2014-05-15 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13996955#comment-13996955
 ] 

Sean Owen commented on SPARK-1575:
--

For what it's worth, I no longer see this failure. I believe it has been 
resolved by other changes along the way.

> failing tests with master branch 
> -
>
> Key: SPARK-1575
> URL: https://issues.apache.org/jira/browse/SPARK-1575
> Project: Spark
>  Issue Type: Test
>Reporter: Nishkam Ravi
>Priority: Blocker
>
> Built the master branch against Hadoop version 2.3.0-cdh5.0.0 with 
> SPARK_YARN=true. sbt tests don't go through successfully (tried multiple 
> runs).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (SPARK-1767) Prefer HDFS-cached replicas when scheduling data-local tasks

2014-05-15 Thread Aaron Davidson (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13997761#comment-13997761
 ] 

Aaron Davidson commented on SPARK-1767:
---

One simple workaround to this is to just make sure that partitions that are in 
memory are ordered first in the list of partitions, as Spark will try to place 
executors based on the order in this list. This is, of course, not a complete 
solution, as we would not utilize the locality-wait logic within Spark and 
would immediately fall back to a non-cached node if the cached node was busy, 
rather than waiting for some period of time for the cached node to become 
available.
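A hedged illustration of that workaround: order candidate hosts so HDFS-cached replicas 
come first, since Spark tries preferred locations in list order. Plain Scala; the names 
are illustrative.

{code:scala}
def orderPreferredHosts(allReplicas: Seq[String], cachedReplicas: Set[String]): Seq[String] = {
  val (cached, uncached) = allReplicas.partition(cachedReplicas.contains)
  cached ++ uncached  // cached hosts first; everything else keeps its relative order
}
{code}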

> Prefer HDFS-cached replicas when scheduling data-local tasks
> 
>
> Key: SPARK-1767
> URL: https://issues.apache.org/jira/browse/SPARK-1767
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.0.0
>Reporter: Sandy Ryza
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (SPARK-1764) EOF reached before Python server acknowledged

2014-05-15 Thread Bouke van der Bijl (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bouke van der Bijl updated SPARK-1764:
--

Description: 
I'm getting "EOF reached before Python server acknowledged" while using PySpark 
on Mesos. The error manifests itself in multiple ways. One is:

14/05/08 18:10:40 ERROR DAGSchedulerActorSupervisor: eventProcesserActor failed 
due to the error EOF reached before Python server acknowledged; shutting down 
SparkContext

And the other has a full stacktrace:

14/05/08 18:03:06 ERROR OneForOneStrategy: EOF reached before Python server 
acknowledged
org.apache.spark.SparkException: EOF reached before Python server acknowledged
at 
org.apache.spark.api.python.PythonAccumulatorParam.addInPlace(PythonRDD.scala:416)
at 
org.apache.spark.api.python.PythonAccumulatorParam.addInPlace(PythonRDD.scala:387)
at org.apache.spark.Accumulable.$plus$plus$eq(Accumulators.scala:71)
at 
org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:279)
at 
org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:277)
at 
scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
at 
scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
at 
scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
at 
scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
at 
scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
at org.apache.spark.Accumulators$.add(Accumulators.scala:277)
at 
org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:818)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1204)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
at akka.actor.ActorCell.invoke(ActorCell.scala:456)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
at akka.dispatch.Mailbox.run(Mailbox.scala:219)
at 
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

This error causes the SparkContext to shut down. I have not been able to 
reliably reproduce this bug; it seems to happen randomly, but if you run enough 
tasks on a SparkContext it will happen eventually.

  was:
I'm getting "EOF reached before Python server acknowledged" while using PySpark 
on Mesos. The error manifests itself in multiple ways. One is:

14/05/08 18:10:40 ERROR DAGSchedulerActorSupervisor: eventProcesserActor failed 
due to the error EOF reached before Python server acknowledged; shutting down 
SparkContext

And the other has a full stacktrace:

14/05/08 18:03:06 ERROR OneForOneStrategy: EOF reached before Python server 
acknowledged
org.apache.spark.SparkException: EOF reached before Python server acknowledged
at 
org.apache.spark.api.python.PythonAccumulatorParam.addInPlace(PythonRDD.scala:416)
at 
org.apache.spark.api.python.PythonAccumulatorParam.addInPlace(PythonRDD.scala:387)
at org.apache.spark.Accumulable.$plus$plus$eq(Accumulators.scala:71)
at 
org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:279)
at 
org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:277)
at 
scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
at 
scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
at 
scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
at 
scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
at 
scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
at org.apache.spark.Accumulators$.add(Accumulators.scala:277)
at 
org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:818)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1204)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
at akka.actor.ActorCell.invoke(ActorCell.scala

[jira] [Updated] (SPARK-1769) Executor loss can cause race condition in Pool

2014-05-15 Thread Aaron Davidson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Davidson updated SPARK-1769:
--

Description: 
Loss of executors (in this case due to OOMs) exposes a race condition in 
Pool.scala, evident from this stack trace:

{code}
14/05/08 22:41:48 ERROR OneForOneStrategy:
java.lang.NullPointerException
at 
org.apache.spark.scheduler.Pool$$anonfun$executorLost$1.apply(Pool.scala:87)
at 
org.apache.spark.scheduler.Pool$$anonfun$executorLost$1.apply(Pool.scala:87)
at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.Pool.executorLost(Pool.scala:87)
at 
org.apache.spark.scheduler.Pool$$anonfun$executorLost$1.apply(Pool.scala:87)
at 
org.apache.spark.scheduler.Pool$$anonfun$executorLost$1.apply(Pool.scala:87)
at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.Pool.executorLost(Pool.scala:87)
at 
org.apache.spark.scheduler.TaskSchedulerImpl.removeExecutor(TaskSchedulerImpl.scala:412)
at 
org.apache.spark.scheduler.TaskSchedulerImpl.executorLost(TaskSchedulerImpl.scala:385)
at 
org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverActor.removeExecutor(CoarseGrainedSchedulerBackend.scala:160)
at 
org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverActor$$anonfun$receive$1$$anonfun$applyOrElse$5.apply(CoarseGrainedSchedulerBackend.scala:123)
at 
org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverActor$$anonfun$receive$1$$anonfun$applyOrElse$5.apply(CoarseGrainedSchedulerBackend.scala:123)
at scala.Option.foreach(Option.scala:236)
at 
org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverActor$$anonfun$receive$1.applyOrElse(CoarseGrainedSchedulerBackend.scala:123)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
at akka.actor.ActorCell.invoke(ActorCell.scala:456)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
at akka.dispatch.Mailbox.run(Mailbox.scala:219)
at 
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
{code}

Note that the line of code that throws this exception is here:
{code}
schedulableQueue.foreach(_.executorLost(executorId, host))
{code}

Judging by the stack trace, it's not schedulableQueue that is null, but an 
element therein. As far as I could tell, we never add a null element to this 
queue. Rather, I could see (via log messages) that removeSchedulable() and 
executorLost() were called at about the same time, and I suspect that, since 
this ArrayBuffer is not synchronized in any way, we iterate through the list 
while it's in an incomplete state.
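
For illustration only (a sketch under the assumption that the race comes from 
unsynchronized mutation of schedulableQueue, not the patch that was actually 
merged), one way to avoid iterating a half-updated buffer is a concurrent queue:

{code}
import java.util.concurrent.ConcurrentLinkedQueue
import scala.collection.JavaConverters._

// Minimal stand-in for the Schedulable trait that Pool.scala iterates over.
trait Schedulable {
  def executorLost(executorId: String, host: String): Unit
}

class Pool {
  // A concurrent queue can be traversed while another thread adds or removes
  // entries, so executorLost() never observes a half-updated buffer.
  private val schedulableQueue = new ConcurrentLinkedQueue[Schedulable]()

  def addSchedulable(s: Schedulable): Unit = { schedulableQueue.add(s) }
  def removeSchedulable(s: Schedulable): Unit = { schedulableQueue.remove(s) }

  def executorLost(executorId: String, host: String): Unit = {
    schedulableQueue.asScala.foreach(_.executorLost(executorId, host))
  }
}
{code}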

  was:
Loss of executors (in this case due to OOMs) exposes a race condition in 
Pool.scala, evident from this stack trace:

{code}
14/05/08 22:41:48 ERROR OneForOneStrategy:
java.lang.NullPointerException
at 
org.apache.spark.scheduler.Pool$$anonfun$executorLost$1.apply(Pool.scala:87)
at 
org.apache.spark.scheduler.Pool$$anonfun$executorLost$1.apply(Pool.scala:87)
at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.Pool.executorLost(Pool.scala:87)
at 
org.apache.spark.scheduler.Pool$$anonfun$executorLost$1.apply(Pool.scala:87)
at 
org.apache.spark.scheduler.Pool$$anonfun$executorLost$1.apply(Pool.scala:87)
at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.Pool.executorLost(Pool.scala:87)
at 
org.apache.spark.scheduler.TaskSchedulerImpl.removeExecutor(TaskSchedulerImpl.scala:412)
at 
org.apache.spark.scheduler.TaskSchedulerImpl.executorLost(TaskSchedulerImpl.scala:385)
at 
org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverActor.removeExecutor(CoarseGrainedSchedulerBackend.scala:160)
at 
org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverActor$$anonfun$receive$1$$anonfun$applyOrElse$5.apply(CoarseGrainedSchedulerBackend.scala:123)

[jira] [Resolved] (SPARK-1760) mvn -Dsuites=* test throw an ClassNotFoundException

2014-05-15 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-1760.


   Resolution: Fixed
Fix Version/s: 1.0.0

Issue resolved by pull request 712
[https://github.com/apache/spark/pull/712]

>  mvn  -Dsuites=*  test throw an ClassNotFoundException
> --
>
> Key: SPARK-1760
> URL: https://issues.apache.org/jira/browse/SPARK-1760
> Project: Spark
>  Issue Type: Bug
>Reporter: Guoqiang Li
>Assignee: Guoqiang Li
> Fix For: 1.0.0
>
>
> {{mvn -Dhadoop.version=0.23.9 -Phadoop-0.23 
> -Dsuites=org.apache.spark.repl.ReplSuite test}} => 
> {code}
> *** RUN ABORTED ***
>   java.lang.ClassNotFoundException: org.apache.spark.repl.ReplSuite
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>   at org.scalatest.tools.Runner$$anonfun$21.apply(Runner.scala:1470)
>   at org.scalatest.tools.Runner$$anonfun$21.apply(Runner.scala:1469)
>   at 
> scala.collection.TraversableLike$$anonfun$filter$1.apply(TraversableLike.scala:264)
>   at scala.collection.immutable.List.foreach(List.scala:318)
>   ...
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (SPARK-1840) SparkListenerBus prints out scary error message when terminating normally

2014-05-15 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-1840:
-

Description: 
This is because Scala's NonLocalReturnControl (which extends ControlThrowable) 
is being logged. However, this exception is expected when the SparkContext 
terminates.

(OP is TD)
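
A minimal sketch of the usual pattern for this (the method below is 
illustrative and is not the actual SparkListenerBus code): rethrow 
ControlThrowable instead of logging it, and only log genuine errors.

{code}
import scala.util.control.ControlThrowable

// Illustrative dispatch wrapper: control-flow throwables such as
// NonLocalReturnControl are rethrown instead of being logged as errors.
def postSafely(post: () => Unit, logError: (String, Throwable) => Unit): Unit = {
  try {
    post()
  } catch {
    case ct: ControlThrowable => throw ct  // expected on normal termination; do not log
    case t: Throwable => logError("Listener bus threw an exception", t)
  }
}
{code}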

  was:This is because the Scala's NonLocalReturnControl (which extends 
ControlThrowable) is being logged. However, this is expected when the 
SparkContext terminates.


> SparkListenerBus prints out scary error message when terminating normally
> -
>
> Key: SPARK-1840
> URL: https://issues.apache.org/jira/browse/SPARK-1840
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.0.0
>Reporter: Andrew Or
>
> This is because Scala's NonLocalReturnControl (which extends 
> ControlThrowable) is being logged. However, this exception is expected when 
> the SparkContext terminates.
> (OP is TD)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (SPARK-1840) SparkListenerBus prints out scary error message when terminating normally

2014-05-15 Thread Andrew Or (JIRA)
Andrew Or created SPARK-1840:


 Summary: SparkListenerBus prints out scary error message when 
terminating normally
 Key: SPARK-1840
 URL: https://issues.apache.org/jira/browse/SPARK-1840
 Project: Spark
  Issue Type: Bug
Affects Versions: 1.0.0
Reporter: Andrew Or


This is because Scala's NonLocalReturnControl (which extends ControlThrowable) 
is being logged. However, this exception is expected when the SparkContext 
terminates.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (SPARK-1786) Kryo Serialization Error in GraphX

2014-05-15 Thread Joseph E. Gonzalez (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph E. Gonzalez updated SPARK-1786:
--

Description: 
The following code block will generate a serialization error when run in the 
spark-shell with Kryo enabled:

{code}
import org.apache.spark.storage._
import org.apache.spark.graphx._
import org.apache.spark.graphx.util._

val g = GraphGenerators.gridGraph(sc, 100, 100)
val e = g.edges
e.persist(StorageLevel.MEMORY_ONLY_SER)
e.collect().foreach(println(_)) // <- Runs successfully the first time.

// The following line will fail:
e.collect().foreach(println(_))
{code}
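
For reference, the report does not show how Kryo was enabled; a typical 
1.0-era configuration for this repro (an assumption, not taken from the 
report) would look like this before the SparkContext is created:

{code}
// Assumed setup, not part of the original report: enable Kryo and GraphX's
// registrator on the SparkConf used to start the shell / context.
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("graphx-kryo-repro")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.registrator", "org.apache.spark.graphx.GraphKryoRegistrator")
val sc = new SparkContext(conf)
{code}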

The following error is generated:

{code}
scala> e.collect().foreach(println(_))
14/05/09 18:31:13 INFO SparkContext: Starting job: collect at EdgeRDD.scala:59
14/05/09 18:31:13 INFO DAGScheduler: Got job 1 (collect at EdgeRDD.scala:59) 
with 8 output partitions (allowLocal=false)
14/05/09 18:31:13 INFO DAGScheduler: Final stage: Stage 1(collect at 
EdgeRDD.scala:59)
14/05/09 18:31:13 INFO DAGScheduler: Parents of final stage: List()
14/05/09 18:31:13 INFO DAGScheduler: Missing parents: List()
14/05/09 18:31:13 INFO DAGScheduler: Submitting Stage 1 (MappedRDD[15] at map 
at EdgeRDD.scala:59), which has no missing parents
14/05/09 18:31:13 INFO DAGScheduler: Submitting 8 missing tasks from Stage 1 
(MappedRDD[15] at map at EdgeRDD.scala:59)
14/05/09 18:31:13 INFO TaskSchedulerImpl: Adding task set 1.0 with 8 tasks
14/05/09 18:31:13 INFO TaskSetManager: Starting task 1.0:0 as TID 8 on executor 
localhost: localhost (PROCESS_LOCAL)
14/05/09 18:31:13 INFO TaskSetManager: Serialized task 1.0:0 as 1779 bytes in 3 
ms
14/05/09 18:31:13 INFO TaskSetManager: Starting task 1.0:1 as TID 9 on executor 
localhost: localhost (PROCESS_LOCAL)
14/05/09 18:31:13 INFO TaskSetManager: Serialized task 1.0:1 as 1779 bytes in 4 
ms
14/05/09 18:31:13 INFO TaskSetManager: Starting task 1.0:2 as TID 10 on 
executor localhost: localhost (PROCESS_LOCAL)
14/05/09 18:31:13 INFO TaskSetManager: Serialized task 1.0:2 as 1779 bytes in 4 
ms
14/05/09 18:31:13 INFO TaskSetManager: Starting task 1.0:3 as TID 11 on 
executor localhost: localhost (PROCESS_LOCAL)
14/05/09 18:31:13 INFO TaskSetManager: Serialized task 1.0:3 as 1779 bytes in 4 
ms
14/05/09 18:31:13 INFO TaskSetManager: Starting task 1.0:4 as TID 12 on 
executor localhost: localhost (PROCESS_LOCAL)
14/05/09 18:31:13 INFO TaskSetManager: Serialized task 1.0:4 as 1779 bytes in 3 
ms
14/05/09 18:31:13 INFO TaskSetManager: Starting task 1.0:5 as TID 13 on 
executor localhost: localhost (PROCESS_LOCAL)
14/05/09 18:31:13 INFO TaskSetManager: Serialized task 1.0:5 as 1782 bytes in 4 
ms
14/05/09 18:31:13 INFO TaskSetManager: Starting task 1.0:6 as TID 14 on 
executor localhost: localhost (PROCESS_LOCAL)
14/05/09 18:31:13 INFO TaskSetManager: Serialized task 1.0:6 as 1783 bytes in 4 
ms
14/05/09 18:31:13 INFO TaskSetManager: Starting task 1.0:7 as TID 15 on 
executor localhost: localhost (PROCESS_LOCAL)
14/05/09 18:31:13 INFO TaskSetManager: Serialized task 1.0:7 as 1783 bytes in 4 
ms
14/05/09 18:31:13 INFO Executor: Running task ID 9
14/05/09 18:31:13 INFO Executor: Running task ID 8
14/05/09 18:31:13 INFO Executor: Running task ID 11
14/05/09 18:31:13 INFO Executor: Running task ID 14
14/05/09 18:31:13 INFO Executor: Running task ID 10
14/05/09 18:31:13 INFO Executor: Running task ID 13
14/05/09 18:31:13 INFO Executor: Running task ID 15
14/05/09 18:31:13 INFO Executor: Running task ID 12
14/05/09 18:31:13 INFO BlockManager: Found block rdd_12_6 locally
14/05/09 18:31:13 INFO BlockManager: Found block rdd_12_4 locally
14/05/09 18:31:13 INFO BlockManager: Found block rdd_12_2 locally
14/05/09 18:31:13 INFO BlockManager: Found block rdd_12_7 locally
14/05/09 18:31:13 INFO BlockManager: Found block rdd_12_1 locally
14/05/09 18:31:13 INFO BlockManager: Found block rdd_12_3 locally
14/05/09 18:31:13 INFO BlockManager: Found block rdd_12_0 locally
14/05/09 18:31:13 INFO BlockManager: Found block rdd_12_5 locally
14/05/09 18:31:13 ERROR Executor: Exception in task ID 13
java.lang.NullPointerException
at 
org.apache.spark.graphx.impl.EdgePartition$$anon$1.next(EdgePartition.scala:269)
at 
org.apache.spark.graphx.impl.EdgePartition$$anon$1.next(EdgePartition.scala:262)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at 
scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
at 
scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
at 
scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
at sca

[jira] [Created] (SPARK-1786) Kryo Serialization Error in GraphX

2014-05-15 Thread Joseph E. Gonzalez (JIRA)
Joseph E. Gonzalez created SPARK-1786:
-

 Summary: Kryo Serialization Error in GraphX
 Key: SPARK-1786
 URL: https://issues.apache.org/jira/browse/SPARK-1786
 Project: Spark
  Issue Type: Bug
  Components: GraphX
Affects Versions: 1.0.0
Reporter: Joseph E. Gonzalez


The following code block will generate a serialization error when run in the 
spark-shell with Kryo enabled:

import org.apache.spark.storage._
import org.apache.spark.graphx._
import org.apache.spark.graphx.util._

val g = GraphGenerators.gridGraph(sc, 100, 100)
val e = g.edges
e.persist(StorageLevel.MEMORY_ONLY_SER)
e.collect().foreach(println(_))
e.collect().foreach(println(_))




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (SPARK-1620) Uncaught exception from Akka scheduler

2014-05-15 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-1620.


   Resolution: Fixed
Fix Version/s: 1.0.0

Issue resolved by pull request 622
[https://github.com/apache/spark/pull/622]

> Uncaught exception from Akka scheduler
> --
>
> Key: SPARK-1620
> URL: https://issues.apache.org/jira/browse/SPARK-1620
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 0.9.0, 1.0.0
>Reporter: Mark Hamstra
>Assignee: Mark Hamstra
>Priority: Blocker
> Fix For: 1.0.0
>
>
> I've been looking at this one in the context of a BlockManagerMaster that 
> OOMs and doesn't respond to heartBeat(), but I suspect that there may be 
> problems elsewhere where we use Akka's scheduler.
> The basic nature of the problem is that we are expecting exceptions thrown 
> from a scheduled function to be caught in the thread where 
> _ActorSystem_.scheduler.schedule() or scheduleOnce() has been called.  In 
> fact, the scheduled function runs on its own thread, so any exceptions that 
> it throws are not caught in the thread that called schedule() -- e.g., 
> unanswered BlockManager heartBeats (scheduled in BlockManager#initialize) 
> that end up throwing exceptions in BlockManagerMaster#askDriverWithReply do 
> not cause those exceptions to be handled by the Executor thread's 
> UncaughtExceptionHandler. 
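
A minimal sketch of the general mitigation this points at, assuming we want 
failures in the scheduled body to be surfaced deliberately rather than lost on 
the scheduler's thread (the helper below is illustrative, not the merged fix):

{code}
// Illustrative only: wrap the scheduled body so that anything it throws is
// routed to an explicit handler, instead of being lost on the scheduler's own
// thread where the caller never sees it.
def tryOrHandle(handler: Throwable => Unit)(body: => Unit): Runnable = new Runnable {
  override def run(): Unit = {
    try body catch { case t: Throwable => handler(t) }
  }
}

// e.g. instead of scheduling { heartBeat() } directly, schedule
//   tryOrHandle(t => { logError(t); System.exit(1) }) { heartBeat() }
// (logError and heartBeat are hypothetical placeholders here)
{code}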



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Comment Edited] (SPARK-1776) Have Spark's SBT build read dependencies from Maven

2014-05-15 Thread Guoqiang Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13993407#comment-13993407
 ] 

Guoqiang Li edited comment on SPARK-1776 at 5/9/14 5:39 AM:


Even so, there are many maintenance costs. We should not use two build tools 
at the same time; using only Maven is better.



was (Author: gq):
But I would think only use maven is better

> Have Spark's SBT build read dependencies from Maven
> ---
>
> Key: SPARK-1776
> URL: https://issues.apache.org/jira/browse/SPARK-1776
> Project: Spark
>  Issue Type: New Feature
>  Components: Build
>Reporter: Patrick Wendell
>Assignee: Prashant Sharma
> Fix For: 1.1.0
>
>
> We've wanted to consolidate Spark's build for a while; see 
> [here|http://mail-archives.apache.org/mod_mbox/spark-dev/201307.mbox/%3c39343fa4-3cf4-4349-99e7-2b20e1aed...@gmail.com%3E]
>  and 
> [here|http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-Necessity-of-Maven-and-SBT-Build-in-Spark-td2315.html].
> I'd like to propose using the sbt-pom-reader plug-in to allow us to keep our 
> sbt build (for ease of development) while also holding onto our Maven build 
> which almost all downstream packagers use.
> I've prototyped this a bit locally and I think it's do-able, but will require 
> making some contributions to the sbt-pom-reader plugin. Josh Suereth, who 
> maintains both sbt and the plug-in, has agreed to help merge any patches we 
> need for this.
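
As a rough illustration of what wiring in the plug-in might look like (the 
coordinates and version below are assumptions, not necessarily what Spark's 
build ends up using):

{code}
// project/plugins.sbt fragment; illustrative only, coordinates/version are assumptions
addSbtPlugin("com.typesafe.sbt" % "sbt-pom-reader" % "1.0.0")
{code}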



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (SPARK-1781) Generalized validity checking for configuration parameters

2014-05-15 Thread William Benton (JIRA)
William Benton created SPARK-1781:
-

 Summary: Generalized validity checking for configuration parameters
 Key: SPARK-1781
 URL: https://issues.apache.org/jira/browse/SPARK-1781
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 1.0.0
Reporter: William Benton
Priority: Minor


Issues like SPARK-1779 could be handled easily by a general mechanism for 
specifying whether a configuration parameter value is valid (and then throwing 
an exception, or warning and switching to a default value, if it is not).  I 
think it's possible to do this in a fairly lightweight fashion.
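
A lightweight sketch of what such a mechanism could look like (purely 
illustrative; none of these names exist in Spark):

{code}
// Hypothetical validator: checks a config value against a predicate and either
// warns and falls back to a default, or throws, depending on `strict`.
case class ConfCheck[T](key: String, valid: T => Boolean, default: T, strict: Boolean = false)

def validated[T](raw: T, check: ConfCheck[T], warn: String => Unit): T = {
  if (check.valid(raw)) {
    raw
  } else if (check.strict) {
    throw new IllegalArgumentException(s"Invalid value '$raw' for ${check.key}")
  } else {
    warn(s"Invalid value '$raw' for ${check.key}; falling back to ${check.default}")
    check.default
  }
}

// Example: the SPARK-1779 case, a fraction that must lie in [0, 1].
val memoryFractionCheck =
  ConfCheck[Double]("spark.storage.memoryFraction", f => f >= 0.0 && f <= 1.0, 0.6)
val fraction = validated(1.5, memoryFractionCheck, println)  // warns, returns 0.6
{code}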



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (SPARK-1775) Unneeded lock in ShuffleMapTask.deserializeInfo

2014-05-15 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-1775.


   Resolution: Fixed
Fix Version/s: 1.0.0

Issue resolved by pull request 707
[https://github.com/apache/spark/pull/707]

> Unneeded lock in ShuffleMapTask.deserializeInfo
> ---
>
> Key: SPARK-1775
> URL: https://issues.apache.org/jira/browse/SPARK-1775
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 0.9.0, 1.0.0, 0.9.1
>Reporter: Matei Zaharia
>Assignee: Sandeep Singh
>  Labels: Starter
> Fix For: 1.0.0
>
>
> This was used in the past to have a cache of deserialized ShuffleMapTasks, 
> but that's been removed, so there's no need for a lock. It slows down Spark 
> when task descriptions are large, e.g. due to large lineage graphs or local 
> variables.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (SPARK-1755) Spark-submit --name does not resolve to application name on YARN

2014-05-15 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13993108#comment-13993108
 ] 

Thomas Graves commented on SPARK-1755:
--

I believe this is a dup of SPARK-1664
spark-submit --name doesn't work in yarn-client mode

> Spark-submit --name does not resolve to application name on YARN
> 
>
> Key: SPARK-1755
> URL: https://issues.apache.org/jira/browse/SPARK-1755
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 0.9.1
>Reporter: Andrew Or
>Assignee: Andrew Or
>Priority: Blocker
> Fix For: 1.0.0
>
>
> In YARN client mode, --name is ignored because the deploy mode is client, and 
> the name is for some reason a [cluster 
> config|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L170].
> In YARN cluster mode, --name is passed to the 
> org.apache.spark.deploy.yarn.Client as a command line argument. The Client 
> class, however, uses this name only as the [app name for the 
> RM|https://github.com/apache/spark/blob/master/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/Client.scala#L80],
>  but not for Spark. In other words, when SparkConf attempts to load default 
> configs, application name is not set.
> In both cases, passing --name to SparkSubmit does not actually cause Spark to 
> adopt it as its application name, despite what the usage promises.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (SPARK-1767) Prefer HDFS-cached replicas when scheduling data-local tasks

2014-05-15 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13993909#comment-13993909
 ] 

Sandy Ryza commented on SPARK-1767:
---

Currently, RDDs only support a single level of location preference through 
RDD#preferredLocations(split), which returns a sequence of strings.  To prefer 
cached replicas, this needs to be extended in some way.  We could deprecate 
preferredLocations and add a preferredLocations(split, storageType), where 
storageType is MEMORY, DISK, and eventually FLASH.  Or, perhaps more hackily, 
we could give the location strings a prefix like "inmem:" that specifies the 
storage type.
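
A sketch of the shape of the first option (an entirely hypothetical API, just 
to make the proposal concrete):

{code}
// Hypothetical extension of the location-preference API sketched above;
// neither StorageType nor the two-argument overload exists in Spark.
object StorageType extends Enumeration {
  val MEMORY, DISK, FLASH = Value
}

trait LocationPreferences {
  // existing single-level preference (simplified: Int in place of Partition)
  def preferredLocations(split: Int): Seq[String]

  // proposed second level: preferences qualified by where the replica lives,
  // so the scheduler can favor HDFS-cached (in-memory) replicas first
  def preferredLocations(split: Int, storageType: StorageType.Value): Seq[String] =
    preferredLocations(split)
}
{code}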

> Prefer HDFS-cached replicas when scheduling data-local tasks
> 
>
> Key: SPARK-1767
> URL: https://issues.apache.org/jira/browse/SPARK-1767
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.0.0
>Reporter: Sandy Ryza
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (SPARK-1631) App name set in SparkConf (not in JVM properties) not respected by Yarn backend

2014-05-15 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-1631:
---

Priority: Blocker  (was: Major)

> App name set in SparkConf (not in JVM properties) not respected by Yarn 
> backend
> ---
>
> Key: SPARK-1631
> URL: https://issues.apache.org/jira/browse/SPARK-1631
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 1.0.0
>Reporter: Marcelo Vanzin
>Assignee: Marcelo Vanzin
>Priority: Blocker
> Fix For: 1.0.0
>
>
> When you submit an application that sets its name using a SparkContext 
> constructor or SparkConf.setAppName(), the Yarn app name is not set and the 
> app shows up as "Spark" in the RM UI.
> That's because YarnClientSchedulerBackend only looks for the app name in the 
> system properties, instead of in the app's config.
> e.g., app initializes like this:
> {code}
> val sc = new SparkContext(new SparkConf().setAppName("Blah"));
> {code}
> Start app like this:
> {noformat}
>   ./bin/spark-submit --master yarn --deploy-mode client blah blah blah
> {noformat}
> And the app name in the RM UI does not reflect the name set in the code.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (SPARK-1433) Upgrade Mesos dependency to 0.17.0

2014-05-15 Thread Timothy St. Clair (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13993896#comment-13993896
 ] 

Timothy St. Clair commented on SPARK-1433:
--

Likely want to aim higher at this point, perhaps 0.18.1

> Upgrade Mesos dependency to 0.17.0
> --
>
> Key: SPARK-1433
> URL: https://issues.apache.org/jira/browse/SPARK-1433
> Project: Spark
>  Issue Type: Task
>Reporter: Sandeep Singh
>Assignee: Sandeep Singh
>Priority: Minor
> Fix For: 1.0.0
>
>
> Mesos 0.13.0 was released 6 months ago.
> Upgrade Mesos dependency to 0.17.0



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (SPARK-1775) Unneeded lock in ShuffleMapTask.deserializeInfo

2014-05-15 Thread Matei Zaharia (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matei Zaharia updated SPARK-1775:
-

Fix Version/s: 0.9.2

> Unneeded lock in ShuffleMapTask.deserializeInfo
> ---
>
> Key: SPARK-1775
> URL: https://issues.apache.org/jira/browse/SPARK-1775
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 0.9.0, 1.0.0, 0.9.1
>Reporter: Matei Zaharia
>Assignee: Sandeep Singh
>Priority: Critical
>  Labels: Starter
> Fix For: 1.0.0, 0.9.2
>
>
> This was used in the past to have a cache of deserialized ShuffleMapTasks, 
> but that's been removed, so there's no need for a lock. It slows down Spark 
> when task descriptions are large, e.g. due to large lineage graphs or local 
> variables.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (SPARK-1780) Non-existent SPARK_DAEMON_OPTS is referred to in a few places

2014-05-15 Thread Andrew Or (JIRA)
Andrew Or created SPARK-1780:


 Summary: Non-existent SPARK_DAEMON_OPTS is referred to in a few 
places
 Key: SPARK-1780
 URL: https://issues.apache.org/jira/browse/SPARK-1780
 Project: Spark
  Issue Type: Bug
Affects Versions: 0.9.1
Reporter: Andrew Or
 Fix For: 1.0.0


SparkConf.scala and spark-env.sh refer to a non-existent SPARK_DAEMON_OPTS. 
What they really mean is SPARK_DAEMON_JAVA_OPTS.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (SPARK-1767) Prefer HDFS-cached replicas when scheduling data-local tasks

2014-05-15 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-1767:
-

 Summary: Prefer HDFS-cached replicas when scheduling data-local 
tasks
 Key: SPARK-1767
 URL: https://issues.apache.org/jira/browse/SPARK-1767
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 1.0.0
Reporter: Sandy Ryza






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (SPARK-1757) Support saving null primitives with .saveAsParquetFile()

2014-05-15 Thread Andrew Ash (JIRA)
Andrew Ash created SPARK-1757:
-

 Summary: Support saving null primitives with .saveAsParquetFile()
 Key: SPARK-1757
 URL: https://issues.apache.org/jira/browse/SPARK-1757
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.0.0
Reporter: Andrew Ash


See stack trace below:

{noformat}
14/05/07 21:45:51 INFO analysis.Analyzer: Max iterations (2) reached for batch 
MultiInstanceRelations
14/05/07 21:45:51 INFO analysis.Analyzer: Max iterations (2) reached for batch 
CaseInsensitiveAttributeReferences
14/05/07 21:45:51 INFO optimizer.Optimizer$: Max iterations (2) reached for 
batch ConstantFolding
14/05/07 21:45:51 INFO optimizer.Optimizer$: Max iterations (2) reached for 
batch Filter Pushdown
java.lang.RuntimeException: Unsupported datatype StructType(List())
at scala.sys.package$.error(package.scala:27)
at 
org.apache.spark.sql.parquet.ParquetTypesConverter$.fromDataType(ParquetRelation.scala:201)
at 
org.apache.spark.sql.parquet.ParquetTypesConverter$$anonfun$1.apply(ParquetRelation.scala:235)
at 
org.apache.spark.sql.parquet.ParquetTypesConverter$$anonfun$1.apply(ParquetRelation.scala:235)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.immutable.List.foreach(List.scala:318)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
at scala.collection.AbstractTraversable.map(Traversable.scala:105)
at 
org.apache.spark.sql.parquet.ParquetTypesConverter$.convertFromAttributes(ParquetRelation.scala:234)
at 
org.apache.spark.sql.parquet.ParquetTypesConverter$.writeMetaData(ParquetRelation.scala:267)
at 
org.apache.spark.sql.parquet.ParquetRelation$.createEmpty(ParquetRelation.scala:143)
at 
org.apache.spark.sql.parquet.ParquetRelation$.create(ParquetRelation.scala:122)
at 
org.apache.spark.sql.execution.SparkStrategies$ParquetOperations$.apply(SparkStrategies.scala:139)
at 
org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
at 
org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
at 
org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59)
at 
org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:264)
at 
org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:264)
at 
org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:265)
at 
org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:265)
at 
org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:268)
at 
org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:268)
at 
org.apache.spark.sql.SchemaRDDLike$class.saveAsParquetFile(SchemaRDDLike.scala:66)
at org.apache.spark.sql.SchemaRDD.saveAsParquetFile(SchemaRDD.scala:96)
{noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (SPARK-1842) update scala-logging-slf4j to version 2.1.2

2014-05-15 Thread Guoqiang Li (JIRA)
Guoqiang Li created SPARK-1842:
--

 Summary: update scala-logging-slf4j to version 2.1.2
 Key: SPARK-1842
 URL: https://issues.apache.org/jira/browse/SPARK-1842
 Project: Spark
  Issue Type: Sub-task
Reporter: Guoqiang Li


scala-logging-slf4j 1.0.1 does not support Scala 2.11.
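
For reference, a sketch of the kind of dependency bump involved (the group id 
shown for 2.1.2 is an assumption):

{code}
// build.sbt fragment; illustrative only, coordinates are an assumption
libraryDependencies += "com.typesafe.scala-logging" %% "scala-logging-slf4j" % "2.1.2"
{code}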



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (SPARK-1433) Upgrade Mesos dependency to 0.17.0

2014-05-15 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-1433.


Resolution: Duplicate

This is subsumed by SPARK-1806.

> Upgrade Mesos dependency to 0.17.0
> --
>
> Key: SPARK-1433
> URL: https://issues.apache.org/jira/browse/SPARK-1433
> Project: Spark
>  Issue Type: Task
>Reporter: Sandeep Singh
>Assignee: Sandeep Singh
>Priority: Minor
> Fix For: 1.0.0
>
>
> Mesos 0.13.0 was released 6 months ago.
> Upgrade Mesos dependency to 0.17.0



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (SPARK-1436) Compression code broke in-memory store

2014-05-15 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-1436:
---

Description: 
Try running the following code:

{code}

package org.apache.spark.sql

import org.apache.spark.sql.test.TestSQLContext._
import org.apache.spark.sql.catalyst.util._

case class Data(a: Int, b: Long)

object AggregationBenchmark {
  def main(args: Array[String]): Unit = {
val rdd =
  sparkContext.parallelize(1 to 20).flatMap(_ => (1 to 50).map(i => 
Data(i % 100, i)))
rdd.registerAsTable("data")
cacheTable("data")

(1 to 10).foreach { i =>
  println(s"=== ITERATION $i ===")

  benchmark { println("SELECT COUNT() FROM data:" + sql("SELECT COUNT(*) 
FROM data").collect().head) }

  println("SELECT a, SUM(b) FROM data GROUP BY a")
  benchmark { sql("SELECT a, SUM(b) FROM data GROUP BY a").count() }

  println("SELECT SUM(b) FROM data")
  benchmark { sql("SELECT SUM(b) FROM data").count() }
}
  }
}
{code}

The following exception is thrown:
{code}
java.nio.BufferUnderflowException
at java.nio.Buffer.nextGetIndex(Buffer.java:498)
at java.nio.HeapByteBuffer.getInt(HeapByteBuffer.java:355)
at 
org.apache.spark.sql.columnar.ColumnAccessor$.apply(ColumnAccessor.scala:103)
at 
org.apache.spark.sql.columnar.InMemoryColumnarTableScan$$anonfun$execute$1$$anon$1$$anonfun$3.apply(InMemoryColumnarTableScan.scala:61)
at 
org.apache.spark.sql.columnar.InMemoryColumnarTableScan$$anonfun$execute$1$$anon$1$$anonfun$3.apply(InMemoryColumnarTableScan.scala:61)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at 
scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)
at 
org.apache.spark.sql.columnar.InMemoryColumnarTableScan$$anonfun$execute$1$$anon$1.(InMemoryColumnarTableScan.scala:61)
at 
org.apache.spark.sql.columnar.InMemoryColumnarTableScan$$anonfun$execute$1.apply(InMemoryColumnarTableScan.scala:60)
at 
org.apache.spark.sql.columnar.InMemoryColumnarTableScan$$anonfun$execute$1.apply(InMemoryColumnarTableScan.scala:56)
at org.apache.spark.rdd.RDD$$anonfun$3.apply(RDD.scala:504)
at org.apache.spark.rdd.RDD$$anonfun$3.apply(RDD.scala:504)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:229)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:220)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:229)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:220)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:229)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:220)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:161)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:102)
at org.apache.spark.scheduler.Task.run(Task.scala:52)
at 
org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:211)
at 
org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:46)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:176)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
14/04/07 12:07:38 WARN TaskSetManager: Lost TID 3 (task 4.0:0)
14/04/07 12:07:38 WARN TaskSetManager: Loss was due to 
java.nio.BufferUnderflowException
java.nio.BufferUnderflowException
at java.nio.Buffer.nextGetIndex(Buffer.java:498)
at java.nio.HeapByteBuffer.getInt(HeapByteBuffer.java:355)
at 
org.apache.spark.sql.columnar.ColumnAccessor$.apply(ColumnAccessor.scala:103)
at 
org.apache.spark.sql.columnar.InMemoryColumnarTableScan$$anonfun$execute$1$$anon$1$$anonfun$3.apply(InMemoryColumnarTableScan.scala:61)
at 
org.apache.spark.sql.columnar.InMemoryColumnarTableScan$$anonfun$execute$1$$anon$1$$anonfun$3.apply(InMemoryColumnarTableScan.scala:61)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at 

[jira] [Resolved] (SPARK-1778) Add 'limit' transformation to SchemaRDD.

2014-05-15 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-1778.


   Resolution: Fixed
Fix Version/s: 1.0.0

Issue resolved by pull request 711
[https://github.com/apache/spark/pull/711]

> Add 'limit' transformation to SchemaRDD.
> 
>
> Key: SPARK-1778
> URL: https://issues.apache.org/jira/browse/SPARK-1778
> Project: Spark
>  Issue Type: Improvement
>Reporter: Takuya Ueshin
>Assignee: Takuya Ueshin
> Fix For: 1.0.0
>
>
> Add {{limit}} transformation to {{SchemaRDD}}.
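
A minimal usage sketch, assuming limit(n) keeps the first n rows and returns 
another SchemaRDD, and that a sqlContext with a registered "people" table is 
in scope:

{code}
// Hypothetical usage of the new transformation.
val people = sqlContext.sql("SELECT name, age FROM people")
val firstTen = people.limit(10)
firstTen.collect().foreach(println)
{code}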



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (SPARK-1779) Warning when spark.storage.memoryFraction is not between 0 and 1

2014-05-15 Thread Erik Erlandson (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13993901#comment-13993901
 ] 

Erik Erlandson commented on SPARK-1779:
---

I'll volunteer to take this, can somebody assign it to me?

> Warning when spark.storage.memoryFraction is not between 0 and 1
> 
>
> Key: SPARK-1779
> URL: https://issues.apache.org/jira/browse/SPARK-1779
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 0.9.0, 1.0.0
>Reporter: wangfei
> Fix For: 1.1.0
>
>
> There should be a warning when memoryFraction is lower than 0 or greater than 
> 1



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (SPARK-1758) failing test org.apache.spark.JavaAPISuite.wholeTextFiles

2014-05-15 Thread Nishkam Ravi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nishkam Ravi updated SPARK-1758:


Attachment: SPARK-1758.patch

> failing test org.apache.spark.JavaAPISuite.wholeTextFiles
> -
>
> Key: SPARK-1758
> URL: https://issues.apache.org/jira/browse/SPARK-1758
> Project: Spark
>  Issue Type: Bug
>  Components: Java API
>Affects Versions: 1.0.0
>Reporter: Nishkam Ravi
> Fix For: 1.0.0
>
> Attachments: SPARK-1758.patch
>
>
> Test org.apache.spark.JavaAPISuite.wholeTextFiles fails (during sbt/sbt test) 
> with the following error message:
> Test org.apache.spark.JavaAPISuite.wholeTextFiles failed: 
> java.lang.AssertionError: expected: but was:



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (SPARK-1696) RowMatrix.dspr is not using parameter alpha for DenseVector

2014-05-15 Thread Xiangrui Meng (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998127#comment-13998127
 ] 

Xiangrui Meng commented on SPARK-1696:
--

Thanks! I sent a PR: https://github.com/apache/spark/pull/778

> RowMatrix.dspr is not using parameter alpha for DenseVector
> ---
>
> Key: SPARK-1696
> URL: https://issues.apache.org/jira/browse/SPARK-1696
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib
>Reporter: Anish Patel
>Assignee: Xiangrui Meng
>Priority: Minor
>
> In the master branch, method dspr of RowMatrix takes parameter alpha, but 
> does not use it when given a DenseVector.
> This probably slid by because when method computeGramianMatrix calls dspr, it 
> provides an alpha value of 1.0.
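
For context, dspr computes A := alpha * x * x^T + A on a packed 
upper-triangular matrix. A sketch of the intended dense-vector behaviour, 
calling netlib-java directly (this is not the MLlib source, just an 
illustration of where alpha belongs):

{code}
import com.github.fommil.netlib.BLAS.{getInstance => blas}

// U holds A in packed upper-triangular ("U") storage; the fix is simply to
// pass alpha through to BLAS instead of ignoring it for the dense case.
def dspr(alpha: Double, values: Array[Double], U: Array[Double]): Unit = {
  val n = values.length
  blas.dspr("U", n, alpha, values, 1, U)
}
{code}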



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (SPARK-1635) Java API docs do not show annotation.

2014-05-15 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng updated SPARK-1635:
-

Priority: Minor  (was: Major)

> Java API docs do not show annotation.
> -
>
> Key: SPARK-1635
> URL: https://issues.apache.org/jira/browse/SPARK-1635
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.0.0
>Reporter: Xiangrui Meng
>Priority: Minor
>
> The generated Java API docs do not contain the Developer/Experimental 
> annotations; only the raw :: Developer/Experimental :: tag text appears in 
> the generated doc.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (SPARK-1788) Upgrade Parquet to 1.4.3

2014-05-15 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust resolved SPARK-1788.
-

Resolution: Fixed

> Upgrade Parquet to 1.4.3
> 
>
> Key: SPARK-1788
> URL: https://issues.apache.org/jira/browse/SPARK-1788
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: SQL
>Reporter: Michael Armbrust
>Assignee: Michael Armbrust
>
> https://github.com/apache/spark/pull/684



--
This message was sent by Atlassian JIRA
(v6.2#6252)