[jira] [Commented] (SPARK-1821) Document History Server
[ https://issues.apache.org/jira/browse/SPARK-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13996755#comment-13996755 ]

Andrew Or commented on SPARK-1821:
----------------------------------

Yes, it should be documented under "monitoring.html" in the latest branch.

> Document History Server
> ------------------------
>
> Key: SPARK-1821
> URL: https://issues.apache.org/jira/browse/SPARK-1821
> Project: Spark
> Issue Type: Improvement
> Components: Deploy
> Affects Versions: 1.0.0
> Reporter: Nan Zhu
>
> In 1.0 there is a new component, the history server, which is not mentioned in
> http://people.apache.org/~pwendell/spark-1.0.0-rc3-docs/
> We should add the missing documentation.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Commented] (SPARK-1770) repartition and coalesce(shuffle=true) put objects with the same key in the same bucket
[ https://issues.apache.org/jira/browse/SPARK-1770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13993908#comment-13993908 ]

Aaron Davidson commented on SPARK-1770:
---------------------------------------

Ah, that PR seems unrelated.

> repartition and coalesce(shuffle=true) put objects with the same key in the same bucket
> ----------------------------------------------------------------------------------------
>
> Key: SPARK-1770
> URL: https://issues.apache.org/jira/browse/SPARK-1770
> Project: Spark
> Issue Type: Bug
> Affects Versions: 0.9.0, 1.0.0, 0.9.1
> Reporter: Matei Zaharia
> Priority: Blocker
> Labels: Starter
> Fix For: 1.0.0
>
> This is bad when you have many identical objects. We should assign each one a random key.
[jira] [Resolved] (SPARK-1664) spark-submit --name doesn't work in yarn-client mode
[ https://issues.apache.org/jira/browse/SPARK-1664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Wendell resolved SPARK-1664.
------------------------------------
Resolution: Duplicate

> spark-submit --name doesn't work in yarn-client mode
> -----------------------------------------------------
>
> Key: SPARK-1664
> URL: https://issues.apache.org/jira/browse/SPARK-1664
> Project: Spark
> Issue Type: Sub-task
> Components: YARN
> Affects Versions: 1.0.0
> Reporter: Thomas Graves
> Priority: Blocker
>
> When using spark-submit in yarn-client mode, the --name option doesn't properly set the application name in the ResourceManager UI.
[jira] [Created] (SPARK-1760) mvn -Dsuites=* test throws a ClassNotFoundException
Guoqiang Li created SPARK-1760:
-------------------------------

Summary: mvn -Dsuites=* test throws a ClassNotFoundException
Key: SPARK-1760
URL: https://issues.apache.org/jira/browse/SPARK-1760
Project: Spark
Issue Type: Bug
Reporter: Guoqiang Li

{{mvn -Dhadoop.version=0.23.9 -Phadoop-0.23 -Dsuites=org.apache.spark.repl.ReplSuite test}} fails with:

{code}
*** RUN ABORTED ***
java.lang.ClassNotFoundException: org.apache.spark.repl.ReplSuite
  at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
  at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
  at java.security.AccessController.doPrivileged(Native Method)
  at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
  at org.scalatest.tools.Runner$$anonfun$21.apply(Runner.scala:1470)
  at org.scalatest.tools.Runner$$anonfun$21.apply(Runner.scala:1469)
  at scala.collection.TraversableLike$$anonfun$filter$1.apply(TraversableLike.scala:264)
  at scala.collection.immutable.List.foreach(List.scala:318)
  ...
{code}
[jira] [Created] (SPARK-1755) Spark-submit --name does not resolve to application name on YARN
Andrew Or created SPARK-1755:
-----------------------------

Summary: Spark-submit --name does not resolve to application name on YARN
Key: SPARK-1755
URL: https://issues.apache.org/jira/browse/SPARK-1755
Project: Spark
Issue Type: Bug
Affects Versions: 1.0.0
Reporter: Andrew Or
Fix For: 1.0.1

In YARN client mode, --name is ignored because the deploy mode is client, and the name is for some reason a cluster config. (See https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L170)

In YARN cluster mode, --name is passed to the org.apache.spark.deploy.yarn.Client as a command line argument. The Client class, however, uses this name only as the app name for the RM, but not for Spark. In other words, when SparkConf attempts to load default configs, application name is not set.

In both cases, passing --name to SparkSubmit does not actually cause Spark to adopt it as its application name, despite what the usage promises.
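The precedence gap described above can be sketched outside of Spark. The following is a minimal, hypothetical model of how the name should resolve (function and default names are illustrative, not Spark's actual code): --name should win in every deploy mode, falling back to spark.app.name only when absent.

```python
# Hypothetical sketch of app-name precedence for spark-submit (illustrative only).

def effective_app_name(cli_name, conf, deploy_mode):
    """Resolve the application name. deploy_mode is deliberately unused:
    the bug was that the name was treated as a cluster-only option, so
    cli_name was dropped when deploy_mode == "client"."""
    if cli_name is not None:
        return cli_name
    # Fall back to the configured spark.app.name, then a placeholder.
    return conf.get("spark.app.name", "<unnamed>")

# --name takes precedence regardless of deploy mode:
print(effective_app_name("MyApp", {"spark.app.name": "FromConf"}, "client"))
```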
[jira] [Comment Edited] (SPARK-1326) make-distribution.sh's Tachyon support relies on GNU sed
[ https://issues.apache.org/jira/browse/SPARK-1326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13992612#comment-13992612 ]

Sandeep Singh edited comment on SPARK-1326 at 5/8/14 9:16 AM:
--------------------------------------------------------------

fixed by PR: https://github.com/apache/spark/pull/264

was (Author: techaddict): https://github.com/apache/spark/pull/264

> make-distribution.sh's Tachyon support relies on GNU sed
> ---------------------------------------------------------
>
> Key: SPARK-1326
> URL: https://issues.apache.org/jira/browse/SPARK-1326
> Project: Spark
> Issue Type: Bug
> Components: Deploy
> Reporter: Matei Zaharia
> Priority: Minor
> Fix For: 1.0.0
>
> It fails on Mac OS X, with {{sed: 1: "/Users/matei/ ...": invalid command code m}}
[jira] [Resolved] (SPARK-1500) add with-hive argument to make-distribution.sh
[ https://issues.apache.org/jira/browse/SPARK-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Guoqiang Li resolved SPARK-1500.
--------------------------------
Resolution: Fixed

> add with-hive argument to make-distribution.sh
> -----------------------------------------------
>
> Key: SPARK-1500
> URL: https://issues.apache.org/jira/browse/SPARK-1500
> Project: Spark
> Issue Type: Improvement
> Components: Build
> Affects Versions: 1.0.0
> Reporter: Guoqiang Li
[jira] [Resolved] (SPARK-1827) LICENSE and NOTICE files need a refresh to contain transitive dependency info
[ https://issues.apache.org/jira/browse/SPARK-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Wendell resolved SPARK-1827.
------------------------------------
Resolution: Fixed
Assignee: Sean Owen

Fixed by: https://github.com/apache/spark/pull/770

> LICENSE and NOTICE files need a refresh to contain transitive dependency info
> ------------------------------------------------------------------------------
>
> Key: SPARK-1827
> URL: https://issues.apache.org/jira/browse/SPARK-1827
> Project: Spark
> Issue Type: Bug
> Components: Build
> Affects Versions: 0.9.1
> Reporter: Sean Owen
> Assignee: Sean Owen
> Priority: Blocker
> Fix For: 1.0.0
>
> (Pardon marking it a blocker, but think it needs doing before 1.0 per chat with [~pwendell])
>
> The LICENSE and NOTICE files need to cover all transitive dependencies, since these are all distributed in the assembly jar. (c.f. http://www.apache.org/dev/licensing-howto.html )
>
> I don't believe the current files cover everything. It's possible to mostly-automatically generate these. I will generate this and propose a patch to both today.
[jira] [Resolved] (SPARK-1818) Freshen Mesos docs
[ https://issues.apache.org/jira/browse/SPARK-1818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Wendell resolved SPARK-1818.
------------------------------------
Resolution: Fixed
Fix Version/s: 1.0.0

Issue resolved by pull request 756
[https://github.com/apache/spark/pull/756]

> Freshen Mesos docs
> ------------------
>
> Key: SPARK-1818
> URL: https://issues.apache.org/jira/browse/SPARK-1818
> Project: Spark
> Issue Type: Documentation
> Components: Documentation, Mesos
> Affects Versions: 1.0.0
> Reporter: Andrew Ash
> Fix For: 1.0.0
>
> They haven't been updated since 0.6.0 and encourage compiling both Mesos and Spark from scratch. Include mention of the precompiled binary versions of both projects available and otherwise generally freshen the documentation for Mesos newcomers.
[jira] [Updated] (SPARK-1838) On a YARN cluster, Spark doesn't run on local mode
[ https://issues.apache.org/jira/browse/SPARK-1838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or updated SPARK-1838:
-----------------------------
Fix Version/s: 1.0.1

> On a YARN cluster, Spark doesn't run on local mode
> ---------------------------------------------------
>
> Key: SPARK-1838
> URL: https://issues.apache.org/jira/browse/SPARK-1838
> Project: Spark
> Issue Type: Bug
> Affects Versions: 1.0.0
> Reporter: Andrew Or
> Fix For: 1.0.1
>
> Right now we throw an exception if YARN_LOCAL_DIRS is not set. However, we may want to just run Spark in local mode, which doesn't even use this environment variable.
[jira] [Created] (SPARK-1838) On a YARN cluster, Spark doesn't run on local mode
Andrew Or created SPARK-1838:
-----------------------------

Summary: On a YARN cluster, Spark doesn't run on local mode
Key: SPARK-1838
URL: https://issues.apache.org/jira/browse/SPARK-1838
Project: Spark
Issue Type: Bug
Reporter: Andrew Or

Right now we throw an exception if YARN_LOCAL_DIRS is not set. However, we may want to just run Spark in local mode, which doesn't even use this environment variable.
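Instead of throwing when YARN_LOCAL_DIRS is absent, local mode could simply fall back to other sources of scratch directories. A minimal sketch of that idea (the variable names and fallback order here are assumptions for illustration, not the actual fix):

```python
# Hypothetical fallback chain for choosing scratch directories (illustrative).

def local_dirs(env):
    # Prefer YARN's variable, but don't fail when it is unset, as in
    # local mode, where YARN never exports it.
    for var in ("YARN_LOCAL_DIRS", "LOCAL_DIRS", "SPARK_LOCAL_DIRS"):
        if env.get(var):
            return env[var].split(",")
    return ["/tmp"]  # last-resort default instead of raising

# Local mode with no YARN variables set still gets a usable directory:
print(local_dirs({}))
```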
[jira] [Updated] (SPARK-1840) SparkListenerBus prints out scary error message when terminating normally
[ https://issues.apache.org/jira/browse/SPARK-1840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Wendell updated SPARK-1840:
-----------------------------------
Assignee: Tathagata Das

> SparkListenerBus prints out scary error message when terminating normally
> --------------------------------------------------------------------------
>
> Key: SPARK-1840
> URL: https://issues.apache.org/jira/browse/SPARK-1840
> Project: Spark
> Issue Type: Bug
> Affects Versions: 1.0.0
> Reporter: Andrew Or
> Assignee: Tathagata Das
> Fix For: 1.0.0
>
> This is because Scala's NonLocalReturnControl (which extends ControlThrowable) is being logged. However, this is expected when the SparkContext terminates.
>
> (OP is TD)
[jira] [Commented] (SPARK-1473) Feature selection for high dimensional datasets
[ https://issues.apache.org/jira/browse/SPARK-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998494#comment-13998494 ]

Sean Owen commented on SPARK-1473:
----------------------------------

I believe these types of things were more the goals of the MLI and MLbase projects rather than MLlib? I don't know the status of those. For what it's worth, I think these are very useful things, but in a separate 'layer' above something like MLlib.

> Feature selection for high dimensional datasets
> ------------------------------------------------
>
> Key: SPARK-1473
> URL: https://issues.apache.org/jira/browse/SPARK-1473
> Project: Spark
> Issue Type: New Feature
> Components: MLlib
> Reporter: Ignacio Zendejas
> Priority: Minor
> Labels: features
> Fix For: 1.1.0
>
> For classification tasks involving large feature spaces in the order of tens of thousands or higher (e.g., text classification with n-grams, where n > 1), it is often useful to rank and filter out features that are irrelevant, thereby reducing the feature space by at least one or two orders of magnitude without impacting performance on key evaluation metrics (accuracy/precision/recall).
>
> A flexible feature evaluation interface needs to be designed, and at least two methods should be implemented, with Information Gain being a priority as it has been shown to be amongst the most reliable.
>
> Special consideration should be taken in the design to account for wrapper methods (see research papers below), which are more practical for lower dimensional data.
>
> Relevant research:
> * Brown, G., Pocock, A., Zhao, M. J., & Luján, M. (2012). Conditional likelihood maximisation: a unifying framework for information theoretic feature selection. The Journal of Machine Learning Research, 13, 27-66.
> * Forman, G. (2003). An extensive empirical study of feature selection metrics for text classification. The Journal of Machine Learning Research, 3, 1289-1305.
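Information Gain, the metric the report prioritizes, is straightforward to compute from label counts. A self-contained sketch of the textbook formula (this is illustrative, not MLlib or MLbase code):

```python
# Information gain of a discrete feature X with respect to a label Y:
#   IG(Y; X) = H(Y) - sum_x p(X = x) * H(Y | X = x)
from collections import Counter
from math import log2

def entropy(labels):
    # Shannon entropy of an empirical label distribution.
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def information_gain(xs, ys):
    base = entropy(ys)
    n = len(ys)
    cond = 0.0
    for x in set(xs):
        # Entropy of the labels restricted to rows where the feature equals x,
        # weighted by how often that feature value occurs.
        subset = [y for xi, y in zip(xs, ys) if xi == x]
        cond += len(subset) / n * entropy(subset)
    return base - cond

# A perfectly predictive feature recovers all of H(Y); a constant feature gives 0.
y = [0, 0, 1, 1]
print(information_gain([0, 0, 1, 1], y), information_gain([1, 1, 1, 1], y))
```

Ranking features by this score and keeping the top k is the filtering step the issue describes; a wrapper method would instead re-train the classifier on candidate subsets.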
[jira] [Resolved] (SPARK-1840) SparkListenerBus prints out scary error message when terminating normally
[ https://issues.apache.org/jira/browse/SPARK-1840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Wendell resolved SPARK-1840.
------------------------------------
Resolution: Fixed
Fix Version/s: 1.0.0

Issue resolved by pull request 783
[https://github.com/apache/spark/pull/783]

> SparkListenerBus prints out scary error message when terminating normally
> --------------------------------------------------------------------------
>
> Key: SPARK-1840
> URL: https://issues.apache.org/jira/browse/SPARK-1840
> Project: Spark
> Issue Type: Bug
> Affects Versions: 1.0.0
> Reporter: Andrew Or
> Fix For: 1.0.0
>
> This is because Scala's NonLocalReturnControl (which extends ControlThrowable) is being logged. However, this is expected when the SparkContext terminates.
>
> (OP is TD)
[jira] [Resolved] (SPARK-1646) ALS micro-optimisation
[ https://issues.apache.org/jira/browse/SPARK-1646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiangrui Meng resolved SPARK-1646.
----------------------------------
Resolution: Implemented
Fix Version/s: 1.0.0

PR: https://github.com/apache/spark/pull/568

> ALS micro-optimisation
> ----------------------
>
> Key: SPARK-1646
> URL: https://issues.apache.org/jira/browse/SPARK-1646
> Project: Spark
> Issue Type: Improvement
> Components: MLlib
> Reporter: Tor Myklebust
> Assignee: Tor Myklebust
> Priority: Trivial
> Fix For: 1.0.0
>
> Scala "for" loop bodies turn into methods and the loops themselves into repeated invocations of the body method. This may make Hotspot make poor optimisation decisions. (Xiangrui mentioned that there was a speed improvement from doing similar transformations elsewhere.)
>
> The loops on i and p in the ALS training code are prime candidates for this transformation, as is the "foreach" loop doing regularisation.
[jira] [Commented] (SPARK-1605) Improve mllib.linalg.Vector
[ https://issues.apache.org/jira/browse/SPARK-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998106#comment-13998106 ]

Xiangrui Meng commented on SPARK-1605:
--------------------------------------

`toBreeze` exposes a breeze type. We might want to mark it DeveloperApi and make it public, but I'm not sure whether we should do that in v1.0. Given a `mllib.linalg.Vector`, you can call `toArray` to get the values or operate directly on DenseVector/SparseVector.

> Improve mllib.linalg.Vector
> ----------------------------
>
> Key: SPARK-1605
> URL: https://issues.apache.org/jira/browse/SPARK-1605
> Project: Spark
> Issue Type: Improvement
> Components: MLlib
> Reporter: Sandeep Singh
>
> Can we make the current Vector a wrapper around Breeze.linalg.Vector?
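The comment's advice — use toArray or branch on the dense/sparse case rather than reaching for toBreeze — can be mimicked in a tiny model. These Python classes are stand-ins for illustration only, not MLlib's actual API:

```python
# Minimal dense/sparse vector model mirroring the toArray idea (illustrative).

class DenseVector:
    def __init__(self, values):
        self.values = list(values)

    def to_array(self):
        # Dense storage: the values are already the full array.
        return list(self.values)

class SparseVector:
    def __init__(self, size, indices, values):
        # Only the non-zero entries are stored, as (index, value) pairs.
        self.size = size
        self.indices = list(indices)
        self.values = list(values)

    def to_array(self):
        # Materialize the full array, filling unmentioned slots with 0.0.
        out = [0.0] * self.size
        for i, v in zip(self.indices, self.values):
            out[i] = v
        return out

# Caller code can treat both uniformly via to_array, without touching the
# underlying (breeze-like) representation:
for vec in (DenseVector([1.0, 0.0, 3.0]), SparseVector(4, [1, 3], [2.0, 5.0])):
    print(vec.to_array())
```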
[jira] [Created] (SPARK-1835) sbt gen-idea includes both mesos and mesos with shaded-protobuf into dependencies
Xiangrui Meng created SPARK-1835:
---------------------------------

Summary: sbt gen-idea includes both mesos and mesos with shaded-protobuf into dependencies
Key: SPARK-1835
URL: https://issues.apache.org/jira/browse/SPARK-1835
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 1.0.0
Reporter: Xiangrui Meng
Priority: Minor

gen-idea includes both mesos-0.18.1 and mesos-0.18.1-shaded-protobuf in the dependencies. This generates a compile error because mesos-0.18.1 comes first and there is no protobuf jar in the dependencies. A workaround is to delete mesos-0.18.1.jar manually from IntelliJ IDEA. Another solution is to publish the shaded jar as a separate version instead of using a classifier.
[jira] [Resolved] (SPARK-1833) Have an empty SparkContext constructor instead of relying on new SparkContext(new SparkConf())
[ https://issues.apache.org/jira/browse/SPARK-1833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Wendell resolved SPARK-1833.
------------------------------------
Resolution: Fixed
Fix Version/s: 1.0.0

Issue resolved by pull request 774
[https://github.com/apache/spark/pull/774]

> Have an empty SparkContext constructor instead of relying on new SparkContext(new SparkConf())
> -----------------------------------------------------------------------------------------------
>
> Key: SPARK-1833
> URL: https://issues.apache.org/jira/browse/SPARK-1833
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Reporter: Patrick Wendell
> Assignee: Patrick Wendell
> Fix For: 1.0.0
[jira] [Created] (SPARK-1841) update scalatest to version 2.1.5
Guoqiang Li created SPARK-1841:
-------------------------------

Summary: update scalatest to version 2.1.5
Key: SPARK-1841
URL: https://issues.apache.org/jira/browse/SPARK-1841
Project: Spark
Issue Type: Sub-task
Reporter: Guoqiang Li

scalatest 1.9.* does not support Scala 2.11.
[jira] [Updated] (SPARK-1838) On a YARN cluster, Spark doesn't run on local mode
[ https://issues.apache.org/jira/browse/SPARK-1838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or updated SPARK-1838:
-----------------------------
Affects Version/s: 1.0.0

> On a YARN cluster, Spark doesn't run on local mode
> ---------------------------------------------------
>
> Key: SPARK-1838
> URL: https://issues.apache.org/jira/browse/SPARK-1838
> Project: Spark
> Issue Type: Bug
> Affects Versions: 1.0.0
> Reporter: Andrew Or
> Fix For: 1.0.1
>
> Right now we throw an exception if YARN_LOCAL_DIRS is not set. However, we may want to just run Spark in local mode, which doesn't even use this environment variable.
[jira] [Commented] (SPARK-1789) Multiple versions of Netty dependencies cause FlumeStreamSuite failure
[ https://issues.apache.org/jira/browse/SPARK-1789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13997610#comment-13997610 ]

William Benton commented on SPARK-1789:
---------------------------------------

Yes, this is absolutely a post-1.0 thing. I'm just saying that by updating the version of Akka to 2.3 we'd eliminate one of Spark's dependencies that can't work with Netty 4. The issue of only transitively depending on at most one version of Netty 3 and at most one version of Netty 4 (and choosing ones that can coexist at different coordinates) is orthogonal, but still an issue.

> Multiple versions of Netty dependencies cause FlumeStreamSuite failure
> -----------------------------------------------------------------------
>
> Key: SPARK-1789
> URL: https://issues.apache.org/jira/browse/SPARK-1789
> Project: Spark
> Issue Type: Bug
> Components: Build
> Affects Versions: 0.9.1
> Reporter: Sean Owen
> Assignee: Sean Owen
> Labels: flume, netty, test
> Fix For: 1.0.0
>
> TL;DR: there is a bit of JAR hell trouble with Netty that can be mostly resolved, and resolving it will fix a test failure.
>
> I hit the error described at http://apache-spark-user-list.1001560.n3.nabble.com/SparkContext-startup-time-out-td1753.html while running FlumeStreamingSuite, and have for a short while (is it just me?)
>
> velvia notes:
> "I have found a workaround. If you add akka 2.2.4 to your dependencies, then everything works, probably because akka 2.2.4 brings in newer version of Jetty."
>
> There are at least 3 versions of Netty in play in the build:
> - the new Flume 1.4.0 dependency brings in io.netty:netty:3.4.0.Final, and that is the immediate problem
> - the custom version of akka 2.2.3 depends on io.netty:netty:3.6.6.
> - but, Spark Core directly uses io.netty:netty-all:4.0.17.Final
>
> The POMs try to exclude other versions of Netty, but are excluding org.jboss.netty:netty, when in fact older versions of io.netty:netty (not netty-all) are also an issue.
>
> The org.jboss.netty:netty excludes are largely unnecessary. I replaced many of them with io.netty:netty exclusions until everything agreed on io.netty:netty-all:4.0.17.Final.
>
> But this didn't work, since Akka 2.2.3 doesn't work with Netty 4.x. Down-grading to 3.6.6.Final across the board made some Spark code not compile.
>
> If the build *keeps* io.netty:netty:3.6.6.Final as well, everything seems to work. Part of the reason seems to be that Netty 3.x used the old `org.jboss.netty` packages. This is less than ideal, but is no worse than the current situation.
>
> So this PR resolves the issue and improves the JAR hell, even if it leaves the existing theoretical Netty 3-vs-4 conflict:
> - Remove org.jboss.netty excludes where possible, for clarity; they're not needed except with Hadoop artifacts
> - Add io.netty:netty excludes where needed -- except, let akka keep its io.netty:netty
> - Change a bit of test code that actually depended on Netty 3.x to use the 4.x equivalent
> - Update the SBT build accordingly
>
> A better change would be to update Akka far enough such that it agrees on Netty 4.x, but I don't know if that's feasible.
[jira] [Updated] (SPARK-1644) The org.datanucleus:* should not be packaged into spark-assembly-*.jar
[ https://issues.apache.org/jira/browse/SPARK-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Guoqiang Li updated SPARK-1644:
-------------------------------
Fix Version/s: (was: 1.1.0)
               1.0.0

> The org.datanucleus:* should not be packaged into spark-assembly-*.jar
> -----------------------------------------------------------------------
>
> Key: SPARK-1644
> URL: https://issues.apache.org/jira/browse/SPARK-1644
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Reporter: Guoqiang Li
> Assignee: Guoqiang Li
> Fix For: 1.0.0
>
> Attachments: spark.log
>
> cat conf/hive-site.xml
> {code:xml}
> <configuration>
>   <property>
>     <name>javax.jdo.option.ConnectionURL</name>
>     <value>jdbc:postgresql://bj-java-hugedata1:7432/hive</value>
>   </property>
>   <property>
>     <name>javax.jdo.option.ConnectionDriverName</name>
>     <value>org.postgresql.Driver</value>
>   </property>
>   <property>
>     <name>javax.jdo.option.ConnectionUserName</name>
>     <value>hive</value>
>   </property>
>   <property>
>     <name>javax.jdo.option.ConnectionPassword</name>
>     <value>passwd</value>
>   </property>
>   <property>
>     <name>hive.metastore.local</name>
>     <value>false</value>
>   </property>
>   <property>
>     <name>hive.metastore.warehouse.dir</name>
>     <value>hdfs://host:8020/user/hive/warehouse</value>
>   </property>
> </configuration>
> {code}
[jira] [Updated] (SPARK-1755) Spark-submit --name does not resolve to application name on YARN
[ https://issues.apache.org/jira/browse/SPARK-1755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or updated SPARK-1755:
-----------------------------
Description:

In YARN client mode, --name is ignored because the deploy mode is client, and the name is for some reason a [cluster config|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L170].

In YARN cluster mode, --name is passed to the org.apache.spark.deploy.yarn.Client as a command line argument. The Client class, however, uses this name only as the app name for the RM, but not for Spark. In other words, when SparkConf attempts to load default configs, application name is not set.

In both cases, passing --name to SparkSubmit does not actually cause Spark to adopt it as its application name, despite what the usage promises.

was:

In YARN client mode, --name is ignored because the deploy mode is client, and the name is for some reason a cluster config. (See https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L170)

In YARN cluster mode, --name is passed to the org.apache.spark.deploy.yarn.Client as a command line argument. The Client class, however, uses this name only as the app name for the RM, but not for Spark. In other words, when SparkConf attempts to load default configs, application name is not set.

In both cases, passing --name to SparkSubmit does not actually cause Spark to adopt it as its application name, despite what the usage promises.

> Spark-submit --name does not resolve to application name on YARN
> -----------------------------------------------------------------
>
> Key: SPARK-1755
> URL: https://issues.apache.org/jira/browse/SPARK-1755
> Project: Spark
> Issue Type: Bug
> Affects Versions: 1.0.0
> Reporter: Andrew Or
> Fix For: 1.0.1
>
> In YARN client mode, --name is ignored because the deploy mode is client, and the name is for some reason a [cluster config|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L170].
>
> In YARN cluster mode, --name is passed to the org.apache.spark.deploy.yarn.Client as a command line argument. The Client class, however, uses this name only as the app name for the RM, but not for Spark. In other words, when SparkConf attempts to load default configs, application name is not set.
>
> In both cases, passing --name to SparkSubmit does not actually cause Spark to adopt it as its application name, despite what the usage promises.
[jira] [Created] (SPARK-1823) ExternalAppendOnlyMap can still OOM if one key is very large
Andrew Or created SPARK-1823:
-----------------------------

Summary: ExternalAppendOnlyMap can still OOM if one key is very large
Key: SPARK-1823
URL: https://issues.apache.org/jira/browse/SPARK-1823
Project: Spark
Issue Type: Bug
Reporter: Andrew Or

If the values for one key do not collectively fit into memory, then the map will still OOM when you merge the spilled contents back in.
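The failure mode is easy to see in a toy model: spilling bounds how many pairs are buffered at once, but merging the spills back still gathers every value for one key in memory. A sketch with simulated spill files (illustrative only, not Spark's ExternalAppendOnlyMap):

```python
# Toy external append-only map: spill when too many pairs are buffered,
# then merge spills key by key. Illustrates the single-huge-key OOM mode.
from collections import defaultdict

def external_group_by(pairs, spill_threshold):
    spills = []                    # simulated on-disk spill files
    current = defaultdict(list)    # in-memory buffer
    buffered = 0
    for k, v in pairs:
        current[k].append(v)
        buffered += 1
        if buffered >= spill_threshold:   # simulate memory pressure
            spills.append(dict(current))
            current, buffered = defaultdict(list), 0
    spills.append(dict(current))

    merged = defaultdict(list)
    for spill in spills:
        for k, vs in spill.items():
            # All values for k re-gather here: if one key's values don't
            # collectively fit in memory, this step still blows up, no
            # matter how aggressively we spilled earlier.
            merged[k].extend(vs)
    return dict(merged)

print(external_group_by([("a", 1), ("a", 2), ("b", 3), ("a", 4)], spill_threshold=2))
```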
[jira] [Updated] (SPARK-1764) EOF reached before Python server acknowledged
[ https://issues.apache.org/jira/browse/SPARK-1764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bouke van der Bijl updated SPARK-1764:
--------------------------------------
Description:

I'm getting "EOF reached before Python server acknowledged" while using PySpark on Mesos. The error manifests itself in multiple ways. One is:

14/05/08 18:10:40 ERROR DAGSchedulerActorSupervisor: eventProcesserActor failed due to the error EOF reached before Python server acknowledged; shutting down SparkContext

And the other has a full stacktrace:

14/05/08 18:03:06 ERROR OneForOneStrategy: EOF reached before Python server acknowledged
org.apache.spark.SparkException: EOF reached before Python server acknowledged
  at org.apache.spark.api.python.PythonAccumulatorParam.addInPlace(PythonRDD.scala:416)
  at org.apache.spark.api.python.PythonAccumulatorParam.addInPlace(PythonRDD.scala:387)
  at org.apache.spark.Accumulable.$plus$plus$eq(Accumulators.scala:71)
  at org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:279)
  at org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:277)
  at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
  at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
  at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
  at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
  at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
  at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
  at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
  at org.apache.spark.Accumulators$.add(Accumulators.scala:277)
  at org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:818)
  at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1204)
  at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
  at akka.actor.ActorCell.invoke(ActorCell.scala:456)
  at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
  at akka.dispatch.Mailbox.run(Mailbox.scala:219)
  at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
  at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
  at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
  at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
  at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

This error causes the SparkContext to shutdown. I have not been able to reliably reproduce this bug; it seems to happen randomly.

was:

I'm getting "EOF reached before Python server acknowledged" while using PySpark on Mesos. The full error is:

14/05/08 18:10:40 ERROR DAGSchedulerActorSupervisor: eventProcesserActor failed due to the error EOF reached before Python server acknowledged; shutting down SparkContext

This error causes the SparkContext to shutdown. I have not been able to reliably reproduce this bug; it seems to happen randomly.

> EOF reached before Python server acknowledged
> ----------------------------------------------
>
> Key: SPARK-1764
> URL: https://issues.apache.org/jira/browse/SPARK-1764
> Project: Spark
> Issue Type: Bug
> Components: Mesos, PySpark
> Affects Versions: 1.0.0
> Reporter: Bouke van der Bijl
> Priority: Critical
> Labels: mesos, pyspark
>
> I'm getting "EOF reached before Python server acknowledged" while using PySpark on Mesos. The error manifests itself in multiple ways. One is:
>
> 14/05/08 18:10:40 ERROR DAGSchedulerActorSupervisor: eventProcesserActor failed due to the error EOF reached before Python server acknowledged; shutting down SparkContext
>
> And the other has a full stacktrace:
>
> 14/05/08 18:03:06 ERROR OneForOneStrategy: EOF reached before Python server acknowledged
> org.apache.spark.SparkException: EOF reached before Python server acknowledged
> at org.apache.spark.api.python.PythonAccumulatorParam.addInPlace(PythonRDD.scala:416)
> at org.apache.spark.api.python.PythonAccumulatorParam.addInPlace(PythonRDD.scala:387)
> at org.apache.spark.Accumulable.$plus$plus$eq(Accumulators.scala:71)
> at org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:279)
> at org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:277)
> at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
> at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
[jira] [Updated] (SPARK-1825) Windows Spark fails to work with Linux YARN
[ https://issues.apache.org/jira/browse/SPARK-1825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Taeyun Kim updated SPARK-1825:
------------------------------
Affects Version/s: 1.0.0

> Windows Spark fails to work with Linux YARN
> --------------------------------------------
>
> Key: SPARK-1825
> URL: https://issues.apache.org/jira/browse/SPARK-1825
> Project: Spark
> Issue Type: Bug
> Affects Versions: 1.0.0
> Reporter: Taeyun Kim
>
> Windows Spark fails to work with Linux YARN. This is a cross-platform problem.
>
> On the YARN side, Hadoop 2.4.0 resolved the issue as follows:
> https://issues.apache.org/jira/browse/YARN-1824
> But the Spark YARN module does not incorporate the new YARN API yet, so the problem persists for Spark.
>
> First, the following source files should be changed:
> - /yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala
> - /yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ExecutorRunnableUtil.scala
>
> The change is as follows:
> - Replace .$() with .$$()
> - Replace File.pathSeparator for Environment.CLASSPATH.name with ApplicationConstants.CLASS_PATH_SEPARATOR (import org.apache.hadoop.yarn.api.ApplicationConstants is required for this)
>
> Unless the above are applied, launch_container.sh will contain invalid shell script statements (since they will contain Windows-specific separators), and the job will fail.
>
> Also, the following symptoms should be fixed (I could not find the relevant source code):
> - The SPARK_HOME environment variable is copied straight into launch_container.sh. It should be changed to the path format for the server OS, or, better, a separate environment variable or a configuration variable should be created.
> - The '%HADOOP_MAPRED_HOME%' string still exists in launch_container.sh after the above change is applied. Maybe I missed a few lines.
>
> I'm not sure whether this is all, since I'm new to both Spark and YARN.
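The separator problem above can be sketched in a few lines: baking the submitting OS's path separator into the launch script breaks on a node with a different OS, whereas YARN-1824's approach emits a placeholder that each node expands locally. The token name below is illustrative; the real constant is ApplicationConstants.CLASS_PATH_SEPARATOR.

```python
# Hypothetical sketch of the cross-platform classpath fix (names illustrative).

CROSS_PLATFORM_SEP = "<CPS>"  # stand-in for YARN's CLASS_PATH_SEPARATOR token

def classpath_client_side(entries, client_pathsep):
    # Pre-Hadoop-2.4 behaviour: the submitting OS's separator (';' on Windows)
    # is baked into the command, which a Linux node cannot interpret.
    return client_pathsep.join(entries)

def classpath_deferred(entries):
    # YARN-1824 behaviour: emit a placeholder instead of a concrete separator.
    return CROSS_PLATFORM_SEP.join(entries)

def expand_on_node(classpath, node_pathsep):
    # Each node substitutes its own separator when the container launches.
    return classpath.replace(CROSS_PLATFORM_SEP, node_pathsep)

entries = ["spark.jar", "conf"]
print(classpath_client_side(entries, ";"))            # broken on a Linux node
print(expand_on_node(classpath_deferred(entries), ":"))  # correct everywhere
```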
[jira] [Created] (SPARK-1771) CoarseGrainedSchedulerBackend is not resilient to Akka restarts
Aaron Davidson created SPARK-1771:
----------------------------------

Summary: CoarseGrainedSchedulerBackend is not resilient to Akka restarts
Key: SPARK-1771
URL: https://issues.apache.org/jira/browse/SPARK-1771
Project: Spark
Issue Type: Bug
Components: Spark Core
Reporter: Aaron Davidson

The exception reported in SPARK-1769 was propagated through the CoarseGrainedSchedulerBackend, and caused an Actor restart of the DriverActor. Unfortunately, this actor does not seem to have been written with Akka restartability in mind. For instance, the new DriverActor has lost all state about the prior Executors without cleanly disconnecting them. This means that the driver actually has executors attached to it, but doesn't think it does, which leads to mayhem of various sorts.
[jira] [Resolved] (SPARK-1494) Hive Dependencies being checked by MIMA
[ https://issues.apache.org/jira/browse/SPARK-1494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-1494. - Resolution: Fixed > Hive Dependencies being checked by MIMA > --- > > Key: SPARK-1494 > URL: https://issues.apache.org/jira/browse/SPARK-1494 > Project: Spark > Issue Type: Bug > Components: Project Infra, SQL >Affects Versions: 1.0.0 >Reporter: Ahir Reddy >Assignee: Michael Armbrust >Priority: Minor > Fix For: 1.0.0 > > > It looks like code in companion objects is being invoked by the MIMA checker, > as it uses Scala reflection to check all of the interfaces. As a result it > starts a Spark context and eventually runs out of memory. As a temporary > fix, all classes that contain "hive" or "Hive" are excluded from the check. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-1836) REPL $outer type mismatch causes lookup() and equals() problems
[ https://issues.apache.org/jira/browse/SPARK-1836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998312#comment-13998312 ] Michael Armbrust commented on SPARK-1836: - This sounds like it could be related to [SPARK-1199] > REPL $outer type mismatch causes lookup() and equals() problems > --- > > Key: SPARK-1836 > URL: https://issues.apache.org/jira/browse/SPARK-1836 > Project: Spark > Issue Type: Bug >Affects Versions: 0.9.0 >Reporter: Michael Malak > > Anand Avati partially traced the cause to REPL wrapping classes in $outer > classes. There are at least two major symptoms: > 1. equals() > = > In the REPL, equals() (required in custom classes used as a key for groupByKey) > seems to have to be written using isInstanceOf[] instead of the canonical > match{} > Spark Shell (equals uses match{}): > {noformat} > class C(val s:String) extends Serializable { > override def equals(o: Any) = o match { > case that: C => that.s == s > case _ => false > } > } > val x = new C("a") > val bos = new java.io.ByteArrayOutputStream() > val out = new java.io.ObjectOutputStream(bos) > out.writeObject(x); > val b = bos.toByteArray(); > out.close > bos.close > val y = new java.io.ObjectInputStream(new > java.io.ByteArrayInputStream(b)).readObject().asInstanceOf[C] > x.equals(y) > res: Boolean = false > {noformat} > Spark Shell (equals uses isInstanceOf[]): > {noformat} > class C(val s:String) extends Serializable { > override def equals(o: Any) = if (o.isInstanceOf[C]) (o.asInstanceOf[C].s == > s) else false > } > val x = new C("a") > val bos = new java.io.ByteArrayOutputStream() > val out = new java.io.ObjectOutputStream(bos) > out.writeObject(x); > val b = bos.toByteArray(); > out.close > bos.close > val y = new java.io.ObjectInputStream(new > java.io.ByteArrayInputStream(b)).readObject().asInstanceOf[C] > x.equals(y) > res: Boolean = true > {noformat} > Scala Shell (equals uses match{}): > {noformat} > class C(val s:String) extends Serializable { > 
override def equals(o: Any) = o match { > case that: C => that.s == s > case _ => false > } > } > val x = new C("a") > val bos = new java.io.ByteArrayOutputStream() > val out = new java.io.ObjectOutputStream(bos) > out.writeObject(x); > val b = bos.toByteArray(); > out.close > bos.close > val y = new java.io.ObjectInputStream(new > java.io.ByteArrayInputStream(b)).readObject().asInstanceOf[C] > x.equals(y) > res: Boolean = true > {noformat} > 2. lookup() > = > {noformat} > class C(val s:String) extends Serializable { > override def equals(o: Any) = if (o.isInstanceOf[C]) o.asInstanceOf[C].s == > s else false > override def hashCode = s.hashCode > override def toString = s > } > val r = sc.parallelize(Array((new C("a"),11),(new C("a"),12))) > r.lookup(new C("a")) > :17: error: type mismatch; > found : C > required: C > r.lookup(new C("a")) >^ > {noformat} > See > http://mail-archives.apache.org/mod_mbox/spark-dev/201405.mbox/%3C1400019424.80629.YahooMailNeo%40web160801.mail.bf1.yahoo.com%3E -- This message was sent by Atlassian JIRA (v6.2#6252)
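For reference, the canonical pattern in compiled (non-REPL) code pairs the match-based equals with a consistent hashCode, which is what lookup() and groupByKey rely on. The `Key` class below is an illustrative stand-in for the `C` class in the report; the failure described above comes from the REPL's `$outer` wrapping, not from this pattern itself.

```scala
// Canonical equals/hashCode pair for a key class. In compiled code, outside
// the REPL's $outer wrapping, the match-based form behaves correctly.
class Key(val s: String) extends Serializable {
  override def equals(o: Any): Boolean = o match {
    case that: Key => that.s == s
    case _ => false
  }
  // hashCode must be consistent with equals for use as a shuffle key.
  override def hashCode: Int = s.hashCode
}
```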
[jira] [Resolved] (SPARK-1696) RowMatrix.dspr is not using parameter alpha for DenseVector
[ https://issues.apache.org/jira/browse/SPARK-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng resolved SPARK-1696. -- Resolution: Fixed Fix Version/s: 1.0.0 > RowMatrix.dspr is not using parameter alpha for DenseVector > --- > > Key: SPARK-1696 > URL: https://issues.apache.org/jira/browse/SPARK-1696 > Project: Spark > Issue Type: Bug > Components: MLlib >Reporter: Anish Patel >Assignee: Xiangrui Meng >Priority: Minor > Fix For: 1.0.0 > > > In the master branch, method dspr of RowMatrix takes parameter alpha, but > does not use it when given a DenseVector. > This probably slid by because when method computeGramianMatrix calls dspr, it > provides an alpha value of 1.0. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (SPARK-1843) Provide a simpler alternative to assemble-deps
Patrick Wendell created SPARK-1843: -- Summary: Provide a simpler alternative to assemble-deps Key: SPARK-1843 URL: https://issues.apache.org/jira/browse/SPARK-1843 Project: Spark Issue Type: Improvement Components: Build Reporter: Patrick Wendell Assignee: Prashant Sharma Fix For: 1.1.0 Right now we have the assemble-deps tool for speeding up local development. I was thinking about a simpler solution to this problem where, instead of creating a fancy assembly jar, we just add an environment variable: USE_COMPILED_SPARK and, if that variable is present, we simply add the Spark classes on the classpath before the assembly jar. Since the compiled classes are on the classpath first, they will take precedence. This would allow us to remove the entire assemble-deps build and associated logic in the bash scripts. We'd need to make sure it's propagated correctly during tests (like SPARK_TESTING) but other than that I think it should work. -- This message was sent by Atlassian JIRA (v6.2#6252)
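The proposal boils down to classpath ordering: entries listed first shadow later ones. A hedged sketch of that rule, where `useCompiled` plays the role of the proposed USE_COMPILED_SPARK switch and the directory and jar names are illustrative, not the actual build layout:

```scala
// Sketch of the proposed dev-mode classpath construction: when the switch is
// set, compiled class directories are prepended so they take precedence over
// the (possibly stale) assembly jar.
object DevClasspath {
  def build(useCompiled: Boolean,
            compiledDirs: Seq[String],
            assemblyJar: String): Seq[String] =
    if (useCompiled) compiledDirs :+ assemblyJar  // compiled classes win
    else Seq(assemblyJar)                          // normal deployment path
}
```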
[jira] [Created] (SPARK-1839) PySpark take() does not launch a Spark job when it has to
Hossein Falaki created SPARK-1839: - Summary: PySpark take() does not launch a Spark job when it has to Key: SPARK-1839 URL: https://issues.apache.org/jira/browse/SPARK-1839 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 1.0.0 Reporter: Hossein Falaki If you call take() or first() on a large FilteredRDD, the driver attempts to scan all partitions to find the first valid item. If the RDD is large this would fail or hang. -- This message was sent by Atlassian JIRA (v6.2#6252)
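For comparison, the Scala RDD implementation of take() launches jobs over escalating batches of partitions rather than scanning everything at once. The toy model below illustrates that strategy with partitions represented as plain sequences; the doubling policy and names are illustrative, not the exact Spark heuristic:

```scala
// Toy model of escalating take(): scan partitions in exponentially growing
// batches (each loop iteration stands in for one launched Spark job) and stop
// as soon as n elements are found, instead of materializing every partition.
object TakeSketch {
  def take[T](partitions: Seq[Seq[T]], n: Int): Seq[T] = {
    val buf = scala.collection.mutable.ArrayBuffer.empty[T]
    var i = 0
    var batch = 1
    while (buf.size < n && i < partitions.length) {
      buf ++= partitions.slice(i, i + batch).flatten.take(n - buf.size)
      i += batch
      batch *= 2 // widen the scan only if earlier partitions were too sparse
    }
    buf.toSeq
  }
}
```

This matters for a heavily filtered RDD: most partitions may be empty, so a fixed single-partition scan finds nothing, while an eager full scan pulls everything to the driver.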
[jira] [Resolved] (SPARK-1791) SVM implementation does not use threshold parameter
[ https://issues.apache.org/jira/browse/SPARK-1791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng resolved SPARK-1791. -- Resolution: Fixed Fix Version/s: 1.0.0 PR: https://github.com/apache/spark/pull/725 > SVM implementation does not use threshold parameter > --- > > Key: SPARK-1791 > URL: https://issues.apache.org/jira/browse/SPARK-1791 > Project: Spark > Issue Type: New Feature > Components: MLlib >Reporter: Andrew Tulloch > Fix For: 1.0.0 > > Original Estimate: 10m > Remaining Estimate: 10m > > The key error is in SVM.scala, in `predictPoint` > ``` > threshold match { > case Some(t) => if (margin < 0.0) 0.0 else 1.0 > case None => margin > } > ``` -- This message was sent by Atlassian JIRA (v6.2#6252)
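The quoted snippet compares the margin against a hard-coded 0.0 even when a threshold t is supplied. A minimal sketch of the presumably intended logic, written as a standalone function rather than the actual SVMModel.predictPoint signature:

```scala
// Sketch of the intended prediction logic: compare the margin against the
// user-configured threshold t rather than a hard-coded 0.0.
def predictPoint(margin: Double, threshold: Option[Double]): Double =
  threshold match {
    case Some(t) => if (margin < t) 0.0 else 1.0
    case None    => margin // no threshold set: return the raw margin
  }
```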
[jira] [Commented] (SPARK-1754) Add missing arithmetic DSL operations.
[ https://issues.apache.org/jira/browse/SPARK-1754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13992478#comment-13992478 ] Takuya Ueshin commented on SPARK-1754: -- Pull-requested: https://github.com/apache/spark/pull/689 > Add missing arithmetic DSL operations. > -- > > Key: SPARK-1754 > URL: https://issues.apache.org/jira/browse/SPARK-1754 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Takuya Ueshin > > Add missing arithmetic DSL operations: {{unary_-}}, {{%}}. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-1787) Build failure on JDK8 :: SBT fails to load build configuration file
[ https://issues.apache.org/jira/browse/SPARK-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998492#comment-13998492 ] Sean Owen commented on SPARK-1787: -- Duplicate of https://issues.apache.org/jira/browse/SPARK-1444 it appears > Build failure on JDK8 :: SBT fails to load build configuration file > --- > > Key: SPARK-1787 > URL: https://issues.apache.org/jira/browse/SPARK-1787 > Project: Spark > Issue Type: New Feature > Components: Build >Affects Versions: 0.9.0 > Environment: JDK8 > Scala 2.10.X > SBT 0.12.X >Reporter: Richard Gomes >Priority: Minor > > SBT fails to build under JDK8. > Please find steps to reproduce the error below: > (j8s10)rgomes@terra:~/workspace/spark-0.9.1$ uname -a > Linux terra 3.13-1-amd64 #1 SMP Debian 3.13.10-1 (2014-04-15) x86_64 GNU/Linux > (j8s10)rgomes@terra:~/workspace/spark-0.9.1$ java -version > java version "1.8.0_05" > Java(TM) SE Runtime Environment (build 1.8.0_05-b13) > Java HotSpot(TM) 64-Bit Server VM (build 25.5-b02, mixed mode) > (j8s10)rgomes@terra:~/workspace/spark-0.9.1$ scala -version > Scala code runner version 2.10.3 -- Copyright 2002-2013, LAMP/EPFL > (j8s10)rgomes@terra:~/workspace/spark-0.9.1$ sbt/sbt clean > Launching sbt from sbt/sbt-launch-0.12.4.jar > Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=350m; > support was removed in 8.0 > [info] Loading project definition from > /home/rgomes/workspace/spark-0.9.1/project/project > [info] Compiling 1 Scala source to > /home/rgomes/workspace/spark-0.9.1/project/project/target/scala-2.9.2/sbt-0.12/classes... 
> [error] error while loading CharSequence, class file > '/opt/developer/jdk1.8.0_05/jre/lib/rt.jar(java/lang/CharSequence.class)' is > broken > [error] (bad constant pool tag 15 at byte 1501) > [error] error while loading Comparator, class file > '/opt/developer/jdk1.8.0_05/jre/lib/rt.jar(java/util/Comparator.class)' is > broken > [error] (bad constant pool tag 15 at byte 5003) > [error] two errors found > [error] (compile:compile) Compilation failed > Project loading failed: (r)etry, (q)uit, (l)ast, or (i)gnore? q -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Closed] (SPARK-1838) On a YARN cluster, Spark doesn't run on local mode
[ https://issues.apache.org/jira/browse/SPARK-1838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or closed SPARK-1838. Resolution: Not a Problem Looks like I accidentally set SPARK_YARN_MODE to true manually, which directly conflicts with master being local mode. This isn't documented, so users shouldn't be setting this variable anyway. No problem. > On a YARN cluster, Spark doesn't run on local mode > -- > > Key: SPARK-1838 > URL: https://issues.apache.org/jira/browse/SPARK-1838 > Project: Spark > Issue Type: Bug >Affects Versions: 1.0.0 >Reporter: Andrew Or > Fix For: 1.0.1 > > > Right now we throw an exception if YARN_LOCAL_DIRS is not set. However, we > may want to just run Spark in local mode, which doesn't even use this > environment variable. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (SPARK-1770) repartition and coalesce(shuffle=true) put objects with the same key in the same bucket
Matei Zaharia created SPARK-1770: Summary: repartition and coalesce(shuffle=true) put objects with the same key in the same bucket Key: SPARK-1770 URL: https://issues.apache.org/jira/browse/SPARK-1770 Project: Spark Issue Type: Bug Affects Versions: 0.9.0, 1.0.0, 0.9.1 Reporter: Matei Zaharia Priority: Blocker This is bad when you have many identical objects. We should assign each one a random key. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-1326) make-distribution.sh's Tachyon support relies on GNU sed
[ https://issues.apache.org/jira/browse/SPARK-1326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13992612#comment-13992612 ] Sandeep Singh commented on SPARK-1326: -- https://github.com/apache/spark/pull/264 > make-distribution.sh's Tachyon support relies on GNU sed > > > Key: SPARK-1326 > URL: https://issues.apache.org/jira/browse/SPARK-1326 > Project: Spark > Issue Type: Bug > Components: Deploy >Reporter: Matei Zaharia >Priority: Minor > Fix For: 1.0.0 > > > It fails on Mac OS X, with {{sed: 1: "/Users/matei/ ...": invalid command > code m}} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-1778) Add 'limit' transformation to SchemaRDD.
[ https://issues.apache.org/jira/browse/SPARK-1778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-1778: --- Assignee: Takuya Ueshin > Add 'limit' transformation to SchemaRDD. > > > Key: SPARK-1778 > URL: https://issues.apache.org/jira/browse/SPARK-1778 > Project: Spark > Issue Type: Improvement >Reporter: Takuya Ueshin >Assignee: Takuya Ueshin > Fix For: 1.0.0 > > > Add {{limit}} transformation to {{SchemaRDD}}. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-1841) update scalatest to version 2.1.5
[ https://issues.apache.org/jira/browse/SPARK-1841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guoqiang Li updated SPARK-1841: --- Description: scalatest 1.9.* does not support Scala 2.11 (was: scalatest 1.9.* not Scala 2.11) > update scalatest to version 2.1.5 > - > > Key: SPARK-1841 > URL: https://issues.apache.org/jira/browse/SPARK-1841 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Reporter: Guoqiang Li >Assignee: Guoqiang Li > > scalatest 1.9.* does not support Scala 2.11 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-1575) failing tests with master branch
[ https://issues.apache.org/jira/browse/SPARK-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13996955#comment-13996955 ] Sean Owen commented on SPARK-1575: -- For what it's worth, I no longer see this failure. I believe this has been resolved by other changes along the way. > failing tests with master branch > - > > Key: SPARK-1575 > URL: https://issues.apache.org/jira/browse/SPARK-1575 > Project: Spark > Issue Type: Test >Reporter: Nishkam Ravi >Priority: Blocker > > Built the master branch against Hadoop version 2.3.0-cdh5.0.0 with > SPARK_YARN=true. sbt tests don't go through successfully (tried multiple > runs). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-1767) Prefer HDFS-cached replicas when scheduling data-local tasks
[ https://issues.apache.org/jira/browse/SPARK-1767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13997761#comment-13997761 ] Aaron Davidson commented on SPARK-1767: --- One simple workaround is to make sure that partitions that are in memory are ordered first in the list of partitions, as Spark will try to place executors based on the order in this list. This is, of course, not a complete solution, as we would not utilize the locality-wait logic within Spark and would immediately fall back to a non-cached node if the cached node was busy, rather than waiting for some period of time for the cached node to become available. > Prefer HDFS-cached replicas when scheduling data-local tasks > > > Key: SPARK-1767 > URL: https://issues.apache.org/jira/browse/SPARK-1767 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 1.0.0 >Reporter: Sandy Ryza > -- This message was sent by Atlassian JIRA (v6.2#6252)
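The ordering workaround described in the comment can be sketched as a stable sort over the replica list; this is a simplified stand-in for Spark's preferred-locations machinery, with hypothetical host names:

```scala
// Put hosts holding an HDFS-cached replica ahead of the others, since the
// scheduler consults preferred locations in list order. sortBy is a stable
// sort, so the original order is preserved within each group.
def orderPreferredLocations(replicas: Seq[String], cached: Set[String]): Seq[String] =
  replicas.sortBy(host => if (cached(host)) 0 else 1)
```

As the comment notes, this is only a partial fix: once the cached host is busy, scheduling falls straight through to the next entry instead of applying a locality wait.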
[jira] [Updated] (SPARK-1764) EOF reached before Python server acknowledged
[ https://issues.apache.org/jira/browse/SPARK-1764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bouke van der Bijl updated SPARK-1764: -- Description: I'm getting "EOF reached before Python server acknowledged" while using PySpark on Mesos. The error manifests itself in multiple ways. One is: 14/05/08 18:10:40 ERROR DAGSchedulerActorSupervisor: eventProcesserActor failed due to the error EOF reached before Python server acknowledged; shutting down SparkContext And the other has a full stacktrace: 14/05/08 18:03:06 ERROR OneForOneStrategy: EOF reached before Python server acknowledged org.apache.spark.SparkException: EOF reached before Python server acknowledged at org.apache.spark.api.python.PythonAccumulatorParam.addInPlace(PythonRDD.scala:416) at org.apache.spark.api.python.PythonAccumulatorParam.addInPlace(PythonRDD.scala:387) at org.apache.spark.Accumulable.$plus$plus$eq(Accumulators.scala:71) at org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:279) at org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:277) at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772) at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98) at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98) at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226) at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39) at scala.collection.mutable.HashMap.foreach(HashMap.scala:98) at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771) at org.apache.spark.Accumulators$.add(Accumulators.scala:277) at org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:818) at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1204) at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498) at 
akka.actor.ActorCell.invoke(ActorCell.scala:456) at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237) at akka.dispatch.Mailbox.run(Mailbox.scala:219) at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) This error causes the SparkContext to shut down. I have not been able to reliably reproduce this bug; it seems to happen randomly, but if you run enough tasks on a SparkContext it'll happen eventually was: I'm getting "EOF reached before Python server acknowledged" while using PySpark on Mesos. The error manifests itself in multiple ways. One is: 14/05/08 18:10:40 ERROR DAGSchedulerActorSupervisor: eventProcesserActor failed due to the error EOF reached before Python server acknowledged; shutting down SparkContext And the other has a full stacktrace: 14/05/08 18:03:06 ERROR OneForOneStrategy: EOF reached before Python server acknowledged org.apache.spark.SparkException: EOF reached before Python server acknowledged at org.apache.spark.api.python.PythonAccumulatorParam.addInPlace(PythonRDD.scala:416) at org.apache.spark.api.python.PythonAccumulatorParam.addInPlace(PythonRDD.scala:387) at org.apache.spark.Accumulable.$plus$plus$eq(Accumulators.scala:71) at org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:279) at org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:277) at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772) at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98) at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98) at 
scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226) at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39) at scala.collection.mutable.HashMap.foreach(HashMap.scala:98) at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771) at org.apache.spark.Accumulators$.add(Accumulators.scala:277) at org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:818) at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1204) at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498) at akka.actor.ActorCell.invoke(ActorCell.scala
[jira] [Updated] (SPARK-1769) Executor loss can cause race condition in Pool
[ https://issues.apache.org/jira/browse/SPARK-1769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Davidson updated SPARK-1769: -- Description: Loss of executors (in this case due to OOMs) exposes a race condition in Pool.scala, evident from this stack trace: {code} 14/05/08 22:41:48 ERROR OneForOneStrategy: java.lang.NullPointerException at org.apache.spark.scheduler.Pool$$anonfun$executorLost$1.apply(Pool.scala:87) at org.apache.spark.scheduler.Pool$$anonfun$executorLost$1.apply(Pool.scala:87) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) at org.apache.spark.scheduler.Pool.executorLost(Pool.scala:87) at org.apache.spark.scheduler.Pool$$anonfun$executorLost$1.apply(Pool.scala:87) at org.apache.spark.scheduler.Pool$$anonfun$executorLost$1.apply(Pool.scala:87) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) at org.apache.spark.scheduler.Pool.executorLost(Pool.scala:87) at org.apache.spark.scheduler.TaskSchedulerImpl.removeExecutor(TaskSchedulerImpl.scala:412) at org.apache.spark.scheduler.TaskSchedulerImpl.executorLost(TaskSchedulerImpl.scala:385) at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverActor.removeExecutor(CoarseGrainedSchedulerBackend.scala:160) at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverActor$$anonfun$receive$1$$anonfun$applyOrElse$5.apply(CoarseGrainedSchedulerBackend.scala:123) at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverActor$$anonfun$receive$1$$anonfun$applyOrElse$5.apply(CoarseGrainedSchedulerBackend.scala:123) at scala.Option.foreach(Option.scala:236) at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverActor$$anonfun$receive$1.applyOrElse(CoarseGrainedSchedulerBackend.scala:123) at 
akka.actor.ActorCell.receiveMessage(ActorCell.scala:498) at akka.actor.ActorCell.invoke(ActorCell.scala:456) at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237) at akka.dispatch.Mailbox.run(Mailbox.scala:219) at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) {code} Note that the line of code that throws this exception is here: {code} schedulableQueue.foreach(_.executorLost(executorId, host)) {code} By the stack trace, it's not schedulableQueue that is null, but an element therein. As far as I could tell, we never add a null element to this queue. Rather, I could see that removeSchedulable() and executorLost() were called at about the same time (via log messages), and suspect that, since this ArrayBuffer is in no way synchronized, we iterate through the list while it's in an incomplete state. 
was: Loss of executors (in this case due to OOMs) exposes a race condition in Pool.scala, evident from this stack trace: {code} 14/05/08 22:41:48 ERROR OneForOneStrategy: java.lang.NullPointerException at org.apache.spark.scheduler.Pool$$anonfun$executorLost$1.apply(Pool.scala:87) at org.apache.spark.scheduler.Pool$$anonfun$executorLost$1.apply(Pool.scala:87) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) at org.apache.spark.scheduler.Pool.executorLost(Pool.scala:87) at org.apache.spark.scheduler.Pool$$anonfun$executorLost$1.apply(Pool.scala:87) at org.apache.spark.scheduler.Pool$$anonfun$executorLost$1.apply(Pool.scala:87) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) at org.apache.spark.scheduler.Pool.executorLost(Pool.scala:87) at org.apache.spark.scheduler.TaskSchedulerImpl.removeExecutor(TaskSchedulerImpl.scala:412) at org.apache.spark.scheduler.TaskSchedulerImpl.executorLost(TaskSchedulerImpl.scala:385) at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverActor.removeExecutor(CoarseGrainedSchedulerBackend.scala:160) at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverActor$$anonfun$receive$1$$anonfun$applyOrElse$5.apply(CoarseGrainedSchedulerBackend.scala:123)
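Independent of whatever fix was ultimately applied to Pool.scala, the race pattern described above, mutation during an unsynchronized foreach, can be avoided by iterating over a snapshot taken under a lock (or by switching to a concurrent collection). A generic sketch, not the Spark code:

```scala
// Generic sketch: guard a mutable buffer with a lock and iterate over a
// snapshot, so a concurrent remove cannot leave foreach observing a
// partially updated buffer (the suspected cause of the NPE above).
class SafeQueue[T] {
  private val buf = scala.collection.mutable.ArrayBuffer.empty[T]
  def add(x: T): Unit = buf.synchronized { buf += x }
  def remove(x: T): Unit = buf.synchronized { buf -= x }
  def foreachSnapshot(f: T => Unit): Unit = {
    val snapshot = buf.synchronized { buf.toList } // copy under the lock
    snapshot.foreach(f)                            // iterate lock-free
  }
}
```

A concurrent queue such as `java.util.concurrent.ConcurrentLinkedQueue` achieves the same end without an explicit lock; the essential point is that iteration never observes a half-applied removal.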
[jira] [Resolved] (SPARK-1760) mvn -Dsuites=* test throw an ClassNotFoundException
[ https://issues.apache.org/jira/browse/SPARK-1760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-1760. Resolution: Fixed Fix Version/s: 1.0.0 Issue resolved by pull request 712 [https://github.com/apache/spark/pull/712] > mvn -Dsuites=* test throw an ClassNotFoundException > -- > > Key: SPARK-1760 > URL: https://issues.apache.org/jira/browse/SPARK-1760 > Project: Spark > Issue Type: Bug >Reporter: Guoqiang Li >Assignee: Guoqiang Li > Fix For: 1.0.0 > > > {{mvn -Dhadoop.version=0.23.9 -Phadoop-0.23 > -Dsuites=org.apache.spark.repl.ReplSuite test}} => > {code} > *** RUN ABORTED *** > java.lang.ClassNotFoundException: org.apache.spark.repl.ReplSuite > at java.net.URLClassLoader$1.run(URLClassLoader.java:366) > at java.net.URLClassLoader$1.run(URLClassLoader.java:355) > at java.security.AccessController.doPrivileged(Native Method) > at java.net.URLClassLoader.findClass(URLClassLoader.java:354) > at java.lang.ClassLoader.loadClass(ClassLoader.java:425) > at java.lang.ClassLoader.loadClass(ClassLoader.java:358) > at org.scalatest.tools.Runner$$anonfun$21.apply(Runner.scala:1470) > at org.scalatest.tools.Runner$$anonfun$21.apply(Runner.scala:1469) > at > scala.collection.TraversableLike$$anonfun$filter$1.apply(TraversableLike.scala:264) > at scala.collection.immutable.List.foreach(List.scala:318) > ... > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-1840) SparkListenerBus prints out scary error message when terminating normally
[ https://issues.apache.org/jira/browse/SPARK-1840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-1840: - Description: This is because Scala's NonLocalReturnControl (which extends ControlThrowable) is being logged. However, this is expected when the SparkContext terminates. (OP is TD) was:This is because Scala's NonLocalReturnControl (which extends ControlThrowable) is being logged. However, this is expected when the SparkContext terminates. > SparkListenerBus prints out scary error message when terminating normally > - > > Key: SPARK-1840 > URL: https://issues.apache.org/jira/browse/SPARK-1840 > Project: Spark > Issue Type: Bug >Affects Versions: 1.0.0 >Reporter: Andrew Or > > This is because Scala's NonLocalReturnControl (which extends > ControlThrowable) is being logged. However, this is expected when the > SparkContext terminates. > (OP is TD) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (SPARK-1840) SparkListenerBus prints out scary error message when terminating normally
Andrew Or created SPARK-1840: Summary: SparkListenerBus prints out scary error message when terminating normally Key: SPARK-1840 URL: https://issues.apache.org/jira/browse/SPARK-1840 Project: Spark Issue Type: Bug Affects Versions: 1.0.0 Reporter: Andrew Or This is because Scala's NonLocalReturnControl (which extends ControlThrowable) is being logged. However, this is expected when the SparkContext terminates. -- This message was sent by Atlassian JIRA (v6.2#6252)
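A common remedy, sketched generically here rather than as the actual SparkListenerBus fix, is for catch-all error handling to rethrow ControlThrowable so that Scala's control-flow exceptions (such as NonLocalReturnControl, thrown by a non-local return from a closure) pass through without being logged as errors:

```scala
import scala.util.control.ControlThrowable

// Catch-all handler that records genuine errors but lets Scala's control-flow
// throwables propagate instead of logging a scary message for them.
object SafeRun {
  def run(body: => Unit)(onError: Throwable => Unit): Unit =
    try body
    catch {
      case ct: ControlThrowable => throw ct // control flow, not an error
      case t: Throwable         => onError(t)
    }
}

// A non-local return inside a closure compiles to a thrown
// NonLocalReturnControl; with the rethrow above it still completes normally.
def firstPositive(xs: Seq[Int]): Int = {
  SafeRun.run { xs.foreach(x => if (x > 0) return x) }(_ => ())
  -1
}
```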
[jira] [Updated] (SPARK-1786) Kryo Serialization Error in GraphX
[ https://issues.apache.org/jira/browse/SPARK-1786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph E. Gonzalez updated SPARK-1786: -- Description: The following code block will generate a serialization error when run in the spark-shell with Kryo enabled: {code} import org.apache.spark.storage._ import org.apache.spark.graphx._ import org.apache.spark.graphx.util._ val g = GraphGenerators.gridGraph(sc, 100, 100) val e = g.edges e.persist(StorageLevel.MEMORY_ONLY_SER) e.collect().foreach(println(_)) // <- Runs successfully the first time. // The following line will fail: e.collect().foreach(println(_)) {code} The following error is generated: {code} scala> e.collect().foreach(println(_)) 14/05/09 18:31:13 INFO SparkContext: Starting job: collect at EdgeRDD.scala:59 14/05/09 18:31:13 INFO DAGScheduler: Got job 1 (collect at EdgeRDD.scala:59) with 8 output partitions (allowLocal=false) 14/05/09 18:31:13 INFO DAGScheduler: Final stage: Stage 1(collect at EdgeRDD.scala:59) 14/05/09 18:31:13 INFO DAGScheduler: Parents of final stage: List() 14/05/09 18:31:13 INFO DAGScheduler: Missing parents: List() 14/05/09 18:31:13 INFO DAGScheduler: Submitting Stage 1 (MappedRDD[15] at map at EdgeRDD.scala:59), which has no missing parents 14/05/09 18:31:13 INFO DAGScheduler: Submitting 8 missing tasks from Stage 1 (MappedRDD[15] at map at EdgeRDD.scala:59) 14/05/09 18:31:13 INFO TaskSchedulerImpl: Adding task set 1.0 with 8 tasks 14/05/09 18:31:13 INFO TaskSetManager: Starting task 1.0:0 as TID 8 on executor localhost: localhost (PROCESS_LOCAL) 14/05/09 18:31:13 INFO TaskSetManager: Serialized task 1.0:0 as 1779 bytes in 3 ms 14/05/09 18:31:13 INFO TaskSetManager: Starting task 1.0:1 as TID 9 on executor localhost: localhost (PROCESS_LOCAL) 14/05/09 18:31:13 INFO TaskSetManager: Serialized task 1.0:1 as 1779 bytes in 4 ms 14/05/09 18:31:13 INFO TaskSetManager: Starting task 1.0:2 as TID 10 on executor localhost: localhost (PROCESS_LOCAL) 14/05/09 18:31:13 
INFO TaskSetManager: Serialized task 1.0:2 as 1779 bytes in 4 ms 14/05/09 18:31:13 INFO TaskSetManager: Starting task 1.0:3 as TID 11 on executor localhost: localhost (PROCESS_LOCAL) 14/05/09 18:31:13 INFO TaskSetManager: Serialized task 1.0:3 as 1779 bytes in 4 ms 14/05/09 18:31:13 INFO TaskSetManager: Starting task 1.0:4 as TID 12 on executor localhost: localhost (PROCESS_LOCAL) 14/05/09 18:31:13 INFO TaskSetManager: Serialized task 1.0:4 as 1779 bytes in 3 ms 14/05/09 18:31:13 INFO TaskSetManager: Starting task 1.0:5 as TID 13 on executor localhost: localhost (PROCESS_LOCAL) 14/05/09 18:31:13 INFO TaskSetManager: Serialized task 1.0:5 as 1782 bytes in 4 ms 14/05/09 18:31:13 INFO TaskSetManager: Starting task 1.0:6 as TID 14 on executor localhost: localhost (PROCESS_LOCAL) 14/05/09 18:31:13 INFO TaskSetManager: Serialized task 1.0:6 as 1783 bytes in 4 ms 14/05/09 18:31:13 INFO TaskSetManager: Starting task 1.0:7 as TID 15 on executor localhost: localhost (PROCESS_LOCAL) 14/05/09 18:31:13 INFO TaskSetManager: Serialized task 1.0:7 as 1783 bytes in 4 ms 14/05/09 18:31:13 INFO Executor: Running task ID 9 14/05/09 18:31:13 INFO Executor: Running task ID 8 14/05/09 18:31:13 INFO Executor: Running task ID 11 14/05/09 18:31:13 INFO Executor: Running task ID 14 14/05/09 18:31:13 INFO Executor: Running task ID 10 14/05/09 18:31:13 INFO Executor: Running task ID 13 14/05/09 18:31:13 INFO Executor: Running task ID 15 14/05/09 18:31:13 INFO Executor: Running task ID 12 14/05/09 18:31:13 INFO BlockManager: Found block rdd_12_6 locally 14/05/09 18:31:13 INFO BlockManager: Found block rdd_12_4 locally 14/05/09 18:31:13 INFO BlockManager: Found block rdd_12_2 locally 14/05/09 18:31:13 INFO BlockManager: Found block rdd_12_7 locally 14/05/09 18:31:13 INFO BlockManager: Found block rdd_12_1 locally 14/05/09 18:31:13 INFO BlockManager: Found block rdd_12_3 locally 14/05/09 18:31:13 INFO BlockManager: Found block rdd_12_0 locally 14/05/09 18:31:13 INFO BlockManager: Found block 
rdd_12_5 locally 14/05/09 18:31:13 ERROR Executor: Exception in task ID 13 java.lang.NullPointerException at org.apache.spark.graphx.impl.EdgePartition$$anon$1.next(EdgePartition.scala:269) at org.apache.spark.graphx.impl.EdgePartition$$anon$1.next(EdgePartition.scala:262) at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) at scala.collection.Iterator$class.foreach(Iterator.scala:727) at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48) at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103) at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47) at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273) at sca
[jira] [Created] (SPARK-1786) Kryo Serialization Error in GraphX
Joseph E. Gonzalez created SPARK-1786: - Summary: Kryo Serialization Error in GraphX Key: SPARK-1786 URL: https://issues.apache.org/jira/browse/SPARK-1786 Project: Spark Issue Type: Bug Components: GraphX Affects Versions: 1.0.0 Reporter: Joseph E. Gonzalez The following code block will generate a serialization error when run in the spark-shell with Kryo enabled: {code} import org.apache.spark.storage._ import org.apache.spark.graphx._ import org.apache.spark.graphx.util._ val g = GraphGenerators.gridGraph(sc, 100, 100) val e = g.edges e.persist(StorageLevel.MEMORY_ONLY_SER) e.collect().foreach(println(_)) e.collect().foreach(println(_)) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (SPARK-1620) Uncaught exception from Akka scheduler
[ https://issues.apache.org/jira/browse/SPARK-1620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-1620. Resolution: Fixed Fix Version/s: 1.0.0 Issue resolved by pull request 622 [https://github.com/apache/spark/pull/622] > Uncaught exception from Akka scheduler > -- > > Key: SPARK-1620 > URL: https://issues.apache.org/jira/browse/SPARK-1620 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 0.9.0, 1.0.0 >Reporter: Mark Hamstra >Assignee: Mark Hamstra >Priority: Blocker > Fix For: 1.0.0 > > > I've been looking at this one in the context of a BlockManagerMaster that > OOMs and doesn't respond to heartBeat(), but I suspect that there may be > problems elsewhere where we use Akka's scheduler. > The basic nature of the problem is that we are expecting exceptions thrown > from a scheduled function to be caught in the thread where > _ActorSystem_.scheduler.schedule() or scheduleOnce() has been called. In > fact, the scheduled function runs on its own thread, so any exceptions that > it throws are not caught in the thread that called schedule() -- e.g., > unanswered BlockManager heartBeats (scheduled in BlockManager#initialize) > that end up throwing exceptions in BlockManagerMaster#askDriverWithReply do > not cause those exceptions to be handled by the Executor thread's > UncaughtExceptionHandler. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (SPARK-1776) Have Spark's SBT build read dependencies from Maven
[ https://issues.apache.org/jira/browse/SPARK-1776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13993407#comment-13993407 ] Guoqiang Li edited comment on SPARK-1776 at 5/9/14 5:39 AM: Even so, there are many maintenance costs. We should not use two build tools at the same time; using only Maven is better. was (Author: gq): But I would think only use maven is better > Have Spark's SBT build read dependencies from Maven > --- > > Key: SPARK-1776 > URL: https://issues.apache.org/jira/browse/SPARK-1776 > Project: Spark > Issue Type: New Feature > Components: Build >Reporter: Patrick Wendell >Assignee: Prashant Sharma > Fix For: 1.1.0 > > > We've wanted to consolidate Spark's build for a while; see > [here|http://mail-archives.apache.org/mod_mbox/spark-dev/201307.mbox/%3c39343fa4-3cf4-4349-99e7-2b20e1aed...@gmail.com%3E] > and > [here|http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-Necessity-of-Maven-and-SBT-Build-in-Spark-td2315.html]. > I'd like to propose using the sbt-pom-reader plug-in to allow us to keep our > sbt build (for ease of development) while also holding onto our Maven build > which almost all downstream packagers use. > I've prototyped this a bit locally and I think it's do-able, but will require > making some contributions to the sbt-pom-reader plugin. Josh Suereth who > maintains both sbt and the plug-in has agreed to help merge any patches we > need for this. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (SPARK-1781) Generalized validity checking for configuration parameters
William Benton created SPARK-1781: - Summary: Generalized validity checking for configuration parameters Key: SPARK-1781 URL: https://issues.apache.org/jira/browse/SPARK-1781 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 1.0.0 Reporter: William Benton Priority: Minor Issues like SPARK-1779 could be handled easily by a general mechanism for specifying whether a configuration parameter value is valid (and then throwing an exception, or warning and switching to a default value, if it is not). I think it's possible to do this in a fairly lightweight fashion. -- This message was sent by Atlassian JIRA (v6.2#6252)
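The "general mechanism" proposed in SPARK-1781 could be sketched roughly as below. All names here (`ConfValidator`, `getValidated`) are hypothetical and do not exist in Spark; this is only an illustration of the lightweight approach the issue describes, with a made-up `spark.akka.frameSize` check as the example predicate.

```scala
// Hypothetical sketch of the lightweight validity-checking helper; these
// names are illustrative only and not part of Spark.
object ConfValidator {
  // Return the parsed value when it satisfies `valid`; otherwise warn and
  // fall back to `default`. Missing keys also yield the default.
  def getValidated[T](name: String, raw: Option[String], parse: String => T,
                      valid: T => Boolean, default: T): T =
    raw.map(parse) match {
      case Some(v) if valid(v) => v
      case Some(v) =>
        System.err.println(s"Warning: invalid value $v for $name; using $default")
        default
      case None => default
    }
}

// Issues like SPARK-1779 then reduce to one predicate per parameter:
val frameSize = ConfValidator.getValidated(
  "spark.akka.frameSize", Some("-1"), _.toInt, (n: Int) => n > 0, 10)
// frameSize falls back to 10, with a warning logged for the invalid -1
```

Each parameter would only need to register a parse function, a predicate, and a default, which keeps the mechanism as lightweight as the issue suggests.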
[jira] [Resolved] (SPARK-1775) Unneeded lock in ShuffleMapTask.deserializeInfo
[ https://issues.apache.org/jira/browse/SPARK-1775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-1775. Resolution: Fixed Fix Version/s: 1.0.0 Issue resolved by pull request 707 [https://github.com/apache/spark/pull/707] > Unneeded lock in ShuffleMapTask.deserializeInfo > --- > > Key: SPARK-1775 > URL: https://issues.apache.org/jira/browse/SPARK-1775 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 0.9.0, 1.0.0, 0.9.1 >Reporter: Matei Zaharia >Assignee: Sandeep Singh > Labels: Starter > Fix For: 1.0.0 > > > This was used in the past to have a cache of deserialized ShuffleMapTasks, > but that's been removed, so there's no need for a lock. It slows down Spark > when task descriptions are large, e.g. due to large lineage graphs or local > variables. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-1755) Spark-submit --name does not resolve to application name on YARN
[ https://issues.apache.org/jira/browse/SPARK-1755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13993108#comment-13993108 ] Thomas Graves commented on SPARK-1755: -- I believe this is a dup of SPARK-1664 "spark-submit --name doesn't work in yarn-client mode" > Spark-submit --name does not resolve to application name on YARN > > > Key: SPARK-1755 > URL: https://issues.apache.org/jira/browse/SPARK-1755 > Project: Spark > Issue Type: Bug >Affects Versions: 0.9.1 >Reporter: Andrew Or >Assignee: Andrew Or >Priority: Blocker > Fix For: 1.0.0 > > > In YARN client mode, --name is ignored because the deploy mode is client, and > the name is for some reason a [cluster > config|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L170]. > In YARN cluster mode, --name is passed to the > org.apache.spark.deploy.yarn.Client as a command line argument. The Client > class, however, uses this name only as the [app name for the > RM|https://github.com/apache/spark/blob/master/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/Client.scala#L80], > but not for Spark. In other words, when SparkConf attempts to load default > configs, application name is not set. > In both cases, passing --name to SparkSubmit does not actually cause Spark to > adopt it as its application name, despite what the usage promises. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-1767) Prefer HDFS-cached replicas when scheduling data-local tasks
[ https://issues.apache.org/jira/browse/SPARK-1767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13993909#comment-13993909 ] Sandy Ryza commented on SPARK-1767: --- Currently, RDDs only support a single level of location preference through RDD#preferredLocations(split), which returns a sequence of strings. To prefer cached replicas, this needs to be extended in some way. We could deprecate preferredLocations and add a preferredLocations(split, storageType), where storageType would be MEMORY, DISK, and eventually FLASH. More hackily, we could give the location strings a prefix like "inmem:" that specifies the storage type. > Prefer HDFS-cached replicas when scheduling data-local tasks > > > Key: SPARK-1767 > URL: https://issues.apache.org/jira/browse/SPARK-1767 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 1.0.0 >Reporter: Sandy Ryza > -- This message was sent by Atlassian JIRA (v6.2#6252)
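The "inmem:" prefix variant floated in the comment above could look roughly like this. The helper names (`CachedLocation`, `parseLocation`) are hypothetical and not part of Spark's RDD API; the point is only that a prefix can be stripped off so plain host matching keeps working while the scheduler gains a preference signal.

```scala
// Sketch of the "inmem:" prefix idea; hypothetical helpers, not Spark API.
case class CachedLocation(host: String, inMemory: Boolean)

// Strip the storage-type prefix so ordinary host matching still works.
def parseLocation(loc: String): CachedLocation =
  if (loc.startsWith("inmem:"))
    CachedLocation(loc.stripPrefix("inmem:"), inMemory = true)
  else
    CachedLocation(loc, inMemory = false)

// A scheduler could then rank in-memory replicas ahead of plain replicas:
val locs = Seq("inmem:host1", "host2", "inmem:host3").map(parseLocation)
val ordered = locs.sortBy(!_.inMemory).map(_.host)
// ordered == Seq("host1", "host3", "host2")
```

The sort is stable, so replicas of the same storage type keep their original relative order; only the in-memory ones are pulled to the front.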
[jira] [Updated] (SPARK-1631) App name set in SparkConf (not in JVM properties) not respected by Yarn backend
[ https://issues.apache.org/jira/browse/SPARK-1631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-1631: --- Priority: Blocker (was: Major) > App name set in SparkConf (not in JVM properties) not respected by Yarn > backend > --- > > Key: SPARK-1631 > URL: https://issues.apache.org/jira/browse/SPARK-1631 > Project: Spark > Issue Type: Bug > Components: YARN >Affects Versions: 1.0.0 >Reporter: Marcelo Vanzin >Assignee: Marcelo Vanzin >Priority: Blocker > Fix For: 1.0.0 > > > When you submit an application that sets its name using a SparkContext > constructor or SparkConf.setAppName(), the Yarn app name is not set and the > app shows up as "Spark" in the RM UI. > That's because YarnClientSchedulerBackend looks only at the system properties > for the app name, instead of at the app's config. > e.g., app initializes like this: > {code} > val sc = new SparkContext(new SparkConf().setAppName("Blah")); > {code} > Start app like this: > {noformat} > ./bin/spark-submit --master yarn --deploy-mode client blah blah blah > {noformat} > And the app name in the RM UI does not reflect the code. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-1433) Upgrade Mesos dependency to 0.17.0
[ https://issues.apache.org/jira/browse/SPARK-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13993896#comment-13993896 ] Timothy St. Clair commented on SPARK-1433: -- Likely want to aim higher at this point, perhaps 0.18.1 > Upgrade Mesos dependency to 0.17.0 > -- > > Key: SPARK-1433 > URL: https://issues.apache.org/jira/browse/SPARK-1433 > Project: Spark > Issue Type: Task >Reporter: Sandeep Singh >Assignee: Sandeep Singh >Priority: Minor > Fix For: 1.0.0 > > > Mesos 0.13.0 was released 6 months ago. > Upgrade Mesos dependency to 0.17.0 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-1775) Unneeded lock in ShuffleMapTask.deserializeInfo
[ https://issues.apache.org/jira/browse/SPARK-1775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-1775: - Fix Version/s: 0.9.2 > Unneeded lock in ShuffleMapTask.deserializeInfo > --- > > Key: SPARK-1775 > URL: https://issues.apache.org/jira/browse/SPARK-1775 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 0.9.0, 1.0.0, 0.9.1 >Reporter: Matei Zaharia >Assignee: Sandeep Singh >Priority: Critical > Labels: Starter > Fix For: 1.0.0, 0.9.2 > > > This was used in the past to have a cache of deserialized ShuffleMapTasks, > but that's been removed, so there's no need for a lock. It slows down Spark > when task descriptions are large, e.g. due to large lineage graphs or local > variables. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (SPARK-1780) Non-existent SPARK_DAEMON_OPTS is referred to in a few places
Andrew Or created SPARK-1780: Summary: Non-existent SPARK_DAEMON_OPTS is referred to in a few places Key: SPARK-1780 URL: https://issues.apache.org/jira/browse/SPARK-1780 Project: Spark Issue Type: Bug Affects Versions: 0.9.1 Reporter: Andrew Or Fix For: 1.0.0 SparkConf.scala and spark-env.sh refer to a non-existent SPARK_DAEMON_OPTS. What they really mean is SPARK_DAEMON_JAVA_OPTS. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (SPARK-1767) Prefer HDFS-cached replicas when scheduling data-local tasks
Sandy Ryza created SPARK-1767: - Summary: Prefer HDFS-cached replicas when scheduling data-local tasks Key: SPARK-1767 URL: https://issues.apache.org/jira/browse/SPARK-1767 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 1.0.0 Reporter: Sandy Ryza -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (SPARK-1757) Support saving null primitives with .saveAsParquetFile()
Andrew Ash created SPARK-1757: - Summary: Support saving null primitives with .saveAsParquetFile() Key: SPARK-1757 URL: https://issues.apache.org/jira/browse/SPARK-1757 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.0.0 Reporter: Andrew Ash See stack trace below: {noformat} 14/05/07 21:45:51 INFO analysis.Analyzer: Max iterations (2) reached for batch MultiInstanceRelations 14/05/07 21:45:51 INFO analysis.Analyzer: Max iterations (2) reached for batch CaseInsensitiveAttributeReferences 14/05/07 21:45:51 INFO optimizer.Optimizer$: Max iterations (2) reached for batch ConstantFolding 14/05/07 21:45:51 INFO optimizer.Optimizer$: Max iterations (2) reached for batch Filter Pushdown java.lang.RuntimeException: Unsupported datatype StructType(List()) at scala.sys.package$.error(package.scala:27) at org.apache.spark.sql.parquet.ParquetTypesConverter$.fromDataType(ParquetRelation.scala:201) at org.apache.spark.sql.parquet.ParquetTypesConverter$$anonfun$1.apply(ParquetRelation.scala:235) at org.apache.spark.sql.parquet.ParquetTypesConverter$$anonfun$1.apply(ParquetRelation.scala:235) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) at scala.collection.immutable.List.foreach(List.scala:318) at scala.collection.TraversableLike$class.map(TraversableLike.scala:244) at scala.collection.AbstractTraversable.map(Traversable.scala:105) at org.apache.spark.sql.parquet.ParquetTypesConverter$.convertFromAttributes(ParquetRelation.scala:234) at org.apache.spark.sql.parquet.ParquetTypesConverter$.writeMetaData(ParquetRelation.scala:267) at org.apache.spark.sql.parquet.ParquetRelation$.createEmpty(ParquetRelation.scala:143) at org.apache.spark.sql.parquet.ParquetRelation$.create(ParquetRelation.scala:122) at org.apache.spark.sql.execution.SparkStrategies$ParquetOperations$.apply(SparkStrategies.scala:139) at 
org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58) at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58) at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371) at org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59) at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:264) at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:264) at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:265) at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:265) at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:268) at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:268) at org.apache.spark.sql.SchemaRDDLike$class.saveAsParquetFile(SchemaRDDLike.scala:66) at org.apache.spark.sql.SchemaRDD.saveAsParquetFile(SchemaRDD.scala:96) {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (SPARK-1842) update scala-logging-slf4j to version 2.1.2
Guoqiang Li created SPARK-1842: -- Summary: update scala-logging-slf4j to version 2.1.2 Key: SPARK-1842 URL: https://issues.apache.org/jira/browse/SPARK-1842 Project: Spark Issue Type: Sub-task Reporter: Guoqiang Li scala-logging-slf4j 1.0.1 does not support Scala 2.11 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (SPARK-1433) Upgrade Mesos dependency to 0.17.0
[ https://issues.apache.org/jira/browse/SPARK-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-1433. Resolution: Duplicate This is subsumed by SPARK-1806. > Upgrade Mesos dependency to 0.17.0 > -- > > Key: SPARK-1433 > URL: https://issues.apache.org/jira/browse/SPARK-1433 > Project: Spark > Issue Type: Task >Reporter: Sandeep Singh >Assignee: Sandeep Singh >Priority: Minor > Fix For: 1.0.0 > > > Mesos 0.13.0 was released 6 months ago. > Upgrade Mesos dependency to 0.17.0 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-1436) Compression code broke in-memory store
[ https://issues.apache.org/jira/browse/SPARK-1436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-1436: --- Description: Try run the following code: {code} package org.apache.spark.sql import org.apache.spark.sql.test.TestSQLContext._ import org.apache.spark.sql.catalyst.util._ case class Data(a: Int, b: Long) object AggregationBenchmark { def main(args: Array[String]): Unit = { val rdd = sparkContext.parallelize(1 to 20).flatMap(_ => (1 to 50).map(i => Data(i % 100, i))) rdd.registerAsTable("data") cacheTable("data") (1 to 10).foreach { i => println(s"=== ITERATION $i ===") benchmark { println("SELECT COUNT() FROM data:" + sql("SELECT COUNT(*) FROM data").collect().head) } println("SELECT a, SUM(b) FROM data GROUP BY a") benchmark { sql("SELECT a, SUM(b) FROM data GROUP BY a").count() } println("SELECT SUM(b) FROM data") benchmark { sql("SELECT SUM(b) FROM data").count() } } } } {code} The following exception is thrown: {code} java.nio.BufferUnderflowException at java.nio.Buffer.nextGetIndex(Buffer.java:498) at java.nio.HeapByteBuffer.getInt(HeapByteBuffer.java:355) at org.apache.spark.sql.columnar.ColumnAccessor$.apply(ColumnAccessor.scala:103) at org.apache.spark.sql.columnar.InMemoryColumnarTableScan$$anonfun$execute$1$$anon$1$$anonfun$3.apply(InMemoryColumnarTableScan.scala:61) at org.apache.spark.sql.columnar.InMemoryColumnarTableScan$$anonfun$execute$1$$anon$1$$anonfun$3.apply(InMemoryColumnarTableScan.scala:61) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108) at scala.collection.TraversableLike$class.map(TraversableLike.scala:244) at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108) at 
org.apache.spark.sql.columnar.InMemoryColumnarTableScan$$anonfun$execute$1$$anon$1.(InMemoryColumnarTableScan.scala:61) at org.apache.spark.sql.columnar.InMemoryColumnarTableScan$$anonfun$execute$1.apply(InMemoryColumnarTableScan.scala:60) at org.apache.spark.sql.columnar.InMemoryColumnarTableScan$$anonfun$execute$1.apply(InMemoryColumnarTableScan.scala:56) at org.apache.spark.rdd.RDD$$anonfun$3.apply(RDD.scala:504) at org.apache.spark.rdd.RDD$$anonfun$3.apply(RDD.scala:504) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:229) at org.apache.spark.rdd.RDD.iterator(RDD.scala:220) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:229) at org.apache.spark.rdd.RDD.iterator(RDD.scala:220) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:229) at org.apache.spark.rdd.RDD.iterator(RDD.scala:220) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:161) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:102) at org.apache.spark.scheduler.Task.run(Task.scala:52) at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:211) at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:46) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:176) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) 14/04/07 12:07:38 WARN TaskSetManager: Lost TID 3 (task 4.0:0) 14/04/07 12:07:38 WARN TaskSetManager: Loss was due to java.nio.BufferUnderflowException java.nio.BufferUnderflowException at java.nio.Buffer.nextGetIndex(Buffer.java:498) at 
java.nio.HeapByteBuffer.getInt(HeapByteBuffer.java:355) at org.apache.spark.sql.columnar.ColumnAccessor$.apply(ColumnAccessor.scala:103) at org.apache.spark.sql.columnar.InMemoryColumnarTableScan$$anonfun$execute$1$$anon$1$$anonfun$3.apply(InMemoryColumnarTableScan.scala:61) at org.apache.spark.sql.columnar.InMemoryColumnarTableScan$$anonfun$execute$1$$anon$1$$anonfun$3.apply(InMemoryColumnarTableScan.scala:61) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) at
[jira] [Resolved] (SPARK-1778) Add 'limit' transformation to SchemaRDD.
[ https://issues.apache.org/jira/browse/SPARK-1778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-1778. Resolution: Fixed Fix Version/s: 1.0.0 Issue resolved by pull request 711 [https://github.com/apache/spark/pull/711] > Add 'limit' transformation to SchemaRDD. > > > Key: SPARK-1778 > URL: https://issues.apache.org/jira/browse/SPARK-1778 > Project: Spark > Issue Type: Improvement >Reporter: Takuya Ueshin >Assignee: Takuya Ueshin > Fix For: 1.0.0 > > > Add {{limit}} transformation to {{SchemaRDD}}. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-1779) Warning when spark.storage.memoryFraction is not between 0 and 1
[ https://issues.apache.org/jira/browse/SPARK-1779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13993901#comment-13993901 ] Erik Erlandson commented on SPARK-1779: --- I'll volunteer to take this, can somebody assign it to me? > Warning when spark.storage.memoryFraction is not between 0 and 1 > > > Key: SPARK-1779 > URL: https://issues.apache.org/jira/browse/SPARK-1779 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 0.9.0, 1.0.0 >Reporter: wangfei > Fix For: 1.1.0 > > > There should be a warning when memoryFraction is lower than 0 or greater than > 1 -- This message was sent by Atlassian JIRA (v6.2#6252)
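The specific check requested in SPARK-1779 is small; a minimal sketch (illustrative code, not Spark's, and assuming the then-documented 0.6 default for spark.storage.memoryFraction) might be:

```scala
// Illustrative sketch of the requested range check, not Spark's code:
// warn and fall back to the assumed 0.6 default when out of [0, 1].
def memoryFraction(raw: String, default: Double = 0.6): Double = {
  val f = raw.toDouble
  if (f >= 0.0 && f <= 1.0) f
  else {
    System.err.println(
      s"Warning: spark.storage.memoryFraction=$f is not in [0, 1]; using $default")
    default
  }
}
// memoryFraction("0.3") passes through; memoryFraction("1.5") warns and
// returns the default
```

As noted in SPARK-1781, a general validity-checking mechanism would subsume one-off checks like this.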
[jira] [Updated] (SPARK-1758) failing test org.apache.spark.JavaAPISuite.wholeTextFiles
[ https://issues.apache.org/jira/browse/SPARK-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nishkam Ravi updated SPARK-1758: Attachment: SPARK-1758.patch > failing test org.apache.spark.JavaAPISuite.wholeTextFiles > - > > Key: SPARK-1758 > URL: https://issues.apache.org/jira/browse/SPARK-1758 > Project: Spark > Issue Type: Bug > Components: Java API >Affects Versions: 1.0.0 >Reporter: Nishkam Ravi > Fix For: 1.0.0 > > Attachments: SPARK-1758.patch > > > Test org.apache.spark.JavaAPISuite.wholeTextFiles fails (during sbt/sbt test) > with the following error message: > Test org.apache.spark.JavaAPISuite.wholeTextFiles failed: > java.lang.AssertionError: expected: but was: -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-1696) RowMatrix.dspr is not using parameter alpha for DenseVector
[ https://issues.apache.org/jira/browse/SPARK-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998127#comment-13998127 ] Xiangrui Meng commented on SPARK-1696: -- Thanks! I sent a PR: https://github.com/apache/spark/pull/778 > RowMatrix.dspr is not using parameter alpha for DenseVector > --- > > Key: SPARK-1696 > URL: https://issues.apache.org/jira/browse/SPARK-1696 > Project: Spark > Issue Type: Bug > Components: MLlib >Reporter: Anish Patel >Assignee: Xiangrui Meng >Priority: Minor > > In the master branch, method dspr of RowMatrix takes parameter alpha, but > does not use it when given a DenseVector. > This probably slid by because when method computeGramianMatrix calls dspr, it > provides an alpha value of 1.0. -- This message was sent by Atlassian JIRA (v6.2#6252)
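For context, the BLAS spr operation computes A := alpha * x * x^T + A on a triangular matrix, so dropping alpha silently computes A := x * x^T + A instead; that is invisible whenever the caller passes alpha = 1.0, as computeGramianMatrix does. The sketch below uses a plain unpacked matrix rather than MLlib's packed upper-triangular layout, purely to show where alpha must appear.

```scala
// Unpacked sketch of spr (A := alpha * x * x^T + A), illustrating the role
// of alpha; MLlib's actual dspr works on a packed triangular array.
def spr(alpha: Double, x: Array[Double], a: Array[Array[Double]]): Unit = {
  for (i <- x.indices; j <- i until x.length)  // upper triangle only
    a(i)(j) += alpha * x(i) * x(j)             // alpha scales every term
}

val a = Array.fill(2, 2)(0.0)
spr(0.5, Array(1.0, 2.0), a)
// a(0)(1) is now 0.5 * 1.0 * 2.0 = 1.0; with the bug it would be 2.0
```

With alpha = 1.0 the buggy and correct versions coincide, which is presumably why the issue went unnoticed.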
[jira] [Updated] (SPARK-1635) Java API docs do not show annotation.
[ https://issues.apache.org/jira/browse/SPARK-1635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-1635: - Priority: Minor (was: Major) > Java API docs do not show annotation. > - > > Key: SPARK-1635 > URL: https://issues.apache.org/jira/browse/SPARK-1635 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 1.0.0 >Reporter: Xiangrui Meng >Priority: Minor > > The generated Java API docs do not contain Developer/Experimental > annotations. The :: Developer/Experimental :: tag is in the generated doc. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (SPARK-1788) Upgrade Parquet to 1.4.3
[ https://issues.apache.org/jira/browse/SPARK-1788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-1788. - Resolution: Fixed > Upgrade Parquet to 1.4.3 > > > Key: SPARK-1788 > URL: https://issues.apache.org/jira/browse/SPARK-1788 > Project: Spark > Issue Type: Dependency upgrade > Components: SQL >Reporter: Michael Armbrust >Assignee: Michael Armbrust > > https://github.com/apache/spark/pull/684 -- This message was sent by Atlassian JIRA (v6.2#6252)