[jira] [Comment Edited] (SPARK-3838) Python code example for Word2Vec in user guide
[ https://issues.apache.org/jira/browse/SPARK-3838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14168989#comment-14168989 ] Anant Daksh Asthana edited comment on SPARK-3838 at 10/13/14 6:22 AM: -- Thanks [~mengxr] I will follow the instructions. I did also mention the coding guides are centered around Java/Scala. was (Author: slcclimber): Thanks [~mengxr] I will follow the instructions. I did also mention the coding guides are centered around Java/Scala. It would be nice to create one for PySpark which closely follows PEP-8. Python code example for Word2Vec in user guide -- Key: SPARK-3838 URL: https://issues.apache.org/jira/browse/SPARK-3838 Project: Spark Issue Type: Sub-task Components: Documentation, MLlib Reporter: Xiangrui Meng Assignee: Anant Daksh Asthana Priority: Trivial -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3334) Spark causes mesos-master memory leak
[ https://issues.apache.org/jira/browse/SPARK-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14168999#comment-14168999 ] Iven Hsu commented on SPARK-3334: - With Spark 1.1.0, {{akkaFrameSize}} is read from configuration, the same as in the other backends. However, its minimum value is 32000 and it can't be set to 0, so it will still cause mesos-master to leak memory. Could anyone look into this? Spark causes mesos-master memory leak - Key: SPARK-3334 URL: https://issues.apache.org/jira/browse/SPARK-3334 Project: Spark Issue Type: Bug Components: Mesos Affects Versions: 1.0.2 Environment: Mesos 0.16.0/0.19.0 CentOS 6.4 Reporter: Iven Hsu The {{akkaFrameSize}} is set to {{Long.MaxValue}} in MesosBackend to work around SPARK-1112; this causes all serialized task results to be sent using Mesos TaskStatus. mesos-master stores TaskStatus in memory, and when running Spark its memory grows very fast and it is eventually OOM killed. See MESOS-1746 for more. I've tried setting {{akkaFrameSize}} to 0; mesos-master then isn't killed, but the driver blocks after success unless I use {{sc.stop()}} to quit it manually. Not sure if it's related to SPARK-1112. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
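To make the floor behavior described in the comment concrete, here is a minimal hypothetical Scala sketch (not Spark's actual code; the object and method names are made up): if the configured frame size is clamped to a hard minimum when it is read, a setting of 0 can never take effect.
{code}
// Hypothetical sketch, not Spark's actual code: illustrates why a configured
// frame size of 0 cannot take effect when a hard floor is applied on read.
object FrameSizeSketch {
  val minFrameSizeBytes = 32000  // assumed floor, per the comment above

  def effectiveFrameSizeBytes(requestedBytes: Int): Int =
    // a requested value of 0 is silently raised to the floor, so setting the
    // frame size to 0 cannot disable sending task results over Akka
    math.max(requestedBytes, minFrameSizeBytes)
}

// FrameSizeSketch.effectiveFrameSizeBytes(0) == 32000, not 0
{code}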
[jira] [Resolved] (SPARK-3899) wrong links in streaming doc
[ https://issues.apache.org/jira/browse/SPARK-3899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen resolved SPARK-3899. --- Resolution: Fixed Fix Version/s: 1.1.1 1.2.0 Issue resolved by pull request 2749 [https://github.com/apache/spark/pull/2749] wrong links in streaming doc Key: SPARK-3899 URL: https://issues.apache.org/jira/browse/SPARK-3899 Project: Spark Issue Type: Bug Components: Documentation Affects Versions: 1.1.0 Reporter: wangfei Fix For: 1.2.0, 1.1.1 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-3921) WorkerWatcher in Standalone mode fail to come up due to invalid workerUrl
[ https://issues.apache.org/jira/browse/SPARK-3921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Davidson updated SPARK-3921: -- Description: As of [this commit|https://github.com/apache/spark/commit/79e45c9323455a51f25ed9acd0edd8682b4bbb88#diff-79391110e9f26657e415aa169a004998R153], standalone mode appears to have lost its WorkerWatcher, because of the swapped workerUrl and appId parameters. We still put workerUrl before appId when we start standalone executors, and the Executor misinterprets the appId as the workerUrl and fails to create the WorkerWatcher. Note that this does not seem to crash the Standalone executor mode, despite the failing of the WorkerWatcher during its constructor. was:As of [this commit|https://github.com/apache/spark/commit/79e45c9323455a51f25ed9acd0edd8682b4bbb88#diff-79391110e9f26657e415aa169a004998R153], standalone mode appears to be broken, because of the swapped workerUrl and appId parameters. We still put workerUrl before appId when we start standalone executors, and the Executor misinterprets the appId as the workerUrl and fails to create the WorkerWatcher. Summary: WorkerWatcher in Standalone mode fail to come up due to invalid workerUrl (was: Executors in Standalone mode fail to come up due to invalid workerUrl) WorkerWatcher in Standalone mode fail to come up due to invalid workerUrl - Key: SPARK-3921 URL: https://issues.apache.org/jira/browse/SPARK-3921 Project: Spark Issue Type: Bug Reporter: Aaron Davidson Assignee: Aaron Davidson Priority: Critical As of [this commit|https://github.com/apache/spark/commit/79e45c9323455a51f25ed9acd0edd8682b4bbb88#diff-79391110e9f26657e415aa169a004998R153], standalone mode appears to have lost its WorkerWatcher, because of the swapped workerUrl and appId parameters. We still put workerUrl before appId when we start standalone executors, and the Executor misinterprets the appId as the workerUrl and fails to create the WorkerWatcher. Note that this does not seem to crash the Standalone executor mode, despite the failing of the WorkerWatcher during its constructor. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
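As a rough illustration of the swapped-parameter failure mode described above, here is a small hypothetical Scala sketch (the class and argument names are illustrative, not Spark's actual ones): the launcher places workerUrl before appId, but the receiving side binds appId first, so the field meant to hold the worker URL ends up holding the app ID and the watcher cannot connect.
{code}
// Hypothetical sketch of the argument-order bug; names are illustrative only.
object ArgOrderSketch {
  // launcher side: workerUrl is placed before appId on the command line
  def buildArgs(workerUrl: String, appId: String): Seq[String] = Seq(workerUrl, appId)

  // executor side: expects appId first, so the two values are swapped
  case class ExecutorArgs(appId: String, workerUrl: String)
  def parse(args: Seq[String]): ExecutorArgs = ExecutorArgs(args(0), args(1))

  def main(args: Array[String]): Unit = {
    val parsed = parse(buildArgs(
      "akka.tcp://sparkWorker@host:38922/user/Worker", "app-20141013064356"))
    // parsed.workerUrl now holds "app-20141013064356", which is not a valid
    // actor URL, so a WorkerWatcher constructed from it fails.
    println(parsed)
  }
}
{code}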
[jira] [Updated] (SPARK-3921) WorkerWatcher in Standalone mode fail to come up due to invalid workerUrl
[ https://issues.apache.org/jira/browse/SPARK-3921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-3921: - Target Version/s: 1.2.0 Affects Version/s: 1.2.0 WorkerWatcher in Standalone mode fail to come up due to invalid workerUrl - Key: SPARK-3921 URL: https://issues.apache.org/jira/browse/SPARK-3921 Project: Spark Issue Type: Bug Affects Versions: 1.2.0 Reporter: Aaron Davidson Assignee: Aaron Davidson Priority: Critical As of [this commit|https://github.com/apache/spark/commit/79e45c9323455a51f25ed9acd0edd8682b4bbb88#diff-79391110e9f26657e415aa169a004998R153], standalone mode appears to have lost its WorkerWatcher, because of the swapped workerUrl and appId parameters. We still put workerUrl before appId when we start standalone executors, and the Executor misinterprets the appId as the workerUrl and fails to create the WorkerWatcher. Note that this does not seem to crash the Standalone executor mode, despite the failing of the WorkerWatcher during its constructor. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3905) The keys for sorting the columns of Executor page ,Stage page Storage page are incorrect
[ https://issues.apache.org/jira/browse/SPARK-3905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169043#comment-14169043 ] Apache Spark commented on SPARK-3905: - User 'witgo' has created a pull request for this issue: https://github.com/apache/spark/pull/2763 The keys for sorting the columns of Executor page ,Stage page Storage page are incorrect - Key: SPARK-3905 URL: https://issues.apache.org/jira/browse/SPARK-3905 Project: Spark Issue Type: Bug Components: Web UI Affects Versions: 1.0.2, 1.1.0, 1.2.0 Reporter: Guoqiang Li Assignee: Guoqiang Li Fix For: 1.1.1, 1.2.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3923) All Standalone Mode services time out with each other
[ https://issues.apache.org/jira/browse/SPARK-3923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169044#comment-14169044 ] Aaron Davidson commented on SPARK-3923: --- I did a little digging hoping to find some post about this, no particular luck. I did find [this post|https://groups.google.com/forum/#!topic/akka-user/X3xzpTCbEFs] which recommends using an interval time pause, which we are not doing. This doesn't seem to explain the services all timing out after the heartbeat interval time (which is currently 1000 seconds), but may be good to know in the future. All Standalone Mode services time out with each other - Key: SPARK-3923 URL: https://issues.apache.org/jira/browse/SPARK-3923 Project: Spark Issue Type: Bug Components: Deploy Affects Versions: 1.2.0 Reporter: Aaron Davidson Priority: Blocker I'm seeing an issue where it seems that components in Standalone Mode (Worker, Master, Driver, and Executor) all seem to time out with each other after around 1000 seconds. Here is an example log: {code} 14/10/13 06:43:55 INFO Master: Registering worker ip-10-0-147-189.us-west-2.compute.internal:38922 with 4 cores, 29.0 GB RAM 14/10/13 06:43:55 INFO Master: Registering worker ip-10-0-175-214.us-west-2.compute.internal:42918 with 4 cores, 59.0 GB RAM 14/10/13 06:43:56 INFO Master: Registering app Databricks Shell 14/10/13 06:43:56 INFO Master: Registered app Databricks Shell with ID app-20141013064356- ... precisely 1000 seconds later ... 14/10/13 07:00:35 WARN ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkwor...@ip-10-0-147-189.us-west-2.compute.internal:38922] has failed, address is now gated for [5000] ms. Reason is: [Disassociated]. 14/10/13 07:00:35 INFO Master: akka.tcp://sparkwor...@ip-10-0-147-189.us-west-2.compute.internal:38922 got disassociated, removing it. 14/10/13 07:00:35 INFO LocalActorRef: Message [akka.remote.transport.ActorTransportAdapter$DisassociateUnderlying] from Actor[akka://sparkMaster/deadLetters] to Actor[akka://sparkMaster/system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%2FsparkMaster%4010.0.147.189%3A54956-1#1529980245] was not delivered. [2] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'. 14/10/13 07:00:35 INFO Master: akka.tcp://sparkwor...@ip-10-0-175-214.us-west-2.compute.internal:42918 got disassociated, removing it. 14/10/13 07:00:35 INFO Master: Removing worker worker-20141013064354-ip-10-0-175-214.us-west-2.compute.internal-42918 on ip-10-0-175-214.us-west-2.compute.internal:42918 14/10/13 07:00:35 INFO Master: Telling app of lost executor: 1 14/10/13 07:00:35 INFO Master: akka.tcp://sparkwor...@ip-10-0-175-214.us-west-2.compute.internal:42918 got disassociated, removing it. 14/10/13 07:00:35 WARN ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkwor...@ip-10-0-175-214.us-west-2.compute.internal:42918] has failed, address is now gated for [5000] ms. Reason is: [Disassociated]. 14/10/13 07:00:35 INFO LocalActorRef: Message [akka.remote.transport.ActorTransportAdapter$DisassociateUnderlying] from Actor[akka://sparkMaster/deadLetters] to Actor[akka://sparkMaster/system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%2FsparkMaster%4010.0.175.214%3A35958-2#314633324] was not delivered. [3] dead letters encountered. 
This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'. 14/10/13 07:00:35 INFO LocalActorRef: Message [akka.remote.transport.AssociationHandle$Disassociated] from Actor[akka://sparkMaster/deadLetters] to Actor[akka://sparkMaster/system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%2FsparkMaster%4010.0.175.214%3A35958-2#314633324] was not delivered. [4] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'. 14/10/13 07:00:36 INFO ProtocolStateActor: No response from remote. Handshake timed out or transport failure detector triggered. 14/10/13 07:00:36 INFO Master: akka.tcp://sparkdri...@ip-10-0-175-215.us-west-2.compute.internal:58259 got disassociated, removing it. 14/10/13 07:00:36 INFO LocalActorRef: Message [akka.remote.transport.AssociationHandle$InboundPayload] from Actor[akka://sparkMaster/deadLetters] to Actor[akka://sparkMaster/system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%2FsparkMaster%4010.0.175.215%3A41987-3#1944377249] was not delivered. [5] dead letters encountered. This logging can
[jira] [Reopened] (SPARK-3598) cast to timestamp should be the same as hive
[ https://issues.apache.org/jira/browse/SPARK-3598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrian Wang reopened SPARK-3598: reopen to change assignee... cast to timestamp should be the same as hive Key: SPARK-3598 URL: https://issues.apache.org/jira/browse/SPARK-3598 Project: Spark Issue Type: Bug Components: SQL Reporter: Adrian Wang Fix For: 1.2.0 select cast(1000 as timestamp) from src limit 1; should return 1970-01-01 00:00:01 also, the current implementation has a bug when the time is before 1970-01-01 00:00:00 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
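For reference, a minimal Scala sketch of the Hive-compatible semantics the issue asks for (an illustration only, not Spark SQL's actual cast implementation): an integral value cast to timestamp is interpreted as seconds since the Unix epoch, and negative values (times before 1970-01-01 00:00:00) must also be handled.
{code}
// Illustrative sketch only, assuming seconds-since-epoch semantics as in Hive.
import java.sql.Timestamp

object CastSketch {
  def intToTimestamp(seconds: Long): Timestamp = new Timestamp(seconds * 1000L)

  // intToTimestamp(1000)  -> 1970-01-01 00:00:01 UTC, as the issue expects
  // intToTimestamp(-1000) -> 1969-12-31 23:43:20 UTC, i.e. pre-epoch values work too
}
{code}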
[jira] [Resolved] (SPARK-3598) cast to timestamp should be the same as hive
[ https://issues.apache.org/jira/browse/SPARK-3598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrian Wang resolved SPARK-3598. Resolution: Fixed cast to timestamp should be the same as hive Key: SPARK-3598 URL: https://issues.apache.org/jira/browse/SPARK-3598 Project: Spark Issue Type: Bug Components: SQL Reporter: Adrian Wang Assignee: Adrian Wang Fix For: 1.2.0 select cast(1000 as timestamp) from src limit 1; should return 1970-01-01 00:00:01 also, the current implementation has a bug when the time is before 1970-01-01 00:00:00 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-3222) cross join support in HiveQl
[ https://issues.apache.org/jira/browse/SPARK-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrian Wang resolved SPARK-3222. Resolution: Fixed cross join support in HiveQl Key: SPARK-3222 URL: https://issues.apache.org/jira/browse/SPARK-3222 Project: Spark Issue Type: New Feature Components: SQL Reporter: Adrian Wang Assignee: Adrian Wang Fix For: 1.1.0 Spark SQL hiveQl should support cross join. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-3222) cross join support in HiveQl
[ https://issues.apache.org/jira/browse/SPARK-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrian Wang reopened SPARK-3222: reopen to change assignee to myself cross join support in HiveQl Key: SPARK-3222 URL: https://issues.apache.org/jira/browse/SPARK-3222 Project: Spark Issue Type: New Feature Components: SQL Reporter: Adrian Wang Fix For: 1.1.0 Spark SQL hiveQl should support cross join. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-3924) Upgrade to Akka version 2.3.6
Helena Edelson created SPARK-3924: - Summary: Upgrade to Akka version 2.3.6 Key: SPARK-3924 URL: https://issues.apache.org/jira/browse/SPARK-3924 Project: Spark Issue Type: Dependency upgrade Environment: deploy env Reporter: Helena Edelson I tried every sbt in the book but can't use the latest Akka version in my project with Spark. It would be great if I could. Also I can not use the latest Typesafe Config - 1.2.1, which would also be great. This is a big change. If I have time I can do a PR. [~helena_e] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
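To make the dependency conflict concrete, here is a hypothetical build.sbt fragment (the version numbers come from the description; the build layout itself is an assumption about a typical user project, not an official recommendation): an application that wants Akka 2.3.6 and Typesafe Config 1.2.1 clashes with the Akka 2.2.3 line that Spark 1.1.x pulls in, because the two Akka lines are not binary compatible.
{code}
// Hypothetical build.sbt fragment illustrating the conflict described above.
libraryDependencies ++= Seq(
  "org.apache.spark"  %% "spark-core" % "1.1.0" % "provided",
  // the application's own Akka and Config versions...
  "com.typesafe.akka" %% "akka-actor" % "2.3.6",
  "com.typesafe"      %  "config"     % "1.2.1"
  // ...collide with the Akka 2.2.3 (and its older Config) that spark-core depends on
)
{code}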
[jira] [Comment Edited] (SPARK-2593) Add ability to pass an existing Akka ActorSystem into Spark
[ https://issues.apache.org/jira/browse/SPARK-2593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169208#comment-14169208 ] Helena Edelson edited comment on SPARK-2593 at 10/13/14 11:55 AM: -- [~matei], [~pwendell] Yes I see the pain point here now. I just created a ticket to upgrade Akka and thus Typesafe Config versions because I am now locked into 2.2.3 and have binary incompatibility with using latest Akka 2.3.6 / config 1.2.1. Makes me very sad. I think I would throw in the towel on this one if you can make it completely separate so that a user with it's own AkkaSystem and Config versions are not affected? Tricky because when deploying, spark needs its version (provided?) and the user app needs the other. was (Author: helena_e): [~matei] [~pwendell] Yes I see the pain point here now. I just created a ticket to upgrade Akka and thus Typesafe Config versions because I am now locked into 2.2.3 and have binary incompatibility with using latest Akka 2.3.6 / config 1.2.1. Makes me very sad. I think I would throw in the towel on this one if you can make it completely separate so that a user with it's own AkkaSystem and Config versions are not affected? Tricky because when deploying, spark needs its version (provided?) and the user app needs the other. Add ability to pass an existing Akka ActorSystem into Spark --- Key: SPARK-2593 URL: https://issues.apache.org/jira/browse/SPARK-2593 Project: Spark Issue Type: Improvement Components: Spark Core Reporter: Helena Edelson As a developer I want to pass an existing ActorSystem into StreamingContext in load-time so that I do not have 2 actor systems running on a node in an Akka application. This would mean having spark's actor system on its own named-dispatchers as well as exposing the new private creation of its own actor system. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-2593) Add ability to pass an existing Akka ActorSystem into Spark
[ https://issues.apache.org/jira/browse/SPARK-2593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169208#comment-14169208 ] Helena Edelson commented on SPARK-2593: --- [~matei] [~pwendell] Yes I see the pain point here now. I just created a ticket to upgrade Akka and thus Typesafe Config versions because I am now locked into 2.2.3 and have binary incompatibility with using latest Akka 2.3.6 / config 1.2.1. Makes me very sad. I think I would throw in the towel on this one if you can make it completely separate so that a user with it's own AkkaSystem and Config versions are not affected? Tricky because when deploying, spark needs its version (provided?) and the user app needs the other. Add ability to pass an existing Akka ActorSystem into Spark --- Key: SPARK-2593 URL: https://issues.apache.org/jira/browse/SPARK-2593 Project: Spark Issue Type: Improvement Components: Spark Core Reporter: Helena Edelson As a developer I want to pass an existing ActorSystem into StreamingContext in load-time so that I do not have 2 actor systems running on a node in an Akka application. This would mean having spark's actor system on its own named-dispatchers as well as exposing the new private creation of its own actor system. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-1138) Spark 0.9.0 does not work with Hadoop / HDFS
[ https://issues.apache.org/jira/browse/SPARK-1138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169254#comment-14169254 ] Sunil Prabhakara commented on SPARK-1138: - I am using Cloudera Version 4.2.1, Spark 1.1.0 and Scala 2.10.4; Observing similar error ERROR Remoting: Remoting error: [Startup failed] [ akka.remote.RemoteTransportException: Startup failed at akka.remote.Remoting.akka$remote$Remoting$$notifyError(Remoting.scala:129) at akka.remote.Remoting.start(Remoting.scala:194) at akka.remote.RemoteActorRefProvider.init(RemoteActorRefProvider.scala:184) at akka.actor.ActorSystemImpl._start$lzycompute(ActorSystem.scala:579) at akka.actor.ActorSystemImpl._start(ActorSystem.scala:577) at akka.actor.ActorSystemImpl.start(ActorSystem.scala:588) at akka.actor.ActorSystem$.apply(ActorSystem.scala:111) at akka.actor.ActorSystem$.apply(ActorSystem.scala:104) at org.apache.spark.util.AkkaUtils$.org$apache$spark$util$AkkaUtils$$doCreateActorSystem(AkkaUtils.scala:121) at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:54) at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:53) at org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:1446) ... along with Exception in thread main org.jboss.netty.channel.ChannelException: Failed to bind to: my-host-name/10.65.42.145:0 at org.jboss.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:272) at akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply(NettyTransport.scala:391) at akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply(NettyTransport.scala:388) at scala.util.Success$$anonfun$map$1.apply(Try.scala:206) at scala.util.Try$.apply(Try.scala:161) at scala.util.Success.map(Try.scala:206) at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:235) at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:235) at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32) ... For the second error I tried to update the /etc/hosts file with IP address of my host name and updated the spark-env.sh files with same IP address as suggested in other answer but still struck with the above issues. Spark 0.9.0 does not work with Hadoop / HDFS Key: SPARK-1138 URL: https://issues.apache.org/jira/browse/SPARK-1138 Project: Spark Issue Type: Bug Reporter: Sam Abeyratne UPDATE: This problem is certainly related to trying to use Spark 0.9.0 and the latest cloudera Hadoop / HDFS in the same jar. It seems no matter how I fiddle with the deps, the do not play nice together. I'm getting a java.util.concurrent.TimeoutException when trying to create a spark context with 0.9. I cannot, whatever I do, change the timeout. I've tried using System.setProperty, the SparkConf mechanism of creating a SparkContext and the -D flags when executing my jar. I seem to be able to run simple jobs from the spark-shell OK, but my more complicated jobs require external libraries so I need to build jars and execute them. Some code that causes this: println(Creating config) val conf = new SparkConf() .setMaster(clusterMaster) .setAppName(MyApp) .setSparkHome(sparkHome) .set(spark.akka.askTimeout, parsed.getOrElse(timeouts, 100)) .set(spark.akka.timeout, parsed.getOrElse(timeouts, 100)) println(Creating sc) implicit val sc = new SparkContext(conf) The output: Creating config Creating sc log4j:WARN No appenders could be found for logger (akka.event.slf4j.Slf4jLogger). log4j:WARN Please initialize the log4j system properly. 
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info. [ERROR] [02/26/2014 11:05:25.491] [main] [Remoting] Remoting error: [Startup timed out] [ akka.remote.RemoteTransportException: Startup timed out at akka.remote.Remoting.akka$remote$Remoting$$notifyError(Remoting.scala:129) at akka.remote.Remoting.start(Remoting.scala:191) at akka.remote.RemoteActorRefProvider.init(RemoteActorRefProvider.scala:184) at akka.actor.ActorSystemImpl._start$lzycompute(ActorSystem.scala:579) at akka.actor.ActorSystemImpl._start(ActorSystem.scala:577) at akka.actor.ActorSystemImpl.start(ActorSystem.scala:588) at akka.actor.ActorSystem$.apply(ActorSystem.scala:111) at akka.actor.ActorSystem$.apply(ActorSystem.scala:104) at org.apache.spark.util.AkkaUtils$.createActorSystem(AkkaUtils.scala:96) at org.apache.spark.SparkEnv$.create(SparkEnv.scala:126) at org.apache.spark.SparkContext.init(SparkContext.scala:139) at
[jira] [Comment Edited] (SPARK-1138) Spark 0.9.0 does not work with Hadoop / HDFS
[ https://issues.apache.org/jira/browse/SPARK-1138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169254#comment-14169254 ] Sunil Prabhakara edited comment on SPARK-1138 at 10/13/14 1:01 PM: --- I am using Cloudera Version 4.2.1, Spark 1.1.0 and Scala 2.10.4; Observing similar error ERROR Remoting: Remoting error: [Startup failed] [ akka.remote.RemoteTransportException: Startup failed at akka.remote.Remoting.akka$remote$Remoting$$notifyError(Remoting.scala:129) at akka.remote.Remoting.start(Remoting.scala:194) at akka.remote.RemoteActorRefProvider.init(RemoteActorRefProvider.scala:184) at akka.actor.ActorSystemImpl._start$lzycompute(ActorSystem.scala:579) at akka.actor.ActorSystemImpl._start(ActorSystem.scala:577) at akka.actor.ActorSystemImpl.start(ActorSystem.scala:588) at akka.actor.ActorSystem$.apply(ActorSystem.scala:111) at akka.actor.ActorSystem$.apply(ActorSystem.scala:104) at org.apache.spark.util.AkkaUtils$.org$apache$spark$util$AkkaUtils$$doCreateActorSystem(AkkaUtils.scala:121) at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:54) at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:53) at org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:1446) ... along with Exception in thread main org.jboss.netty.channel.ChannelException: Failed to bind to: my-host-name/10.65.42.145:0 at org.jboss.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:272) at akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply(NettyTransport.scala:391) at akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply(NettyTransport.scala:388) at scala.util.Success$$anonfun$map$1.apply(Try.scala:206) at scala.util.Try$.apply(Try.scala:161) at scala.util.Success.map(Try.scala:206) at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:235) at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:235) at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32) ... For the second error I tried to update the /etc/hosts file with IP address of my host name and updated the spark-env.sh files with same IP address as suggested in other answer but still struck with the above issues. I tried adding Netty 3.6.6 to the dependency but still didn't get resolved. was (Author: sunil.prabhak...@gmail.com): I am using Cloudera Version 4.2.1, Spark 1.1.0 and Scala 2.10.4; Observing similar error ERROR Remoting: Remoting error: [Startup failed] [ akka.remote.RemoteTransportException: Startup failed at akka.remote.Remoting.akka$remote$Remoting$$notifyError(Remoting.scala:129) at akka.remote.Remoting.start(Remoting.scala:194) at akka.remote.RemoteActorRefProvider.init(RemoteActorRefProvider.scala:184) at akka.actor.ActorSystemImpl._start$lzycompute(ActorSystem.scala:579) at akka.actor.ActorSystemImpl._start(ActorSystem.scala:577) at akka.actor.ActorSystemImpl.start(ActorSystem.scala:588) at akka.actor.ActorSystem$.apply(ActorSystem.scala:111) at akka.actor.ActorSystem$.apply(ActorSystem.scala:104) at org.apache.spark.util.AkkaUtils$.org$apache$spark$util$AkkaUtils$$doCreateActorSystem(AkkaUtils.scala:121) at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:54) at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:53) at org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:1446) ... 
along with Exception in thread main org.jboss.netty.channel.ChannelException: Failed to bind to: my-host-name/10.65.42.145:0 at org.jboss.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:272) at akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply(NettyTransport.scala:391) at akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply(NettyTransport.scala:388) at scala.util.Success$$anonfun$map$1.apply(Try.scala:206) at scala.util.Try$.apply(Try.scala:161) at scala.util.Success.map(Try.scala:206) at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:235) at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:235) at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32) ... For the second error I tried to update the /etc/hosts file with IP address of my host name and updated the spark-env.sh files with same IP address as suggested in other answer but still struck with the above issues. Spark 0.9.0 does not work with Hadoop / HDFS Key: SPARK-1138 URL: https://issues.apache.org/jira/browse/SPARK-1138
[jira] [Commented] (SPARK-3586) Support nested directories in Spark Streaming
[ https://issues.apache.org/jira/browse/SPARK-3586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169276#comment-14169276 ] Apache Spark commented on SPARK-3586: - User 'wangxiaojing' has created a pull request for this issue: https://github.com/apache/spark/pull/2765 Support nested directories in Spark Streaming - Key: SPARK-3586 URL: https://issues.apache.org/jira/browse/SPARK-3586 Project: Spark Issue Type: Improvement Components: Streaming Affects Versions: 1.1.0 Reporter: wangxj Priority: Minor Labels: patch Fix For: 1.1.0 For text files, the method streamingContext.textFileStream(dataDirectory) makes Spark Streaming monitor the directory dataDirectory and process any files created directly in that directory, but files written into nested directories are not supported, e.g. streamingContext.textFileStream(/test). Look at the directory contents: /test/file1 /test/file2 /test/dr/file1 With this method textFileStream can only read /test/file1, /test/file2 and the directory /test/dr; the file /test/dr/file1 is not picked up. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
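Until nested-directory support lands, one workaround sketch (using only the existing 1.1 API; the directory names are just those from the example above) is to watch each nested directory explicitly and union the resulting streams:
{code}
// Workaround sketch under current behavior: monitor each nested directory
// separately and union the DStreams, since textFileStream() only picks up
// files created directly under the given path.
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object NestedDirsSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("NestedDirsSketch")
    val ssc = new StreamingContext(conf, Seconds(10))

    // List every directory to watch, including the nested ones from the example.
    val dirs = Seq("/test", "/test/dr")
    val lines = ssc.union(dirs.map(ssc.textFileStream))

    lines.print()
    ssc.start()
    ssc.awaitTermination()
  }
}
{code}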
[jira] [Commented] (SPARK-2863) Emulate Hive type coercion in native reimplementations of Hive functions
[ https://issues.apache.org/jira/browse/SPARK-2863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169306#comment-14169306 ] Apache Spark commented on SPARK-2863: - User 'willb' has created a pull request for this issue: https://github.com/apache/spark/pull/2768 Emulate Hive type coercion in native reimplementations of Hive functions Key: SPARK-2863 URL: https://issues.apache.org/jira/browse/SPARK-2863 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.0.0 Reporter: William Benton Assignee: William Benton Native reimplementations of Hive functions no longer have the same type-coercion behavior as they would if executed via Hive. As [Michael Armbrust points out|https://github.com/apache/spark/pull/1750#discussion_r15790970], queries like {{SELECT SQRT(2) FROM src LIMIT 1}} succeed in Hive but fail if {{SQRT}} is implemented natively. Spark SQL should have Hive-compatible type coercions for arguments to natively-implemented functions. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
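As a conceptual illustration of the requested behavior (this is not Catalyst's actual coercion rule; the value types and function names are made up), Hive-style coercion widens an integral argument to a double before a numeric function such as SQRT is applied, so SELECT SQRT(2) succeeds instead of failing on a type mismatch:
{code}
// Conceptual sketch only; the value types and functions are hypothetical.
sealed trait Value
case class IntVal(v: Int) extends Value
case class DoubleVal(v: Double) extends Value

object CoercionSketch {
  // widen integral arguments to double, as Hive would
  def coerceToDouble(value: Value): DoubleVal = value match {
    case IntVal(i)    => DoubleVal(i.toDouble)
    case d: DoubleVal => d
  }

  // a natively implemented SQRT that accepts the coerced argument
  def sqrt(arg: Value): DoubleVal = DoubleVal(math.sqrt(coerceToDouble(arg).v))
}

// CoercionSketch.sqrt(IntVal(2)) == DoubleVal(1.4142135623730951)
{code}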
[jira] [Created] (SPARK-3925) Considering the ordering of qualifiers when comparison
Liang-Chi Hsieh created SPARK-3925: -- Summary: Considering the ordering of qualifiers when comparison Key: SPARK-3925 URL: https://issues.apache.org/jira/browse/SPARK-3925 Project: Spark Issue Type: Bug Reporter: Liang-Chi Hsieh The qualifiers orderings should be considered during the comparison between old qualifiers and new qualifiers when calling 'withQualifiers'. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-3925) Considering the ordering of qualifiers during comparison
[ https://issues.apache.org/jira/browse/SPARK-3925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-3925: --- Summary: Considering the ordering of qualifiers during comparison (was: Considering the ordering of qualifiers when comparison) Considering the ordering of qualifiers during comparison Key: SPARK-3925 URL: https://issues.apache.org/jira/browse/SPARK-3925 Project: Spark Issue Type: Bug Reporter: Liang-Chi Hsieh The qualifiers orderings should be considered during the comparison between old qualifiers and new qualifiers when calling 'withQualifiers'. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3925) Considering the ordering of qualifiers when comparison
[ https://issues.apache.org/jira/browse/SPARK-3925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169311#comment-14169311 ] Apache Spark commented on SPARK-3925: - User 'viirya' has created a pull request for this issue: https://github.com/apache/spark/pull/2783 Considering the ordering of qualifiers when comparison -- Key: SPARK-3925 URL: https://issues.apache.org/jira/browse/SPARK-3925 Project: Spark Issue Type: Bug Reporter: Liang-Chi Hsieh The qualifiers orderings should be considered during the comparison between old qualifiers and new qualifiers when calling 'withQualifiers'. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-3925) Do not consider the ordering of qualifiers during comparison
[ https://issues.apache.org/jira/browse/SPARK-3925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-3925: --- Description: The qualifiers orderings should not be considered during the comparison between old qualifiers and new qualifiers when calling 'withQualifiers'. was: The qualifiers orderings should be considered during the comparison between old qualifiers and new qualifiers when calling 'withQualifiers'. Do not consider the ordering of qualifiers during comparison Key: SPARK-3925 URL: https://issues.apache.org/jira/browse/SPARK-3925 Project: Spark Issue Type: Bug Reporter: Liang-Chi Hsieh The qualifiers orderings should not be considered during the comparison between old qualifiers and new qualifiers when calling 'withQualifiers'. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
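A minimal sketch of the behavior the updated description asks for (the class name is a stand-in, not Catalyst's actual code): in withQualifiers, compare the old and new qualifiers as sets so that their ordering is ignored.
{code}
// Illustrative sketch only; AttributeSketch is hypothetical, not a Catalyst class.
case class AttributeSketch(name: String, qualifiers: Seq[String]) {
  def withQualifiers(newQualifiers: Seq[String]): AttributeSketch =
    if (qualifiers.toSet == newQualifiers.toSet) this  // ordering does not matter
    else copy(qualifiers = newQualifiers)
}

// AttributeSketch("a", Seq("t1", "t2")).withQualifiers(Seq("t2", "t1"))
// returns the same instance instead of creating a new attribute.
{code}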
[jira] [Updated] (SPARK-3925) Do not considering the ordering of qualifiers during comparison
[ https://issues.apache.org/jira/browse/SPARK-3925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-3925: --- Summary: Do not considering the ordering of qualifiers during comparison (was: Considering the ordering of qualifiers during comparison) Do not considering the ordering of qualifiers during comparison --- Key: SPARK-3925 URL: https://issues.apache.org/jira/browse/SPARK-3925 Project: Spark Issue Type: Bug Reporter: Liang-Chi Hsieh The qualifiers orderings should be considered during the comparison between old qualifiers and new qualifiers when calling 'withQualifiers'. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-3925) Do not consider the ordering of qualifiers during comparison
[ https://issues.apache.org/jira/browse/SPARK-3925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-3925: --- Summary: Do not consider the ordering of qualifiers during comparison (was: Do not considering the ordering of qualifiers during comparison) Do not consider the ordering of qualifiers during comparison Key: SPARK-3925 URL: https://issues.apache.org/jira/browse/SPARK-3925 Project: Spark Issue Type: Bug Reporter: Liang-Chi Hsieh The qualifiers orderings should be considered during the comparison between old qualifiers and new qualifiers when calling 'withQualifiers'. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3869) ./bin/spark-class miss Java version with _JAVA_OPTIONS set
[ https://issues.apache.org/jira/browse/SPARK-3869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169329#comment-14169329 ] cocoatomo commented on SPARK-3869: -- Hi [~pwendell], thank you for informing me. Is it OK to use the abbreviated last name (e.g. Barack O.)? ./bin/spark-class miss Java version with _JAVA_OPTIONS set -- Key: SPARK-3869 URL: https://issues.apache.org/jira/browse/SPARK-3869 Project: Spark Issue Type: Bug Components: Spark Shell Affects Versions: 1.2.0 Environment: Mac OS X 10.9.5, Python 2.6.8, Java 1.8.0_20 Reporter: cocoatomo When the _JAVA_OPTIONS environment variable is set, the command java -version first outputs a message like Picked up _JAVA_OPTIONS: -Dfile.encoding=UTF-8. ./bin/spark-class determines the Java version from the first line of the java -version output, so it misreads the Java version when _JAVA_OPTIONS is set. commit: a85f24accd3266e0f97ee04d03c22b593d99c062 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
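The parsing pitfall can be sketched as follows (a plain Scala illustration of the idea, not the actual spark-class shell code): with _JAVA_OPTIONS set, the first line of java -version output is the Picked up notice, so a parser must select the line that actually contains the version string rather than blindly taking line one.
{code}
// Illustrative sketch of the parsing problem, not the real spark-class script.
object JavaVersionSketch {
  val output = Seq(
    "Picked up _JAVA_OPTIONS: -Dfile.encoding=UTF-8",
    "java version \"1.8.0_20\"")

  // naive: take the first line -- wrong whenever _JAVA_OPTIONS is set
  val naive = output.head

  // more robust: pick the line that actually mentions the version
  val robust = output.find(_.contains("version")).getOrElse("")
}
{code}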
[jira] [Commented] (SPARK-922) Update Spark AMI to Python 2.7
[ https://issues.apache.org/jira/browse/SPARK-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169331#comment-14169331 ] Nicholas Chammas commented on SPARK-922: [~joshrosen] - Do you mean [this script|https://github.com/mesos/spark-ec2/blob/v4/create_image.sh]? I doesn't seem to have anything related to Python 2.7. Anyway, what I meant was if you were open to holding off on updating the Spark AMIs until we had also figured out how to automate that process per [SPARK-3821]. I should have something for that as soon as this week or next. Update Spark AMI to Python 2.7 -- Key: SPARK-922 URL: https://issues.apache.org/jira/browse/SPARK-922 Project: Spark Issue Type: Task Components: EC2, PySpark Affects Versions: 0.9.0, 0.9.1, 1.0.0, 1.1.0 Reporter: Josh Rosen Many Python libraries only support Python 2.7+, so we should make Python 2.7 the default Python on the Spark AMIs. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-922) Update Spark AMI to Python 2.7
[ https://issues.apache.org/jira/browse/SPARK-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169331#comment-14169331 ] Nicholas Chammas edited comment on SPARK-922 at 10/13/14 2:19 PM: -- [~joshrosen] - Do you mean [this script|https://github.com/mesos/spark-ec2/blob/v4/create_image.sh]? It doesn't seem to have anything related to Python 2.7. Anyway, what I meant was if you were open to holding off on updating the Spark AMIs until we had also figured out how to automate that process per [SPARK-3821]. I should have something for that as soon as this week or next. was (Author: nchammas): [~joshrosen] - Do you mean [this script|https://github.com/mesos/spark-ec2/blob/v4/create_image.sh]? I doesn't seem to have anything related to Python 2.7. Anyway, what I meant was if you were open to holding off on updating the Spark AMIs until we had also figured out how to automate that process per [SPARK-3821]. I should have something for that as soon as this week or next. Update Spark AMI to Python 2.7 -- Key: SPARK-922 URL: https://issues.apache.org/jira/browse/SPARK-922 Project: Spark Issue Type: Task Components: EC2, PySpark Affects Versions: 0.9.0, 0.9.1, 1.0.0, 1.1.0 Reporter: Josh Rosen Many Python libraries only support Python 2.7+, so we should make Python 2.7 the default Python on the Spark AMIs. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-3926) result of JavaRDD collectAsMap() is not serializable
Antoine Amend created SPARK-3926: Summary: result of JavaRDD collectAsMap() is not serializable Key: SPARK-3926 URL: https://issues.apache.org/jira/browse/SPARK-3926 Project: Spark Issue Type: Bug Components: Java API Affects Versions: 1.1.0 Environment: CentOS / Spark 1.1 / Hadoop Hortonworks 2.4.0.2.1.2.0-402 Reporter: Antoine Amend Using the Java API, I want to collect the result of a RDDString, String as a HashMap using collectAsMap function: MapString, String map = myJavaRDD.collectAsMap(); This works fine, but when passing this map to another function, such as... myOtherJavaRDD.mapToPair(new CustomFunction(map)) ...this leads to the following error: Exception in thread main org.apache.spark.SparkException: Task not serializable at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:166) at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:158) at org.apache.spark.SparkContext.clean(SparkContext.scala:1242) at org.apache.spark.rdd.RDD.map(RDD.scala:270) at org.apache.spark.api.java.JavaRDDLike$class.mapToPair(JavaRDDLike.scala:99) at org.apache.spark.api.java.JavaPairRDD.mapToPair(JavaPairRDD.scala:44) ../.. MY CLASS ../.. at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:328) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) Caused by: java.io.NotSerializableException: scala.collection.convert.Wrappers$MapWrapper at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1183) at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547) at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508) at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431) at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177) at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547) at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508) at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431) at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177) at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347) at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:42) at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:73) at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:164) This seems to be due to WrapAsJava.scala being non serializable ../.. implicit def mapAsJavaMap[A, B](m: Map[A, B]): ju.Map[A, B] = m match { //case JConcurrentMapWrapper(wrapped) = wrapped case JMapWrapper(wrapped) = wrapped.asInstanceOf[ju.Map[A, B]] case _ = new MapWrapper(m) } ../.. The workaround is to manually wrapper this map into another one (serialized) MapString, String map = myJavaRDD.collectAsMap(); MapString, String tmp = new HashMapString, String(map); myOtherJavaRDD.mapToPair(new CustomFunction(tmp)) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-922) Update Spark AMI to Python 2.7
[ https://issues.apache.org/jira/browse/SPARK-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169380#comment-14169380 ] Josh Rosen commented on SPARK-922: -- [~nchammas] - I don't think that there's an urgent rush to update the AMIs before the next round of releases, so I'm fine with waiting to incorporate this into SPARK-3821. Update Spark AMI to Python 2.7 -- Key: SPARK-922 URL: https://issues.apache.org/jira/browse/SPARK-922 Project: Spark Issue Type: Task Components: EC2, PySpark Affects Versions: 0.9.0, 0.9.1, 1.0.0, 1.1.0 Reporter: Josh Rosen Many Python libraries only support Python 2.7+, so we should make Python 2.7 the default Python on the Spark AMIs. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3926) result of JavaRDD collectAsMap() is not serializable
[ https://issues.apache.org/jira/browse/SPARK-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169383#comment-14169383 ] Sean Owen commented on SPARK-3926: -- Yeah, seems fine to just let {{MapWrapper}} implement {{Serializable}}, because standard Java {{Map}} implementations are as well. It's backwards-compatible so seems like an easy PR to submit if you like. result of JavaRDD collectAsMap() is not serializable Key: SPARK-3926 URL: https://issues.apache.org/jira/browse/SPARK-3926 Project: Spark Issue Type: Bug Components: Java API Affects Versions: 1.1.0 Environment: CentOS / Spark 1.1 / Hadoop Hortonworks 2.4.0.2.1.2.0-402 Reporter: Antoine Amend Using the Java API, I want to collect the result of a RDDString, String as a HashMap using collectAsMap function: MapString, String map = myJavaRDD.collectAsMap(); This works fine, but when passing this map to another function, such as... myOtherJavaRDD.mapToPair(new CustomFunction(map)) ...this leads to the following error: Exception in thread main org.apache.spark.SparkException: Task not serializable at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:166) at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:158) at org.apache.spark.SparkContext.clean(SparkContext.scala:1242) at org.apache.spark.rdd.RDD.map(RDD.scala:270) at org.apache.spark.api.java.JavaRDDLike$class.mapToPair(JavaRDDLike.scala:99) at org.apache.spark.api.java.JavaPairRDD.mapToPair(JavaPairRDD.scala:44) ../.. MY CLASS ../.. at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:328) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) Caused by: java.io.NotSerializableException: scala.collection.convert.Wrappers$MapWrapper at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1183) at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547) at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508) at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431) at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177) at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547) at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508) at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431) at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177) at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347) at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:42) at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:73) at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:164) This seems to be due to WrapAsJava.scala being non serializable ../.. implicit def mapAsJavaMap[A, B](m: Map[A, B]): ju.Map[A, B] = m match { //case JConcurrentMapWrapper(wrapped) = wrapped case JMapWrapper(wrapped) = wrapped.asInstanceOf[ju.Map[A, B]] case _ = new MapWrapper(m) } ../.. 
The workaround is to manually wrap this map into another (serializable) one: Map<String, String> map = myJavaRDD.collectAsMap(); Map<String, String> tmp = new HashMap<String, String>(map); myOtherJavaRDD.mapToPair(new CustomFunction(tmp)) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-3897) Scala style: format example code
[ https://issues.apache.org/jira/browse/SPARK-3897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-3897. -- Resolution: Won't Fix Given recent discussion, and consensus to not make sweeping style changes, I think this is WontFix. Scala style: format example code Key: SPARK-3897 URL: https://issues.apache.org/jira/browse/SPARK-3897 Project: Spark Issue Type: Sub-task Components: Project Infra Reporter: sjk https://github.com/apache/spark/pull/2754 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-3895) Scala style: Indentation of method
[ https://issues.apache.org/jira/browse/SPARK-3895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-3895. -- Resolution: Won't Fix Given recent discussion, and consensus to not make sweeping style changes, I think this is WontFix. Scala style: Indentation of method -- Key: SPARK-3895 URL: https://issues.apache.org/jira/browse/SPARK-3895 Project: Spark Issue Type: Sub-task Components: Project Infra Reporter: sjk such as https://github.com/apache/spark/pull/2734 {code:title=core/src/main/scala/org/apache/spark/Aggregator.scala|borderStyle=solid} // for example def combineCombinersByKey(iter: Iterator[_ <: Product2[K, C]], context: TaskContext) : Iterator[(K, C)] = { ... def combineValuesByKey(iter: Iterator[_ <: Product2[K, V]], context: TaskContext): Iterator[(K, C)] = { {code} These do not conform to the rule: https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide There is a lot of code like this. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-3781) code style format
[ https://issues.apache.org/jira/browse/SPARK-3781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-3781. -- Resolution: Won't Fix Given recent discussion, and consensus to not make sweeping style changes, I think this is WontFix. code style format - Key: SPARK-3781 URL: https://issues.apache.org/jira/browse/SPARK-3781 Project: Spark Issue Type: Improvement Reporter: sjk -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3896) checkSpeculatableTasks fask quit loop, invoking checkSpeculatableTasks is expensive
[ https://issues.apache.org/jira/browse/SPARK-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169406#comment-14169406 ] Josh Rosen commented on SPARK-3896: --- [~srowen] There actually IS a PR; it looks like the automatic PR linking script was broken for a couple of days, which is why it wasn't automatically linked here. However, I'm still confused even after looking at the PR (see my comments over there): https://github.com/apache/spark/pull/2751 checkSpeculatableTasks fask quit loop, invoking checkSpeculatableTasks is expensive --- Key: SPARK-3896 URL: https://issues.apache.org/jira/browse/SPARK-3896 Project: Spark Issue Type: Improvement Reporter: sjk -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3890) remove redundant spark.executor.memory in doc
[ https://issues.apache.org/jira/browse/SPARK-3890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169405#comment-14169405 ] Sean Owen commented on SPARK-3890: -- For some reason the PR was not linked: https://github.com/apache/spark/pull/2745 remove redundant spark.executor.memory in doc - Key: SPARK-3890 URL: https://issues.apache.org/jira/browse/SPARK-3890 Project: Spark Issue Type: Improvement Components: Documentation Reporter: WangTaoTheTonic Priority: Minor Seems like there is a redundant spark.executor.memory config item in docs. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3896) checkSpeculatableTasks fask quit loop, invoking checkSpeculatableTasks is expensive
[ https://issues.apache.org/jira/browse/SPARK-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169409#comment-14169409 ] Sean Owen commented on SPARK-3896: -- Oops, my bad. I just realized that some PRs didn't link after looking at other recent JIRAs. checkSpeculatableTasks fask quit loop, invoking checkSpeculatableTasks is expensive --- Key: SPARK-3896 URL: https://issues.apache.org/jira/browse/SPARK-3896 Project: Spark Issue Type: Improvement Reporter: sjk -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3896) checkSpeculatableTasks fask quit loop, invoking checkSpeculatableTasks is expensive
[ https://issues.apache.org/jira/browse/SPARK-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169412#comment-14169412 ] Josh Rosen commented on SPARK-3896: --- I moved the automatic linking code from Jenkins to my PR review board platform, so hopefully it should be more reliable now: https://github.com/databricks/spark-pr-dashboard/commit/9b1487cce315fe991d7081a1bae5fc1103f020a5 checkSpeculatableTasks fask quit loop, invoking checkSpeculatableTasks is expensive --- Key: SPARK-3896 URL: https://issues.apache.org/jira/browse/SPARK-3896 Project: Spark Issue Type: Improvement Reporter: sjk -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3662) Importing pandas breaks included pi.py example
[ https://issues.apache.org/jira/browse/SPARK-3662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169424#comment-14169424 ] Sean Owen commented on SPARK-3662: -- [~esamanas] Do you have a suggested change here, beyond just disambiguating imports in your example? Or a different example that doesn't involve import collision? It sounds like the modified example is then misunderstood to refer to a pandas random class, not the Python one, and that is simply a matter of namespace collision, and why pandas is dragged in. This example seems to fall down before it demonstrates anything else. Importing pandas breaks included pi.py example -- Key: SPARK-3662 URL: https://issues.apache.org/jira/browse/SPARK-3662 Project: Spark Issue Type: Bug Components: PySpark, YARN Affects Versions: 1.1.0 Environment: Xubuntu 14.04. Yarn cluster running on Ubuntu 12.04. Reporter: Evan Samanas If I add import pandas at the top of the included pi.py example and submit using spark-submit --master yarn-client, I get this stack trace: {code} Traceback (most recent call last): File /home/evan/pub_src/spark-1.1.0/examples/src/main/python/pi.py, line 39, in module count = sc.parallelize(xrange(1, n+1), slices).map(f).reduce(add) File /home/evan/pub_src/spark/python/pyspark/rdd.py, line 759, in reduce vals = self.mapPartitions(func).collect() File /home/evan/pub_src/spark/python/pyspark/rdd.py, line 723, in collect bytesInJava = self._jrdd.collect().iterator() File /home/evan/pub_src/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py, line 538, in __call__ File /home/evan/pub_src/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py, line 300, in get_return_value py4j.protocol.Py4JJavaError14/09/23 15:51:58 INFO TaskSetManager: Lost task 2.3 in stage 0.0 (TID 10) on executor SERVERNAMEREMOVED: org.apache.spark.api.python.PythonException (Traceback (most recent call last): File /yarn/nm/usercache/evan/filecache/173/spark-assembly-1.1.0-hadoop2.3.0-cdh5.1.0.jar/pyspark/worker.py, line 75, in main command = pickleSer._read_with_length(infile) File /yarn/nm/usercache/evan/filecache/173/spark-assembly-1.1.0-hadoop2.3.0-cdh5.1.0.jar/pyspark/serializers.py, line 150, in _read_with_length return self.loads(obj) ImportError: No module named algos {code} The example works fine if I move the statement from random import random from the top and into the function (def f(_)) defined in the example. Near as I can tell, random is getting confused with a function of the same name within pandas.algos. Submitting the same script using --master local works, but gives a distressing amount of random characters to stdout or stderr and messes up my terminal: {code} ... 
[several kilobytes of binary garbage omitted] 14/09/23 15:42:09 INFO SparkContext: Job finished: reduce at /home/evan/pub_src/spark-1.1.0/examples/src/main/python/pi_sframe.py:38, took 11.276879779 s [more binary garbage omitted] Pi is roughly 3.146136 {code} No idea if that's related, but thought I'd include it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-3506) 1.1.0-SNAPSHOT in docs for 1.1.0 under docs/latest
[ https://issues.apache.org/jira/browse/SPARK-3506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-3506. -- Resolution: Fixed Fix Version/s: 1.1.1 Looks like the site has been updated, and I see no SNAPSHOT on the page. 1.1.0-SNAPSHOT in docs for 1.1.0 under docs/latest -- Key: SPARK-3506 URL: https://issues.apache.org/jira/browse/SPARK-3506 Project: Spark Issue Type: Bug Components: Documentation Affects Versions: 1.1.0 Reporter: Jacek Laskowski Assignee: Patrick Wendell Priority: Trivial Fix For: 1.1.1 In https://spark.apache.org/docs/latest/ there are references to 1.1.0-SNAPSHOT: * This documentation is for Spark version 1.1.0-SNAPSHOT. * For the Scala API, Spark 1.1.0-SNAPSHOT uses Scala 2.10. It should be version 1.1.0 since that's the latest released version and the header says so, too. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-3927) Extends SPARK-2577 to fix secondary resources
Ian O Connell created SPARK-3927: Summary: Extends SPARK-2577 to fix secondary resources Key: SPARK-3927 URL: https://issues.apache.org/jira/browse/SPARK-3927 Project: Spark Issue Type: Bug Components: YARN Affects Versions: 1.2.0 Reporter: Ian O Connell SPARK-2577 was a partial fix, handling the case of the assembly + app jar. The additional resources however would run into the same issue. I have the super simple PR ready. Though should this code be moved inside the addResource method instead to address it more globally? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3251) Clarify learning interfaces
[ https://issues.apache.org/jira/browse/SPARK-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169449#comment-14169449 ] Sean Owen commented on SPARK-3251: -- Is this a subset of / duplicate of SPARK-3702 now, given the discussion? Clarify learning interfaces Key: SPARK-3251 URL: https://issues.apache.org/jira/browse/SPARK-3251 Project: Spark Issue Type: Improvement Components: MLlib Affects Versions: 1.1.0, 1.1.1 Reporter: Christoph Sawade *Make threshold mandatory* Currently, the output of predict for an example is either the score or the class. This side-effect is caused by clearThreshold. To clarify that behaviour, three different types of predict (predictScore, predictClass, predictProbability) were introduced; the threshold is no longer optional. *Clarify classification interfaces* Currently, some functionality is spread over multiple models. In order to clarify the structure and simplify the implementation of more complex models (like multinomial logistic regression), two new classes are introduced: - BinaryClassificationModel: for all models that derive a binary classification from a single weight vector. Comprises the thresholding functionality to derive a prediction from a score. It basically captures SVMModel and LogisticRegressionModel. - ProbabilisticClassificationModel: This trait defines the interface for models that return a calibrated confidence score (aka probability). *Misc* - some renaming - add test for probabilistic output -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
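To make the proposal above concrete, here is a minimal Scala sketch of the described hierarchy; the trait and method names follow the issue description, but the signatures are illustrative only and are not the actual MLlib API.
{code}
import org.apache.spark.mllib.linalg.Vector

// Illustrative sketch of the proposed interfaces; not the real MLlib classes.
trait ClassificationModel extends Serializable {
  /** Raw, uncalibrated score for a single example. */
  def predictScore(features: Vector): Double
  /** Hard class prediction; thresholding is mandatory rather than optional. */
  def predictClass(features: Vector): Double
}

/** Models that derive a binary decision from a single weight vector (e.g. SVM, logistic regression). */
trait BinaryClassificationModel extends ClassificationModel {
  def threshold: Double
  override def predictClass(features: Vector): Double =
    if (predictScore(features) > threshold) 1.0 else 0.0
}

/** Models that can additionally return a calibrated confidence score (a probability). */
trait ProbabilisticClassificationModel extends ClassificationModel {
  def predictProbability(features: Vector): Double
}
{code}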
[jira] [Commented] (SPARK-3897) Scala style: format example code
[ https://issues.apache.org/jira/browse/SPARK-3897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169451#comment-14169451 ] Apache Spark commented on SPARK-3897: - User 'shijinkui' has created a pull request for this issue: https://github.com/apache/spark/pull/2754 Scala style: format example code Key: SPARK-3897 URL: https://issues.apache.org/jira/browse/SPARK-3897 Project: Spark Issue Type: Sub-task Components: Project Infra Reporter: sjk https://github.com/apache/spark/pull/2754 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3883) Provide SSL support for Akka and HttpServer based connections
[ https://issues.apache.org/jira/browse/SPARK-3883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169458#comment-14169458 ] Apache Spark commented on SPARK-3883: - User 'jacek-lewandowski' has created a pull request for this issue: https://github.com/apache/spark/pull/2739 Provide SSL support for Akka and HttpServer based connections - Key: SPARK-3883 URL: https://issues.apache.org/jira/browse/SPARK-3883 Project: Spark Issue Type: Improvement Components: Spark Core Reporter: Jacek Lewandowski Spark uses at least 4 logical communication channels: 1. Control messages - Akka based 2. JARs and other files - Jetty based (HttpServer) 3. Computation results - Java NIO based 4. Web UI - Jetty based The aim of this feature is to enable SSL for (1) and (2). Why: Spark configuration is sent through (1). Spark configuration may contain sensitive information like credentials for accessing external data sources or streams. Application JAR files (2) may include the application logic and therefore they may include information about the structure of the external data sources, and credentials as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3890) remove redundant spark.executor.memory in doc
[ https://issues.apache.org/jira/browse/SPARK-3890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169466#comment-14169466 ] Apache Spark commented on SPARK-3890: - User 'WangTaoTheTonic' has created a pull request for this issue: https://github.com/apache/spark/pull/2745 remove redundant spark.executor.memory in doc - Key: SPARK-3890 URL: https://issues.apache.org/jira/browse/SPARK-3890 Project: Spark Issue Type: Improvement Components: Documentation Reporter: WangTaoTheTonic Priority: Minor Seems like there is a redundant spark.executor.memory config item in docs. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Issue Comment Deleted] (SPARK-3921) WorkerWatcher in Standalone mode fail to come up due to invalid workerUrl
[ https://issues.apache.org/jira/browse/SPARK-3921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-3921: - Comment: was deleted (was: https://github.com/apache/spark/pull/2779) WorkerWatcher in Standalone mode fail to come up due to invalid workerUrl - Key: SPARK-3921 URL: https://issues.apache.org/jira/browse/SPARK-3921 Project: Spark Issue Type: Bug Affects Versions: 1.2.0 Reporter: Aaron Davidson Assignee: Aaron Davidson Priority: Critical As of [this commit|https://github.com/apache/spark/commit/79e45c9323455a51f25ed9acd0edd8682b4bbb88#diff-79391110e9f26657e415aa169a004998R153], standalone mode appears to have lost its WorkerWatcher, because of the swapped workerUrl and appId parameters. We still put workerUrl before appId when we start standalone executors, and the Executor misinterprets the appId as the workerUrl and fails to create the WorkerWatcher. Note that this does not seem to crash the Standalone executor mode, despite the failing of the WorkerWatcher during its constructor. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3480) Throws out Not a valid command 'yarn-alpha/scalastyle' in dev/scalastyle for sbt build tool during 'Running Scala style checks'
[ https://issues.apache.org/jira/browse/SPARK-3480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169472#comment-14169472 ] Sean Owen commented on SPARK-3480: -- Given the discussion I suggest this is CannotReproduce? Throws out Not a valid command 'yarn-alpha/scalastyle' in dev/scalastyle for sbt build tool during 'Running Scala style checks' --- Key: SPARK-3480 URL: https://issues.apache.org/jira/browse/SPARK-3480 Project: Spark Issue Type: Bug Components: Build Reporter: Yi Zhou Priority: Minor Symptom: Run ./dev/run-tests and it dumps output like the following: SBT_MAVEN_PROFILES_ARGS=-Pyarn -Phadoop-2.3 -Dhadoop.version=2.3.0 -Pkinesis-asl [Warn] Java 8 tests will not run because JDK version is 1.8. = Running Apache RAT checks = RAT checks passed. = Running Scala style checks = Scalastyle checks failed at following occurrences: [error] Expected ID character [error] Not a valid command: yarn-alpha [error] Expected project ID [error] Expected configuration [error] Expected ':' (if selecting a configuration) [error] Expected key [error] Not a valid key: yarn-alpha [error] yarn-alpha/scalastyle [error] ^ Possible Cause: I checked dev/scalastyle and found that it invokes the two targets 'yarn-alpha/scalastyle' and 'yarn/scalastyle' separately, like: echo -e "q\n" | sbt/sbt -Pyarn -Phadoop-0.23 -Dhadoop.version=0.23.9 yarn-alpha/scalastyle >> scalastyle.txt echo -e "q\n" | sbt/sbt -Pyarn -Phadoop-2.2 -Dhadoop.version=2.2.0 yarn/scalastyle >> scalastyle.txt From the above error message, sbt seems to complain about them because of the '/' separator. It runs through after I manually modify them to 'yarn-alpha:scalastyle' and 'yarn:scalastyle'. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3257) Enable :cp to add JARs in spark-shell (Scala 2.11)
[ https://issues.apache.org/jira/browse/SPARK-3257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169479#comment-14169479 ] Heather Miller commented on SPARK-3257: --- FYI to Typesafers, I'm about to PR this to scala/scala (sometime today) Enable :cp to add JARs in spark-shell (Scala 2.11) -- Key: SPARK-3257 URL: https://issues.apache.org/jira/browse/SPARK-3257 Project: Spark Issue Type: New Feature Components: Spark Shell Reporter: Matei Zaharia Assignee: Heather Miller -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-2633) enhance spark listener API to gather more spark job information
[ https://issues.apache.org/jira/browse/SPARK-2633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169499#comment-14169499 ] Josh Rosen commented on SPARK-2633: --- I've opened a pull request to add a stable pull-based progress / status API to Spark and would love to receive your feedback: https://github.com/apache/spark/pull/2696 enhance spark listener API to gather more spark job information --- Key: SPARK-2633 URL: https://issues.apache.org/jira/browse/SPARK-2633 Project: Spark Issue Type: New Feature Components: Java API Reporter: Chengxiang Li Priority: Critical Labels: hive Attachments: Spark listener enhancement for Hive on Spark job monitor and statistic.docx Based on Hive on Spark job status monitoring and statistic collection requirement, try to enhance spark listener API to gather more spark job information. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
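For context, a minimal Scala sketch of how a client hooks into the listener API that this issue proposes to enrich; SparkListener and addSparkListener are the existing developer API, while the logging body is purely illustrative.
{code}
import org.apache.spark.scheduler.{SparkListener, SparkListenerJobEnd, SparkListenerStageCompleted}

// Illustrative monitoring hook built on the existing listener API.
class JobMonitorListener extends SparkListener {
  override def onStageCompleted(stageCompleted: SparkListenerStageCompleted): Unit = {
    val info = stageCompleted.stageInfo
    println(s"Stage ${info.stageId} (${info.name}) completed with ${info.numTasks} tasks")
  }
  override def onJobEnd(jobEnd: SparkListenerJobEnd): Unit = {
    println(s"Job ${jobEnd.jobId} ended with result ${jobEnd.jobResult}")
  }
}

// Registration, given an existing SparkContext:
//   sc.addSparkListener(new JobMonitorListener)
{code}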
[jira] [Commented] (SPARK-3902) Stabilize AsyncRDDActions and expose its methods in Java API
[ https://issues.apache.org/jira/browse/SPARK-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169505#comment-14169505 ] Apache Spark commented on SPARK-3902: - User 'JoshRosen' has created a pull request for this issue: https://github.com/apache/spark/pull/2760 Stabilize AsyncRDDActions and expose its methods in Java API Key: SPARK-3902 URL: https://issues.apache.org/jira/browse/SPARK-3902 Project: Spark Issue Type: New Feature Components: Java API, Spark Core Reporter: Josh Rosen Assignee: Josh Rosen The AsyncRDDActions methods are currently the easiest way to determine Spark jobs' ids for use in progress-monitoring code (see SPARK-2636). AsyncRDDActions is currently marked as {{@Experimental}}; for 1.2, I think that we should stabilize this API and expose it in Java, too. One concern is whether there's a better async API design that we should prefer over this one as our stable API; I had some ideas for a more general API in SPARK-3626 (discussed in much greater detail on GitHub: https://github.com/apache/spark/pull/2482) but decided against the more general API due to its confusing cancellation semantics. Given this, I'd be comfortable stabilizing our current API. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
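For reference, a minimal Scala sketch of the current (still {{@Experimental}}) async API that this issue proposes to stabilize and mirror in Java; {{countAsync}} and {{FutureAction}} are the existing names, and the surrounding code is illustrative only.
{code}
import org.apache.spark.{FutureAction, SparkContext}
import org.apache.spark.SparkContext._   // implicit conversion to AsyncRDDActions
import scala.concurrent.ExecutionContext.Implicits.global

// Submit a count without blocking the calling thread (sc is an existing SparkContext).
def countInBackground(sc: SparkContext): FutureAction[Long] = {
  val rdd = sc.parallelize(1 to 1000, 10)
  val future = rdd.countAsync()            // returns a FutureAction[Long]
  future.onComplete(result => println(s"count finished: $result"))
  future                                   // FutureAction also supports cancel()
}
{code}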
[jira] [Commented] (SPARK-3590) Expose async APIs in the Java API
[ https://issues.apache.org/jira/browse/SPARK-3590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169507#comment-14169507 ] Josh Rosen commented on SPARK-3590: --- I've opened a pull request to add these Java APIs: https://github.com/apache/spark/pull/2760 Expose async APIs in the Java API - Key: SPARK-3590 URL: https://issues.apache.org/jira/browse/SPARK-3590 Project: Spark Issue Type: New Feature Components: Java API Reporter: Marcelo Vanzin Currently, a single async method is exposed through the Java API (JavaRDDLike::foreachAsync). That method returns a Scala future (FutureAction). We should bring the Java API up to sync with the Scala async APIs, and also expose Java-friendly types (e.g. a proper java.util.concurrent.Future). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3924) Upgrade to Akka version 2.3.6
[ https://issues.apache.org/jira/browse/SPARK-3924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169504#comment-14169504 ] Sean Owen commented on SPARK-3924: -- I think this is a duplicate of SPARK-2707 and SPARK-2805. Upgrade to Akka version 2.3.6 - Key: SPARK-3924 URL: https://issues.apache.org/jira/browse/SPARK-3924 Project: Spark Issue Type: Dependency upgrade Environment: deploy env Reporter: Helena Edelson I tried every sbt in the book but can't use the latest Akka version in my project with Spark. It would be great if I could. Also I can not use the latest Typesafe Config - 1.2.1, which would also be great. See https://issues.apache.org/jira/browse/SPARK-2593 This is a big change. If I have time I can do a PR. [~helena_e] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-2707) Upgrade to Akka 2.3
[ https://issues.apache.org/jira/browse/SPARK-2707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169509#comment-14169509 ] Sean Owen commented on SPARK-2707: -- Can this be considered a duplicate of SPARK-2805, since that's where I see recent action? Upgrade to Akka 2.3 --- Key: SPARK-2707 URL: https://issues.apache.org/jira/browse/SPARK-2707 Project: Spark Issue Type: New Feature Components: Spark Core Affects Versions: 1.0.0 Reporter: Yardena Upgrade Akka from 2.2 to 2.3. We want to be able to use new Akka and Spray features directly in the same project. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3590) Expose async APIs in the Java API
[ https://issues.apache.org/jira/browse/SPARK-3590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169506#comment-14169506 ] Apache Spark commented on SPARK-3590: - User 'JoshRosen' has created a pull request for this issue: https://github.com/apache/spark/pull/2760 Expose async APIs in the Java API - Key: SPARK-3590 URL: https://issues.apache.org/jira/browse/SPARK-3590 Project: Spark Issue Type: New Feature Components: Java API Reporter: Marcelo Vanzin Currently, a single async method is exposed through the Java API (JavaRDDLike::foreachAsync). That method returns a Scala future (FutureAction). We should bring the Java API up to sync with the Scala async APIs, and also expose Java-friendly types (e.g. a proper java.util.concurrent.Future). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-1834) NoSuchMethodError when invoking JavaPairRDD.reduce() in Java
[ https://issues.apache.org/jira/browse/SPARK-1834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-1834. -- Resolution: Duplicate On another look, I'm almost sure this is the same issue as in SPARK-3266, which [~joshrosen] has been looking at. NoSuchMethodError when invoking JavaPairRDD.reduce() in Java Key: SPARK-1834 URL: https://issues.apache.org/jira/browse/SPARK-1834 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 0.9.1 Environment: Redhat Linux, Java 7, Hadoop 2.2, Scala 2.10.4 Reporter: John Snodgrass I get a java.lang.NoSuchMethodError when invoking JavaPairRDD.reduce(). Here is the partial stack trace: Exception in thread "main" java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:39) at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala) Caused by: java.lang.NoSuchMethodError: org.apache.spark.api.java.JavaPairRDD.reduce(Lorg/apache/spark/api/java/function/Function2;)Lscala/Tuple2; at JavaPairRDDReduceTest.main(JavaPairRDDReduceTest.java:49)... I'm using Spark 0.9.1. I checked to ensure that I'm compiling with the same version of Spark as I am running on the cluster. The reduce() method works fine with JavaRDD, just not with JavaPairRDD. Here is a code snippet that exhibits the problem: ArrayList<Integer> array = new ArrayList(); for (int i = 0; i < 10; ++i) { array.add(i); } JavaRDD<Integer> rdd = javaSparkContext.parallelize(array); JavaPairRDD<String, Integer> testRDD = rdd.map(new PairFunction<Integer, String, Integer>() { @Override public Tuple2<String, Integer> call(Integer t) throws Exception { return new Tuple2("" + t, t); } }).cache(); testRDD.reduce(new Function2<Tuple2<String, Integer>, Tuple2<String, Integer>, Tuple2<String, Integer>>() { @Override public Tuple2<String, Integer> call(Tuple2<String, Integer> arg0, Tuple2<String, Integer> arg1) throws Exception { return new Tuple2(arg0._1 + arg1._1, arg0._2 * 10 + arg0._2); } }); -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-2493) SBT gen-idea doesn't generate correct Intellij project
[ https://issues.apache.org/jira/browse/SPARK-2493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169527#comment-14169527 ] Sean Owen commented on SPARK-2493: -- Is this still an issue [~dbtsai] ? For IntelliJ, I find it much easier to point directly at the Maven build, and that's more the primary build system now anyway. SBT gen-idea doesn't generate correct Intellij project -- Key: SPARK-2493 URL: https://issues.apache.org/jira/browse/SPARK-2493 Project: Spark Issue Type: Sub-task Components: Build Reporter: DB Tsai I've a clean clone of spark master repository, and I generated the intellij project file by sbt gen-idea as usual. There are two issues we have after merging SPARK-1776 (read dependencies from Maven). 1) After SPARK-1776, sbt gen-idea will download the dependencies from internet even those jars are in local cache. Before merging, the second time we run gen-idea will not download anything but use the jars in cache. 2) The tests with spark local context can not be run in the intellij. It will show the following exception. The current workaround we've are checking out any snapshot before merging to gen-idea, and then switch back to current master. But this will not work when the master deviate too much from the latest working snapshot. [ERROR] [07/14/2014 16:27:49.967] [ScalaTest-run] [Remoting] Remoting error: [Startup timed out] [ akka.remote.RemoteTransportException: Startup timed out at akka.remote.Remoting.akka$remote$Remoting$$notifyError(Remoting.scala:129) at akka.remote.Remoting.start(Remoting.scala:191) at akka.remote.RemoteActorRefProvider.init(RemoteActorRefProvider.scala:184) at akka.actor.ActorSystemImpl._start$lzycompute(ActorSystem.scala:579) at akka.actor.ActorSystemImpl._start(ActorSystem.scala:577) at akka.actor.ActorSystemImpl.start(ActorSystem.scala:588) at akka.actor.ActorSystem$.apply(ActorSystem.scala:111) at akka.actor.ActorSystem$.apply(ActorSystem.scala:104) at org.apache.spark.util.AkkaUtils$.createActorSystem(AkkaUtils.scala:104) at org.apache.spark.SparkEnv$.create(SparkEnv.scala:153) at org.apache.spark.SparkContext.init(SparkContext.scala:202) at org.apache.spark.SparkContext.init(SparkContext.scala:117) at org.apache.spark.SparkContext.init(SparkContext.scala:132) at org.apache.spark.mllib.util.LocalSparkContext$class.beforeAll(LocalSparkContext.scala:29) at org.apache.spark.mllib.optimization.LBFGSSuite.beforeAll(LBFGSSuite.scala:27) at org.scalatest.BeforeAndAfterAll$class.beforeAll(BeforeAndAfterAll.scala:187) at org.apache.spark.mllib.optimization.LBFGSSuite.beforeAll(LBFGSSuite.scala:27) at org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:253) at org.apache.spark.mllib.optimization.LBFGSSuite.run(LBFGSSuite.scala:27) at org.scalatest.tools.SuiteRunner.run(SuiteRunner.scala:55) at org.scalatest.tools.Runner$$anonfun$doRunRunRunDaDoRunRun$3.apply(Runner.scala:2563) at org.scalatest.tools.Runner$$anonfun$doRunRunRunDaDoRunRun$3.apply(Runner.scala:2557) at scala.collection.immutable.List.foreach(List.scala:318) at org.scalatest.tools.Runner$.doRunRunRunDaDoRunRun(Runner.scala:2557) at org.scalatest.tools.Runner$$anonfun$runOptionallyWithPassFailReporter$2.apply(Runner.scala:1044) at org.scalatest.tools.Runner$$anonfun$runOptionallyWithPassFailReporter$2.apply(Runner.scala:1043) at org.scalatest.tools.Runner$.withClassLoaderAndDispatchReporter(Runner.scala:2722) at org.scalatest.tools.Runner$.runOptionallyWithPassFailReporter(Runner.scala:1043) at 
org.scalatest.tools.Runner$.run(Runner.scala:883) at org.scalatest.tools.Runner.run(Runner.scala) at org.jetbrains.plugins.scala.testingSupport.scalaTest.ScalaTestRunner.runScalaTest2(ScalaTestRunner.java:141) at org.jetbrains.plugins.scala.testingSupport.scalaTest.ScalaTestRunner.main(ScalaTestRunner.java:32) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134) Caused by: java.util.concurrent.TimeoutException: Futures timed out after [1 milliseconds] at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219) at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223) at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107) at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53) at scala.concurrent.Await$.result(package.scala:107) at akka.remote.Remoting.start(Remoting.scala:173) ... 35 more ] An exception or error caused a
[jira] [Resolved] (SPARK-2198) Partition the scala build file so that it is easier to maintain
[ https://issues.apache.org/jira/browse/SPARK-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-2198. -- Resolution: Won't Fix Sounds like a WontFix Partition the scala build file so that it is easier to maintain --- Key: SPARK-2198 URL: https://issues.apache.org/jira/browse/SPARK-2198 Project: Spark Issue Type: Task Components: Build Reporter: Helena Edelson Priority: Minor Original Estimate: 3h Remaining Estimate: 3h Partition to standard Dependencies, Version, Settings, Publish.scala. keeping the SparkBuild clean to describe the modules and their deps so that changes in versions, for example, need only be made in Version.scala, settings changes such as in scalac in Settings.scala, etc. I'd be happy to do this ([~helena_e]) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-2593) Add ability to pass an existing Akka ActorSystem into Spark
[ https://issues.apache.org/jira/browse/SPARK-2593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169545#comment-14169545 ] Evan Chan commented on SPARK-2593: -- Hmmm :( I believe Spark already uses a shaded version of Akka with a different namespace. Unfortunately it still creates some dependency conflicts down the chain, but I don't remember the details. Add ability to pass an existing Akka ActorSystem into Spark --- Key: SPARK-2593 URL: https://issues.apache.org/jira/browse/SPARK-2593 Project: Spark Issue Type: Improvement Components: Spark Core Reporter: Helena Edelson As a developer I want to pass an existing ActorSystem into StreamingContext in load-time so that I do not have 2 actor systems running on a node in an Akka application. This would mean having spark's actor system on its own named-dispatchers as well as exposing the new private creation of its own actor system. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-1849) Broken UTF-8 encoded data gets character replacements and thus can't be fixed
[ https://issues.apache.org/jira/browse/SPARK-1849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169549#comment-14169549 ] Sean Owen commented on SPARK-1849: -- Yes, I think there isn't a 'fix' here short of a quite different implementation. Hadoop's text support pretty deeply assumes UTF-8 (partly for speed) and the Spark implementation is just Hadoop's. A fix would have to justify rewriting all that. I think you have to treat this as binary data for now. Broken UTF-8 encoded data gets character replacements and thus can't be fixed --- Key: SPARK-1849 URL: https://issues.apache.org/jira/browse/SPARK-1849 Project: Spark Issue Type: Bug Reporter: Harry Brundage Attachments: encoding_test I'm trying to process a file which isn't valid UTF-8 data inside hadoop using Spark via {{sc.textFile()}}. Is this possible, and if not, is this a bug that we should fix? It looks like {{HadoopRDD}} uses {{org.apache.hadoop.io.Text.toString}} on all the data it ever reads, which I believe replaces invalid UTF-8 byte sequences with the UTF-8 replacement character, \uFFFD. Some example code mimicking what {{sc.textFile}} does underneath: {code} scala> sc.textFile(path).collect()(0) res8: String = ?pple scala> sc.hadoopFile(path, classOf[TextInputFormat], classOf[LongWritable], classOf[Text]).map(pair => pair._2.toString).collect()(0).getBytes() res9: Array[Byte] = Array(-17, -65, -67, 112, 112, 108, 101) scala> sc.hadoopFile(path, classOf[TextInputFormat], classOf[LongWritable], classOf[Text]).map(pair => pair._2.getBytes).collect()(0) res10: Array[Byte] = Array(-60, 112, 112, 108, 101) {code} In the above example, the first two snippets show the string representation and byte representation of the example line of text. The string shows a question mark for the replacement character and the bytes reveal the replacement character has been swapped in by {{Text.toString}}. The third snippet shows what happens if you call {{getBytes}} on the {{Text}} object which comes back from hadoop land: we get the real bytes in the file out. Now, I think this is a bug, though you may disagree. The text inside my file is perfectly valid iso-8859-1 encoded bytes, which I would like to be able to rescue and re-encode into UTF-8, because I want my application to be smart like that. I think Spark should give me the raw broken string so I can re-encode, but I can't get at the original bytes in order to guess at what the source encoding might be, as they have already been replaced. I'm dealing with data from some CDN access logs which are, to put it nicely, diversely encoded, but I think it's a use case Spark should fully support. So, my suggested fix, on which I'd like some guidance, is to change {{textFile}} to spit out broken strings by not using {{Text}}'s UTF-8 encoding. Further compounding this issue is that my application is actually in PySpark, but we can talk about how bytes fly through to Scala land after this if we agree that this is an issue at all. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
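A minimal Scala sketch of the treat-it-as-binary workaround suggested above: read the {{Text}} values through {{hadoopFile}}, keep the raw bytes, and decode them with whatever charset the data actually uses. The ISO-8859-1 choice here is an assumption about the source data, not something Spark infers.
{code}
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapred.TextInputFormat
import org.apache.spark.SparkContext

// Decode each line explicitly instead of letting Text.toString substitute \uFFFD.
def readLatin1Lines(sc: SparkContext, path: String) =
  sc.hadoopFile(path, classOf[TextInputFormat], classOf[LongWritable], classOf[Text])
    .map { case (_, text) =>
      // getBytes returns the backing array, so only the first getLength bytes are valid.
      new String(text.getBytes, 0, text.getLength, "ISO-8859-1")
    }
{code}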
[jira] [Resolved] (SPARK-1787) Build failure on JDK8 :: SBT fails to load build configuration file
[ https://issues.apache.org/jira/browse/SPARK-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-1787. -- Resolution: Duplicate FWIW SBT + Java 8 has worked fine for me on master for a long while, so assume this does not affect 1.1 or perhaps 1.0. Build failure on JDK8 :: SBT fails to load build configuration file --- Key: SPARK-1787 URL: https://issues.apache.org/jira/browse/SPARK-1787 Project: Spark Issue Type: New Feature Components: Build Affects Versions: 0.9.0 Environment: JDK8 Scala 2.10.X SBT 0.12.X Reporter: Richard Gomes Priority: Minor SBT fails to build under JDK8. Please find steps to reproduce the error below: (j8s10)rgomes@terra:~/workspace/spark-0.9.1$ uname -a Linux terra 3.13-1-amd64 #1 SMP Debian 3.13.10-1 (2014-04-15) x86_64 GNU/Linux (j8s10)rgomes@terra:~/workspace/spark-0.9.1$ java -version java version 1.8.0_05 Java(TM) SE Runtime Environment (build 1.8.0_05-b13) Java HotSpot(TM) 64-Bit Server VM (build 25.5-b02, mixed mode) (j8s10)rgomes@terra:~/workspace/spark-0.9.1$ scala -version Scala code runner version 2.10.3 -- Copyright 2002-2013, LAMP/EPFL (j8s10)rgomes@terra:~/workspace/spark-0.9.1$ sbt/sbt clean Launching sbt from sbt/sbt-launch-0.12.4.jar Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=350m; support was removed in 8.0 [info] Loading project definition from /home/rgomes/workspace/spark-0.9.1/project/project [info] Compiling 1 Scala source to /home/rgomes/workspace/spark-0.9.1/project/project/target/scala-2.9.2/sbt-0.12/classes... [error] error while loading CharSequence, class file '/opt/developer/jdk1.8.0_05/jre/lib/rt.jar(java/lang/CharSequence.class)' is broken [error] (bad constant pool tag 15 at byte 1501) [error] error while loading Comparator, class file '/opt/developer/jdk1.8.0_05/jre/lib/rt.jar(java/util/Comparator.class)' is broken [error] (bad constant pool tag 15 at byte 5003) [error] two errors found [error] (compile:compile) Compilation failed Project loading failed: (r)etry, (q)uit, (l)ast, or (i)gnore? q -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-1738) Is spark-debugger still available?
[ https://issues.apache.org/jira/browse/SPARK-1738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-1738. -- Resolution: Fixed That document has since been deleted anyway, and I assume the answer is that it does not exist. Is spark-debugger still available? -- Key: SPARK-1738 URL: https://issues.apache.org/jira/browse/SPARK-1738 Project: Spark Issue Type: Question Components: Documentation Reporter: WangTaoTheTonic Priority: Minor I see the arthur branch (https://github.com/apache/spark/tree/arthur) described in docs/spark-debugger.md does not exist. So is the spark-debugger still available? If not, should the document be deleted? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-1605) Improve mllib.linalg.Vector
[ https://issues.apache.org/jira/browse/SPARK-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-1605. -- Resolution: Won't Fix Another WontFix then? Improve mllib.linalg.Vector --- Key: SPARK-1605 URL: https://issues.apache.org/jira/browse/SPARK-1605 Project: Spark Issue Type: Improvement Components: MLlib Reporter: Sandeep Singh We can make current Vector a wrapper around Breeze.linalg.Vector ? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-1573) slight modification with regards to sbt/sbt test
[ https://issues.apache.org/jira/browse/SPARK-1573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-1573. -- Resolution: Won't Fix This has been resolved insofar as the main README.md no longer has this text. slight modification with regards to sbt/sbt test Key: SPARK-1573 URL: https://issues.apache.org/jira/browse/SPARK-1573 Project: Spark Issue Type: Documentation Components: Documentation Reporter: Nishkam Ravi When the sources are built against a certain Hadoop version with SPARK_YARN=true, the same settings seem necessary when running sbt/sbt test. For example: SPARK_HADOOP_VERSION=2.3.0-cdh5.0.0 SPARK_YARN=true sbt/sbt assembly SPARK_HADOOP_VERSION=2.3.0-cdh5.0.0 SPARK_YARN=true sbt/sbt test Otherwise build errors and failing tests are seen. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3873) Scala style: check import ordering
[ https://issues.apache.org/jira/browse/SPARK-3873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169572#comment-14169572 ] Apache Spark commented on SPARK-3873: - User 'vanzin' has created a pull request for this issue: https://github.com/apache/spark/pull/2757 Scala style: check import ordering -- Key: SPARK-3873 URL: https://issues.apache.org/jira/browse/SPARK-3873 Project: Spark Issue Type: Sub-task Components: Project Infra Reporter: Reynold Xin Assignee: Marcelo Vanzin -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-3928) Support wildcard matches on Parquet files
Nicholas Chammas created SPARK-3928: --- Summary: Support wildcard matches on Parquet files Key: SPARK-3928 URL: https://issues.apache.org/jira/browse/SPARK-3928 Project: Spark Issue Type: Improvement Components: Spark Core, SQL Reporter: Nicholas Chammas Priority: Minor {{SparkContext.textFile()}} supports patterns like {{part-*}} and {{2014-\?\?-\?\?}}. It would be nice if {{SparkContext.parquetFile()}} did the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
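Until {{parquetFile()}} understands globs natively, a possible workaround (a hedged Scala sketch, assuming all matched files share one schema) is to expand the pattern with Hadoop's FileSystem API and union the per-path SchemaRDDs:
{code}
import org.apache.hadoop.fs.Path
import org.apache.spark.sql.{SQLContext, SchemaRDD}

// Expand a glob such as "hdfs:///data/part-*" and union the resulting SchemaRDDs.
def parquetFiles(sqlContext: SQLContext, pattern: String): SchemaRDD = {
  val conf = sqlContext.sparkContext.hadoopConfiguration
  val globPath = new Path(pattern)
  val statuses = globPath.getFileSystem(conf).globStatus(globPath)
  require(statuses != null && statuses.nonEmpty, s"No files match $pattern")
  statuses.map(s => sqlContext.parquetFile(s.getPath.toString)).reduce(_ unionAll _)
}
{code}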
[jira] [Resolved] (SPARK-1479) building spark on 2.0.0-cdh4.4.0 failed
[ https://issues.apache.org/jira/browse/SPARK-1479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-1479. -- Resolution: Won't Fix Given discussion in SPARK-3445, I doubt anything more will be done for YARN alpha support, as it's on its way out. building spark on 2.0.0-cdh4.4.0 failed --- Key: SPARK-1479 URL: https://issues.apache.org/jira/browse/SPARK-1479 Project: Spark Issue Type: Question Environment: 2.0.0-cdh4.4.0 Scala code runner version 2.10.4 -- Copyright 2002-2013, LAMP/EPFL spark 0.9.1 java version 1.6.0_32 Reporter: jackielihf Attachments: mvn.log [INFO] [ERROR] Failed to execute goal net.alchim31.maven:scala-maven-plugin:3.1.5:compile (scala-compile-first) on project spark-yarn-alpha_2.10: Execution scala-compile-first of goal net.alchim31.maven:scala-maven-plugin:3.1.5:compile failed. CompileFailed - [Help 1] org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal net.alchim31.maven:scala-maven-plugin:3.1.5:compile (scala-compile-first) on project spark-yarn-alpha_2.10: Execution scala-compile-first of goal net.alchim31.maven:scala-maven-plugin:3.1.5:compile failed. at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:225) at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153) at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145) at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:84) at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:59) at org.apache.maven.lifecycle.internal.LifecycleStarter.singleThreadedBuild(LifecycleStarter.java:183) at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:161) at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:320) at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:156) at org.apache.maven.cli.MavenCli.execute(MavenCli.java:537) at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:196) at org.apache.maven.cli.MavenCli.main(MavenCli.java:141) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:290) at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:230) at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:409) at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:352) Caused by: org.apache.maven.plugin.PluginExecutionException: Execution scala-compile-first of goal net.alchim31.maven:scala-maven-plugin:3.1.5:compile failed. at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:110) at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:209) ... 
19 more Caused by: Compilation failed at sbt.compiler.AnalyzingCompiler.call(AnalyzingCompiler.scala:76) at sbt.compiler.AnalyzingCompiler.compile(AnalyzingCompiler.scala:35) at sbt.compiler.AnalyzingCompiler.compile(AnalyzingCompiler.scala:29) at sbt.compiler.AggressiveCompile$$anonfun$4$$anonfun$compileScala$1$1.apply$mcV$sp(AggressiveCompile.scala:71) at sbt.compiler.AggressiveCompile$$anonfun$4$$anonfun$compileScala$1$1.apply(AggressiveCompile.scala:71) at sbt.compiler.AggressiveCompile$$anonfun$4$$anonfun$compileScala$1$1.apply(AggressiveCompile.scala:71) at sbt.compiler.AggressiveCompile.sbt$compiler$AggressiveCompile$$timed(AggressiveCompile.scala:101) at sbt.compiler.AggressiveCompile$$anonfun$4.compileScala$1(AggressiveCompile.scala:70) at sbt.compiler.AggressiveCompile$$anonfun$4.apply(AggressiveCompile.scala:88) at sbt.compiler.AggressiveCompile$$anonfun$4.apply(AggressiveCompile.scala:60) at sbt.inc.IncrementalCompile$$anonfun$doCompile$1.apply(Compile.scala:24) at sbt.inc.IncrementalCompile$$anonfun$doCompile$1.apply(Compile.scala:22) at sbt.inc.Incremental$.cycle(Incremental.scala:40) at sbt.inc.Incremental$.compile(Incremental.scala:25) at sbt.inc.IncrementalCompile$.apply(Compile.scala:20) at sbt.compiler.AggressiveCompile.compile2(AggressiveCompile.scala:96) at
[jira] [Commented] (SPARK-3928) Support wildcard matches on Parquet files
[ https://issues.apache.org/jira/browse/SPARK-3928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169580#comment-14169580 ] Nicholas Chammas commented on SPARK-3928: - cc [~marmbrus] Support wildcard matches on Parquet files - Key: SPARK-3928 URL: https://issues.apache.org/jira/browse/SPARK-3928 Project: Spark Issue Type: Improvement Components: Spark Core, SQL Reporter: Nicholas Chammas Priority: Minor {{SparkContext.textFile()}} supports patterns like {{part-*}} and {{2014-\?\?-\?\?}}. It would be nice if {{SparkContext.parquetFile()}} did the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-1409) Flaky Test: actor input stream test in org.apache.spark.streaming.InputStreamsSuite
[ https://issues.apache.org/jira/browse/SPARK-1409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169589#comment-14169589 ] Sean Owen commented on SPARK-1409: -- Since this test was removed with SPARK-2805, safe to call this closed? Flaky Test: actor input stream test in org.apache.spark.streaming.InputStreamsSuite - Key: SPARK-1409 URL: https://issues.apache.org/jira/browse/SPARK-1409 Project: Spark Issue Type: Bug Components: Streaming Reporter: Michael Armbrust Assignee: Tathagata Das Here are just a few cases: https://travis-ci.org/apache/spark/jobs/22151827 https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13709/ -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-1398) Remove FindBugs jsr305 dependency
[ https://issues.apache.org/jira/browse/SPARK-1398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-1398. -- Resolution: Won't Fix From the PR discussion, this had to be reverted because of some build problems, so I assume removing this .jar is a WontFix Remove FindBugs jsr305 dependency - Key: SPARK-1398 URL: https://issues.apache.org/jira/browse/SPARK-1398 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 1.0.0 Reporter: Mark Hamstra Assignee: Mark Hamstra Priority: Minor We're not making much use of FindBugs at this point, but findbugs-2.0.x is a drop-in replacement for 1.3.9 and does offer significant improvements (http://findbugs.sourceforge.net/findbugs2.html), so it's probably where we want to be for Spark 1.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-1339) Build error: org.eclipse.paho:mqtt-client
[ https://issues.apache.org/jira/browse/SPARK-1339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-1339. -- Resolution: Not a Problem Build error: org.eclipse.paho:mqtt-client - Key: SPARK-1339 URL: https://issues.apache.org/jira/browse/SPARK-1339 Project: Spark Issue Type: Bug Components: Build Affects Versions: 0.9.0 Reporter: Ken Williams Using Maven, I'm unable to build the 0.9.0 distribution I just downloaded. The Maven error is: {code} [ERROR] Failed to execute goal on project spark-examples_2.10: Could not resolve dependencies for project org.apache.spark:spark-examples_2.10:jar:0.9.0-incubating: Could not find artifact org.eclipse.paho:mqtt-client:jar:0.4.0 in nexus {code} My Maven version is 3.2.1, running on Java 1.7.0, using Scala 2.10.4. Is there an additional Maven repository I should add or something? If I go into the {{pom.xml}} and comment out the {{external/mqtt}} and {{examples}} modules, the build succeeds. I'm fine without the MQTT stuff, but I would really like to get the examples working because I haven't played with Spark before. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-1317) sbt doesn't work for building Spark programs
[ https://issues.apache.org/jira/browse/SPARK-1317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169629#comment-14169629 ] Sean Owen commented on SPARK-1317: -- PS if you're still interested in this, I am pretty sure #1 is the correct answer. I would use my own sbt (or really, the SBT support in my IDE perhaps, or Maven) to build my own app. sbt doesn't work for building Spark programs Key: SPARK-1317 URL: https://issues.apache.org/jira/browse/SPARK-1317 Project: Spark Issue Type: Bug Components: Build, Documentation Affects Versions: 0.9.0 Reporter: Diana Carroll I don't know if this is a doc bug or a product bug, because I don't know how it is supposed to work. The Spark quick start guide page has a section that walks you through creating a standalone Spark app in Scala. I think the instructions worked in 0.8.1 but I can't get them to work in 0.9.0. The instructions have you create a directory structure in the canonical sbt format, but do not tell you where to locate this directory. However, after setting up the structure, the tutorial then instructs you to use the command {code}sbt/sbt package{code} which implies that the working directory must be SPARK_HOME. I tried it both ways: creating a mysparkapp directory right in SPARK_HOME and creating it in my home directory. Neither worked, with different results: - if I create a mysparkapp directory as instructed in SPARK_HOME, cd to SPARK_HOME and run the command sbt/sbt package as specified, it packages ALL of Spark...but does not build my own app. - if I create a mysparkapp directory elsewhere, cd to that directory, and run the command there, I get an error: {code} $SPARK_HOME/sbt/sbt package awk: cmd. line:1: fatal: cannot open file `./project/build.properties' for reading (No such file or directory) Attempting to fetch sbt /usr/lib/spark/sbt/sbt: line 33: sbt/sbt-launch-.jar: No such file or directory /usr/lib/spark/sbt/sbt: line 33: sbt/sbt-launch-.jar: No such file or directory Our attempt to download sbt locally to sbt/sbt-launch-.jar failed. Please install sbt manually from http://www.scala-sbt.org/ {code} So, either: 1: the Spark distribution of sbt can only be used to build Spark itself, not you own code...in which case the quick start guide is wrong, and should instead say that users should install sbt separately OR 2: the Spark distribution of sbt CAN be used, with property configuration, in which case that configuration should be documented (I wasn't able to figure it out, but I didn't try that hard either) OR 3: the Spark distribution of sbt is *supposed* to be able to build Spark apps, but is configured incorrectly in the product, in which case there's a product bug rather than a doc bug Although this is not a show-stopper, because the obvious workaround is to simply install sbt separately, I think at least updating the docs is pretty high priority, because most people learning Spark start with that Quick Start page, which doesn't work. (If it's doc issue #1, let me know, and I'll fix the docs myself. :-) ) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-3929) Support for fixed-precision decimal
Matei Zaharia created SPARK-3929: Summary: Support for fixed-precision decimal Key: SPARK-3929 URL: https://issues.apache.org/jira/browse/SPARK-3929 Project: Spark Issue Type: New Feature Components: SQL Reporter: Matei Zaharia Assignee: Matei Zaharia -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-3930) Add precision and scale to Spark SQL's Decimal type
Matei Zaharia created SPARK-3930: Summary: Add precision and scale to Spark SQL's Decimal type Key: SPARK-3930 URL: https://issues.apache.org/jira/browse/SPARK-3930 Project: Spark Issue Type: Sub-task Reporter: Matei Zaharia Assignee: Matei Zaharia -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-3929) Support for fixed-precision decimal
[ https://issues.apache.org/jira/browse/SPARK-3929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-3929: - Description: Spark SQL should support fixed-precision decimals, which are available in Hive 0.13 (see https://cwiki.apache.org/confluence/download/attachments/27362075/Hive_Decimal_Precision_Scale_Support.pdf) as well as in new versions of Parquet. This involves adding precision to the decimal type and implementing various rules for math on it (see above). Support for fixed-precision decimal --- Key: SPARK-3929 URL: https://issues.apache.org/jira/browse/SPARK-3929 Project: Spark Issue Type: New Feature Components: SQL Reporter: Matei Zaharia Assignee: Matei Zaharia Spark SQL should support fixed-precision decimals, which are available in Hive 0.13 (see https://cwiki.apache.org/confluence/download/attachments/27362075/Hive_Decimal_Precision_Scale_Support.pdf) as well as in new versions of Parquet. This involves adding precision to the decimal type and implementing various rules for math on it (see above). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
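To illustrate the kind of rules involved, here is a rough Scala sketch of typical fixed-precision result types for addition and multiplication; the exact rules Spark SQL should follow are those in the Hive document linked above, so treat these formulas as an approximation for illustration only.
{code}
// Illustrative only; the authoritative precision/scale rules live in the Hive 0.13 design doc.
case class DecimalType(precision: Int, scale: Int)

def addResultType(a: DecimalType, b: DecimalType): DecimalType = {
  val scale = math.max(a.scale, b.scale)
  val integerDigits = math.max(a.precision - a.scale, b.precision - b.scale)
  DecimalType(integerDigits + scale + 1, scale)   // +1 digit for a possible carry
}

def multiplyResultType(a: DecimalType, b: DecimalType): DecimalType =
  DecimalType(a.precision + b.precision + 1, a.scale + b.scale)
{code}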
[jira] [Created] (SPARK-3931) Support reading fixed-precision decimals from Parquet
Matei Zaharia created SPARK-3931: Summary: Support reading fixed-precision decimals from Parquet Key: SPARK-3931 URL: https://issues.apache.org/jira/browse/SPARK-3931 Project: Spark Issue Type: Sub-task Reporter: Matei Zaharia Assignee: Matei Zaharia -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-3932) Support reading fixed-precision decimals from Hive 0.13
Matei Zaharia created SPARK-3932: Summary: Support reading fixed-precision decimals from Hive 0.13 Key: SPARK-3932 URL: https://issues.apache.org/jira/browse/SPARK-3932 Project: Spark Issue Type: Sub-task Reporter: Matei Zaharia Assignee: Matei Zaharia -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-1243) spark compilation error
[ https://issues.apache.org/jira/browse/SPARK-1243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-1243. -- Resolution: Fixed This appears to be long since resolved by something else, perhaps a subsequent change to Jetty deps. I have never seen this personally, and Jenkins builds are fine. spark compilation error --- Key: SPARK-1243 URL: https://issues.apache.org/jira/browse/SPARK-1243 Project: Spark Issue Type: Bug Components: Build Reporter: Qiuzhuang Lian After issuing git pull from git master, spark could not compile any longer Here is the error message, it seems that it is related to jetty upgrade.@rxin compile [info] Compiling 301 Scala sources and 19 Java sources to E:\projects\amplab\spark\core\target\scala-2.10\classes... [warn] Class java.nio.channels.ReadPendingException not found - continuing with a stub. [error] [error] while compiling: E:\projects\amplab\spark\core\src\main\scala\org\apache\spark\HttpServer.scala [error] during phase: erasure [error] library version: version 2.10.3 [error] compiler version: version 2.10.3 [error] reconstructed args: -Xmax-classfile-name 120 -deprecation -bootclasspath C:\Java\jdk1.6.0_27\jre\lib\resources.jar;C:\Java\jdk1.6.0_27\jre\lib\rt.jar;C:\Java\jdk1.6.0_27\jre\lib\sunrsasign.jar;C:\Java\jdk1.6.0_27\jre\lib\jsse.jar;C:\Java\jdk1.6.0_27\jre\lib\jce.jar;C:\Java\jdk1.6.0_27\jre\lib\charsets.jar;C:\Java\jdk1.6.0_27\jre\lib\modules\jdk.boot.jar;C:\Java\jdk1.6.0_27\jre\classes;C:\Users\Kand\.sbt\boot\scala-2.10.3\lib\scala-library.jar -unchecked -classpath
[jira] [Updated] (SPARK-3266) JavaDoubleRDD doesn't contain max()
[ https://issues.apache.org/jira/browse/SPARK-3266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-3266: -- Affects Version/s: 1.2.0 JavaDoubleRDD doesn't contain max() --- Key: SPARK-3266 URL: https://issues.apache.org/jira/browse/SPARK-3266 Project: Spark Issue Type: Bug Components: Java API Affects Versions: 1.0.1, 1.0.2, 1.1.0, 1.2.0 Reporter: Amey Chaugule Assignee: Josh Rosen Attachments: spark-repro-3266.tar.gz While I can compile my code, I see: Caused by: java.lang.NoSuchMethodError: org.apache.spark.api.java.JavaDoubleRDD.max(Ljava/util/Comparator;)Ljava/lang/Double; When I try to execute my Spark code. Stepping into the JavaDoubleRDD class, I don't notice max() although it is clearly listed in the documentation. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-3933) Optimize decimal type in Spark SQL for those with small precision
Matei Zaharia created SPARK-3933: Summary: Optimize decimal type in Spark SQL for those with small precision Key: SPARK-3933 URL: https://issues.apache.org/jira/browse/SPARK-3933 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Matei Zaharia Assignee: Matei Zaharia With fixed-precision decimals, many decimal values will fit in a Long, so we can use a Decimal class with a mutable Long field to represent the unscaled value, rather than allocating a BigDecimal. We can then do some operations directly on these Long fields. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
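As an illustration of the idea (a sketch only, not Spark's actual Decimal implementation), a small-precision value can be held as an unscaled Long plus a scale, so the common arithmetic path avoids allocating a BigDecimal:

{code}
// Illustrative sketch: keep the unscaled value in a mutable Long and convert to
// BigDecimal lazily. A real implementation would also handle overflow and precision checks.
final class MutableDecimal(private var unscaled: Long, val scale: Int) {
  def addInPlace(other: MutableDecimal): this.type = {
    require(other.scale == scale, "scales must match for in-place addition")
    unscaled += other.unscaled   // pure Long arithmetic, no object allocation
    this
  }
  def toBigDecimal: BigDecimal = BigDecimal(unscaled, scale)
  override def toString: String = toBigDecimal.toString
}

// 12.34 + 0.66, both stored as unscaled Longs with scale 2.
val sum = new MutableDecimal(1234L, 2).addInPlace(new MutableDecimal(66L, 2))
println(sum)  // 13.00
{code}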
[jira] [Updated] (SPARK-3266) JavaDoubleRDD doesn't contain max()
[ https://issues.apache.org/jira/browse/SPARK-3266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-3266: -- Target Version/s: 1.1.1, 1.2.0 JavaDoubleRDD doesn't contain max() --- Key: SPARK-3266 URL: https://issues.apache.org/jira/browse/SPARK-3266 Project: Spark Issue Type: Bug Components: Java API Affects Versions: 1.0.1, 1.0.2, 1.1.0, 1.2.0 Reporter: Amey Chaugule Assignee: Josh Rosen Attachments: spark-repro-3266.tar.gz While I can compile my code, I see: Caused by: java.lang.NoSuchMethodError: org.apache.spark.api.java.JavaDoubleRDD.max(Ljava/util/Comparator;)Ljava/lang/Double; When I try to execute my Spark code. Stepping into the JavaDoubleRDD class, I don't notice max() although it is clearly listed in the documentation. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-1306) no instructions provided for sbt assembly with Hadoop 2.2
[ https://issues.apache.org/jira/browse/SPARK-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-1306. -- Resolution: Fixed I think this was obviated by subsequent changes to this documentation. SBT is no longer the focus, but building-spark.md now has more comprehensive documentation on building with YARN, including these recent versions. no instructions provided for sbt assembly with Hadoop 2.2 - Key: SPARK-1306 URL: https://issues.apache.org/jira/browse/SPARK-1306 Project: Spark Issue Type: Bug Components: Documentation Affects Versions: 0.9.0 Reporter: Diana Carroll On the running-on-yarn.html page, in the section "Building a YARN-Enabled Assembly JAR", only the instructions for building for old Hadoop (2.0.5) are provided. There's a comment that "The build process now also supports new YARN versions (2.2.x). See below." However, the only mention below is a single sentence that says "See Building Spark with Maven for instructions on how to build Spark using the Maven process." There are no instructions for building with sbt. This is different from prior versions of the docs, in which a whole paragraph was provided. I'd like to see the command line to build for Hadoop 2.2 included right at the top of the page. Also remove the bit about how it is now supported; Hadoop 2.2 is now the norm, no longer an exception, as I see it. Unfortunately I'm not sure exactly what the command should be. I tried this, but got errors: SPARK_HADOOP_VERSION=2.2.0 SPARK_YARN=true sbt/sbt assembly -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-1234) clean up typos and grammar issues in Spark on YARN page
[ https://issues.apache.org/jira/browse/SPARK-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-1234. -- Resolution: Won't Fix Given the discussion in https://github.com/apache/spark/pull/130 , this was abandoned, but I also don't see the bad text on that page anymore. It probably got improved in another subsequent update. clean up typos and grammar issues in Spark on YARN page --- Key: SPARK-1234 URL: https://issues.apache.org/jira/browse/SPARK-1234 Project: Spark Issue Type: Bug Components: Documentation Affects Versions: 0.9.0 Reporter: Diana Carroll Priority: Minor The "Launch spark application with yarn-client mode" section of this page has several incomplete sentences, typos, etc. http://spark.incubator.apache.org/docs/latest/running-on-yarn.html -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-1192) Around 30 parameters in Spark are used but undocumented and some are having confusing name
[ https://issues.apache.org/jira/browse/SPARK-1192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169649#comment-14169649 ] Sean Owen commented on SPARK-1192: -- The PR is actually at https://github.com/apache/spark/pull/2312 and is misnamed. Is this still live, though? Around 30 parameters in Spark are used but undocumented and some are having confusing name -- Key: SPARK-1192 URL: https://issues.apache.org/jira/browse/SPARK-1192 Project: Spark Issue Type: Improvement Components: Documentation Affects Versions: 1.0.0 Reporter: Nan Zhu Assignee: Nan Zhu I grepped the code in the core component and found that around 30 parameters in the implementation are actually used but undocumented. By reading the source code, I found that some of them are actually very useful for the user. I suggest making a complete document on the parameters. Also, some parameters have confusing names: spark.shuffle.copier.threads - this parameter controls how many threads you will use when you start a Netty-based shuffle service, but from the name we cannot get this information. spark.shuffle.sender.port - a similar problem to the above: when you use a Netty-based shuffle receiver, you will have to set up a Netty-based sender...this parameter sets the port used by the Netty sender, but the name cannot convey this information. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
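For illustration, this is how such parameters end up being set today; the two keys below are simply the ones named in this issue, quoted as examples (they may not exist in later Spark versions):

{code}
// Sketch: setting the (currently undocumented) shuffle parameters through SparkConf.
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("shuffle-config-example")
  .set("spark.shuffle.copier.threads", "6")  // threads used by the Netty-based shuffle copier
  .set("spark.shuffle.sender.port", "0")     // port for the Netty-based shuffle sender (0 typically means pick a free port)
{code}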
[jira] [Commented] (SPARK-1192) Around 30 parameters in Spark are used but undocumented and some are having confusing name
[ https://issues.apache.org/jira/browse/SPARK-1192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169658#comment-14169658 ] Apache Spark commented on SPARK-1192: - User 'CodingCat' has created a pull request for this issue: https://github.com/apache/spark/pull/2312 Around 30 parameters in Spark are used but undocumented and some are having confusing name -- Key: SPARK-1192 URL: https://issues.apache.org/jira/browse/SPARK-1192 Project: Spark Issue Type: Improvement Components: Documentation Affects Versions: 1.0.0 Reporter: Nan Zhu Assignee: Nan Zhu I grep the code in core component, I found that around 30 parameters in the implementation is actually used but undocumented. By reading the source code, I found that some of them are actually very useful for the user. I suggest to make a complete document on the parameters. Also some parameters are having confusing names spark.shuffle.copier.threads - this parameters is to control how many threads you will use when you start a Netty-based shuffle servicebut from the name, we cannot get this information spark.shuffle.sender.port - the similar problem with the above one, when you use Netty-based shuffle receiver, you will have to setup a Netty-based sender...this parameter is to setup the port used by the Netty sender, but the name cannot convey this information -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3251) Clarify learning interfaces
[ https://issues.apache.org/jira/browse/SPARK-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169659#comment-14169659 ] Joseph K. Bradley commented on SPARK-3251: -- I agree it's hard to say. Based on the description, I'd say it is a subset pertaining to classification models. Perhaps it should be renamed as such? Clarify learning interfaces Key: SPARK-3251 URL: https://issues.apache.org/jira/browse/SPARK-3251 Project: Spark Issue Type: Improvement Components: MLlib Affects Versions: 1.1.0, 1.1.1 Reporter: Christoph Sawade *Make threshold mandatory* Currently, the output of predict for an example is either the score or the class. This side effect is caused by clearThreshold. To clarify that behaviour, three different types of predict (predictScore, predictClass, predictProbability) were introduced; the threshold is no longer optional. *Clarify classification interfaces* Currently, some functionality is spread across multiple models. In order to clarify the structure and simplify the implementation of more complex models (like multinomial logistic regression), two new classes are introduced: - BinaryClassificationModel: for all models that derive a binary classification from a single weight vector. It comprises the thresholding functionality to derive a prediction from a score. It basically captures SVMModel and LogisticRegressionModel. - ProbabilisticClassificationModel: this trait defines the interface for models that return a calibrated confidence score (aka probability). *Misc* - some renaming - add test for probabilistic output -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
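A non-authoritative sketch of the proposed structure (trait and method names follow the issue text, not any released MLlib API):

{code}
import org.apache.spark.mllib.linalg.Vector

// Models that can return a calibrated confidence score (a probability).
trait ProbabilisticClassificationModel {
  def predictProbability(features: Vector): Double
}

// Models that derive a binary classification from a single weight vector,
// with a mandatory threshold instead of the clearThreshold side effect.
trait BinaryClassificationModel {
  def weights: Vector
  def intercept: Double
  def threshold: Double
  def predictScore(features: Vector): Double
  def predictClass(features: Vector): Double =
    if (predictScore(features) >= threshold) 1.0 else 0.0
}
{code}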
[jira] [Resolved] (SPARK-1149) Bad partitioners can cause Spark to hang
[ https://issues.apache.org/jira/browse/SPARK-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-1149. -- Resolution: Fixed Looks like Patrick merged this into master in March. It might have been fixed for ... 1.0? Bad partitioners can cause Spark to hang Key: SPARK-1149 URL: https://issues.apache.org/jira/browse/SPARK-1149 Project: Spark Issue Type: Bug Components: Spark Core Reporter: Bryn Keller Priority: Minor While implementing a unit test for lookup, I accidentally created a situation where a partitioner returned a partition number that was outside its range. It should have returned 0 or 1, but in the last case it returned -1. Rather than reporting the problem via an exception, Spark simply hangs during the unit test run. We should catch this bad behavior by partitioners and throw an exception. test("lookup with bad partitioner") { val pairs = sc.parallelize(Array((1,2), (3,4), (5,6), (5,7))) val p = new Partitioner { def numPartitions: Int = 2 def getPartition(key: Any): Int = key.hashCode() % 2 } val shuffled = pairs.partitionBy(p) assert(shuffled.partitioner === Some(p)) assert(shuffled.lookup(1) === Seq(2)) assert(shuffled.lookup(5) === Seq(6,7)) assert(shuffled.lookup(-1) === Seq()) } -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
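One possible shape of the check the reporter asks for (illustrative only, not the fix that was actually merged):

{code}
import org.apache.spark.Partitioner

// Validate the index a Partitioner returns before using it, so a bad partitioner
// fails fast with an exception instead of hanging the job.
def checkedPartition(p: Partitioner, key: Any): Int = {
  val part = p.getPartition(key)
  if (part < 0 || part >= p.numPartitions) {
    throw new IllegalArgumentException(
      s"Partitioner $p returned partition $part for key $key, expected [0, ${p.numPartitions})")
  }
  part
}
{code}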
[jira] [Commented] (SPARK-1192) Around 30 parameters in Spark are used but undocumented and some are having confusing name
[ https://issues.apache.org/jira/browse/SPARK-1192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169663#comment-14169663 ] Nan Zhu commented on SPARK-1192: yes, I resubmitted https://github.com/apache/spark/pull/2312 for Matei's request (removed some, add some) it's still valid Around 30 parameters in Spark are used but undocumented and some are having confusing name -- Key: SPARK-1192 URL: https://issues.apache.org/jira/browse/SPARK-1192 Project: Spark Issue Type: Improvement Components: Documentation Affects Versions: 1.0.0 Reporter: Nan Zhu Assignee: Nan Zhu I grep the code in core component, I found that around 30 parameters in the implementation is actually used but undocumented. By reading the source code, I found that some of them are actually very useful for the user. I suggest to make a complete document on the parameters. Also some parameters are having confusing names spark.shuffle.copier.threads - this parameters is to control how many threads you will use when you start a Netty-based shuffle servicebut from the name, we cannot get this information spark.shuffle.sender.port - the similar problem with the above one, when you use Netty-based shuffle receiver, you will have to setup a Netty-based sender...this parameter is to setup the port used by the Netty sender, but the name cannot convey this information -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-1083) Build fail
[ https://issues.apache.org/jira/browse/SPARK-1083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-1083. -- Resolution: Cannot Reproduce This looks like a git error, and is ancient at this point. I presume that since we have evidence that Windows builds subsequently worked, this was either a local problem or fixed by something else. Build fail -- Key: SPARK-1083 URL: https://issues.apache.org/jira/browse/SPARK-1083 Project: Spark Issue Type: Bug Components: Build, Windows Affects Versions: 0.7.3 Reporter: Jan Paw Problem with building the latest version from github. {code:none}[info] Loading project definition from C:\Users\Jan\Documents\GitHub\incubator-spark\project\project [debug] [debug] Initial source changes: [debug] removed: Set() [debug] added: Set() [debug] modified: Set() [debug] Removed products: Set() [debug] Modified external sources: Set() [debug] Modified binary dependencies: Set() [debug] Initial directly invalidated sources: Set() [debug] [debug] Sources indirectly invalidated by: [debug] product: Set() [debug] binary dep: Set() [debug] external source: Set() [debug] All initially invalidated sources: Set() [debug] Copy resource mappings: [debug] java.lang.RuntimeException: Nonzero exit code (128): git clone https://github.com/chenkelmann/junit_xml_listener.git C:\Users\Jan\.sbt\0.13\staging\5f76b43a3aca87b5c013\junit_xml_listener at scala.sys.package$.error(package.scala:27) at sbt.Resolvers$.run(Resolvers.scala:134) at sbt.Resolvers$.run(Resolvers.scala:123) at sbt.Resolvers$$anon$2.clone(Resolvers.scala:78) at sbt.Resolvers$DistributedVCS$$anonfun$toResolver$1$$anonfun$apply$11$$anonfun$apply$5.apply$mcV$sp(Resolvers.scala:104) at sbt.Resolvers$.creates(Resolvers.scala:141) at sbt.Resolvers$DistributedVCS$$anonfun$toResolver$1$$anonfun$apply$11.apply(Resolvers.scala:103) at sbt.Resolvers$DistributedVCS$$anonfun$toResolver$1$$anonfun$apply$11.apply(Resolvers.scala:103) at sbt.BuildLoader$$anonfun$componentLoader$1$$anonfun$apply$3.apply(BuildLoader.scala:90) at sbt.BuildLoader$$anonfun$componentLoader$1$$anonfun$apply$3.apply(BuildLoader.scala:89) at scala.Option.map(Option.scala:145) at sbt.BuildLoader$$anonfun$componentLoader$1.apply(BuildLoader.scala:89) at sbt.BuildLoader$$anonfun$componentLoader$1.apply(BuildLoader.scala:85) at sbt.MultiHandler.apply(BuildLoader.scala:16) at sbt.BuildLoader.apply(BuildLoader.scala:142) at sbt.Load$.loadAll(Load.scala:314) at sbt.Load$.loadURI(Load.scala:266) at sbt.Load$.load(Load.scala:262) at sbt.Load$.load(Load.scala:253) at sbt.Load$.apply(Load.scala:137) at sbt.Load$.buildPluginDefinition(Load.scala:597) at sbt.Load$.buildPlugins(Load.scala:563) at sbt.Load$.plugins(Load.scala:551) at sbt.Load$.loadUnit(Load.scala:412) at sbt.Load$$anonfun$15$$anonfun$apply$11.apply(Load.scala:258) at sbt.Load$$anonfun$15$$anonfun$apply$11.apply(Load.scala:258) at sbt.BuildLoader$$anonfun$componentLoader$1$$anonfun$apply$4$$anonfun$apply$5$$anonfun$apply$6.apply(BuildLoader.scala:93) at sbt.BuildLoader$$anonfun$componentLoader$1$$anonfun$apply$4$$anonfun$apply$5$$anonfun$apply$6.apply(BuildLoader.scala:92) at sbt.BuildLoader.apply(BuildLoader.scala:143) at sbt.Load$.loadAll(Load.scala:314) at sbt.Load$.loadURI(Load.scala:266) at sbt.Load$.load(Load.scala:262) at sbt.Load$.load(Load.scala:253) at sbt.Load$.apply(Load.scala:137) at sbt.Load$.defaultLoad(Load.scala:40) at sbt.BuiltinCommands$.doLoadProject(Main.scala:451) at sbt.BuiltinCommands$$anonfun$loadProjectImpl$2.apply(Main.scala:445) at sbt.BuiltinCommands$$anonfun$loadProjectImpl$2.apply(Main.scala:445) at sbt.Command$$anonfun$applyEffect$1$$anonfun$apply$2.apply(Command.scala:60) at sbt.Command$$anonfun$applyEffect$1$$anonfun$apply$2.apply(Command.scala:60) at sbt.Command$$anonfun$applyEffect$2$$anonfun$apply$3.apply(Command.scala:62) at sbt.Command$$anonfun$applyEffect$2$$anonfun$apply$3.apply(Command.scala:62) at sbt.Command$.process(Command.scala:95) at sbt.MainLoop$$anonfun$1$$anonfun$apply$1.apply(MainLoop.scala:100) at sbt.MainLoop$$anonfun$1$$anonfun$apply$1.apply(MainLoop.scala:100) at sbt.State$$anon$1.process(State.scala:179) at sbt.MainLoop$$anonfun$1.apply(MainLoop.scala:100) at sbt.MainLoop$$anonfun$1.apply(MainLoop.scala:100) at
[jira] [Resolved] (SPARK-1017) Set the permgen even if we are calling the users sbt (via SBT_OPTS)
[ https://issues.apache.org/jira/browse/SPARK-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-1017. -- Resolution: Won't Fix As I understand, only {{sbt/sbt}} is supported for building Spark with SBT, rather than a local {{sbt}}. Maven is the primary build, and it sets {{MaxPermSize}} and {{PermGen}} for scalac and scalatest. I think this is obsolete and/or already covered then? Set the permgen even if we are calling the users sbt (via SBT_OPTS) --- Key: SPARK-1017 URL: https://issues.apache.org/jira/browse/SPARK-1017 Project: Spark Issue Type: Improvement Reporter: Patrick Cogan Assignee: Patrick Cogan Now we will call the user's sbt installation if they have one. But users might run into PermGen issues... so we should force the PermGen setting unless the user explicitly overrides it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3923) All Standalone Mode services time out with each other
[ https://issues.apache.org/jira/browse/SPARK-3923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169692#comment-14169692 ] Apache Spark commented on SPARK-3923: - User 'aarondav' has created a pull request for this issue: https://github.com/apache/spark/pull/2784 All Standalone Mode services time out with each other - Key: SPARK-3923 URL: https://issues.apache.org/jira/browse/SPARK-3923 Project: Spark Issue Type: Bug Components: Deploy Affects Versions: 1.2.0 Reporter: Aaron Davidson Priority: Blocker I'm seeing an issue where it seems that components in Standalone Mode (Worker, Master, Driver, and Executor) all seem to time out with each other after around 1000 seconds. Here is an example log: {code} 14/10/13 06:43:55 INFO Master: Registering worker ip-10-0-147-189.us-west-2.compute.internal:38922 with 4 cores, 29.0 GB RAM 14/10/13 06:43:55 INFO Master: Registering worker ip-10-0-175-214.us-west-2.compute.internal:42918 with 4 cores, 59.0 GB RAM 14/10/13 06:43:56 INFO Master: Registering app Databricks Shell 14/10/13 06:43:56 INFO Master: Registered app Databricks Shell with ID app-20141013064356- ... precisely 1000 seconds later ... 14/10/13 07:00:35 WARN ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkwor...@ip-10-0-147-189.us-west-2.compute.internal:38922] has failed, address is now gated for [5000] ms. Reason is: [Disassociated]. 14/10/13 07:00:35 INFO Master: akka.tcp://sparkwor...@ip-10-0-147-189.us-west-2.compute.internal:38922 got disassociated, removing it. 14/10/13 07:00:35 INFO LocalActorRef: Message [akka.remote.transport.ActorTransportAdapter$DisassociateUnderlying] from Actor[akka://sparkMaster/deadLetters] to Actor[akka://sparkMaster/system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%2FsparkMaster%4010.0.147.189%3A54956-1#1529980245] was not delivered. [2] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'. 14/10/13 07:00:35 INFO Master: akka.tcp://sparkwor...@ip-10-0-175-214.us-west-2.compute.internal:42918 got disassociated, removing it. 14/10/13 07:00:35 INFO Master: Removing worker worker-20141013064354-ip-10-0-175-214.us-west-2.compute.internal-42918 on ip-10-0-175-214.us-west-2.compute.internal:42918 14/10/13 07:00:35 INFO Master: Telling app of lost executor: 1 14/10/13 07:00:35 INFO Master: akka.tcp://sparkwor...@ip-10-0-175-214.us-west-2.compute.internal:42918 got disassociated, removing it. 14/10/13 07:00:35 WARN ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkwor...@ip-10-0-175-214.us-west-2.compute.internal:42918] has failed, address is now gated for [5000] ms. Reason is: [Disassociated]. 14/10/13 07:00:35 INFO LocalActorRef: Message [akka.remote.transport.ActorTransportAdapter$DisassociateUnderlying] from Actor[akka://sparkMaster/deadLetters] to Actor[akka://sparkMaster/system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%2FsparkMaster%4010.0.175.214%3A35958-2#314633324] was not delivered. [3] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'. 
14/10/13 07:00:35 INFO LocalActorRef: Message [akka.remote.transport.AssociationHandle$Disassociated] from Actor[akka://sparkMaster/deadLetters] to Actor[akka://sparkMaster/system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%2FsparkMaster%4010.0.175.214%3A35958-2#314633324] was not delivered. [4] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'. 14/10/13 07:00:36 INFO ProtocolStateActor: No response from remote. Handshake timed out or transport failure detector triggered. 14/10/13 07:00:36 INFO Master: akka.tcp://sparkdri...@ip-10-0-175-215.us-west-2.compute.internal:58259 got disassociated, removing it. 14/10/13 07:00:36 INFO LocalActorRef: Message [akka.remote.transport.AssociationHandle$InboundPayload] from Actor[akka://sparkMaster/deadLetters] to Actor[akka://sparkMaster/system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%2FsparkMaster%4010.0.175.215%3A41987-3#1944377249] was not delivered. [5] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'. 14/10/13 07:00:36 INFO Master: Removing app app-20141013064356- 14/10/13 07:00:36 WARN ReliableDeliverySupervisor: Association with remote system
[jira] [Resolved] (SPARK-1409) Flaky Test: actor input stream test in org.apache.spark.streaming.InputStreamsSuite
[ https://issues.apache.org/jira/browse/SPARK-1409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-1409. - Resolution: Won't Fix Flaky Test: actor input stream test in org.apache.spark.streaming.InputStreamsSuite - Key: SPARK-1409 URL: https://issues.apache.org/jira/browse/SPARK-1409 Project: Spark Issue Type: Bug Components: Streaming Reporter: Michael Armbrust Assignee: Tathagata Das Here are just a few cases: https://travis-ci.org/apache/spark/jobs/22151827 https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13709/ -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-1463) cleanup unnecessary dependency jars in the spark assembly jars
[ https://issues.apache.org/jira/browse/SPARK-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169709#comment-14169709 ] Sean Owen commented on SPARK-1463: -- FWIW I do not see these packages in the final assembly JAR anymore. This may be obsolete? cleanup unnecessary dependency jars in the spark assembly jars -- Key: SPARK-1463 URL: https://issues.apache.org/jira/browse/SPARK-1463 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 0.9.0 Reporter: Jenny MA Priority: Minor Labels: easyfix Fix For: 1.0.0 There are a couple of GPL/LGPL-based dependencies included in the final assembly jar which are not used by the Spark runtime. I identified the following libraries; we can provide a fix in assembly/pom.xml: <exclude>com.google.code.findbugs:*</exclude> <exclude>org.acplt:oncrpc:*</exclude> <exclude>glassfish:*</exclude> <exclude>com.cenqua.clover:clover:*</exclude> <exclude>org.glassfish:*</exclude> <exclude>org.glassfish.grizzly:*</exclude> <exclude>org.glassfish.gmbal:*</exclude> <exclude>org.glassfish.external:*</exclude> -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-1010) Update all unit tests to use SparkConf instead of system properties
[ https://issues.apache.org/jira/browse/SPARK-1010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169724#comment-14169724 ] Sean Owen commented on SPARK-1010: -- Yes, lots of usage in tests still. A lot looks intentional. {code} find . -name "*Suite.scala" -type f -exec grep -E "System\.[gs]etProperty" {} \; ... .format(System.getProperty("user.name", "unknown"), .format(System.getProperty("user.name", "unknown")).stripMargin System.setProperty("spark.testing", "true") System.setProperty("spark.reducer.maxMbInFlight", "1") System.setProperty("spark.storage.memoryFraction", "0.0001") System.setProperty("spark.storage.memoryFraction", "0.01") System.setProperty("spark.authenticate", "false") System.setProperty("spark.authenticate", "false") System.setProperty("spark.shuffle.manager", "hash") System.setProperty("spark.scheduler.mode", "FIFO") System.setProperty("spark.scheduler.mode", "FAIR") ... {code} Update all unit tests to use SparkConf instead of system properties --- Key: SPARK-1010 URL: https://issues.apache.org/jira/browse/SPARK-1010 Project: Spark Issue Type: New Feature Affects Versions: 0.9.0 Reporter: Patrick Cogan Assignee: Nirmal Priority: Minor Labels: starter -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
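For reference, the migration this issue asks for looks roughly like the following (a sketch, not a quote from any particular suite):

{code}
import org.apache.spark.{SparkConf, SparkContext}

// Before (global, leaks state across suites):
//   System.setProperty("spark.scheduler.mode", "FAIR")
//   val sc = new SparkContext("local[2]", "test")

// After (scoped to one SparkContext):
val conf = new SparkConf()
  .setMaster("local[2]")
  .setAppName("suite-under-test")
  .set("spark.scheduler.mode", "FAIR")
val sc = new SparkContext(conf)
{code}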
[jira] [Created] (SPARK-3934) RandomForest bug in sanity check in DTStatsAggregator
Joseph K. Bradley created SPARK-3934: Summary: RandomForest bug in sanity check in DTStatsAggregator Key: SPARK-3934 URL: https://issues.apache.org/jira/browse/SPARK-3934 Project: Spark Issue Type: Bug Components: MLlib Reporter: Joseph K. Bradley When run with a mix of unordered categorical and continuous features, on multiclass classification, RandomForest fails. The bug is in the sanity checks in getFeatureOffset and getLeftRightFeatureOffsets, which use the wrong indices for checking whether features are unordered. Proposal: Remove the sanity checks since they are not really needed, and since they would require DTStatsAggregator to keep track of an extra set of indices (for the feature subset). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3934) RandomForest bug in sanity check in DTStatsAggregator
[ https://issues.apache.org/jira/browse/SPARK-3934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169752#comment-14169752 ] Apache Spark commented on SPARK-3934: - User 'jkbradley' has created a pull request for this issue: https://github.com/apache/spark/pull/2785 RandomForest bug in sanity check in DTStatsAggregator - Key: SPARK-3934 URL: https://issues.apache.org/jira/browse/SPARK-3934 Project: Spark Issue Type: Bug Components: MLlib Reporter: Joseph K. Bradley Assignee: Joseph K. Bradley When run with a mix of unordered categorical and continuous features, on multiclass classification, RandomForest fails. The bug is in the sanity checks in getFeatureOffset and getLeftRightFeatureOffsets, which use the wrong indices for checking whether features are unordered. Proposal: Remove the sanity checks since they are not really needed, and since they would require DTStatsAggregator to keep track of an extra set of indices (for the feature subset). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3918) Forget Unpersist in RandomForest.scala(train Method)
[ https://issues.apache.org/jira/browse/SPARK-3918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169753#comment-14169753 ] Apache Spark commented on SPARK-3918: - User 'jkbradley' has created a pull request for this issue: https://github.com/apache/spark/pull/2785 Forget Unpersist in RandomForest.scala(train Method) Key: SPARK-3918 URL: https://issues.apache.org/jira/browse/SPARK-3918 Project: Spark Issue Type: Bug Components: MLlib Affects Versions: 1.2.0 Environment: All Reporter: junlong Assignee: Joseph K. Bradley Labels: decisiontree, train, unpersist Fix For: 1.1.0 Original Estimate: 10m Remaining Estimate: 10m In version 1.1.0, in the train method of DecisionTree.scala, treeInput is persisted in memory but never unpersisted, which caused heavy disk usage. In the GitHub version (1.2.0, maybe), in the train method of RandomForest.scala, baggedInput is persisted but never unpersisted either. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
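The missing pairing looks roughly like this (a sketch with a made-up helper name; the real fix lives in RandomForest.scala / DecisionTree.scala):

{code}
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

// Sketch: ensure whatever is persisted for training is unpersisted when training ends.
def withCachedInput[T, R](baggedInput: RDD[T])(train: RDD[T] => R): R = {
  baggedInput.persist(StorageLevel.MEMORY_AND_DISK)
  try {
    train(baggedInput)       // iterate over the cached data while growing trees
  } finally {
    baggedInput.unpersist()  // release cached blocks so they do not accumulate
  }
}
{code}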
[jira] [Commented] (SPARK-3913) Spark Yarn Client API change to expose Yarn Resource Capacity, Yarn Application Listener and killApplication() API
[ https://issues.apache.org/jira/browse/SPARK-3913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169760#comment-14169760 ] Apache Spark commented on SPARK-3913: - User 'chesterxgchen' has created a pull request for this issue: https://github.com/apache/spark/pull/2786 Spark Yarn Client API change to expose Yarn Resource Capacity, Yarn Application Listener and killApplication() API -- Key: SPARK-3913 URL: https://issues.apache.org/jira/browse/SPARK-3913 Project: Spark Issue Type: Improvement Components: YARN Reporter: Chester When working with Spark in YARN deployment mode, we have three issues: 1) We don't know the YARN maximum capacity (memory and cores) before we specify the number of executors and the memory for the Spark driver and executors. If we set a big number, the job can potentially exceed the limit and get killed. It would be better to let the application know the YARN resource capacity ahead of time so the Spark config can be adjusted dynamically. 2) Once the job has started, we would like some feedback from the YARN application. Currently, the Spark client basically blocks the call and returns when the job is finished, failed, or killed. If the job runs for a few hours, we have no idea how far it has gone: the progress, resource usage, tracking URL, etc. 3) Once the job is started, you basically can't stop it. The YARN client API's stop doesn't work in most cases, from our experience. But the YARN API that does work is killApplication(appId). So we need to expose this killApplication() API to the Spark YARN client as well. I will create one pull request and try to address these problems. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
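Purely as an illustration of the requested surface area (every name below is hypothetical; none of these traits exist in Spark's YARN client):

{code}
// Hypothetical sketch of the three asks: capacity lookup, progress callbacks, and kill.
trait YarnClusterInfo {
  def maxContainerMemoryMb: Int   // read before sizing driver/executor memory
  def maxContainerCores: Int
}

trait YarnAppListener {
  def onProgress(appId: String, progress: Double, trackingUrl: String): Unit
  def onFinished(appId: String, finalStatus: String): Unit
}

trait KillableYarnClient {
  def killApplication(appId: String): Unit  // delegate to YARN's killApplication(appId)
}
{code}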
[jira] [Updated] (SPARK-3654) Implement all extended HiveQL statements/commands with a separate parser combinator
[ https://issues.apache.org/jira/browse/SPARK-3654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-3654: Assignee: Ravindra Pesala Implement all extended HiveQL statements/commands with a separate parser combinator --- Key: SPARK-3654 URL: https://issues.apache.org/jira/browse/SPARK-3654 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.1.0 Reporter: Cheng Lian Assignee: Ravindra Pesala Fix For: 1.2.0 Statements and commands like {{SET}}, {{CACHE TABLE}} and {{ADD JAR}} etc. are currently parsed in a quite hacky way, like this: {code} if (sql.trim.toLowerCase.startsWith("cache table")) { sql.trim.toLowerCase.startsWith("cache table") match { ... } } {code} It would be much better to add an extra parser combinator that parses these syntax extensions first, and then falls back to the normal Hive parser. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
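A hedged sketch of the "extension parser first, fall back to Hive" idea using Scala's standard parser-combinator library; this is illustrative only and is not Spark SQL's actual parser:

{code}
import scala.util.parsing.combinator.RegexParsers

sealed trait ExtendedCommand
case class CacheTable(table: String) extends ExtendedCommand
case class AddJar(path: String) extends ExtendedCommand
case class PassThrough(sql: String) extends ExtendedCommand   // hand off to the normal Hive parser

// Illustrative parser for a couple of the extended statements; unrecognised input falls through.
object ExtendedSqlParser extends RegexParsers {
  private val ident = "[a-zA-Z_][a-zA-Z0-9_.]*".r
  private def cacheTable = "(?i)CACHE".r ~> "(?i)TABLE".r ~> ident ^^ { t => CacheTable(t) }
  private def addJar = "(?i)ADD".r ~> "(?i)JAR".r ~> ".+".r ^^ { p => AddJar(p.trim) }

  def parse(sql: String): ExtendedCommand =
    parseAll(cacheTable | addJar, sql.trim).getOrElse(PassThrough(sql))
}
{code}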
[jira] [Updated] (SPARK-3813) Support case when conditional functions in Spark SQL
[ https://issues.apache.org/jira/browse/SPARK-3813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-3813: Assignee: Ravindra Pesala Support case when conditional functions in Spark SQL -- Key: SPARK-3813 URL: https://issues.apache.org/jira/browse/SPARK-3813 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.1.0 Reporter: Ravindra Pesala(Old.Don't assign to it) Assignee: Ravindra Pesala Fix For: 1.2.0 The SQL queries which has following conditional functions are not supported in Spark SQL. {code} CASE a WHEN b THEN c [WHEN d THEN e]* [ELSE f] END CASE WHEN a THEN b [WHEN c THEN d]* [ELSE e] END {code} The same functions can work in Spark HiveQL. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
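For a concrete instance of the unsupported form (the table and column names here are made up for illustration):

{code}
// Illustrative only: a query of the shape described above, issued through SQLContext;
// per this issue it fails in Spark SQL but works through Spark HiveQL.
val sqlContext = new org.apache.spark.sql.SQLContext(sc)  // assumes an existing SparkContext sc
sqlContext.sql("SELECT name, CASE WHEN age >= 18 THEN 'adult' ELSE 'minor' END FROM people")
{code}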
[jira] [Updated] (SPARK-2594) Add CACHE TABLE name AS SELECT ...
[ https://issues.apache.org/jira/browse/SPARK-2594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-2594: Assignee: Ravindra Pesala Add CACHE TABLE name AS SELECT ... Key: SPARK-2594 URL: https://issues.apache.org/jira/browse/SPARK-2594 Project: Spark Issue Type: New Feature Components: SQL Reporter: Michael Armbrust Assignee: Ravindra Pesala Priority: Critical Fix For: 1.2.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-3371) Spark SQL: Renaming a function expression with group by gives error
[ https://issues.apache.org/jira/browse/SPARK-3371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-3371: Assignee: Ravindra Pesala Spark SQL: Renaming a function expression with group by gives error --- Key: SPARK-3371 URL: https://issues.apache.org/jira/browse/SPARK-3371 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.1.0 Reporter: Pei-Lun Lee Assignee: Ravindra Pesala Fix For: 1.2.0 {code} val sqlContext = new org.apache.spark.sql.SQLContext(sc) val rdd = sc.parallelize(List("""{"foo":"bar"}""")) sqlContext.jsonRDD(rdd).registerAsTable("t1") sqlContext.registerFunction("len", (s: String) => s.length) sqlContext.sql("select len(foo) as a, count(1) from t1 group by len(foo)").collect() {code} Running the above code in spark-shell gives the following error: {noformat} 14/09/03 17:20:13 ERROR Executor: Exception in task 2.0 in stage 3.0 (TID 214) org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding attribute, tree: foo#0 at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:47) at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:43) at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:42) at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:165) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4$$anonfun$apply$2.apply(TreeNode.scala:201) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) at scala.collection.immutable.List.foreach(List.scala:318) at scala.collection.TraversableLike$class.map(TraversableLike.scala:244) at scala.collection.AbstractTraversable.map(Traversable.scala:105) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:199) at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) at scala.collection.Iterator$class.foreach(Iterator.scala:727) at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48) at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103) at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47) at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273) at scala.collection.AbstractIterator.to(Iterator.scala:1157) at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265) at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157) at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252) at scala.collection.AbstractIterator.toArray(Iterator.scala:1157) at org.apache.spark.sql.catalyst.trees.TreeNode.transformChildrenDown(TreeNode.scala:212) at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:168) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:183) at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) {noformat} Removing "as a" from the query causes no error. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org