Re: Shark Vs Spark SQL
As of Spark Summit 2014, they mentioned that there will be no further active development on Shark. Thanks, Shrikar On Wed, Jul 2, 2014 at 3:53 PM, Subacini B subac...@gmail.com wrote: Hi, http://mail-archives.apache.org/mod_mbox/spark-user/201403.mbox/%3cb75376b8-7a57-4161-b604-f919886cf...@gmail.com%3E This says that the Shark backend will be replaced with the Spark SQL engine in the future. Does that mean Spark will continue to support Shark + Spark SQL for the long term? Or will Shark be decommissioned after some period? Thanks Subacini
Re: How do you run your spark app?
Hi Shivani, I use sbt assembly to create a fat jar. https://github.com/sbt/sbt-assembly An example of the sbt file is below.

import AssemblyKeys._ // put this at the top of the file

assemblySettings

mainClass in assembly := Some("FifaSparkStreaming")

name := "FifaSparkStreaming"

version := "1.0"

scalaVersion := "2.10.4"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.0.0" % "provided",
  "org.apache.spark" %% "spark-streaming" % "1.0.0" % "provided",
  ("org.apache.spark" %% "spark-streaming-twitter" % "1.0.0")
    .exclude("org.eclipse.jetty.orbit", "javax.transaction")
    .exclude("org.eclipse.jetty.orbit", "javax.servlet")
    .exclude("org.eclipse.jetty.orbit", "javax.mail.glassfish")
    .exclude("org.eclipse.jetty.orbit", "javax.activation")
    .exclude("com.esotericsoftware.minlog", "minlog"),
  ("net.debasishg" % "redisclient_2.10" % "2.12")
    .exclude("com.typesafe.akka", "akka-actor_2.10"))

mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) => {
    case PathList("javax", "servlet", xs @ _*) => MergeStrategy.first
    case PathList("org", "apache", xs @ _*) => MergeStrategy.first
    case "application.conf" => MergeStrategy.concat
    case "unwanted.txt" => MergeStrategy.discard
    case x => old(x)
  }
}

resolvers += "Akka Repository" at "http://repo.akka.io/releases/"

And I run as mentioned below.

LOCALLY:
1) sbt 'run AP1z4IYraYm5fqWhITWArY53x Cyyz3Zr67tVK46G8dus5tSbc83KQOdtMDgYoQ5WLQwH0mTWzB6 115254720-OfJ4yFsUU6C6vBkEOMDlBlkIgslPleFjPwNcxHjN Qd76y2izncM7fGGYqU1VXYTxg1eseNuzcdZKm2QJyK8d1 fifa fifa2014'

If you want to submit on the cluster
CLUSTER:
2) spark-submit --class FifaSparkStreaming --master spark://server-8-144:7077 --driver-memory 2048 --deploy-mode cluster FifaSparkStreaming-assembly-1.0.jar AP1z4IYraYm5fqWhITWArY53x Cyyz3Zr67tVK46G8dus5tSbc83KQOdtMDgYoQ5WLQwH0mTWzB6 115254720-OfJ4yFsUU6C6vBkEOMDlBlkIgslPleFjPwNcxHjN Qd76y2izncM7fGGYqU1VXYTxg1eseNuzcdZKm2QJyK8d1 fifa fifa2014

Hope this helps.
Thanks, Shrikar On Fri, Jun 20, 2014 at 9:16 AM, Shivani Rao raoshiv...@gmail.com wrote: Hello Michael, I have a quick question for you. Can you clarify the statement "build fat JARs and build dist-style TAR.GZ packages with launch scripts, JARs and everything needed to run a Job"? Can you give an example? I am using sbt assembly as well to create a fat jar, and supplying the spark and hadoop locations in the class path. Inside the main() function where the spark context is created, I use SparkContext.jarOfClass(this).toList to add the fat jar to my spark context. However, I seem to be running into issues with this approach. I was wondering if you had any inputs, Michael. Thanks, Shivani On Thu, Jun 19, 2014 at 10:57 PM, Sonal Goyal sonalgoy...@gmail.com wrote: We use maven for building our code and then invoke spark-submit through the exec plugin, passing in our parameters. Works well for us. Best Regards, Sonal Nube Technologies http://www.nubetech.co http://in.linkedin.com/in/sonalgoyal On Fri, Jun 20, 2014 at 3:26 AM, Michael Cutler mich...@tumra.com wrote: P.S. Last but not least, we use sbt-assembly to build fat JARs and build dist-style TAR.GZ packages with launch scripts, JARs and everything needed to run a Job. These are automatically built from source by our Jenkins and stored in HDFS. Our Chronos/Marathon jobs fetch the latest release TAR.GZ direct from HDFS, unpack it and launch the appropriate script. It makes for a much cleaner development / testing / deployment to package everything required in one go instead of relying on cluster-specific classpath additions or any add-jars functionality. On 19 June 2014 22:53, Michael Cutler mich...@tumra.com wrote: When you start seriously using Spark in production there are basically two things everyone eventually needs: 1. Scheduled Jobs - recurring hourly/daily/weekly jobs. 2. Always-On Jobs - that require monitoring, restarting etc.
There are lots of ways to implement these requirements, everything from crontab through to workflow managers like Oozie. We opted for the following stack: - Apache Mesos http://mesosphere.io/ (mesosphere.io distribution) - Marathon https://github.com/mesosphere/marathon - init/control system for starting, stopping, and maintaining always-on applications. - Chronos http://airbnb.github.io/chronos/ - general-purpose scheduler for Mesos, supports job dependency graphs. - Spark Job Server https://github.com/ooyala/spark-jobserver - primarily for its ability to reuse shared contexts with multiple jobs. The majority of our jobs are periodic (batch) jobs run through spark-submit, and we have several always-on Spark Streaming jobs (also run through spark-submit). We always use client mode with spark-submit because the Mesos cluster has direct connectivity to the Spark cluster and
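For reference, a Chronos scheduled job in a stack like the one above is just a JSON document POSTed to the scheduler's /scheduler/iso8601 endpoint. A minimal sketch; the job name, class, jar path and master URL below are hypothetical:

```json
{
  "name": "daily-wordcount",
  "command": "/opt/spark/bin/spark-submit --class com.example.DailyWordCount --master spark://spark-master:7077 /opt/jobs/daily-wordcount-assembly-1.0.jar",
  "schedule": "R/2014-07-01T02:00:00Z/P1D",
  "epsilon": "PT30M",
  "owner": "team@example.com"
}
```

The schedule field is an ISO 8601 repeating interval: repeat indefinitely (R), starting at the given instant, once per day (P1D).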
Possible approaches for adding extra metadata (Spark Streaming)?
Hi All, I was curious to know which of the two approaches is better for doing analytics using Spark Streaming. Let's say we want to add some metadata to the stream being processed, like sentiment, tags etc., and then perform some analytics using this added metadata. 1) Is it OK to make an HTTP call and add some extra information to the stream being processed in the updateByKeyAndWindow operations? 2) Add these sentiments/tags first and then stream through DStreams? Thanks, Shrikar
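A sketch of option (2): the enrichment can be a plain function applied to each event before it enters the stream (or inside a map on the DStream). Everything here is hypothetical plain Scala with the tagging service stubbed out as a keyword match; a real implementation would call a sentiment/tagging service instead:

```scala
object Enrichment {
  case class Tweet(text: String)
  case class TaggedTweet(text: String, tags: Set[String])

  // Stand-in for a real sentiment/tagging lookup (e.g. an HTTP call).
  // In option (1) this lookup would instead happen inside the streaming
  // operation itself, blocking the batch while the call completes.
  def tag(t: Tweet): TaggedTweet = {
    val tags = Set("goal", "fifa", "worldcup").filter(w => t.text.toLowerCase.contains(w))
    TaggedTweet(t.text, tags)
  }

  def main(args: Array[String]): Unit = {
    // Enrich events before they are pushed into the DStream.
    val enriched = Seq(Tweet("What a GOAL by Brazil!"), Tweet("hello world")).map(tag)
    enriched.foreach(println)
  }
}
```

Doing the lookup up front keeps slow external calls out of the batch processing path, at the cost of an extra hop before ingestion.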
SaveAsTextfile per day instead of window?
Hi All, Is there a way to store the streamed data as text files per day instead of per window? Thanks, Shrikar
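One hedged sketch of an answer: instead of saveAsTextFiles (which writes one directory per batch), derive a per-day output directory from each batch's time inside foreachRDD. Only the path helper below is runnable as-is; the "streamed" prefix and the foreachRDD wiring in the comment are assumptions:

```scala
import java.text.SimpleDateFormat
import java.util.{Date, TimeZone}

object DailyPaths {
  // Formats a batch timestamp into a per-day directory, with the batch
  // time as a subdirectory so successive batches don't collide.
  private val dayFmt = new SimpleDateFormat("yyyy-MM-dd")
  dayFmt.setTimeZone(TimeZone.getTimeZone("UTC"))

  def pathFor(prefix: String, batchTimeMs: Long): String =
    s"$prefix/${dayFmt.format(new Date(batchTimeMs))}/$batchTimeMs"

  // In the streaming job this would be wired up roughly as:
  //   wordCounts.foreachRDD { (rdd, time) =>
  //     rdd.saveAsTextFile(pathFor("streamed", time.milliseconds))
  //   }

  def main(args: Array[String]): Unit =
    println(pathFor("streamed", System.currentTimeMillis))
}
```

All batches for one calendar day then land under a single day directory, which downstream daily jobs can read with a simple glob.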
Spark Streaming union expected behaviour?
Hi All, I was writing a simple Streaming job to get more understanding of Spark Streaming. I am not understanding the union behaviour in this particular case.

WORKS:

val lines = ssc.socketTextStream("localhost", , StorageLevel.MEMORY_AND_DISK_SER)
val words = lines.flatMap(_.split(" "))
val wordCounts = words.map(x => (x, 1)).reduceByKey(_ + _)
wordCounts.print()
wordCounts.saveAsTextFiles("all")

This works as expected, and the streams are stored as files.

DOESN'T WORK:

val lines = ssc.socketTextStream("localhost", , StorageLevel.MEMORY_AND_DISK_SER)
val lines1 = ssc.socketTextStream("localhost", 1, StorageLevel.MEMORY_AND_DISK_SER)
val words = lines.union(lines1).flatMap(_.split(" "))
val wordCounts = words.map(x => (x, 1)).reduceByKey(_ + _)
wordCounts.print()
wordCounts.saveAsTextFiles("all")

In the above case, neither are the messages printed nor the files saved. Am I doing something wrong here? Thanks, Shrikar
Re: Unable to run a Standalone job([NOT FOUND ] org.eclipse.jetty.orbit#javax.mail.glassfish;1.4.1.v201005082020)
Hi Prabeesh/Sean, I tried both the steps you mentioned; it looks like it's still not able to resolve them. [warn] [NOT FOUND ] org.eclipse.jetty.orbit#javax.transaction;1.1.1.v201105210645!javax.transaction.orbit (131ms) [warn] public: tried [warn] http://repo1.maven.org/maven2/org/eclipse/jetty/orbit/javax.transaction/1.1.1.v201105210645/javax.transaction-1.1.1.v201105210645.orbit [warn] [NOT FOUND ] org.eclipse.jetty.orbit#javax.servlet;3.0.0.v201112011016!javax.servlet.orbit (225ms) [warn] public: tried [warn] http://repo1.maven.org/maven2/org/eclipse/jetty/orbit/javax.servlet/3.0.0.v201112011016/javax.servlet-3.0.0.v201112011016.orbit [warn] [NOT FOUND ] org.eclipse.jetty.orbit#javax.mail.glassfish;1.4.1.v201005082020!javax.mail.glassfish.orbit (214ms) [warn] public: tried [warn] http://repo1.maven.org/maven2/org/eclipse/jetty/orbit/javax.mail.glassfish/1.4.1.v201005082020/javax.mail.glassfish-1.4.1.v201005082020.orbit [warn] [NOT FOUND ] org.eclipse.jetty.orbit#javax.activation;1.1.0.v201105071233!javax.activation.orbit (112ms) [warn] public: tried Thanks, Shrikar On Thu, Jun 5, 2014 at 1:27 AM, prabeesh k prabsma...@gmail.com wrote: Try the sbt clean command before building the app, or delete the .ivy2 and .sbt folders (not a good method). Then try to rebuild the project. On Thu, Jun 5, 2014 at 11:45 AM, Sean Owen so...@cloudera.com wrote: I think this is SPARK-1949 again: https://github.com/apache/spark/pull/906 I think this change fixed this issue for a few people using the SBT build, worth committing? On Thu, Jun 5, 2014 at 6:40 AM, Shrikar archak shrika...@gmail.com wrote: Hi All, Now that Spark version 1.0.0 is released there should not be any problem with the local jars.
Shrikars-MacBook-Pro:SimpleJob shrikar$ cat simple.sbt

name := "Simple Project"
version := "1.0"
scalaVersion := "2.10.4"
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.0.0",
  "org.apache.spark" %% "spark-streaming" % "1.0.0")
resolvers += "Akka Repository" at "http://repo.akka.io/releases/"

I am still having this issue:

[error] (run-main) java.lang.NoClassDefFoundError: javax/servlet/http/HttpServletResponse java.lang.NoClassDefFoundError: javax/servlet/http/HttpServletResponse at org.apache.spark.HttpServer.start(HttpServer.scala:54) at org.apache.spark.broadcast.HttpBroadcast$.createServer(HttpBroadcast.scala:156) at org.apache.spark.broadcast.HttpBroadcast$.initialize(HttpBroadcast.scala:127) at org.apache.spark.broadcast.HttpBroadcastFactory.initialize(HttpBroadcastFactory.scala:31) at org.apache.spark.broadcast.BroadcastManager.initialize(BroadcastManager.scala:48) at org.apache.spark.broadcast.BroadcastManager.init(BroadcastManager.scala:35) at org.apache.spark.SparkEnv$.create(SparkEnv.scala:218) at org.apache.spark.SparkContext.init(SparkContext.scala:202)

Any help would be greatly appreciated. Thanks, Shrikar On Fri, May 23, 2014 at 3:58 PM, Shrikar archak shrika...@gmail.com wrote: Still the same error, no change. Thanks, Shrikar On Fri, May 23, 2014 at 2:38 PM, Jacek Laskowski ja...@japila.pl wrote: Hi Shrikar, How did you build Spark 1.0.0-SNAPSHOT on your machine? My understanding is that `sbt publishLocal` is not enough and you really need `sbt assembly` instead. Give it a try and report back. As to your build.sbt, upgrade Scala to 2.10.4 and use "org.apache.spark" %% "spark-streaming" % "1.0.0-SNAPSHOT" only; that will pull down spark-core as a transitive dep. The resolver for the Akka Repository is not needed. Your build.sbt should really look as follows:

name := "Simple Project"
version := "1.0"
scalaVersion := "2.10.4"
libraryDependencies += "org.apache.spark" %% "spark-streaming" % "1.0.0-SNAPSHOT"

Jacek On Thu, May 22, 2014 at 11:27 PM, Shrikar archak shrika...@gmail.com wrote: Hi All, I am trying to run the network count example as a separate standalone job and running into some issues. Environment: 1) Mac Mavericks 2) Latest spark repo from Github. I have a structure like this Shrikars-MacBook-Pro:SimpleJob shrikar$ find . . ./simple.sbt ./src ./src/main ./src/main/scala ./src/main/scala/NetworkWordCount.scala ./src/main/scala/SimpleApp.scala.bk simple.sbt: name := "Simple Project" version := "1.0" scalaVersion := "2.10.3" libraryDependencies ++= Seq("org.apache.spark" %% "spark-core" % "1.0.0-SNAPSHOT", "org.apache.spark" %% "spark-streaming" % "1.0.0-SNAPSHOT") resolvers += "Akka Repository" at "http://repo.akka.io/releases/" I am able to run the SimpleApp which is mentioned in the doc, but when I try to run the NetworkWordCount app I get an error like this; am I missing something? [info] Running com.shrikar.sparkapps.NetworkWordCount 14/05/22 14:26:47 INFO spark.SecurityManager
Re: Unable to run a Standalone job
Hi All, Now that Spark version 1.0.0 is released there should not be any problem with the local jars.

Shrikars-MacBook-Pro:SimpleJob shrikar$ cat simple.sbt

name := "Simple Project"
version := "1.0"
scalaVersion := "2.10.4"
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.0.0",
  "org.apache.spark" %% "spark-streaming" % "1.0.0")
resolvers += "Akka Repository" at "http://repo.akka.io/releases/"

I am still having this issue:

[error] (run-main) java.lang.NoClassDefFoundError: javax/servlet/http/HttpServletResponse java.lang.NoClassDefFoundError: javax/servlet/http/HttpServletResponse at org.apache.spark.HttpServer.start(HttpServer.scala:54) at org.apache.spark.broadcast.HttpBroadcast$.createServer(HttpBroadcast.scala:156) at org.apache.spark.broadcast.HttpBroadcast$.initialize(HttpBroadcast.scala:127) at org.apache.spark.broadcast.HttpBroadcastFactory.initialize(HttpBroadcastFactory.scala:31) at org.apache.spark.broadcast.BroadcastManager.initialize(BroadcastManager.scala:48) at org.apache.spark.broadcast.BroadcastManager.init(BroadcastManager.scala:35) at org.apache.spark.SparkEnv$.create(SparkEnv.scala:218) at org.apache.spark.SparkContext.init(SparkContext.scala:202)

Any help would be greatly appreciated. Thanks, Shrikar On Fri, May 23, 2014 at 3:58 PM, Shrikar archak shrika...@gmail.com wrote: Still the same error, no change. Thanks, Shrikar On Fri, May 23, 2014 at 2:38 PM, Jacek Laskowski ja...@japila.pl wrote: Hi Shrikar, How did you build Spark 1.0.0-SNAPSHOT on your machine? My understanding is that `sbt publishLocal` is not enough and you really need `sbt assembly` instead. Give it a try and report back. As to your build.sbt, upgrade Scala to 2.10.4 and use "org.apache.spark" %% "spark-streaming" % "1.0.0-SNAPSHOT" only; that will pull down spark-core as a transitive dep. The resolver for the Akka Repository is not needed. Your build.sbt should really look as follows: name := "Simple Project" version := "1.0" scalaVersion := "2.10.4" libraryDependencies += "org.apache.spark" %% "spark-streaming" % "1.0.0-SNAPSHOT" Jacek On Thu, May 22, 2014 at 11:27 PM, Shrikar archak shrika...@gmail.com wrote: Hi All, I am trying to run the network count example as a separate standalone job and running into some issues. Environment: 1) Mac Mavericks 2) Latest spark repo from Github. I have a structure like this Shrikars-MacBook-Pro:SimpleJob shrikar$ find . . ./simple.sbt ./src ./src/main ./src/main/scala ./src/main/scala/NetworkWordCount.scala ./src/main/scala/SimpleApp.scala.bk simple.sbt: name := "Simple Project" version := "1.0" scalaVersion := "2.10.3" libraryDependencies ++= Seq("org.apache.spark" %% "spark-core" % "1.0.0-SNAPSHOT", "org.apache.spark" %% "spark-streaming" % "1.0.0-SNAPSHOT") resolvers += "Akka Repository" at "http://repo.akka.io/releases/" I am able to run the SimpleApp which is mentioned in the doc, but when I try to run the NetworkWordCount app I get an error like this; am I missing something?
[info] Running com.shrikar.sparkapps.NetworkWordCount 14/05/22 14:26:47 INFO spark.SecurityManager: Changing view acls to: shrikar 14/05/22 14:26:47 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(shrikar) 14/05/22 14:26:48 INFO slf4j.Slf4jLogger: Slf4jLogger started 14/05/22 14:26:48 INFO Remoting: Starting remoting 14/05/22 14:26:48 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://spark@192.168.10.88:49963] 14/05/22 14:26:48 INFO Remoting: Remoting now listens on addresses: [akka.tcp://spark@192.168.10.88:49963] 14/05/22 14:26:48 INFO spark.SparkEnv: Registering MapOutputTracker 14/05/22 14:26:48 INFO spark.SparkEnv: Registering BlockManagerMaster 14/05/22 14:26:48 INFO storage.DiskBlockManager: Created local directory at /var/folders/r2/mbj08pb55n5d_9p8588xk5b0gn/T/spark-local-20140522142648-0a14 14/05/22 14:26:48 INFO storage.MemoryStore: MemoryStore started with capacity 911.6 MB. 14/05/22 14:26:48 INFO network.ConnectionManager: Bound socket to port 49964 with id = ConnectionManagerId(192.168.10.88,49964) 14/05/22 14:26:48 INFO storage.BlockManagerMaster: Trying to register BlockManager 14/05/22 14:26:48 INFO storage.BlockManagerInfo: Registering block manager 192.168.10.88:49964 with 911.6 MB RAM 14/05/22 14:26:48 INFO storage.BlockManagerMaster: Registered BlockManager 14/05/22 14:26:48 INFO spark.HttpServer: Starting HTTP Server [error] (run-main) java.lang.NoClassDefFoundError: javax/servlet/http/HttpServletResponse java.lang.NoClassDefFoundError: javax/servlet/http/HttpServletResponse at org.apache.spark.HttpServer.start(HttpServer.scala:54) at org.apache.spark.broadcast.HttpBroadcast$.createServer(HttpBroadcast.scala:156) at org.apache.spark.broadcast.HttpBroadcast$.initialize
Re: Unable to run a Standalone job
Still the same error, no change. Thanks, Shrikar On Fri, May 23, 2014 at 2:38 PM, Jacek Laskowski ja...@japila.pl wrote: Hi Shrikar, How did you build Spark 1.0.0-SNAPSHOT on your machine? My understanding is that `sbt publishLocal` is not enough and you really need `sbt assembly` instead. Give it a try and report back. As to your build.sbt, upgrade Scala to 2.10.4 and use "org.apache.spark" %% "spark-streaming" % "1.0.0-SNAPSHOT" only; that will pull down spark-core as a transitive dep. The resolver for the Akka Repository is not needed. Your build.sbt should really look as follows: name := "Simple Project" version := "1.0" scalaVersion := "2.10.4" libraryDependencies += "org.apache.spark" %% "spark-streaming" % "1.0.0-SNAPSHOT" Jacek On Thu, May 22, 2014 at 11:27 PM, Shrikar archak shrika...@gmail.com wrote: Hi All, I am trying to run the network count example as a separate standalone job and running into some issues. Environment: 1) Mac Mavericks 2) Latest spark repo from Github. I have a structure like this Shrikars-MacBook-Pro:SimpleJob shrikar$ find . . ./simple.sbt ./src ./src/main ./src/main/scala ./src/main/scala/NetworkWordCount.scala ./src/main/scala/SimpleApp.scala.bk simple.sbt: name := "Simple Project" version := "1.0" scalaVersion := "2.10.3" libraryDependencies ++= Seq("org.apache.spark" %% "spark-core" % "1.0.0-SNAPSHOT", "org.apache.spark" %% "spark-streaming" % "1.0.0-SNAPSHOT") resolvers += "Akka Repository" at "http://repo.akka.io/releases/" I am able to run the SimpleApp which is mentioned in the doc, but when I try to run the NetworkWordCount app I get an error like this; am I missing something?
Thanks, Shrikar -- Jacek Laskowski | http://blog.japila.pl Never discourage anyone who continually makes progress, no matter how slow. Plato
Unable to run a Standalone job
Hi All, I am trying to run the network count example as a separate standalone job and running into some issues. Environment: 1) Mac Mavericks 2) Latest spark repo from Github. I have a structure like this

Shrikars-MacBook-Pro:SimpleJob shrikar$ find .
.
./simple.sbt
./src
./src/main
./src/main/scala
./src/main/scala/NetworkWordCount.scala
./src/main/scala/SimpleApp.scala.bk

simple.sbt:

name := "Simple Project"
version := "1.0"
scalaVersion := "2.10.3"
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.0.0-SNAPSHOT",
  "org.apache.spark" %% "spark-streaming" % "1.0.0-SNAPSHOT")
resolvers += "Akka Repository" at "http://repo.akka.io/releases/"

I am able to run the SimpleApp which is mentioned in the doc, but when I try to run the NetworkWordCount app I get an error like this; am I missing something?

[info] Running com.shrikar.sparkapps.NetworkWordCount 14/05/22 14:26:47 INFO spark.SecurityManager: Changing view acls to: shrikar 14/05/22 14:26:47 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(shrikar) 14/05/22 14:26:48 INFO slf4j.Slf4jLogger: Slf4jLogger started 14/05/22 14:26:48 INFO Remoting: Starting remoting 14/05/22 14:26:48 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://spark@192.168.10.88:49963] 14/05/22 14:26:48 INFO Remoting: Remoting now listens on addresses: [akka.tcp://spark@192.168.10.88:49963] 14/05/22 14:26:48 INFO spark.SparkEnv: Registering MapOutputTracker 14/05/22 14:26:48 INFO spark.SparkEnv: Registering BlockManagerMaster 14/05/22 14:26:48 INFO storage.DiskBlockManager: Created local directory at /var/folders/r2/mbj08pb55n5d_9p8588xk5b0gn/T/spark-local-20140522142648-0a14 14/05/22 14:26:48 INFO storage.MemoryStore: MemoryStore started with capacity 911.6 MB.
14/05/22 14:26:48 INFO network.ConnectionManager: Bound socket to port 49964 with id = ConnectionManagerId(192.168.10.88,49964) 14/05/22 14:26:48 INFO storage.BlockManagerMaster: Trying to register BlockManager 14/05/22 14:26:48 INFO storage.BlockManagerInfo: Registering block manager 192.168.10.88:49964 with 911.6 MB RAM 14/05/22 14:26:48 INFO storage.BlockManagerMaster: Registered BlockManager 14/05/22 14:26:48 INFO spark.HttpServer: Starting HTTP Server [error] (run-main) java.lang.NoClassDefFoundError: javax/servlet/http/HttpServletResponse java.lang.NoClassDefFoundError: javax/servlet/http/HttpServletResponse at org.apache.spark.HttpServer.start(HttpServer.scala:54) at org.apache.spark.broadcast.HttpBroadcast$.createServer(HttpBroadcast.scala:156) at org.apache.spark.broadcast.HttpBroadcast$.initialize(HttpBroadcast.scala:127) at org.apache.spark.broadcast.HttpBroadcastFactory.initialize(HttpBroadcastFactory.scala:31) at org.apache.spark.broadcast.BroadcastManager.initialize(BroadcastManager.scala:48) at org.apache.spark.broadcast.BroadcastManager.init(BroadcastManager.scala:35) at org.apache.spark.SparkEnv$.create(SparkEnv.scala:218) at org.apache.spark.SparkContext.init(SparkContext.scala:202) at org.apache.spark.streaming.StreamingContext$.createNewSparkContext(StreamingContext.scala:549) at org.apache.spark.streaming.StreamingContext$.createNewSparkContext(StreamingContext.scala:561) at org.apache.spark.streaming.StreamingContext.init(StreamingContext.scala:91) at com.shrikar.sparkapps.NetworkWordCount$.main(NetworkWordCount.scala:39) at com.shrikar.sparkapps.NetworkWordCount.main(NetworkWordCount.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) Thanks, Shrikar
Re: Unable to run a Standalone job
I am running it as sbt run, locally. Thanks, Shrikar On Thu, May 22, 2014 at 3:53 PM, Tathagata Das tathagata.das1...@gmail.com wrote: How are you launching the application? sbt run? spark-submit? Local mode or Spark standalone cluster? Are you packaging all your code into a jar? It looks to me that you have Spark classes in your execution environment but are missing some of Spark's dependencies. TD On Thu, May 22, 2014 at 2:27 PM, Shrikar archak shrika...@gmail.com wrote: Hi All, I am trying to run the network count example as a separate standalone job and running into some issues. Environment: 1) Mac Mavericks 2) Latest spark repo from Github. I have a structure like this Shrikars-MacBook-Pro:SimpleJob shrikar$ find . . ./simple.sbt ./src ./src/main ./src/main/scala ./src/main/scala/NetworkWordCount.scala ./src/main/scala/SimpleApp.scala.bk simple.sbt: name := "Simple Project" version := "1.0" scalaVersion := "2.10.3" libraryDependencies ++= Seq("org.apache.spark" %% "spark-core" % "1.0.0-SNAPSHOT", "org.apache.spark" %% "spark-streaming" % "1.0.0-SNAPSHOT") resolvers += "Akka Repository" at "http://repo.akka.io/releases/" I am able to run the SimpleApp which is mentioned in the doc, but when I try to run the NetworkWordCount app I get an error like this; am I missing something?
Thanks, Shrikar
Re: Unable to run a Standalone job
Yes, I did a sbt publish-local. OK, I will try with Spark 0.9.1. Thanks, Shrikar On Thu, May 22, 2014 at 8:53 PM, Tathagata Das tathagata.das1...@gmail.com wrote: How are you getting Spark with 1.0.0-SNAPSHOT through maven? Did you publish Spark locally, which allowed you to use it as a dependency? This is weird indeed. SBT should take care of all the dependencies of Spark. In any case, you can try the last released Spark 0.9.1 and see if the problem persists. On Thu, May 22, 2014 at 3:59 PM, Shrikar archak shrika...@gmail.com wrote: I am running it as sbt run, locally. Thanks, Shrikar On Thu, May 22, 2014 at 3:53 PM, Tathagata Das tathagata.das1...@gmail.com wrote: How are you launching the application? sbt run? spark-submit? Local mode or Spark standalone cluster? Are you packaging all your code into a jar? It looks to me that you have Spark classes in your execution environment but are missing some of Spark's dependencies. TD On Thu, May 22, 2014 at 2:27 PM, Shrikar archak shrika...@gmail.com wrote: Hi All, I am trying to run the network count example as a separate standalone job and running into some issues. Environment: 1) Mac Mavericks 2) Latest spark repo from Github. I have a structure like this Shrikars-MacBook-Pro:SimpleJob shrikar$ find . . ./simple.sbt ./src ./src/main ./src/main/scala ./src/main/scala/NetworkWordCount.scala ./src/main/scala/SimpleApp.scala.bk simple.sbt: name := "Simple Project" version := "1.0" scalaVersion := "2.10.3" libraryDependencies ++= Seq("org.apache.spark" %% "spark-core" % "1.0.0-SNAPSHOT", "org.apache.spark" %% "spark-streaming" % "1.0.0-SNAPSHOT") resolvers += "Akka Repository" at "http://repo.akka.io/releases/" I am able to run the SimpleApp which is mentioned in the doc, but when I try to run the NetworkWordCount app I get an error like this; am I missing something?
[info] Running com.shrikar.sparkapps.NetworkWordCount
14/05/22 14:26:47 INFO spark.SecurityManager: Changing view acls to: shrikar
14/05/22 14:26:47 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(shrikar)
14/05/22 14:26:48 INFO slf4j.Slf4jLogger: Slf4jLogger started
14/05/22 14:26:48 INFO Remoting: Starting remoting
14/05/22 14:26:48 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://spark@192.168.10.88:49963]
14/05/22 14:26:48 INFO Remoting: Remoting now listens on addresses: [akka.tcp://spark@192.168.10.88:49963]
14/05/22 14:26:48 INFO spark.SparkEnv: Registering MapOutputTracker
14/05/22 14:26:48 INFO spark.SparkEnv: Registering BlockManagerMaster
14/05/22 14:26:48 INFO storage.DiskBlockManager: Created local directory at /var/folders/r2/mbj08pb55n5d_9p8588xk5b0gn/T/spark-local-20140522142648-0a14
14/05/22 14:26:48 INFO storage.MemoryStore: MemoryStore started with capacity 911.6 MB.
14/05/22 14:26:48 INFO network.ConnectionManager: Bound socket to port 49964 with id = ConnectionManagerId(192.168.10.88,49964)
14/05/22 14:26:48 INFO storage.BlockManagerMaster: Trying to register BlockManager
14/05/22 14:26:48 INFO storage.BlockManagerInfo: Registering block manager 192.168.10.88:49964 with 911.6 MB RAM
14/05/22 14:26:48 INFO storage.BlockManagerMaster: Registered BlockManager
14/05/22 14:26:48 INFO spark.HttpServer: Starting HTTP Server
[error] (run-main) java.lang.NoClassDefFoundError: javax/servlet/http/HttpServletResponse
java.lang.NoClassDefFoundError: javax/servlet/http/HttpServletResponse
    at org.apache.spark.HttpServer.start(HttpServer.scala:54)
    at org.apache.spark.broadcast.HttpBroadcast$.createServer(HttpBroadcast.scala:156)
    at org.apache.spark.broadcast.HttpBroadcast$.initialize(HttpBroadcast.scala:127)
    at org.apache.spark.broadcast.HttpBroadcastFactory.initialize(HttpBroadcastFactory.scala:31)
    at org.apache.spark.broadcast.BroadcastManager.initialize(BroadcastManager.scala:48)
    at org.apache.spark.broadcast.BroadcastManager.<init>(BroadcastManager.scala:35)
    at org.apache.spark.SparkEnv$.create(SparkEnv.scala:218)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:202)
    at org.apache.spark.streaming.StreamingContext$.createNewSparkContext(StreamingContext.scala:549)
    at org.apache.spark.streaming.StreamingContext$.createNewSparkContext(StreamingContext.scala:561)
    at org.apache.spark.streaming.StreamingContext.<init>(StreamingContext.scala:91)
    at com.shrikar.sparkapps.NetworkWordCount$.main(NetworkWordCount.scala:39)
    at com.shrikar.sparkapps.NetworkWordCount.main(NetworkWordCount.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25
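[Editor's note, not part of the original thread: for readers hitting the same NoClassDefFoundError on javax/servlet classes under sbt run, a commonly reported workaround with Spark builds of this era was to add the servlet API to the build explicitly, since Spark's Jetty dependencies excluded the javax.servlet orbit artifacts. The exact coordinates and version below are an assumption for illustration, not something confirmed in this thread:

```scala
// simple.sbt fragment -- possible workaround for the missing servlet classes.
// "javax.servlet" % "javax.servlet-api" % "3.0.1" is a real Maven artifact,
// but whether it is the right version for your Spark build is an assumption;
// check which servlet artifacts your Spark version actually excludes.
libraryDependencies += "javax.servlet" % "javax.servlet-api" % "3.0.1"
```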
Re: Unable to run a Standalone job
Hi,

I tried clearing the Maven and Ivy caches, and I am a bit confused at this point in time:

1) Running the example from the Spark directory with bin/run-example works fine and prints the word counts.
2) Running the same code as a separate job:
   *) With the latest 1.0.0-SNAPSHOT it doesn't work and throws the exception above.
   *) With 0.9.1 it doesn't throw any exception, but it doesn't print any word counts either.

Thanks,
Shrikar

On Thu, May 22, 2014 at 9:19 PM, Soumya Simanta soumya.sima...@gmail.com wrote:

Try cleaning your Maven (.m2) and Ivy caches.
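[Editor's note, not part of the original thread: the per-batch computation the NetworkWordCount example performs is a plain flatMap/map/reduceByKey word count. It can be sketched on an ordinary Scala collection, with groupBy standing in for reduceByKey, so the logic can be checked without Spark on the classpath:]

```scala
// The word count NetworkWordCount computes per batch, shown on a plain
// Scala collection using the same combinators Spark's RDD/DStream API exposes.
object WordCountSketch extends App {
  val lines = Seq("hello world", "hello spark")
  val counts = lines
    .flatMap(_.split(" "))   // split each line into words
    .map(word => (word, 1))  // pair each word with a count of 1
    .groupBy(_._1)           // collections equivalent of reduceByKey
    .map { case (word, pairs) => (word, pairs.map(_._2).sum) }
  println(counts.toList.sortBy(_._1)) // List((hello,2), (spark,1), (world,1))
}
```

A side note on the unresolved question above (0.9.1 runs but prints no counts): one common cause of that symptom in Spark Streaming's local mode is giving the application only one core (a master of local rather than local[2]), so the socket receiver occupies the only task slot and no batches are processed.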