Unsubscribe

2022-08-10 Thread Shrikar archak
unsubscribe


Re: Shark Vs Spark SQL

2014-07-02 Thread Shrikar archak
As of Spark Summit 2014, they mentioned that there will be no active
development on Shark.

Thanks,
Shrikar


On Wed, Jul 2, 2014 at 3:53 PM, Subacini B subac...@gmail.com wrote:

 Hi,


 http://mail-archives.apache.org/mod_mbox/spark-user/201403.mbox/%3cb75376b8-7a57-4161-b604-f919886cf...@gmail.com%3E

 This talks about the Shark backend being replaced with the Spark SQL engine
 in the future.
 Does that mean Spark will continue to support Shark + Spark SQL long
 term, or will Shark be decommissioned after some period?

 Thanks
 Subacini



Re: How do you run your spark app?

2014-06-20 Thread Shrikar archak
Hi Shivani,

I use sbt-assembly to create a fat jar:
https://github.com/sbt/sbt-assembly

An example sbt file is below.

import AssemblyKeys._ // put this at the top of the file

assemblySettings

mainClass in assembly := Some("FifaSparkStreaming")

name := "FifaSparkStreaming"

version := "1.0"

scalaVersion := "2.10.4"

libraryDependencies ++= Seq("org.apache.spark" %% "spark-core" % "1.0.0" % "provided",
                            "org.apache.spark" %% "spark-streaming" % "1.0.0" % "provided",
                            ("org.apache.spark" %% "spark-streaming-twitter" % "1.0.0")
                              .exclude("org.eclipse.jetty.orbit", "javax.transaction")
                              .exclude("org.eclipse.jetty.orbit", "javax.servlet")
                              .exclude("org.eclipse.jetty.orbit", "javax.mail.glassfish")
                              .exclude("org.eclipse.jetty.orbit", "javax.activation")
                              .exclude("com.esotericsoftware.minlog", "minlog"),
                            ("net.debasishg" % "redisclient_2.10" % "2.12")
                              .exclude("com.typesafe.akka", "akka-actor_2.10"))

mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
  {
    case PathList("javax", "servlet", xs @ _*) => MergeStrategy.first
    case PathList("org", "apache", xs @ _*) => MergeStrategy.first
    case "application.conf" => MergeStrategy.concat
    case "unwanted.txt" => MergeStrategy.discard
    case x => old(x)
  }
}

resolvers += "Akka Repository" at "http://repo.akka.io/releases/"


And I run it as shown below.

LOCALLY:
1) sbt 'run AP1z4IYraYm5fqWhITWArY53x
Cyyz3Zr67tVK46G8dus5tSbc83KQOdtMDgYoQ5WLQwH0mTWzB6
115254720-OfJ4yFsUU6C6vBkEOMDlBlkIgslPleFjPwNcxHjN
Qd76y2izncM7fGGYqU1VXYTxg1eseNuzcdZKm2QJyK8d1 fifa fifa2014'

If you want to submit it on the cluster:

CLUSTER:
2) spark-submit --class FifaSparkStreaming --master
spark://server-8-144:7077 --driver-memory 2048 --deploy-mode cluster
FifaSparkStreaming-assembly-1.0.jar AP1z4IYraYm5fqWhITWArY53x
Cyyz3Zr67tVK46G8dus5tSbc83KQOdtMDgYoQ5WLQwH0mTWzB6
115254720-OfJ4yFsUU6C6vBkEOMDlBlkIgslPleFjPwNcxHjN
Qd76y2izncM7fGGYqU1VXYTxg1eseNuzcdZKm2QJyK8d1 fifa fifa2014


Hope this helps.

Thanks,
Shrikar


On Fri, Jun 20, 2014 at 9:16 AM, Shivani Rao raoshiv...@gmail.com wrote:

 Hello Michael,

 I have a quick question for you. Can you clarify the statement "build
 fat JARs and build dist-style TAR.GZ packages with launch scripts, JARs
 and everything needed to run a Job"? Can you give an example?

 I am using sbt assembly as well to create a fat jar, and supplying the
 Spark and Hadoop locations in the classpath. Inside the main() function
 where the Spark context is created, I use SparkContext.jarOfClass(this).toList
 to add the fat jar to my Spark context. However, I seem to be running into
 issues with this approach. I was wondering if you had any inputs, Michael.

 Thanks,
 Shivani
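
A minimal sketch of the jarOfClass approach described above, assuming the
Spark 1.x API, where SparkContext.jarOfClass returns the jar containing a
given class (the app name and job logic here are illustrative, not from the
thread):

import org.apache.spark.{SparkConf, SparkContext}

object MyApp {
  def main(args: Array[String]): Unit = {
    // Find the jar this class was loaded from, if any, and ship it to executors.
    // Hypothetical skeleton: "MyApp" and the job logic are placeholders.
    val jars = SparkContext.jarOfClass(this.getClass).toList
    val conf = new SparkConf().setAppName("MyApp").setJars(jars)
    val sc = new SparkContext(conf)
    // ... job logic ...
    sc.stop()
  }
}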


 On Thu, Jun 19, 2014 at 10:57 PM, Sonal Goyal sonalgoy...@gmail.com
 wrote:

 We use maven for building our code and then invoke spark-submit through
 the exec plugin, passing in our parameters. Works well for us.

 Best Regards,
 Sonal
 Nube Technologies http://www.nubetech.co

 http://in.linkedin.com/in/sonalgoyal




 On Fri, Jun 20, 2014 at 3:26 AM, Michael Cutler mich...@tumra.com
 wrote:

 P.S. Last but not least, we use sbt-assembly to build fat JARs and build
 dist-style TAR.GZ packages with launch scripts, JARs and everything needed
 to run a job.  These are automatically built from source by our Jenkins and
 stored in HDFS.  Our Chronos/Marathon jobs fetch the latest release TAR.GZ
 directly from HDFS, unpack it and launch the appropriate script.

 It makes for much cleaner development / testing / deployment to package
 everything required in one go, instead of relying on cluster-specific
 classpath additions or any add-jars functionality.


 On 19 June 2014 22:53, Michael Cutler mich...@tumra.com wrote:

 When you start seriously using Spark in production there are basically
 two things everyone eventually needs:

   1. Scheduled Jobs - recurring hourly/daily/weekly jobs.
   2. Always-On Jobs - jobs that require monitoring, restarting, etc.

 There are lots of ways to implement these requirements, everything from
 crontab through to workflow managers like Oozie.

 We opted for the following stack:

- Apache Mesos http://mesosphere.io/ (mesosphere.io distribution)


- Marathon https://github.com/mesosphere/marathon - init/control
system for starting, stopping, and maintaining always-on applications.


- Chronos http://airbnb.github.io/chronos/ - general-purpose
scheduler for Mesos, supports job dependency graphs.


   - Spark Job Server https://github.com/ooyala/spark-jobserver -
   primarily for its ability to reuse shared contexts with multiple jobs

 The majority of our jobs are periodic (batch) jobs run through
 spark-submit, and we have several always-on Spark Streaming jobs (also run
 through spark-submit).

 We always use client mode with spark-submit because the Mesos cluster
 has direct connectivity to the Spark cluster and 

Possible approaches for adding extra metadata (Spark Streaming)?

2014-06-20 Thread Shrikar archak
Hi All,

I was curious to know which of the two approaches below is better for doing
analytics using Spark Streaming. Let's say we want to add some metadata to
the stream being processed, such as sentiment or tags, and then
perform some analytics using this added metadata.

1) Is it OK to make an HTTP call and add the extra information to the
stream being processed inside the updateByKeyAndWindow operations?

2) Or add the sentiment/tags up front and then stream through DStreams?

Thanks,
Shrikar
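
For reference, a minimal sketch of approach 2, enriching each record before
any windowed analytics. The tagSentiment function and the port are
illustrative assumptions, not from the thread:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object EnrichThenStream {
  // Placeholder enrichment; a real version might call out to a sentiment service.
  def tagSentiment(tweet: String): (String, String) =
    (tweet, if (tweet.toLowerCase.contains("goal")) "positive" else "neutral")

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("EnrichThenStream").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Port 9999 is illustrative.
    ssc.socketTextStream("localhost", 9999)
      .map(tagSentiment)                                      // attach metadata up front
      .map { case (_, sentiment) => (sentiment, 1) }
      .reduceByKeyAndWindow(_ + _, Seconds(60), Seconds(10))  // analytics over the enriched stream
      .print()

    ssc.start()
    ssc.awaitTermination()
  }
}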


SaveAsTextfile per day instead of window?

2014-06-09 Thread Shrikar archak
Hi All,

Is there a way to store the streamed data as text files per day instead of
per window?

Thanks,
Shrikar
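
One possible way to do this (a sketch, not from the thread): pick the output
directory per batch inside foreachRDD, keyed by the batch's date. The paths
and port here are illustrative:

import java.text.SimpleDateFormat
import java.util.Date

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object DailyTextFiles {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("DailyTextFiles").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(10))

    val wordCounts = ssc.socketTextStream("localhost", 9999)
      .flatMap(_.split(" "))
      .map((_, 1))
      .reduceByKey(_ + _)

    val dayFormat = new SimpleDateFormat("yyyy-MM-dd")
    // One directory per day, one subdirectory per batch within that day.
    // The "out/" prefix is a placeholder for a real HDFS or local path.
    wordCounts.foreachRDD { (rdd, time) =>
      val day = dayFormat.format(new Date(time.milliseconds))
      rdd.saveAsTextFile(s"out/$day/batch-${time.milliseconds}")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}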


Spark Streaming union expected behaviour?

2014-06-08 Thread Shrikar archak
Hi All,

I was writing a simple streaming job to get a better understanding of Spark
Streaming, and I do not understand the union behaviour in this particular
case.

WORKS:
val lines = ssc.socketTextStream("localhost", ,
StorageLevel.MEMORY_AND_DISK_SER)
val words = lines.flatMap(_.split(" "))
val wordCounts = words.map(x => (x, 1)).reduceByKey(_ + _)
wordCounts.print()
wordCounts.saveAsTextFiles("all")

This works as expected, and the streams are stored as files.


DOESN'T WORK:
val lines = ssc.socketTextStream("localhost", ,
StorageLevel.MEMORY_AND_DISK_SER)
val lines1 = ssc.socketTextStream("localhost", 1,
StorageLevel.MEMORY_AND_DISK_SER)
val words = lines.union(lines1).flatMap(_.split(" "))

val wordCounts = words.map(x => (x, 1)).reduceByKey(_ + _)
wordCounts.print()
wordCounts.saveAsTextFiles("all")

In the above case, the messages are neither printed nor saved to files.
Am I doing something wrong here?

Thanks,
Shrikar
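
For reference, a runnable variant of the union case (a sketch; the ports are
assumptions, since the originals were lost in archiving). One plausible cause
of the symptom, not confirmed in the thread, is core starvation: each socket
receiver pins one core, so two receivers need at least three local threads.

import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}

object UnionWordCount {
  def main(args: Array[String]): Unit = {
    // Two receivers each occupy a core, so give local mode at least 3 threads.
    val conf = new SparkConf().setAppName("UnionWordCount").setMaster("local[3]")
    val ssc = new StreamingContext(conf, Seconds(5))

    // Ports 9999/9998 are illustrative; the originals were lost in the archive.
    val lines  = ssc.socketTextStream("localhost", 9999, StorageLevel.MEMORY_AND_DISK_SER)
    val lines1 = ssc.socketTextStream("localhost", 9998, StorageLevel.MEMORY_AND_DISK_SER)

    lines.union(lines1)
      .flatMap(_.split(" "))
      .map((_, 1))
      .reduceByKey(_ + _)
      .print()

    ssc.start()
    ssc.awaitTermination()
  }
}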


Re: Unable to run a Standalone job ([NOT FOUND] org.eclipse.jetty.orbit#javax.mail.glassfish;1.4.1.v201005082020)

2014-06-05 Thread Shrikar archak
Hi Prabeesh / Sean,

I tried both of the steps you mentioned; it looks like it is still not able
to resolve them.

[warn] [NOT FOUND  ]
org.eclipse.jetty.orbit#javax.transaction;1.1.1.v201105210645!javax.transaction.orbit
(131ms)
[warn]  public: tried
[warn]
http://repo1.maven.org/maven2/org/eclipse/jetty/orbit/javax.transaction/1.1.1.v201105210645/javax.transaction-1.1.1.v201105210645.orbit
[warn] [NOT FOUND  ]
org.eclipse.jetty.orbit#javax.servlet;3.0.0.v201112011016!javax.servlet.orbit
(225ms)
[warn]  public: tried
[warn]
http://repo1.maven.org/maven2/org/eclipse/jetty/orbit/javax.servlet/3.0.0.v201112011016/javax.servlet-3.0.0.v201112011016.orbit
[warn] [NOT FOUND  ]
org.eclipse.jetty.orbit#javax.mail.glassfish;1.4.1.v201005082020!javax.mail.glassfish.orbit
(214ms)
[warn]  public: tried
[warn]
http://repo1.maven.org/maven2/org/eclipse/jetty/orbit/javax.mail.glassfish/1.4.1.v201005082020/javax.mail.glassfish-1.4.1.v201005082020.orbit
[warn] [NOT FOUND  ]
org.eclipse.jetty.orbit#javax.activation;1.1.0.v201105071233!javax.activation.orbit
(112ms)
[warn]  public: tried

Thanks,
Shrikar
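
For reference, a sketch of one workaround, mirroring the exclusions already
used in the FifaSparkStreaming build earlier in this digest (not a fix
confirmed in this thread): exclude the unresolvable jetty orbit artifacts
from the Spark dependency.

// Sketch: keep ivy from trying to fetch the jetty "orbit" artifacts at all.
libraryDependencies ++= Seq(
  ("org.apache.spark" %% "spark-streaming" % "1.0.0")
    .exclude("org.eclipse.jetty.orbit", "javax.transaction")
    .exclude("org.eclipse.jetty.orbit", "javax.servlet")
    .exclude("org.eclipse.jetty.orbit", "javax.mail.glassfish")
    .exclude("org.eclipse.jetty.orbit", "javax.activation"))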


On Thu, Jun 5, 2014 at 1:27 AM, prabeesh k prabsma...@gmail.com wrote:

 Try the sbt clean command before building the app,

 or delete the .ivy2 and .sbt folders (not a good method), and then try to
 rebuild the project.


 On Thu, Jun 5, 2014 at 11:45 AM, Sean Owen so...@cloudera.com wrote:

 I think this is SPARK-1949 again:
 https://github.com/apache/spark/pull/906
 I think this change fixed this issue for a few people using the SBT
 build, worth committing?

 On Thu, Jun 5, 2014 at 6:40 AM, Shrikar archak shrika...@gmail.com
 wrote:
  Hi All,
  Now that Spark version 1.0.0 is released, there should not be any problem
  with the local jars.
  Shrikars-MacBook-Pro:SimpleJob shrikar$ cat simple.sbt
  name := "Simple Project"

  version := "1.0"

  scalaVersion := "2.10.4"

  libraryDependencies ++= Seq("org.apache.spark" %% "spark-core" % "1.0.0",
                              "org.apache.spark" %% "spark-streaming" % "1.0.0")

  resolvers += "Akka Repository" at "http://repo.akka.io/releases/"
 
  I am still having this issue
  [error] (run-main) java.lang.NoClassDefFoundError:
  javax/servlet/http/HttpServletResponse
  java.lang.NoClassDefFoundError: javax/servlet/http/HttpServletResponse
  at org.apache.spark.HttpServer.start(HttpServer.scala:54)
  at
 
 org.apache.spark.broadcast.HttpBroadcast$.createServer(HttpBroadcast.scala:156)
  at
 
 org.apache.spark.broadcast.HttpBroadcast$.initialize(HttpBroadcast.scala:127)
  at
 
 org.apache.spark.broadcast.HttpBroadcastFactory.initialize(HttpBroadcastFactory.scala:31)
  at
 
 org.apache.spark.broadcast.BroadcastManager.initialize(BroadcastManager.scala:48)
  at
 
  org.apache.spark.broadcast.BroadcastManager.<init>(BroadcastManager.scala:35)
  at org.apache.spark.SparkEnv$.create(SparkEnv.scala:218)
   at org.apache.spark.SparkContext.<init>(SparkContext.scala:202)
 
  Any help would be greatly appreciated.
 
  Thanks,
  Shrikar
 
 
  On Fri, May 23, 2014 at 3:58 PM, Shrikar archak shrika...@gmail.com
 wrote:
 
   Still the same error, no change.
 
  Thanks,
  Shrikar
 
 
  On Fri, May 23, 2014 at 2:38 PM, Jacek Laskowski ja...@japila.pl
 wrote:
 
  Hi Shrikar,
 
  How did you build Spark 1.0.0-SNAPSHOT on your machine? My
  understanding is that `sbt publishLocal` is not enough and you really
  need `sbt assembly` instead. Give it a try and report back.
 
   As to your build.sbt, upgrade Scala to 2.10.4 and keep only
   "org.apache.spark" %% "spark-streaming" % "1.0.0-SNAPSHOT"; that will pull
   down spark-core as a transitive dep. The resolver for Akka Repository is
   not needed. Your build.sbt should really look as follows:

   name := "Simple Project"

   version := "1.0"

   scalaVersion := "2.10.4"

   libraryDependencies += "org.apache.spark" %% "spark-streaming" %
   "1.0.0-SNAPSHOT"
 
  Jacek
 
  On Thu, May 22, 2014 at 11:27 PM, Shrikar archak shrika...@gmail.com
 
  wrote:
   Hi All,
  
    I am trying to run the network count example as a separate
 standalone
   job
   and running into some issues.
  
   Environment:
   1) Mac Mavericks
   2) Latest spark repo from Github.
  
  
   I have a structure like this
  
   Shrikars-MacBook-Pro:SimpleJob shrikar$ find .
   .
   ./simple.sbt
   ./src
   ./src/main
   ./src/main/scala
   ./src/main/scala/NetworkWordCount.scala
   ./src/main/scala/SimpleApp.scala.bk
  
  
   simple.sbt
    name := "Simple Project"

    version := "1.0"

    scalaVersion := "2.10.3"

    libraryDependencies ++= Seq("org.apache.spark" %% "spark-core" % "1.0.0-SNAPSHOT",
                                "org.apache.spark" %% "spark-streaming" % "1.0.0-SNAPSHOT")

    resolvers += "Akka Repository" at "http://repo.akka.io/releases/"
  
  
    I am able to run the SimpleApp which is mentioned in the doc, but when I
    try to run the NetworkWordCount app I get an error like this; am I
    missing something?
  
   [info] Running com.shrikar.sparkapps.NetworkWordCount
   14/05/22 14:26:47 INFO spark.SecurityManager

Re: Unable to run a Standalone job

2014-06-04 Thread Shrikar archak
Hi All,
Now that Spark version 1.0.0 is released, there should not be any problem
with the local jars.
Shrikars-MacBook-Pro:SimpleJob shrikar$ cat simple.sbt
name := "Simple Project"

version := "1.0"

scalaVersion := "2.10.4"

libraryDependencies ++= Seq("org.apache.spark" %% "spark-core" % "1.0.0",
                            "org.apache.spark" %% "spark-streaming" % "1.0.0")

resolvers += "Akka Repository" at "http://repo.akka.io/releases/"

I am still having this issue
[error] (run-main) java.lang.NoClassDefFoundError: javax/servlet/http/HttpServletResponse
java.lang.NoClassDefFoundError: javax/servlet/http/HttpServletResponse
    at org.apache.spark.HttpServer.start(HttpServer.scala:54)
    at org.apache.spark.broadcast.HttpBroadcast$.createServer(HttpBroadcast.scala:156)
    at org.apache.spark.broadcast.HttpBroadcast$.initialize(HttpBroadcast.scala:127)
    at org.apache.spark.broadcast.HttpBroadcastFactory.initialize(HttpBroadcastFactory.scala:31)
    at org.apache.spark.broadcast.BroadcastManager.initialize(BroadcastManager.scala:48)
    at org.apache.spark.broadcast.BroadcastManager.<init>(BroadcastManager.scala:35)
    at org.apache.spark.SparkEnv$.create(SparkEnv.scala:218)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:202)

Any help would be greatly appreciated.

Thanks,
Shrikar
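
One thing worth trying (an assumption, not a fix confirmed in this thread):
the NoClassDefFoundError suggests the servlet API jar is missing from the
runtime classpath, so adding it explicitly to simple.sbt may help.

// Assumption: pull the servlet API onto the runtime classpath explicitly.
libraryDependencies += "javax.servlet" % "javax.servlet-api" % "3.0.1"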


On Fri, May 23, 2014 at 3:58 PM, Shrikar archak shrika...@gmail.com wrote:

 Still the same error, no change.

 Thanks,
 Shrikar


 On Fri, May 23, 2014 at 2:38 PM, Jacek Laskowski ja...@japila.pl wrote:

 Hi Shrikar,

 How did you build Spark 1.0.0-SNAPSHOT on your machine? My
 understanding is that `sbt publishLocal` is not enough and you really
 need `sbt assembly` instead. Give it a try and report back.

  As to your build.sbt, upgrade Scala to 2.10.4 and keep only
  "org.apache.spark" %% "spark-streaming" % "1.0.0-SNAPSHOT"; that will pull
  down spark-core as a transitive dep. The resolver for Akka Repository is
  not needed. Your build.sbt should really look as follows:

  name := "Simple Project"

  version := "1.0"

  scalaVersion := "2.10.4"

  libraryDependencies += "org.apache.spark" %% "spark-streaming" %
  "1.0.0-SNAPSHOT"

 Jacek

 On Thu, May 22, 2014 at 11:27 PM, Shrikar archak shrika...@gmail.com
 wrote:
  Hi All,
 
   I am trying to run the network count example as a separate standalone
 job
  and running into some issues.
 
  Environment:
  1) Mac Mavericks
  2) Latest spark repo from Github.
 
 
  I have a structure like this
 
  Shrikars-MacBook-Pro:SimpleJob shrikar$ find .
  .
  ./simple.sbt
  ./src
  ./src/main
  ./src/main/scala
  ./src/main/scala/NetworkWordCount.scala
  ./src/main/scala/SimpleApp.scala.bk
 
 
  simple.sbt
   name := "Simple Project"

   version := "1.0"

   scalaVersion := "2.10.3"

   libraryDependencies ++= Seq("org.apache.spark" %% "spark-core" % "1.0.0-SNAPSHOT",
                               "org.apache.spark" %% "spark-streaming" % "1.0.0-SNAPSHOT")

   resolvers += "Akka Repository" at "http://repo.akka.io/releases/"
 
 
   I am able to run the SimpleApp which is mentioned in the doc, but when I
   try to run the NetworkWordCount app I get an error like this; am I
   missing something?
 
  [info] Running com.shrikar.sparkapps.NetworkWordCount
  14/05/22 14:26:47 INFO spark.SecurityManager: Changing view acls to:
 shrikar
  14/05/22 14:26:47 INFO spark.SecurityManager: SecurityManager:
  authentication disabled; ui acls disabled; users with view permissions:
  Set(shrikar)
  14/05/22 14:26:48 INFO slf4j.Slf4jLogger: Slf4jLogger started
  14/05/22 14:26:48 INFO Remoting: Starting remoting
  14/05/22 14:26:48 INFO Remoting: Remoting started; listening on
 addresses
  :[akka.tcp://spark@192.168.10.88:49963]
  14/05/22 14:26:48 INFO Remoting: Remoting now listens on addresses:
  [akka.tcp://spark@192.168.10.88:49963]
  14/05/22 14:26:48 INFO spark.SparkEnv: Registering MapOutputTracker
  14/05/22 14:26:48 INFO spark.SparkEnv: Registering BlockManagerMaster
  14/05/22 14:26:48 INFO storage.DiskBlockManager: Created local
 directory at
 
 /var/folders/r2/mbj08pb55n5d_9p8588xk5b0gn/T/spark-local-20140522142648-0a14
  14/05/22 14:26:48 INFO storage.MemoryStore: MemoryStore started with
  capacity 911.6 MB.
  14/05/22 14:26:48 INFO network.ConnectionManager: Bound socket to port
 49964
  with id = ConnectionManagerId(192.168.10.88,49964)
  14/05/22 14:26:48 INFO storage.BlockManagerMaster: Trying to register
  BlockManager
  14/05/22 14:26:48 INFO storage.BlockManagerInfo: Registering block
 manager
  192.168.10.88:49964 with 911.6 MB RAM
  14/05/22 14:26:48 INFO storage.BlockManagerMaster: Registered
 BlockManager
  14/05/22 14:26:48 INFO spark.HttpServer: Starting HTTP Server
  [error] (run-main) java.lang.NoClassDefFoundError:
  javax/servlet/http/HttpServletResponse
  java.lang.NoClassDefFoundError: javax/servlet/http/HttpServletResponse
  at org.apache.spark.HttpServer.start(HttpServer.scala:54)
  at
 
 org.apache.spark.broadcast.HttpBroadcast$.createServer(HttpBroadcast.scala:156)
  at
 
 org.apache.spark.broadcast.HttpBroadcast$.initialize

Re: Unable to run a Standalone job

2014-05-23 Thread Shrikar archak
Still the same error, no change.

Thanks,
Shrikar


On Fri, May 23, 2014 at 2:38 PM, Jacek Laskowski ja...@japila.pl wrote:

 Hi Shrikar,

 How did you build Spark 1.0.0-SNAPSHOT on your machine? My
 understanding is that `sbt publishLocal` is not enough and you really
 need `sbt assembly` instead. Give it a try and report back.

  As to your build.sbt, upgrade Scala to 2.10.4 and keep only
  "org.apache.spark" %% "spark-streaming" % "1.0.0-SNAPSHOT"; that will pull
  down spark-core as a transitive dep. The resolver for Akka Repository is
  not needed. Your build.sbt should really look as follows:

  name := "Simple Project"

  version := "1.0"

  scalaVersion := "2.10.4"

  libraryDependencies += "org.apache.spark" %% "spark-streaming" %
  "1.0.0-SNAPSHOT"

 Jacek

 On Thu, May 22, 2014 at 11:27 PM, Shrikar archak shrika...@gmail.com
 wrote:
  Hi All,
 
   I am trying to run the network count example as a separate standalone job
  and running into some issues.
 
  Environment:
  1) Mac Mavericks
  2) Latest spark repo from Github.
 
 
  I have a structure like this
 
  Shrikars-MacBook-Pro:SimpleJob shrikar$ find .
  .
  ./simple.sbt
  ./src
  ./src/main
  ./src/main/scala
  ./src/main/scala/NetworkWordCount.scala
  ./src/main/scala/SimpleApp.scala.bk
 
 
  simple.sbt
   name := "Simple Project"

   version := "1.0"

   scalaVersion := "2.10.3"

   libraryDependencies ++= Seq("org.apache.spark" %% "spark-core" % "1.0.0-SNAPSHOT",
                               "org.apache.spark" %% "spark-streaming" % "1.0.0-SNAPSHOT")

   resolvers += "Akka Repository" at "http://repo.akka.io/releases/"
 
 
   I am able to run the SimpleApp which is mentioned in the doc, but when I
   try to run the NetworkWordCount app I get an error like this; am I
   missing something?
 
  [info] Running com.shrikar.sparkapps.NetworkWordCount
  14/05/22 14:26:47 INFO spark.SecurityManager: Changing view acls to:
 shrikar
  14/05/22 14:26:47 INFO spark.SecurityManager: SecurityManager:
  authentication disabled; ui acls disabled; users with view permissions:
  Set(shrikar)
  14/05/22 14:26:48 INFO slf4j.Slf4jLogger: Slf4jLogger started
  14/05/22 14:26:48 INFO Remoting: Starting remoting
  14/05/22 14:26:48 INFO Remoting: Remoting started; listening on addresses
  :[akka.tcp://spark@192.168.10.88:49963]
  14/05/22 14:26:48 INFO Remoting: Remoting now listens on addresses:
  [akka.tcp://spark@192.168.10.88:49963]
  14/05/22 14:26:48 INFO spark.SparkEnv: Registering MapOutputTracker
  14/05/22 14:26:48 INFO spark.SparkEnv: Registering BlockManagerMaster
  14/05/22 14:26:48 INFO storage.DiskBlockManager: Created local directory
 at
 
 /var/folders/r2/mbj08pb55n5d_9p8588xk5b0gn/T/spark-local-20140522142648-0a14
  14/05/22 14:26:48 INFO storage.MemoryStore: MemoryStore started with
  capacity 911.6 MB.
  14/05/22 14:26:48 INFO network.ConnectionManager: Bound socket to port
 49964
  with id = ConnectionManagerId(192.168.10.88,49964)
  14/05/22 14:26:48 INFO storage.BlockManagerMaster: Trying to register
  BlockManager
  14/05/22 14:26:48 INFO storage.BlockManagerInfo: Registering block
 manager
  192.168.10.88:49964 with 911.6 MB RAM
  14/05/22 14:26:48 INFO storage.BlockManagerMaster: Registered
 BlockManager
  14/05/22 14:26:48 INFO spark.HttpServer: Starting HTTP Server
  [error] (run-main) java.lang.NoClassDefFoundError:
  javax/servlet/http/HttpServletResponse
  java.lang.NoClassDefFoundError: javax/servlet/http/HttpServletResponse
  at org.apache.spark.HttpServer.start(HttpServer.scala:54)
  at
 
 org.apache.spark.broadcast.HttpBroadcast$.createServer(HttpBroadcast.scala:156)
  at
 
 org.apache.spark.broadcast.HttpBroadcast$.initialize(HttpBroadcast.scala:127)
  at
 
 org.apache.spark.broadcast.HttpBroadcastFactory.initialize(HttpBroadcastFactory.scala:31)
  at
 
 org.apache.spark.broadcast.BroadcastManager.initialize(BroadcastManager.scala:48)
  at
 
  org.apache.spark.broadcast.BroadcastManager.<init>(BroadcastManager.scala:35)
  at org.apache.spark.SparkEnv$.create(SparkEnv.scala:218)
   at org.apache.spark.SparkContext.<init>(SparkContext.scala:202)
  at
 
 org.apache.spark.streaming.StreamingContext$.createNewSparkContext(StreamingContext.scala:549)
  at
 
 org.apache.spark.streaming.StreamingContext$.createNewSparkContext(StreamingContext.scala:561)
  at
 
  org.apache.spark.streaming.StreamingContext.<init>(StreamingContext.scala:91)
  at
 com.shrikar.sparkapps.NetworkWordCount$.main(NetworkWordCount.scala:39)
  at com.shrikar.sparkapps.NetworkWordCount.main(NetworkWordCount.scala)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at
 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
  at
 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
  at java.lang.reflect.Method.invoke(Method.java:597)
 
 
  Thanks,
  Shrikar
 



 --
 Jacek Laskowski | http://blog.japila.pl
  "Never discourage anyone who continually makes progress, no matter how
  slow." - Plato



Unable to run a Standalone job

2014-05-22 Thread Shrikar archak
Hi All,

I am trying to run the network count example as a separate standalone job
and running into some issues.

Environment:
1) Mac Mavericks
2) Latest spark repo from Github.


I have a structure like this

Shrikars-MacBook-Pro:SimpleJob shrikar$ find .
.
./simple.sbt
./src
./src/main
./src/main/scala
./src/main/scala/NetworkWordCount.scala
./src/main/scala/SimpleApp.scala.bk


simple.sbt
name := "Simple Project"

version := "1.0"

scalaVersion := "2.10.3"

libraryDependencies ++= Seq("org.apache.spark" %% "spark-core" % "1.0.0-SNAPSHOT",
                            "org.apache.spark" %% "spark-streaming" % "1.0.0-SNAPSHOT")

resolvers += "Akka Repository" at "http://repo.akka.io/releases/"


I am able to run the SimpleApp which is mentioned in the doc, but when I try
to run the NetworkWordCount app I get an error like this; am I missing
something?

[info] Running com.shrikar.sparkapps.NetworkWordCount
14/05/22 14:26:47 INFO spark.SecurityManager: Changing view acls to: shrikar
14/05/22 14:26:47 INFO spark.SecurityManager: SecurityManager:
authentication disabled; ui acls disabled; users with view permissions:
Set(shrikar)
14/05/22 14:26:48 INFO slf4j.Slf4jLogger: Slf4jLogger started
14/05/22 14:26:48 INFO Remoting: Starting remoting
14/05/22 14:26:48 INFO Remoting: Remoting started; listening on addresses
:[akka.tcp://spark@192.168.10.88:49963]
14/05/22 14:26:48 INFO Remoting: Remoting now listens on addresses:
[akka.tcp://spark@192.168.10.88:49963]
14/05/22 14:26:48 INFO spark.SparkEnv: Registering MapOutputTracker
14/05/22 14:26:48 INFO spark.SparkEnv: Registering BlockManagerMaster
14/05/22 14:26:48 INFO storage.DiskBlockManager: Created local directory at
/var/folders/r2/mbj08pb55n5d_9p8588xk5b0gn/T/spark-local-20140522142648-0a14
14/05/22 14:26:48 INFO storage.MemoryStore: MemoryStore started with
capacity 911.6 MB.
14/05/22 14:26:48 INFO network.ConnectionManager: Bound socket to port
49964 with id = ConnectionManagerId(192.168.10.88,49964)
14/05/22 14:26:48 INFO storage.BlockManagerMaster: Trying to register
BlockManager
14/05/22 14:26:48 INFO storage.BlockManagerInfo: Registering block manager
192.168.10.88:49964 with 911.6 MB RAM
14/05/22 14:26:48 INFO storage.BlockManagerMaster: Registered BlockManager
14/05/22 14:26:48 INFO spark.HttpServer: Starting HTTP Server
[error] (run-main) java.lang.NoClassDefFoundError: javax/servlet/http/HttpServletResponse
java.lang.NoClassDefFoundError: javax/servlet/http/HttpServletResponse
    at org.apache.spark.HttpServer.start(HttpServer.scala:54)
    at org.apache.spark.broadcast.HttpBroadcast$.createServer(HttpBroadcast.scala:156)
    at org.apache.spark.broadcast.HttpBroadcast$.initialize(HttpBroadcast.scala:127)
    at org.apache.spark.broadcast.HttpBroadcastFactory.initialize(HttpBroadcastFactory.scala:31)
    at org.apache.spark.broadcast.BroadcastManager.initialize(BroadcastManager.scala:48)
    at org.apache.spark.broadcast.BroadcastManager.<init>(BroadcastManager.scala:35)
    at org.apache.spark.SparkEnv$.create(SparkEnv.scala:218)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:202)
    at org.apache.spark.streaming.StreamingContext$.createNewSparkContext(StreamingContext.scala:549)
    at org.apache.spark.streaming.StreamingContext$.createNewSparkContext(StreamingContext.scala:561)
    at org.apache.spark.streaming.StreamingContext.<init>(StreamingContext.scala:91)
    at com.shrikar.sparkapps.NetworkWordCount$.main(NetworkWordCount.scala:39)
    at com.shrikar.sparkapps.NetworkWordCount.main(NetworkWordCount.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)


Thanks,
Shrikar


Re: Unable to run a Standalone job

2014-05-22 Thread Shrikar archak
I am running it with sbt run, locally.

Thanks,
Shrikar


On Thu, May 22, 2014 at 3:53 PM, Tathagata Das
tathagata.das1...@gmail.com wrote:

 How are you launching the application? sbt run? spark-submit? Local
 mode or a Spark standalone cluster? Are you packaging all your code into
 a jar?
 It looks to me like you have Spark classes in your execution
 environment but are missing some of Spark's dependencies.

 TD



 On Thu, May 22, 2014 at 2:27 PM, Shrikar archak shrika...@gmail.com
 wrote:
  Hi All,
 
  I am trying to run the network count example as a separate standalone job
  and running into some issues.
 
  Environment:
  1) Mac Mavericks
  2) Latest spark repo from Github.
 
 
  I have a structure like this
 
  Shrikars-MacBook-Pro:SimpleJob shrikar$ find .
  .
  ./simple.sbt
  ./src
  ./src/main
  ./src/main/scala
  ./src/main/scala/NetworkWordCount.scala
  ./src/main/scala/SimpleApp.scala.bk
 
 
  simple.sbt
   name := "Simple Project"

   version := "1.0"

   scalaVersion := "2.10.3"

   libraryDependencies ++= Seq("org.apache.spark" %% "spark-core" % "1.0.0-SNAPSHOT",
                               "org.apache.spark" %% "spark-streaming" % "1.0.0-SNAPSHOT")

   resolvers += "Akka Repository" at "http://repo.akka.io/releases/"
 
 
   I am able to run the SimpleApp which is mentioned in the doc, but when I
   try to run the NetworkWordCount app I get an error like this; am I
   missing something?
 
  [info] Running com.shrikar.sparkapps.NetworkWordCount
  14/05/22 14:26:47 INFO spark.SecurityManager: Changing view acls to:
 shrikar
  14/05/22 14:26:47 INFO spark.SecurityManager: SecurityManager:
  authentication disabled; ui acls disabled; users with view permissions:
  Set(shrikar)
  14/05/22 14:26:48 INFO slf4j.Slf4jLogger: Slf4jLogger started
  14/05/22 14:26:48 INFO Remoting: Starting remoting
  14/05/22 14:26:48 INFO Remoting: Remoting started; listening on addresses
  :[akka.tcp://spark@192.168.10.88:49963]
  14/05/22 14:26:48 INFO Remoting: Remoting now listens on addresses:
  [akka.tcp://spark@192.168.10.88:49963]
  14/05/22 14:26:48 INFO spark.SparkEnv: Registering MapOutputTracker
  14/05/22 14:26:48 INFO spark.SparkEnv: Registering BlockManagerMaster
  14/05/22 14:26:48 INFO storage.DiskBlockManager: Created local directory
 at
 
 /var/folders/r2/mbj08pb55n5d_9p8588xk5b0gn/T/spark-local-20140522142648-0a14
  14/05/22 14:26:48 INFO storage.MemoryStore: MemoryStore started with
  capacity 911.6 MB.
  14/05/22 14:26:48 INFO network.ConnectionManager: Bound socket to port
 49964
  with id = ConnectionManagerId(192.168.10.88,49964)
  14/05/22 14:26:48 INFO storage.BlockManagerMaster: Trying to register
  BlockManager
  14/05/22 14:26:48 INFO storage.BlockManagerInfo: Registering block
 manager
  192.168.10.88:49964 with 911.6 MB RAM
  14/05/22 14:26:48 INFO storage.BlockManagerMaster: Registered
 BlockManager
  14/05/22 14:26:48 INFO spark.HttpServer: Starting HTTP Server
  [error] (run-main) java.lang.NoClassDefFoundError:
  javax/servlet/http/HttpServletResponse
  java.lang.NoClassDefFoundError: javax/servlet/http/HttpServletResponse
  at org.apache.spark.HttpServer.start(HttpServer.scala:54)
  at
 
 org.apache.spark.broadcast.HttpBroadcast$.createServer(HttpBroadcast.scala:156)
  at
 
 org.apache.spark.broadcast.HttpBroadcast$.initialize(HttpBroadcast.scala:127)
  at
 
 org.apache.spark.broadcast.HttpBroadcastFactory.initialize(HttpBroadcastFactory.scala:31)
  at
 
 org.apache.spark.broadcast.BroadcastManager.initialize(BroadcastManager.scala:48)
  at
 
  org.apache.spark.broadcast.BroadcastManager.<init>(BroadcastManager.scala:35)
  at org.apache.spark.SparkEnv$.create(SparkEnv.scala:218)
   at org.apache.spark.SparkContext.<init>(SparkContext.scala:202)
  at
 
 org.apache.spark.streaming.StreamingContext$.createNewSparkContext(StreamingContext.scala:549)
  at
 
 org.apache.spark.streaming.StreamingContext$.createNewSparkContext(StreamingContext.scala:561)
  at
 
  org.apache.spark.streaming.StreamingContext.<init>(StreamingContext.scala:91)
  at
 com.shrikar.sparkapps.NetworkWordCount$.main(NetworkWordCount.scala:39)
  at com.shrikar.sparkapps.NetworkWordCount.main(NetworkWordCount.scala)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at
 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
  at
 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
  at java.lang.reflect.Method.invoke(Method.java:597)
 
 
  Thanks,
  Shrikar
 



Re: Unable to run a Standalone job

2014-05-22 Thread Shrikar archak
Yes, I did an sbt publish-local. OK, I will try with Spark 0.9.1.

Thanks,
Shrikar


On Thu, May 22, 2014 at 8:53 PM, Tathagata Das
tathagata.das1...@gmail.com wrote:

  How are you getting Spark 1.0.0-SNAPSHOT through Maven? Did you
  publish Spark locally, which allowed you to use it as a dependency?

  This is weird indeed. SBT should take care of all of Spark's
  dependencies.

 In any case, you can try the last released Spark 0.9.1 and see if the
 problem persists.


 On Thu, May 22, 2014 at 3:59 PM, Shrikar archak shrika...@gmail.com wrote:

 I am running it with sbt run, locally.

 Thanks,
 Shrikar


 On Thu, May 22, 2014 at 3:53 PM, Tathagata Das 
 tathagata.das1...@gmail.com wrote:

  How are you launching the application? sbt run? spark-submit? Local
  mode or a Spark standalone cluster? Are you packaging all your code into
  a jar?
  It looks to me like you have Spark classes in your execution
  environment but are missing some of Spark's dependencies.

 TD



 On Thu, May 22, 2014 at 2:27 PM, Shrikar archak shrika...@gmail.com
 wrote:
  Hi All,
 
   I am trying to run the network count example as a separate standalone
 job
  and running into some issues.
 
  Environment:
  1) Mac Mavericks
  2) Latest spark repo from Github.
 
 
  I have a structure like this
 
  Shrikars-MacBook-Pro:SimpleJob shrikar$ find .
  .
  ./simple.sbt
  ./src
  ./src/main
  ./src/main/scala
  ./src/main/scala/NetworkWordCount.scala
  ./src/main/scala/SimpleApp.scala.bk
 
 
  simple.sbt
   name := "Simple Project"

   version := "1.0"

   scalaVersion := "2.10.3"

   libraryDependencies ++= Seq("org.apache.spark" %% "spark-core" % "1.0.0-SNAPSHOT",
                               "org.apache.spark" %% "spark-streaming" % "1.0.0-SNAPSHOT")

   resolvers += "Akka Repository" at "http://repo.akka.io/releases/"
 
 
   I am able to run the SimpleApp which is mentioned in the doc, but when I
   try to run the NetworkWordCount app I get an error like this; am I
   missing something?
 
  [info] Running com.shrikar.sparkapps.NetworkWordCount
  14/05/22 14:26:47 INFO spark.SecurityManager: Changing view acls to:
 shrikar
  14/05/22 14:26:47 INFO spark.SecurityManager: SecurityManager:
  authentication disabled; ui acls disabled; users with view permissions:
  Set(shrikar)
  14/05/22 14:26:48 INFO slf4j.Slf4jLogger: Slf4jLogger started
  14/05/22 14:26:48 INFO Remoting: Starting remoting
  14/05/22 14:26:48 INFO Remoting: Remoting started; listening on
 addresses
  :[akka.tcp://spark@192.168.10.88:49963]
  14/05/22 14:26:48 INFO Remoting: Remoting now listens on addresses:
  [akka.tcp://spark@192.168.10.88:49963]
  14/05/22 14:26:48 INFO spark.SparkEnv: Registering MapOutputTracker
  14/05/22 14:26:48 INFO spark.SparkEnv: Registering BlockManagerMaster
  14/05/22 14:26:48 INFO storage.DiskBlockManager: Created local
 directory at
 
 /var/folders/r2/mbj08pb55n5d_9p8588xk5b0gn/T/spark-local-20140522142648-0a14
  14/05/22 14:26:48 INFO storage.MemoryStore: MemoryStore started with
  capacity 911.6 MB.
  14/05/22 14:26:48 INFO network.ConnectionManager: Bound socket to port
 49964
  with id = ConnectionManagerId(192.168.10.88,49964)
  14/05/22 14:26:48 INFO storage.BlockManagerMaster: Trying to register
  BlockManager
  14/05/22 14:26:48 INFO storage.BlockManagerInfo: Registering block
 manager
  192.168.10.88:49964 with 911.6 MB RAM
  14/05/22 14:26:48 INFO storage.BlockManagerMaster: Registered
 BlockManager
  14/05/22 14:26:48 INFO spark.HttpServer: Starting HTTP Server
  [error] (run-main) java.lang.NoClassDefFoundError:
  javax/servlet/http/HttpServletResponse
  java.lang.NoClassDefFoundError: javax/servlet/http/HttpServletResponse
  at org.apache.spark.HttpServer.start(HttpServer.scala:54)
  at
 
 org.apache.spark.broadcast.HttpBroadcast$.createServer(HttpBroadcast.scala:156)
  at
 
 org.apache.spark.broadcast.HttpBroadcast$.initialize(HttpBroadcast.scala:127)
  at
 
 org.apache.spark.broadcast.HttpBroadcastFactory.initialize(HttpBroadcastFactory.scala:31)
  at
 
 org.apache.spark.broadcast.BroadcastManager.initialize(BroadcastManager.scala:48)
  at
 
  org.apache.spark.broadcast.BroadcastManager.<init>(BroadcastManager.scala:35)
  at org.apache.spark.SparkEnv$.create(SparkEnv.scala:218)
   at org.apache.spark.SparkContext.<init>(SparkContext.scala:202)
  at
 
 org.apache.spark.streaming.StreamingContext$.createNewSparkContext(StreamingContext.scala:549)
  at
 
 org.apache.spark.streaming.StreamingContext$.createNewSparkContext(StreamingContext.scala:561)
  at
 
  org.apache.spark.streaming.StreamingContext.<init>(StreamingContext.scala:91)
  at
 com.shrikar.sparkapps.NetworkWordCount$.main(NetworkWordCount.scala:39)
  at com.shrikar.sparkapps.NetworkWordCount.main(NetworkWordCount.scala)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at
 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
  at
 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25

Re: Unable to run a Standalone job

2014-05-22 Thread Shrikar archak
Hi,
I tried clearing the Maven and ivy caches, and I am a bit confused at this
point.

1) Running the example from the Spark directory using bin/run-example:
it works fine and prints the word counts.

2) Trying to run the same code as a separate job:
   *) Using the latest 1.0.0-SNAPSHOT, it doesn't work and throws an exception.
   *) Using 0.9.1, it doesn't throw any exception but doesn't print any word
counts.

Thanks,
Shrikar
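
A possible explanation for the 0.9.1 symptom (an assumption, not confirmed in
the thread): under sbt run the context may come up with a single local core,
and the socket receiver then occupies the only task slot, so batches are
never processed. A sketch that reserves a core for processing:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object LocalNetworkWordCount {
  def main(args: Array[String]): Unit = {
    // "local[2]" or more: the socket receiver permanently takes one core.
    val conf = new SparkConf().setAppName("NetworkWordCount").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(1))

    // Port 9999 is illustrative.
    ssc.socketTextStream("localhost", 9999)
      .flatMap(_.split(" "))
      .map((_, 1))
      .reduceByKey(_ + _)
      .print()

    ssc.start()
    ssc.awaitTermination()
  }
}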


On Thu, May 22, 2014 at 9:19 PM, Soumya Simanta soumya.sima...@gmail.com wrote:

 Try cleaning your maven (.m2) and ivy cache.



 On May 23, 2014, at 12:03 AM, Shrikar archak shrika...@gmail.com wrote:

 Yes, I did an sbt publish-local. OK, I will try with Spark 0.9.1.

 Thanks,
 Shrikar


 On Thu, May 22, 2014 at 8:53 PM, Tathagata Das 
 tathagata.das1...@gmail.com wrote:

  How are you getting Spark 1.0.0-SNAPSHOT through Maven? Did you
  publish Spark locally, which allowed you to use it as a dependency?

  This is weird indeed. SBT should take care of all of Spark's
  dependencies.

 In any case, you can try the last released Spark 0.9.1 and see if the
 problem persists.


 On Thu, May 22, 2014 at 3:59 PM, Shrikar archak shrika...@gmail.com wrote:

 I am running it with sbt run, locally.

 Thanks,
 Shrikar


 On Thu, May 22, 2014 at 3:53 PM, Tathagata Das 
 tathagata.das1...@gmail.com wrote:

  How are you launching the application? sbt run? spark-submit? Local
  mode or a Spark standalone cluster? Are you packaging all your code into
  a jar?
  It looks to me like you have Spark classes in your execution
  environment but are missing some of Spark's dependencies.

 TD



 On Thu, May 22, 2014 at 2:27 PM, Shrikar archak shrika...@gmail.com
 wrote:
  Hi All,
 
   I am trying to run the network count example as a separate standalone
 job
  and running into some issues.
 
  Environment:
  1) Mac Mavericks
  2) Latest spark repo from Github.
 
 
  I have a structure like this
 
  Shrikars-MacBook-Pro:SimpleJob shrikar$ find .
  .
  ./simple.sbt
  ./src
  ./src/main
  ./src/main/scala
  ./src/main/scala/NetworkWordCount.scala
  ./src/main/scala/SimpleApp.scala.bk
 
 
  simple.sbt
   name := "Simple Project"

   version := "1.0"

   scalaVersion := "2.10.3"

   libraryDependencies ++= Seq("org.apache.spark" %% "spark-core" % "1.0.0-SNAPSHOT",
                               "org.apache.spark" %% "spark-streaming" % "1.0.0-SNAPSHOT")

   resolvers += "Akka Repository" at "http://repo.akka.io/releases/"
 
 
   I am able to run the SimpleApp which is mentioned in the doc, but when I
   try to run the NetworkWordCount app I get an error like this; am I
   missing something?
 
  [info] Running com.shrikar.sparkapps.NetworkWordCount
  14/05/22 14:26:47 INFO spark.SecurityManager: Changing view acls to:
 shrikar
  14/05/22 14:26:47 INFO spark.SecurityManager: SecurityManager:
  authentication disabled; ui acls disabled; users with view
 permissions:
  Set(shrikar)
  14/05/22 14:26:48 INFO slf4j.Slf4jLogger: Slf4jLogger started
  14/05/22 14:26:48 INFO Remoting: Starting remoting
  14/05/22 14:26:48 INFO Remoting: Remoting started; listening on
 addresses
  :[akka.tcp://spark@192.168.10.88:49963]
  14/05/22 14:26:48 INFO Remoting: Remoting now listens on addresses:
  [akka.tcp://spark@192.168.10.88:49963]
  14/05/22 14:26:48 INFO spark.SparkEnv: Registering MapOutputTracker
  14/05/22 14:26:48 INFO spark.SparkEnv: Registering BlockManagerMaster
  14/05/22 14:26:48 INFO storage.DiskBlockManager: Created local
 directory at
 
 /var/folders/r2/mbj08pb55n5d_9p8588xk5b0gn/T/spark-local-20140522142648-0a14
  14/05/22 14:26:48 INFO storage.MemoryStore: MemoryStore started with
  capacity 911.6 MB.
  14/05/22 14:26:48 INFO network.ConnectionManager: Bound socket to
 port 49964
  with id = ConnectionManagerId(192.168.10.88,49964)
  14/05/22 14:26:48 INFO storage.BlockManagerMaster: Trying to register
  BlockManager
  14/05/22 14:26:48 INFO storage.BlockManagerInfo: Registering block
 manager
  192.168.10.88:49964 with 911.6 MB RAM
  14/05/22 14:26:48 INFO storage.BlockManagerMaster: Registered
 BlockManager
  14/05/22 14:26:48 INFO spark.HttpServer: Starting HTTP Server
  [error] (run-main) java.lang.NoClassDefFoundError:
  javax/servlet/http/HttpServletResponse
  java.lang.NoClassDefFoundError: javax/servlet/http/HttpServletResponse
  at org.apache.spark.HttpServer.start(HttpServer.scala:54)
  at
 
 org.apache.spark.broadcast.HttpBroadcast$.createServer(HttpBroadcast.scala:156)
  at
 
 org.apache.spark.broadcast.HttpBroadcast$.initialize(HttpBroadcast.scala:127)
  at
 
 org.apache.spark.broadcast.HttpBroadcastFactory.initialize(HttpBroadcastFactory.scala:31)
  at
 
 org.apache.spark.broadcast.BroadcastManager.initialize(BroadcastManager.scala:48)
  at
 
  org.apache.spark.broadcast.BroadcastManager.<init>(BroadcastManager.scala:35)
  at org.apache.spark.SparkEnv$.create(SparkEnv.scala:218)
  at org.apache.spark.SparkContext.<init>(SparkContext.scala:202