spark streaming kafka output

2014-05-05 Thread Weide Zhang
Hi ,

Is there any example code implementing a Kafka output for Spark Streaming? My
use case is that all the output needs to be dumped back into the Kafka
cluster after the data is processed. What would be the guideline for
implementing such a function? I heard that foreachRDD will create one
producer instance per batch. If so, will that hurt performance?
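
For reference, one common pattern, sketched below under the assumption of
the Kafka 0.8 Scala producer API (broker list, topic, and the helper name
are placeholders, not code from Spark): create one producer per RDD
partition inside foreachRDD, rather than one per record.

```scala
import java.util.Properties
import kafka.producer.{KeyedMessage, Producer, ProducerConfig}
import org.apache.spark.streaming.dstream.DStream

// A minimal sketch, not a tested implementation: write each batch back to
// Kafka with one producer per partition (brokers/topic are placeholders).
def writeToKafka(stream: DStream[String], brokers: String, topic: String): Unit = {
  stream.foreachRDD { rdd =>
    rdd.foreachPartition { records =>
      val props = new Properties()
      props.put("metadata.broker.list", brokers)
      props.put("serializer.class", "kafka.serializer.StringEncoder")
      // One producer per partition per batch; a pooled or lazily created
      // per-executor producer would amortize the setup cost further.
      val producer = new Producer[String, String](new ProducerConfig(props))
      records.foreach(r => producer.send(new KeyedMessage[String, String](topic, r)))
      producer.close()
    }
  }
}
```

The per-batch cost is dominated by connection setup, which is why reusing a
producer across batches (e.g. a lazily initialized singleton on each
executor) is the usual optimization if this turns out to be a bottleneck.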

Thanks,

Weide


spark streaming question

2014-05-04 Thread Weide Zhang
Hi ,

It might be a very general question to ask here, but I'm curious why Spark
Streaming can achieve better throughput than Storm, as claimed in the Spark
Streaming paper. Does it depend on certain use cases and/or data sources?
What drives the better performance in the Spark Streaming case, or, put
another way, what makes Storm less performant than Spark Streaming?

Also, in order to guarantee exactly-once semantics when a node failure
happens, Spark makes replicas of RDDs and checkpoints them so that data can
be recomputed on the fly, while in the Trident case a transactional object
is used to persist state and results. It's not obvious to me which approach
is more costly, and why. Can anyone share some experience here?
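
For context, the checkpointing side of that comparison looks roughly like
the sketch below (the checkpoint path and the socket source are
placeholders): checkpointing is what lets Spark Streaming recompute lost
state after a failure.

```scala
import org.apache.spark.streaming.{Seconds, StreamingContext}

// A minimal sketch, assuming a local two-thread run; the checkpoint
// directory would be an HDFS path on a real cluster.
val ssc = new StreamingContext("local[2]", "CheckpointDemo", Seconds(2))
ssc.checkpoint("/tmp/checkpoints")

// Stateful operators like updateStateByKey require checkpointing, since the
// running state must be recoverable when a node fails.
val counts = ssc.socketTextStream("localhost", 9999)
  .flatMap(_.split(" "))
  .map(word => (word, 1))
  .updateStateByKey[Int]((values: Seq[Int], state: Option[Int]) =>
    Some(state.getOrElse(0) + values.sum))

counts.print()
ssc.start()
ssc.awaitTermination()
```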

Thanks a lot,

Weide


what's local[n]

2014-05-03 Thread Weide Zhang
In the Spark Kafka example, it says:

   `./bin/run-example org.apache.spark.streaming.examples.KafkaWordCount
local[2] zoo01,zoo02,zoo03 my-consumer-group topic1,topic2 1`

Can anyone tell me what local[2] represents? I thought the master URL
should be something like spark://hostname:port.

Also, the thread count is specified as 1 in the Kafka example. What will
happen if the thread count goes above 1? Does that mean multiple Kafka
consumers will be created on different Spark workers? I'm not sure how the
code maps to the actual worker thread allocation. Is there any documentation
on that?
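
For what it's worth: local[2] is a master URL meaning "run locally with two
worker threads" (a streaming receiver occupies one, so at least two are
needed to also process data), and the trailing 1 is the number of consumer
threads inside that single receiver, not a count of separate workers. A
hedged sketch of how the example wires those arguments up, with placeholder
hosts and topics:

```scala
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

// "local[2]" = local master with two threads: one for the Kafka receiver,
// one to actually process the received batches.
val ssc = new StreamingContext("local[2]", "KafkaWordCount", Seconds(2))

// The final command-line argument becomes the per-topic thread count: these
// are consumer threads inside one receiver in one JVM, not extra workers.
val topicMap = Map("topic1" -> 1, "topic2" -> 1)
val lines = KafkaUtils
  .createStream(ssc, "zoo01,zoo02,zoo03", "my-consumer-group", topicMap)
  .map { case (_, message) => message }

lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _).print()
ssc.start()
ssc.awaitTermination()
```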

Weide


spark run issue

2014-05-03 Thread Weide Zhang
Hi, I'm trying to run the kafka-word-count example in spark2.9.1. I
encountered an exception when initializing the Kafka consumer/producer
config. I'm using Scala 2.10.3 and a Maven build of the Spark Streaming
Kafka library that comes with spark2.9.1. Has anyone seen this exception
before?

Thanks,

producer:
 [java] The args attribute is deprecated. Please use nested arg
elements.
 [java] Exception in thread "main" java.lang.NoClassDefFoundError:
scala/Tuple2$mcLL$sp
 [java] at
kafka.producer.ProducerConfig.<init>(ProducerConfig.scala:56)
 [java] at
com.turn.apache.KafkaWordCountProducer$.main(HelloWorld.scala:89)
 [java] at
com.turn.apache.KafkaWordCountProducer.main(HelloWorld.scala)
 [java] Caused by: java.lang.ClassNotFoundException:
scala.Tuple2$mcLL$sp
 [java] at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
 [java] at java.security.AccessController.doPrivileged(Native
Method)
 [java] at
java.net.URLClassLoader.findClass(URLClassLoader.java:190)
 [java] at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
 [java] at
sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
 [java] at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
 [java] ... 3 more
 [java] Java Result: 1
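
A note on the error itself: the specialized variants of scala.Tuple2 (the
`$mc..$sp` classes) differ across Scala binary versions, so a
NoClassDefFoundError here typically means some jar on the classpath was
compiled against a different Scala version than the one running, which
matches the resolution later in this thread. A hypothetical build.sbt
fragment sketching the usual cure, keeping every artifact on one Scala
binary version:

```scala
// Hypothetical build fragment: keep everything on the same Scala binary
// version as Spark itself (2.10.x for Spark 0.9.1).
scalaVersion := "2.10.3"

libraryDependencies ++= Seq(
  // %% appends the Scala binary suffix (_2.10), so sbt resolves a Kafka
  // integration built for the running Scala version.
  "org.apache.spark" %% "spark-streaming-kafka" % "0.9.1"
)
```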


Re: spark run issue

2014-05-03 Thread Weide Zhang
Hi Tathagata,

I figured out the reason. I was adding the wrong Kafka library alongside the
version Spark uses. Sorry for the spam.

Weide


On Sat, May 3, 2014 at 7:04 PM, Tathagata Das
tathagata.das1...@gmail.com wrote:

 I am a little confused about the version of Spark you are using. Are you
 using Spark 0.9.1, which uses Scala 2.10.3?

 TD


 On Sat, May 3, 2014 at 6:16 PM, Weide Zhang weo...@gmail.com wrote:

 Hi, I'm trying to run the kafka-word-count example in spark2.9.1. I
 encountered an exception when initializing the Kafka consumer/producer
 config. I'm using Scala 2.10.3 and a Maven build of the Spark Streaming
 Kafka library that comes with spark2.9.1. Has anyone seen this exception
 before?

 Thanks,

 producer:
  [java] Exception in thread "main" java.lang.NoClassDefFoundError:
 scala/Tuple2$mcLL$sp
  ...





Re: spark run issue

2014-05-03 Thread Weide Zhang
Hi Tathagata,

I actually have a separate question. What's the purpose of the lib_managed
folder inside the Spark source folder? Are those the libraries required for
Spark Streaming to run? Do they need to be added to the Spark classpath when
starting the Spark cluster?
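
For context, lib_managed is an sbt convention rather than anything
Spark-specific: when a build enables retrieveManaged, sbt copies every
resolved dependency jar into that folder (the assumption here is that
Spark 0.9's sbt build does so). A one-line sketch of the setting:

```scala
// Hypothetical sbt fragment: with this enabled, sbt copies all resolved
// dependency jars into lib_managed/ under the project root.
retrieveManaged := true
```

If you launch the cluster through the provided scripts or the assembly jar,
you generally should not need to add lib_managed to the classpath by hand.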

Weide


On Sat, May 3, 2014 at 7:08 PM, Weide Zhang weo...@gmail.com wrote:

 Hi Tathagata,

 I figured out the reason. I was adding the wrong Kafka library alongside
 the version Spark uses. Sorry for the spam.

 Weide


 On Sat, May 3, 2014 at 7:04 PM, Tathagata Das tathagata.das1...@gmail.com
  wrote:

 I am a little confused about the version of Spark you are using. Are you
 using Spark 0.9.1, which uses Scala 2.10.3?

 TD


 On Sat, May 3, 2014 at 6:16 PM, Weide Zhang weo...@gmail.com wrote:

 Hi, I'm trying to run the kafka-word-count example in spark2.9.1. I
 encountered an exception when initializing the Kafka consumer/producer
 config. I'm using Scala 2.10.3 and a Maven build of the Spark Streaming
 Kafka library that comes with spark2.9.1. Has anyone seen this exception
 before?

 Thanks,

 producer:
  [java] Exception in thread "main" java.lang.NoClassDefFoundError:
 scala/Tuple2$mcLL$sp
  ...






docker image build issue for spark 0.9.1

2014-05-02 Thread Weide Zhang
Hi, I tried to build the Docker image for Spark 0.9.1 but got the following
error.

Has anyone had experience resolving this issue?

The following packages have unmet dependencies:
 tzdata-java : Depends: tzdata (= 2012b-1) but 2013g-0ubuntu0.12.04 is to
be installed
E: Unable to correct problems, you have held broken packages.
2014/05/02 14:16:25 The command [/bin/sh -c apt-get install -y tzdata-java
less openjdk-7-jre-headless net-tools vim-tiny sudo openssh-server]
returned a non-zero code: 100

It seems related to the openjdk-7-jre-headless installation, but I don't
know how to resolve it.
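
One likely cause, offered as an assumption rather than a confirmed
diagnosis: the base image's apt package index is stale, so apt tries to pair
the old tzdata-java with a newer tzdata. Refreshing the index in the same
RUN layer, and installing tzdata together with tzdata-java, often clears
this class of unmet-dependency error:

```
# Hypothetical Dockerfile fragment: update the package index in the same
# layer, and install tzdata alongside tzdata-java so the versions match.
RUN apt-get update && \
    apt-get install -y tzdata tzdata-java less openjdk-7-jre-headless \
        net-tools vim-tiny sudo openssh-server
```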

Thanks a lot,

Weide


Re: docker image build issue for spark 0.9.1

2014-05-02 Thread Weide Zhang
Yes, the Docker script is included in the Spark source package. It already
sets up the master and the workers to run in separate Docker containers. In
my scenario it is mainly used for easy deployment and development.




On Fri, May 2, 2014 at 2:30 PM, Nicholas Chammas nicholas.cham...@gmail.com
 wrote:

 Don't have any tips for you, Weide, but I was just learning about Docker
 and it sounds very cool.

 Are you trying to build both master and worker containers that you can
 easily deploy to create a cluster? I'm interested in knowing how Docker is
 used in this case.

 Nick



 On Fri, May 2, 2014 at 5:17 PM, Weide Zhang weo...@gmail.com wrote:

 Hi, I tried to build the Docker image for Spark 0.9.1 but got the following
 error.

 Has anyone had experience resolving this issue?

 The following packages have unmet dependencies:
  ...

 It seems related to the openjdk-7-jre-headless installation, but I don't
 know how to resolve it.

 Thanks a lot,

 Weide