zookeeper mesos logging in spark

2016-08-26 Thread aecc
Hi,

Every time I run my Spark application on Mesos, I get log lines in my console
of the form:

2016-08-26 15:25:30,949:960521(0x7f6bccff9700):ZOO_INFO@log_env
2016-08-26 15:25:30,949:960521(0x7f6bccff9700):ZOO_INFO@log_env
2016-08-26 15:25:30,949:960521(0x7f6bccff9700):ZOO_INFO@log_env
2016-08-26 15:25:30,949:960521(0x7f6bccff9700):ZOO_INFO@log_env
2016-08-26 15:25:30,949:960521(0x7f6bccff9700):ZOO_INFO@log_env
I0826 15:25:30.949254 960752 sched.cpp:222] Version: 0.28.2
2016-08-26 15:25:30,949:960521(0x7f6bccff9700):ZOO_INFO@log_env
2016-08-26 15:25:30,949:960521(0x7f6bccff9700):ZOO_INFO@log_env
2016-08-26 15:25:30,949:960521(0x7f6bccff9700):ZOO_INFO@log_env
2016-08-26 15:25:30,949:960521(0x7f6bccff9700):ZOO_INFO@zookeep
2016-08-26 15:25:30,951:960521(0x7f6bb4ff9700):ZOO_INFO@check_e
2016-08-26 15:25:30,952:960521(0x7f6bb4ff9700):ZOO_INFO@check_e
I0826 15:25:30.952505 960729 group.cpp:349] Group process (grou
I0826 15:25:30.952570 960729 group.cpp:831] Syncing group opera
I0826 15:25:30.952592 960729 group.cpp:427] Trying to create pa
I0826 15:25:30.954211 960722 detector.cpp:152] Detected a new l
I0826 15:25:30.954320 960744 group.cpp:700] Trying to get '/mes
I0826 15:25:30.955345 960724 detector.cpp:479] A new leading ma
I0826 15:25:30.955451 960724 sched.cpp:326] New master detected
I0826 15:25:30.955567 960724 sched.cpp:336] No credentials prov
I0826 15:25:30.956478 960732 sched.cpp:703] Framework registere

Does anybody know how to disable them through spark-submit?

Cheers and many thanks
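
Not an answer from the thread, but a heavily hedged sketch: the I0826 lines come
from the Mesos native scheduler library, which logs through glog, and the
ZOO_INFO lines come from the ZooKeeper C client bundled with it, so none of it
goes through Spark's log4j configuration. Raising the glog level via an
environment variable might quiet part of it; whether libmesos honors this in
practice is an assumption, not something verified here.

import org.apache.spark.SparkConf

// Assumption: GLOG_minloglevel=1 suppresses glog INFO lines (1 = WARNING and up).
val conf = new SparkConf()
  .set("spark.executorEnv.GLOG_minloglevel", "1") // environment for Mesos-launched executors
// For the driver side, the variable has to be set in the environment of the
// shell that invokes spark-submit, since the native library lives in that process.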





Re: Spark task hangs infinitely when accessing S3 from AWS

2016-01-26 Thread aecc
Sorry, I have not been able to solve the issue. I used speculation mode as a
workaround.
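
For reference, a minimal sketch of what enabling speculation looks like; the
property names are standard Spark settings, but the values are illustrative and
not taken from this thread.

val conf = new org.apache.spark.SparkConf()
  .set("spark.speculation", "true")           // re-launch straggler tasks elsewhere
  .set("spark.speculation.multiplier", "3")   // a task is "slow" at 3x the median duration
  .set("spark.speculation.quantile", "0.9")   // only after 90% of the stage's tasks finished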






Re: Spark task hangs infinitely when accessing S3 from AWS

2015-11-12 Thread aecc
Any hints?






Re: Spark task hangs infinitely when accessing S3 from AWS

2015-11-12 Thread aecc
Some other stats:

The number of files I have in the folder is 48.
The number of partitions used when reading data is 7315.
The maximum size of a single file is 14G.
The total size of the folder is around 270G.






Re: Spark task hangs infinitely when accessing S3 from AWS

2015-11-09 Thread aecc
Any help on this? This is really blocking me and I haven't found a feasible
solution yet.

Thanks.






Spark task hangs infinitely when accessing S3 from AWS

2015-11-05 Thread aecc
Hi guys, when reading data from S3 on AWS using Spark 1.5.1, one of the tasks
hangs while reading the data, in a way that cannot be reproduced reliably.
Sometimes it hangs, sometimes it doesn't.

This is the thread dump from the hung task:

"Executor task launch worker-3" daemon prio=10 tid=0x7f419c023000
nid=0x6548 runnable [0x7f425df2b000]
   java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:152)
at java.net.SocketInputStream.read(SocketInputStream.java:122)
at sun.security.ssl.InputRecord.readFully(InputRecord.java:442)
at sun.security.ssl.InputRecord.readV3Record(InputRecord.java:554)
at sun.security.ssl.InputRecord.read(InputRecord.java:509)
at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:934)
- locked <0x7f42c373b4d8> (a java.lang.Object)
at
sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1332)
- locked <0x7f42c373b610> (a java.lang.Object)
at
sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1359)
at
sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1343)
at
org.apache.http.conn.ssl.SSLSocketFactory.connectSocket(SSLSocketFactory.java:533)
at
org.apache.http.conn.ssl.SSLSocketFactory.connectSocket(SSLSocketFactory.java:401)
at
org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:177)
at
org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:304)
at
org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:610)
at
org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:445)
at
org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:863)
at
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
at
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:57)
at
com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:384)
at
com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:232)
at
com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3528)
at
com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:976)
at
com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:956)
at
org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:892)
at
org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:77)
at org.apache.avro.mapred.FsInput.(FsInput.java:37)
at
org.apache.avro.mapreduce.AvroRecordReaderBase.createSeekableInput(AvroRecordReaderBase.java:171)
at
org.apache.avro.mapreduce.AvroRecordReaderBase.initialize(AvroRecordReaderBase.java:87)
at
org.apache.spark.rdd.NewHadoopRDD$$anon$1.(NewHadoopRDD.scala:153)
at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:124)
at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:65)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)

These are the only settings I pass manually (besides the S3 credentials):

--conf spark.driver.maxResultSize=4g \
--conf spark.akka.frameSize=500 \
--conf spark.hadoop.fs.s3a.connection.maximum=500 \

I'm using aws-java-sdk-1.7.4.jar and hadoop-aws-2.7.1.jar to be able to read
data from AWS.

I have been struggling with this issue for a long time; the only workaround I
have been able to find is to use Spark speculation, but that is no longer a
feasible solution for me.
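
A hedged alternative sketch, not something tried in this thread: hadoop-aws 2.7
exposes client timeout and retry settings, so a hung connection could fail fast
and be retried instead of blocking the task forever. The keys below are
standard s3a properties passed through Spark's spark.hadoop.* prefix; the
values are illustrative.

val conf = new org.apache.spark.SparkConf()
  .set("spark.hadoop.fs.s3a.connection.timeout", "60000") // socket timeout in ms
  .set("spark.hadoop.fs.s3a.attempts.maximum", "10")      // retries inside the AWS client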

Thanks






Locality level and Kryo

2014-12-16 Thread aecc
Hi guys,

It happens quite often that when the locality level of a task goes beyond
LOCAL (NODE, RACK, etc.), I get some of the following exceptions: "too many
files open", "encountered unregistered class id", "cannot cast X to Y".

I do not get any exceptions during shuffling (which means that kryo works
well). 

I'm running Spark 1.0.0 with the following characteristics:

- 18 executors with 30G each
- YARN client mode
- ulimit is set to 500k
- Input data: an HDFS file with 1000 partitions, about 10 GB in size

Any hint would be appreciated.
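
Not an answer from the thread, but a minimal sketch of the usual response to
"unregistered class id": explicitly register the classes that cross the wire,
Spark 1.0 style, with a KryoRegistrator. MyRecord is a placeholder for whatever
class is actually shipped, and the sketch assumes these classes are compiled
into the application jar rather than defined in the shell.

import com.esotericsoftware.kryo.Kryo
import org.apache.spark.SparkConf
import org.apache.spark.serializer.KryoRegistrator

case class MyRecord(id: Long, name: String) // placeholder payload class

class MyRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo): Unit = {
    kryo.register(classOf[MyRecord]) // register everything Kryo will serialize
  }
}

val conf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.registrator", "MyRegistrator")

Whether this explains why the errors only show up past LOCAL locality is a
separate question; registration only addresses the error message itself.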






Locality Level Kryo

2014-12-16 Thread aecc
Hi guys,

I get Kryo exceptions of the type "unregistered class id" and "cannot cast
to class" when the locality level of the tasks goes beyond LOCAL.
However, I get no Kryo exceptions during shuffle operations.

If the locality level never goes beyond LOCAL everything works fine.

Is there a particular reason for this? How does Kryo handle the tasks that are
not at the same locality level?

Cheers






Using Spark Context as an attribute of a class cannot be used

2014-11-24 Thread aecc
Hello guys,

I'm using Spark 1.0.0 and Kryo serialization.
In the Spark shell, when I create a class that holds the SparkContext as an
attribute, like this:

class AAA(val s: SparkContext) { }
val aaa = new AAA(sc)

and I execute any action using that attribute, for example:

val myNumber = 5
aaa.s.textFile(FILE).filter(_ == myNumber.toString).count
or
aaa.s.parallelize(1 to 10).filter(_ == myNumber).count

it returns a NotSerializableException:

org.apache.spark.SparkException: Job aborted due to stage failure: Task not serializable: java.io.NotSerializableException: $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$AAA
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1033)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1017)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1015)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1015)
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitMissingTasks(DAGScheduler.scala:770)
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:713)
    at org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:697)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1176)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
    at akka.actor.ActorCell.invoke(ActorCell.scala:456)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
    at akka.dispatch.Mailbox.run(Mailbox.scala:219)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)


Any thoughts on how to solve this issue, or how to work around it? I'm
developing an API that will need to use this SparkContext several times in
different places, so it needs to be accessible.

Thanks a lot for the cooperation
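
A minimal sketch of the pattern that comes up later in this thread (the one
sqlContext uses): make the wrapper Serializable and mark the SparkContext
@transient, so a closure that accidentally captures the wrapper does not try to
ship the context itself. This works in compiled code; in the shell the
enclosing REPL wrapper object can still be captured, which is what the rest of
the thread runs into.

import org.apache.spark.SparkContext

class AAA(@transient val s: SparkContext) extends Serializable

val aaa = new AAA(sc)   // sc is the shell's SparkContext
val myNumber = 5
aaa.s.parallelize(1 to 10).filter(_ == myNumber).count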






Re: Using Spark Context as an attribute of a class cannot be used

2014-11-24 Thread aecc
Marcelo Vanzin wrote
 Do you expect to be able to use the spark context on the remote task?

Not at all. What I want to create is a wrapper around the SparkContext, to be
used only on the driver node.
I would like this AAA wrapper to hold several attributes, such as the
SparkContext and other configuration for my project.

I tested using -Dsun.io.serialization.extendedDebugInfo=true

This is the stacktrace:

org.apache.spark.SparkException: Job aborted due to stage failure: Task not serializable: java.io.NotSerializableException: $iwC$$iwC$$iwC$$iwC$AAA
    - field (class $iwC$$iwC$$iwC$$iwC, name: aaa, type: class $iwC$$iwC$$iwC$$iwC$AAA)
    - object (class $iwC$$iwC$$iwC$$iwC, $iwC$$iwC$$iwC$$iwC@24e57dcb)
    - field (class $iwC$$iwC$$iwC, name: $iw, type: class $iwC$$iwC$$iwC$$iwC)
    - object (class $iwC$$iwC$$iwC, $iwC$$iwC$$iwC@178cc62b)
    - field (class $iwC$$iwC, name: $iw, type: class $iwC$$iwC$$iwC)
    - object (class $iwC$$iwC, $iwC$$iwC@1e9f5eeb)
    - field (class $iwC, name: $iw, type: class $iwC$$iwC)
    - object (class $iwC, $iwC@37d8e87e)
    - field (class $line18.$read, name: $iw, type: class $iwC)
    - object (class $line18.$read, $line18.$read@124551f)
    - field (class $iwC$$iwC$$iwC, name: $VAL15, type: class $line18.$read)
    - object (class $iwC$$iwC$$iwC, $iwC$$iwC$$iwC@2e846e6b)
    - field (class $iwC$$iwC$$iwC$$iwC, name: $outer, type: class $iwC$$iwC$$iwC)
    - object (class $iwC$$iwC$$iwC$$iwC, $iwC$$iwC$$iwC$$iwC@4b31ba1b)
    - field (class $iwC$$iwC$$iwC$$iwC$$anonfun$1, name: $outer, type: class $iwC$$iwC$$iwC$$iwC)
    - object (class $iwC$$iwC$$iwC$$iwC$$anonfun$1, <function1>)
    - field (class org.apache.spark.rdd.FilteredRDD, name: f, type: interface scala.Function1)
    - root object (class org.apache.spark.rdd.FilteredRDD, FilteredRDD[3] at filter at <console>:20)
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1033)

I actually don't understand much about this stack trace. If you can help me,
I would appreciate it.

Using @transient didn't work either.

Thanks a lot






Re: Using Spark Context as an attribute of a class cannot be used

2014-11-24 Thread aecc
Yes, I'm running this in the shell. In my compiled jar it works perfectly; the
issue is that I need to do this in the shell.

Any available workarounds?

I checked sqlContext; it is used in the same way I would like to use my class:
the class is made Serializable with @transient fields. Does this somehow affect
the whole data-movement pipeline? I mean, will I get performance issues because
the class will now be serialized, for some reason that I still don't
understand?


2014-11-24 22:33 GMT+01:00 Marcelo Vanzin [via Apache Spark User List] 
ml-node+s1001560n19687...@n3.nabble.com:

 Hello,

 On Mon, Nov 24, 2014 at 12:07 PM, aecc wrote:
  This is the stacktrace:
 
  org.apache.spark.SparkException: Job aborted due to stage failure: Task not serializable: java.io.NotSerializableException: $iwC$$iwC$$iwC$$iwC$AAA
  - field (class $iwC$$iwC$$iwC$$iwC, name: aaa, type: class $iwC$$iwC$$iwC$$iwC$AAA)

 Ah. Looks to me that you're trying to run this in spark-shell, right?

 I'm not 100% sure of how it works internally, but I think the Scala
 repl works a little differently than regular Scala code in this
 regard. When you declare a val in the shell it will behave
 differently than a val inside a method in a compiled Scala class -
 the former will behave like an instance variable, the latter like a
 local variable. So, this is probably why you're running into this.

 Try compiling your code and running it outside the shell to see how it
 goes. I'm not sure whether there's a workaround for this when trying
 things out in the shell - maybe declare an `object` to hold your
 constants? Never really tried, so YMMV.

 --
 Marcelo





-- 
Alessandro Chacón
Aecc_ORG





Re: Using Spark Context as an attribute of a class cannot be used

2014-11-24 Thread aecc
Ok, great, I'm going to do it that way, thanks :). However, I still don't
understand why this object should be serialized and shipped.

aaa.s and sc are both the same object org.apache.spark.SparkContext@1f222881

However, this:

aaa.s.parallelize(1 to 10).filter(_ == myNumber).count

needs to be serialized, while this:

sc.parallelize(1 to 10).filter(_ == myNumber).count

does not.

2014-11-24 23:13 GMT+01:00 Marcelo Vanzin [via Apache Spark User List] 
ml-node+s1001560n19692...@n3.nabble.com:

 On Mon, Nov 24, 2014 at 1:56 PM, aecc wrote:
  I checked sqlContext; it is used in the same way I would like to use my
  class: the class is made Serializable with @transient fields. Does this
  somehow affect the whole data-movement pipeline? I mean, will I get
  performance issues because the class will now be serialized, for some
  reason that I still don't understand?

 If you want to do the same thing, your AAA needs to be serializable
 and you need to mark all non-serializable fields as @transient. The
 only performance penalty you'll be paying is the serialization /
 deserialization of the AAA instance, which most probably will be
 really small compared to the actual work the task will be doing.

 Unless your class is holding a whole lot of data, in which case you
 should start thinking about using a broadcast instead.

 --
 Marcelo





-- 
Alessandro Chacón
Aecc_ORG





Sliding Subwindows

2014-04-01 Thread aecc
Hello, I would like to have a kind of sub windows. The idea is to have 3
windows in the following way:

future   <-- w1 --><---------- w2 ----------><-- w3 -->   past

So I can do some processing on the new data coming into the main window (w1
entering w2), and some processing on the data leaving the window (w3).

Any ideas on how I can do this in Spark? Is there a way to create sub-windows,
or to specify when a window should start reading?
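
There is no sub-window API in Spark Streaming that I know of, but here is a
hedged sketch of how far the plain window() operator gets: w1 is approximated
by the raw batch stream (the newly arrived data) and w2 by a long window over
it; the data leaving the window (w3) is not directly exposed. Host, port and
durations are placeholders.

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setAppName("subwindow-sketch")
val ssc = new StreamingContext(conf, Seconds(10))          // batch interval = slide

val incoming   = ssc.socketTextStream("localhost", 9999)   // w1: what just arrived
val mainWindow = incoming.window(Seconds(60), Seconds(10)) // w2: the main window

incoming.count().print()    // processing on entering data
mainWindow.count().print()  // processing on the main window

ssc.start()
ssc.awaitTermination()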





Kafka in Yarn

2014-03-13 Thread aecc
Hi, I would like to know the correct way to add Kafka to my project in
standalone YARN mode, given that it now lives in a separate artifact from Spark
core.

I tried adding the dependency to my project, but I get a ClassNotFoundException
for my main class. Also, that makes my jar file very big, which is not
convenient.

This is how I run it:

SPARK_JAR=/opt/spark/assembly/target/scala-2.10/spark-assembly_2.10-0.9.0-incubating-hadoop2.2.0.jar
/opt/spark/bin/spark-class org.apache.spark.deploy.yarn.Client --jar
project.jar --class StarterClass --args yarn-standalone  --num-workers 1
--worker-cores 1 --master-memory 1536m --worker-memory 1536m

Thanks
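
For what it's worth, a hedged build sketch (assuming sbt with the sbt-assembly
plugin, which the thread does not mention): pull in the Kafka connector
artifact and mark the Spark artifacts as provided, so the assembly stays small
because the Spark assembly jar referenced above already supplies those classes
on the cluster. Versions are illustrative for the 0.9.0-incubating line.

// build.sbt (sketch)
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"            % "0.9.0-incubating" % "provided",
  "org.apache.spark" %% "spark-streaming"       % "0.9.0-incubating" % "provided",
  "org.apache.spark" %% "spark-streaming-kafka" % "0.9.0-incubating"
)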


