Re: can't kill spark job in supervise mode

2016-01-30 Thread PhuDuc Nguyen
> …running?
>
> Tim
>
> On Sat, Jan 30, 2016 at 8:19 AM, PhuDuc Nguyen <duc.was.h...@gmail.com> wrote:
>> I have a spark job running on Mesos in multi-master and supervise mode.
>> If I kill it, it is resilient as expected and respawns on another node.

can't kill spark job in supervise mode

2016-01-30 Thread PhuDuc Nguyen
I have a Spark job running on Mesos in multi-master and supervise mode. If I kill it, it is resilient as expected and respawns on another node. However, I cannot kill it when I need to. I have tried two methods:
1) ./bin/spark-class org.apache.spark.deploy.Client kill
2) …
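
For reference, the standalone cluster-mode docs give the full syntax for method 1 as below (master URL and driver ID are placeholders; whether this works against a Mesos master is exactly what this thread is about):

    ./bin/spark-class org.apache.spark.deploy.Client kill <masterUrl> <driverId>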

Re: Spark Streaming + Kafka + scala job message read issue

2015-12-25 Thread PhuDuc Nguyen
Vivek, did you say you have 8 Spark jobs consuming from the same topic, all using the same consumer group name? If so, each job would get only a subset of the messages from that Kafka topic, i.e., each job would see roughly 1 out of every 8 messages. Is that your intent? If you want every job to see the full stream, give each job its own group id; a sketch follows below. regards, Duc
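
A minimal sketch of the distinction, assuming the receiver-based API from Spark 1.x (host names, topic, and group ids below are made up):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    val conf = new SparkConf().setAppName("job-1")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Unique group id per job => each job receives the full stream.
    // Same group id across all 8 jobs => each job gets ~1/8 of the messages.
    val stream = KafkaUtils.createStream(
      ssc,
      "zk-host:2181",        // ZooKeeper quorum (placeholder)
      "group-for-job-1",     // consumer group name
      Map("my-topic" -> 1))  // topic -> number of receiver threads

    stream.print()
    ssc.start()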

Re: Preventing an RDD from shuffling

2015-12-16 Thread PhuDuc Nguyen
There is a way, and it's called a "map-side join". To be clear, there is no explicit function call/API that executes a map-side join; you have to code it yourself, combining a local/broadcast value with the map() function. A caveat for this to work is that one side of the join must be small-ish enough to exist as a…
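
A minimal sketch of what that looks like, with made-up data. The small side is collected to the driver and broadcast; the "join" then happens inside flatMap, so no shuffle is triggered:

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("map-side-join"))

    // Small side: must fit in memory on the driver and each executor.
    val small: Map[Int, String] =
      sc.parallelize(Seq((1, "one"), (2, "two"))).collectAsMap().toMap

    val smallBc = sc.broadcast(small)

    // Large side stays distributed.
    val large = sc.parallelize(Seq((1, "a"), (2, "b"), (3, "c")))

    // Inner join performed per record against the broadcast value.
    val joined = large.flatMap { case (k, v) =>
      smallBc.value.get(k).map(right => (k, (v, right)))
    }

    joined.collect().foreach(println)  // key 3 has no match and is dropped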

Re: [mesos][docker] addFile doesn't work properly

2015-12-10 Thread PhuDuc Nguyen
Have you tried setting the spark.mesos.uris property?

    val conf = new SparkConf().set("spark.mesos.uris", ...)
    val sc = new SparkContext(conf)
    ...

http://spark.apache.org/docs/latest/running-on-mesos.html

HTH, Duc

On Thu, Dec 10, 2015 at 1:04 PM, PHELIPOT, REMY…

Re: Need to maintain the consumer offset by myself when using spark streaming kafka direct approach?

2015-12-08 Thread PhuDuc Nguyen
Kafka receiver-based approach: this maintains the consumer offsets in ZooKeeper for you.

Kafka direct approach: you can use checkpointing, and that will maintain consumer offsets for you. You'll want to checkpoint to a highly available file system like HDFS or S3.
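
A sketch of direct-approach checkpointing, assuming Spark 1.x APIs (the checkpoint path, broker, and topic are placeholders; note the processing logic must live inside the creating function so the context can be rebuilt from the checkpoint):

    import kafka.serializer.StringDecoder
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    def createContext(): StreamingContext = {
      val conf = new SparkConf().setAppName("direct-kafka")
      val ssc = new StreamingContext(conf, Seconds(10))
      ssc.checkpoint("hdfs:///checkpoints/direct-kafka")

      val stream = KafkaUtils.createDirectStream[
        String, String, StringDecoder, StringDecoder](
        ssc,
        Map("metadata.broker.list" -> "broker1:9092"),
        Set("my-topic"))

      stream.foreachRDD(rdd => ())  // your processing goes here
      ssc
    }

    // On restart, offsets (and the DStream graph) come back from the checkpoint.
    val ssc = StreamingContext.getOrCreate(
      "hdfs:///checkpoints/direct-kafka", createContext _)
    ssc.start()
    ssc.awaitTermination()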

Re: Spark UI - Streaming Tab

2015-12-04 Thread PhuDuc Nguyen
I believe the "Streaming" tab is dynamic: it appears once you have a streaming job running, not when the cluster is simply up. It does not depend on 1.6 and has been there since at least 1.0. HTH, Duc

On Fri, Dec 4, 2015 at 7:28 AM, patcharee wrote:
> Hi,
>
> We…

Re: Parallelizing operations using Spark

2015-11-17 Thread PhuDuc Nguyen
You should try passing your Solr writer into rdd.foreachPartition() for maximum parallelism: each partition on each executor will execute the function passed in. HTH, Duc

On Tue, Nov 17, 2015 at 7:36 AM, Susheel Kumar wrote:
> Any input/suggestions on parallelizing below…
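
A sketch of the pattern; SolrWriter below is a hypothetical stand-in for whatever client is in use. The point is one client per partition, created on the executor, rather than one per record or one serialized from the driver:

    import org.apache.spark.{SparkConf, SparkContext}

    // Hypothetical client: replace with the real Solr client in use.
    class SolrWriter(url: String) {
      def add(doc: String): Unit = println(s"indexing to $url: $doc")
      def close(): Unit = ()
    }

    val sc = new SparkContext(new SparkConf().setAppName("solr-writes"))
    val docs = sc.parallelize(Seq("doc1", "doc2", "doc3"))

    docs.foreachPartition { records =>
      // One writer per partition, living entirely on the executor.
      val writer = new SolrWriter("http://solr-host:8983/solr/mycore")
      try records.foreach(writer.add)
      finally writer.close()
    }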

Re: dynamic allocation w/ spark streaming on mesos?

2015-11-11 Thread PhuDuc Nguyen
> …Programming Scala, 2nd Edition <http://shop.oreilly.com/product/0636920033073.do> (O'Reilly)
> Typesafe <http://typesafe.com>
> @deanwampler <http://twitter.com/deanwampler>
> http://polyglotprogramming.com
>
> On Wed, Nov 11, 2015 at 8:09 AM, PhuDuc Nguyen <duc.was.h...@gm…

dynamic allocation w/ spark streaming on mesos?

2015-11-11 Thread PhuDuc Nguyen
I'm trying to get Spark Streaming to scale its number of executors up/down on Mesos based on workload, but it's not scaling down. I'm using Spark 1.5.1, reading from Kafka using the direct (receiver-less) approach. Based on this ticket https://issues.apache.org/jira/browse/SPARK-6287, with the…
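
For context, a sketch of the settings involved per the Spark 1.5-era docs (values are made up; dynamic allocation also requires the external shuffle service to be running on each Mesos agent):

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .set("spark.dynamicAllocation.enabled", "true")
      .set("spark.shuffle.service.enabled", "true")   // external shuffle service
      .set("spark.dynamicAllocation.minExecutors", "2")
      .set("spark.dynamicAllocation.maxExecutors", "20")
      .set("spark.dynamicAllocation.executorIdleTimeout", "60s")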

Re: dynamic allocation w/ spark streaming on mesos?

2015-11-11 Thread PhuDuc Nguyen
> 2. Use StreamingListener to get the scheduling delay and processing times, and use that to request or kill executors.
>
> TD
>
> On Wed, Nov 11, 2015 at 9:48 AM, PhuDuc Nguyen <duc.was.h...@gmail.com> wrote:
>> Dean,
>>
>> Thanks for the reply. I'm sea…
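
A sketch of TD's suggestion (the threshold is made up; requestExecutors/killExecutors are developer APIs on SparkContext since Spark 1.2):

    import org.apache.spark.SparkContext
    import org.apache.spark.streaming.scheduler.{
      StreamingListener, StreamingListenerBatchCompleted}

    class ScalingListener(sc: SparkContext) extends StreamingListener {
      override def onBatchCompleted(
          batch: StreamingListenerBatchCompleted): Unit = {
        val delayMs = batch.batchInfo.schedulingDelay.getOrElse(0L)
        if (delayMs > 10000) {
          sc.requestExecutors(1)  // falling behind: ask for one more executor
        } else if (delayMs == 0) {
          // idle: release an executor (id lookup omitted in this sketch)
          // sc.killExecutors(Seq(executorId))
        }
      }
    }

    // Registered with: ssc.addStreamingListener(new ScalingListener(ssc.sparkContext))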