Re: In Java how can I create an RDD with a large number of elements

2014-12-08 Thread praveen seluka
Steve, something like this should do it, I think: sc.parallelize(1 to 1000, 1000).flatMap(x => 1 to 100000). The above will launch 1000 tasks (maps), with each task creating 10^5 numbers (a total of 100 million elements). On Mon, Dec 8, 2014 at 6:17 PM, Steve Lewis lordjoe2...@gmail.com wrote: assume
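The fan-out idea above (a small seed range expanded by flatMap, so the element creation is spread across many tasks) can be sketched outside Spark with plain Java streams. This is only an illustration of the arithmetic, not Spark API code; the class name is made up and the per-task partitioning is not reproduced.

```java
import java.util.stream.IntStream;

public class FanOut {
    public static void main(String[] args) {
        // 1000 seeds stand in for sc.parallelize(1 to 1000, 1000);
        // each seed expands into 100,000 elements, as the flatMap does per task.
        long total = IntStream.rangeClosed(1, 1000)
                .mapToLong(seed -> IntStream.rangeClosed(1, 100_000).count())
                .sum();
        System.out.println(total); // prints 100000000
    }
}
```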

executorAdded event to DAGScheduler

2014-09-26 Thread praveen seluka
Can someone explain the motivation behind passing the executorAdded event to DAGScheduler? *DAGScheduler* does *submitWaitingStages* when the *executorAdded* method is called by *TaskSchedulerImpl*. I see an issue in the code below, from *TaskSchedulerImpl.scala*: if (!executorsByHost.contains(o.host))
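The flow being questioned above can be sketched with a toy model: the quoted check means the event fires only when a *new host* appears, not for every new executor. The class below mirrors the described behaviour but is not Spark code; all names are illustrative.

```java
import java.util.HashMap;
import java.util.Map;

// Toy sketch of the event flow discussed above (not Spark source):
// the scheduler tracks executors per host, and executorAdded -- which
// triggers submitWaitingStages on the DAGScheduler side -- only fires
// when the host has not been seen before.
public class EventFlow {
    static Map<String, Integer> executorsByHost = new HashMap<>();
    static int submitWaitingCalls = 0;

    static void executorAdded() { submitWaitingCalls++; } // DAGScheduler side

    static void registerExecutor(String host) {           // TaskSchedulerImpl side
        if (!executorsByHost.containsKey(host)) {         // the quoted check
            executorsByHost.put(host, 0);
            executorAdded();                              // fires per new host only
        }
        executorsByHost.merge(host, 1, Integer::sum);
    }

    public static void main(String[] args) {
        registerExecutor("host1");
        registerExecutor("host1"); // second executor on the same host: no event
        registerExecutor("host2");
        System.out.println(submitWaitingCalls); // prints 2
    }
}
```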

Re: executorAdded event to DAGScheduler

2014-09-26 Thread praveen seluka
Some corrections. On Fri, Sep 26, 2014 at 5:32 PM, praveen seluka praveen.sel...@gmail.com wrote: Can someone explain the motivation behind passing the executorAdded event to DAGScheduler? *DAGScheduler* does *submitWaitingStages* when the *executorAdded* method is called by *TaskSchedulerImpl*. I

Re: executorAdded event to DAGScheduler

2014-09-26 Thread praveen seluka
I'm not sure if it will create an issue when you have multiple workers on the same host, as submitWaitingStages is called everywhere and I have never tried such a deployment mode. Best, -- Nan Zhu On Friday, September 26, 2014 at 8:02 AM, praveen seluka wrote: Can someone explain the motivation

Yarn Over-allocating Containers

2014-09-12 Thread praveen seluka
Hi all, I am seeing a strange issue in Spark on YARN (stable). Let me know if this is known, or if I am missing something, as it looks very fundamental. Launch a Spark job with 2 containers: addContainerRequest is called twice, and then allocate is called on AMRMClient. This gets 2 containers allocated. Fine as of
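A plausible reading of the over-allocation described above is the well-known AMRMClient pattern: container requests stay outstanding until the application master removes them, so a later allocate() heartbeat can be granted the same requests again. The toy model below sketches that bookkeeping; the method names echo the Hadoop API but this is a simulation, not YARN code.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of AMRMClient request bookkeeping (names are illustrative).
// If requests are not removed after containers are granted, the next
// allocate() round grants them again, over-allocating containers.
public class AllocSim {
    int pendingRequests = 0;

    void addContainerRequest() { pendingRequests++; }

    List<String> allocate(boolean removeAfterGrant) {
        List<String> granted = new ArrayList<>();
        for (int i = 0; i < pendingRequests; i++) granted.add("container");
        if (removeAfterGrant) pendingRequests = 0; // removeContainerRequest per grant
        return granted;
    }

    public static void main(String[] args) {
        AllocSim buggy = new AllocSim();
        buggy.addContainerRequest();
        buggy.addContainerRequest();
        int first = buggy.allocate(false).size();  // 2 granted
        int second = buggy.allocate(false).size(); // 2 more: over-allocation
        System.out.println(first + " then " + second); // prints 2 then 2

        AllocSim fixed = new AllocSim();
        fixed.addContainerRequest();
        fixed.addContainerRequest();
        int a = fixed.allocate(true).size();       // 2 granted, requests removed
        int b = fixed.allocate(true).size();       // 0: no over-allocation
        System.out.println(a + " then " + b);      // prints 2 then 0
    }
}
```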

Re: API to add/remove containers inside an application

2014-09-05 Thread Praveen Seluka
Mailed our list - will send it to Spark Dev On Fri, Sep 5, 2014 at 11:28 AM, Rajat Gupta rgu...@qubole.com wrote: +1 on this. First step to more automated autoscaling of spark application master... On Fri, Sep 5, 2014 at 12:56 AM, Praveen Seluka psel...@qubole.com wrote: +user

Re: API to add/remove containers inside an application

2014-09-04 Thread Praveen Seluka
+user On Thu, Sep 4, 2014 at 10:53 PM, Praveen Seluka psel...@qubole.com wrote: Spark on Yarn has static allocation of resources. https://issues.apache.org/jira/browse/SPARK-3174 - This JIRA by Sandy is about adding and removing executors dynamically based on load. Even before doing

Re: import org.apache.spark.streaming.twitter._ in Shell

2014-07-15 Thread Praveen Seluka
If you want to make the Twitter* classes available in your shell, I believe you could do the following: 1. Change the parent pom module ordering - move external/twitter before assembly. 2. In assembly/pom.xml, add the external/twitter dependency - this will package the twitter* classes into the assembly jar. Now when
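Step 2 above might look like the fragment below in assembly/pom.xml. This is a sketch: the groupId/artifactId follow Spark's usual module naming for the external twitter module, and the version property is the conventional one for intra-project dependencies, but verify both against your checkout.

```xml
<!-- assembly/pom.xml: pull the external twitter module into the assembly jar.
     Coordinates follow Spark's naming convention; confirm against your tree. -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-streaming-twitter_2.10</artifactId>
  <version>${project.version}</version>
</dependency>
```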

Re: Number of executors change during job running

2014-07-11 Thread Praveen Seluka
If I understand correctly, you cannot change the number of executors at runtime (correct me if I am wrong) - it is defined when the application starts and stays fixed. Do you mean the number of tasks? On Fri, Jul 11, 2014 at 6:29 AM, Tathagata Das tathagata.das1...@gmail.com wrote: Can you try

Re: Getting started : Spark on YARN issue

2014-06-20 Thread Praveen Seluka
Praveen Seluka psel...@qubole.com: I am trying to run Spark on YARN. I have a Hadoop 2.2 cluster (YARN + HDFS) in EC2. Then I compiled Spark using Maven with the 2.2 Hadoop profile. Now I am trying to run the example Spark job (in yarn-cluster mode) from my *local machine*. I have setup

Getting started : Spark on YARN issue

2014-06-19 Thread Praveen Seluka
I am trying to run Spark on YARN. I have a Hadoop 2.2 cluster (YARN + HDFS) in EC2. Then I compiled Spark using Maven with the 2.2 Hadoop profile. Now I am trying to run the example Spark job (in yarn-cluster mode) from my *local machine*. I have set up the HADOOP_CONF_DIR environment variable
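A submission along the lines described might look like the commands below. The paths and the example jar name are hypothetical; the key point is that HADOOP_CONF_DIR must contain the cluster's client configs (core-site.xml, yarn-site.xml) so spark-submit can locate the remote ResourceManager from the local machine.

```shell
# Hypothetical paths -- adjust to your checkout and cluster.
# HADOOP_CONF_DIR must point at configs that name the EC2 cluster's
# ResourceManager and NameNode, not a local pseudo-cluster.
export HADOOP_CONF_DIR=/path/to/hadoop/conf

./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn-cluster \
  lib/spark-examples-*.jar 10
```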