Re: Running multiple threads with same Spark Context

2015-02-25 Thread Yana Kadiyska
I am not sure whether your issue is with setting FAIR mode correctly or with
something else, so let's start with the FAIR mode.

Do you actually see the scheduler mode being set to FAIR?

I have this line in spark-defaults.conf:
spark.scheduler.allocation.file=/spark/conf/fairscheduler.xml
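
For reference, a minimal fairscheduler.xml of the shape Spark expects; the
pool name and values below are illustrative, not my exact settings:

<?xml version="1.0"?>
<allocations>
  <pool name="default">
    <schedulingMode>FAIR</schedulingMode>
    <weight>1</weight>
    <minShare>2</minShare>
  </pool>
</allocations>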

Then, when I start my application, I can see that it is using that
scheduler in the UI -- go to the master UI and click on your application.
You should then see this (note that the scheduling mode is shown as Fair):

[screenshot of the application page showing "Scheduling Mode: FAIR"]

Re: Running multiple threads with same Spark Context

2015-02-25 Thread Harika Matha
Hi Yana,

I tried running the program after setting the property
spark.scheduler.mode to FAIR, but the result is the same as before. Are
there any other properties that have to be set?
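
In case it is useful, a minimal sketch of where I set it, assuming Spark 1.x
(the app name is a placeholder):

import org.apache.spark.{SparkConf, SparkContext}

// spark.scheduler.mode must be set before the SparkContext is created;
// changing it afterwards has no effect
val conf = new SparkConf()
  .setAppName("ConcurrentSQL")  // placeholder
  .set("spark.scheduler.mode", "FAIR")
val sc = new SparkContext(conf)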


On Tue, Feb 24, 2015 at 10:26 PM, Yana Kadiyska yana.kadiy...@gmail.com
wrote:

 It's hard to tell. I have not run this on EC2 but this worked for me:

 The only thing that I can think of is that the scheduling mode is set to

- *Scheduling Mode:* FAIR


 val pool: ExecutorService = Executors.newFixedThreadPool(poolSize)
 while_loop to get curr_job
  pool.execute(new ReportJob(sqlContext, curr_job, i))

 class ReportJob(sqlContext:org.apache.spark.sql.hive.HiveContext,query: 
 String,id:Int) extends Runnable with Logging {
   def threadId = (Thread.currentThread.getName() + \t)

   def run() {
 logInfo(s* Running ${threadId} ${id})
 val startTime = Platform.currentTime
 val hiveQuery=query
 val result_set = sqlContext.sql(hiveQuery)
 result_set.repartition(1)
 result_set.saveAsParquetFile(shdfs:///tmp/${id})
 logInfo(s* DONE ${threadId} ${id} time: 
 +(Platform.currentTime-startTime))
   }
 }

 ​

 On Tue, Feb 24, 2015 at 4:04 AM, Harika matha.har...@gmail.com wrote:

 Hi all,

 I have been running a simple SQL program on Spark. To test the
 concurrency,
 I have created 10 threads inside the program, all threads using same
 SQLContext object. When I ran the program on my EC2 cluster using
 spark-submit, only 3 threads were running in parallel. I have repeated the
 test on different EC2 clusters (containing different number of cores) and
 found out that only 3 threads are running in parallel on every cluster.

 Why is this behaviour seen? What does this number 3 specify?
 Is there any configuration parameter that I have to set if I want to run
 more threads concurrently?

 Thanks
 Harika



 --
 View this message in context:
 http://apache-spark-user-list.1001560.n3.nabble.com/Running-multiple-threads-with-same-Spark-Context-tp21784.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org





Running multiple threads with same Spark Context

2015-02-24 Thread Harika
Hi all,

I have been running a simple SQL program on Spark. To test concurrency, I
created 10 threads inside the program, all using the same SQLContext object.
When I ran the program on my EC2 cluster using spark-submit, only 3 threads
were running in parallel. I repeated the test on different EC2 clusters
(with different numbers of cores) and found that only 3 threads run in
parallel on every cluster.
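
A trimmed-down sketch of the kind of test I mean, assuming Spark 1.2 (the
table name and query are illustrative):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object ConcurrencyTest {
  case class Num(n: Int)

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("ConcurrencyTest"))
    val sqlContext = new SQLContext(sc)  // one context shared by all threads
    import sqlContext.createSchemaRDD
    sc.parallelize(1 to 100000).map(Num).registerTempTable("nums")

    // ten threads, each submitting its own query through the shared context
    val threads = (1 to 10).map { i =>
      new Thread(new Runnable {
        def run(): Unit = {
          val rows = sqlContext.sql("SELECT COUNT(n) FROM nums").collect()
          println(s"thread $i done: ${rows.head}")
        }
      })
    }
    threads.foreach(_.start())
    threads.foreach(_.join())
  }
}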

Why does this happen, and what does the number 3 signify?
Is there any configuration parameter I have to set if I want to run more
threads concurrently?

Thanks
Harika



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Running-multiple-threads-with-same-Spark-Context-tp21784.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




Re: Running multiple threads with same Spark Context

2015-02-24 Thread Yana Kadiyska
It's hard to tell. I have not run this on EC2, but the following worked for me.
The only thing I can think of is to check that the scheduling mode is set to

   - *Scheduling Mode:* FAIR


import java.util.concurrent.{ExecutorService, Executors}
import scala.compat.Platform
import org.apache.spark.Logging

// poolSize and jobs (the queries to run) are defined elsewhere; my
// original code pulled each curr_job in a while loop
val pool: ExecutorService = Executors.newFixedThreadPool(poolSize)
for ((curr_job, i) <- jobs.zipWithIndex)
  pool.execute(new ReportJob(sqlContext, curr_job, i))

class ReportJob(sqlContext: org.apache.spark.sql.hive.HiveContext,
                query: String, id: Int) extends Runnable with Logging {
  def threadId = Thread.currentThread.getName() + "\t"

  def run() {
    logInfo(s"* Running ${threadId} ${id}")
    val startTime = Platform.currentTime
    val result_set = sqlContext.sql(query)
    // repartition returns a new SchemaRDD; save the repartitioned one
    result_set.repartition(1).saveAsParquetFile(s"hdfs:///tmp/${id}")
    logInfo(s"* DONE ${threadId} ${id} time: " + (Platform.currentTime - startTime))
  }
}
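
One related knob when driving jobs from multiple threads: a thread can be
assigned to a named pool before it submits work. A small sketch, assuming a
pool named "reports" is defined in fairscheduler.xml (the name is a
placeholder):

// call inside the worker thread before it submits any job; jobs submitted
// from this thread afterwards go to the "reports" pool
sqlContext.sparkContext.setLocalProperty("spark.scheduler.pool", "reports")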

