Re: Parallelizing multiple RDD / DataFrame creation in Spark

2015-07-08 Thread Ashish Dutt
Thank you, Akhil, for the link. Sincerely, Ashish Dutt, PhD Candidate, Department of Information Systems, University of Malaya, Lembah Pantai, 50603 Kuala Lumpur, Malaysia. On Wed, Jul 8, 2015 at 3:43 PM, Akhil Das ak...@sigmoidanalytics.com wrote: Have a look

Re: Parallelizing multiple RDD / DataFrame creation in Spark

2015-07-08 Thread Akhil Das
Have a look at http://alvinalexander.com/scala/how-to-create-java-thread-runnable-in-scala, then create two threads and call thread1.start() and thread2.start(). Thanks, Best Regards. On Wed, Jul 8, 2015 at 1:06 PM, Ashish Dutt ashish.du...@gmail.com wrote: Thanks for your reply, Akhil. How do you
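A minimal sketch of the two-thread approach described above, in plain Scala. The names ParallelLoadSketch, finished, and runBoth are hypothetical, and loadTable1()/loadTable2() here are stand-ins that only sleep and record completion; in a real job their bodies would hold the sqlContext calls from the original question. The join() calls are an addition not mentioned in the thread: they make the caller wait until both loaders have finished.

```scala
import java.util.concurrent.ConcurrentLinkedQueue

object ParallelLoadSketch {
  // Thread-safe record of which stand-in loaders have completed.
  val finished = new ConcurrentLinkedQueue[String]()

  // Hypothetical stand-ins for the loadTable1()/loadTable2() functions from
  // the original question; a real version would call sqlContext inside.
  def loadTable1(): Unit = { Thread.sleep(100); finished.add("table1") }
  def loadTable2(): Unit = { Thread.sleep(100); finished.add("table2") }

  def runBoth(): Unit = {
    val thread1 = new Thread(new Runnable { def run(): Unit = loadTable1() })
    val thread2 = new Thread(new Runnable { def run(): Unit = loadTable2() })
    // Start both loaders so they run concurrently.
    thread1.start()
    thread2.start()
    // Assumption: wait for both to finish before using the results.
    thread1.join()
    thread2.join()
  }

  def main(args: Array[String]): Unit = {
    runBoth()
    println(s"loaders finished: ${finished.size}")
  }
}
```

Whether this actually speeds anything up depends on what the loaders do; as other replies in the thread note, work that is deferred until an action runs will not be triggered just by calling the loader on a separate thread.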

Re: Parallelizing multiple RDD / DataFrame creation in Spark

2015-07-08 Thread Srikanth
Your tableLoad() APIs are not actions. The files will be read fully only when an action is performed. If the action is something like table1.join(table2), then I think both files will be read in parallel. Can you try that and look at the execution plan? Alternatively, in 1.4 this is shown in the Spark UI. Srikanth On

Re: Parallelizing multiple RDD / DataFrame creation in Spark

2015-07-08 Thread Brandon White
The point of running them in parallel would be faster creation of the tables. Has anybody been able to efficiently parallelize something like this in Spark? On Jul 8, 2015 12:29 AM, Akhil Das ak...@sigmoidanalytics.com wrote: What's the point of creating them in parallel? You can multi-thread it

Re: Parallelizing multiple RDD / DataFrame creation in Spark

2015-07-08 Thread ayan guha
Do you have a benchmark showing that running these two statements as they are would be slower than what you suggest? On 9 Jul 2015 01:06, Brandon White bwwintheho...@gmail.com wrote: The point of running them in parallel would be faster creation of the tables. Has anybody been able to efficiently

Re: Parallelizing multiple RDD / DataFrame creation in Spark

2015-07-08 Thread Ashish Dutt
Thanks for your reply, Akhil. How do you multithread it? Sincerely, Ashish Dutt. On Wed, Jul 8, 2015 at 3:29 PM, Akhil Das ak...@sigmoidanalytics.com wrote: What's the point of creating them in parallel? You can multi-thread it and run it in parallel, though. Thanks, Best Regards. On Wed, Jul 8,

Re: Parallelizing multiple RDD / DataFrame creation in Spark

2015-07-08 Thread Akhil Das
What's the point of creating them in parallel? You can multi-thread it and run it in parallel, though. Thanks, Best Regards. On Wed, Jul 8, 2015 at 5:34 AM, Brandon White bwwintheho...@gmail.com wrote: Say I have a Spark job that looks like the following: def loadTable1() { val table1 =

Parallelizing multiple RDD / DataFrame creation in Spark

2015-07-07 Thread Brandon White
Say I have a Spark job that looks like the following: def loadTable1() { val table1 = sqlContext.jsonFile("s3://textfiledirectory/") table1.cache().registerTempTable("table1") } def loadTable2() { val table2 = sqlContext.jsonFile("s3://testfiledirectory2/")