Say I have a Spark job that looks like the following:

def loadTable1() {
  val table1 = sqlContext.jsonFile(s"s3://textfiledirectory/")
  table1.cache().registerTempTable("table1")
}

def loadTable2() {
  val table2 = sqlContext.jsonFile(s"s3://testfiledirectory2/")
  table2.cache().registerTempTable("table2")
}

def loadAllTables() {
  loadTable1()
  loadTable2()
}

loadAllTables()

How do I parallelize this Spark job so that both tables are created in
parallel rather than one after the other?
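
One possible approach (a sketch, not from the original post) is to submit each load from its own thread, for example with Scala Futures. The SQLContext/SparkContext can accept jobs from multiple threads, and the scheduler will run them concurrently when the cluster has spare capacity. The loadAllTablesInParallel name below is just illustrative:

import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

def loadAllTablesInParallel() {
  // Kick off both loads concurrently; each one submits its Spark job
  // from a separate thread.
  val f1 = Future { loadTable1() }
  val f2 = Future { loadTable2() }

  // Block until both temp tables are registered before moving on.
  Await.result(Future.sequence(Seq(f1, f2)), Duration.Inf)
}

Note that jobs submitted from different threads are scheduled FIFO by default; setting spark.scheduler.mode to FAIR lets the two jobs share cluster resources more evenly.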
