Run the queries on separate threads within the same SparkContext.

Something like this:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("practice")
  // FAIR scheduling lets jobs submitted from different threads share executors
  .config("spark.scheduler.mode", "FAIR")
  .enableHiveSupport()
  .getOrCreate()

val thread1 = new Thread {
  override def run(): Unit = {
    // An action (show, write, etc.) is needed to actually trigger a job;
    // spark.sql(...) alone only builds a lazy DataFrame
    spark.sql("select * from table1").show()
  }
}

val thread2 = new Thread {
  override def run(): Unit = {
    spark.sql("select * from table2").show()
  }
}

thread1.start()
thread2.start()

// Wait for both threads so the driver doesn't exit early
thread1.join()
thread2.join()
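For your actual use case (a few hundred tables), spawning one raw thread per table won't scale; a bounded pool of futures keeps the number of concurrent jobs in check while FAIR scheduling shares the executors. A minimal sketch reusing the spark session from above, assuming a placeholder tableNames list and a hypothetical /archive/<table> output path (the pool size of 4 is also just an assumption to tune):

import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

// Bounded pool: at most 4 conversions run concurrently on the driver
implicit val ec: ExecutionContext =
  ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(4))

val tableNames: Seq[String] = Seq("table1", "table2") // placeholder list

val conversions = tableNames.map { t =>
  Future {
    // Each future runs one table-to-Parquet conversion as its own Spark job
    spark.table(t)
      .write
      .mode("overwrite")
      .parquet(s"/archive/$t") // hypothetical target path
  }
}

// Block until every table has been converted
Await.result(Future.sequence(conversions), Duration.Inf)

Note that if one conversion fails, Await.result will throw even though the other futures keep running; wrap the body in scala.util.Try if you would rather collect per-table results and report failures at the end.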



On Mon, Jul 17, 2017 at 5:42 PM, FN <nuson.fr...@gmail.com> wrote:

> Hi
> I am currently trying to parallelize reading multiple tables from Hive. As
> part of an archival framework, I need to convert a few hundred tables that
> are in text format to Parquet. For now I am running a Spark SQL statement
> inside a for loop for the conversion, but this executes sequentially and
> the whole process takes a long time to finish.
>
> I tried submitting 4 different Spark jobs (each with its own set of tables
> to be converted), which did give me some parallelism, but I would like to
> do this in a single Spark job due to a few limitations of our cluster and
> process.
>
> Any help will be greatly appreciated