Calling hiveContext.sql("insert into table xyz...") in multiple threads?

unk1102 Mon, 17 Aug 2015 09:40:55 -0700

Hi, I have around 2000 Hive source partitions to process, and I insert the data
into the same table but into a different partition each time. For example, I
have the following query:


hiveContext.sql("insert into table myTable partition(mypartition='somepartition') bla bla")

If I call the above query in the Spark driver program, it runs fine and creates
the corresponding partition in HDFS. This works, but it is very slow: it takes
4-5 hours to process all 2000 partitions. So I thought of using an
ExecutorService and calling the above query, along with a couple of similar
insert into queries, in Callable threads. With threads it definitely runs
faster, but I don't see any partition created in HDFS. Is this a concurrency
issue, since every thread is trying to insert into the same table but into a
different partition? I see the tasks running very fast and finishing, but I
don't see any partition in HDFS. Please guide me; I am new to Spark and Hive.
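
For reference, here is a minimal sketch of the threaded pattern described above. It is an illustration under assumptions, not the actual driver code: the class name ParallelInserts, the helpers insertSql and runAll, the partition names and the thread count are all hypothetical, and the Hive call is passed in as a Consumer<String> so the sketch runs without a cluster (in a real driver it would be something like sql -> hiveContext.sql(sql)). The sketch waits on every Future, so a task that fails raises an exception instead of appearing to finish quickly. Whether that explains the missing partitions is not established here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

public class ParallelInserts {

    // Builds one INSERT statement per target partition.
    // Table and column names mirror the post; "bla bla" is the post's own elision.
    static String insertSql(String partition) {
        return "insert into table myTable partition(mypartition='" + partition + "') bla bla";
    }

    // Submits one Callable per partition and waits for all of them.
    // runSql is a stand-in for the real call (assumed: sql -> hiveContext.sql(sql)).
    static int runAll(List<String> partitions, Consumer<String> runSql, int threads)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String partition : partitions) {
                Callable<String> task = () -> {
                    runSql.accept(insertSql(partition));
                    return partition;
                };
                futures.add(pool.submit(task));
            }
            int completed = 0;
            for (Future<String> future : futures) {
                // get() blocks until the task ends and rethrows any failure
                // from inside the task as an ExecutionException.
                future.get();
                completed++;
            }
            return completed;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> partitions = Arrays.asList("p1", "p2", "p3");
        // Printing stands in for executing the statement against Hive.
        int completed = runAll(partitions, sql -> System.out.println(sql), 2);
        System.out.println(completed + " inserts completed");
    }
}
```

If a task throws inside the Callable, the exception stays hidden until Future.get() is called, so a driver that never calls get() can report fast, apparently successful tasks.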




