Hi,

Do you have a more detailed log or error message? Could you also share details about the tables (number of rows, number of columns, size on disk, etc.)? Is this a one-time job or something that runs regularly? If it is a one-time job, I would lean towards first writing each table to HDFS (Parquet or ORC) and then joining them there. Which Hive and Spark versions are you using?
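To make that concrete, here is a minimal Scala sketch of the staging idea; the helper name, the HDFS paths, and the join key "pkey" are placeholders I am assuming, not details from your job:

import org.apache.spark.sql.{DataFrame, SparkSession}

val spark = SparkSession.builder()
  .appName("StageHBaseTablesSketch")
  .enableHiveSupport()
  .getOrCreate()

// Write a DataFrame that was scanned from HBase out to HDFS once, so the
// expensive HBase scan is not repeated on every join attempt or retry.
def stageAsParquet(df: DataFrame, path: String): Unit =
  df.write.mode("overwrite").parquet(path)

// The join then runs against the cheap, splittable staged copies, e.g.:
// val a = spark.read.parquet("hdfs:///staging/table_a")
// val b = spark.read.parquet("hdfs:///staging/table_b")
// val joined = a.join(b, Seq("pkey"))

Staging also lets you check row counts and sizes per table before the join, which usually makes it easier to see which step is killing the containers.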
Best regards

> On 2. Nov 2017, at 20:57, Chetan Khatri <chetan.opensou...@gmail.com> wrote:
>
> Hello Spark Developers,
>
> I have 3 tables that I am reading from HBase and want to do a join
> transformation and save the result to a Hive Parquet external table.
> Currently my join is failing with a container failed error.
>
> 1. Read table A from HBase with ~17 billion records.
> 2. Repartition on the primary key of table A.
> 3. Create a temp view of the table A DataFrame.
> 4. Read table B from HBase with ~4 billion records.
> 5. Repartition on the primary key of table B.
> 6. Create a temp view of the table B DataFrame.
> 7. Join both views of A and B and create DataFrame C.
> 8. Join DataFrame C with table D.
> 9. coalesce(20) to reduce the number of files created from the already repartitioned DataFrame.
> 10. Finally, store to the external Hive table partitioned by skey.
>
> If you have any suggestions or resources on how to optimize this, please do
> share them.
>
> Thanks
> Chetan
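A rough Scala sketch of steps 1-10 above, with the table A/B/D reads shown from placeholder Parquet paths (the actual HBase connector calls are not in the thread) and with "pkey"/"skey" standing in for the real key columns:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder()
  .appName("HBaseJoinToHiveSketch")
  .enableHiveSupport()
  .getOrCreate()

// Placeholders: in the original pipeline these are scanned from HBase.
val dfA = spark.read.parquet("hdfs:///staging/table_a")   // ~17 billion rows
val dfB = spark.read.parquet("hdfs:///staging/table_b")   // ~4 billion rows
val dfD = spark.read.parquet("hdfs:///staging/table_d")

// Steps 2-3 and 5-6: repartition on the assumed primary key, register views.
dfA.repartition(col("pkey")).createOrReplaceTempView("table_a")
dfB.repartition(col("pkey")).createOrReplaceTempView("table_b")

// Step 7: join the two views into DataFrame C.
val dfC = spark.sql(
  "SELECT a.* FROM table_a a JOIN table_b b ON a.pkey = b.pkey")

// Step 8: join C with table D.
val joined = dfC.join(dfD, Seq("pkey"))

// Steps 9-10: cap the number of output files, write the partitioned table.
joined.coalesce(20)
  .write
  .mode("overwrite")
  .partitionBy("skey")
  .format("parquet")
  .saveAsTable("warehouse_db.table_c")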