Hi,

Do you have a more detailed log or error message?
Also, could you please give us details on the tables (number of rows, columns, 
size, etc.)?
Is this just a one-time job, or something you will run regularly?
If it is a one-time job, then I would lean towards dumping each table into HDFS 
(Parquet or ORC) first and then joining them there.
Which Hive and Spark versions are you using?
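
For the one-off case, here is a minimal sketch of what I mean, assuming Spark 2.x 
with Hive support. The paths, the join key "skey", and the output table name are 
placeholders, and dfA/dfB stand for whatever DataFrames your HBase connector 
gives you:

import org.apache.spark.sql.{DataFrame, SparkSession}

val spark = SparkSession.builder()
  .appName("stage-and-join")
  .enableHiveSupport()
  .getOrCreate()

// Stage each HBase-sourced DataFrame to HDFS once, so the join reads
// splittable, columnar Parquet instead of scanning HBase region servers.
def stage(df: DataFrame, path: String): Unit =
  df.write.mode("overwrite").parquet(path)

// stage(dfA, "hdfs:///tmp/staging/tableA")
// stage(dfB, "hdfs:///tmp/staging/tableB")

// Join the staged copies.
val a = spark.read.parquet("hdfs:///tmp/staging/tableA")
val b = spark.read.parquet("hdfs:///tmp/staging/tableB")
val joined = a.join(b, Seq("skey"))

// Reduce the output file count, then write the partitioned Hive table.
joined.coalesce(20)
  .write.mode("overwrite")
  .format("parquet")
  .partitionBy("skey")
  .saveAsTable("mydb.joined_result")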

Best regards

> On 2. Nov 2017, at 20:57, Chetan Khatri <chetan.opensou...@gmail.com> wrote:
> 
> Hello Spark Developers,
> 
> I have 3 tables that I am reading from HBase and want to run a join 
> transformation on, then save the result to a Hive Parquet external table. 
> Currently my join is failing with a container-failed error.
> 
> 1. Read table A from HBase with ~17 billion records.
> 2. Repartition on the primary key of table A.
> 3. Create a temp view of the table A DataFrame.
> 4. Read table B from HBase with ~4 billion records.
> 5. Repartition on the primary key of table B.
> 6. Create a temp view of the table B DataFrame.
> 7. Join the views of A and B and create DataFrame C.
> 8. Join DataFrame C with table D.
> 9. coalesce(20) to reduce the number of files created from the already repartitioned DF.
> 10. Finally, store to an external Hive table partitioned by skey.
> 
> If you have any suggestions, or come across any resources for optimizing 
> this, please do share them.
> 
> Thanks
> Chetan
