Re: Repartitioning Hive tables - Container killed by YARN for exceeding memory limits

2017-08-03 Thread Chetan Khatri
Thanks Holden! On Thu, Aug 3, 2017 at 4:02 AM, Holden Karau wrote: > The memory overhead is based less on the total amount of data and more on > what you end up doing with the data (e.g. if you're doing a lot of off-heap > processing or using Python you need to increase

Re: Repartitioning Hive tables - Container killed by YARN for exceeding memory limits

2017-08-02 Thread Holden Karau
The memory overhead is based less on the total amount of data and more on what you end up doing with the data (e.g. if you're doing a lot of off-heap processing or using Python you need to increase it). Honestly, most people find this number for their job "experimentally" (e.g. they try a few
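
Since the overhead is tuned experimentally rather than derived from the data size, here is a minimal sketch of where the knob lives (the app name and the 2048 MB figure are placeholder assumptions, not recommendations; Spark 2.x on YARN takes this setting as a number of MB):

    import org.apache.spark.sql.SparkSession

    // Minimal sketch: start near the default overhead and raise it until
    // the YARN "exceeding memory limits" kills stop.
    val spark = SparkSession.builder()
      .appName("hive-repartition-job")                       // hypothetical name
      .config("spark.yarn.executor.memoryOverhead", "2048")  // MB, Spark 2.x
      .enableHiveSupport()
      .getOrCreate()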

Re: Repartitioning Hive tables - Container killed by YARN for exceeding memory limits

2017-08-02 Thread Chetan Khatri
Ryan, thank you for the reply. For 2 TB of data, what should the value of spark.yarn.executor.memoryOverhead be? With regards to this, I see an issue at Spark https://issues.apache.org/jira/browse/SPARK-18787 , not sure whether it works or not on Spark 2.0.1! Can you elaborate more for
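
For reference, and consistent with Holden's point above that the overhead does not follow the input size: in Spark 2.x on YARN the default is derived from executor memory, not from the 2 TB of input. A worked sketch (the 8 GB executor is an assumed figure):

    // Default overhead in Spark 2.x on YARN: max(384 MB, 10% of executor memory).
    val executorMemoryMb  = 8 * 1024                             // e.g. --executor-memory 8g
    val defaultOverheadMb = math.max(384, (executorMemoryMb * 0.10).toInt)
    println(s"default memoryOverhead = $defaultOverheadMb MB")   // 819 MB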

Re: Repartitioning Hive tables - Container killed by YARN for exceeding memory limits

2017-08-02 Thread Ryan Blue
Chetan, When you're writing to a partitioned table, you want to use a shuffle to avoid the situation where each task has to write to every partition. You can do that either by adding a repartition by your table's partition keys, or by adding an order by with the partition keys and then columns
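
A minimal sketch of both options Ryan describes (df, the table name, and the date/id columns are hypothetical):

    import org.apache.spark.sql.functions.col

    // Option 1: repartition by the table's partition keys, so each task
    // writes to only a few Hive partitions instead of all of them.
    df.repartition(col("date"))
      .write
      .insertInto("db.partitioned_table")

    // Option 2: a global sort by the partition keys and then the desired
    // sort columns, which also clusters each partition's rows together.
    df.orderBy(col("date"), col("id"))
      .write
      .insertInto("db.partitioned_table")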

Re: Repartitioning Hive tables - Container killed by YARN for exceeding memory limits

2017-08-02 Thread Ravindra
Either increase overall executor memory if you have scope, or try to give a larger share to overhead memory than the default of .7. Read this for more details. On Wed, Aug 2, 2017 at 11:03 PM Chetan Khatri
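
Either way, the YARN container has to hold heap plus overhead, so the cluster needs headroom for the sum. A worked sketch with assumed, illustrative numbers:

    // YARN sizes each executor container as heap + overhead.
    val executorMemoryMb = 8 * 1024   // spark.executor.memory (assumed)
    val overheadMb       = 2 * 1024   // spark.yarn.executor.memoryOverhead (assumed)
    val containerMb      = executorMemoryMb + overheadMb
    println(s"container request per executor: $containerMb MB")  // 10240 MB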

Re: Repartitioning Hive tables - Container killed by YARN for exceeding memory limits

2017-08-02 Thread Chetan Khatri
Can anyone please guide me with the above issue? On Wed, Aug 2, 2017 at 6:28 PM, Chetan Khatri wrote: > Hello Spark Users, > > I am reading from an HBase table and writing to a Hive managed table, where I > applied partitioning by a date column, which worked fine, but it has

Repartitioning Hive tables - Container killed by YARN for exceeding memory limits

2017-08-02 Thread Chetan Khatri
Hello Spark Users, I am reading from an HBase table and writing to a Hive managed table, where I applied partitioning by a date column. This worked fine, but it generates a large number of files across almost 700 partitions, and I wanted to use repartitioning to reduce file I/O by cutting the number of files inside each
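
A minimal sketch of that repartition-before-write idea (table and column names are hypothetical): one shuffle partition per date value yields roughly one file per Hive partition, though heavily skewed dates may still warrant a larger target.

    import org.apache.spark.sql.functions.col

    val hiveDf = spark.table("db.hbase_staging")   // data landed from HBase

    hiveDf
      .repartition(col("date"))                    // cluster each date into one task
      .write
      .insertInto("db.hive_managed_table")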