Hi, I am reading a Parquet dataset of around 50+ GB, which has 4013 partitions and 240 columns. Below is my configuration:
driver: 20G memory with 4 cores
executors: 45 executors, each with 15G memory and 4 cores

I tried to read the data both with the DataFrame reader and through a HiveContext using Hive SQL, but in both cases it throws the error below, with no further description:

```python
hive_context.sql("select * from test.base_table where date='{0}'".format(part_dt))

sqlcontext.read.parquet("/path/to/partition/")
```

```
#
# java.lang.OutOfMemoryError: Requested array size exceeds VM limit
# -XX:OnOutOfMemoryError="kill -9 %p"
#   Executing /bin/sh -c "kill -9 16953"...
```

What could be wrong here? I think increasing memory alone will not help in this case, since the JVM's array size limit has already been reached.

Thanks,
Bijay
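P.S. For reference, here is a self-contained sketch of the two read paths. The context setup, app name, and sample partition value are placeholders I have added for this post; only the table name, query, and path come from my actual job:

```python
# Minimal repro sketch of the two read paths (Spark 1.x style).
# The SparkContext/HiveContext setup, app name, and part_dt value
# are placeholders, not taken from the real job.
from pyspark import SparkContext
from pyspark.sql import HiveContext

sc = SparkContext(appName="parquet-oom-repro")  # hypothetical app name
hive_context = HiveContext(sc)  # HiveContext also serves as the SQLContext here

part_dt = "2016-01-01"  # hypothetical partition value

# Path 1: Hive SQL against the partitioned table (240 columns, 4013 partitions)
df_sql = hive_context.sql(
    "select * from test.base_table where date='{0}'".format(part_dt)
)

# Path 2: direct Parquet read of one partition directory
df_parquet = hive_context.read.parquet("/path/to/partition/")

# Reads are lazy, so an action is needed to actually materialize the data
df_sql.count()
df_parquet.count()
```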