Re: SqlContext parquet read OutOfMemoryError: Requested array size exceeds VM limit error
Thanks for the suggestions and links.

The problem arises when I use the DataFrame API to write, but it works fine when doing an insert overwrite into a Hive table.

# Works fine
hive_context.sql("insert overwrite table {0} partition (e_dt, c_dt) select * from temp_table".format(table_name))

# Doesn't work; throws java.lang.OutOfMemoryError: Requested array size exceeds VM limit
df.write.mode('overwrite').partitionBy('e_dt','c_dt').parquet("/path/to/file/")

Thanks,
Bijay

On Wed, May 4, 2016 at 3:02 PM, Prajwal Tuladhar wrote:

> If you are running on a 64-bit JVM with less than 32G heap, you might want
> to enable -XX:+UseCompressedOops [1]. And if your DataFrame is somehow
> generating an array with more than 2^31-1 elements, you might have to
> rethink your options.
>
> [1] https://spark.apache.org/docs/latest/tuning.html
>
> On Wed, May 4, 2016 at 9:44 PM, Bijay Kumar Pathak wrote:
>
>> Hi,
>>
>> I am reading a Parquet file of around 50+ GB, which has 4013 partitions
>> and 240 columns. Below is my configuration:
>>
>> driver: 20G memory with 4 cores
>> executors: 45 executors with 15G memory and 4 cores
>>
>> I tried to read the data both with the DataFrame reader and with the Hive
>> context using Hive SQL, but in both cases it throws the error below, with
>> no further description.
>>
>> hive_context.sql("select * from test.base_table where date='{0}'".format(part_dt))
>> sqlcontext.read.parquet("/path/to/partion/")
>>
>> #
>> # java.lang.OutOfMemoryError: Requested array size exceeds VM limit
>> # -XX:OnOutOfMemoryError="kill -9 %p"
>> # Executing /bin/sh -c "kill -9 16953"...
>>
>> What could be wrong here? I think increasing memory alone will not help
>> in this case, since it has reached the array size limit.
>>
>> Thanks,
>> Bijay
>
> --
> Cheers,
> Praj
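The Hive path that works in the message above can be wrapped in a small helper. This is only a sketch: `build_insert_overwrite`, its arguments, and the table name `my_db.my_table` are hypothetical, and the commented-out calls assume a Spark 1.x `HiveContext` and a DataFrame `df` as in the thread.

```python
def build_insert_overwrite(table_name, partition_cols, source_table="temp_table"):
    """Build a dynamic-partition INSERT OVERWRITE statement for Hive.

    The helper name and arguments are hypothetical, for illustration only.
    """
    cols = ", ".join(partition_cols)
    return (
        "insert overwrite table {0} partition ({1}) "
        "select * from {2}"
    ).format(table_name, cols, source_table)


# Partition columns are taken from the message above;
# "my_db.my_table" is a placeholder table name.
stmt = build_insert_overwrite("my_db.my_table", ["e_dt", "c_dt"])
print(stmt)

# With a live HiveContext and a DataFrame df, this would be used as
# (not executed here):
#   df.registerTempTable("temp_table")
#   hive_context.sql(stmt)
```

Note that with `select *`, Hive expects the dynamic partition columns (`e_dt`, `c_dt`) to be the last columns of the source table, in the same order as in the `partition` clause.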
Re: SqlContext parquet read OutOfMemoryError: Requested array size exceeds VM limit error
If you are running on a 64-bit JVM with less than 32G heap, you might want to enable -XX:+UseCompressedOops [1]. And if your DataFrame is somehow generating an array with more than 2^31-1 elements, you might have to rethink your options.

[1] https://spark.apache.org/docs/latest/tuning.html

--
Cheers,
Praj
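For a sense of scale on the array limit mentioned above, here is a rough back-of-the-envelope sketch. It assumes the ceiling is about 2^31-1 elements per JVM array (the exact value is VM-specific) and uses the 50 GB and 4013 partition figures from the original question. The helper names are made up for illustration, and on-disk Parquet size is only a loose proxy for in-memory size.

```python
# Rough upper bound on elements in one JVM array (Integer.MAX_VALUE);
# real VMs reserve a few header words, so the true limit is slightly lower.
JVM_MAX_ARRAY_ELEMENTS = 2**31 - 1


def average_partition_bytes(total_bytes, num_partitions):
    """Average on-disk bytes per partition (integer division)."""
    return total_bytes // num_partitions


def fits_in_one_byte_array(num_bytes):
    """True if num_bytes could be held in a single JVM byte[]."""
    return num_bytes <= JVM_MAX_ARRAY_ELEMENTS


# Figures from the question: ~50 GB of Parquet in 4013 partitions.
total = 50 * 1024**3
avg = average_partition_bytes(total, 4013)

# The average partition is far below the limit ...
print(avg, fits_in_one_byte_array(avg))
# ... while the whole dataset in a single buffer would not fit.
print(fits_in_one_byte_array(total))
```

Under these assumptions an average partition is on the order of ten megabytes, so the limit would only be hit if a single buffer ended up holding far more than an average partition. That is a guess at where to look, not something this thread confirms.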
Re: SqlContext parquet read OutOfMemoryError: Requested array size exceeds VM limit error
Have you seen this thread?

http://search-hadoop.com/m/q3RTtyXr2N13hf9O=java+lang+OutOfMemoryError+Requested+array+size+exceeds+VM+limit