Yes, both the driver and the executors. It works a little better with more space, but there is still a leak that causes failure after a number of reads. There are about 700 different data sources that need to be loaded, so lots of data...
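For reference, a minimal sketch of how the setting can be applied to both sides (the 512m value is just illustrative):

    import org.apache.spark.SparkConf

    // Executor JVMs pick this up from the Spark config:
    val conf = new SparkConf()
      .set("spark.executor.extraJavaOptions", "-XX:MaxPermSize=512m")

    // The driver JVM is already running by the time SparkConf is read,
    // so its PermGen has to be set at launch time instead, e.g.:
    //   spark-submit --driver-java-options "-XX:MaxPermSize=512m" ...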
Thu 25 Jun 2015 08:02 Sabarish Sasidharan <sabarish.sasidha...@manthan.com> wrote:

> Did you try increasing the perm gen for the driver?
>
> Regards
> Sab
>
> On 24-Jun-2015 4:40 pm, "Anders Arpteg" <arp...@spotify.com> wrote:
>
>> When reading large (and many) datasets with the Spark 1.4.0 DataFrames
>> parquet reader (the org.apache.spark.sql.parquet format), the following
>> exceptions are thrown:
>>
>> Exception in thread "task-result-getter-0"
>> Exception: java.lang.OutOfMemoryError thrown from the
>> UncaughtExceptionHandler in thread "task-result-getter-0"
>> Exception in thread "task-result-getter-3" java.lang.OutOfMemoryError:
>> PermGen space
>> Exception in thread "task-result-getter-1" java.lang.OutOfMemoryError:
>> PermGen space
>> Exception in thread "task-result-getter-2" java.lang.OutOfMemoryError:
>> PermGen space
>>
>> and many more like these from different threads. I've tried increasing
>> the PermGen space using the -XX:MaxPermSize VM setting, but even after
>> tripling the space, the same errors occur. I've also tried storing
>> intermediate results, and am able to get the full job completed by
>> running it multiple times and restarting from the last successful
>> intermediate result. There seems to be some memory leak in the parquet
>> format. Any hints on how to fix this problem?
>>
>> Thanks,
>> Anders
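In case it helps anyone else, the "store intermediate results and resume" workaround mentioned above looks roughly like the sketch below; the helper name, source list, and output paths are made up for illustration (Spark 1.4 DataFrame API):

    import org.apache.hadoop.fs.{FileSystem, Path}
    import org.apache.spark.sql.SQLContext

    // Hypothetical resume loop: persist each source's result to its own
    // parquet directory, and skip sources already completed by an earlier run.
    def loadAll(sqlContext: SQLContext, sources: Seq[String], outDir: String): Unit = {
      val fs = FileSystem.get(sqlContext.sparkContext.hadoopConfiguration)
      for (src <- sources) {
        val out = s"$outDir/${src.replaceAll("[^A-Za-z0-9]", "_")}"
        // The _SUCCESS marker is only written once the parquet output has
        // committed cleanly, so it is a safe "already done" check on rerun.
        if (!fs.exists(new Path(s"$out/_SUCCESS"))) {
          sqlContext.read.parquet(src).write.parquet(out)
        }
      }
    }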