Re: SPARK Storagelevel issues

2017-07-29 Thread ??????????
...@gmail.com> Date: 2017/7/28 17:25:03 To: "Gourav Sengupta" <gourav.sengu...@gmail.com>; Cc: "user" <user@spark.apache.org>; Subject: Re: SPARK Storagelevel issues. All right, I did not catch the point, sorry for that. But you can take a snapshot of the heap, and the…

Re: SPARK Storagelevel issues

2017-07-28 Thread 周康
All right, I did not catch the point, sorry for that. But you can take a snapshot of the heap and then analyze the heap dump with MAT or other tools. From the code I cannot find any clue. 2017-07-28 17:09 GMT+08:00 Gourav Sengupta: > Hi, > > I have done all of that, but…
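
(For reference, a minimal sketch of how an executor heap dump could be captured for analysis in MAT, assuming a PySpark session; the app name and dump path are placeholders, not settings from the original thread.)

from pyspark.sql import SparkSession

# Ask the executor JVMs to write a heap dump when they hit an OutOfMemoryError,
# so the resulting .hprof file can later be opened in MAT or a similar analyzer.
spark = (
    SparkSession.builder
    .appName("heap-dump-sketch")  # placeholder name
    .config("spark.executor.extraJavaOptions",
            "-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp")
    .getOrCreate()
)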

Re: SPARK Storagelevel issues

2017-07-28 Thread Gourav Sengupta
Hi, I have done all of that, but my question is "why should 62 MB of data give a memory error when we have over 2 GB of memory available?". Therefore all that is mentioned by Zhoukang is not pertinent at all. Regards, Gourav Sengupta On Fri, Jul 28, 2017 at 4:43 AM, 周康…

Re: SPARK Storagelevel issues

2017-07-27 Thread 周康
testdf.persist(pyspark.storagelevel.StorageLevel.MEMORY_ONLY_SER) maybe the StorageLevel should change. And check your config "spark.memory.storageFraction", whose default value is 0.5. 2017-07-28 3:04 GMT+08:00 Gourav Sengupta: > Hi, > > I cached a table in a large EMR…
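
(A minimal sketch of the suggestion above, assuming a PySpark 2.x session and that testdf is the DataFrame from the original post; MEMORY_AND_DISK and the 0.6 storageFraction value are illustrative choices only, not settings from the thread.)

from pyspark import StorageLevel
from pyspark.sql import SparkSession

# Example only: raise the fraction of unified memory that is protected from
# eviction for storage (spark.memory.storageFraction defaults to 0.5).
spark = (
    SparkSession.builder
    .config("spark.memory.storageFraction", "0.6")
    .getOrCreate()
)

testdf = spark.table("my_table")  # placeholder; the real table is not shown in the thread

# A level that spills to disk instead of failing when the cached blocks
# do not all fit in executor memory.
testdf.persist(StorageLevel.MEMORY_AND_DISK)
testdf.count()  # an action is needed to actually materialize the cache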

SPARK Storagelevel issues

2017-07-27 Thread Gourav Sengupta
Hi, I cached a table in a large EMR cluster and it has a size of 62 MB, so I know the size of the table while cached. But when I try to cache the table in a smaller cluster, which still has a total of 3 GB of driver memory and two executors with close to 2.5 GB of memory, the job still…
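
(For context, a minimal sketch of the caching step described above, assuming PySpark; the table name is a placeholder since the original code is not shown in the post.)

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

testdf = spark.table("my_table")  # hypothetical table name

# For DataFrames, cache() uses the MEMORY_AND_DISK storage level by default.
testdf.cache()
testdf.count()  # an action is required before anything is actually cached

# The in-memory size of the cached table (the 62 MB figure above) can then be
# read from the Storage tab of the Spark UI.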