Java heap error

2019-05-22 Thread Kumar sp
Hi, I am getting java.lang.OutOfMemoryError: Java heap space. I have increased my driver memory and executor memory, but I am still facing this issue. I am using r4 for the driver and core nodes (16). How can we see which step causes it, or whether it is related to GC? Can we pinpoint a single place in the code

Error using .collect()

2019-05-13 Thread Kumar sp
I have a use case where I am using collect().toMap (grouping by a certain column, finding the count, and creating a map with a key) and then using that map for some further calculations. I am getting out-of-memory errors. Is there any alternative to .collect() for creating a structure like a Map or some

Window function range between

2019-03-25 Thread Kumar sp
Hi, I am trying to use a range-between window function, but I keep getting the error below: main" org.apache.spark.sql.AnalysisException: Window Frame specifiedwindowframe(RangeFrame, currentrow$(), 5) must match the required frame specified I need to check the next consecutive 5-second interval
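
For reference, the frame named in the error, a range frame from the current row to 5 following, can be sketched in plain Python. This is only an illustration of the frame semantics, not the Spark API and not the poster's code. It assumes timestamps are integer seconds, and the function name `count_in_next_seconds` is hypothetical.

```python
from bisect import bisect_left, bisect_right

def count_in_next_seconds(timestamps, span=5):
    """For each timestamp t, count events whose timestamp lies in [t, t + span].

    Mirrors a range frame from the current row to `span` following:
    rows with the same timestamp (peers) are included in the frame.
    """
    ordered = sorted(timestamps)
    result = []
    for t in ordered:
        start = bisect_left(ordered, t)        # first row with timestamp >= t
        end = bisect_right(ordered, t + span)  # one past last row with timestamp <= t + span
        result.append((t, end - start))
    return result

# Hypothetical event times, in seconds.
events = [0, 2, 5, 6, 12, 13]
print(count_in_next_seconds(events))
```

Because the frame is defined on the ordering value rather than on row positions, gaps in the data (for example between 6 and 12 above) shrink the frame instead of pulling in later rows.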

Avoiding Multiple GroupBy

2019-02-18 Thread Kumar sp
Can we avoid multiple group-bys? I have a million records and it is a performance concern. Below is my query; even with window functions, I guess it is a performance hit. Can you please advise if there is a better alternative? I need to get the max number of equipment for that house for list of

Design recommendation

2019-02-13 Thread Kumar sp
Hello, I need a design recommendation. I need to perform a couple of calculations with minimal shuffling and better performance. I have a nested structure where, say, a class has n students, and the structure will be similar to this: { classId: String, StudendId:String, Score:Int, AreaCode:String}

[no subject]

2019-02-13 Thread Kumar sp