unsubscribe

2023-05-09 Thread Balakumar iyer S

Re: Spark 2.3 DataFrame GroupBy operation throws IllegalArgumentException on Large dataset

2019-07-23 Thread Balakumar iyer S
> shows is that an exception happened while writing out
> the orc file, not what that underlying exception is, there should be at
> least one more caused by under the one you included.
>
> Thanks,
>
> Bobby
>
> On Mon, Jul 22, 2019 at 5:58 AM Balakumar iyer S wrote:
>
>>

Spark 2.3 DataFrame GroupBy operation throws IllegalArgumentException on Large dataset

2019-07-22 Thread Balakumar iyer S
Hi, I am trying to perform a group-by followed by an aggregate collect_set operation on a two-column dataset with schema (LeftData int, RightData int). Code snippet: val wind_2 = dframe.groupBy("LeftData").agg(collect_set(array("RightData")))
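
For reference, here is a minimal self-contained sketch of this kind of aggregation, assuming Spark 2.3 and a local session. The object name, the helper name collectRightPerLeft, the output column name RightSet, and the sample rows are hypothetical and do not come from the original post. The sketch applies collect_set to the column directly instead of wrapping it in array(...) as the post's snippet does, so each group yields a set of integers rather than a set of one-element arrays. It only illustrates the operation; it does not reproduce or resolve the IllegalArgumentException reported in the thread.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, collect_set}

object GroupByCollectSetSketch {

  // Groups rows by LeftData and gathers the distinct RightData values per key.
  // "RightSet" is an illustrative column name, not one used in the post.
  def collectRightPerLeft(dframe: DataFrame): DataFrame =
    dframe
      .groupBy("LeftData")
      .agg(collect_set(col("RightData")).as("RightSet"))

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; the thread describes a cluster run.
    val spark = SparkSession.builder()
      .appName("groupby-collect-set-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample rows matching the (LeftData int, RightData int) schema.
    val dframe = Seq((1, 10), (1, 11), (1, 10), (2, 20))
      .toDF("LeftData", "RightData")

    // Element order inside each collected set is not guaranteed.
    collectRightPerLeft(dframe).show(false)

    spark.stop()
  }
}
```

Since collect_set gathers every distinct value of a key into a single row, a heavily skewed key produces one very large array, which is worth keeping in mind on a large dataset.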

The following Java MR code works for a small dataset but throws an ArrayIndexOutOfBoundsException for a large dataset

2019-05-09 Thread Balakumar iyer S
Hi All, I am trying to read an ORC file and perform a groupBy operation on it, but when I run it on a large dataset we face the following error message. Input format of the input data:
|178111256| 107125374|
|178111256| 107148618|
|178111256| 107175361|
|178111256| 107189910|
and we are

An alternative logic to collaborative filtering works fine but we are facing run time issues in executing the job

2019-04-16 Thread Balakumar iyer S
Hi, while running the following Spark code in the cluster with the following configuration, it is split into 3 job IDs.
CLUSTER CONFIGURATION: 3-NODE CLUSTER
NODE 1 - 64GB 16 CORES
NODE 2 - 64GB 16 CORES
NODE 3 - 64GB 16 CORES
At job ID 2 the job is stuck at stage 51 of 254 and then it starts