I already checked, and GC is taking 1 sec for each task. Is this too much? If yes, how can I avoid it?
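To put the 1 sec figure in context, it probably helps to compare it with how long each task runs: 1 sec of GC in a 2 sec task is half the task, while 1 sec in a 60 sec task is under 2%. Below is a minimal sketch of that comparison in plain Java, using the standard java.lang.management beans rather than anything Spark-specific. The class name GcShare, the helper names, the placeholder workload, and the 2 sec and 60 sec task durations are all assumptions made for illustration; they are not taken from the job in this thread.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GcShare {
    // Total time in ms this JVM has spent in GC so far, summed over all collectors.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when the collector does not report a time
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    // Fraction of a task's wall-clock time that went to GC (0.0 if the duration is not positive).
    static double gcFraction(long gcMillis, long taskMillis) {
        if (taskMillis <= 0) {
            return 0.0;
        }
        return (double) gcMillis / taskMillis;
    }

    public static void main(String[] args) {
        long gcBefore = totalGcMillis();
        long start = System.nanoTime();

        // Hypothetical workload standing in for one task: many short-lived allocations.
        long checksum = 0;
        for (int i = 0; i < 2_000_000; i++) {
            checksum += Integer.toString(i).length();
        }

        long taskMillis = (System.nanoTime() - start) / 1_000_000;
        long gcMillis = totalGcMillis() - gcBefore;
        System.out.println("checksum=" + checksum);
        System.out.println("task ms=" + taskMillis + ", gc ms=" + gcMillis
                + ", gc fraction=" + gcFraction(gcMillis, taskMillis));

        // The same arithmetic on the 1 sec figure, with assumed task durations.
        System.out.println("1s GC in a 2s task:  " + gcFraction(1000, 2000));
        System.out.println("1s GC in a 60s task: " + gcFraction(1000, 60000));
    }
}
```

In the driver UI the equivalent comparison would be the GC Time column against the Duration column for the tasks of the slow stage.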
On 16 April 2015 at 21:58, Akhil Das <ak...@sigmoidanalytics.com> wrote:
> Open the driver ui and see which stage is taking time, you can look
> whether its adding any GC time etc.
>
> Thanks
> Best Regards
>
> On Thu, Apr 16, 2015 at 9:56 PM, Jeetendra Gangele <gangele...@gmail.com>
> wrote:
>
>> Hi All I have below code whether distinct is running for more time.
>>
>> blockingRdd is the combination of <Long,String> and it will have 400K
>> records
>> JavaPairRDD<Long,Integer>
>> completeDataToprocess=blockingRdd.flatMapValues( new Function<String,
>> Iterable<Integer>>(){
>>
>> @Override
>> public Iterable<Integer> call(String v1) throws Exception {
>> return ckdao.getSingelkeyresult(v1);
>> }
>> }).distinct(32);
>>
>> I am running distinct on 800K records and its taking 2 hours on 16 cores
>> and 20 GB RAM.
>>
>
>