Can you paste your complete code? Did you try repartitioning or increasing the level of parallelism to speed up the processing? You have 16 cores, and I'm assuming your 400K records aren't bigger than a 10 GB dataset, so the job should be able to run more tasks in parallel.
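
As a rough illustration of what "increasing the level of parallelism" could mean here, below is a small plain-Java sketch for picking a partition count. It assumes the rule of thumb from the Spark tuning guide of roughly 2-3 tasks per CPU core; the class and method names (PartitionSizing, suggestedPartitions, recordsPerPartition) are made up for this example and are not part of Spark.

```java
// Hypothetical helper, not a Spark API: estimates a partition count
// from the number of cores and shows how many records each task would get.
final class PartitionSizing {

    // Assumption: about 2-3 tasks per core is a reasonable starting point.
    static int suggestedPartitions(int cores, int tasksPerCore) {
        if (cores <= 0 || tasksPerCore <= 0) {
            throw new IllegalArgumentException("cores and tasksPerCore must be positive");
        }
        return cores * tasksPerCore;
    }

    // Ceiling division: the largest number of records a single partition
    // would hold if the records were spread evenly.
    static long recordsPerPartition(long records, int partitions) {
        if (records < 0 || partitions <= 0) {
            throw new IllegalArgumentException("records must be >= 0 and partitions > 0");
        }
        return (records + partitions - 1) / partitions;
    }

    public static void main(String[] args) {
        int partitions = suggestedPartitions(16, 3);          // 16 cores, 3 tasks per core
        long perTask = recordsPerPartition(400_000L, partitions);
        System.out.println(partitions + " partitions, up to " + perTask + " records each");
        // The result could then be passed to the existing calls, e.g.
        // blockingRdd.repartition(partitions) before flatMapValues, and distinct(partitions).
    }
}
```

With 16 cores and 3 tasks per core this gives 48 partitions of at most 8334 records each, against 12500 records each for the current distinct(32). Whether that helps depends on where the time is actually going, so treat it as a starting point to try, not a fix.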

Thanks
Best Regards

On Thu, Apr 16, 2015 at 10:00 PM, Jeetendra Gangele <gangele...@gmail.com> wrote:

> I already checked and G is taking 1 secs for each task. is this too much?
> if yes how to avoid this?
>
>
> On 16 April 2015 at 21:58, Akhil Das <ak...@sigmoidanalytics.com> wrote:
>
>> Open the driver ui and see which stage is taking time, you can look
>> whether its adding any GC time etc.
>>
>> Thanks
>> Best Regards
>>
>> On Thu, Apr 16, 2015 at 9:56 PM, Jeetendra Gangele <gangele...@gmail.com>
>> wrote:
>>
>>> Hi All I have below code whether distinct is running for more time.
>>>
>>> blockingRdd is the combination of <Long,String> and it will have 400K
>>> records
>>> JavaPairRDD<Long,Integer>
>>> completeDataToprocess=blockingRdd.flatMapValues( new Function<String,
>>> Iterable<Integer>>(){
>>>
>>> @Override
>>> public Iterable<Integer> call(String v1) throws Exception {
>>> return ckdao.getSingelkeyresult(v1);
>>> }
>>> }).distinct(32);
>>>
>>> I am running distinct on 800K records and its taking 2 hours on 16 cores
>>> and 20 GB RAM.
>>>