Open the driver UI (by default on port 4040) and see which stage is taking the time. On that stage's page you can also check whether GC is adding to it, for example in the GC Time column of its tasks.
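If you want a rough GC number while testing a piece of the job outside Spark (for example, timing the lookup by itself), a JVM can report its cumulative collector time through `java.lang.management`. This is a minimal local sketch and is not meant to reproduce the UI's numbers exactly. The class name `GcTimeProbe` and the allocation loop in `main` are hypothetical stand-ins.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class GcTimeProbe {

    /** Sums the accumulated collection time (ms) reported by every collector in this JVM. */
    static long totalGcTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if this collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** Runs the task and returns {elapsedMillis, gcMillisObservedDuringTask}. */
    static long[] measure(Runnable task) {
        long gcBefore = totalGcTimeMillis();
        long start = System.nanoTime();
        task.run();
        long elapsed = (System.nanoTime() - start) / 1_000_000;
        long gcDuring = totalGcTimeMillis() - gcBefore;
        return new long[] {elapsed, gcDuring};
    }

    public static void main(String[] args) {
        long[] r = measure(() -> {
            // Hypothetical allocation-heavy work standing in for a real task.
            List<int[]> garbage = new ArrayList<>();
            for (int i = 0; i < 200_000; i++) {
                garbage.add(new int[16]);
                if (garbage.size() > 10_000) {
                    garbage.clear();
                }
            }
        });
        System.out.printf("elapsed=%d ms, gc=%d ms%n", r[0], r[1]);
    }
}
```

If the GC share is a large fraction of the elapsed time, that points at memory pressure rather than the work itself.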
Thanks
Best Regards

On Thu, Apr 16, 2015 at 9:56 PM, Jeetendra Gangele <gangele...@gmail.com> wrote:

> Hi All,
>
> I have the code below, where distinct is taking a long time.
>
> blockingRdd is a pair RDD of <Long, String> and it will have 400K
> records.
>
> JavaPairRDD<Long, Integer> completeDataToprocess = blockingRdd.flatMapValues(
>     new Function<String, Iterable<Integer>>() {
>
>         @Override
>         public Iterable<Integer> call(String v1) throws Exception {
>             return ckdao.getSingelkeyresult(v1);
>         }
>     }).distinct(32);
>
> I am running distinct on 800K records and it's taking 2 hours on 16 cores
> and 20 GB RAM.
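Roughly, `distinct(32)` shuffles the pairs so that equal pairs land in the same one of 32 partitions, then drops duplicates within each partition. The stage that writes that shuffle also runs the `flatMapValues` lookups, so slow `ckdao` calls would show up in that stage's task time as well. The following is a single-JVM analogy of those two steps, not Spark's implementation. `LocalDistinctSketch` and the lookup lambda (standing in for `ckdao.getSingelkeyresult`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

public class LocalDistinctSketch {

    /** Like flatMapValues: each (k, v) becomes zero or more (k, u) pairs via the lookup. */
    static List<Map.Entry<Long, Integer>> flatMapValues(
            List<Map.Entry<Long, String>> input,
            Function<String, Iterable<Integer>> lookup) {
        List<Map.Entry<Long, Integer>> out = new ArrayList<>();
        for (Map.Entry<Long, String> e : input) {
            for (Integer u : lookup.apply(e.getValue())) {
                out.add(Map.entry(e.getKey(), u));
            }
        }
        return out;
    }

    /** Like distinct(numPartitions): hash each pair to a bucket, then deduplicate per bucket. */
    static List<Set<Map.Entry<Long, Integer>>> distinct(
            List<Map.Entry<Long, Integer>> pairs, int numPartitions) {
        List<Set<Map.Entry<Long, Integer>>> buckets = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            buckets.add(new HashSet<>());
        }
        for (Map.Entry<Long, Integer> p : pairs) {
            // floorMod keeps the bucket index non-negative even for negative hash codes.
            int b = Math.floorMod(p.hashCode(), numPartitions);
            buckets.get(b).add(p); // equal pairs always hash to the same bucket
        }
        return buckets;
    }

    public static void main(String[] args) {
        // Hypothetical lookup standing in for ckdao.getSingelkeyresult(v1).
        Function<String, Iterable<Integer>> lookup = s -> List.of(s.length(), s.length() % 3);
        List<Map.Entry<Long, String>> blocking = List.of(
                Map.entry(1L, "abc"), Map.entry(1L, "xyz"), Map.entry(2L, "hello"));
        List<Map.Entry<Long, Integer>> expanded = flatMapValues(blocking, lookup);
        List<Set<Map.Entry<Long, Integer>>> parts = distinct(expanded, 4);
        int total = parts.stream().mapToInt(Set::size).sum();
        System.out.println(expanded.size() + " pairs -> " + total + " distinct"); // 6 pairs -> 4 distinct
    }
}
```

This also shows why 400K input records can become 800K: `flatMapValues` can emit several pairs per input value before `distinct` ever runs.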