Can you paste your complete code? Did you try repartitioning or increasing the level
of parallelism to speed up the processing? You have 16 cores, and I'm
assuming your 400K records don't add up to more than a 10 GB dataset.
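To make the parallelism point concrete, here is a rough plain-Java model of what distinct(32) does. It has no Spark dependency and is only an illustration. Every record is hashed into one of numPartitions buckets, so duplicates always land in the same bucket, and each bucket is then deduplicated as a separate task. Spark's RDD.distinct(n) works along these lines (a reduceByKey over n partitions), so n caps how many of your 16 cores can work at once. The names DistinctSketch, suggestedPartitions and partitionFor are hypothetical. The 3-tasks-per-core multiplier is my assumption, based on the Spark tuning guide's general advice of 2-3 tasks per CPU core.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical plain-Java model of distinct(numPartitions); not Spark code.
public class DistinctSketch {

    // Assumption: ~3 tasks per core, following the Spark tuning guide's 2-3x advice.
    static int suggestedPartitions(int cores) {
        return cores * 3;
    }

    // Non-negative hashCode modulo n, the same rule Spark's HashPartitioner uses.
    static int partitionFor(Object key, int numPartitions) {
        int mod = key.hashCode() % numPartitions;
        return mod < 0 ? mod + numPartitions : mod;
    }

    static <T> Set<T> distinct(List<T> records, int numPartitions, int threads) {
        // Stand-in for the shuffle: equal records always hash to the same bucket.
        List<List<T>> buckets = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            buckets.add(new ArrayList<>());
        }
        for (T r : records) {
            buckets.get(partitionFor(r, numPartitions)).add(r);
        }

        // One task per bucket; the pool size plays the role of the executor cores.
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Set<T>>> results = new ArrayList<>();
            for (List<T> bucket : buckets) {
                Callable<Set<T>> task = () -> new HashSet<>(bucket);
                results.add(pool.submit(task));
            }
            // Buckets share no values, so a plain union of the results is duplicate-free.
            Set<T> out = new HashSet<>();
            for (Future<Set<T>> f : results) {
                out.addAll(f.get());
            }
            return out;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int cores = 16;
        int parts = suggestedPartitions(cores);
        List<Long> data = new ArrayList<>();
        for (long i = 0; i < 800_000; i++) {
            data.add(i % 1000); // 800K records, only 1000 distinct values
        }
        Set<Long> unique = distinct(data, parts, cores);
        // prints: 48 partitions, 1000 distinct values
        System.out.println(parts + " partitions, " + unique.size() + " distinct values");
    }
}
```

In the real job, the equivalent knobs would be the argument you pass to distinct(...). If blockingRdd arrives with only a few partitions, you might also call repartition(...) on it before the flatMapValues, so the per-record ckdao lookups are spread over more tasks as well. Check the task count for each stage in the driver UI before and after.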

Thanks
Best Regards

On Thu, Apr 16, 2015 at 10:00 PM, Jeetendra Gangele <gangele...@gmail.com>
wrote:

> I already checked, and GC is taking 1 sec for each task. Is this too much?
> If yes, how can I avoid this?
>
>
> On 16 April 2015 at 21:58, Akhil Das <ak...@sigmoidanalytics.com> wrote:
>
>> Open the driver UI and see which stage is taking time; you can also check
>> whether it's adding any GC time, etc.
>>
>> Thanks
>> Best Regards
>>
>> On Thu, Apr 16, 2015 at 9:56 PM, Jeetendra Gangele <gangele...@gmail.com>
>> wrote:
>>
>>> Hi all, I have the code below, where distinct is taking a long time.
>>>
>>> blockingRdd is a pair RDD of <Long, String> and will have 400K
>>> records.
>>> JavaPairRDD<Long, Integer> completeDataToprocess =
>>>     blockingRdd.flatMapValues(new Function<String, Iterable<Integer>>() {
>>>         @Override
>>>         public Iterable<Integer> call(String v1) throws Exception {
>>>             return ckdao.getSingelkeyresult(v1);
>>>         }
>>>     }).distinct(32);
>>>
>>> I am running distinct on 800K records and it's taking 2 hours on 16 cores
>>> and 20 GB of RAM.
>>>
