Re: orc read issue in spark

2015-11-18 Thread Reynold Xin
What do you mean by starts delay scheduling? Are you saying it is no longer
doing local reads?

If that's the case, you can increase the spark.locality.wait timeout.
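
As a rough sketch of what that could look like in a Spark 1.4.x job: the scheduler setting is spark.locality.wait (default 3s), and the 10s below is only an illustrative value, not a tuned recommendation. The object name and app name are made up for the example, and it assumes the ORC source is reached through HiveContext, as in 1.4.x.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

// Hypothetical driver program; the object and app names are placeholders.
object OrcCountWithLocalityWait {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("orc-count-locality-wait")
      // How long the scheduler waits for a data-local slot before
      // falling back to a less local one. Default is 3s; 10s is an
      // illustrative value only.
      .set("spark.locality.wait", "10s")

    val sc = new SparkContext(conf)
    // In Spark 1.4.x the ORC data source is provided through HiveContext.
    val sqlContext = new HiveContext(sc)

    // Same read and count as in the original question.
    val rows = sqlContext.read.format("orc").load("outputlocation").count()
    println(s"row count: $rows")

    sc.stop()
  }
}
```

The same setting can also be passed at submit time with --conf spark.locality.wait=10s instead of hard-coding it in the job.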

On Wednesday, November 18, 2015, Renu Yadav wrote:

> Hi,
> I am using spark 1.4.1 and saving orc file using
> df.write.format("orc").save("outputlocation")
>
> outputlocation size is 440GB
>
> and while reading df.read.format("orc").load("outputlocation").count
>
>
> it has 2618 partitions.
> The count operation runs fine up to 2500 but starts delay scheduling after
> that, which results in slow performance.
>
> *If anyone has any idea on this, please do reply as I need this very
> urgently.*
>
> Thanks in advance
>
>
> Regards,
> Renu Yadav
>
>
>


orc read issue in spark

2015-11-18 Thread Renu Yadav
Hi,
I am using spark 1.4.1 and saving orc file using
df.write.format("orc").save("outputlocation")

outputlocation size is 440GB

and while reading df.read.format("orc").load("outputlocation").count


it has 2618 partitions.
The count operation runs fine up to 2500 but starts delay scheduling after
that, which results in slow performance.

*If anyone has any idea on this, please do reply as I need this very urgently.*

Thanks in advance


Regards,
Renu Yadav