Re: orc read issue n spark

2015-11-18 Thread Reynold Xin

What do you mean by "starts delay scheduling"? Are you saying it is no longer doing local reads? If that's the case, you can increase the spark.locality.read timeout.

On Wednesday, November 18, 2015, Renu Yadav wrote:
> Hi,
> I am using spark 1.4.1 and saving orc file using
>
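
For readers trying the suggestion above: the timeout referred to is presumably Spark's delay-scheduling setting, which is documented as `spark.locality.wait` (default 3s). That mapping is an assumption, since the reply does not spell out the exact key. A minimal sketch of raising it, where the application name and the value `10s` are illustrative and not taken from the thread:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object LocalityWaitSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: the "locality timeout" in the reply is spark.locality.wait.
    // The scheduler waits this long for a data-local slot before it
    // falls back to a less-local executor.
    val conf = new SparkConf()
      .setAppName("orc-count-locality-sketch") // hypothetical application name
      .set("spark.locality.wait", "10s")       // illustrative value; Spark's default is 3s

    val sc = new SparkContext(conf)
    try {
      // Print the effective setting so the change can be verified.
      println(sc.getConf.get("spark.locality.wait"))
    } finally {
      sc.stop()
    }
  }
}
```

The same setting can also be passed at launch time with `--conf spark.locality.wait=10s` on `spark-submit`, which avoids rebuilding the job while experimenting with values.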

orc read issue n spark

2015-11-18 Thread Renu Yadav

Hi,

I am using Spark 1.4.1 and saving an ORC file using

df.write.format("orc").save("outputlocation")

The output location size is 440 GB. While reading with

df.read.format("orc").load("outputlocation").count

it has 2618 partitions. The count operation runs fine up to 2500 but starts delay scheduling after
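
The read path described in this message can be sketched as a small standalone job. This is a sketch under assumptions, not the poster's actual code: the object name and application name are hypothetical, and `outputlocation` is the placeholder path from the message. In Spark 1.4.x the ORC data source ships with the Hive module, so the sketch uses a `HiveContext`.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object OrcCountSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("orc-count-sketch")) // hypothetical name
    try {
      // Spark 1.4.x: ORC support lives in spark-hive, so use HiveContext.
      val sqlContext = new HiveContext(sc)

      // "outputlocation" is the placeholder path used in the message.
      val df = sqlContext.read.format("orc").load("outputlocation")

      // Number of input partitions, which is also the task count of the count stage.
      println(s"partitions: ${df.rdd.partitions.length}")

      // The action whose last tasks were reported to be scheduled late.
      println(s"rows: ${df.count()}")
    } finally {
      sc.stop()
    }
  }
}
```

Printing the partition count before running the action makes it easy to compare against the number of tasks shown in the Spark UI for the count stage.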