Setting spark.sql.hive.verifyPartitionPath=true didn’t help; I’m still getting the
same error.
I tried copying a file with a _ prefix and I no longer get the error; the file
is also ignored by Spark SQL. But when the job is scheduled in prod and one
execution has no data to process, the query will fail again. How should I deal
with this scenario?
From: Sea <261810...@qq.com>
Date: Sunday, May 21, 2017 at 8:04 AM
To: Steve Loughran, "Bajpai, Amit X. -ND"
Cc: "user@spark.apache.org"
Subject: Re: SparkSQL not able to read a empty table location
please try spark.sql.hive.verifyPartitionPath true
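For reference, the property can be set per-session in SQL (a config fragment; where it goes depends on how the job is launched — it can equally be passed via --conf at spark-submit time or on the SparkConf):

```sql
SET spark.sql.hive.verifyPartitionPath=true;
```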
-- Original --
From: "Steve Loughran"
Date: Sat, May 20, 2017 09:19 PM
To: "Bajpai, Amit X. -ND"
Cc: "user@spark.apache.org"
Subject: Re: SparkSQL not able to read a empty table location
On 20 May 2017, at 01:44, Bajpai, Amit X. -ND <n...@disney.com> wrote:
Hi,
I have a Hive external table whose S3 location contains no files (but the S3
location directory does exist). When I use Spark SQL to count the number of
records in the table, it throws an error saying “File s3n://data/xyz does not
exist. null/0”.
select * from tablex limit 10
Can someone let me know how we can fix this issue?
Thanks
There isn't really a "directory" in S3, just a set of objects whose keys share a
common prefix. Try creating an empty object with a _ prefix under that path; it
will be ignored by Spark SQL but will cause the "directory" to come into being.
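One minimal way to create such a marker (a sketch, assuming the AWS CLI is installed and configured; the bucket and key follow the s3n://data/xyz path in the error message above, and _placeholder is an arbitrary illustrative name):

```shell
# Create a zero-byte object whose name starts with '_' so the "directory"
# exists; Spark SQL ignores files with a '_' prefix, so it won't be read.
aws s3api put-object --bucket data --key xyz/_placeholder
```

The same effect can likely be achieved from Hadoop with `hadoop fs -touchz` against the table location.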