Thanks for your reply. It makes sense that no such option is provided,
since the user is the one explicitly asking Spark to read those files.

Yes, I provide the list of files. I'll try the ignoreCorruptFiles option.
Also, I'll look into how I can avoid missing files, or at least check
whether each file is present before reading.
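
For the presence check, something like the sketch below is what I have in
mind. It is untested against a real bucket. The helper names
split_by_presence and hadoop_exists are my own, and going through
spark._jvm / spark._jsc relies on PySpark internals rather than a public
API, so treat it as an assumption about how this could be wired up.

```python
from typing import Callable, Iterable, List, Tuple


def split_by_presence(
    paths: Iterable[str], exists: Callable[[str], bool]
) -> Tuple[List[str], List[str]]:
    """Split paths into (present, missing) using the supplied check.

    The check is passed in so the same logic works with any storage
    client and can be exercised without a cluster.
    """
    present: List[str] = []
    missing: List[str] = []
    for path in paths:
        (present if exists(path) else missing).append(path)
    return present, missing


def hadoop_exists(spark, path: str) -> bool:
    """Hypothetical helper: ask the Hadoop FileSystem whether path exists.

    Assumes `spark` is an active SparkSession; uses the JVM gateway,
    which is an internal detail of PySpark.
    """
    jvm_path = spark._jvm.org.apache.hadoop.fs.Path(path)
    fs = jvm_path.getFileSystem(spark._jsc.hadoopConfiguration())
    return fs.exists(jvm_path)


# Intended use (needs a running SparkSession, so left commented out):
# present, missing = split_by_presence(paths, lambda p: hadoop_exists(spark, p))
# df = spark.read.parquet(*present)
```

This costs one existence call per path, and a file can still disappear
between the check and the read, so it narrows the problem rather than
removing it.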

Regards,
Naresh

On Mon, Jul 1, 2019, 19:34 Steve Loughran <ste...@cloudera.com.invalid>
wrote:

> Where is this list of files coming from?
>
> If you made the list, then yes, the expectation is generally "supply a
> list of files which are present", on the basis that the general convention
> is "missing files are considered bad".
>
> Though you could try setting spark.sql.files.ignoreCorruptFiles=true to
> see what happens.
>
> There was a past discussion on what happens if the set of files on S3
> includes files which have been moved offline, where the conclusion was
> "you get to filter, sorry":
>
> https://issues.apache.org/jira/browse/SPARK-21797
>
>
>
> On Mon, Jul 1, 2019 at 2:52 AM Naresh Peshwe <nareshpeshwe12...@gmail.com>
> wrote:
>
>> Hi All,
>> When I try to read a list of parquet files from S3, my application errors
>> out if even one of the files is absent. When I searched for solutions, most
>> of them suggested filtering the list of files (on presence) before calling
>> read.
>> Shouldn't this be handled by Spark by providing an option for continuing
>> without throwing an error? If not, could you point me to the thread where
>> this was discussed?
>>
>>
>> Regards,
>> Naresh
>>
>
