Hi.
I have built a Hive external table on top of a directory 'A' which has data
stored in ORC format. This directory has several subdirectories inside it,
each of which contains the actual ORC files.
These subdirectories are actually created by spark jobs which ingest data
from other sources and write it into this directory.
I tried creating a table and setting the table properties of the same as
*hive.mapred.supports.subdirectories=TRUE* and *mapred.input.dir.recursive*
*=TRUE*.
As a result of this, when i fire the simplest query of *select count(*)
from ExtTable* via the Hive CLI, it successfully gives me the expected
count of records in the table.
However, when i fire the same query via sparkSQL, i get count = 0.

I think the sparkSQL isn't able to descend into the subdirectories for
getting the data while hive is able to do so.
Are there any configurations needed to be set on the spark side so that
this works as it does via hive cli?
I am using Spark on YARN.

Thanks,
Rishikesh

Tags: subdirectories, subdirectory, recursive, recursion, hive external
table, orc, sparksql, yarn

Reply via email to