Spark has partition discovery if your data is laid out in a parquet-friendly directory structure: http://spark.apache.org/docs/latest/sql-programming-guide.html#partition-discovery
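For example, a minimal sketch of how discovery works, assuming a hypothetical key=value layout (the directory names below are illustrative, not paths from this thread):

    # files laid out like:
    #   /my/data/parquetTable/year=2016/month=12/part-00000.parquet
    #   /my/data/parquetTable/year=2017/month=01/part-00000.parquet
    df = sqlContext.read.parquet("/my/data/parquetTable")
    # Spark infers year and month as partition columns from the paths
    df.filter(df.year == 2017).show()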
You can also use wildcards to get subdirectories (I'm using Spark 1.6 here):

    data2 = sqlContext.read.load("/my/data/parquetTable/*", "parquet")  # gets all subdirectories

Another option would be to CREATE a Hive table on top of your data that uses PARTITIONED BY to identify the subdirectories, and then use Spark SQL to query that Hive table (a rough sketch is at the bottom of this message, below the quoted thread). There might be a cleaner way to do this in Spark 2.0+, but that is a common pattern for me in Spark 1.6 when I know the directory structure but don't have "=" signs in the paths.

Jon Gregg

On Fri, Feb 17, 2017 at 7:02 PM, 颜发才(Yan Facai) <facai....@gmail.com> wrote:

> Hi, Abdelfatah,
> How do you read these files? spark.read.parquet or spark.sql?
> Could you show some code?
>
> On Wed, Feb 15, 2017 at 8:47 PM, Ahmed Kamal Abdelfatah <
> ahmed.abdelfa...@careem.com> wrote:
>
>> Hi folks,
>>
>> How can I force Spark SQL to recursively read data stored in parquet
>> format from subdirectories? In Hive, I could achieve this by setting a
>> few Hive configs:
>>
>> set hive.input.dir.recursive=true;
>> set hive.mapred.supports.subdirectories=true;
>> set hive.supports.subdirectories=true;
>> set mapred.input.dir.recursive=true;
>>
>> I tried to set these configs through Spark SQL queries, but I get 0
>> records every time, whereas Hive gets me the expected results. I also
>> put these configs in the hive-site.xml file, but nothing changed. How
>> can I handle this issue?
>>
>> Spark version: 2.1.0
>> I used Hive 2.1.1 on emr-5.3.1
>>
>> Regards,
>>
>> Ahmed Kamal
>> MTS in Data Science
>> Email: ahmed.abdelfa...@careem.com
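P.S. The rough sketch of the Hive-table approach mentioned above. The table name, columns, and subdirectory names are made up for illustration, and this assumes sqlContext is a HiveContext (Spark 1.6 built with Hive support):

    # create an external table over the parquet root, partitioned by subdirectory
    sqlContext.sql("""
        CREATE EXTERNAL TABLE my_parquet_table (id BIGINT, payload STRING)
        PARTITIONED BY (sub STRING)
        STORED AS PARQUET
        LOCATION '/my/data/parquetTable'
    """)
    # register each subdirectory as a partition by hand
    sqlContext.sql("""
        ALTER TABLE my_parquet_table ADD PARTITION (sub='batch1')
        LOCATION '/my/data/parquetTable/batch1'
    """)
    # Spark SQL now reads the subdirectory's files through the table
    sqlContext.sql("SELECT sub, COUNT(*) FROM my_parquet_table GROUP BY sub").show()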