Douglas Drinka created SPARK-28098:
--------------------------------------

             Summary: Native ORC reader doesn't support subdirectories with Hive tables
                 Key: SPARK-28098
                 URL: https://issues.apache.org/jira/browse/SPARK-28098
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 2.4.3
            Reporter: Douglas Drinka
The Hive ORC reader supports recursive directory reads from S3. Spark's native ORC reader also supports recursive directory reads, but not when it is used in place of the Hive reader for a Hive table (i.e., with spark.sql.hive.convertMetastoreOrc=true), as the repro below shows:

{code:java}
val testData = List(1, 2, 3, 4, 5)
val dataFrame = testData.toDF()
dataFrame
  .coalesce(1)
  .write
  .mode(SaveMode.Overwrite)
  .format("orc")
  .option("compression", "zlib")
  .save("s3://ddrinka.sparkbug/dirTest/dir1/dir2/")

spark.sql("DROP TABLE IF EXISTS ddrinka_sparkbug.dirTest")
spark.sql("CREATE EXTERNAL TABLE ddrinka_sparkbug.dirTest (val INT) STORED AS ORC LOCATION 's3://ddrinka.sparkbug/dirTest/'")

spark.conf.set("hive.mapred.supports.subdirectories", "true")
spark.conf.set("mapred.input.dir.recursive", "true")
spark.conf.set("mapreduce.input.fileinputformat.input.dir.recursive", "true")

spark.conf.set("spark.sql.hive.convertMetastoreOrc", "true")
println(spark.sql("SELECT * FROM ddrinka_sparkbug.dirTest").count) //0

spark.conf.set("spark.sql.hive.convertMetastoreOrc", "false")
println(spark.sql("SELECT * FROM ddrinka_sparkbug.dirTest").count) //5
{code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
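A possible workaround until this is fixed (an untested sketch, and the {{*/*}} glob is an assumption that the nested directory depth under the table location is known) is to bypass the metastore table and point the native reader at the files directly, since the native ORC path resolves globbed paths on its own:

{code:java}
// Hypothetical workaround: read the ORC files directly rather than through
// the Hive table, so convertMetastoreOrc is not involved.
// The "*/*" pattern assumes exactly two directory levels under dirTest/.
val direct = spark.read
  .format("orc")
  .load("s3://ddrinka.sparkbug/dirTest/*/*")
println(direct.count) // should match the 5 rows written above if the files are found
{code}

This loses the table metadata (schema, partitions) from the metastore, so it is only a stopgap for reading the raw files.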