This column was added in later partitions and is not present in earlier ones.

I assume partition pruning should just load from the particular partition I am specifying when using Spark SQL?

(Spark version 2.2)
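Since the column exists only in newer partitions, one thing worth ruling out is schema merging when the files are read directly. This is a hedged sketch, reusing the bucket path and session from the quoted mail below; by default Spark picks the schema from a single Parquet footer, so the union schema across partitions has to be requested explicitly:

    // Sketch, assuming the layout from the original mail.
    // mergeSchema makes Spark union the schemas of all partitions,
    // so "headertime" appears (as null for the older partitions):
    val df = session.read
      .option("mergeSchema", "true")
      .parquet("s3n://bucketName/module/search_request")

    df.printSchema()

Note this only affects direct file reads; when querying through the Hive table, the schema comes from the metastore instead.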
On Fri, May 11, 2018 at 2:24 PM, ARAVIND ARUMUGHAM Sethurathnam <arvind...@gmail.com> wrote:

> I have a Hive table created on top of S3 data in Parquet format,
> partitioned by one column named eventdate.
>
> 1) When using a Hive query, it returns data for a column named
> "headertime", which is in the schema of BOTH the table and the file:
>
> select headertime from dbName.test_bug where eventdate=20180510 limit 10
>
> 2) From a Scala notebook, directly loading a file from a particular
> partition also works:
>
> val session = org.apache.spark.sql.SparkSession.builder
>   .appName("searchRequests")
>   .enableHiveSupport()
>   .getOrCreate()
>
> val searchRequest =
>   session.sqlContext.read.parquet("s3n://bucketName/module/search_request/eventDate=20180510")
>
> searchRequest.createOrReplaceTempView("SearchRequest")
>
> val exploreDF = session.sql("select headertime from SearchRequest where SearchRequestHeaderDate='2018-05-10' limit 100")
>
> exploreDF.show(20)
>
> This also displays the values for the column "headertime".
>
> 3) But when using Spark SQL to query the Hive table directly, as below,
>
> val exploreDF = session.sql("select headertime from dbName.test_bug where eventdate=20180510 limit 100")
>
> exploreDF.show(20)
>
> it always returns null.
>
> I opened the Parquet file and see that the column headertime is present
> with values, but I am not sure why Spark SQL is not able to read the
> values for that column.
>
> It would be helpful if someone could point out where Spark SQL gets the
> schema from. I was expecting it to behave similarly to the Hive query.
>
> --
> Wealth is not money. Wealth is relationships with people.
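On the question of where Spark SQL gets the schema: for a Hive table it comes from the Hive metastore, not from the Parquet file footers, and the metastore stores column names lowercased. A case mismatch between the metastore schema and the field names actually written in the files (note the path in the mail says eventDate while the partition column is eventdate) can make Spark's built-in Parquet reader return null for a column that Hive reads fine. A hedged way to check and work around this, reusing the table and path names from the thread:

    // Where Spark SQL gets the schema for point 3: the Hive metastore.
    session.table("dbName.test_bug").printSchema()

    // Compare with the schema embedded in the Parquet files themselves:
    session.read
      .parquet("s3n://bucketName/module/search_request/eventDate=20180510")
      .printSchema()

    // If the two differ only in letter case, one workaround is to let
    // Hive's own SerDe read the table instead of Spark's Parquet reader:
    session.sql("SET spark.sql.hive.convertMetastoreParquet=false")
    session.sql("select headertime from dbName.test_bug where eventdate=20180510 limit 10").show()

If the two printSchema outputs agree, the mismatch theory is wrong and the partition-level schema difference mentioned above is the more likely cause.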