This column was added in later partitions and is not present in earlier
ones.

I assume partition pruning should load only the particular partition I am
specifying when using Spark SQL?
(Spark version 2.2)
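A few spark-shell checks could narrow this down (a sketch only; the table, bucket, and column names are taken from the thread below, and the case-mismatch theory is an assumption, not confirmed):

```scala
// Run inside spark-shell (Spark 2.2) with Hive support enabled.

// 1) The schema Spark SQL uses for a Hive table comes from the Hive
//    metastore, not from the parquet footers -- inspect it directly.
spark.sql("DESCRIBE dbName.test_bug").show(false)

// 2) Confirm partition pruning: the physical plan should show a
//    PartitionFilters entry restricted to eventdate=20180510.
spark.sql(
  "select headertime from dbName.test_bug where eventdate=20180510 limit 100"
).explain(true)

// 3) Merge the footers across all partitions and compare with the
//    metastore schema. A case mismatch (e.g. headerTime in the files vs
//    headertime in the metastore) is a common cause of all-null columns
//    when Spark reads parquet natively with its case-sensitive reader.
val merged = spark.read
  .option("mergeSchema", "true")
  .parquet("s3n://bucketName/module/search_request")
merged.printSchema()

// 4) As a diagnostic, fall back to the Hive SerDe instead of Spark's
//    native parquet reader; if the nulls disappear, the native reader's
//    schema resolution is the culprit.
spark.sql("set spark.sql.hive.convertMetastoreParquet=false")
```

If step 4 makes the column readable, that points at how Spark's native reader matches metastore column names against the parquet file schema, rather than at partition pruning itself.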


On Fri, May 11, 2018 at 2:24 PM, ARAVIND ARUMUGHAM Sethurathnam <
arvind...@gmail.com> wrote:

> I have a Hive table created on top of S3 data in parquet format,
> partitioned by one column named eventdate.
>
> 1) When using a Hive query, it returns data for a column named "headertime",
> which is in the schema of BOTH the table and the file:
>
> select headertime from dbName.test_bug where eventdate=20180510 limit 10
>
> 2) From a Scala notebook, directly loading a file from a particular
> partition also works:
>
> val session = org.apache.spark.sql.SparkSession.builder
> .appName("searchRequests")
> .enableHiveSupport()
> .getOrCreate;
>
> val searchRequest = 
> session.sqlContext.read.parquet("s3n://bucketName/module/search_request/eventDate=20180510")
>
> searchRequest.createOrReplaceTempView("SearchRequest")
>
> val exploreDF = session.sql("select headertime from SearchRequest where 
> SearchRequestHeaderDate='2018-05-10' limit 100")
>
> exploreDF.show(20)
>
> This also displays the values for the column "headertime".
>
> 3) But when using Spark SQL to query the Hive table directly, as below,
>
> val exploreDF = session.sql("select headertime from dbName.test_bug where eventdate=20180510 limit 100")
>
> exploreDF.show(20)
>
> it always returns null.
>
> I opened the parquet file and can see that the column headertime is present
> with values, but I am not sure why Spark SQL is not able to read the values
> for that column.
>
> It would be helpful if someone could point out where Spark SQL gets the
> schema from; I was expecting it to behave like the Hive query.
>
>

