Hi,
We have two Hive tables that are read in Spark and joined on a key;
let's call it user_id.
Then we write the joined dataset to S3 and register it in Hive as a third
table so that subsequent tasks can use it.
One of the other columns in the joined dataset is called "unsubscribe".
- This column was added in later partitions and is not present in earlier
ones.
- I assume partition pruning should load data only from the partition I
specify in the Spark SQL query, correct?
- (Spark version 2.2)
On Fri, May 11, 2018 at 2:24 PM, ARAVIND ARUMUGHAM
I have a Hive table created on top of S3 data in Parquet format and
partitioned by one column named eventdate.
1) When queried through Hive, the table returns data for a column named
"headertime", which is in the schema of BOTH the table and the file.
select headertime from dbName.test_bug where