Spark join: grouping of records having same value for a particular column in same partition

2020-02-26 Thread ARAVIND ARUMUGHAM SETHURATHNAM
Hi, we have two Hive tables that are read in Spark and joined on a join key, let's call it user_id. We then write the joined dataset to S3 and register it in Hive as a third table for subsequent tasks to use. One of the other columns in the joined dataset is called

unsubscribe

2018-06-15 Thread ARAVIND ARUMUGHAM Sethurathnam
unsubscribe

Re: SPARK SQL: returns null for a column, while HIVE query returns data for the same column

2018-05-11 Thread ARAVIND ARUMUGHAM Sethurathnam
- This column was added in later partitions and is not present in earlier ones.
- I assume partition pruning should load only the particular partition I specify when using Spark SQL?
- (Spark version 2.2)
On Fri, May 11, 2018 at 2:24 PM, ARAVIND ARUMUGHAM

SPARK SQL: returns null for a column, while HIVE query returns data for the same column

2018-05-11 Thread ARAVIND ARUMUGHAM Sethurathnam
I have a Hive table created on top of S3 data in Parquet format, partitioned by a single column named eventdate. 1) A Hive query returns data for a column named "headertime", which is in the schema of both the table and the file: select headertime from dbName.test_bug where