It is also worthwhile using temporary tables for the join query.

 

I can join a Hive table with a JDBC-accessed table from any other database using DataFrames (DF) and temporary tables:

 

//
// Get the FACT table from Hive
//
var s = HiveContext.sql("SELECT AMOUNT_SOLD, TIME_ID, CHANNEL_ID FROM oraclehadoop.sales")

 

//
// Get the Dimension table from Oracle via JDBC
//
val c = HiveContext.load("jdbc",
  Map("url" -> "jdbc:oracle:thin:@rhes564:1521:mydb",
      "dbtable" -> "(SELECT to_char(CHANNEL_ID) AS CHANNEL_ID, CHANNEL_DESC FROM sh.channels)",
      "user" -> "sh",
      "password" -> "xxx"))

 

 

s.registerTempTable("t_s")
c.registerTempTable("t_c")
// (the times dimension referenced below as t_t is loaded and registered the same way)

 

Then put the join query in sqltext and run it:

 

var sqltext = """
SELECT rs.Month, rs.SalesChannel, round(TotalSales,2)
FROM
(
  SELECT t_t.CALENDAR_MONTH_DESC AS Month, t_c.CHANNEL_DESC AS SalesChannel,
         SUM(t_s.AMOUNT_SOLD) AS TotalSales
  FROM t_s, t_t, t_c
  WHERE t_s.TIME_ID = t_t.TIME_ID
  AND   t_s.CHANNEL_ID = t_c.CHANNEL_ID
  GROUP BY t_t.CALENDAR_MONTH_DESC, t_c.CHANNEL_DESC
  ORDER BY t_t.CALENDAR_MONTH_DESC, t_c.CHANNEL_DESC
) rs
LIMIT 1000
"""

HiveContext.sql(sqltext).collect.foreach(println)
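For comparison, the same aggregation can also be expressed through the DataFrame API instead of SQL. This is only a sketch, assuming s and c are the DataFrames built above; the time dimension is left out here, so it totals per channel only:

```scala
import org.apache.spark.sql.functions.sum

// Join the Hive fact DataFrame (s) with the Oracle dimension DataFrame (c)
// on CHANNEL_ID, then total the sales per channel.
val byChannel = s.join(c, s("CHANNEL_ID") === c("CHANNEL_ID"))
  .groupBy(c("CHANNEL_DESC"))
  .agg(sum(s("AMOUNT_SOLD")).as("TotalSales"))

byChannel.show()
```

Whether you prefer this or the SQL form is largely a matter of taste; both go through the same optimizer.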

 

HTH

 

Dr Mich Talebzadeh

 

LinkedIn  
https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

 

http://talebzadehmich.wordpress.com

 

NOTE: The information in this email is proprietary and confidential. This 
message is for the designated recipient only, if you are not the intended 
recipient, you should destroy it immediately. Any information in this message 
shall not be understood as given or endorsed by Peridale Technology Ltd, its 
subsidiaries or their employees, unless expressly so stated. It is the 
responsibility of the recipient to ensure that this email is virus free, 
therefore neither Peridale Technology Ltd, its subsidiaries nor their employees 
accept any responsibility.

 

 

From: Ted Yu [mailto:yuzhih...@gmail.com] 
Sent: 15 February 2016 08:44
To: SRK <swethakasire...@gmail.com>
Cc: user <user@spark.apache.org>
Subject: Re: How to join an RDD with a hive table?

 

Have you tried creating a DataFrame from the RDD and joining it with the DataFrame that corresponds to the Hive table?
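That suggestion might look like the following sketch (Spark 1.x style; the case class, the RDD contents, and the Hive table name are hypothetical placeholders, not from the thread):

```scala
import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc)
import hiveContext.implicits._

// Hypothetical record type for the RDD side of the join.
case class Rec(id: Int, value: String)
val rdd = sc.parallelize(Seq(Rec(1, "a"), Rec(2, "b")))

val rddDF  = rdd.toDF()                        // DataFrame built from the RDD
val hiveDF = hiveContext.table("mydb.mytable") // DataFrame over the Hive table

// Joining on id means only the Hive rows that match the RDD's ids
// flow into the result, which addresses the memory concern below.
val joined = rddDF.join(hiveDF, rddDF("id") === hiveDF("id"))
joined.show()
```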

 

On Sun, Feb 14, 2016 at 9:53 PM, SRK <swethakasire...@gmail.com> wrote:

Hi,

How do I join an RDD with a Hive table and retrieve only the records I am
interested in? Suppose I have an RDD with 1,000 records and a Hive table
with 100,000 records; I should be able to join the RDD with the Hive table
by an Id and load only those 1,000 matching records from the Hive table so
that there are no memory issues. I am also planning on storing the data in
Hive in the form of Parquet files. Any help on this is greatly appreciated.

Thanks!



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-join-an-RDD-with-a-hive-table-tp26225.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org

 
