Re: How to join an RDD with a hive table?

2016-02-16 Thread swetha kasireddy
How can I use a custom partitioner hashed by userId inside saveAsTable using a DataFrame? On Mon, Feb 15, 2016 at 11:24 AM, swetha kasireddy < swethakasire...@gmail.com> wrote: > How about saving the dataframe as a table partitioned by userId? My User > records have userId, number of sessions, visit

Re: How to join an RDD with a hive table?

2016-02-15 Thread swetha kasireddy
How about saving the DataFrame as a table partitioned by userId? My User records have userId, number of sessions, visit count, etc. as columns, and the table should be partitioned by userId. I will need to join the user table saved in the database with an incoming session RDD, as follows. The session RDD

Re: How to join an RDD with a hive table?

2016-02-15 Thread swetha kasireddy
OK. Would it query only the records I want from Hive, as per the filter, or would it load the entire table? My user table will have millions of records and I do not want to cause OOM errors by loading the entire table into memory. On Mon, Feb 15, 2016 at 12:51 AM, Mich Talebzadeh
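
A minimal sketch of the question above, not taken from the thread: DataFrame operations in Spark are lazy, so a filter placed on the Hive table's DataFrame becomes part of the query plan and is applied while the table is scanned on the executors, rather than after the whole table has been collected. The table name `users`, the column names `numSessions` and `visitCount`, and the ids in `wantedIds` are hypothetical. Whether Hive files are actually skipped depends on the table layout (for example, partitioning on the filtered column), which the thread does not specify.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object FilteredUserLookup {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("FilteredUserLookup"))
    val hiveContext = new HiveContext(sc)
    import hiveContext.implicits._

    // Hypothetical: the userIds present in the current batch of sessions.
    val wantedIds = Seq("u1", "u2", "u3")

    // Nothing is read yet: table(), filter() and select() only build a plan.
    val users = hiveContext.table("users") // hypothetical Hive table name
      .filter($"userId".isin(wantedIds: _*))
      .select("userId", "numSessions", "visitCount") // hypothetical columns

    // Print the physical plan to see where the filter is applied.
    users.explain()

    // show() triggers execution and brings only a few rows to the driver.
    users.show()

    sc.stop()
  }
}
```

Calling `collect()` on an unfiltered table is what would pull every row into driver memory; `show()` or a further join keeps the data distributed.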

RE: How to join an RDD with a hive table?

2016-02-15 Thread Mich Talebzadeh
It is also worthwhile using temporary tables for the join query. I can join a Hive table with a table from any other database accessed over JDBC, using DataFrames and temporary tables // //Get the FACT table from Hive // var s = HiveContext.sql("SELECT AMOUNT_SOLD, TIME_ID, CHANNEL_ID FROM
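
The approach described above can be sketched as follows, under stated assumptions. This is not the code from the truncated message: the Hive table name `sales`, the JDBC URL, the remote table `channels`, its `channel_desc` column, and the credentials are all hypothetical placeholders, and the matching JDBC driver is assumed to be on the classpath. Only the column names AMOUNT_SOLD and CHANNEL_ID come from the excerpt.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object HiveJdbcTempTableJoin {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("HiveJdbcTempTableJoin"))
    val hiveContext = new HiveContext(sc)

    // Fact table read from Hive (hypothetical table name "sales").
    val sales = hiveContext.sql("SELECT amount_sold, channel_id FROM sales")

    // Dimension table read over JDBC (all connection details are placeholders).
    val channels = hiveContext.read
      .format("jdbc")
      .options(Map(
        "url"      -> "jdbc:postgresql://dbhost:5432/shop",
        "dbtable"  -> "channels",
        "user"     -> "spark_reader",
        "password" -> "changeme"))
      .load()

    // Register both DataFrames as temporary tables so they can be joined in SQL.
    sales.registerTempTable("t_sales")
    channels.registerTempTable("t_channels")

    val totals = hiveContext.sql(
      """SELECT c.channel_desc, SUM(s.amount_sold) AS total_sold
        |FROM t_sales s
        |JOIN t_channels c ON s.channel_id = c.channel_id
        |GROUP BY c.channel_desc""".stripMargin)

    totals.show()
    sc.stop()
  }
}
```

Temporary tables exist only for the lifetime of the context that registered them, so nothing is written back to Hive or to the JDBC source.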

Re: How to join an RDD with a hive table?

2016-02-15 Thread Ted Yu
Have you tried creating a DataFrame from the RDD and joining it with the DataFrame that corresponds to the Hive table? On Sun, Feb 14, 2016 at 9:53 PM, SRK wrote: > Hi, > > How to join an RDD with a hive table and retrieve only the records that I > am > interested. Suppose, I
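
The suggestion above might look roughly like the following sketch, written against the Spark 1.x API in use at the time of the thread. The `Session` case class, its fields, the sample rows, and the Hive table name `users` are hypothetical; the thread does not show the actual schema.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

// Hypothetical record type for the incoming sessions.
case class Session(userId: String, sessionId: String)

object JoinRddWithHiveTable {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("JoinRddWithHiveTable"))
    val hiveContext = new HiveContext(sc)
    import hiveContext.implicits._ // enables rdd.toDF()

    // Stand-in for the incoming session RDD.
    val sessionRdd = sc.parallelize(Seq(
      Session("u1", "s-100"),
      Session("u2", "s-101")))

    // Convert the RDD to a DataFrame; column names come from the case class.
    val sessions = sessionRdd.toDF()

    // DataFrame backed by the Hive table (hypothetical name "users").
    val users = hiveContext.table("users")

    // Inner join on the shared userId column: only users that appear in
    // the session data are returned.
    val joined = sessions.join(users, "userId")

    joined.show()
    sc.stop()
  }
}
```

Because the join is an inner join, the result contains only the user records that match a session, which is the "only the records that I am interested in" behaviour asked about in the quoted question.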