RE: Trying to join a registered Hive table as temporary with two Oracle tables registered as temporary in Spark

2016-02-14 Thread Mich Talebzadeh
Thanks very much, Sab, that did the trick.

 

I can now join a FACT table from Hive (ORC, partitioned and bucketed) with dimension tables from Oracle.

It sounds like HiveContext is a superset of SQLContext.
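For the archive, here is a minimal sketch of the approach that worked: use a single HiveContext for both the Hive fact table and the Oracle dimension tables, so all the temporary tables live in one context. It reuses the table names and JDBC details quoted in this thread and assumes a Spark 1.x shell where `sc` is a live SparkContext; it is illustrative, not a tested program.

```scala
// Sketch only: Spark 1.x shell, sc is a live SparkContext.
// Table names and JDBC URLs are the ones quoted in this thread.
val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)

// Hive FACT table, read through the metastore
hiveContext.sql("SELECT AMOUNT_SOLD, TIME_ID, CHANNEL_ID FROM oraclehadoop.sales")
  .registerTempTable("t_s")

// Oracle dimension table, read over JDBC with the SAME context
hiveContext.load("jdbc",
  Map("url" -> "jdbc:oracle:thin:@rhes564:1521:mydb",
      "dbtable" -> "(SELECT to_char(CHANNEL_ID) AS CHANNEL_ID, CHANNEL_DESC FROM sh.channels)",
      "user" -> "sh",
      "password" -> "xxx"))
  .registerTempTable("t_c")

// Both temporary tables are now registered in one context, so the join resolves
hiveContext.sql(
  "SELECT t_c.CHANNEL_DESC, SUM(t_s.AMOUNT_SOLD) " +
  "FROM t_s JOIN t_c ON t_s.CHANNEL_ID = t_c.CHANNEL_ID " +
  "GROUP BY t_c.CHANNEL_DESC").collect.foreach(println)
```

Since HiveContext extends SQLContext, `load("jdbc", ...)` is available on it unchanged; the only difference from the failing setup is that nothing is registered on a separate SQLContext.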

 

Dr Mich Talebzadeh

 

LinkedIn  
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

 

http://talebzadehmich.wordpress.com

 

NOTE: The information in this email is proprietary and confidential. This 
message is for the designated recipient only, if you are not the intended 
recipient, you should destroy it immediately. Any information in this message 
shall not be understood as given or endorsed by Peridale Technology Ltd, its 
subsidiaries or their employees, unless expressly so stated. It is the 
responsibility of the recipient to ensure that this email is virus free, 
therefore neither Peridale Technology Ltd, its subsidiaries nor their employees 
accept any responsibility.

 

 

RE: Trying to join a registered Hive table as temporary with two Oracle tables registered as temporary in Spark

2016-02-14 Thread Sabarish Sasidharan
The Hive context can be used instead of the SQL context even when you are
accessing data from non-Hive sources such as MySQL or Postgres, for example.
It has better SQL support than the SQLContext because it uses the HiveQL parser.

Regards
Sab
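As a concrete illustration of Sab's point, a HiveContext can load a plain JDBC source exactly the way a SQLContext does, because it inherits the same data source API. The sketch below uses a placeholder MySQL URL, table, and credentials (none of these appear in the thread); it assumes a Spark 1.x shell with `sc` available.

```scala
// Sketch, Spark 1.x: HiveContext extends SQLContext, so every SQLContext
// method is available, plus HiveQL parsing and metastore access.
val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)

// Loading a non-Hive source works the same way as with SQLContext.
// URL, table, and credentials below are placeholders for illustration.
val df = hiveContext.load("jdbc",
  Map("url"      -> "jdbc:mysql://dbhost:3306/salesdb",
      "dbtable"  -> "customers",
      "user"     -> "scott",
      "password" -> "xxx"))
df.registerTempTable("customers")

// Queries against the registered table are parsed with HiveQL,
// which covers more of SQL than the Spark 1.x SQLContext parser.
hiveContext.sql("SELECT COUNT(1) FROM customers").show()
```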

RE: Trying to join a registered Hive table as temporary with two Oracle tables registered as temporary in Spark

2016-02-14 Thread Mich Talebzadeh
Thanks.

 

I tried to access a Hive table via JDBC (the JDBC connection itself works) through sqlContext:

 

 

scala> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
sqlContext: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@4f60415b

scala> val s = sqlContext.load("jdbc",
     | Map("url" -> "jdbc:hive2://rhes564:10010/oraclehadoop",
     | "dbtable" -> "SALES",
     | "user" -> "hduser",
     | "password" -> "xxx"))
warning: there were 1 deprecation warning(s); re-run with -deprecation for details
java.sql.SQLException: Method not supported

 

In general, one would expect this to work.

 

The attraction of Spark is to cache these tables in memory by registering them
as temporary tables and running the queries there.
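One caveat on the caching point above: registering a temporary table only records a lazy logical plan; pinning the data in memory requires an explicit cache call. A sketch of what that looks like (the table name is reused from this thread; assumes a Spark 1.x shell with `sc` available):

```scala
// Sketch, Spark 1.x: registerTempTable does not cache anything by itself.
val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)

val s = hiveContext.sql("SELECT AMOUNT_SOLD, TIME_ID, CHANNEL_ID FROM oraclehadoop.sales")
s.registerTempTable("t_s")

// Pin the table in memory: the first query pays the scan cost,
// subsequent queries read from the in-memory columnar cache.
hiveContext.cacheTable("t_s")
hiveContext.sql("SELECT COUNT(1) FROM t_s").collect()  // materialises the cache

// Release the memory when done
hiveContext.uncacheTable("t_s")
```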

 

 

Dr Mich Talebzadeh

 

 


On 15 Feb 2016 07:22, "Mich Talebzadeh" wrote:

I intend to get a table from Hive and register it as a temporary table in
Spark.

 

I have created contexts for both Hive and Spark SQL as below:

 

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
//

 

I get the Hive table as below using HiveContext

 

//Get the FACT table from Hive
//
var s = hiveContext.sql("SELECT AMOUNT_SOLD, TIME_ID, CHANNEL_ID FROM oraclehadoop.sales")

 

s.registerTempTable("t_s")

 

This works fine using HiveContext

 

scala> hiveContext.sql("select count(1) from t_s").collect.foreach(println)

[4991761]

 

Now I use JDBC to get data from two Oracle tables and register them as
temporary tables using sqlContext:

 

val c = sqlContext.load("jdbc",
Map("url" -> "jdbc:oracle:thin:@rhes564:1521:mydb",
"dbtable" -> "(SELECT to_char(CHANNEL_ID) AS CHANNEL_ID, CHANNEL_DESC FROM sh.channels)",
"user" -> "sh",
"password" -> "xxx"))

 

val t = sqlContext.load("jdbc",
Map("url" -> "jdbc:oracle:thin:@rhes564:1521:mydb",
"dbtable" -> "(SELECT to_char(TIME_ID) AS TIME_ID, CALENDAR_MONTH_DESC FROM sh.times)",
"user" -> "sh",
"password" -> "sxxx"))

 

And register them as temporary tables

 

c.registerTempTable("t_c")

t.registerTempTable("t_t")

//

 

Now I try to run SQL across the three tables using sqlContext. However, it
cannot see the Hive temporary table:

 

var sqltext : String = ""
sqltext = """
SELECT rs.Month, rs.SalesChannel, round(TotalSales,2)
FROM
(
SELECT t_t.CALENDAR_MONTH_DESC AS Month, t_c.CHANNEL_DESC AS SalesChannel, SUM(t_s.AMOUNT_SOLD) AS TotalSales
FROM t_s, t_t, t_c
WHERE t_s.TIME_ID = t_t.TIME_ID
AND   t_s.CHANNEL_ID = t_c.CHANNEL_ID
GROUP BY t_t.CALENDAR_MONTH_DESC, t_c.CHANNEL_DESC
ORDER BY t_t.CALENDAR_MONTH_DESC, t_c.CHANNEL_DESC
) rs
LIMIT 10
"""

sqlContext.sql(sqltext).collect.foreach(println)

 

org.apache.spark.sql.AnalysisException: no such table t_s; line 5 pos 10

 

I guess this is because the temporary tables are registered in two different
contexts. Is there any solution? For example, can I transfer a DataFrame from
HiveContext to sqlContext?

 

Thanks

 

Dr Mich Talebzadeh

 

 



Re: Trying to join a registered Hive table as temporary with two Oracle tables registered as temporary in Spark

2016-02-14 Thread ayan guha
Why can't you use the JDBC load in the Hive context? I don't think sharing data
across contexts is allowed.