Re: temporary tables created by registerTempTable()

2016-02-15 Thread Mich Talebzadeh
 

Hi Michael, 

A temporary table in Hive is private to the session that created it and
lives only for the lifetime of that session. The table is first created in
the current database (in this case oraclehadoop.db) and then moved to a
session-specific _tmp_space.db directory under /tmp in HDFS, as shown
below:

INFO : Moving data to:
hdfs://rhes564:9000/tmp/hive/hduser/d762c796-5953-478e-98bd-d51f13ddadbf/_tmp_space.db/7972610a-3088-4535-ad25-af2c9cfce6b2
from:
hdfs://rhes564:9000/user/hive/warehouse/oraclehadoop.db/.hive-staging_hive_2016-02-15_19-34-08_358_5135480012597277692-4/-ext-10002


Example: creating a temporary Hive table called tmp:

0: jdbc:hive2://rhes564:10010/default> CREATE TEMPORARY TABLE tmp AS
0: jdbc:hive2://rhes564:10010/default> SELECT t.calendar_month_desc, c.channel_desc, SUM(s.amount_sold) AS TotalSales
0: jdbc:hive2://rhes564:10010/default> FROM sales s, times t, channels c
0: jdbc:hive2://rhes564:10010/default> WHERE s.time_id = t.time_id
0: jdbc:hive2://rhes564:10010/default> AND s.channel_id = c.channel_id
0: jdbc:hive2://rhes564:10010/default> GROUP BY t.calendar_month_desc, c.channel_desc
0: jdbc:hive2://rhes564:10010/default> ;
INFO : Ended Job = job_1455564800116_0001
INFO : Moving data to:
hdfs://rhes564:9000/tmp/hive/hduser/d762c796-5953-478e-98bd-d51f13ddadbf/_tmp_space.db/7972610a-3088-4535-ad25-af2c9cfce6b2
from:
hdfs://rhes564:9000/user/hive/warehouse/oraclehadoop.db/.hive-staging_hive_2016-02-15_19-34-08_358_5135480012597277692-4/-ext-10002
INFO : Table oraclehadoop.tmp stats: [numFiles=1, numRows=150, totalSize=3934, rawDataSize=3784]

Other sessions will not see that table, and they can even create a
temporary table with the same name, as I did here with the first session
still open:

INFO : Moving data to:
hdfs://rhes564:9000/tmp/hive/hduser/65f3317b-1341-40b8-86c3-f0b8a9c02ad6/_tmp_space.db/cd0c783d-8966-45bb-8f7a-5059159cbf8d
from:
hdfs://rhes564:9000/user/hive/warehouse/oraclehadoop.db/.hive-staging_hive_2016-02-15_20-04-57_780_9035563040280767266-11/-ext-10002

INFO : Table oraclehadoop.tmp stats: [numFiles=1, numRows=150, totalSize=3934, rawDataSize=3784]

Note that each temporary table's HDFS path includes the session's unique ID
(and the staging directory is timestamped), so there is no collision
between sessions, and the temporary table disappears once the session
ends.
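On the Spark side of the question, as I understand it, registerTempTable()
only registers the DataFrame's logical plan with the SQLContext; nothing is
written out as ORC or copied to the executors unless you cache it
explicitly. A minimal sketch for a Spark 1.x spark-shell (where sqlContext
is assumed to be a HiveContext):

```scala
// Sketch for a Spark 1.x spark-shell; sqlContext is assumed to be a HiveContext.
val s = sqlContext.sql(
  "SELECT amount_sold, time_id, channel_id FROM oraclehadoop.sales")

// Registers only the logical plan under the name "t_s"; no files are
// written and no data is shipped to the executors at this point.
s.registerTempTable("t_s")

// The name resolves only through this SQLContext; a beeline session or
// another application will not see it.
sqlContext.sql("SELECT COUNT(1) FROM t_s").show()

// Data lands in executor memory only if you cache it explicitly:
sqlContext.cacheTable("t_s")
```

So the temp table is scoped to the SQLContext that registered it and goes
away with it; it is not shared across JVMs. For sharing across
applications you would need something like a Hive table or the thrift
server.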

HTH 

Mich 

On 15/02/2016 18:41, Michael Segel wrote: 

> I was just looking at that... 
> 
> Out of curiosity... if you make it a Hive Temp Table... who has access to the 
> data? 
> 
> Just your app, or anyone with access to the same database? (Would you be able 
> to share data across different JVMs? ) 
> 
> (E.G - I have a reader who reads from source A that needs to publish the data 
> to a bunch of minions (B) ) 
> 
> Would this be an option? 
> 
> Thx 
> 
> -Mike 
> 
> On Feb 15, 2016, at 7:54 AM, Mich Talebzadeh 
> <mich.talebza...@cloudtechnologypartners.co.uk> wrote: 
> 
> Hi, 
> 
> Is my understanding correct that the temporary tables created by 
> registerTempTable() in the Spark shell are built on ORC files? 
> 
> For example the following Data Frame just creates a logical abstraction 
> 
> scala> var s = HiveContext.sql("SELECT AMOUNT_SOLD, TIME_ID, CHANNEL_ID FROM 
> oraclehadoop.sales")
> s: org.apache.spark.sql.DataFrame = [AMOUNT_SOLD: decimal(10,0), TIME_ID: 
> timestamp, CHANNEL_ID: bigint] 
> 
> Then I register this DataFrame as a temporary table using the 
> registerTempTable() call 
> 
> s.registerTempTable("t_s") 
> 
> Also, I believe that s.registerTempTable("t_s") creates an in-memory table 
> that is scoped to the cluster in which it was created. Is the data stored 
> using Hive's ORC format, and is this tempTable held in memory on all nodes 
> of the cluster? In other words, does every node in the cluster have a copy 
> of tempTable in its memory? 
> 
> Thanks, 
> 
> -- 
> 
> Dr Mich Talebzadeh
> 
> LinkedIn 
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>  [1]
> 
> http://talebzadehmich.wordpress.com [2]
> 
> NOTE: The information in this email is proprietary and confidential. This 
> message is for the designated recipient only, if you are not the intended 
> recipient, you should destroy it immediately. Any information in this message 
> shall not be understood as given or endorsed by Cloud Technology Partners 
> Ltd, its subsidiaries or their employees, unless expressly so stated. It is 
> the responsibility of the recipient to ensure that this email is virus free, 
> therefore neither Cloud Technology partners Ltd, its subsidiaries nor their 
> employees accept any responsibility.
> 

Re: temporary tables created by registerTempTable()

2016-02-15 Thread Michael Segel
I was just looking at that… 

Out of curiosity… if you make it a Hive Temp Table… who has access to the data? 

Just your app, or anyone with access to the same database?  (Would you be able 
to share data across different JVMs? ) 

(E.G - I have a reader who reads from source A that needs to publish the data 
to a bunch of minions (B)   ) 

Would this be an option? 

Thx

-Mike

