[ 
https://issues.apache.org/jira/browse/HIVE-4115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13593316#comment-13593316
 ] 

Amareshwari Sriramadasu commented on HIVE-4115:
-----------------------------------------------


Illustrating above model with an example :
* Define a SALES_CUBE cube with measures : "Sales, Discount" and Dimensions: 
"CustomerID, Location, Transaction-time" 

* Dimensions:
** CustomerID is a simple dimension which refers to the customer table on 
column ID. CustomerTable is having the schema : "ID, Age, Gender"
** Location is hierarchical dimension with the hierarchy : "Zipcode, CityID, 
StateID, CountryID, RegionID"
*** Zipcode refers to ZipTable on column code. ZipTable schema : "code, 
street-name, cityID, stateID"
*** CityID refers to cityTable on column ID. CityTable schema : "ID, name, 
stateID"
*** stateID refers to stateTable on column ID. StateTable schema : "ID, name, 
capital, countryID"
*** countryID refers to counteryTable on column ID. CounterTable : "ID, name, 
capital, Region"
*** Region is an inline dimension with values "'APAC', 'EMEA', 'USA'"
** Transaction-time is simple dimension with timestamp field.

* Facts :Sales_cube can have the following fact tables :
## RawFact with columns "Sales, Discount, CustomerId, ZipCode, Transaction-time"
## CountryFact with columns "Sales, Discount, CountryID"


Physical storage tables :
------------------------------------
In the example described above say that RawFact is rolled hourly in Cluster c1, 
is rolled daily and monthly on Cluster C2; CountryFact is rolled daily, 
monthly, quarterly and yearly on Cluster C2; Also, Customer table is available 
in HBase cluster H1; All the location tables are available in HDFS cluster C2.

The physical tables would be :
* C1_Rawfact_hourly - schema : "Sales, Discount, CustomerId, ZipCode, 
Transaction-time" Partitioned by dt and state.
* C2_Rawfact_daily - schema : "Sales, Discount, CustomerId, ZipCode, 
Transaction-time" Partitioned by dt and state.
* C2_Rawfact_monthly - schema : "Sales, Discount, CustomerId, ZipCode, 
Transaction-time" Partitioned by dt and state.
* C2_CountryFact_daily - Schema : "Sales, Discount, CountryID" Partitioned by dt
* C2_CountryFact_monthly - Schema : "Sales, Discount, CountryID" Partitioned by 
dt
* C2_CountryFact_quarterly - Schema : "Sales, Discount, CountryID" Partitioned 
by dt
* C2_CountryFact_yearly - Schema : "Sales, Discount, CountryID" Partitioned by 
dt
* H1_CustomerTable - schema :  "ID, Age, Gender"
* C2_ZipTable - schema : "code, street-name, cityID, stateID"
* C2_CityTable - schema : "ID, name, stateID"
* C2_StateTable -schema : "ID, name, capital, countryID"
* C2_CountryTable -schema : "ID, name, capital, Region"


If User queries the data on cube with a query like the following :
* Select sales from SALES_CUBE where region = 'APAC' and 
time_range_in(09/01/2012, 12/31/2012)  // Q4 -2012.

Cube Abstraction provided would be smart enough to figure out which table to go 
and give the result . In this case the query translates to :

* Select sales from C2_CountryFact_quarterly join C2_countryTable on 
C2_CountryFact_quarterly.CountryID = C2_countryTable.ID where dt = "Q4-2012" 
and C2_countryTable.region = 'APAC';


                
> Introduce cube abstraction in hive
> ----------------------------------
>
>                 Key: HIVE-4115
>                 URL: https://issues.apache.org/jira/browse/HIVE-4115
>             Project: Hive
>          Issue Type: New Feature
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>
> We would like to define a cube abstraction so that user can query at cube 
> layer and do not know anything about storage and rollups. 
> Will describe the model more in following comments.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to