[jira] [Commented] (LENS-511) identify facts associated with underlying hive/jdbc tables

Amareshwari Sriramadasu (JIRA) Thu, 16 Apr 2015 05:59:32 -0700

    [ 
https://issues.apache.org/jira/browse/LENS-511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498016#comment-14498016
 ]


Amareshwari Sriramadasu commented on LENS-511:
----------------------------------------------

[~praagarw], I could not fully understand the use case, but let me explain why 
the current data model is the way it is and how storage tables are associated 
with fact tables.

We would like users not to worry about storages. A querying user cares only 
about the fields that can be queried, and might want to know more about what 
each field is and how it is related to the others.
A schema designer wants to first design the logical schema and table layout for 
raw and aggregate fact tables. Once the schema is fixed for a table, the same 
layout can be put on any number of storages.

Here is the model right now.

A FactTable is a table with its columns defined. It can be present on many 
storages, and for each storage an additional storage table description 
specifies how the table is stored there.

A fact on each storage corresponds to one physical table. The physical table is 
directly associated with a table location on the file system, or is stored by a 
storage handler. We cannot have the same physical table associated with more 
than one fact (though there is no restriction with respect to location across 
facts). When a FactTable is defined, the table schema layout is associated with 
the FactTable. The same layout would be present on the underlying storage. When 
the same layout is present on more than one storage, the fact table is 
associated with more than one storage. So, FactTable + Storage corresponds to a 
single physical table.
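
To make the relationship concrete, here is a minimal sketch in Python. It is 
not Lens code: the Catalog class and its method names are hypothetical, and 
facts, storages and tables are reduced to plain strings. It only illustrates 
the two rules above: a (fact, storage) pair maps to exactly one physical table, 
and a physical table cannot belong to more than one fact.

```python
class Catalog:
    """Hypothetical registry (not a Lens class) for the fact/storage model."""

    def __init__(self):
        self._table_for = {}  # (fact, storage) -> physical table name
        self._fact_for = {}   # (storage, physical table name) -> fact

    def add_storage(self, fact, storage, table):
        """Associate a fact with a storage through one physical table."""
        if (fact, storage) in self._table_for:
            raise ValueError(f"{fact} is already present on storage {storage}")
        owner = self._fact_for.get((storage, table))
        if owner is not None:
            # Same physical table cannot be associated with more than one fact.
            raise ValueError(f"{table} on {storage} already belongs to {owner}")
        self._table_for[(fact, storage)] = table
        self._fact_for[(storage, table)] = fact

    def physical_table(self, fact, storage):
        """FactTable + Storage resolves to a single physical table."""
        return self._table_for[(fact, storage)]

    def storages_of(self, fact):
        """All storages on which the fact's layout is present."""
        return sorted(s for (f, s) in self._table_for if f == fact)
```

A fact registered on two storages then resolves to two physical tables, one per 
storage, while the logical layout stays the same.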

bq. Current storage_table definition doesn't give me enough detail to find 
where all partitions needs to be added given a new partition is added to an 
external table.
Add partition happens for a fact on a storage, which always corresponds to a 
single physical table. The underlying storage table name is currently created 
with the convention storageName_factName. Exposing the underlying name is not 
required, because the API talks in terms of a FactTable on a storage or a 
DimTable on a storage.
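
As an illustration only, the convention can be sketched in Python as below. 
The function names and the in-memory partitions dictionary are hypothetical 
and are not the Lens API. The point is that the caller supplies only the fact 
and the storage, and the physical name is derived internally.

```python
def storage_table_name(storage_name, fact_name):
    """Derive the physical table name using the storageName_factName convention."""
    return f"{storage_name}_{fact_name}"


def add_partition(partitions, storage_name, fact_name, part_spec):
    """Record a partition for a fact on a storage (hypothetical helper).

    `partitions` maps physical table name -> list of partition specs.
    Returns the physical table name the partition was added to.
    """
    table = storage_table_name(storage_name, fact_name)
    partitions.setdefault(table, []).append(part_spec)
    return table


if __name__ == "__main__":
    parts = {}
    # The caller never mentions the physical table name.
    print(add_partition(parts, "local", "fact2", {"dt": "2015-04-16-05"}))
```

Because the name is a pure function of (storage, fact), each call touches 
exactly one physical table.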

bq.  We are also having thoughts about revamping the model. For example, 
storing model as a list of key Value pair. This needs a deeper discussion.
[~prongs], storing the model in Hive tables as table properties is one 
implementation of how the model is persisted. There can be other ways to 
persist the model; I hope that is what you are referring to here. But that 
should not require revamping the model itself. I would say that needs a 
different JIRA, not this one.
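
A rough sketch of that separation, in Python and under assumptions: the 
property keys ("fact.name", "fact.column.<name>", ...) and the dictionary 
shape of the fact are made up for illustration and are not the keys Lens 
uses. The same model is persisted two ways, as flat key/value properties and 
as JSON, and reads back unchanged from either.

```python
import json

COLUMN_PREFIX = "fact.column."  # hypothetical key prefix, not a Lens property


def to_properties(fact):
    """Flatten a fact description into string key/value pairs."""
    props = {
        "fact.name": fact["name"],
        "fact.storages": ",".join(fact["storages"]),
    }
    for column, col_type in fact["columns"].items():
        props[COLUMN_PREFIX + column] = col_type
    return props


def from_properties(props):
    """Rebuild the fact description from flat key/value pairs."""
    storages = props["fact.storages"]
    return {
        "name": props["fact.name"],
        "storages": storages.split(",") if storages else [],
        "columns": {
            key[len(COLUMN_PREFIX):]: value
            for key, value in props.items()
            if key.startswith(COLUMN_PREFIX)
        },
    }


def to_json(fact):
    """Alternative persistence: one JSON document per fact."""
    return json.dumps(fact, sort_keys=True)


def from_json(text):
    return json.loads(text)
```

Swapping one pair of functions for the other changes how the model is stored, 
not what the model is.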

> identify facts associated with underlying hive/jdbc tables
> ----------------------------------------------------------
>
>                 Key: LENS-511
>                 URL: https://issues.apache.org/jira/browse/LENS-511
>             Project: Apache Lens
>          Issue Type: Improvement
>          Components: api, cube
>    Affects Versions: 2.0
>            Reporter: Pranav Kumar Agarwal
>
> Consider following storage table defn inside a fact:
> <x_fact_table cube_name="sample_cube" name="fact2" weight="200.0"
> xmlns="uri:lens:cube:0.1"
>    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
> xsi:schemaLocation="uri:lens:cube:0.1 cube-0.1.xsd ">
> ...
>   <storage_tables>
>      <storage_table>
>        <update_periods>
>          <update_period>HOURLY</update_period>
>          <update_period>DAILY</update_period>
>        </update_periods>
>        <storage_name>local</storage_name>
>        <table_desc external="true" field_delimiter=","
> table_location="/tmp/examples/fact2_local">
>          <part_cols>
>            <column comment="Time column" name="dt" type="STRING"/>
>          </part_cols>
>          <time_part_cols>dt</time_part_cols>
>        </table_desc>
>      </storage_table>
>    </storage_tables>
> </x_fact_table>
> In an event a new partition is added to the external table location
> "/tmp/examples/fact2_local" then I wish to add a new partition on the
> fact2, however I have no way to find what all facts are built on
> external table location "/tmp/examples/fact2_local". We can possibly do
> it by matching the location, however that doesn't seem to be quite
> nice.. Consider for a JDBC source the table_location is kind of dummy as
> its not really used to query the content from that location. JDBCDriver
> expects a table with the name as storageName_factName in the target
> datastore, thus no indication on which all facts to be updated.
> Problem Statement: Current storage_table definition doesn't give me
> enough detail to find where all partitions needs to be added given a new
> partition is added to an external table.
> I propose that we add a table_name property as part of table_desc and
> provide following API's:
> GET /storages/{storageName}/tableNames/
> GET /storages/{storageName}/tableNames/{tableName}/facts



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
