[https://issues.apache.org/jira/browse/LENS-511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498016#comment-14498016]
Amareshwari Sriramadasu commented on LENS-511:
----------------------------------------------
[~praagarw], I could not fully understand the use case, but let me explain why
the data model is the way it is right now and how storage tables are associated
with fact tables.
We would like users not to worry about storages. A querying user cares only
about the fields that can be queried, and may want more information about each
field: what it is and how it relates to the others.
A schema designer wants to first design the logical schema and table layout for
raw and aggregate fact tables. Once the schema is fixed for a table, the same
layout can be put on any number of storages.
Here is the current model.
A FactTable is a table with defined columns. It can be present on many
storages, and for each storage an additional storage table description
specifies how the table is stored on that storage.
A fact on each storage corresponds to one physical table. The physical table is
either directly associated with a table location on a file system or stored by
a storage handler. The same physical table cannot be associated with more than
one fact (though there is no restriction on locations across facts). When a
FactTable is defined, the table schema layout is associated with the FactTable,
and the same layout is present on the underlying storage. When the same layout
is present on more than one storage, the fact table is associated with more
than one storage. So each FactTable + Storage pair corresponds to a single
physical table.
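A minimal Python sketch of the model described above, assuming hypothetical class and field names (FactTable, StorageTableDesc, physical_tables) that are not Lens's actual API, and a hypothetical second storage named "mydb". It shows one logical column layout shared across storages, with each (fact, storage) pair yielding exactly one physical table.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class StorageTableDesc:
    # Hypothetical per-storage description of how the fact is stored there.
    location: Optional[str] = None          # file system location, if any
    part_cols: List[str] = field(default_factory=list)


@dataclass
class FactTable:
    name: str
    columns: Dict[str, str]                 # the single logical layout: name -> type
    storages: Dict[str, StorageTableDesc] = field(default_factory=dict)

    def add_storage(self, storage_name: str, desc: StorageTableDesc) -> None:
        # The same column layout is placed on one more storage.
        self.storages[storage_name] = desc

    def physical_tables(self) -> List[Tuple[str, str]]:
        # Each (fact, storage) pair maps to exactly one physical table.
        return [(self.name, storage) for storage in sorted(self.storages)]


fact2 = FactTable("fact2", {"dim1": "int", "msr1": "double"})
fact2.add_storage("local", StorageTableDesc("/tmp/examples/fact2_local", ["dt"]))
fact2.add_storage("mydb", StorageTableDesc())
print(fact2.physical_tables())  # [('fact2', 'local'), ('fact2', 'mydb')]
```

Keying storages by name inside the fact is what makes a second fact on the same physical table impossible to express in this sketch.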
bq. The current storage_table definition doesn't give me enough detail to find
where all partitions need to be added given a new partition is added to an
external table.
Adding a partition happens for a fact on a storage, which always corresponds to
a single physical table. The underlying storage table name is currently created
with the convention storageName_factName. Exposing the underlying name is not
required, because the API talks in terms of a FactTable on a storage or a
DimTable on a storage.
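A small sketch of the storageName_factName convention, using hypothetical helper names; any case normalization Lens may apply is not modeled. It also shows that the name is hard to invert on its own: if storage or fact names contain underscores, several (storage, fact) splits produce the same physical name, which is one reason to address a fact on a storage rather than a physical table name.

```python
from typing import List, Tuple


def storage_table_name(storage_name: str, fact_name: str) -> str:
    # Hypothetical helper for the convention: storageName_factName.
    return f"{storage_name}_{fact_name}"


def possible_owners(table_name: str) -> List[Tuple[str, str]]:
    # Every (storage, fact) split that would yield table_name.
    # More than one candidate means the name alone cannot identify the fact.
    parts = table_name.split("_")
    return [("_".join(parts[:i]), "_".join(parts[i:])) for i in range(1, len(parts))]


print(storage_table_name("local", "fact2"))  # local_fact2
print(possible_owners("local_fact2"))        # [('local', 'fact2')]
print(possible_owners("hdfs_raw_fact"))      # [('hdfs', 'raw_fact'), ('hdfs_raw', 'fact')]
```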
bq. We are also having thoughts about revamping the model. For example,
storing the model as a list of key-value pairs. This needs a deeper discussion.
[~prongs], storing the model in Hive tables as table properties is just one
implementation of how the model is persisted; there can be other ways to
persist it. I hope that is what you are referring to here. But that should not
require revamping the model itself, and I would say it needs a separate JIRA
rather than this one.
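To illustrate that persistence is separate from the logical model, here is a hypothetical sketch in which the same fact definition is saved either as Hive-style table properties or as a flat list of key-value pairs. The class names and property keys are invented for this example and are not the keys Lens actually uses.

```python
from typing import Dict, List, Protocol, Tuple


class FactDef:
    # Logical model: identical no matter which store persists it.
    def __init__(self, name: str, storages: List[str]) -> None:
        self.name = name
        self.storages = storages


class ModelStore(Protocol):
    def save(self, fact: FactDef) -> None: ...
    def load(self, name: str) -> FactDef: ...


class TablePropertiesStore:
    # Hypothetical: one properties map per fact, as if set on a Hive table.
    def __init__(self) -> None:
        self.tables: Dict[str, Dict[str, str]] = {}

    def save(self, fact: FactDef) -> None:
        self.tables[fact.name] = {"fact.storages": ",".join(fact.storages)}

    def load(self, name: str) -> FactDef:
        return FactDef(name, self.tables[name]["fact.storages"].split(","))


class KeyValueStore:
    # Hypothetical: a flat list of (key, value) pairs.
    def __init__(self) -> None:
        self.pairs: List[Tuple[str, str]] = []

    def save(self, fact: FactDef) -> None:
        for storage in fact.storages:
            self.pairs.append((f"fact/{fact.name}/storage", storage))

    def load(self, name: str) -> FactDef:
        key = f"fact/{name}/storage"
        return FactDef(name, [v for k, v in self.pairs if k == key])


def roundtrip(store: ModelStore, fact: FactDef) -> List[str]:
    # Save and reload through any backend; the model class never changes.
    store.save(fact)
    return store.load(fact.name).storages


fact = FactDef("fact2", ["local", "mydb"])
print(roundtrip(TablePropertiesStore(), fact))  # ['local', 'mydb']
print(roundtrip(KeyValueStore(), fact))         # ['local', 'mydb']
```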
> identify facts associated with underlying hive/jdbc tables
> ----------------------------------------------------------
>
> Key: LENS-511
> URL: https://issues.apache.org/jira/browse/LENS-511
> Project: Apache Lens
> Issue Type: Improvement
> Components: api, cube
> Affects Versions: 2.0
> Reporter: Pranav Kumar Agarwal
>
> Consider the following storage table definition inside a fact:
> <x_fact_table cube_name="sample_cube" name="fact2" weight="200.0"
> xmlns="uri:lens:cube:0.1"
> xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
> xsi:schemaLocation="uri:lens:cube:0.1 cube-0.1.xsd ">
> ...
> <storage_tables>
> <storage_table>
> <update_periods>
> <update_period>HOURLY</update_period>
> <update_period>DAILY</update_period>
> </update_periods>
> <storage_name>local</storage_name>
> <table_desc external="true" field_delimiter=","
> table_location="/tmp/examples/fact2_local">
> <part_cols>
> <column comment="Time column" name="dt" type="STRING"/>
> </part_cols>
> <time_part_cols>dt</time_part_cols>
> </table_desc>
> </storage_table>
> </storage_tables>
> </x_fact_table>
> If a new partition is added to the external table location
> "/tmp/examples/fact2_local", I want to add a new partition to fact2.
> However, I have no way to find all the facts built on the external
> table location "/tmp/examples/fact2_local". We could possibly do it by
> matching the location, but that does not seem clean. For example, for a
> JDBC source the table_location is effectively a dummy, since it is not
> actually used to query the content from that location. JDBCDriver
> expects a table named storageName_factName in the target datastore, so
> there is no indication of which facts need to be updated.
> Problem Statement: The current storage_table definition doesn't give me
> enough detail to find where all partitions need to be added given a new
> partition is added to an external table.
> I propose that we add a table_name property as part of table_desc and
> provide the following APIs:
> GET /storages/{storageName}/tableNames/
> GET /storages/{storageName}/tableNames/{tableName}/facts
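A rough sketch of what the two proposed endpoints might compute, assuming each storage table description gains the proposed table_name property. The in-memory index, its method names, and the second fact "fact3" built on the same external table are hypothetical; this is not an existing Lens API.

```python
from collections import defaultdict
from typing import DefaultDict, List, Set


class TableNameIndex:
    # Hypothetical reverse index: storage -> table_name -> facts using it.
    def __init__(self) -> None:
        self._index: DefaultDict[str, DefaultDict[str, Set[str]]] = defaultdict(
            lambda: defaultdict(set)
        )

    def register(self, fact: str, storage: str, table_name: str) -> None:
        # Called when a fact's storage table declares the proposed table_name.
        self._index[storage][table_name].add(fact)

    def table_names(self, storage: str) -> List[str]:
        # Mirrors GET /storages/{storageName}/tableNames/
        return sorted(self._index.get(storage, {}))

    def facts(self, storage: str, table_name: str) -> List[str]:
        # Mirrors GET /storages/{storageName}/tableNames/{tableName}/facts
        return sorted(self._index.get(storage, {}).get(table_name, set()))


index = TableNameIndex()
index.register("fact2", "local", "fact2_local")
index.register("fact3", "local", "fact2_local")
# When a new partition lands in table fact2_local on storage "local":
print(index.facts("local", "fact2_local"))  # ['fact2', 'fact3']
```

Read-only lookups use `.get` so that querying an unknown storage or table does not create empty entries in the index.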