[
https://issues.apache.org/jira/browse/LENS-511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14499506#comment-14499506
]
Bala Nathan commented on LENS-511:
----------------------------------
Amareshwari: Just a small prologue on this issue.
We are using Lens as our query layer to connect to heterogeneous data stores.
Our initial release will support Hive and Vertica as data stores. At a high
level, our data platform lifecycle is as follows:
define schema -> creates a schema object -> define a pipeline -> populate
underlying tables -> create cube, fact, dimensions -> query
Since cube and fact definition happens much later in the cycle, the actual
physical tables are not created as "storagename_factname". This naming
convention is a restriction for us because our schema definition layer does
not expect it to be followed. If a user wants to define a schema on our JDBC
source (e.g. Vertica), the schema definition layer cannot be expected to
create the underlying table as storage_fact, because the fact definition
happens much later in the process.
In light of the above, I believe Pranav's proposal covers two areas:
1) An additional property to identify the physical table name. This makes it
simple to discover the underlying tables upon which facts can be built
later. This could be an additional property in the metastore or
something else.
2) Since the physical table names may differ from the fact table names, a way
to update fact metadata when partitions are added to or dropped from the
physical tables.
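To make the two points concrete, here is a minimal sketch in Python. It is not Lens code: the StorageTable class, its table_name field (standing in for the proposed property) and the helper functions are all hypothetical names used only for illustration. It assumes that when no table_name is set, the physical table follows the storagename_factname convention mentioned above.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StorageTable:
    """One storage table entry of a fact (hypothetical model, not a Lens class)."""
    fact_name: str
    storage_name: str
    table_name: Optional[str] = None  # stand-in for the proposed property

    def physical_table(self) -> str:
        # Assumption: fall back to the storagename_factname convention
        # when no explicit physical table name is given.
        return self.table_name or f"{self.storage_name}_{self.fact_name}"


def build_index(storage_tables):
    """Map (storage name, physical table name) to the facts built on it."""
    index = defaultdict(set)
    for st in storage_tables:
        index[(st.storage_name, st.physical_table())].add(st.fact_name)
    return index


def facts_for(index, storage_name, physical_table):
    """Facts whose metadata needs updating when this table's partitions change."""
    return sorted(index.get((storage_name, physical_table), ()))
```

With an index like this, a partition added to or dropped from a physical table can be traced back to every fact defined on it, whatever the table is called.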
Let me know if this clarifies things a bit.
> identify facts associated with underlying hive/jdbc tables
> ----------------------------------------------------------
>
> Key: LENS-511
> URL: https://issues.apache.org/jira/browse/LENS-511
> Project: Apache Lens
> Issue Type: Improvement
> Components: api, cube
> Affects Versions: 2.0
> Reporter: Pranav Kumar Agarwal
>
> Consider the following storage table definition inside a fact:
> <x_fact_table cube_name="sample_cube" name="fact2" weight="200.0"
>     xmlns="uri:lens:cube:0.1"
>     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>     xsi:schemaLocation="uri:lens:cube:0.1 cube-0.1.xsd ">
>   ...
>   <storage_tables>
>     <storage_table>
>       <update_periods>
>         <update_period>HOURLY</update_period>
>         <update_period>DAILY</update_period>
>       </update_periods>
>       <storage_name>local</storage_name>
>       <table_desc external="true" field_delimiter=","
>           table_location="/tmp/examples/fact2_local">
>         <part_cols>
>           <column comment="Time column" name="dt" type="STRING"/>
>         </part_cols>
>         <time_part_cols>dt</time_part_cols>
>       </table_desc>
>     </storage_table>
>   </storage_tables>
> </x_fact_table>
> In the event that a new partition is added to the external table location
> "/tmp/examples/fact2_local", I want to add a new partition on fact2.
> However, I have no way to find which facts are built on the external
> table location "/tmp/examples/fact2_local". We could possibly do it by
> matching the location, but that is not a clean solution. For a JDBC
> source, the table_location is effectively a dummy value, as it is not
> actually used to query content from that location. JDBCDriver expects a
> table named storageName_factName in the target datastore, so there is no
> indication of which facts need to be updated.
> Problem Statement: The current storage_table definition doesn't give me
> enough detail to find where partitions need to be added when a new
> partition is added to an external table.
> I propose that we add a table_name property as part of table_desc and
> provide the following APIs:
> GET /storages/{storageName}/tableNames/
> GET /storages/{storageName}/tableNames/{tableName}/facts
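
The lookups behind the two proposed endpoints could be sketched as follows. This is only an illustration in Python under stated assumptions, not the Lens metastore implementation: the function names and the in-memory list of (storage name, table name, fact name) tuples are hypothetical, and the table name "fact2_local" in the usage note is an assumed value for the proposed table_name property.

```python
from typing import Iterable, List, Tuple

# Hypothetical record: (storage_name, table_name, fact_name), where
# table_name stands for the proposed table_desc property.
Mapping = Tuple[str, str, str]


def list_table_names(mappings: Iterable[Mapping], storage_name: str) -> List[str]:
    """Sketch of GET /storages/{storageName}/tableNames/"""
    return sorted({table for storage, table, _ in mappings
                   if storage == storage_name})


def list_facts(mappings: Iterable[Mapping], storage_name: str,
               table_name: str) -> List[str]:
    """Sketch of GET /storages/{storageName}/tableNames/{tableName}/facts"""
    return sorted({fact for storage, table, fact in mappings
                   if storage == storage_name and table == table_name})
```

For example, given the mapping ("local", "fact2_local", "fact2"), calling list_facts(mappings, "local", "fact2_local") would return the facts that need a new partition when one is added to that table.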
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)