[
https://issues.apache.org/jira/browse/LENS-511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14502325#comment-14502325
]
Amareshwari Sriramadasu commented on LENS-511:
----------------------------------------------
Thanks for the detail [~balaknathan].
I think it is clear that a fact on a storage maps to a single underlying table,
not to multiple tables, whether that is an existing table or a new one.
Accepting existing tables and their partitions as-is has some issues right now;
there is some discussion on LENS-340.
Here is a workaround until that feature is supported.
Existing flow: define schema -> creates a schema object -> define a pipeline ->
populate underlying tables
New flow: define schema -> creates a schema object -> create cube, fact,
dimensions, dimtables
Associating existing data: create storage tables in facts/dimtables as
external tables -> re-register partitions on existing data; register new
partitions on external tables.
Since the same location can be associated with more than one external table,
both the existing table and the new <storagename_factName> table will be present
in the metastore. Lens will not use the existing table and its partitions, but
it will learn that the data is present through re-registration. Let us know if
this sounds good.
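To make the association step concrete, here is a rough Python sketch that only builds Hive DDL strings describing the intended metastore end state. It is an illustration, not how Lens performs the step: in Lens the storage table would come from the fact definition and the partitions would be registered through Lens itself. The table name local_fact2, the location and the dt partition column come from the example in the issue; the column list, the dt=<value> directory layout, the sample dates and both helper names are assumptions.

```python
def external_table_ddl(table, columns, part_col, location, delimiter=","):
    """Build a Hive CREATE EXTERNAL TABLE statement over an existing location."""
    cols = ", ".join(f"{name} {ctype}" for name, ctype in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} ({cols}) "
        f"PARTITIONED BY ({part_col} STRING) "
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}' "
        f"LOCATION '{location}'"
    )


def add_partition_ddl(table, part_col, value, location):
    """Build an ALTER TABLE statement that re-registers one existing partition."""
    # Assumption: partition data lives under <location>/<part_col>=<value>.
    part_path = f"{location}/{part_col}={value}"
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION ({part_col}='{value}') LOCATION '{part_path}'"
    )


location = "/tmp/examples/fact2_local"                  # from the issue example
storage_table = "local_fact2"                           # <storagename>_<factName>
columns = [("dim1", "STRING"), ("measure1", "DOUBLE")]  # hypothetical columns

statements = [external_table_ddl(storage_table, columns, "dt", location)]
statements += [
    add_partition_ddl(storage_table, "dt", day, location)
    for day in ("2015-04-19", "2015-04-20")             # hypothetical partitions
]

for statement in statements:
    print(statement + ";")
```

Because the new table is external and points at the same location, the existing table and its data are left untouched; only metastore entries are added.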
bq. location doesn't make any sense for the JDBC tables, I propose to
introduce a new parameter table_name.
Underlying JDBC tables are linked to tables in the catalog through the
lens.metastore.native.db.name and lens.metastore.native.table.name properties.
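If those two properties are set on each storage table, the lookup asked for in the issue (which facts sit on a given underlying table) reduces to an inverted index over them. A minimal Python sketch, assuming the properties have already been read into plain dicts; the record layout, the function name and all sample fact, storage, database and table names are made up for illustration and are not a Lens API.

```python
from collections import defaultdict

DB_KEY = "lens.metastore.native.db.name"
TABLE_KEY = "lens.metastore.native.table.name"


def index_facts_by_native_table(storage_tables):
    """Map (storage, native db, native table) to the set of facts built on it."""
    index = defaultdict(set)
    for entry in storage_tables:
        props = entry["properties"]
        # Storage tables without both properties are not linked to a native table.
        if DB_KEY in props and TABLE_KEY in props:
            key = (entry["storage"], props[DB_KEY], props[TABLE_KEY])
            index[key].add(entry["fact"])
    return index


# Hypothetical storage table records, one per (fact, storage) pair.
storage_tables = [
    {"fact": "fact1", "storage": "mydb",
     "properties": {DB_KEY: "sales", TABLE_KEY: "orders"}},
    {"fact": "fact2", "storage": "mydb",
     "properties": {DB_KEY: "sales", TABLE_KEY: "orders"}},
    {"fact": "fact3", "storage": "mydb",
     "properties": {DB_KEY: "sales", TABLE_KEY: "returns"}},
    {"fact": "fact2", "storage": "local", "properties": {}},
]

index = index_facts_by_native_table(storage_tables)
print(sorted(index[("mydb", "sales", "orders")]))
```

When a partition lands in a native table, the facts to update are the ones stored under that table's key.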
> identify facts associated with underlying hive/jdbc tables
> ----------------------------------------------------------
>
> Key: LENS-511
> URL: https://issues.apache.org/jira/browse/LENS-511
> Project: Apache Lens
> Issue Type: Improvement
> Components: api, cube
> Affects Versions: 2.0
> Reporter: Pranav Kumar Agarwal
>
> Consider the following storage table definition inside a fact:
> <x_fact_table cube_name="sample_cube" name="fact2" weight="200.0"
>     xmlns="uri:lens:cube:0.1"
>     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>     xsi:schemaLocation="uri:lens:cube:0.1 cube-0.1.xsd ">
>   ...
>   <storage_tables>
>     <storage_table>
>       <update_periods>
>         <update_period>HOURLY</update_period>
>         <update_period>DAILY</update_period>
>       </update_periods>
>       <storage_name>local</storage_name>
>       <table_desc external="true" field_delimiter=","
>           table_location="/tmp/examples/fact2_local">
>         <part_cols>
>           <column comment="Time column" name="dt" type="STRING"/>
>         </part_cols>
>         <time_part_cols>dt</time_part_cols>
>       </table_desc>
>     </storage_table>
>   </storage_tables>
> </x_fact_table>
> In the event a new partition is added to the external table location
> "/tmp/examples/fact2_local", I wish to add a new partition on fact2.
> However, I have no way to find all the facts built on the external
> table location "/tmp/examples/fact2_local". We could possibly do it by
> matching the location, but that doesn't seem quite right. For a JDBC
> source, the table_location is effectively a dummy, as it is not used to
> query content from that location. JDBCDriver expects a table named
> storageName_factName in the target datastore, so there is no indication
> of which facts need to be updated.
> Problem Statement: The current storage_table definition doesn't give me
> enough detail to find all the places where partitions need to be added
> when a new partition is added to an external table.
> I propose that we add a table_name property as part of table_desc and
> provide the following APIs:
> GET /storages/{storageName}/tableNames/
> GET /storages/{storageName}/tableNames/{tableName}/facts