[ 
https://issues.apache.org/jira/browse/KYLIN-1079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hongbin ma resolved KYLIN-1079.
-------------------------------
    Resolution: Fixed

> Manage large number of entries in metadata store
> ------------------------------------------------
>
>                 Key: KYLIN-1079
>                 URL: https://issues.apache.org/jira/browse/KYLIN-1079
>             Project: Kylin
>          Issue Type: Improvement
>    Affects Versions: v2.0, v1.1, v1.0
>            Reporter: hongbin ma
>            Assignee: hongbin ma
>              Labels: newbie
>             Fix For: v2.1
>
>
> Kylin saves cube metadata, table metadata, and job history/output in a 
> metadata store. The HBaseMetadataStore is a fault-tolerant implementation 
> that brings no extra dependencies to the system. We use it in real-world 
> deployments.
> When a cube or Hive table is updated, the corresponding entries in the 
> metadata store are simply updated in place (so there is no way to trace the 
> history of cube definitions, though this is not a much-expected function 
> anyway). However, job histories and outputs are a little special: each 
> cubing job's definition and output are saved as new entries in the metadata 
> store. As more and more jobs accumulate, a lot of job histories will reside 
> in the metadata store. This might harm frontend performance when a user 
> wants to query job histories.
> We should tackle the problem from two perspectives:
> 1. A backend tool to delete/archive job history based on given conditions, 
> e.g. "all jobs that are older than one month and not referenced by any cube 
> segment" (each cube segment keeps track of which job created it).
> 2. The frontend display enforces a timestamp filter when retrieving from the 
> metadata store, for efficiency. When showing job lists, for example, a 
> "Show last N days" filter is enforced, where N is configurable by the user. 
> For HBaseMetadataStore, we save the timestamp for each entry in a separate 
> column; this is where the HBase SingleColumnValueFilter can help.
> We can start working on this on the 2.x-staging branch (as it is the latest 
> dev branch and is more friendly to developers), and backport it to the 
> 1.x-staging branch if necessary.
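
The two perspectives in the description above can be sketched roughly as
follows. This is only an in-memory illustration, not Kylin's actual code:
the class JobHistorySketch, the JobEntry type and both method names are
hypothetical, and "one month" is left to the caller as a Duration. In an
HBase-backed store the timestamp comparison would presumably be pushed to
the server with SingleColumnValueFilter, as the description suggests; here
it is done client-side to keep the example self-contained.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical in-memory model of job history entries; not Kylin's real classes. */
final class JobHistorySketch {

    /** Minimal stand-in for one job entry in the metadata store. */
    static final class JobEntry {
        final String jobId;
        final Instant lastModified;

        JobEntry(String jobId, Instant lastModified) {
            this.jobId = jobId;
            this.lastModified = lastModified;
        }
    }

    /**
     * Perspective 1 (backend cleanup): ids of jobs that are older than maxAge
     * and are not referenced by any cube segment.
     */
    static List<String> findDeletable(List<JobEntry> jobs, Set<String> referencedJobIds,
                                      Instant now, Duration maxAge) {
        Instant cutoff = now.minus(maxAge);
        List<String> deletable = new ArrayList<>();
        for (JobEntry job : jobs) {
            boolean tooOld = job.lastModified.isBefore(cutoff);
            boolean referenced = referencedJobIds.contains(job.jobId);
            if (tooOld && !referenced) {
                deletable.add(job.jobId);
            }
        }
        return deletable;
    }

    /**
     * Perspective 2 (frontend listing): ids of jobs modified within the last
     * `days` days, i.e. the "Show last N days" filter.
     */
    static List<String> lastNDays(List<JobEntry> jobs, Instant now, int days) {
        Instant cutoff = now.minus(Duration.ofDays(days));
        List<String> recent = new ArrayList<>();
        for (JobEntry job : jobs) {
            // Keep entries whose timestamp is at or after the cutoff.
            if (!job.lastModified.isBefore(cutoff)) {
                recent.add(job.jobId);
            }
        }
        return recent;
    }
}
```

For example, calling findDeletable with Duration.ofDays(30) and the set of
job ids recorded on cube segments would yield the candidates for
deletion or archiving, while lastNDays(jobs, Instant.now(), 7) would back a
"Show last 7 days" job list.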



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
