[ 
https://issues.apache.org/jira/browse/KYLIN-1079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hongbin ma updated KYLIN-1079:
------------------------------
    Description: 
Kylin saves cube metadata, table metadata, and job history/output in a 
metadata store. The HBaseMetadataStore is a fault-tolerant implementation that 
adds no extra dependencies to the system. We use it in real-world deployments.

When a cube or Hive table is updated, the corresponding entries in the metadata 
store are simply updated in place. (So there is no way to trace the history of 
cube definitions, although this is not a commonly expected feature anyway.) 
Job histories and outputs, however, are a little special: each cubing job's 
definition and output are saved as new entries in the metadata store. As more 
and more jobs accumulate, a large number of job histories will reside in the 
metadata store. This might harm frontend performance when users query job 
histories.

We should tackle the problem from two perspectives:
1. A backend tool to delete/archive job history based on given conditions, e.g. 
"all jobs that are older than one month and not referenced by any cube 
segment" (each cube segment keeps track of which job created it).
2.

  was:
Each job's metadata and output are now independently stored in the metastore (a 
possibly wrong place). As more and more jobs accumulate, a large number of job 
histories will reside in the metastore. This might harm frontend performance 
when users query job histories.

Some kind of job history archiving/truncating should be applied.


> Manage large number of entries in metadata store
> ------------------------------------------------
>
>                 Key: KYLIN-1079
>                 URL: https://issues.apache.org/jira/browse/KYLIN-1079
>             Project: Kylin
>          Issue Type: Improvement
>            Reporter: hongbin ma
>            Assignee: hongbin ma
>
> Kylin saves cube metadata, table metadata, and job history/output in a 
> metadata store. The HBaseMetadataStore is a fault-tolerant implementation 
> that adds no extra dependencies to the system. We use it in real-world 
> deployments.
> When a cube or Hive table is updated, the corresponding entries in the 
> metadata store are simply updated in place. (So there is no way to trace the 
> history of cube definitions, although this is not a commonly expected feature 
> anyway.) Job histories and outputs, however, are a little special: each 
> cubing job's definition and output are saved as new entries in the metadata 
> store. As more and more jobs accumulate, a large number of job histories will 
> reside in the metadata store. This might harm frontend performance when users 
> query job histories.
> We should tackle the problem from two perspectives:
> 1. A backend tool to delete/archive job history based on given conditions, 
> e.g. "all jobs that are older than one month and not referenced by any cube 
> segment" (each cube segment keeps track of which job created it).
> 2.
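
For illustration only, the example condition in item 1 ("older than one month 
and not referenced by any cube segment") could be sketched roughly as below. 
This is not Kylin code: JobHistoryPruner, JobEntry and selectPrunable are 
hypothetical names, the set of referenced job ids is assumed to have been 
collected from the cube segments beforehand, and "one month" is approximated 
as 30 days. The sketch only selects candidates; the actual delete/archive call 
against the metadata store is left out.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not part of Kylin: picks job entries that are safe to
// delete/archive under the example condition from the issue description.
final class JobHistoryPruner {

    /** Hypothetical, simplified view of one job entry in the metadata store. */
    static final class JobEntry {
        final String jobId;
        final Instant finishedAt;

        JobEntry(String jobId, Instant finishedAt) {
            this.jobId = jobId;
            this.finishedAt = finishedAt;
        }
    }

    /**
     * Returns the ids of jobs that finished before (now - maxAge) and are not
     * referenced by any cube segment. Input order is preserved.
     */
    static List<String> selectPrunable(List<JobEntry> jobs,
                                       Set<String> referencedJobIds,
                                       Instant now,
                                       Duration maxAge) {
        Instant cutoff = now.minus(maxAge);
        List<String> prunable = new ArrayList<>();
        for (JobEntry job : jobs) {
            boolean oldEnough = job.finishedAt.isBefore(cutoff);
            boolean referenced = referencedJobIds.contains(job.jobId);
            if (oldEnough && !referenced) {
                prunable.add(job.jobId);
            }
        }
        return prunable;
    }

    public static void main(String[] args) {
        Instant now = Instant.now();
        List<JobEntry> jobs = Arrays.asList(
                new JobEntry("job-a", now.minus(Duration.ofDays(90))),
                new JobEntry("job-b", now.minus(Duration.ofDays(90))),
                new JobEntry("job-c", now.minus(Duration.ofDays(2))));
        // Assumption: job-b built a segment that still exists, so it is kept.
        Set<String> referenced = new HashSet<>(Arrays.asList("job-b"));
        // "One month" is approximated as 30 days here.
        System.out.println(
                selectPrunable(jobs, referenced, now, Duration.ofDays(30)));
    }
}
```

Keeping the selection step separate from the actual delete/archive step would 
also make it easy to offer a dry-run mode that only lists the candidates.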



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
