Thanks a lot, Xiaoxiang. Really appreciate your support in bringing clarity to us on this.

Thanks,
Ketan

Sent from my Samsung Galaxy smartphone.
-------- Original message --------
From: Xiaoxiang Yu <xiaoxiang...@kyligence.io>
Date: 11/02/2019 9:16 am (GMT+05:30)
To: dev@kylin.apache.org
Cc: Xiaoxiang Yu <xiaoxiang...@kyligence.io>
Subject: Re: Hdfs Working directory usage

Hi Ketan,

This is what I found:
- <hdfs-working-dir>/<metadata-name>/<job-id>/<cube-name>/cuboid
  - This dir contains the cuboid data; each row holds a dimension array and a MeasureAggregator array.
  - Its size depends on the cardinality of each column and is often very large.
  - When a merge job completes, the cuboid files of all successfully merged segments are deleted automatically.
- <hdfs-working-dir>/<metadata-name>/<job-id>/<cube-name>/fact_distinct_columns
  - This dir contains the distinct values of each column.
  - It should be deleted after the current segment build job succeeds.
- <hdfs-working-dir>/<metadata-name>/<job-id>/<cube-name>/hfile
  - This dir contains the data files that are bulk-loaded into HBase.
  - It should be deleted after the current segment build job succeeds.
- <hdfs-working-dir>/<metadata-name>/<job-id>/<cube-name>/rowkey_stats
  - Files under this dir are usually very small; you may not need to delete them yourself.
  - These files are used to partition the HFiles.

I think you should update your auto-merge settings so that merges happen more often. If you find any mistakes, please let me know. Thank you!

----------------
Best wishes,
Xiaoxiang Yu

On [DATE], "[NAME]" <[ADDRESS]> wrote:

Hi team,

Any updates on the same?
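As an illustration, the sizes of the per-job directories described above can be checked from the command line with the standard `hadoop fs -du` shell command. This is only a sketch: `WORKDIR`, `METADATA`, `JOB_ID`, and `CUBE` are placeholder values that must be replaced with your actual working directory, metadata name, job id, and cube name.

```shell
# Sketch: summarize the size of each per-job directory under the
# Kylin HDFS working directory. All four variables are placeholders.
WORKDIR=/kylin            # value of kylin.env.hdfs-working-dir
METADATA=kylin_metadata   # metadata name
JOB_ID=<job-id>           # id of a finished build job
CUBE=<cube-name>          # cube name

BASE="${WORKDIR}/${METADATA}/${JOB_ID}/${CUBE}"

# -du -s -h prints one summarized, human-readable size per path
hadoop fs -du -s -h \
  "${BASE}/cuboid" \
  "${BASE}/fact_distinct_columns" \
  "${BASE}/hfile" \
  "${BASE}/rowkey_stats"
```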
Thanks,
Ketan

> On 01-Feb-2019, at 11:39 AM, ketan dikshit <kdcool6...@yahoo.com> wrote:
>
> Hi Team,
>
> We have a lot of data accumulated in our hdfs-working-directory, so we want to understand the usage of the following job data once a job has completed and its segment has been created successfully:
>
> <hdfs-working-dir>/<metadata-name>/<job-id>/<cube-name>/cuboid
> <hdfs-working-dir>/<metadata-name>/<job-id>/<cube-name>/fact_distinct_columns
> <hdfs-working-dir>/<metadata-name>/<job-id>/<cube-name>/hfile
> <hdfs-working-dir>/<metadata-name>/<job-id>/<cube-name>/rowkey_stats
>
> Basically, I need to understand the purpose of cuboid, fact_distinct_columns, hfile, and rowkey_stats after the job has built the cube segment (assuming we don't use any merging/auto-merging of segments on the cube later).
>
> The space taken up by this data in the hdfs-working-dir is quite large (affecting our costs), and it is not getting cleaned by the cleanup job (org.apache.kylin.tool.StorageCleanupJob). So we need to understand whether we can clean this up manually without running into any issues later.
>
> Thanks,
> Ketan@Exponential
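For reference, the cleanup tool mentioned in the quoted question is normally run through `kylin.sh`. The invocation below is a sketch based on the documented usage for recent Kylin releases; the exact form may differ by version, so check the cleanup-storage docs for your release before running it.

```shell
# Sketch: run Kylin's storage cleanup tool (version-dependent).
# Dry run first: list candidate garbage without deleting anything.
${KYLIN_HOME}/bin/kylin.sh org.apache.kylin.tool.StorageCleanupJob --delete false

# Then delete for real once the candidate list looks correct.
${KYLIN_HOME}/bin/kylin.sh org.apache.kylin.tool.StorageCleanupJob --delete true
```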