Hi, Ketan. This is what I find:

- <hdfs-working-dir>/<metadata-name>/<job-id>/<cube-name>/cuboid
  - This dir contains the cuboid data; each row holds a dimension array and a MeasureAggregator array.
  - Its size depends on the cardinality of each column and is often very large.
  - When a merge job completes, the cuboid files of all segments that were merged successfully are deleted automatically.
- <hdfs-working-dir>/<metadata-name>/<job-id>/<cube-name>/fact_distinct_columns
  - This dir contains the distinct values of each column.
  - It should be deleted after the current segment build job succeeds.
- <hdfs-working-dir>/<metadata-name>/<job-id>/<cube-name>/hfile
  - This dir contains the data files that are bulk loaded into HBase.
  - It should be deleted after the current segment build job succeeds.
- <hdfs-working-dir>/<metadata-name>/<job-id>/<cube-name>/rowkey_stats
  - Files under this dir are usually very small, so you may not need to delete them yourself.
  - These files are used to partition the hfile.

I think you should update your auto-merge settings so that auto-merge runs more often. If you find any mistakes, please let me know. Thank you!

----------------
Best wishes,
Xiaoxiang Yu

On [DATE], "[NAME]" <[ADDRESS]> wrote:

Hi team,

Any updates on the same?

Thanks,
Ketan

> On 01-Feb-2019, at 11:39 AM, ketan dikshit <kdcool6...@yahoo.com> wrote:
>
> Hi Team,
>
> We have a lot of data accumulated in our hdfs-working-directory, so we want to understand the usage of the following job data once the job has completed and the segment has been created successfully.
>
> <hdfs-working-dir>/<metadata-name>/<job-id>/<cube-name>/cuboid
> <hdfs-working-dir>/<metadata-name>/<job-id>/<cube-name>/fact_distinct_columns
> <hdfs-working-dir>/<metadata-name>/<job-id>/<cube-name>/hfile
> <hdfs-working-dir>/<metadata-name>/<job-id>/<cube-name>/rowkey_stats
>
> Basically I need to understand the purpose of cuboid, fact_distinct_columns, hfile, and rowkey_stats after the job has built the cube segment (assuming we don’t use any merging/auto-merging of segments on the cube later).
>
> The space taken up by this data in the hdfs-working-dir is quite large (affecting our costs) and is not getting cleaned up by the cleanup job (org.apache.kylin.tool.StorageCleanupJob). So we need to understand whether we can clean this up manually without running into issues later.
>
> Thanks,
> Ketan@Exponential
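
For reference, below is a minimal sketch of the manual cleanup discussed in the reply above, using the Hadoop FileSystem API. The class name, the argument handling, and the choice to remove only fact_distinct_columns and hfile are assumptions based on that reply; this is not a Kylin-provided tool, and you should confirm the build job has succeeded (and that no pending merge still needs the data) before deleting anything.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/**
 * Hypothetical helper (not part of Kylin): removes the intermediate
 * directories that, per the reply above, are safe to drop once a segment
 * build job has succeeded. The cuboid dir is kept because merge jobs still
 * read it, and rowkey_stats is kept because it is tiny.
 */
public class IntermediateDirCleanup {

    public static void main(String[] args) throws Exception {
        // Expected argument: the per-job cube directory, e.g.
        // <hdfs-working-dir>/<metadata-name>/<job-id>/<cube-name>
        String jobCubeDir = args[0];

        Configuration conf = new Configuration();
        FileSystem fs = new Path(jobCubeDir).getFileSystem(conf);

        // Only these two sub-directories are removed here.
        String[] deletable = { "fact_distinct_columns", "hfile" };

        for (String sub : deletable) {
            Path p = new Path(jobCubeDir, sub);
            if (fs.exists(p)) {
                fs.delete(p, true); // recursive delete
                System.out.println("Deleted " + p);
            } else {
                System.out.println("Skipped (not found) " + p);
            }
        }
    }
}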
- <hdfs-working-dir>/<metdata-name>/<job-id>/<cube-name>/cuboid - This dir contains the cuboid data with each row contains dimensions array and MeasureAggregator array. - The size is depend on the cardinality of each columns and it is often very large. - When merge job completed, cuboid file of all segments which be merged successfully will be deleted automatically. - <hdfs-working-dir>/<metdata-name>/<job-id>/<cube-name>/fact_distinct_columns - This dir contains the distinct value of each column. - It should be deleted after current segment build job succeed. - <hdfs-working-dir>/<metdata-name>/<job-id>/<cube-name>/hfile - This dir contains data file which be bulk loaded into hbase. - It should be deleted after current segment build job succeed. - <hdfs-working-dir>/<metdata-name>/<job-id>/<cube-name>/rowkey_stats - Files under this dir are often very small, you may not need deleted them yourself. - These files are used to partition hfile. I think you should update your auto-merge settings to let auto-merge more often, if you find any mistakes, please let me know, thank you! ---------------- Best wishes, Xiaoxiang Yu On [DATE], "[NAME]" <[ADDRESS]> wrote: Hi team, Any updates on the same ? Thanks, Ketan > On 01-Feb-2019, at 11:39 AM, ketan dikshit <kdcool6...@yahoo.com> wrote: > > Hi Team, > > We have a lot of data accumulated in our hdfs-working-directory, so we want to understand the usage of the following job data, once the job has been completed and segment is successfully created. > > <hdfs-working-dir>/<metdata-name>/<job-id>/<cube-name>/cuboid > <hdfs-working-dir>/<metdata-name>/<job-id>/<cube-name>/fact_distinct_columns > <hdfs-working-dir>/<metdata-name>/<job-id>/<cube-name>/hfile > <hdfs-working-dir>/<metdata-name>/<job-id>/<cube-name>/rowkey_stats > > Basically I need to understand the purpose of: cuboid,fact_distinct_columns,hfile,rowkey_stats after the job has built the cube segment (assuming we don’t use and merging/automerging of segments on the cube later). > > The space taken up by these data in hdfs-working-dir is quite huge(affecting our costing), and is not getting cleaned by by cleanup job(org.apache.kylin.tool.StorageCleanupJob). So we need to be understand, that if we manually clean this up we will not get any issues later. > > Thanks, > Ketan@Exponential