Hi Team,

We have a lot of data accumulated in our hdfs-working-directory, so we want to 
understand what the following job data is still used for once the job has 
completed and the segment has been created successfully. 

<hdfs-working-dir>/<metadata-name>/<job-id>/<cube-name>/cuboid
<hdfs-working-dir>/<metadata-name>/<job-id>/<cube-name>/fact_distinct_columns
<hdfs-working-dir>/<metadata-name>/<job-id>/<cube-name>/hfile
<hdfs-working-dir>/<metadata-name>/<job-id>/<cube-name>/rowkey_stats
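
For context, we are checking the footprint of these directories with something 
like the following (the placeholders stand in for our actual working dir, 
metadata name, job id and cube name):

hdfs dfs -du -h <hdfs-working-dir>/<metadata-name>/<job-id>/<cube-name>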

Basically I need to understand the purpose of cuboid, fact_distinct_columns, 
hfile and rowkey_stats after the job has built the cube segment (assuming we 
don't do any merging/auto-merging of segments on the cube later).

The space taken up by this data in the hdfs-working-dir is quite large (it 
affects our storage costs), and it is not being cleaned up by the cleanup 
job (org.apache.kylin.tool.StorageCleanupJob). So we need to understand 
whether we can manually remove it without running into issues later.
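
For reference, we run the cleanup tool roughly as described in the Kylin docs, 
and if manual deletion turns out to be safe we would plan to remove the 
per-job directories with plain HDFS commands, e.g.:

# storage cleanup tool, as per the Kylin docs
${KYLIN_HOME}/bin/kylin.sh org.apache.kylin.tool.StorageCleanupJob --delete true

# proposed manual cleanup of a completed job's directory (only if confirmed safe)
hdfs dfs -rm -r <hdfs-working-dir>/<metadata-name>/<job-id>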

Thanks,
Ketan@Exponential
