Thanks a lot, Xiaoxiang. Really appreciate your support in bringing clarity to us on this.

Thanks,
Ketan

Sent from my Samsung Galaxy smartphone.
-------- Original message --------
From: Xiaoxiang Yu <xiaoxiang...@kyligence.io>
Date: 11/02/2019 9:16 am (GMT+05:30)
To: dev@kylin.apache.org
Cc: Xiaoxiang Yu <xiaoxiang...@kyligence.io>
Subject: Re: Hdfs Working directory usage

Hi Ketan,

This is what I found:
- <hdfs-working-dir>/<metadata-name>/<job-id>/<cube-name>/cuboid
  - This dir contains the cuboid data; each row holds a dimension array and a MeasureAggregator array.
  - Its size depends on the cardinality of each column and is often very large.
  - When a merge job completes, the cuboid files of all successfully merged segments are deleted automatically.
- <hdfs-working-dir>/<metadata-name>/<job-id>/<cube-name>/fact_distinct_columns
  - This dir contains the distinct values of each column.
  - It should be deleted after the current segment build job succeeds.
- <hdfs-working-dir>/<metadata-name>/<job-id>/<cube-name>/hfile
  - This dir contains the data files that are bulk-loaded into HBase.
  - It should be deleted after the current segment build job succeeds.
- <hdfs-working-dir>/<metadata-name>/<job-id>/<cube-name>/rowkey_stats
  - Files under this dir are usually very small; you may not need to delete them yourself.
  - These files are used to partition the HFiles.

I think you should update your auto-merge settings so that merges happen more often. If you find any mistakes, please let me know. Thank you!

----------------
Best wishes,
Xiaoxiang Yu

On [DATE], "[NAME]" <[ADDRESS]> wrote:

Hi team,

Any updates on the same?
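As an illustration, the sizes of the per-job directories described above can be checked from the command line with the standard `hadoop fs -du` shell command. This is only a sketch: `WORKDIR`, `METADATA`, `JOB_ID`, and `CUBE` are placeholder values that must be replaced with your actual working directory, metadata name, job id, and cube name.

```shell
# Sketch: summarize the size of each per-job directory under the
# Kylin HDFS working directory. All four variables are placeholders.
WORKDIR=/kylin            # value of kylin.env.hdfs-working-dir
METADATA=kylin_metadata   # metadata name
JOB_ID=<job-id>           # id of a finished build job
CUBE=<cube-name>          # cube name

BASE="${WORKDIR}/${METADATA}/${JOB_ID}/${CUBE}"

# -du -s -h prints one summarized, human-readable size per path
hadoop fs -du -s -h \
  "${BASE}/cuboid" \
  "${BASE}/fact_distinct_columns" \
  "${BASE}/hfile" \
  "${BASE}/rowkey_stats"
```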
Thanks,
Ketan

> On 01-Feb-2019, at 11:39 AM, ketan dikshit <kdcool6...@yahoo.com> wrote:
>
> Hi Team,
>
> We have a lot of data accumulated in our hdfs-working-directory, so we want to understand the usage of the following job data once a job has completed and its segment has been created successfully:
>
> <hdfs-working-dir>/<metadata-name>/<job-id>/<cube-name>/cuboid
> <hdfs-working-dir>/<metadata-name>/<job-id>/<cube-name>/fact_distinct_columns
> <hdfs-working-dir>/<metadata-name>/<job-id>/<cube-name>/hfile
> <hdfs-working-dir>/<metadata-name>/<job-id>/<cube-name>/rowkey_stats
>
> Basically, I need to understand the purpose of cuboid, fact_distinct_columns, hfile, and rowkey_stats after the job has built the cube segment (assuming we don't use any merging/auto-merging of segments on the cube later).
>
> The space taken up by this data in the hdfs-working-dir is quite large (affecting our costs), and it is not getting cleaned by the cleanup job (org.apache.kylin.tool.StorageCleanupJob). So we need to understand whether we can clean this up manually without running into any issues later.
>
> Thanks,
> Ketan@Exponential
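For reference, the cleanup tool mentioned in the quoted question is normally run through `kylin.sh`. The invocation below is a sketch based on the documented usage for recent Kylin releases; the exact form may differ by version, so check the cleanup-storage docs for your release before running it.

```shell
# Sketch: run Kylin's storage cleanup tool (version-dependent).
# Dry run first: list candidate garbage without deleting anything.
${KYLIN_HOME}/bin/kylin.sh org.apache.kylin.tool.StorageCleanupJob --delete false

# Then delete for real once the candidate list looks correct.
${KYLIN_HOME}/bin/kylin.sh org.apache.kylin.tool.StorageCleanupJob --delete true
```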