[jira] [Commented] (KYLIN-4341) by-level cuboid intermediate files are left behind and not cleaned up after job is complete

2020-03-02 Thread Vsevolod Ostapenko (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-4341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17049369#comment-17049369
 ] 

Vsevolod Ostapenko commented on KYLIN-4341:
---

@[~wangrupeng] 

I have to respectfully disagree with this naive explanation that this is not a 
bug.
I did not request segment merging in the cube configuration, so the files 
are expected to be removed.

Removing files manually is not a constructive proposition. Any manual 
management of intermediate files is an operational burden. Plus, it's not 
properly documented.

> by-level cuboid intermediate files are left behind and not cleaned up after 
> job is complete
> ---
>
> Key: KYLIN-4341
> URL: https://issues.apache.org/jira/browse/KYLIN-4341
> Project: Kylin
> Issue Type: Bug
> Components: Job Engine
> Affects Versions: v2.6.4
> Environment: Kylin 2.6.4, CentOS 7.6, HDP 2.6.5
> Reporter: Vsevolod Ostapenko
> Assignee: wangrupeng
> Priority: Major
>
> Setup: MR as a cube build engine and by-level cube build strategy (auto 
> picked).
> Upon completion of a cube segment build job a number of intermediate files 
> are still left behind.
> Namely, output of the MR-jobs that produce the base cuboid, subsequent level 
> cuboids, as well as rowkey_stats from the hfile creation step.
> The files in question consume about the same amount of space in HDFS as the 
> final hfile.
> This leads to wasted space in HDFS that is not released for as long as the 
> corresponding cube segment is online. The only point at which the leaked 
> space is released is when the segment is taken offline and cleaned up as part 
> of segment retention.
> Sample output is as follows.
> {quote}$ hadoop fs -ls -R /user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/
> drwxr-xr-x - kylin hdfs 0 2020-01-07 04:44 /user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89
> drwxr-xr-x - kylin hdfs 0 2020-01-07 04:26 /user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/cuboid
> drwxr-xr-x - kylin hdfs 0 2020-01-07 04:36 /user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/cuboid/level_1_cuboid
> -rw-r--r-- 2 kylin hdfs 0 2020-01-07 04:36 /user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/cuboid/level_1_cuboid/_SUCCESS
> -rw-r--r-- 2 kylin hdfs 51570048 2020-01-07 04:35 /user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/cuboid/level_1_cuboid/part-r-0
> -rw-r--r-- 2 kylin hdfs 51477377 2020-01-07 04:36 /user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/cuboid/level_1_cuboid/part-r-1
> -rw-r--r-- 2 kylin hdfs 51615162 2020-01-07 04:35 /user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/cuboid/level_1_cuboid/part-r-2
> -rw-r--r-- 2 kylin hdfs 51591031 2020-01-07 04:36 /user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/cuboid/level_1_cuboid/part-r-3
> -rw-r--r-- 2 kylin hdfs 51648914 2020-01-07 04:35 /user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/cuboid/level_1_cuboid/part-r-4
> -rw-r--r-- 2 kylin hdfs 51532761 2020-01-07 04:36 /user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/cuboid/level_1_cuboid/part-r-5
> -rw-r--r-- 2 kylin hdfs 51455652 2020-01-07 04:35 /user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/cuboid/level_1_cuboid/part-r-6
> -rw-r--r-- 2 kylin hdfs 51552752 2020-01-07 04:36 /user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/cuboid/level_1_cuboid/part-r-7
> drwxr-xr-x - kylin hdfs 0 2020-01-07 04:25 /user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/cuboid/level_base_cuboid
> -rw-r--r-- 2 kylin hdfs 0 2020-01-07 04:25 /user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/cuboid/level_base_cuboid/_SUCCESS
> -rw-r--r-- 2 kylin hdfs 16293012 2020-01-07 04:25 /user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/cuboid/level_base_cuboid/part-r-0
> -rw-r--r-- 2 kylin hdfs 16283730 2020-01-07 04:25 /user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/cuboid/level_base_cuboid/part-r-1
> -rw-r--r-- 2 kylin hdfs 16288965 2020-01-07 04:25 
> 
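
To put a number on the space held by the leftover files, the listing in the quoted description could be totalled with a short script. The following is only a sketch, not part of Kylin: the function name intermediate_bytes and the "/cuboid/" path marker are assumptions made for illustration, and the sample reuses three lines from the listing (the last one is cut off in the original message and is skipped).

```python
def intermediate_bytes(listing, marker="/cuboid/"):
    """Sum the sizes of regular files whose path contains `marker`.

    `listing` is text in the `hadoop fs -ls -R` layout:
    permissions, replication, owner, group, size, date, time, path.
    """
    total = 0
    for line in listing.splitlines():
        fields = line.split(None, 7)
        # Skip directories ("d..."), truncated lines, and blank lines.
        if len(fields) < 8 or not fields[0].startswith("-"):
            continue
        if marker in fields[7]:
            total += int(fields[4])
    return total


# Hypothetical input: a few lines copied from the listing in the issue.
base = ("/user/kylin/26x/test_kylin_26x_metadata/"
        "kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89")
sample = "\n".join([
    "drwxr-xr-x - kylin hdfs 0 2020-01-07 04:25 " + base + "/cuboid/level_base_cuboid",
    "-rw-r--r-- 2 kylin hdfs 16293012 2020-01-07 04:25 " + base + "/cuboid/level_base_cuboid/part-r-0",
    "-rw-r--r-- 2 kylin hdfs 16283730 2020-01-07 04:25 " + base + "/cuboid/level_base_cuboid/part-r-1",
    "-rw-r--r-- 2 kylin hdfs 16288965 2020-01-07 04:25",  # truncated in the source, ignored
])

print(intermediate_bytes(sample))
```

The total is the logical file size. The second column of the listing is the replication factor (2 here), so the raw disk usage would be correspondingly higher.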

[jira] [Commented] (KYLIN-4341) by-level cuboid intermediate files are left behind and not cleaned up after job is complete

2020-02-29 Thread wangrupeng (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-4341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17048440#comment-17048440
 ] 

wangrupeng commented on KYLIN-4341:
---

Hi, 

Kylin does not delete the cuboid files on HDFS when a segment build job 
finishes, because if the segment later needs to be merged, Kylin can reuse 
them instead of calculating them again, which saves computation resources. If 
you are sure the segment won't be merged, you can delete the cuboid files on 
HDFS yourself. :)
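
For readers who do go the manual route, the command for one job could be assembled as sketched below. This is an illustration under assumptions, not a Kylin tool: the function name cuboid_cleanup_command is hypothetical, and the <job working dir>/<cube name>/cuboid layout is inferred from the sample listing in the issue. The sketch only builds and prints the command; it does not run it.

```python
import posixpath
import shlex
from typing import List


def cuboid_cleanup_command(job_dir: str, cube_name: str) -> List[str]:
    """Build a `hadoop fs -rm -r` command for one job's cuboid directory.

    Assumes the layout <job_dir>/<cube_name>/cuboid seen in the sample
    listing. Use only for segments that will never be merged.
    """
    cuboid_dir = posixpath.join(job_dir.rstrip("/"), cube_name, "cuboid")
    return ["hadoop", "fs", "-rm", "-r", cuboid_dir]


# Values taken from the listing in the issue description.
cmd = cuboid_cleanup_command(
    "/user/kylin/26x/test_kylin_26x_metadata/"
    "kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/",
    "vl_aggs_cube_89",
)

# Print the command for review instead of executing it.
print(" ".join(shlex.quote(arg) for arg in cmd))
```

If HDFS trash is enabled, `-rm -r` moves the directory to the trash, so the space is freed only once the trash is emptied. The rowkey_stats output mentioned in the issue is not covered by this sketch.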

 
