[ https://issues.apache.org/jira/browse/KYLIN-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nichunen updated KYLIN-4322:
----------------------------
    Fix Version/s:     (was: v3.1.0)
                   v3.1.1

> Cost–benefit of compression HBase result
> ----------------------------------------
>
>                 Key: KYLIN-4322
>                 URL: https://issues.apache.org/jira/browse/KYLIN-4322
>             Project: Kylin
>          Issue Type: Bug
>            Reporter: ZhouKang
>            Assignee: ZhouKang
>            Priority: Major
>             Fix For: v3.1.1
>
>
> kylin.storage.hbase.endpoint-compress-result is TRUE by default.
> In our production environment, when an HBase scan result is larger than
> 200 MB, compressing it takes more than 10 s.
> The following statistics come from the HBase logs:
> ||Original size||Avg rate||Min rate||Avg time||Max time||
> |< 1 MB|0.12|0.25|0.18 ms|0.7 s|
> |1 MB ~ 10 MB|0.39|0.97|0.2 s|0.6 s|
> |10 MB ~ 100 MB|0.47|0.81|2 s|6.3 s|
> |> 100 MB|0.95|0.96|15.7 s|24.8 s|
> Notes:
>  # rate: compressed data size / original data size
>  # When the source data is smaller than 1 MB, the compressed data may be
> larger than the source data, so row 1 of the table only includes cases where
> the compressed data is smaller than the source data.
>  # In our environment, for 65% of results smaller than 1 MB, the compressed
> data is larger than the source data.
>
> When the source data is smaller than 10 MB, the data-transmission latency is
> acceptable. When the data is larger than 100 MB, compression takes a long
> time.
>  
> So I think kylin.storage.hbase.endpoint-compress-result should be FALSE by
> default.
>  
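The cost–benefit question can be framed as a break-even check: compression only pays off if the transfer time it saves, original size * (1 - rate) / bandwidth, exceeds the time spent compressing. This is a minimal sketch under stated assumptions: the network bandwidth is a hypothetical fixed value (125 MB/s, roughly 1 Gbit/s, is not from the report), decompression cost on the Kylin side is ignored, and the class and method names are hypothetical. The inputs are the averages from the "> 100 MB" row of the table.

```java
// Hypothetical break-even check; assumes a fixed bandwidth and ignores decompression cost.
public class CompressionCostBenefit {

    /** Seconds of transfer time saved by compressing; rate = compressed size / original size. */
    static double transferSecondsSaved(long originalBytes, double rate, double bandwidthBytesPerSec) {
        return originalBytes * (1.0 - rate) / bandwidthBytesPerSec;
    }

    /** Compression pays off only if the transfer time it saves exceeds the time spent compressing. */
    static boolean compressionPaysOff(long originalBytes, double rate,
                                      double compressSeconds, double bandwidthBytesPerSec) {
        return transferSecondsSaved(originalBytes, rate, bandwidthBytesPerSec) > compressSeconds;
    }

    public static void main(String[] args) {
        double bandwidth = 125_000_000.0; // hypothetical 1 Gbit/s link, in bytes per second
        long size = 200_000_000L;         // a 200 MB scan result
        double rate = 0.95;               // avg rate for the "> 100 MB" row
        double compressSeconds = 15.7;    // avg time for the "> 100 MB" row

        // prints "saved 0.080 s, cost 15.7 s, pays off: false"
        System.out.printf("saved %.3f s, cost %.1f s, pays off: %b%n",
                transferSecondsSaved(size, rate, bandwidth), compressSeconds,
                compressionPaysOff(size, rate, compressSeconds, bandwidth));
    }
}
```

Under these assumptions, a 200 MB result saves about 0.08 s of transfer at the cost of 15.7 s of compression. On a slower link the saved transfer time grows proportionally, so the break-even point depends on the deployment's network.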



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
