[ https://issues.apache.org/jira/browse/KYLIN-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
nichunen updated KYLIN-4322:
----------------------------
    Fix Version/s:     (was: v3.1.0)
                   v3.1.1

> Cost–benefit of compression HBase result
> ----------------------------------------
>
>                 Key: KYLIN-4322
>                 URL: https://issues.apache.org/jira/browse/KYLIN-4322
>             Project: Kylin
>          Issue Type: Bug
>            Reporter: ZhouKang
>            Assignee: ZhouKang
>            Priority: Major
>             Fix For: v3.1.1
>
>
> kylin.storage.hbase.endpoint-compress-result is TRUE by default.
> In our production environment, when the HBase scan result is larger than
> 200M, it takes more than 10s to compress the data.
> We can see this in HBase's log:
> ||Size||avg rate||min rate||avg time||max time||
> |<1M|0.12|0.25|0.18ms|0.7s|
> |1M ~ 10M|0.39|0.97|0.2s|0.6s|
> |10M ~ 100M|0.47|0.81|2s|6.3s|
> |>100M|0.95|0.96|15.7s|24.8s|
> Notice:
> # rate: compressed data size / original data size
> # when the source data size is < 1M, the compressed data may be larger than
> the source data, so row 1 of the table only counts the cases where the
> compressed data is smaller than the source data
> # In our environment, 65% of the compressed results (<1M) are larger than
> the source data
> When the source data is less than 10M, the latency of data transmission is
> acceptable. When the data is larger than 100M, it takes a long time to
> compress the data.
>
> So, I think kylin.storage.hbase.endpoint-compress-result should be FALSE by
> default.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
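
The cost-benefit argument in the report can be checked with a rough back-of-the-envelope calculation. The sketch below is only an illustration, not Kylin code: the function names are hypothetical, the 125 MB/s link speed (about 1 Gbit/s) is an assumed figure that does not come from the report, and decompression time on the receiving side is ignored. It compares the transfer time saved by compression with the time spent compressing, using the 200M case and the ">100M" row of the table (rate 0.95, avg time 15.7s).

```python
def transfer_seconds_saved(size_mb, rate, bandwidth_mb_per_s):
    """Seconds of network transfer avoided by sending the compressed result.

    rate is compressed size / original size, as defined in the report.
    """
    return size_mb * (1.0 - rate) / bandwidth_mb_per_s


def compression_pays_off(size_mb, rate, compress_seconds, bandwidth_mb_per_s):
    """True if the transfer time saved exceeds the time spent compressing."""
    saved = transfer_seconds_saved(size_mb, rate, bandwidth_mb_per_s)
    return saved > compress_seconds


def break_even_bandwidth(size_mb, rate, compress_seconds):
    """Link speed (MB/s) below which compression starts to pay off."""
    return size_mb * (1.0 - rate) / compress_seconds


if __name__ == "__main__":
    # 200 MB result, ">100M" row of the table: rate 0.95, avg time 15.7 s.
    # ASSUMPTION: 125 MB/s link (about 1 Gbit/s); not stated in the report.
    size_mb, rate, compress_s, link = 200, 0.95, 15.7, 125
    print("seconds saved on transfer:", transfer_seconds_saved(size_mb, rate, link))
    print("compression pays off:", compression_pays_off(size_mb, rate, compress_s, link))
    print("break-even MB/s:", break_even_bandwidth(size_mb, rate, compress_s))
```

Under these assumptions, compression saves well under a second of transfer time while costing about 15.7s, so it would only help on a link slower than roughly 0.64 MB/s. With a different link speed or a result that compresses better, the balance can change, which is why the numbers above should be replaced with measurements from the target environment.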