[ https://issues.apache.org/jira/browse/KYLIN-3462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shaofeng SHI updated KYLIN-3462:
--------------------------------
    Fix Version/s: v2.5.0

> "dfs.replication=2" and compression not work in Spark cube engine
> -----------------------------------------------------------------
>
>                 Key: KYLIN-3462
>                 URL: https://issues.apache.org/jira/browse/KYLIN-3462
>             Project: Kylin
>          Issue Type: Bug
>          Components: Spark Engine
>    Affects Versions: v2.3.0, v2.3.1, v2.4.0
>            Reporter: Shaofeng SHI
>            Assignee: Shaofeng SHI
>            Priority: Major
>             Fix For: v2.5.0
>
>         Attachments: cuboid_generated_by_mr.png, cuboid_generated_by_spark.png
>
>
> In a comparison between Spark and MR cubing, I noticed that the cuboid files
> generated by the Spark engine are 3x larger than those generated by MR, and
> take about 4x more disk space on HDFS than MR's.
>
> The reason is that "dfs.replication=2" did not take effect when Spark saved
> to HDFS, and Spark does not compress its output by default.
>
> The converted HFiles are the same size and the query results are identical,
> so this difference can easily be overlooked.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
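The disk-usage gap described in this issue can be sanity-checked with a back-of-the-envelope model in which raw HDFS usage equals logical file size times the replication factor. This is a minimal sketch under stated assumptions that are not in the issue: the cluster default `dfs.replication` is 3 (the stock HDFS default), MR output honours Kylin's `dfs.replication=2`, and the Spark files are exactly 3x the MR size. The helper name `hdfs_footprint` is hypothetical.

```python
# Rough HDFS footprint model: raw bytes on disk = logical file size x replication.
# Assumptions (not from the issue): the cluster default dfs.replication is 3,
# MR output honours dfs.replication=2, and Spark output is exactly 3x larger.


def hdfs_footprint(logical_size: float, replication: int) -> float:
    """Hypothetical helper: raw HDFS usage for a file stored `replication` times."""
    return logical_size * replication


mr_size = 1.0               # normalise the MR cuboid output to 1 unit
spark_size = 3.0 * mr_size  # Spark cuboid files reported ~3x larger (uncompressed)

mr_disk = hdfs_footprint(mr_size, replication=2)        # dfs.replication=2 honoured
spark_disk = hdfs_footprint(spark_size, replication=3)  # assumed fallback to default 3

ratio = spark_disk / mr_disk
print(f"Spark/MR disk ratio: {ratio:.1f}x")
```

Under these assumptions the model gives 3 x 3 / 2 = 4.5x, in the same range as the roughly 4x reported; real figures depend on the actual file sizes and the cluster's configured default replication.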