tanyao created KYLIN-5099:
-----------------------------

             Summary: parquet file size is too big in kylin4 with spark3 than kylin3 with mr
                 Key: KYLIN-5099
                 URL: https://issues.apache.org/jira/browse/KYLIN-5099
             Project: Kylin
          Issue Type: Bug
    Affects Versions: v4.0.0
            Reporter: tanyao
          Attachments: image-2021-10-15-10-43-54-830.png, image-2021-10-15-10-44-49-178.png

Hi,

I am trying to use Spark 3.1.1 as the build engine in Kylin 4.0. The Hive table
is stored as ORC and has more than 2 million rows (200W+), with 10 dimensions
defined on the cube. The original table size is about 50 MB.

When I build this cube with Kylin 4.0, the final Parquet files add up to more
than 1 GB in total; that is, a single segment is over 1 GB. However, when I
build the same Hive table data with the same cube model and dimensions in
Kylin 3 with MR, the HBase segment is only a little over 100 MB.
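To make the size comparison reproducible, the total Parquet size of one segment can be measured by summing every *.parquet file under the segment's directory. The sketch below assumes the segment files have been copied to a local directory first (e.g. with `hdfs dfs -get`); the directory layout and the helper name `total_parquet_size` are hypothetical. On HDFS itself, `hdfs dfs -du -s -h <segment path>` gives the same total directly.

```python
import os


def total_parquet_size(root: str) -> int:
    """Hypothetical helper: sum the sizes in bytes of all *.parquet files under root.

    Assumes `root` is a local copy of one segment's output directory.
    Non-Parquet files (e.g. _SUCCESS markers) are ignored.
    """
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".parquet"):
                total += os.path.getsize(os.path.join(dirpath, name))
    return total


if __name__ == "__main__":
    import sys

    # Usage: python parquet_size.py /local/copy/of/segment
    size = total_parquet_size(sys.argv[1] if len(sys.argv) > 1 else ".")
    print(f"{size} bytes ({size / 1024 ** 2:.1f} MiB)")
```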

Why does this happen?

!image-2021-10-15-10-43-54-830.png!

 

!image-2021-10-15-10-44-49-178.png!

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
