[ https://issues.apache.org/jira/browse/KYLIN-3457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16548902#comment-16548902 ]

ASF GitHub Bot commented on KYLIN-3457:
---------------------------------------

shaofengshi commented on a change in pull request #166: KYLIN-3457 Distribute by multi column if not set distribute column
URL: https://github.com/apache/kylin/pull/166#discussion_r203622944
 
 

 ##########
 File path: core-job/src/main/java/org/apache/kylin/job/JoinedFlatTable.java
 ##########
 @@ -252,16 +260,30 @@ private static String getHiveDataType(String javaDataType) {
         return hiveDataType;
     }
 
-    public static String generateRedistributeFlatTableStatement(IJoinedFlatTableDesc flatDesc) {
+    public static String generateRedistributeFlatTableStatement(IJoinedFlatTableDesc flatDesc, CubeDesc cubeDesc) {
         final String tableName = flatDesc.getTableName();
         StringBuilder sql = new StringBuilder();
         sql.append("INSERT OVERWRITE TABLE " + tableName + " SELECT * FROM " + tableName);
 
-        TblColRef clusterCol = flatDesc.getClusterBy();
-        if (clusterCol != null) {
-            appendClusterStatement(sql, clusterCol);
+        if (flatDesc.getClusterBy() != null) {
+            appendClusterStatement(sql, flatDesc.getClusterBy());
+        } else if (flatDesc.getDistributedBy() != null) {
+            appendDistributeStatement(sql, Lists.newArrayList(flatDesc.getDistributedBy()));
         } else {
-            appendDistributeStatement(sql, flatDesc.getDistributedBy());
+            int redistColumnCount = KylinConfig.getInstanceFromEnv().getHiveRedistributeColumnCount();
 
 Review comment:
   Should use the cube-level config, so that this parameter can be overridden at the cube level.
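The reviewer's point (read the count from the cube's config rather than the global `KylinConfig.getInstanceFromEnv()`) can be illustrated with a minimal, self-contained sketch. It is not Kylin code: the `Config` class, the property key, the default of 3, the `RAND()` fallback and `buildRedistributeSql` are all hypothetical stand-ins. It assumes the candidate columns arrive in priority order; how the actual patch chooses them is not shown in this excerpt. It keeps the precedence from the diff (CLUSTER BY, then a single DISTRIBUTE BY column, then the first N columns) and lets a cube-level value override the global one.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RedistributeSketch {

    // Hypothetical property key and default; not necessarily what Kylin uses.
    static final String REDIST_COUNT_KEY = "kylin.source.hive.redistribute-column-count";
    static final int DEFAULT_REDIST_COUNT = 3;

    /** Hypothetical layered config: cube-level overrides fall back to global settings. */
    static class Config {
        private final Map<String, String> global;
        private final Map<String, String> cubeOverrides;

        Config(Map<String, String> global, Map<String, String> cubeOverrides) {
            this.global = global;
            this.cubeOverrides = cubeOverrides;
        }

        String get(String key) {
            // A cube-level value wins; otherwise use the global value (may be null).
            String v = cubeOverrides.get(key);
            return v != null ? v : global.get(key);
        }

        int getRedistributeColumnCount() {
            String v = get(REDIST_COUNT_KEY);
            return v == null ? DEFAULT_REDIST_COUNT : Integer.parseInt(v.trim());
        }
    }

    /** Builds the redistribute statement using the same precedence as the diff above. */
    static String buildRedistributeSql(String tableName, String clusterBy, String distributeBy,
                                       List<String> candidateCols, Config cubeConfig) {
        StringBuilder sql = new StringBuilder();
        sql.append("INSERT OVERWRITE TABLE ").append(tableName)
           .append(" SELECT * FROM ").append(tableName);
        if (clusterBy != null) {
            sql.append(" CLUSTER BY ").append(clusterBy);
        } else if (distributeBy != null) {
            sql.append(" DISTRIBUTE BY ").append(distributeBy);
        } else {
            // The column count comes from the cube-level config, as the reviewer suggests.
            int n = Math.min(cubeConfig.getRedistributeColumnCount(), candidateCols.size());
            List<String> cols = candidateCols.subList(0, n);
            if (cols.isEmpty()) {
                sql.append(" DISTRIBUTE BY RAND()"); // this sketch's own fallback, not from the patch
            } else {
                sql.append(" DISTRIBUTE BY ").append(String.join(", ", cols));
            }
        }
        return sql.toString();
    }

    public static void main(String[] args) {
        Map<String, String> global = new HashMap<>();
        global.put(REDIST_COUNT_KEY, "3");
        Map<String, String> cube = new HashMap<>();
        cube.put(REDIST_COUNT_KEY, "2"); // cube-level override of the global value
        List<String> cols = List.of("SELLER_ID", "LSTG_FORMAT_NAME", "CAL_DT", "LEAF_CATEG_ID");
        System.out.println(buildRedistributeSql("FLAT_TABLE", null, null, cols, new Config(global, cube)));
    }
}
```

With the override in place, `main` prints a statement ending in `DISTRIBUTE BY SELLER_ID, LSTG_FORMAT_NAME`; removing the cube entry makes it fall back to the global count of 3.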

----------------------------------------------------------------
This is an automated message from the Apache Git Service.


> Distribute by multi column if not set distribute column during the redistribute step
> ------------------------------------------------------------------------------------
>
>                 Key: KYLIN-3457
>                 URL: https://issues.apache.org/jira/browse/KYLIN-3457
>             Project: Kylin
>          Issue Type: Improvement
>          Components: Job Engine
>            Reporter: Chao Long
>            Assignee: Chao Long
>            Priority: Major
>             Fix For: v2.5.0
>
>
> KYLIN-3388 removed the redistribute step, which may cause a data skew problem.
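Why distributing by several columns can reduce skew is easy to see with a small simulation. The following self-contained Java sketch uses hypothetical data, not data from the issue. It assigns rows to reducers by hashing either one low-cardinality column or a combination of columns, then reports the busiest reducer's load. Java's `Objects.hashCode` with `Math.floorMod` stands in for Hive's own hash partitioning, so actual bucket assignments in Hive will differ; only the trend matters here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.function.Function;

public class SkewSketch {

    /** A hypothetical flat-table row with two dimension columns. */
    static final class Row {
        final String country;
        final int userId;

        Row(String country, int userId) {
            this.country = country;
            this.userId = userId;
        }
    }

    /** Counts rows per reducer when partitioning by the given key (hash mod reducers). */
    static int[] reducerLoads(List<Row> rows, int reducers, Function<Row, Object> key) {
        int[] loads = new int[reducers];
        for (Row r : rows) {
            // floorMod keeps the bucket non-negative even for negative hash codes.
            int bucket = Math.floorMod(Objects.hashCode(key.apply(r)), reducers);
            loads[bucket]++;
        }
        return loads;
    }

    static int max(int[] loads) {
        int m = 0;
        for (int x : loads) {
            m = Math.max(m, x);
        }
        return m;
    }

    /** Hypothetical skewed data: 900 of 1000 rows share the country value "US". */
    static List<Row> sampleRows() {
        List<Row> rows = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            String country = (i % 10 == 0) ? "C" + (i % 7) : "US";
            rows.add(new Row(country, i));
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Row> rows = sampleRows();
        int reducers = 8;
        int[] single = reducerLoads(rows, reducers, r -> r.country);
        int[] multi = reducerLoads(rows, reducers, r -> List.of(r.country, r.userId));
        System.out.println("max load, DISTRIBUTE BY country:         " + max(single));
        System.out.println("max load, DISTRIBUTE BY country, userId: " + max(multi));
    }
}
```

Hashing by `country` alone sends every "US" row to the same reducer (at least 900 of 1000 rows), while adding the high-cardinality `userId` to the key spreads the same rows across all eight reducers.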



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
