[ https://issues.apache.org/jira/browse/KYLIN-4903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17289024#comment-17289024 ]

ASF GitHub Bot commented on KYLIN-4903:
---------------------------------------

zzcclp commented on a change in pull request #1583:
URL: https://github.com/apache/kylin/pull/1583#discussion_r580960574



##########
File path: kylin-spark-project/kylin-spark-engine/src/main/scala/org/apache/kylin/engine/spark/job/CubeBuildJob.java
##########
@@ -319,45 +324,61 @@ private void build(Collection<NBuildSourceInfo> buildSourceInfos, SegmentInfo se
     // build current layer and return the next layer to be built.
     private List<NBuildSourceInfo> buildLayer(Collection<NBuildSourceInfo> buildSourceInfos, SegmentInfo seg,
                                               SpanningTree st) {
-        int cuboidsNumInLayer = 0;
 
-        // build current layer
+        int cuboidsNumInLayer = 0;
         List<LayoutEntity> allIndexesInCurrentLayer = new ArrayList<>();
+
+        //record build infos before building
         for (NBuildSourceInfo info : buildSourceInfos) {
             Collection<LayoutEntity> toBuildCuboids = info.getToBuildCuboids();
             infos.recordParent2Children(info.getLayout(),
                     toBuildCuboids.stream().map(LayoutEntity::getId).collect(Collectors.toList()));
             cuboidsNumInLayer += toBuildCuboids.size();
             Preconditions.checkState(!toBuildCuboids.isEmpty(), "To be built cuboids is empty.");
-            Dataset<Row> parentDS = info.getParentDS();
-            // record the source count of flat table
-            if (info.getLayoutId() == ParentSourceChooser.FLAT_TABLE_FLAG()) {
-                cuboidsRowCount.putIfAbsent(info.getLayoutId(), parentDS.count());
-            }
+            info.getToBuildCuboids().stream().forEach(allIndexesInCurrentLayer::add);
+            infos.recordCuboidsNumPerLayer(seg.id(), cuboidsNumInLayer);

Review comment:
       The value of 'cuboidsNumInLayer' is the total count for the current segment; why does it need to be set into infos on every loop iteration?
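A minimal sketch of the restructuring the reviewer's question points toward: accumulate cuboidsNumInLayer inside the loop and record it once after the loop has finished. SourceInfoSketch and BuildInfosSketch are hypothetical stand-ins for NBuildSourceInfo and infos, and the sketch assumes the recorder simply keeps the latest value per segment; the real Kylin implementation of recordCuboidsNumPerLayer is not shown in this diff. The sketch also uses List.addAll in place of stream().forEach(allIndexesInCurrentLayer::add), which has the same effect here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for NBuildSourceInfo: only the ids of the cuboids it will build.
final class SourceInfoSketch {
    final List<Long> toBuildCuboidIds;

    SourceInfoSketch(List<Long> toBuildCuboidIds) {
        this.toBuildCuboidIds = toBuildCuboidIds;
    }
}

// Hypothetical stand-in for the build-info recorder.
// Assumption: it keeps only the latest value per segment and counts how often it is called.
final class BuildInfosSketch {
    final Map<String, Integer> cuboidsNumPerLayer = new HashMap<>();
    int recordCalls = 0;

    void recordCuboidsNumPerLayer(String segId, int num) {
        recordCalls++;
        cuboidsNumPerLayer.put(segId, num);
    }
}

final class LayerRecordingSketch {
    // Accumulates the per-source cuboid counts inside the loop and records the total once afterwards.
    static List<Long> recordLayer(List<SourceInfoSketch> sources, String segId, BuildInfosSketch infos) {
        int cuboidsNumInLayer = 0;
        List<Long> allIndexesInCurrentLayer = new ArrayList<>();
        for (SourceInfoSketch info : sources) {
            if (info.toBuildCuboidIds.isEmpty()) {
                throw new IllegalStateException("To be built cuboids is empty.");
            }
            cuboidsNumInLayer += info.toBuildCuboidIds.size();
            allIndexesInCurrentLayer.addAll(info.toBuildCuboidIds);
        }
        // Only the completed total is recorded, with one call per layer.
        infos.recordCuboidsNumPerLayer(segId, cuboidsNumInLayer);
        return allIndexesInCurrentLayer;
    }
}
```

With two sources building 2 and 3 cuboids, the recorded total is 5 after a single call. Under the overwrite assumption, recording inside the loop would reach the same final value, but only after first writing the partial total 2; if the real recorder appended or summed values instead, per-iteration calls would leave incorrect totals.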






> cache parent datasource to accelerate next layer's cuboid building
> ------------------------------------------------------------------
>
>                 Key: KYLIN-4903
>                 URL: https://issues.apache.org/jira/browse/KYLIN-4903
>             Project: Kylin
>          Issue Type: Improvement
>    Affects Versions: v4.0.0-beta
>            Reporter: ShengJun Zheng
>            Assignee: ShengJun Zheng
>            Priority: Major
>             Fix For: v4.0.0-GA
>
>
> In Kylin V4, the parent datasource is not cached when the next layer's cuboids are built,
> so the same HDFS files are read repeatedly. Caching the parent datasource in memory
> improves build performance by 20-30% in our case.
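The issue's idea can be illustrated with a plain-Java analogy that runs without Spark: every child cuboid of a layer reads its parent, so without a cache each child triggers another read of the parent's files. ParentSourceSketch, CachedParentSketch and ChildBuildSketch are hypothetical names, not Kylin classes, and the read counter merely stands in for HDFS reads. In Spark itself the corresponding calls would be Dataset.persist(...) (or cache()) on the parent dataset before its children are built and Dataset.unpersist() once they are done; exactly how the patch for this issue applies them is not shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical stand-in for reading a parent layout's files from storage; counts every read.
final class ParentSourceSketch {
    int reads = 0;
    private final List<Integer> rows;

    ParentSourceSketch(List<Integer> rows) {
        this.rows = rows;
    }

    List<Integer> readFromStorage() {
        reads++;
        return new ArrayList<>(rows);
    }
}

// Memoizes the first read, playing the role Dataset.persist()/cache() plays in Spark.
final class CachedParentSketch implements Supplier<List<Integer>> {
    private final ParentSourceSketch source;
    private List<Integer> cached;

    CachedParentSketch(ParentSourceSketch source) {
        this.source = source;
    }

    @Override
    public List<Integer> get() {
        if (cached == null) {
            cached = source.readFromStorage();
        }
        return cached;
    }

    // Counterpart of Dataset.unpersist(): drop the cached rows once the children are built.
    void release() {
        cached = null;
    }
}

final class ChildBuildSketch {
    // "Builds" each child cuboid by summing the parent rows (a placeholder for real aggregation).
    // Every child calls parent.get(), so an uncached parent is re-read once per child.
    static List<Long> buildChildren(Supplier<List<Integer>> parent, int childCount) {
        List<Long> results = new ArrayList<>();
        for (int child = 0; child < childCount; child++) {
            long sum = 0;
            for (int row : parent.get()) {
                sum += row;
            }
            results.add(sum);
        }
        return results;
    }
}
```

Building three children from an uncached parent (passing readFromStorage as the supplier) reads the parent three times, while the cached wrapper reads it once and yields the same results. Releasing the cache afterwards matters in the real setting too, since persisted parents of a finished layer would otherwise keep occupying executor memory.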



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
