[ 
https://issues.apache.org/jira/browse/KYLIN-4567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17135521#comment-17135521
 ] 

Guangxu Cheng commented on KYLIN-4567:
--------------------------------------

{noformat}
2020-06-15 13:57:14,386 INFO [main] org.apache.kylin.engine.mr.KylinMapper: 
Accepting Mapper Key with ordinal: 1
2020-06-15 13:57:14,386 INFO [main] org.apache.kylin.engine.mr.KylinMapper: Do 
map, available memory: 3480m
2020-06-15 13:57:46,733 INFO [main] org.apache.kylin.engine.mr.KylinMapper: 
Accepting Mapper Key with ordinal: 100001
2020-06-15 13:57:46,733 INFO [main] org.apache.kylin.engine.mr.KylinMapper: Do 
map, available memory: 2806m
2020-06-15 13:58:13,424 INFO [main] org.apache.kylin.engine.mr.KylinMapper: 
Accepting Mapper Key with ordinal: 200001
2020-06-15 13:58:13,424 INFO [main] org.apache.kylin.engine.mr.KylinMapper: Do 
map, available memory: 2336m
2020-06-15 13:58:16,037 INFO [main] org.apache.hadoop.mapred.MapTask: Spilling 
map output
2020-06-15 13:58:16,037 INFO [main] org.apache.hadoop.mapred.MapTask: bufstart 
= 0; bufend = 80520185; bufvoid = 104857600
2020-06-15 13:58:16,037 INFO [main] org.apache.hadoop.mapred.MapTask: kvstart = 
26214396(104857584); kvend = 25372888(101491552); length = 841509/6553600
2020-06-15 13:58:16,037 INFO [main] org.apache.hadoop.mapred.MapTask: (EQUATOR) 
81363241 kvi 20340804(81363216)
2020-06-15 13:58:16,106 INFO [SpillThread] 
org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor [.deflate]
2020-06-15 13:58:16,116 INFO [SpillThread] 
org.apache.kylin.engine.mr.KylinReducer: Do setup, available memory: 2577m
2020-06-15 13:58:16,119 INFO [SpillThread] 
org.apache.kylin.engine.mr.KylinReducer: Accepting Reducer Key with ordinal: 1
2020-06-15 13:58:16,120 INFO [SpillThread] 
org.apache.kylin.engine.mr.KylinReducer: Do reduce, available memory: 2576m
2020-06-15 13:58:16,120 INFO [SpillThread] 
org.apache.kylin.engine.mr.steps.CuboidReducer: Handling value with ordinal 
(This is not KV number!): 1
2020-06-15 13:58:17,975 INFO [SpillThread] 
org.apache.kylin.engine.mr.KylinReducer: Do cleanup, available memory: 2991m
2020-06-15 13:58:17,975 INFO [SpillThread] 
org.apache.kylin.engine.mr.KylinReducer: Total rows: 1
2020-06-15 13:58:17,979 INFO [SpillThread] 
org.apache.kylin.engine.mr.KylinReducer: Do setup, available memory: 2991m
2020-06-15 13:58:17,980 INFO [SpillThread] 
org.apache.kylin.engine.mr.KylinReducer: Accepting Reducer Key with ordinal: 1
2020-06-15 13:58:17,980 INFO [SpillThread] 
org.apache.kylin.engine.mr.KylinReducer: Do reduce, available memory: 2991m
2020-06-15 13:58:17,980 INFO [SpillThread] 
org.apache.kylin.engine.mr.steps.CuboidReducer: Handling value with ordinal 
(This is not KV number!): 1
2020-06-15 13:58:28,764 INFO [main] org.apache.kylin.engine.mr.KylinMapper: Do 
cleanup, available memory: 3283m
2020-06-15 13:58:28,764 INFO [main] org.apache.kylin.engine.mr.KylinMapper: 
Total rows: 255999
2020-06-15 13:58:28,767 INFO [main] org.apache.hadoop.mapred.MapTask: Starting 
flush of map output
2020-06-15 13:58:33,280 INFO [SpillThread] 
org.apache.kylin.engine.mr.KylinReducer: Do cleanup, available memory: 3048m
2020-06-15 13:58:33,280 INFO [SpillThread] 
org.apache.kylin.engine.mr.KylinReducer: Total rows: 1
2020-06-15 13:58:33,282 INFO [SpillThread] 
org.apache.kylin.engine.mr.KylinReducer: Do setup, available memory: 3048m
2020-06-15 13:58:33,283 INFO [SpillThread] 
org.apache.kylin.engine.mr.KylinReducer: Accepting Reducer Key with ordinal: 1
2020-06-15 13:58:33,283 INFO [SpillThread] 
org.apache.kylin.engine.mr.KylinReducer: Do reduce, available memory: 3048m
2020-06-15 13:58:33,283 INFO [SpillThread] 
org.apache.kylin.engine.mr.steps.CuboidReducer: Handling value with ordinal 
(This is not KV number!): 1
2020-06-15 13:59:22,359 INFO [SpillThread] 
org.apache.kylin.engine.mr.KylinReducer: Do cleanup, available memory: 3305m
2020-06-15 13:59:22,359 INFO [SpillThread] 
org.apache.kylin.engine.mr.KylinReducer: Total rows: 1
2020-06-15 13:59:22,374 INFO [SpillThread] 
org.apache.kylin.engine.mr.KylinReducer: Do setup, available memory: 3305m
2020-06-15 13:59:22,374 INFO [SpillThread] 
org.apache.kylin.engine.mr.KylinReducer: Accepting Reducer Key with ordinal: 1
2020-06-15 13:59:22,374 INFO [SpillThread] 
org.apache.kylin.engine.mr.KylinReducer: Do reduce, available memory: 3305m
2020-06-15 13:59:22,374 INFO [SpillThread] 
org.apache.kylin.engine.mr.steps.CuboidReducer: Handling value with ordinal 
(This is not KV number!): 1
2020-06-15 14:39:34,082 INFO [SpillThread] 
org.apache.kylin.engine.mr.steps.CuboidReducer: Handling value with ordinal 
(This is not KV number!): 100001
{noformat}

>From the map logs we can find that  calculating the TOPN of 13 columns takes 
>about 40 minutes to process 100k rows, which is too slow.

> Improve TopN merge performance in MR engine
> -------------------------------------------
>
>                 Key: KYLIN-4567
>                 URL: https://issues.apache.org/jira/browse/KYLIN-4567
>             Project: Kylin
>          Issue Type: Improvement
>          Components: Measure - TopN
>            Reporter: Guangxu Cheng
>            Assignee: Guangxu Cheng
>            Priority: Major
>
> We have a cube that needs to calculate the TOPN of 13 columns. The number of 
> data source is only 500k. But, the cubing job always fail when building base 
> cuboid.
> we found that the map task always killed by the ApplicationMaster due to time 
> out
> {noformat}
> ERROR-[-10001]-[MR]:[Mr Task 
> Timeout]:[AttemptID:attempt_1591996262448_229922_m_000000_1 Timed out after 
> 3600 secs!] ERROR-[-10015]-[MR]:[Container Exit Accidentally]:[Container 
> killed by the ApplicationMaster. Container killed on request. Exit code is 
> 143 Container exited with a non-zero exit code 143 ]
> {noformat}
> the stack information as below:
> {noformat}
> "SpillThread" #35 daemon prio=5 os_prio=0 tid=0x00007f9a89771800 nid=0x133a2 
> runnable [0x00007f9a56e3f000]
>    java.lang.Thread.State: RUNNABLE
>         at java.util.LinkedList.toArray(LinkedList.java:1052)
>         at java.util.List.sort(List.java:477)
>         at java.util.Collections.sort(Collections.java:175)
>         at 
> org.apache.kylin.measure.topn.TopNCounter.sortAndRetain(TopNCounter.java:96)
>         at 
> org.apache.kylin.measure.topn.TopNCounter.merge(TopNCounter.java:183)
>         at 
> org.apache.kylin.measure.topn.TopNAggregator.aggregate(TopNAggregator.java:44)
>         at 
> org.apache.kylin.measure.topn.TopNAggregator.aggregate(TopNAggregator.java:27)
>         at 
> org.apache.kylin.measure.MeasureAggregators.aggregate(MeasureAggregators.java:83)
>         at 
> org.apache.kylin.engine.mr.steps.CuboidReducer.doReduce(CuboidReducer.java:108)
>         at 
> org.apache.kylin.engine.mr.steps.CuboidReducer.doReduce(CuboidReducer.java:44)
>         at 
> org.apache.kylin.engine.mr.KylinReducer.reduce(KylinReducer.java:77)
>         at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171)
>         at 
> org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1688)
>         at 
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1645)
>         at 
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$900(MapTask.java:884)
>         at 
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1540)
> {noformat}
> From the stack information, we found that sorting takes a lot of time. After 
> merge another counter into this counter, need to re-sort this counter. Maybe 
> we can reduce the frequency of sorting



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to