[jira] [Created] (KYLIN-5561) Optimize the build performance for models containing semi-additive measure

Guangyuan Feng (Jira) Tue, 06 Jun 2023 23:39:08 -0700

Guangyuan Feng created KYLIN-5561:
-------------------------------------

             Summary: Optimize the build performance for models containing semi-additive measure
                 Key: KYLIN-5561
                 URL: https://issues.apache.org/jira/browse/KYLIN-5561
             Project: Kylin
          Issue Type: Bug
          Components: Modeling
    Affects Versions: 5.0-alpha
            Reporter: Guangyuan Feng
            Assignee: Yaguang Jia
             Fix For: 5.0-alpha



When building a model with the aggregate function `sum_lc`, the calculation 
takes too long to complete, even on a small dataset. After digging into its 
implementation, we found that the root cause is that `serialize` always 
allocates a new array of `1024 * 1024` bytes as temporary storage for the 
serialized value of a `SumLCCounter`.

In fact, only a decimal value and a long value of a `SumLCCounter` object need 
to be serialized. The serialized data is generally about `8 + 8` bytes on a 
64-bit platform, so the temporary array is far larger than the result requires.

After reducing the initial size of the temporary array, for example to 32 
bytes, the total time to complete the `sum_lc` calculation on a 10GB dataset 
dropped from 16 min to 4 min.
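
To illustrate the idea, here is a minimal sketch of serializing into a small 
buffer that grows only when needed. It is not the actual Kylin code: the class 
`SumLCSerializeSketch`, the nested `Counter` type (a stand-in for 
`SumLCCounter`), the byte layout, and the doubling strategy are all assumptions 
made for this example.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.nio.BufferOverflowException;
import java.nio.ByteBuffer;

public class SumLCSerializeSketch {

    /** Hypothetical stand-in for SumLCCounter: one decimal sum and one long timestamp. */
    public static final class Counter {
        public final BigDecimal sum;
        public final long timestamp;

        public Counter(BigDecimal sum, long timestamp) {
            this.sum = sum;
            this.timestamp = timestamp;
        }
    }

    /** Small initial size (assumed), instead of a fixed 1024 * 1024 bytes. */
    public static final int INITIAL_SIZE = 32;

    /** Assumed layout: scale (int), unscaled length (int), unscaled bytes, timestamp (long). */
    private static void write(Counter c, ByteBuffer out) {
        byte[] unscaled = c.sum.unscaledValue().toByteArray();
        out.putInt(c.sum.scale());
        out.putInt(unscaled.length);
        out.put(unscaled);
        out.putLong(c.timestamp);
    }

    /** Starts with a small buffer and doubles it only if the value does not fit. */
    public static byte[] serialize(Counter c) {
        int size = INITIAL_SIZE;
        while (true) {
            ByteBuffer buf = ByteBuffer.allocate(size);
            try {
                write(c, buf);
                byte[] result = new byte[buf.position()];
                buf.flip();
                buf.get(result);
                return result;
            } catch (BufferOverflowException e) {
                size *= 2; // rare path: an unusually large decimal
            }
        }
    }

    /** Reads the value back using the same assumed layout. */
    public static Counter deserialize(byte[] bytes) {
        ByteBuffer in = ByteBuffer.wrap(bytes);
        int scale = in.getInt();
        byte[] unscaled = new byte[in.getInt()];
        in.get(unscaled);
        BigDecimal sum = new BigDecimal(new BigInteger(unscaled), scale);
        return new Counter(sum, in.getLong());
    }
}
```

With this layout, `serialize(new Counter(new BigDecimal("123.45"), 1686120000000L))` 
returns 18 bytes, so the common case never allocates more than the 32-byte 
buffer. Only a very large decimal triggers the retry with a bigger one.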

Here are the benchmark results:
{code:java}
// After optimization

# Warmup: 1 iterations, 10 s each
# Measurement: 5 iterations, 10 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: io.kyligence.pe.JmhSumLCApplication.dynamicLength

# Run progress: 0.00% complete, ETA 00:04:00
# Fork: 1 of 2
# Warmup Iteration   1: 39082.864 ops/ms
Iteration   1: 41760.550 ops/ms
Iteration   2: 47911.634 ops/ms
Iteration   3: 47353.936 ops/ms
Iteration   4: 46888.688 ops/ms
Iteration   5: 48378.075 ops/ms

# Run progress: 25.00% complete, ETA 00:03:02
# Fork: 2 of 2
# Warmup Iteration   1: 39479.279 ops/ms
Iteration   1: 42066.415 ops/ms
Iteration   2: 48499.974 ops/ms
Iteration   3: 48524.844 ops/ms
Iteration   4: 48431.830 ops/ms
Iteration   5: 48451.256 ops/ms


Result "io.kyligence.pe.JmhSumLCApplication.dynamicLength":
  46826.720 ±(99.9%) 4002.887 ops/ms [Average]
  (min, avg, max) = (41760.550, 46826.720, 48524.844), stdev = 2647.662
  CI (99.9%): [42823.833, 50829.607] (assumes normal distribution)


// Before optimization
# Warmup: 1 iterations, 10 s each
# Measurement: 5 iterations, 10 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: io.kyligence.pe.JmhSumLCApplication.fixLength

# Run progress: 50.00% complete, ETA 00:02:01
# Fork: 1 of 2
# Warmup Iteration   1: 22.364 ops/ms
Iteration   1: 25.354 ops/ms
Iteration   2: 25.252 ops/ms
Iteration   3: 20.566 ops/ms
Iteration   4: 20.668 ops/ms
Iteration   5: 21.585 ops/ms

# Run progress: 75.00% complete, ETA 00:01:00
# Fork: 2 of 2
# Warmup Iteration   1: 22.953 ops/ms
Iteration   1: 25.362 ops/ms
Iteration   2: 24.041 ops/ms
Iteration   3: 21.774 ops/ms
Iteration   4: 25.131 ops/ms
Iteration   5: 25.594 ops/ms


Result "io.kyligence.pe.JmhSumLCApplication.fixLength":
  23.533 ±(99.9%) 3.210 ops/ms [Average]
  (min, avg, max) = (20.566, 23.533, 25.594), stdev = 2.123
  CI (99.9%): [20.323, 26.743] (assumes normal distribution)


# Run complete. Total time: 00:04:03

REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
experiments, perform baseline and negative tests that provide experimental control, make sure
the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
Do not assume the numbers tell you what you want them to tell.

Benchmark                           Mode  Cnt      Score      Error   Units
JmhSumLCApplication.dynamicLength  thrpt   10  46826.720 ± 4002.887  ops/ms
JmhSumLCApplication.fixLength      thrpt   10     23.533 ±    3.210  ops/ms 
{code}
 

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
