[
https://issues.apache.org/jira/browse/KYLIN-5985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17927722#comment-17927722
]
Alexander commented on KYLIN-5985:
----------------------------------
After some research, we identified the problem and created a test case for it.
Steps to reproduce in the learn_kylin ssb project:
1. Edit the model and add a large number of measures and dimensions. In my case,
179 measures and 29 dimensions.
!image-2025-02-17-12-48-56-969.png|width=1239,height=450!
2. Run a full build.
!image-2025-02-17-13-28-55-544.png|width=1184,height=582!
The log shows many metadata updates that take too long:
{code:java}
2025-02-14T16:20:11,940 WARN [logger-thread-0] transaction.UnitOfWork : UnitOfWork 62ab38c4-6c23-1fd2-394e-94d49177f5cc takes too long time 21594ms to complete
2025-02-14T16:20:11,940 WARN [logger-thread-0] transaction.UnitOfWork : current stack:
java.lang.Throwable: null
    at org.apache.kylin.common.persistence.transaction.UnitOfWork.logIfLongTransaction(UnitOfWork.java:184) ~[newten-job.jar:5.0.0-r5]
    at org.apache.kylin.common.persistence.transaction.UnitOfWork.doTransaction(UnitOfWork.java:153) ~[newten-job.jar:5.0.0-r5]
    at org.apache.kylin.common.persistence.transaction.UnitOfWork.doInTransactionWithRetry(UnitOfWork.java:115) ~[newten-job.jar:5.0.0-r5]
    at org.apache.kylin.common.persistence.transaction.UnitOfWork.doInTransactionWithRetry(UnitOfWork.java:85) ~[newten-job.jar:5.0.0-r5]
{code}
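To see how much time these slow transactions add up to over a whole build, the warnings can be tallied from the job log. The following is a minimal sketch and not part of Kylin: it assumes the message format shown in the excerpt above ("UnitOfWork <id> takes too long time <N>ms"), and the function name `long_transactions` is made up for illustration.

```python
import re
from typing import Dict

# Pattern for the slow-transaction warning as it appears in the log excerpt.
# \s+ between words also tolerates messages wrapped across lines.
LONG_TX = re.compile(
    r"UnitOfWork\s+(?P<id>[0-9a-f-]+)\s+takes\s+too\s+long\s+time\s+(?P<ms>\d+)ms"
)


def long_transactions(log_text: str) -> Dict[str, int]:
    """Return {transaction id: duration in ms} for each slow-transaction warning."""
    return {m.group("id"): int(m.group("ms")) for m in LONG_TX.finditer(log_text)}


# Usage with the warning quoted in this comment.
sample = (
    "2025-02-14T16:20:11,940 WARN [logger-thread-0] transaction.UnitOfWork :\n"
    "UnitOfWork 62ab38c4-6c23-1fd2-394e-94d49177f5cc takes too long time 21594ms to\n"
    "complete\n"
)
durations = long_transactions(sample)
print(len(durations), "slow transaction(s),", sum(durations.values()), "ms in total")
```

Run against the full driver log from the attached diagnostic package, the same function would give a per-transaction breakdown that can be compared with the timeline in the Spark UI.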
!image-2025-02-17-13-30-48-044.png!
Whole build execution: 24.31 min.
Saving metadata: 477,981 ms = 7.96 min, which is 32.76% of the whole build.
We have a huge cluster for processing data, yet 32.76% of the build time is spent in
a bottleneck on the driver, saving metadata.
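As a quick check of these figures, the share can be recomputed from the two reported numbers. This is only a sketch of the arithmetic; the function and parameter names are illustrative.

```python
def metadata_share(save_metadata_ms: int, whole_build_min: float) -> float:
    """Fraction of the whole build that was spent saving metadata."""
    save_metadata_min = save_metadata_ms / 1000 / 60  # ms -> s -> min
    return save_metadata_min / whole_build_min


# Values reported in this comment: 477,981 ms of metadata saving, 24.31 min build.
share = metadata_share(477_981, 24.31)
print(f"{share:.1%} of the build is spent saving metadata")
```

With the reported values this comes to roughly a third of the build, which agrees with the figure above up to rounding.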
Project export is in the attachment: learn_kylin_model_metadata.zip
Spark logs and history are in the attachment:
learn_kylin_model_metadata_build_diagnostic.zip
Spark dump: eventLogs-application_1737114540896_0916.zip
> Spark build indexes performance issue. Many pauses/freezes in spark UI.
> -----------------------------------------------------------------------
>
> Key: KYLIN-5985
> URL: https://issues.apache.org/jira/browse/KYLIN-5985
> Project: Kylin
> Issue Type: Bug
> Components: Job Engine
> Affects Versions: 5.0.0
> Environment: Rocky linux 8.
> Hadoop - Bigtop 3.3.0 distribution.
> Reporter: Alexander
> Priority: Blocker
> Attachments: SparkUI.jpeg,
> eventLogs-application_1738068922293_0056.zip,
> image-2025-01-29-12-40-18-326.png, image-2025-01-29-12-40-58-725.png,
> image-2025-01-29-12-46-03-795.png, image-2025-01-29-12-46-50-561.png,
> image-2025-01-29-12-52-20-907.png, image-2025-01-29-12-53-01-478.png,
> image-2025-02-17-12-48-56-969.png, image-2025-02-17-13-28-55-544.png,
> image-2025-02-17-13-30-48-044.png,
> ns3246587.ip-57-128-229.eu_7070_job_2025_01_29_09_27_19_862C85.zip
>
>
> The Spark build job freezes and takes a lot of time.
> The build job runs for 88.94 minutes.
> !image-2025-01-29-12-40-18-326.png!
> It loads a model with only 71,734 rows and 97 indexes.
> !image-2025-01-29-12-52-20-907.png!
>
> Built segment size: 1.46 MB
> !image-2025-01-29-12-53-01-478.png!
> With the following parameters:
> |spark.driver.memory|6G|
> |spark.driver.memoryOverhead|2G|
> |spark.executor.instances|6|
> |spark.executor.memory|6G|
> |spark.executor.memoryOverhead|2G|
>
> In the Spark UI we see a lot of pauses/freezes. It looks like nothing happens during
> these pauses: no tasks are running on the executors.
> !image-2025-01-29-12-46-50-561.png!
> Full screenshot of the Spark job: SparkUI.jpeg.
> Diagnostic information is in the attachments.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)