[ 
https://issues.apache.org/jira/browse/KYLIN-5985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17927722#comment-17927722
 ] 

Alexander commented on KYLIN-5985:
----------------------------------

After research, we identified the problem and created a test case for it.

Steps to reproduce in the learn_kylin SSB project:
 # Edit the model and add a large number of measures and dimensions. In my 
case, 179 measures and 29 dimensions.

!image-2025-02-17-12-48-56-969.png|width=1239,height=450!

 # Run a full build.

!image-2025-02-17-13-28-55-544.png|width=1184,height=582!

There are many metadata updates that take too long:
{code:java}
2025-02-14T16:20:11,940 WARN  [logger-thread-0] transaction.UnitOfWork : UnitOfWork 62ab38c4-6c23-1fd2-394e-94d49177f5cc takes too long time 21594ms to complete
2025-02-14T16:20:11,940 WARN  [logger-thread-0] transaction.UnitOfWork : current stack: 
java.lang.Throwable: null
    at org.apache.kylin.common.persistence.transaction.UnitOfWork.logIfLongTransaction(UnitOfWork.java:184) ~[newten-job.jar:5.0.0-r5]
    at org.apache.kylin.common.persistence.transaction.UnitOfWork.doTransaction(UnitOfWork.java:153) ~[newten-job.jar:5.0.0-r5]
    at org.apache.kylin.common.persistence.transaction.UnitOfWork.doInTransactionWithRetry(UnitOfWork.java:115) ~[newten-job.jar:5.0.0-r5]
    at org.apache.kylin.common.persistence.transaction.UnitOfWork.doInTransactionWithRetry(UnitOfWork.java:85) ~[newten-job.jar:5.0.0-r5]
{code}
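
To get a rough total of how much time these warnings account for, the log can be scanned for the "takes too long time" lines. The following is only a minimal sketch: the regular expression is modelled on the single WARN line quoted above, and the helper name summarize_long_transactions is hypothetical, not part of Kylin.

```python
import re

# Pattern modelled on the WARN line quoted above (an assumption about the
# general log format); \s+ tolerates line wrapping inside the message.
LONG_TX = re.compile(
    r"UnitOfWork\s+(?P<id>[0-9a-f-]{36})\s+takes\s+too\s+long\s+time\s+(?P<ms>\d+)ms"
)


def summarize_long_transactions(log_text):
    """Return (count, total_ms, max_ms) over all long-transaction warnings."""
    durations = [int(m.group("ms")) for m in LONG_TX.finditer(log_text)]
    if not durations:
        return 0, 0, 0
    return len(durations), sum(durations), max(durations)


# Sample input copied from the log excerpt above.
sample = (
    "2025-02-14T16:20:11,940 WARN  [logger-thread-0] transaction.UnitOfWork : \n"
    "UnitOfWork 62ab38c4-6c23-1fd2-394e-94d49177f5cc takes too long time 21594ms to \n"
    "complete\n"
)
print(summarize_long_transactions(sample))
```

Running it over the full driver log from the attached diagnostic package should show how many transactions crossed the threshold and how much time they add up to.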
 

!image-2025-02-17-13-30-48-044.png!

Whole build execution time: 24.31 min

Saving metadata: 477,981 ms (about 7.96 min), which is 32.76% of the whole build.

We have a huge cluster for processing data, yet 32.76% of the build time is 
spent in a bottleneck on the driver: saving metadata.
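
The share quoted above can be rechecked with a few lines of arithmetic. This is a small sketch using only the two figures reported in this comment; the function name metadata_share is hypothetical. With rounding instead of truncation it gives roughly 7.97 min and 32.8%, in line with the numbers above.

```python
def metadata_share(save_ms: int, build_minutes: float) -> tuple[float, float]:
    """Return (minutes spent saving metadata, percentage of the whole build)."""
    save_minutes = save_ms / 60_000  # milliseconds to minutes
    return save_minutes, save_minutes / build_minutes * 100


# Figures taken from this comment: 477,981 ms of metadata saves, 24.31 min build.
save_minutes, share = metadata_share(477_981, 24.31)
print(f"{save_minutes:.2f} min, {share:.2f}% of the build")
```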

 

The exported project is attached: learn_kylin_model_metadata.zip

Spark logs and history are attached: 
learn_kylin_model_metadata_build_diagnostic.zip

Spark event log dump: eventLogs-application_1737114540896_0916.zip

> Spark build indexes performance issue. Many pauses/freezes in spark UI.
> -----------------------------------------------------------------------
>
>                 Key: KYLIN-5985
>                 URL: https://issues.apache.org/jira/browse/KYLIN-5985
>             Project: Kylin
>          Issue Type: Bug
>          Components: Job Engine
>    Affects Versions: 5.0.0
>         Environment: Rocky linux 8.
> Hadoop - Bigtop 3.3.0 distribution.
>            Reporter: Alexander
>            Priority: Blocker
>         Attachments: SparkUI.jpeg, 
> eventLogs-application_1738068922293_0056.zip, 
> image-2025-01-29-12-40-18-326.png, image-2025-01-29-12-40-58-725.png, 
> image-2025-01-29-12-46-03-795.png, image-2025-01-29-12-46-50-561.png, 
> image-2025-01-29-12-52-20-907.png, image-2025-01-29-12-53-01-478.png, 
> image-2025-02-17-12-48-56-969.png, image-2025-02-17-13-28-55-544.png, 
> image-2025-02-17-13-30-48-044.png, 
> ns3246587.ip-57-128-229.eu_7070_job_2025_01_29_09_27_19_862C85.zip
>
>
> The Spark build job freezes and a lot of time is wasted.
> The build job runs for 88.94 minutes.
> !image-2025-01-29-12-40-18-326.png!
> It loads a model with only 71,734 rows and 97 indexes.
> !image-2025-01-29-12-52-20-907.png!
>  
> Built segment size: 1.46 MB
> !image-2025-01-29-12-53-01-478.png!
> With the following parameters:
> |spark.driver.memory|6G|
> |spark.driver.memoryOverhead|2G|
> |spark.executor.instances|6|
> |spark.executor.memory|6G|
> |spark.executor.memoryOverhead|2G|
>  
> In the Spark UI we see a lot of pauses/freezes. It looks like nothing happens 
> during these pauses: there are no tasks running on the executors.
> !image-2025-01-29-12-46-50-561.png!
> Full screen of a spark job - SparkUI.jpeg.
> Diagnostic information in attachment.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
