AnkeshThakur opened a new issue, #8361:
URL: https://github.com/apache/incubator-devlake/issues/8361
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
## Question
We have been using devlake for a while and we are increasing its scope to
monitor many of our projects. With increased usage, we are hit with very long
Dora calculation step. Total process takes around 10+ hours to complete. Logs
says dora's `calculateChangeLeadTime` is taking a lot of time (log snip is
below).
Is there any known way to improve it significantly? Have added few indexes
but they did not help. Maybe more indexes or some parameters to boost
concurrency?
## Screenshots
If applicable, add screenshots to help explain.
## Additional context
Slow query log is showing that these tasks are processing one data point at
a time and scan significantly high number of rows. Adding index to relevant
columns improves efficiency by ~25%. However, overall process still takes good
amount of time which makes it unsuitable for daily data refresh because during
this task no data is available in `project_pr_metrics` and other important
tables.
Sample slow query logs:
```
# Time: 2025-03-25T10:26:54.420337Z
# User@Host: ########[#####] @ [##.##.###.##] Id: 17460
# Query_time: 10.153949 Lock_time: 0.000004 Rows_sent: 1 Rows_examined:
314841
SET timestamp=1742898404;
SELECT dc.* FROM cicd_deployment_commits dc LEFT JOIN
cicd_deployment_commits p ON (dc.prev_success_deployment_commit_id = p.id) LEFT
JOIN project_mapping pm ON (pm.table = 'cicd_scopes' AND pm.row_id =
dc.cicd_scope_id) INNER JOIN commits_diffs cd ON (cd.new_commit_sha =
dc.commit_sha AND cd.old_commit_sha = COALESCE (p.commit_sha, '')) WHERE
dc.prev_success_deployment_commit_id <> '' AND dc.environment = 'PRODUCTION'
AND (pm.project_name = '#####' AND cd.commit_sha =
'csrffa028d4c5c85341f861beb685fba4e115e65' AND dc.RESULT = 'SUCCESS') ORDER BY
dc.started_date, dc.id ASC LIMIT 1;
```
Log of `calculateChangeLeadTime`
```
[GIN] 2025/03/25 - 13:41:54 | 200 | 2.83µs | 10.221.51.137 | GET
"/ping"
time="2025-03-25 13:41:55" level=info msg=" [pipeline service] [pipeline
#144] [task #2081] [calculateChangeLeadTime] finished records: 1699(not
exactly)"
time="2025-03-25 13:41:58" level=info msg=" [pipeline service] [pipeline
#144] [task #2081] [calculateChangeLeadTime] finished records: 1701(not
exactly)"
[GIN] 2025/03/25 - 13:41:59 | 200 | 2.35µs | 10.221.51.137 | GET
"/ping"
[GIN] 2025/03/25 - 13:41:59 | 200 | 1.34µs | 10.221.51.137 | GET
"/ping"
time="2025-03-25 13:42:02" level=info msg=" [pipeline service] [pipeline
#144] [task #2081] [calculateChangeLeadTime] finished records: 1703(not
exactly)"
[GIN] 2025/03/25 - 13:42:04 | 200 | 2.62µs | 10.221.51.137 | GET
"/ping"
[GIN] 2025/03/25 - 13:42:04 | 200 | 880ns | 10.221.51.137 | GET
"/ping"
time="2025-03-25 13:42:06" level=info msg=" [pipeline service] [pipeline
#144] [task #2081] [calculateChangeLeadTime] finished records: 1705(not
exactly)"
[GIN] 2025/03/25 - 13:42:09 | 200 | 2.69µs | 10.221.51.137 | GET
"/ping"
[GIN] 2025/03/25 - 13:42:09 | 200 | 960ns | 10.221.51.137 | GET
"/ping"
time="2025-03-25 13:42:10" level=info msg=" [pipeline service] [pipeline
#144] [task #2081] [calculateChangeLeadTime] finished records: 1707(not
exactly)"
time="2025-03-25 13:42:13" level=info msg=" [pipeline service] [pipeline
#144] [task #2081] [calculateChangeLeadTime] finished records: 1709(not
exactly)"
[GIN] 2025/03/25 - 13:42:14 | 200 | 2.63µs | 10.221.51.137 | GET
"/ping"
[GIN] 2025/03/25 - 13:42:14 | 200 | 880ns | 10.221.51.137 | GET
"/ping"
time="2025-03-25 13:42:16" level=info msg=" [pipeline service] [pipeline
#144] [task #2081] [calculateChangeLeadTime] finished records: 1711(not
exactly)"
[GIN] 2025/03/25 - 13:42:19 | 200 | 2.35µs | 10.221.51.137 | GET
"/ping"
[GIN] 2025/03/25 - 13:42:19 | 200 | 870ns | 10.221.51.137 | GET
"/ping"
time="2025-03-25 13:42:20" level=info msg=" [pipeline service] [pipeline
#144] [task #2081] [calculateChangeLeadTime] finished records: 1713(not
exactly)"
time="2025-03-25 13:42:23" level=info msg=" [pipeline service] [pipeline
#144] [task #2081] [calculateChangeLeadTime] finished records: 1715(not
exactly)"
[GIN] 2025/03/25 - 13:42:24 | 200 | 2.41µs | 10.221.51.137 | GET
"/ping"
[GIN] 2025/03/25 - 13:42:24 | 200 | 1.28µs | 10.221.51.137 | GET
"/ping"
time="2025-03-25 13:42:26" level=info msg=" [pipeline service] [pipeline
#144] [task #2081] [calculateChangeLeadTime] finished records: 1717(not
exactly)"
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]