dosubot[bot] commented on issue #7516:
URL:
https://github.com/apache/incubator-devlake/issues/7516#issuecomment-2132788285
To make data collection in the `collectMultiBranchJobApiBuilds` function
incremental, you can modify it to run the collection logic separately for each
job returned by the iterator. This approach can help ensure that only new or
changed data is processed.
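To illustrate what "only new or changed data" can mean per job, here is a minimal sketch that keeps the last collected build number for each job and skips anything at or below it. The `build` type, the `newBuildsSince` helper, and the in-memory `lastSeen` map are hypothetical names used for illustration only; this is not how DevLake stores collector state.

```go
package main

import "fmt"

// build is a hypothetical, simplified stand-in for a Jenkins build record.
type build struct {
	Number int
	Result string
}

// newBuildsSince returns the builds whose number is greater than lastSeen,
// together with the highest build number observed, which becomes the new state.
func newBuildsSince(builds []build, lastSeen int) ([]build, int) {
	fresh := make([]build, 0, len(builds))
	highest := lastSeen
	for _, b := range builds {
		if b.Number > lastSeen {
			fresh = append(fresh, b)
		}
		if b.Number > highest {
			highest = b.Number
		}
	}
	return fresh, highest
}

func main() {
	// lastSeen maps a job's full name to the last build number already collected.
	// An in-memory map is an assumption made to keep the sketch self-contained.
	lastSeen := map[string]int{"folder/job-a": 2}
	fetched := []build{{1, "SUCCESS"}, {2, "FAILURE"}, {3, "SUCCESS"}}

	fresh, highest := newBuildsSince(fetched, lastSeen["folder/job-a"])
	lastSeen["folder/job-a"] = highest
	fmt.Println(len(fresh), lastSeen["folder/job-a"])
}
```

Keeping the state per job matters because each job has its own build numbering; a single shared marker could cause builds from less active jobs to be skipped.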
### Modifying `collectMultiBranchJobApiBuilds`:
The `NewStatefulApiCollectorForFinalizableEntity` helper manages state and
tracks the last processed build, but it does not support iterators directly.
As a workaround, you can iterate over the jobs yourself and create a
`NewStatefulApiCollector` for each entry.
Here's a modified version of the `collectMultiBranchJobApiBuilds` function:
```go
func collectMultiBranchJobApiBuilds(taskCtx plugin.SubTaskContext) errors.Error {
	db := taskCtx.GetDal()
	data := taskCtx.GetData().(*JenkinsTaskData)
	logger := taskCtx.GetLogger()

	// Select every workflow job already collected for this connection.
	clauses := []dal.Clause{
		dal.Select("j.full_name,j.name,j.path,j.class,j.url"),
		dal.From("_tool_jenkins_jobs as j"),
		dal.Where(`j.connection_id = ? and j.class = ? and j._raw_data_table = ?`,
			data.Options.ConnectionId, WORKFLOW_JOB, fmt.Sprintf("_raw_%s", RAW_JOB_TABLE)),
	}
	cursor, err := db.Cursor(clauses...)
	if err != nil {
		return err
	}
	defer cursor.Close()

	iterator, err := helper.NewDalCursorIterator(db, cursor, reflect.TypeOf(SimpleJob{}))
	if err != nil {
		return err
	}

	// Run one stateful collector per job so that state is keyed by the job's full name.
	for iterator.HasNext() {
		item, err := iterator.Fetch()
		if err != nil {
			return err
		}
		job := item.(*SimpleJob)

		collectorWithState, err := helper.NewStatefulApiCollector(helper.RawDataSubTaskArgs{
			Params: JenkinsApiParams{
				ConnectionId: data.Options.ConnectionId,
				FullName:     job.FullName,
			},
			Ctx:   taskCtx,
			Table: RAW_BUILD_TABLE,
		})
		if err != nil {
			return err
		}

		err = collectorWithState.InitCollector(helper.ApiCollectorArgs{
			ApiClient: data.ApiClient,
			// The job is already in hand, so its path goes straight into the URL.
			// Passing the shared iterator as Input here would let the first
			// collector consume all remaining jobs.
			UrlTemplate: fmt.Sprintf("%sapi/json", job.Path),
			Query: func(reqData *helper.RequestData) (url.Values, errors.Error) {
				query := url.Values{}
				treeValue := "allBuilds[timestamp,number,duration,building,estimatedDuration,fullDisplayName,result,actions[lastBuiltRevision[SHA1,branch[name]],remoteUrls,mercurialRevisionNumber,causes[*]],changeSet[kind,revisions[revision]]]"
				query.Set("tree", treeValue)
				logger.Debug("Query: %v", query)
				return query, nil
			},
			ResponseParser: func(res *http.Response) ([]json.RawMessage, errors.Error) {
				var data struct {
					Builds []json.RawMessage `json:"allBuilds"`
				}
				err := helper.UnmarshalResponse(res, &data)
				if err != nil {
					return nil, err
				}
				// Keep only finished builds: a build that is still running has a null result.
				builds := make([]json.RawMessage, 0, len(data.Builds))
				for _, build := range data.Builds {
					var buildObj map[string]interface{}
					err := json.Unmarshal(build, &buildObj)
					if err != nil {
						return nil, errors.Convert(err)
					}
					if buildObj["result"] != nil {
						builds = append(builds, build)
					}
				}
				logger.Debug("Returning this number of builds: %v", len(builds))
				return builds, nil
			},
			AfterResponse: ignoreHTTPStatus404,
		})
		if err != nil {
			return err
		}

		err = collectorWithState.Execute()
		if err != nil {
			return err
		}
	}
	return nil
}
```
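The response-parsing step can be tried in isolation. The following is a minimal sketch using only the Go standard library; the `finishedBuilds` helper and the sample payload are illustrative assumptions, not part of the DevLake codebase. It shows how entries of `allBuilds` with a null `result` (builds that are still running) are dropped while finished builds are passed through unchanged.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// finishedBuilds returns the raw entries of "allBuilds" whose "result" is set.
// Hypothetical helper for illustration; DevLake uses its own response helpers.
func finishedBuilds(body []byte) ([]json.RawMessage, error) {
	var payload struct {
		Builds []json.RawMessage `json:"allBuilds"`
	}
	if err := json.Unmarshal(body, &payload); err != nil {
		return nil, err
	}
	finished := make([]json.RawMessage, 0, len(payload.Builds))
	for _, raw := range payload.Builds {
		// Decode only the field needed for the decision; a JSON null leaves
		// the pointer nil.
		var b struct {
			Result *string `json:"result"`
		}
		if err := json.Unmarshal(raw, &b); err != nil {
			return nil, err
		}
		if b.Result != nil {
			finished = append(finished, raw)
		}
	}
	return finished, nil
}

func main() {
	// Sample payload made up for this sketch: build 2 is still running.
	body := []byte(`{"allBuilds":[{"number":2,"result":null},{"number":1,"result":"SUCCESS"}]}`)
	builds, err := finishedBuilds(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(builds), string(builds[0]))
}
```

Decoding into a small struct with a pointer field avoids building a full `map[string]interface{}` for every build, which may matter for jobs with long build histories.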
### Cleaning or Archiving Raw Tables:
The provided context contains no specific guidance on cleaning or archiving
raw tables. As a general practice, it is worth cleaning or archiving them
periodically so that they do not grow indefinitely and degrade performance.
You can implement a scheduled task that archives or deletes old data according
to your retention policy.
### Summary:
- Modify `collectMultiBranchJobApiBuilds` to run a separate collector for
each job returned by the iterator.
- Use `NewStatefulApiCollector` to manage state and track the last processed
build.
- Consider implementing a scheduled task to clean or archive raw tables
periodically.
This approach is intended to make incremental data collection more efficient
and to help maintain the performance of your system
<sup>[[1]](https://github.com/apache/incubator-devlake/blob/main/backend/plugins/jenkins/tasks/build_collector.go)</sup><sup>[[2]](https://github.com/apache/incubator-devlake/blob/main/backend/plugins/jenkins/tasks/build_extractor.go)</sup>.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]