renflo commented on issue #5849: URL: https://github.com/apache/incubator-devlake/issues/5849#issuecomment-1674263355
@klesh the [link you gave](https://devlake.apache.org/docs/Overview/SupportedDataSources/) says "Full Refresh, Incremental Sync (for issues, MRs)", so I guess we're discussing adding jobs to the issues and MRs ;) I did some quick research and found the following:

1. GitLab supports pagination for the jobs endpoint: https://docs.gitlab.com/ee/api/jobs.html#list-pipeline-jobs
2. GitLab supports the `page`/`per_page` parameters on that same endpoint: https://docs.gitlab.com/ee/api/jobs.html#list-pipeline-jobs
3. DevLake is already using it at https://github.com/apache/incubator-devlake/blob/3469c1e786cc058dfcdeb4cd29765f21b8663738/backend/plugins/gitlab/tasks/job_collector.go#L71

So you already have something in place :) I have not read the code closely, so I'm not sure how you rely on GitLab's `page` feature. There are likely many ways to avoid refreshing everything all the time, for example (see the sketch at the bottom of this comment):

1. When the job collection task starts, it selects the number of previously existing job records (I haven't looked up what you store) in DevLake's database. Say it's N.
2. `(N / per_page) + 1` gives the page to start listing from, which would be passed as `page` when querying GitLab.

This of course assumes the `per_page` value has not changed between runs. If that's a stretch (e.g. you haven't hard-coded it, as I suppose), you could store the `per_page` value used at the pipeline level and the currently configured value at the connection level. Whenever a GitLab collection task that supports paging starts, it would compare the currently configured value at the connection level to the value used during the last successful pipeline run. If they match, the algorithm above could be used; if not, a full refresh would be necessary.
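To illustrate the formula only (not DevLake's actual collector logic; `startPage`, `n`, and `perPage` are hypothetical names for this sketch), something like this Go snippet would derive the page to resume from, assuming `per_page` is unchanged between runs:

```go
package main

import "fmt"

// startPage returns the GitLab `page` query parameter to resume from,
// given the number of job records already persisted (n) and the page
// size used for collection (perPage). Pages before the result are
// assumed to be fully collected already; the returned page may be
// partially collected and gets re-fetched, which is harmless.
func startPage(n, perPage int) int {
	if perPage <= 0 {
		return 1
	}
	// n / perPage pages are already complete; resume on the next one.
	return n/perPage + 1
}

func main() {
	// e.g. 250 jobs already stored, collected with per_page=100:
	// pages 1 and 2 are complete, so resume from page 3.
	fmt.Println(startPage(250, 100)) // 3
}
```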
