renflo commented on issue #5849:
URL: https://github.com/apache/incubator-devlake/issues/5849#issuecomment-1674263355

   @klesh the [link you 
gave](https://devlake.apache.org/docs/Overview/SupportedDataSources/) says 
"Full Refresh, Incremental Sync(for issues,MRs)". So I guess we're discussing 
adding jobs alongside issues and MRs ;)
   
   I did some quick research and found the following:
   1. GitLab supports pagination for the jobs endpoint: 
https://docs.gitlab.com/ee/api/jobs.html#list-pipeline-jobs
   2. GitLab supports pages: 
https://docs.gitlab.com/ee/api/jobs.html#list-pipeline-jobs
   3. DevLake is already using it at 
https://github.com/apache/incubator-devlake/blob/3469c1e786cc058dfcdeb4cd29765f21b8663738/backend/plugins/gitlab/tasks/job_collector.go#L71
 
   
   So you already have something in place :) 
   
   I have not read the code closely, so I'm not sure how you rely on GitLab's 
page feature. There are likely many ways to use it to avoid refreshing 
everything all the time, for example this:
   
   1. When the job collection task starts, it counts the job info objects 
that already exist in DevLake's database (I haven't looked up what you 
store). Say it's N.
   2. `(N / per_page)+1` (integer division) gives the page from which to 
start the listing. That value is passed as `page` when querying GitLab.
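
   The two steps above can be sketched in Go as follows. This is only a 
minimal illustration, not DevLake code: `startPage` is a hypothetical helper 
name, and it assumes GitLab returns jobs in a stable order across runs and 
that stored jobs are upserted, since the resume page can contain rows that 
are already in the database.

```go
package main

import (
	"errors"
	"fmt"
)

// startPage returns the 1-based page at which to resume listing, given the
// number of records already stored (N) and the page size used on every run.
// Hypothetical helper for illustration; it is not part of DevLake.
func startPage(stored, perPage int) (int, error) {
	if perPage <= 0 {
		return 0, errors.New("perPage must be positive")
	}
	if stored < 0 {
		return 0, errors.New("stored must not be negative")
	}
	// Integer division: full pages already collected, plus one.
	return stored/perPage + 1, nil
}

func main() {
	for _, stored := range []int{0, 99, 100, 250} {
		page, err := startPage(stored, 100)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("stored=%d -> start at page %d\n", stored, page)
	}
}
```

   With N = 250 and `per_page` = 100 the listing resumes at page 3, so the 
50 jobs already stored from that page are fetched again along with any new 
ones.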
   
   This of course assumes the `per_page` value has not changed between runs. 
If that's a stretch (e.g. you haven't hard-coded it, as I suppose), you could 
store the `per_page` value that was used at pipeline level and the currently 
configured value at connection level. When any GitLab collection task that 
supports pages starts, it would compare the currently configured value at 
connection level with the value used during the last successful pipeline run. 
If they are the same, the algorithm above can be used. If they differ, a full 
refresh would be necessary.
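
   A rough sketch of that check, again in Go and again purely illustrative: 
`syncPlan` and `planSync` are made-up names, and treating a missing previous 
`per_page` (zero) as "no successful run yet" is my own assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// syncPlan describes how a collection task should run.
// Hypothetical type for illustration; it is not part of DevLake.
type syncPlan struct {
	FullRefresh bool // true: discard the resume point and collect everything
	StartPage   int  // 1-based page to pass as `page` to GitLab
}

// planSync compares the per_page value of the last successful pipeline run
// with the value currently configured on the connection.
// lastPerPage == 0 is assumed to mean "no successful run recorded".
func planSync(stored, lastPerPage, currentPerPage int) (syncPlan, error) {
	if currentPerPage <= 0 {
		return syncPlan{}, errors.New("currentPerPage must be positive")
	}
	if stored < 0 {
		return syncPlan{}, errors.New("stored must not be negative")
	}
	if lastPerPage != currentPerPage {
		// Page boundaries moved (or there is no previous run):
		// the stored count no longer maps onto a page number.
		return syncPlan{FullRefresh: true, StartPage: 1}, nil
	}
	// Same page size as last time: resume at (N / per_page) + 1.
	return syncPlan{FullRefresh: false, StartPage: stored/currentPerPage + 1}, nil
}

func main() {
	same, _ := planSync(250, 100, 100)
	changed, _ := planSync(250, 100, 50)
	fmt.Printf("unchanged per_page: %+v\n", same)
	fmt.Printf("changed per_page:   %+v\n", changed)
}
```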
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
