Startrekzky opened a new issue, #6853:
URL: https://github.com/apache/incubator-devlake/issues/6853

   ### Search before asking
   
   - [X] I have searched the [issues](https://github.com/apache/incubator-devlake/issues?q=is%3Aissue) and found no similar feature request.
   
   
   ### Use case
   
   As a user with large repos containing more than 100,000 commits, I'd like incremental sync when collecting Git data.
   
   Currently, every pipeline takes more than 5 hours to collect data, which makes it difficult to use DevLake in my org.
   
   ### Description
   
   Support incremental sync in the GitExtractor plugin. Specifically,
   
   | Entity | Sync Mode | Cursor Field |
   | ------ | --------- | ------------ |
   | repos | Full refresh. There's no need to be incremental | N/A |
   | refs | Full refresh. There's no create/update date of the ref as far as I know | N/A |
   | commits | Incremental | committed_date. It seems to make more sense than commits.authored_date |
   | commit_files | Incremental | committed_date. Update the commit_files of the new commits. |
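
To make the proposed cursor behavior concrete, here is a minimal sketch of incremental selection keyed on `committed_date`. It is illustrative only and is not GitExtractor code: the `Commit` type and the `select_new_commits` helper are hypothetical names, and the sketch assumes the cursor from the previous run is stored somewhere between pipeline runs.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Commit:
    # Hypothetical stand-in for a row of the commits entity.
    sha: str
    committed_date: datetime


def select_new_commits(
    commits: Iterable[Commit], cursor: Optional[datetime]
) -> Tuple[List[Commit], Optional[datetime]]:
    """Return the commits committed strictly after `cursor` and the advanced cursor.

    A cursor of None means no previous run exists, so every commit is selected
    (equivalent to a full refresh).
    """
    if cursor is None:
        new = list(commits)
    else:
        new = [c for c in commits if c.committed_date > cursor]
    # Advance the cursor to the latest committed_date seen; keep it unchanged
    # when nothing new was found.
    new_cursor = max((c.committed_date for c in new), default=cursor)
    return new, new_cursor


if __name__ == "__main__":
    def day(d: int) -> datetime:
        return datetime(2024, 1, d, tzinfo=timezone.utc)

    history = [Commit("a", day(1)), Commit("b", day(2)), Commit("c", day(3))]
    new, cursor = select_new_commits(history, day(2))
    print([c.sha for c in new], cursor)
```

Under this scheme, commit_files would only be extracted for the commits returned by the selection step. One caveat of the sketch: a strict `committed_date` comparison can miss commits that arrive later with an older commit date (for example, via a newly pushed branch), so a real implementation would likely also need to deduplicate by commit SHA.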
   
   ### Related issues
   
   https://github.com/apache/incubator-devlake/issues/6138
   
   ### Are you willing to submit a PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   


