wouldd commented on issue #7826: URL: https://github.com/apache/incubator-devlake/issues/7826#issuecomment-2343343281
@klesh the problem is not happening consistently (which is part of the problem), so I don't think it obviously coincides with code changes. Rather, I suspect a subtle timing condition that depends on the Jira project itself and on how things happen to run in the code.

I've been adding debug logging as best I can to flesh out my understanding of what's happening, and I do wonder whether there is a potential problem with the batch divider logic. My understanding is that the code batches DB writes by issue type into sets of 500, which are then written in one go. The first time the code sees a given issue type, it creates an empty batch to start using, and at that point it calls delete on the database. I'm seeing quite a few deletes against the same raw database during the process, and it's not clear to me that they are scoped. Is there a scenario in which data has already been written by one batch when another batch is created and triggers a wipe of that data?

In general, my observation is that the structure of these raw data tables forces a situation in which there is no unique identifier for a given issue payload. Maybe I'm misreading things, but it seems there would be no need to purge this table ahead of a full refresh if the id were based on the unique Jira issue id. The code could just do a createorupdate, which would mean you'd never have weird gaps when the data is dropped, etc.

I will say that, having instrumented the code and switched on debug logging, I have not caught a failure scenario. That could be bad luck, or it could be that the act of logging more has shifted the timing enough to make it less of a problem.
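
To make the suspected scenario concrete, here is a minimal in-memory sketch in Python. It is not DevLake's code and has not been checked against it: `RawTable`, `BatchDivider`, `KeyedRawTable`, the row layout, and the tiny batch size are all hypothetical stand-ins. It only illustrates how an unscoped delete on first sight of a type could wipe rows that an earlier batch already flushed, and how keying rows by issue id would make the purge unnecessary.

```python
BATCH_SIZE = 2  # the comment mentions 500; kept tiny here for illustration

# Hypothetical input: rows arrive grouped by type, as dicts.
STREAM = [
    {"issue_id": 1, "type": "Bug"},
    {"issue_id": 2, "type": "Bug"},
    {"issue_id": 3, "type": "Story"},
    {"issue_id": 4, "type": "Story"},
]


class RawTable:
    """In-memory stand-in for a raw table without a unique key (assumption)."""

    def __init__(self):
        self.rows = []

    def delete_all(self):
        self.rows.clear()

    def insert_many(self, rows):
        self.rows.extend(rows)


class BatchDivider:
    """Buffers rows per type and deletes the first time a type is seen."""

    def __init__(self, table, batch_size=BATCH_SIZE):
        self.table = table
        self.batch_size = batch_size
        self.batches = {}

    def add(self, row):
        row_type = row["type"]
        if row_type not in self.batches:
            # Suspected hazard: this delete is not scoped to row_type,
            # so it also removes rows flushed by other batches.
            self.table.delete_all()
            self.batches[row_type] = []
        batch = self.batches[row_type]
        batch.append(row)
        if len(batch) >= self.batch_size:
            self.flush(batch)

    def flush(self, batch):
        self.table.insert_many(batch)
        batch.clear()

    def close(self):
        # Write out any partially filled batches.
        for batch in self.batches.values():
            if batch:
                self.flush(batch)


class KeyedRawTable:
    """Alternative stand-in: rows keyed by issue id, written by upsert."""

    def __init__(self):
        self.rows = {}

    def upsert_many(self, rows):
        for row in rows:
            self.rows[row["issue_id"]] = row


def run_with_unscoped_delete(stream):
    table = RawTable()
    divider = BatchDivider(table)
    for row in stream:
        divider.add(row)
    divider.close()
    return sorted(r["issue_id"] for r in table.rows)


def run_with_upsert(stream, refreshes=2):
    table = KeyedRawTable()
    for _ in range(refreshes):  # repeated full refreshes, no purge needed
        table.upsert_many(stream)
    return sorted(table.rows)


if __name__ == "__main__":
    print(run_with_unscoped_delete(STREAM))  # issues 1 and 2 are missing
    print(run_with_upsert(STREAM))           # all four issues survive
```

In this toy run the "Bug" batch is flushed before the first "Story" row arrives, so the delete triggered by the new type removes issues 1 and 2. Whether the real code can interleave a flush and a delete this way is exactly the open question in the comment above.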
