JoaoManierii opened a new issue, #8515:
URL: https://github.com/apache/incubator-devlake/issues/8515
## Question
We recently had an incident where someone accidentally deleted all three
layers of the pull request data: Raw, Tool, and the final processed tables.
To mitigate the issue, we started manually creating records based on the
errors the ETL pipeline reported: we created some of the missing entries in the
raw and tool layers and iteratively fixed the remaining gaps as new errors surfaced.
However, we noticed that the conversion between layers does not seem to be
working reliably. For example, we now have some labels appearing, but the
corresponding pull requests for those labels are missing.
Is there a safer and more efficient way to rebuild the layers to ensure data
consistency and integrity across Raw, Tool, and domain layers? We want to avoid
manual patching if possible and ensure that no orphaned or partial data is left
behind.
## Screenshots
N/A
## Additional context
We suspect that some conversions silently fail, causing incomplete data
propagation. A way to verify and recover from missing or partially converted
data would be very helpful.
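
To make the kind of check we have in mind concrete, here is a minimal sketch of an orphan detection query. It is only an illustration: the table and column names (`pull_requests.id`, `pull_request_labels.pull_request_id`, `pull_request_labels.label_name`) are assumptions about the domain layer and should be verified against the actual schema, and an in-memory SQLite database stands in for the real database so the example is self-contained.

```python
import sqlite3


def find_orphaned_labels(conn):
    """Return (pull_request_id, label_name) rows whose pull request is missing.

    Table and column names are assumptions for illustration only.
    """
    query = """
        SELECT l.pull_request_id, l.label_name
        FROM pull_request_labels AS l
        LEFT JOIN pull_requests AS p ON p.id = l.pull_request_id
        WHERE p.id IS NULL
        ORDER BY l.pull_request_id, l.label_name
    """
    return conn.execute(query).fetchall()


# Toy data: one label points at an existing pull request, one does not.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE pull_requests (id TEXT PRIMARY KEY, title TEXT);
    CREATE TABLE pull_request_labels (pull_request_id TEXT, label_name TEXT);
    INSERT INTO pull_requests VALUES ('pr:1', 'Fix login');
    INSERT INTO pull_request_labels VALUES ('pr:1', 'bug'), ('pr:2', 'feature');
    """
)

orphans = find_orphaned_labels(conn)
print(orphans)  # label rows that reference a pull request that does not exist
```

The same anti-join pattern (a `LEFT JOIN` filtered on `IS NULL`) could presumably be applied to any parent/child pair of tables, which is why a built-in or documented equivalent would be useful to us.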