danielemoraschi opened a new pull request, #8815:
URL: https://github.com/apache/incubator-devlake/pull/8815

   ### ⚠️ Pre Checklist
    
   - [x] I have read through the [Contributing Documentation](https://devlake.apache.org/community/).
   - [x] I have added relevant tests.
   - [x] I have added relevant documentation.
   - [x] I will add labels to the PR, such as `pr-type/bug-fix`, `pr-type/feature-development`, etc.
   
   ### Summary
   
   Fixes `clearHistoryData()` in the linker plugin, which was deleting all `pull_request_issues` records instead of only the current project's linker-created rows.
    
   **Root cause:** The function used a `LEFT JOIN` with the `project_name` filter in the `ON` clause. With a `LEFT JOIN`, left-side rows that match nothing still appear in the result, so the subquery returned every PR ID in the system, effectively wiping the entire `pull_request_issues` table on every linker run.
    
   **Impact:** When two projects share a GitHub repo, running the pipeline for 
one project deleted all PR-issue links created by the other project's pipeline 
(and links from the GitHub converter).
    
   **Fix:** 
   - `INNER JOIN` with the `project_name` filter moved to the `WHERE` clause (fixes the `LEFT JOIN` bug)
   - Issue-side subquery scoped to the current project's boards (prevents cross-project deletion)
   - `_raw_data_table` / `_raw_data_remark` filter so that only linker-created rows are deleted (preserves GitHub converter rows)
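   A minimal Go sketch of how the three filters interact, assuming they are combined with `AND` in a single delete predicate. The `link` and `scope` types, the provenance values `"linker"`, `"github"`, and `"linker-remark"`, and the IDs are hypothetical placeholders; the PR does not state the exact `_raw_data_table` / `_raw_data_remark` values the linker writes.

```go
package main

import "fmt"

// link is a hypothetical, simplified pull_request_issues row.
type link struct {
	PullRequestID string
	IssueID       string
	RawDataTable  string // provenance: which table produced the row
	RawDataRemark string // provenance: extra tag written with the row
}

// scope is what the current project owns: PR IDs from its repos (INNER JOIN +
// WHERE) and issue IDs from its boards (issue-side subquery).
type scope struct {
	PRs    map[string]bool
	Issues map[string]bool
}

// shouldDelete applies all three filters at once; a row is removed only when
// every condition holds (assumed AND semantics).
func shouldDelete(l link, s scope, linkerTable, linkerRemark string) bool {
	return s.PRs[l.PullRequestID] &&
		s.Issues[l.IssueID] &&
		l.RawDataTable == linkerTable &&
		l.RawDataRemark == linkerRemark
}

// remaining returns the rows that survive the scoped delete, in input order.
func remaining(rows []link, s scope, linkerTable, linkerRemark string) []link {
	var kept []link
	for _, r := range rows {
		if !shouldDelete(r, s, linkerTable, linkerRemark) {
			kept = append(kept, r)
		}
	}
	return kept
}

func main() {
	// Projects A and B share one repo, so pr1 is in project A's PR scope even
	// though issB lives on project B's board.
	scopeA := scope{
		PRs:    map[string]bool{"pr1": true},
		Issues: map[string]bool{"issA": true},
	}
	rows := []link{
		{"pr1", "issA", "linker", "linker-remark"}, // project A's stale linker row
		{"pr1", "issB", "linker", "linker-remark"}, // project B's linker row
		{"pr1", "issA", "github", ""},              // GitHub converter row
	}
	for _, r := range remaining(rows, scopeA, "linker", "linker-remark") {
		fmt.Println(r.PullRequestID, r.IssueID, r.RawDataTable)
	}
}
```

   In this scenario project B's linker row survives only because of the issue-side condition, and the converter row survives only because of the provenance check; removing either check from `shouldDelete` would delete that row.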
    
   **Tests:**
   - Added a `TestLinkPrToIssueWithSharedRepo` e2e test with CSV fixtures simulating two projects that share a repo
   - Verifies that running the linker for one project creates its links, deletes its stale links, and preserves both the other project's linker links and the converter links
   - Existing `TestLinkPrToIssue` continues to pass unchanged
   
   ### Does this close any open issues?
   
   Closes #8814
   
   ### Screenshots
   
   N/A — backend-only change.
   
   ### Other Information
   
   The bug was introduced in commit 
[`a4cb023ba`](https://github.com/apache/incubator-devlake/commit/a4cb023ba) 
(May 2024, "Clear history data when running linker"). The existing e2e test did 
not catch it because it only covered a single project and flushed the table 
before running.
   
   **One edge case:**
   If an issue is removed from a project's board between two linker runs, its old stale link won't be cleaned up, because the issue-side subquery no longer matches it. This is consistent with the creation logic, which also scopes to the current board state. The old code had the same conceptual issue; it just masked it by deleting everything.
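   The edge case follows from the issue-side scope being recomputed from the current board state on every run. A minimal sketch, using a hypothetical `boardIssue` row type and made-up board and issue IDs:

```go
package main

import "fmt"

// boardIssue is a hypothetical stand-in for a board-to-issue mapping row.
type boardIssue struct {
	BoardID string
	IssueID string
}

// issueScope collects the issues currently on the project's boards; this is
// the set the issue-side subquery would match when the linker runs.
func issueScope(rows []boardIssue, boards map[string]bool) map[string]bool {
	scope := map[string]bool{}
	for _, r := range rows {
		if boards[r.BoardID] {
			scope[r.IssueID] = true
		}
	}
	return scope
}

func main() {
	boards := map[string]bool{"boardA": true} // the project's boards

	// Run 1: iss1 and iss2 are both on boardA, so a link to iss2 may be created.
	run1 := issueScope([]boardIssue{{"boardA", "iss1"}, {"boardA", "iss2"}}, boards)
	// Run 2: iss2 has moved to another board.
	run2 := issueScope([]boardIssue{{"boardA", "iss1"}, {"boardB", "iss2"}}, boards)

	// iss2 is outside the delete scope in run 2, so its old link is left behind.
	fmt.Println(run1["iss2"], run2["iss2"]) // true false
}
```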
   
   
   Opening this PR since this issue is a blocker for my setup.


-- 
This is an automated message from the Apache Git Service.