dosubot[bot] commented on issue #8529:
URL:
https://github.com/apache/incubator-devlake/issues/8529#issuecomment-3166612215
<!-- Answer -->
The _raw_gitlab_api_users table grows large and accumulates duplicates
because DevLake collects user data for every repository, especially on GitLab
Server. There’s no built-in deduplication, so you’ll need to manage cleanup
manually or limit how often users are collected.
**To deduplicate or clean up the table:**
1. **Back up your database first.**
2. For a one-time manual cleanup, you can use SQL to remove duplicates.
Here’s an example for PostgreSQL (adjust table name if needed):
```sql
DELETE FROM _raw_gitlab_api_users a
USING _raw_gitlab_api_users b
WHERE
a.ctid < b.ctid
AND a.id = b.id;
```
For MySQL, you might use:
```sql
DELETE t1 FROM _raw_gitlab_api_users t1
INNER JOIN _raw_gitlab_api_users t2
WHERE
t1.id = t2.id
AND t1.primary_key_column > t2.primary_key_column;
```
Replace `primary_key_column` with your actual primary key.
3. To automate, schedule this SQL as a cron job or use your DB’s scheduler.
**To limit how often users are collected:**
The most effective workaround is to refactor your pipeline so user
collection happens only once per pipeline run, not per repository. This
involves:
- Adding a first stage that collects, extracts, and converts users.
- Removing duplicate user collection tasks from later stages.
- This approach has been tested and shown to reduce table size and
duplication significantly ([details and
example](https://github.com/apache/incubator-devlake/issues/8283), [more
context](https://github.com/apache/incubator-devlake/issues/8523)).
There’s no config or environment variable for this in 1.0.2-beta5, but a
proposed `GITLAB_SERVER_COLLECT_ALL_USERS` variable may help in future versions.
Always test cleanup on a staging environment first and check for unique
constraints before running destructive queries.
<!-- Dosu Comment Footer -->
*To reply, just mention [@dosu](https://go.dosu.dev/dosubot).*
---
<sup>How did I do?
[Good](https://app.dosu.dev/response-feedback/94786185-8a80-4c4b-b583-2ab4b66f7fd9?feedback_type=great_response)
|
[Irrelevant](https://app.dosu.dev/response-feedback/94786185-8a80-4c4b-b583-2ab4b66f7fd9?feedback_type=irrelevant_answer)
|
[Incorrect](https://app.dosu.dev/response-feedback/94786185-8a80-4c4b-b583-2ab4b66f7fd9?feedback_type=incorrect_sources)
|
[Verbose](https://app.dosu.dev/response-feedback/94786185-8a80-4c4b-b583-2ab4b66f7fd9?feedback_type=too_verbose)
|
[Hallucination](https://app.dosu.dev/response-feedback/94786185-8a80-4c4b-b583-2ab4b66f7fd9?feedback_type=hallucination)
| [Report
🐛](https://app.dosu.dev/response-feedback/94786185-8a80-4c4b-b583-2ab4b66f7fd9?feedback_type=bug_report)
|
[Other](https://app.dosu.dev/response-feedback/94786185-8a80-4c4b-b583-2ab4b66f7fd9?feedback_type=other)</sup> [](https://app.dosu.dev/b4e8e847-d479-4541-83a8-d88d83fea5c9/ask?utm_source=githu
b) [](https://go.dosu.dev/discord-bot) [](https://twitter.com/intent/tweet?text=%40dosu_ai%20helped%20me%20solve%20this%20issue!&url=https%3A//github.com/apache/incubator-devlake/issues/8529)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]