GitHub user dosubot[bot] edited a comment on the discussion: Transient 502s & 
Incremental Cursor Gaps: Scaling github_graphql for 300+ Repos

Hi @zaiddialpad! I'm [Dosu](https://go.dosu.dev/dosubot) and I’m helping the 
incubator-devlake team.

Great questions — I dug into the codebase to give you concrete answers on all 
three.

**1. Retry Logic: Yes, it exists, but with caveats.**

The `GraphqlAsyncClient` *does* retry HTTP-level errors, including 502 and 503 
[[1]](https://github.com/apache/incubator-devlake/blob/2c15ad880dce830c6512fa58039527cbf4745b8b/backend/helpers/pluginhelper/api/graphql_async_client.go#L158-L189).
It uses a **fixed backoff** (not exponential) controlled by two env vars:

| Variable | Default | What it does |
|---|---|---|
| `API_RETRY` | `3` | Max retry attempts per GraphQL request |
| `API_TIMEOUT` | `30s` | Wait duration *between* retries (for the GraphQL client specifically) |

So if you're seeing 502s kill tasks after retries, you may want to bump 
`API_RETRY` higher (e.g., `5` or `7`) given your scale. Note that GraphQL-level 
`DataError` responses (schema/validation errors) are *not* retried — only 
transport-level failures are.
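
To make the retry behaviour concrete, here is a minimal Go sketch of a fixed-backoff loop that retries transport-level failures but returns data errors immediately. It is not DevLake's implementation: `retryFixed`, `errData`, `errTransient`, and `noSleep` are hypothetical names, and the sketch assumes `API_RETRY` counts retries *after* the first attempt, which is worth checking against the linked source.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errData stands in for a GraphQL-level DataError (hypothetical name).
var errData = errors.New("graphql data error")

// errTransient stands in for a transport-level failure such as a 502.
var errTransient = errors.New("502 Bad Gateway")

// noSleep lets the demo skip real waiting.
func noSleep(time.Duration) {}

// retryFixed runs fn once, then retries up to maxRetries more times,
// pausing for the same wait before each retry (fixed backoff).
// Data errors are returned at once. It reports how many attempts were made.
func retryFixed(maxRetries int, wait time.Duration, sleep func(time.Duration), fn func() error) (int, error) {
	attempt := 0
	for {
		attempt++
		err := fn()
		if err == nil || errors.Is(err, errData) || attempt > maxRetries {
			return attempt, err
		}
		sleep(wait) // constant delay: no exponential growth
	}
}

func main() {
	calls := 0
	flaky := func() error { // two transient failures, then success
		calls++
		if calls < 3 {
			return errTransient
		}
		return nil
	}
	attempts, err := retryFixed(3, 30*time.Second, noSleep, flaky)
	fmt.Println("attempts:", attempts, "err:", err)
}
```

With a fixed 30s wait and 3 retries, a request in this sketch gives up after roughly 90 seconds of waiting, so a GitHub incident lasting longer than that would still fail the task. That is the reasoning behind raising the retry count at this scale.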

A closely related issue was reported in 
[#8614](https://github.com/apache/incubator-devlake/issues/8614) 
[[2]](https://github.com/apache/incubator-devlake/issues/8614) where 502s and 
stream cancellations were caused by GitHub's server-side response size limits 
on large repos. The fix (merged in v1.0.3-beta8) introduced 
`GITHUB_JOB_GRAPHQL_PAGE_SIZE` to reduce page sizes. If you haven't tuned this, 
it may help reduce the frequency of 502s.

**⚠️ Heads-up on two active bugs that compound this:**

- The GraphQL client's background rate-limit goroutine calls `panic(err)` on 
*any* error (including a transient 502 during rate-limit polling), which 
**crashes the entire DevLake process** 
[[3]](https://github.com/apache/incubator-devlake/blob/2c15ad880dce830c6512fa58039527cbf4745b8b/backend/helpers/pluginhelper/api/graphql_async_client.go#L127-L131).
This is tracked in 
[#8788](https://github.com/apache/incubator-devlake/issues/8788) 
[[4]](https://github.com/apache/incubator-devlake/issues/8788).
- If you're authenticating via a GitHub App, the GraphQL client uses a 
`StaticTokenSource` that **freezes the token at task start**. After ~1 hour 
when it expires, all GraphQL requests fail with 401 and trigger the panic 
above. The REST client was already fixed for this, but the GraphQL client was 
not [[4]](https://github.com/apache/incubator-devlake/issues/8788).

Both are addressed in the open 
[PR #8791](https://github.com/apache/incubator-devlake/pull/8791) 
[[5]](https://github.com/apache/incubator-devlake/pull/8791), which is not yet 
merged. Given your GitHub App auth and 340+ repos (so syncs likely run longer 
than 1 hour), you are almost certainly hitting both.
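
For illustration, the difference between a frozen token and a refreshing one can be sketched in Go as follows. This is a sketch under stated assumptions, not the change made in the PR and not DevLake's API: `refreshingSource`, `newRefreshingSource`, `fakeClock`, and the `mint` callback are all hypothetical, and the one-hour lifetime is taken from the "~1 hour" figure above. The sketch also returns errors to the caller instead of panicking.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// fakeClock is a controllable clock so the demo does not wait an hour.
type fakeClock struct{ t time.Time }

func (c *fakeClock) Now() time.Time { return c.t }

func (c *fakeClock) AdvanceMinutes(m int) {
	c.t = c.t.Add(time.Duration(m) * time.Minute)
}

// refreshingSource re-mints a token shortly before it expires instead of
// freezing one value at task start. All names here are hypothetical.
type refreshingSource struct {
	mu      sync.Mutex
	value   string
	expires time.Time
	ttl     time.Duration          // assumed token lifetime (~1 hour)
	margin  time.Duration          // refresh this long before expiry
	now     func() time.Time       // injected clock
	mint    func() (string, error) // hypothetical: requests a fresh token
}

func newRefreshingSource(now func() time.Time, mint func() (string, error)) *refreshingSource {
	return &refreshingSource{ttl: time.Hour, margin: 5 * time.Minute, now: now, mint: mint}
}

// Token returns a usable token, minting a new one when the current one is
// missing or about to expire. Failures are returned, never panicked on.
func (s *refreshingSource) Token() (string, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.value == "" || !s.now().Add(s.margin).Before(s.expires) {
		v, err := s.mint()
		if err != nil {
			return "", err
		}
		s.value, s.expires = v, s.now().Add(s.ttl)
	}
	return s.value, nil
}

func main() {
	clock := &fakeClock{}
	n := 0
	src := newRefreshingSource(clock.Now, func() (string, error) {
		n++
		return fmt.Sprintf("token-%d", n), nil
	})
	first, _ := src.Token()
	clock.AdvanceMinutes(56) // now inside the 5-minute refresh margin
	second, _ := src.Token()
	fmt.Println(first, second)
}
```

A static source would hand back the first token for the whole task, which is why long syncs start failing with 401 once that token expires.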

**2. Cursor Commitment: Strict all-or-nothing — your concern about gaps is 
partially valid.**

The `CollectorStateManager.Close()` that persists `LatestSuccessStart` is 
**only called when all nested collectors succeed** 
[[6]](https://github.com/apache/incubator-devlake/blob/2c15ad880dce830c6512fa58039527cbf4745b8b/backend/helpers/pluginhelper/api/api_collector_stateful.go#L94-L104) 
[[7]](https://github.com/apache/incubator-devlake/blob/2c15ad880dce830c6512fa58039527cbf4745b8b/backend/helpers/pluginhelper/api/collector_state_manager.go#L108-L119).
If a subtask fails at any point, `Close()` is never reached, so the cursor 
stays at its previous position. On the next run, collection restarts from the 
same `since` timestamp.
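
The all-or-nothing commit described above can be sketched in a few lines of Go. This is a simplified model, not the actual `CollectorStateManager`: `state`, `runCollectors`, and `errCollect` are hypothetical, and the cursor is stored as an `int64` Unix timestamp purely to keep the example short.

```go
package main

import (
	"errors"
	"fmt"
)

// errCollect stands in for any subtask failure (hypothetical).
var errCollect = errors.New("collector failed")

// state holds the incremental cursor. The field name mirrors
// LatestSuccessStart from the text; the int64 type is a simplification.
type state struct {
	LatestSuccessStart int64
}

// runCollectors runs every collector with the current cursor as "since".
// The cursor advances to runStart only if all collectors succeed.
func runCollectors(s *state, runStart int64, collectors []func(since int64) error) error {
	for _, collect := range collectors {
		if err := collect(s.LatestSuccessStart); err != nil {
			return err // cursor untouched: the next run reuses the same since
		}
	}
	s.LatestSuccessStart = runStart // plays the role of Close() in this sketch
	return nil
}

func main() {
	s := &state{LatestSuccessStart: 100}
	ok := func(int64) error { return nil }
	fail := func(int64) error { return errCollect }

	err := runCollectors(s, 200, []func(int64) error{ok, fail})
	fmt.Println("failed run:", err, "cursor:", s.LatestSuccessStart)

	err = runCollectors(s, 300, []func(int64) error{ok, ok})
	fmt.Println("clean run:", err, "cursor:", s.LatestSuccessStart)
}
```

Because the cursor only moves after a fully successful run, a failure can cause repeated work but cannot move `since` past data that was never collected.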

Here's the nuance on data gaps:

- **Raw data** collected before the failure **persists** in the 
`_raw_github_graphql_*` tables (no rollback) 
[[8]](https://github.com/apache/incubator-devlake/blob/2c15ad880dce830c6512fa58039527cbf4745b8b/backend/helpers/pluginhelper/api/graphql_collector.go#L296-L308).
- On the next incremental run, the raw table is **not** cleared, so the re-run 
will produce **duplicate raw rows** for data already collected in the failed 
run.
- The GraphQL pagination cursor (`EndCursor`) is **entirely in-memory** — if a 
subtask fails on page 47 of 100, the next run restarts from page 1.
- **No data should be silently skipped**, because the `since` timestamp doesn't 
advance on failure. The risk is duplicates, not gaps. The extractor/converter 
pipeline should handle deduplication at the domain layer via primary keys, 
though it's worth verifying this for your specific data models.
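
As a rough illustration of why duplicate raw rows need not become duplicate domain rows, here is a minimal Go sketch of keyed upserts. The `pullRequest` type, its `ID` field, and `upsertAll` are hypothetical stand-ins; the real extractors and converters write to database tables, and whether each of your models is keyed this way is exactly what is worth verifying.

```go
package main

import "fmt"

// pullRequest is a hypothetical domain record keyed by ID.
type pullRequest struct {
	ID    string
	Title string
}

// upsertAll folds raw rows into a map keyed by primary key, so a row that
// was collected twice overwrites itself instead of appearing twice.
func upsertAll(domain map[string]pullRequest, raw []pullRequest) {
	for _, pr := range raw {
		domain[pr.ID] = pr
	}
}

func main() {
	// Rows persisted by a run that failed partway through.
	failedRun := []pullRequest{{"PR_1", "Fix retry"}, {"PR_2", "Add docs"}}
	// The re-run starts from the same since and collects them again.
	rerun := []pullRequest{{"PR_1", "Fix retry"}, {"PR_2", "Add docs"}, {"PR_3", "Bump deps"}}

	raw := append(append([]pullRequest{}, failedRun...), rerun...)
	domain := map[string]pullRequest{}
	upsertAll(domain, raw)

	fmt.Println("raw rows:", len(raw), "domain rows:", len(domain))
}
```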

**3. Recommended Workarounds at Scale**

- **Increase retries**: Set `API_RETRY=5` or higher and consider increasing 
`API_TIMEOUT` (the inter-retry wait) to give GitHub more breathing room.
- **Reduce page sizes**: Set `GITHUB_JOB_GRAPHQL_PAGE_SIZE` to a smaller value 
(e.g., 20 or 30) to reduce the chance of GitHub returning 502 on large 
responses.
- **Watch for PR #8791**: This is critical for your setup — it fixes the panic 
crash and the GitHub App token expiry issue for the GraphQL client. You may 
want to comment on it to signal urgency.
- **Consider splitting blueprints**: Breaking your 346 repos into multiple 
blueprints (e.g., groups of 50-80) can reduce the blast radius of a single 
failure and keep individual sync times under the 1-hour token expiry window.
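
If you script the blueprint split, the grouping itself is simple. The Go sketch below only partitions a repo list into fixed-size groups; `chunk` and the placeholder repo names are hypothetical, and creating the blueprints themselves is left out.

```go
package main

import "fmt"

// chunk splits repos into consecutive groups of at most size entries.
// It returns nil when size is not positive.
func chunk(repos []string, size int) [][]string {
	if size <= 0 {
		return nil
	}
	var groups [][]string
	for start := 0; start < len(repos); start += size {
		end := start + size
		if end > len(repos) {
			end = len(repos)
		}
		groups = append(groups, repos[start:end])
	}
	return groups
}

func main() {
	// Placeholder names standing in for the real repo list.
	repos := make([]string, 346)
	for i := range repos {
		repos[i] = fmt.Sprintf("org/repo-%03d", i+1)
	}
	for i, g := range chunk(repos, 60) {
		fmt.Printf("blueprint %d: %d repos\n", i+1, len(g))
	}
}
```

With 346 repos and a group size of 60, this yields six groups, the last holding 46 repos.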

Hope this helps make the pipelines more resilient. The retry mechanism is there 
but the defaults are conservative for a 340+ repo deployment, and the two bugs 
in #8788 are likely compounding your failures significantly.



GitHub link: 
https://github.com/apache/incubator-devlake/discussions/8821#discussioncomment-16398435
