GitHub user dosubot[bot] added a comment to the discussion: Collect Pull requests Phase 2 triggers persistent 502s
Great write-up; your analysis is spot-on. Here's what I found in the codebase and existing issues that directly relates:

**The `updateRateRemaining` panic is a known bug tracked in [#8788](https://github.com/apache/incubator-devlake/issues/8788)** [[1]](https://github.com/apache/incubator-devlake/issues/8788). The panic at `graphql_async_client.go:129` crashes the process on *any* error from the rate-limit polling goroutine, including transient 502s and 401s from expired GitHub App tokens. There is also a second panic at [line 71](https://github.com/apache/incubator-devlake/blob/2c15ad880dce830c6512fa58039527cbf4745b8b/backend/helpers/pluginhelper/api/graphql_async_client.go#L71) during initial client setup.

**[PR #8791](https://github.com/apache/incubator-devlake/pull/8791)** [[2]](https://github.com/apache/incubator-devlake/pull/8791) provides the comprehensive fix: it replaces both panics with graceful error handling and wires up the GraphQL client to use `RefreshRoundTripper` for automatic GitHub App token renewal (matching what the REST client already does via #8746). However, **PR #8791 has been open since March 22 and is not yet merged**. You may want to comment on it to signal urgency given your 42+ pod restarts.

**On the `InputStep: 100` issue:** you're correct that it's [hardcoded at line 243](https://github.com/apache/incubator-devlake/blob/2c15ad880dce830c6512fa58039527cbf4745b8b/backend/plugins/github_graphql/tasks/pr_collector.go#L243) in `pr_collector.go` with no configuration option. The same value of 100 is used for [issue collection](https://github.com/apache/incubator-devlake/blob/2c15ad880dce830c6512fa58039527cbf4745b8b/backend/plugins/github_graphql/tasks/issue_collector.go#L166) and [account collection](https://github.com/apache/incubator-devlake/blob/2c15ad880dce830c6512fa58039527cbf4745b8b/backend/plugins/github_graphql/tasks/account_collector.go#L102).
Notably, the **job collector already uses `InputStep: 10`** [[3]](https://github.com/apache/incubator-devlake/blob/2c15ad880dce830c6512fa58039527cbf4745b8b/backend/plugins/github_graphql/tasks/job_collector.go#L191), likely for the exact same reason: large responses causing 502s. A similar problem was addressed for jobs in [PR #8616](https://github.com/apache/incubator-devlake/pull/8616), which introduced the `GITHUB_JOB_GRAPHQL_PAGE_SIZE` env var, but no equivalent was added for PRs or issues.

Your suggestion #1 (reducing `InputStep` to 10-20) is the right approach. Ideally this would follow the same pattern as `GITHUB_JOB_GRAPHQL_PAGE_SIZE`, with an environment variable like `GITHUB_PR_GRAPHQL_BATCH_SIZE`, so it can be tuned per deployment without forking the code.

**One additional workaround until these fixes land:** @zaiddialpad reported very [similar 502 issues at scale](https://github.com/apache/incubator-devlake/discussions/8821#discussioncomment-16398435) in discussion #8821. The recommended mitigations are:

- Set `API_RETRY=5` or higher (default is 3)
- Increase `API_TIMEOUT` for more breathing room between retries
- If using GitHub App auth, try switching to a **Personal Access Token** to avoid the token expiry → panic chain entirely
- Split large blueprints so individual runs stay under ~45 minutes

**On making Phase 2 optional:** There's no direct toggle, but setting `fullSync: true` in the Blueprint's SyncPolicy skips Phase 2, since Phase 2 only runs during incremental collection [[4]](https://github.com/apache/incubator-devlake/blob/2c15ad880dce830c6512fa58039527cbf4745b8b/backend/helpers/pluginhelper/api/api_collector_stateful.go#L73-L84). The trade-off is that you lose incremental behavior entirely, which is probably not what you want for a daily sync. A dedicated per-scope flag to skip the Phase 2 refetch would be the ideal long-term solution.
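
For the batch-size suggestion above, here is a minimal sketch of how an env-var override with a safe fallback might look. It is not DevLake code: `prBatchSize` and `defaultPRBatchSize` are hypothetical names, `GITHUB_PR_GRAPHQL_BATCH_SIZE` is only the variable name proposed in this comment, and a real change would presumably read the value through DevLake's own configuration layer rather than `os.LookupEnv`. Rejecting values above the current default is likewise an assumption, made to keep the override from ever enlarging responses.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// defaultPRBatchSize mirrors the InputStep value of 100 that is
// currently hardcoded in pr_collector.go.
const defaultPRBatchSize = 100

// prBatchSize is a hypothetical helper. It returns the batch size to use
// for PR collection, taken from GITHUB_PR_GRAPHQL_BATCH_SIZE (a proposed,
// not yet existing, variable). The lookup function is injected so the
// logic can be exercised without touching the real environment.
func prBatchSize(lookup func(string) (string, bool)) int {
	raw, ok := lookup("GITHUB_PR_GRAPHQL_BATCH_SIZE")
	if !ok {
		return defaultPRBatchSize
	}
	n, err := strconv.Atoi(strings.TrimSpace(raw))
	// Assumption: anything unparsable, non-positive, or larger than the
	// current default is ignored rather than treated as a fatal error.
	if err != nil || n < 1 || n > defaultPRBatchSize {
		return defaultPRBatchSize
	}
	return n
}

func main() {
	// With the variable unset this prints the default; with
	// GITHUB_PR_GRAPHQL_BATCH_SIZE=20 in the environment it prints 20.
	fmt.Println(prBatchSize(os.LookupEnv))
}
```

Falling back to the default on bad input, instead of returning an error, keeps a mistyped value from stopping a sync; whether that is preferable to failing fast is a design choice for the maintainers.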
GitHub link: https://github.com/apache/incubator-devlake/discussions/8824#discussioncomment-16478979
