GitHub user dosubot[bot] added a comment to the discussion: Collect Pull 
requests Phase 2 triggers persistent 502s

<!-- Answer -->
Great write-up — your analysis is spot-on. Here's what I found in the codebase 
and existing issues that directly relates:

**The `updateRateRemaining` panic is a known bug tracked in 
[#8788](https://github.com/apache/incubator-devlake/issues/8788)** 
[[1]](https://github.com/apache/incubator-devlake/issues/8788). The panic at 
`graphql_async_client.go:129` crashes the process on *any* error from the 
rate-limit polling goroutine — including transient 502s and 401s from expired 
GitHub App tokens. There's actually a second panic at [line 
71](https://github.com/apache/incubator-devlake/blob/2c15ad880dce830c6512fa58039527cbf4745b8b/backend/helpers/pluginhelper/api/graphql_async_client.go#L71)
 during initial client setup as well.

**[PR #8791](https://github.com/apache/incubator-devlake/pull/8791)** 
[[2]](https://github.com/apache/incubator-devlake/pull/8791) provides the 
comprehensive fix for this — it replaces both panics with graceful error 
handling and also wires up the GraphQL client to use `RefreshRoundTripper` for 
automatic GitHub App token renewal (matching what the REST client already does 
via #8746). However, **PR #8791 has been open since March 22 and is not yet 
merged**. You may want to comment on it to signal urgency given your 42+ pod 
restarts.
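
For illustration, here is a minimal sketch of what "graceful error handling" in a rate-limit polling loop can look like. It is not the code from PR #8791: `fetchFunc`, `nextRemaining`, `pollRateLimit`, and `errBadGateway` are hypothetical names, and the sketch assumes that keeping the last known value and logging a warning is an acceptable response to a failed poll.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// errBadGateway simulates a transient upstream failure (hypothetical sentinel).
var errBadGateway = errors.New("502 Bad Gateway")

// fetchFunc stands in for the rate-limit query (hypothetical signature).
type fetchFunc func() (int, error)

// nextRemaining returns the freshly fetched value, or keeps prev and
// returns a wrapped error when the fetch fails. It never panics.
func nextRemaining(prev int, fetch fetchFunc) (int, error) {
	remaining, err := fetch()
	if err != nil {
		return prev, fmt.Errorf("rate-limit poll failed, keeping %d: %w", prev, err)
	}
	return remaining, nil
}

// pollRateLimit polls on every tick until ctx is done and returns the
// last known value. Failed polls are logged and the loop continues.
func pollRateLimit(ctx context.Context, interval time.Duration, initial int, fetch fetchFunc) int {
	remaining := initial
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return remaining
		case <-ticker.C:
			var err error
			remaining, err = nextRemaining(remaining, fetch)
			if err != nil {
				fmt.Println("warning:", err)
			}
		}
	}
}

func main() {
	calls := 0
	// Every second call fails, to mimic intermittent 502s.
	fetch := func() (int, error) {
		calls++
		if calls%2 == 0 {
			return 0, errBadGateway
		}
		return 5000 - calls, nil
	}
	ctx, cancel := context.WithTimeout(context.Background(), 50*time.Millisecond)
	defer cancel()
	fmt.Println("last known remaining:", pollRateLimit(ctx, 10*time.Millisecond, 5000, fetch))
}
```

The point of the design is that a transient failure degrades the accuracy of the rate-limit estimate for one interval instead of taking down the whole process.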

**On the `InputStep: 100` issue** — you're correct that it's [hardcoded at line 
243](https://github.com/apache/incubator-devlake/blob/2c15ad880dce830c6512fa58039527cbf4745b8b/backend/plugins/github_graphql/tasks/pr_collector.go#L243)
 in `pr_collector.go` with no configuration option. The same value of 100 is 
used for [issue 
collection](https://github.com/apache/incubator-devlake/blob/2c15ad880dce830c6512fa58039527cbf4745b8b/backend/plugins/github_graphql/tasks/issue_collector.go#L166)
 and [account 
collection](https://github.com/apache/incubator-devlake/blob/2c15ad880dce830c6512fa58039527cbf4745b8b/backend/plugins/github_graphql/tasks/account_collector.go#L102).
 Notably, the **job collector already uses `InputStep: 10`** 
[[3]](https://github.com/apache/incubator-devlake/blob/2c15ad880dce830c6512fa58039527cbf4745b8b/backend/plugins/github_graphql/tasks/job_collector.go#L191),
 likely for the exact same reason — large responses causing 502s. A similar 
problem was addressed for jobs in [PR #8616](https://github.com/apache/incubator-devlake/pull/8616), which 
introduced the `GITHUB_JOB_GRAPHQL_PAGE_SIZE` env var, but no equivalent was 
added for PRs or issues.

Your suggestion #1 (reducing `InputStep` to 10-20) is the right approach. 
Ideally this would follow the same pattern as `GITHUB_JOB_GRAPHQL_PAGE_SIZE` — 
an environment variable like `GITHUB_PR_GRAPHQL_BATCH_SIZE` — so it can be 
tuned per deployment without forking the code.
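
As a rough sketch of that pattern, assuming the variable is named `GITHUB_PR_GRAPHQL_BATCH_SIZE` (a hypothetical name that does not exist in the codebase today) and that the current value of 100 should remain both the default and the ceiling, the lookup could look like the following. The helper names are made up for illustration, and the real plugin would presumably read the value through DevLake's own configuration layer rather than `os.Getenv`.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// defaultPRBatchSize mirrors the InputStep value currently hardcoded
// in pr_collector.go.
const defaultPRBatchSize = 100

// parseBatchSize converts a raw setting into a batch size. It falls back
// to def when the value is empty, non-numeric, or outside 1..100
// (the upper bound is an assumption, not a documented limit).
func parseBatchSize(raw string, def int) int {
	n, err := strconv.Atoi(raw)
	if err != nil || n < 1 || n > 100 {
		return def
	}
	return n
}

// prBatchSize reads the hypothetical GITHUB_PR_GRAPHQL_BATCH_SIZE variable.
func prBatchSize() int {
	return parseBatchSize(os.Getenv("GITHUB_PR_GRAPHQL_BATCH_SIZE"), defaultPRBatchSize)
}

func main() {
	fmt.Println("PR batch size:", prBatchSize())
}
```

Falling back to the default on invalid input keeps a typo in the deployment config from silently disabling collection.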

**One additional workaround until these fixes land:** @zaiddialpad reported 
very [similar 502 issues at 
scale](https://github.com/apache/incubator-devlake/discussions/8821#discussioncomment-16398435)
 in discussion #8821. The recommended mitigations are:

- Set `API_RETRY=5` or higher (default is 3)
- Increase `API_TIMEOUT` so slow responses have more time to complete before a retry is triggered
- If using GitHub App auth, try switching to a **Personal Access Token** to 
avoid the token expiry → panic chain entirely
- Split large blueprints so individual runs stay under ~45 minutes

**On making Phase 2 optional:** There's no direct toggle, but setting 
`fullSync: true` in the Blueprint's SyncPolicy skips Phase 2 since it only runs 
during incremental collection 
[[4]](https://github.com/apache/incubator-devlake/blob/2c15ad880dce830c6512fa58039527cbf4745b8b/backend/helpers/pluginhelper/api/api_collector_stateful.go#L73-L84).
 The trade-off is you lose incremental behavior entirely, which is probably not 
what you want for a daily sync. A dedicated per-scope flag to skip Phase 2 
refetch would be the ideal long-term solution.



GitHub link: 
https://github.com/apache/incubator-devlake/discussions/8824#discussioncomment-16478979
