If the 1.9.2 release doesn't fix the issue that users were hitting, which, if I understand correctly, was a 503, do we need to do a patch release? Are users hitting 502s and 504s and need the retry to stop?
On Mon, Jul 7, 2025 at 3:11 PM Prashant Singh <prashant010...@gmail.com> wrote:

> Thanks for the background, Dennis! Thanks, Ryan. It seems like we are OK with treating all 5xx errors as CommitStateUnknown and hence not retrying. Considering what the spec says about 502 <https://github.com/apache/iceberg/blob/main/open-api/rest-catalog-open-api.yaml#L1070> and 504 <https://github.com/apache/iceberg/blob/main/open-api/rest-catalog-open-api.yaml#L1084> vs 503 <https://github.com/apache/iceberg/blob/main/open-api/rest-catalog-open-api.yaml#L4648>, I think this requires spec changes, since we would now require 503 to throw CommitStateUnknown.
>
> As 1.9.2 is a patch release, from my understanding it should not introduce behaviour changes, and treating 503 the same as 502 / 504 is a behaviour change, IMHO.
>
> Given that PR #13352 <https://github.com/apache/iceberg/pull/13352> got in, I started a vote thread for it.
>
> Best,
> Prashant Singh
>
> On Thu, Jun 26, 2025 at 3:12 PM Ryan Blue <rdb...@gmail.com> wrote:
>
>> Thanks for all the background, Dennis! I think I agree that we should treat all 5xx errors as CommitStateUnknown because we don't necessarily know what was passed to the service. We would need to update the spec, right?
>>
>> On Thu, Jun 26, 2025 at 3:03 PM Dennis Huo <huoi...@gmail.com> wrote:
>>
>>> I discussed offline with various folks who were interested in more detail about which standard infra components use 503 for things that should be interpreted as "CommitStateUnknown", so I dug into some common open-source components to track down common sources of such 503s.
>>>
>>> Two in particular give the best indication that 503s from mutation requests are not safely retriable:
>>>
>>> 1. Istio itself added a flag for its own retry policy default behaviors on 503:
>>> https://github.com/istio/istio/blob/8132a36e67380f5a171937f203f31ff27ebae86d/releasenotes/notes/50506.yaml#L9
>>>
>>>> **Fixed** default retry policy to exclude retries on 503 which is potentially unsafe for idempotent requets. It can be reverted by setting "EXCLUDE_UNSAFE_503_FROM_DEFAULT_RETRY" to false. Please note that this flag is only available for backward compatibility reasons and will be removed in future releases
>>>
>>> 2. Based on some logs that showed 503 coming from "upstream_reset_before_response_started", this is at least one of the places Envoy does it:
>>>
>>> https://github.com/envoyproxy/envoy/blob/87d4c6c372f535ee3e2077526638a696b67bde84/envoy/stream_info/stream_info.h#L216
>>>
>>> defines EarlyUpstreamReset:
>>>
>>>> // The upstream connection was reset before a response was started. This
>>>> // will generally be accompanied by details about why the reset occurred.
>>>> const std::string EarlyUpstreamReset = "upstream_reset_before_response_started";
>>>
>>> This is used here:
>>> https://github.com/envoyproxy/envoy/blob/87d4c6c372f535ee3e2077526638a696b67bde84/source/common/router/router.cc#L996
>>>
>>>> // not buffering any data for retry, shadow, and internal redirect, and there will be
>>>> // no more upstream request, abort the request and clean up.
>>>> cleanup();
>>>> callbacks_->sendLocalReply(
>>>>     Http::Code::ServiceUnavailable,
>>>>     "upstream is closed prematurely during decoding data from downstream", modify_headers_,
>>>>     absl::nullopt, StreamInfo::ResponseCodeDetails::get().EarlyUpstreamReset);
>>>> return Http::FilterDataStatus::StopIterationNoBuffer;
>>>
>>> On Wed, Jun 18, 2025 at 9:43 PM Dennis Huo <huoi...@gmail.com> wrote:
>>>
>>>> Sounds like the confusion is indeed stemming from anchoring on the behaviors of 502 and 504 as established prior art while not applying the same initial decision factors to 503.
>>>>
>>>>> If we don't trust 5xx error codes, then wouldn't it make the most sense to throw CommitStateUnknown for any 5xx response?
>>>>
>>>> I think this is the crux of the issue, and I agree with this statement entirely. It appears most prudent to treat any 5xx response as something that requires higher-level reconciliation of state before attempting any retries, so in general:
>>>>
>>>> 1. Any low-level client retries should not happen in response to 5xx errors
>>>> 2. These should propagate out as CommitStateUnknown consistently
>>>>
>>>> I think there's some variation in how folks are interpreting the meaning of being unable to "trust 5xx error codes", though -- certainly, if 503 is universally understood to mean a *guarantee* that the backend *did not* and *will not* process the contents of the request, then we're talking about "not trusting that the server returned the correct 5xx error code".
>>>>
>>>> On the other hand, if the 503 RFC is understood to *not make guarantees* about whether the backend did or did not start processing the contents, then we're talking about "don't trust your assumptions about server state after receiving a 503".
>>>>
>>>> Based on just a cursory search across different service domains, it looks like 503 does not generally provide guarantees about no processing. For example:
>>>>
>>>> 1. At least one managed service observes the Istio layer returning 503s during a maintenance event when the underlying k8s application did not yet produce a response at all
>>>>
>>>> 2. https://www.hostwinds.com/blog/503-error-common-causes-and-how-to-fix-it lists "3. Backend Fetch Failed", "4. Connection Timeout", and "7. Out of Server Resources" as 503 causes, all of which are semantically closer to a 502 than a 429 (where 429 is more similar to the listed "Server Overloaded" or "Server Under Maintenance" items)
>>>>
>>>> 3. https://support.google.com/business/thread/10553916/started-getting-this-error-today-googlejsonresponseexception-503-service-unavailable?hl=en shows some APIs explicitly stating that 503 can be a "partially processed" state:
>>>>
>>>> "code" : 503,
>>>> "errors" : [ {
>>>>   "domain" : "global",
>>>>   "message" : "The service encountered a transient error. Part of the request may have succeeeded. Refer to details for which fields failed.",
>>>>   "reason" : "backendError"
>>>> } ],
>>>>
>>>> My two cents would be to go ahead with consistent handling of 500, 502, 503, and 504 at least.
>>>>
>>>> 501 and 505 might be universally understood to be fast rejections, but I don't feel strongly about those.
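A minimal, illustrative sketch of the classification Dennis argues for above. The names here (CommitOutcome, CommitStatusClassifier) are made up for this sketch and are not the actual Iceberg REST client code:

    // Illustrative only: made-up names, not the actual Iceberg REST client code.
    // Classifies the final HTTP status of a commit request, treating every 5xx
    // as "commit state unknown" as proposed in this thread.
    enum CommitOutcome { SUCCESS, RETRYABLE, FAILED_CLEAN, STATE_UNKNOWN }

    final class CommitStatusClassifier {
      static CommitOutcome classify(int httpStatus) {
        if (httpStatus >= 200 && httpStatus < 300) {
          return CommitOutcome.SUCCESS;
        }
        if (httpStatus == 429) {
          // Throttled before the change was applied; safe to retry with backoff.
          return CommitOutcome.RETRYABLE;
        }
        if (httpStatus >= 500) {
          // 500 / 502 / 503 / 504 (and any other 5xx): proxies such as Envoy/Istio can
          // return these after the request reached the backend, so the commit may or
          // may not have been applied. Neither retry nor clean up.
          return CommitOutcome.STATE_UNKNOWN;
        }
        // Remaining 4xx (e.g. 409 conflict): a definitive rejection, so the files
        // written for the attempted snapshot can be cleaned up.
        return CommitOutcome.FAILED_CLEAN;
      }
    }

The point of routing every 5xx to STATE_UNKNOWN is that the caller neither retries nor deletes the files written for the attempted snapshot until it can reconcile the actual table state.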
>>> On Wed, Jun 18, 2025 at 6:03 PM Prashant Singh <prashant010...@gmail.com> wrote:
>>>
>>>>> EDIT: corrected some typos:
>>>>>
>>>>> Hey Ryan,
>>>>>
>>>>> My reasoning was the following: we retry on 502 / 504 as well here <https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/rest/ExponentialHttpRequestRetryStrategy.java#L86>, we have a spec definition here <https://github.com/apache/iceberg/blob/main/open-api/rest-catalog-open-api.yaml#L1070> stating that it can be unknown, and the RFC <https://datatracker.ietf.org/doc/html/rfc9110#name-502-bad-gateway> likewise says that something went wrong in the middle of processing the request. So if we retry on 502 and 504 we can conflict with ourselves: when we receive a 409 we don't know whether it is because we retried on a 502 and then got the 409, or because something else happened. So I thought it's better to throw CommitStateUnknown if we know the commit has been retried.
>>>>>
>>>>> The issue Iceberg users hit involved a 504; Slack thread: https://apache-iceberg.slack.com/archives/C025PH0G1D4/p1747992294134219
>>>>>
>>>>> We have similar handling for Glue, as we did in https://github.com/apache/iceberg/pull/7198, since there are internal HTTP retries inside the Glue SDK.
>>>>>
>>>>> That is why we did it this way. I do agree that wrapping 429 / 503 is unnecessary and that we can miss the opportunity to clean up, but it doesn't hurt; the alternative is to capture all the status codes we received and inspect every permutation and combination of them, which I wanted to avoid. I'll also admit there was a bug in our service, which we are chasing and fixing, where it throws 503 instead of 502; but even with that fixed, the issue will still exist if retries are not handled properly.
>>>>>
>>>>> That was my thought process, and I'm happy to incorporate your feedback.
>>>>>
>>>>> Maybe another alternative is to take 502 / 504 out of the retry code, retry only on 429 / 503, and clean up only when 409, 503, or 501 is the final status code; everything else is unknown.
>>>>>
>>>>> Best,
>>>>> Prashant Singh
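A small sketch of the guard Prashant describes above, reusing the made-up CommitOutcome enum from the earlier sketch; this is not the code of the merged PR. Once the HTTP layer has retried a commit request, a final 409 can no longer be trusted as a clean failure, so it is surfaced as unknown instead of triggering cleanup:

    // Illustrative only: builds on the CommitStatusClassifier sketch above.
    final class RetriedCommitClassifier {
      static CommitOutcome classifyFinal(int finalStatus, boolean requestWasRetried) {
        if (finalStatus == 409 && requestWasRetried) {
          // The conflict may be with our own earlier attempt that actually landed,
          // so deleting the "failed" snapshot's files here could corrupt the table.
          return CommitOutcome.STATE_UNKNOWN;
        }
        return CommitStatusClassifier.classify(finalStatus);
      }
    }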
>>>>> On Wed, Jun 18, 2025 at 4:21 PM Prashant Singh <prashant010...@gmail.com> wrote:
>>>>>
>>>>>> Hey Ryan,
>>>>>>
>>>>>> My reasoning was the following: we retry on 502 / 504 as well here <https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/rest/ExponentialHttpRequestRetryStrategy.java#L86>, we have a spec definition here <https://github.com/apache/iceberg/blob/main/open-api/rest-catalog-open-api.yaml#L1070> stating that it can be unknown, and the RFC <https://datatracker.ietf.org/doc/html/rfc9110#name-502-bad-gateway> likewise says that something went wrong in the middle of processing the request. So if we retry on 502 and 504 we can conflict with ourselves: when we receive a 429 we don't know whether it is because we retried on a 502 and then got the 429, or because something else happened. So I thought it's better to throw CommitStateUnknown if we know the commit has been retried.
>>>>>>
>>>>>> The issue Iceberg users hit involved a 504; Slack thread: https://apache-iceberg.slack.com/archives/C025PH0G1D4/p1747992294134219
>>>>>>
>>>>>> We have similar handling for Glue, as we did in https://github.com/apache/iceberg/pull/7198, since there are internal HTTP retries inside the Glue SDK.
>>>>>>
>>>>>> That is why we did it this way. I do agree that wrapping 429 / 503 is unnecessary and that we can miss the opportunity to clean up, but it doesn't hurt; the alternative is to capture all the status codes we received and inspect every permutation and combination of them, which I wanted to avoid. I'll also admit there was a bug in our service, which we are chasing and fixing, where it throws 503 instead of 502; but even with that fixed, the issue will still exist if retries are not handled properly.
>>>>>>
>>>>>> That was my thought process, and I'm happy to incorporate your feedback.
>>>>>>
>>>>>> Maybe another alternative is to take 502 / 504 out of the retry code, retry only on 429 / 503, and clean up only when 409, 503, or 501 is the final status code; everything else is unknown.
>>>>>>
>>>>>> Best,
>>>>>> Prashant Singh
>>>>>>
>>>>>> On Wed, Jun 18, 2025 at 12:22 PM Ryan Blue <rdb...@gmail.com> wrote:
>>>>>>
>>>>>>> Are we confident that the REST client fix is the right path?
>>>>>>>
>>>>>>> If I understand correctly, the issue is that there is a load balancer that is converting a 500 response to a 503 that the client will automatically retry, even though the 500 response is not retryable. Then the retry results in a 409 because the commit conflicts with itself.
>>>>>>>
>>>>>>> I'm fairly concerned about a fix like this that is operating on the assumption that we cannot trust the 5xx error code that was sent to the client. If we don't trust 5xx error codes, then wouldn't it make the most sense to throw CommitStateUnknown for any 5xx response?
>>>>>>>
>>>>>>> On Tue, Jun 17, 2025 at 1:00 PM Russell Spitzer <russell.spit...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Sorry I didn't get to reply here. I think the fix Ajantha is contributing is extremely important but probably out of scope for a patch release. Because it will take a bit of manual intervention to fix after jumping to the next version, I think we should save it for 1.10.0, which should also come out pretty soon.
>>>>>>>>
>>>>>>>> On Tue, Jun 17, 2025 at 1:16 PM Prashant Singh <prashant010...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Thank you for confirming over Slack, Ajantha. I also double-checked with Russell offline; this seems to be a behaviour change which can't go in a patch release, so maybe 1.10 then. I think we should be good for now. That being said, I will start working on getting an RC out with just this commit <https://github.com/singhpk234/iceberg/commit/14ba08e672df5061f8ac0978ea81f10535da2246> on top of 1.9.1, and will start a thread for that accordingly.
>>>>>>>>> Best,
>>>>>>>>> Prashant Singh
>>>>>>>>>
>>>>>>>>> On Tue, Jun 17, 2025 at 8:44 AM Prashant Singh <prashant010...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hey Ajantha,
>>>>>>>>>>
>>>>>>>>>> I was going to wait 2-3 days before cutting an RC anyway :), to see if anyone has objections or some more *critical* changes to get in. Thank you for the heads-up!
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Prashant Singh
>>>>>>>>>>
>>>>>>>>>> On Mon, Jun 16, 2025 at 11:02 AM Ajantha Bhat <ajanthab...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> I have a PR that needs to be included in 1.9.2 as well!
>>>>>>>>>>>
>>>>>>>>>>> I was about to start a release discussion for 1.9.2 tomorrow.
>>>>>>>>>>>
>>>>>>>>>>> It stems from an oversight during the partition stats refactoring (in 1.9.0): we missed that field ids are tracked in the spec, so the schema field ids are not as per the partition stats spec. The solution is to implicitly recompute stats when old stats with invalid field ids are detected. Including the fix in 1.9.2 reduces the impact of the issue, as new stats will have valid schema ids and corrupted stats will be handled. The feature is not widely integrated yet and engine integrations are in progress for 1.10.0, but we still want to reduce the impact of the issue. I have chatted with Peter and Russell about this.
>>>>>>>>>>>
>>>>>>>>>>> The PR will be shared tomorrow.
>>>>>>>>>>>
>>>>>>>>>>> Just giving a heads-up to wait a day or two before the release cut.
>>>>>>>>>>>
>>>>>>>>>>> - Ajantha
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Jun 16, 2025 at 11:25 PM Steven Wu <stevenz...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> +1 for a 1.9.2 release
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Jun 16, 2025 at 10:53 AM Prashant Singh <prashant010...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hey Kevin,
>>>>>>>>>>>>>
>>>>>>>>>>>>> This goes back well before 1.8; if you look at the issue my PR refers to, it was reported on Iceberg 1.7. It has been there since the beginning of the IRC client.
>>>>>>>>>>>>>
>>>>>>>>>>>>> We were having similar debates about whether we should patch all the releases, but I think that requires a wider discussion; 1.9.2 sounds like the immediate next step!
>>>>>>>>>>>>>
>>>>>>>>>>>>> Best,
>>>>>>>>>>>>> Prashant Singh
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Jun 16, 2025 at 10:42 AM Kevin Liu <kevinjq...@apache.org> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Prashant,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This sounds like a good reason to do a patch release. I'm +1.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Do you know if this is also affecting other minor versions? Do we also need to patch 1.8.x?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>> Kevin Liu
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mon, Jun 16, 2025 at 10:30 AM Prashant Singh <prashant010...@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hey all,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> A couple of users have recently reported a *critical table corruption issue* when using the Iceberg REST catalog client. This issue has existed from the beginning and can result in *committed snapshot data being incorrectly deleted*, effectively rendering the table unusable (i.e., corrupted). The root cause is *self-conflict during internal HTTP client retries*.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> More context can be found here:
>>>>>>>>>>>>>>> [1] https://github.com/apache/iceberg/pull/12818#issue-3000325526
>>>>>>>>>>>>>>> [2] https://apache-iceberg.slack.com/archives/C025PH0G1D4/p1747992294134219
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> We’ve seen this issue hit users in the wider community and also internally on our end. I’ve submitted a fix here (now merged): https://github.com/apache/iceberg/pull/12818
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Given the *severity* and the *broad version impact*, I’d like to propose cutting a *1.9.2 patch release* to include this fix.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I'm also happy to *volunteer as the release manager* for this release.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Looking forward to hearing the community's thoughts.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>>> Prashant Singh
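To make the retry-layer side of the discussion concrete, here is a hedged, illustrative sketch; the class name and policy are made up, and this is neither Iceberg's ExponentialHttpRequestRetryStrategy nor the merged fix. It shows an Apache HttpClient 5 retry strategy that retries only throttling (429) responses, so every 5xx propagates to the commit path, which can then classify it as CommitStateUnknown as discussed above:

    import java.io.IOException;

    import org.apache.hc.client5.http.HttpRequestRetryStrategy;
    import org.apache.hc.core5.http.HttpRequest;
    import org.apache.hc.core5.http.HttpResponse;
    import org.apache.hc.core5.http.protocol.HttpContext;
    import org.apache.hc.core5.util.TimeValue;

    // Illustrative only: retries are limited to 429 responses; every 5xx is passed
    // through so the commit path can decide whether the commit state is unknown.
    public class ThrottleOnlyRetryStrategy implements HttpRequestRetryStrategy {
      private final int maxRetries;

      public ThrottleOnlyRetryStrategy(int maxRetries) {
        this.maxRetries = maxRetries;
      }

      @Override
      public boolean retryRequest(HttpRequest request, IOException exception, int execCount, HttpContext context) {
        // No transport-level retries: a commit request that may have reached the
        // server is not safe to replay blindly.
        return false;
      }

      @Override
      public boolean retryRequest(HttpResponse response, int execCount, HttpContext context) {
        // 429 means the service throttled the request before applying it; anything
        // else, including 500 / 502 / 503 / 504, is not retried here.
        return execCount <= maxRetries && response.getCode() == 429;
      }

      @Override
      public TimeValue getRetryInterval(HttpResponse response, int execCount, HttpContext context) {
        // Simple exponential backoff: 1s, 2s, 4s, ...
        return TimeValue.ofSeconds(1L << Math.min(execCount - 1, 4));
      }
    }

Wiring it in would look something like HttpClients.custom().setRetryStrategy(new ThrottleOnlyRetryStrategy(5)).build(); whether retries belong at this layer at all, versus only in the commit logic, is exactly the trade-off debated in this thread.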