featzhang commented on PR #27351:
URL: https://github.com/apache/flink/pull/27351#issuecomment-4293188795
Hi @featzhang and @dylanhz,
I've reviewed the remaining feedback from @dylanhz regarding the
documentation-implementation mismatch. Here's my analysis and proposed fix:
## Issue Summary
The current documentation states:
> "If the input is NULL, or there is an issue with the decoding process(such
as encountering an illegal escape pattern), or the encoding scheme is not
supported, the function returns NULL."
However, the actual implementation (UrlDecodeFunction.java lines ~157-164)
has more nuanced behavior:
```java
} catch (RuntimeException e) {
// If we successfully decoded at least once, return the last successful
result
if (iteration > 0) {
return StringData.fromString(previousValue);
}
return null;
}
```
## Actual Behavior
- ✅ Returns NULL if input is NULL
- ✅ Returns NULL if the **first** decoding attempt fails (iteration == 0)
- ⚠️ Returns the **last successfully decoded result** if at least one
iteration succeeded (iteration > 0), even if a subsequent iteration fails
### Concrete Example
Input: `"test%253A%25"`
1. First decode: `"test%253A%25"` → `"test%3A%"` ✓ (success)
2. Second decode: `"test%3A%"` → FAILS (illegal escape `"%"`)
**Current docs say:** Returns NULL
**Actual implementation:** Returns `"test%3A%"` (last successful result)
This behavior is actually **intentional and tested** (see
`UrlDecodeFunctionTest.java` line 110-117).
## Proposed Documentation Fix
Update the description in **4 files** to accurately reflect this behavior:
### 1. `docs/data/sql_functions.yml` (line ~475)
**Current:**
```yaml
If the input is NULL, or there is an issue with the decoding process(such as
encountering an illegal escape pattern), or the encoding scheme is not
supported, the function returns NULL.
```
**Proposed:**
```yaml
Returns NULL if the input is NULL or if the first decoding attempt fails. If
at least one decoding iteration succeeds but a subsequent iteration encounters
an error (such as an illegal escape pattern), the function returns the last
successfully decoded result.
```
### 2. `docs/data/sql_functions_zh.yml` (line ~572)
**Current:**
```yaml
如果输入为 NULL,或者解码过程出现问题(例如遇到非法转义模式或者不支持该编码方案),该函数将返回 NULL。
```
**Proposed:**
```yaml
如果输入为 NULL 或第一次解码尝试失败,则返回
NULL。如果至少有一次解码迭代成功,但后续迭代遇到错误(例如非法转义模式),则函数返回最后一次成功解码的结果。
```
### 3. `BaseExpressions.java` (lines ~1437-1439 and ~1447-1450)
Update both `urlDecodeRecursive()` and `urlDecodeRecursive(InType maxDepth)`
Javadoc.
**Current:**
```java
If the input is null, or there is an issue with the decoding process(such as
encountering an illegal escape pattern), or the encoding scheme is not
supported, will return null.
```
**Proposed:**
```java
Returns null if the input is null or if the first decoding attempt fails. If
at least one decoding iteration succeeds but a subsequent iteration encounters
an error (such as an illegal escape pattern), the function returns the last
successfully decoded result.
```
### 4. `expression.py` (lines ~1445-1447)
**Current:**
```python
If the input is null, or there is an issue with the decoding process, or the
encoding scheme is not supported, returns null.
```
**Proposed:**
```python
Returns null if the input is null or if the first decoding attempt fails. If
at least one decoding iteration succeeds but a subsequent iteration encounters
an error (such as an illegal escape pattern), the function returns the last
successfully decoded result.
```
## Why This Matters
This "best-effort" behavior is actually **valuable in production**:
- Partially corrupted multi-encoded URLs can still yield useful data
- The function provides the maximum information available rather than
failing completely
- The behavior is explicitly tested and intentional
## Summary
This is purely a **documentation update** — no code changes needed. The
implementation is correct and well-tested; we just need the docs to accurately
describe what it does.
Please let me know if you'd like me to prepare a patch file or if you'll
make these changes directly. Thanks!
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]