featzhang commented on PR #27351:
URL: https://github.com/apache/flink/pull/27351#issuecomment-4293188795

   Hi @featzhang and @dylanhz,
   
   I've reviewed the remaining feedback from @dylanhz regarding the 
documentation-implementation mismatch. Here's my analysis and proposed fix:
   
   ## Issue Summary
   
   The current documentation states:
   > "If the input is NULL, or there is an issue with the decoding process(such 
as encountering an illegal escape pattern), or the encoding scheme is not 
supported, the function returns NULL."
   
   However, the actual implementation (UrlDecodeFunction.java lines ~157-164) 
has more nuanced behavior:
   
   ```java
   } catch (RuntimeException e) {
       // If we successfully decoded at least once, return the last successful 
result
       if (iteration > 0) {
           return StringData.fromString(previousValue);
       }
       return null;
   }
   ```
   
   ## Actual Behavior
   
   - ✅ Returns NULL if input is NULL
   - ✅ Returns NULL if the **first** decoding attempt fails (iteration == 0)
   - ⚠️  Returns the **last successfully decoded result** if at least one 
iteration succeeded (iteration > 0), even if a subsequent iteration fails
   
   ### Concrete Example
   
   Input: `"test%253A%25"`
   1. First decode: `"test%253A%25"` → `"test%3A%"` ✓ (success)
   2. Second decode: `"test%3A%"` → FAILS (illegal escape `"%"`)
   
   **Current docs say:** Returns NULL  
   **Actual implementation:** Returns `"test%3A%"` (last successful result)
   
   This behavior is actually **intentional and tested** (see 
`UrlDecodeFunctionTest.java` line 110-117).
   
   ## Proposed Documentation Fix
   
   Update the description in **4 files** to accurately reflect this behavior:
   
   ### 1. `docs/data/sql_functions.yml` (line ~475)
   
   **Current:**
   ```yaml
   If the input is NULL, or there is an issue with the decoding process(such as 
encountering an illegal escape pattern), or the encoding scheme is not 
supported, the function returns NULL.
   ```
   
   **Proposed:**
   ```yaml
   Returns NULL if the input is NULL or if the first decoding attempt fails. If 
at least one decoding iteration succeeds but a subsequent iteration encounters 
an error (such as an illegal escape pattern), the function returns the last 
successfully decoded result.
   ```
   
   ### 2. `docs/data/sql_functions_zh.yml` (line ~572)
   
   **Current:**
   ```yaml
   如果输入为 NULL,或者解码过程出现问题(例如遇到非法转义模式或者不支持该编码方案),该函数将返回 NULL。
   ```
   
   **Proposed:**
   ```yaml
   如果输入为 NULL 或第一次解码尝试失败,则返回 
NULL。如果至少有一次解码迭代成功,但后续迭代遇到错误(例如非法转义模式),则函数返回最后一次成功解码的结果。
   ```
   
   ### 3. `BaseExpressions.java` (lines ~1437-1439 and ~1447-1450)
   
   Update both `urlDecodeRecursive()` and `urlDecodeRecursive(InType maxDepth)` 
Javadoc.
   
   **Current:**
   ```java
   If the input is null, or there is an issue with the decoding process(such as 
encountering an illegal escape pattern), or the encoding scheme is not 
supported, will return null.
   ```
   
   **Proposed:**
   ```java
   Returns null if the input is null or if the first decoding attempt fails. If 
at least one decoding iteration succeeds but a subsequent iteration encounters 
an error (such as an illegal escape pattern), the function returns the last 
successfully decoded result.
   ```
   
   ### 4. `expression.py` (lines ~1445-1447)
   
   **Current:**
   ```python
   If the input is null, or there is an issue with the decoding process, or the 
encoding scheme is not supported, returns null.
   ```
   
   **Proposed:**
   ```python
   Returns null if the input is null or if the first decoding attempt fails. If 
at least one decoding iteration succeeds but a subsequent iteration encounters 
an error (such as an illegal escape pattern), the function returns the last 
successfully decoded result.
   ```
   
   ## Why This Matters
   
   This "best-effort" behavior is actually **valuable in production**:
   - Partially corrupted multi-encoded URLs can still yield useful data
   - The function provides the maximum information available rather than 
failing completely
   - The behavior is explicitly tested and intentional
   
   ## Summary
   
   This is purely a **documentation update** — no code changes needed. The 
implementation is correct and well-tested; we just need the docs to accurately 
describe what it does.
   
   Please let me know if you'd like me to prepare a patch file or if you'll 
make these changes directly. Thanks!
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to