avamingli opened a new pull request, #1597:
URL: https://github.com/apache/cloudberry/pull/1597

   
   COPY FROM with SEGMENT REJECT LIMIT had two bugs when encountering invalid 
multi-byte encoding sequences:
   
   1. Encoding errors were double-counted: HandleCopyError() incremented 
rejectcount, and RemoveInvalidDataInBuf() then incremented it again for the 
same error. As a result, the reject limit was reached twice as fast as expected.
   
   2. SREH (Single Row Error Handling) was completely disabled when transcoding 
was required (file encoding != database encoding). Any encoding error during 
transcoding would raise an ERROR instead of skipping the bad row.
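
The double-counting in item 1 can be illustrated with a small toy model. This is a sketch only, not the Cloudberry code: the `ToySreh` struct, the `toy_*` functions, and the `>=` limit comparison are all hypothetical stand-ins chosen for illustration. It shows that when two functions both increment the counter for one bad row, the limit is hit after half as many bad rows.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the COPY reject-tracking state. */
typedef struct
{
    int rejectcount;   /* bad rows counted so far */
    int rejectlimit;   /* configured reject limit */
} ToySreh;

/* Models the first increment, done when the error is handled. */
static void toy_handle_error(ToySreh *s)
{
    s->rejectcount++;
}

/* Models buffer cleanup; with double_count set it repeats the increment,
 * which is the behavior the PR describes as a bug. */
static void toy_remove_invalid_data(ToySreh *s, bool double_count)
{
    if (double_count)
        s->rejectcount++;
}

/* Returns the number of bad rows processed when the limit is reached,
 * or -1 if the limit is never reached. The >= test is an assumption. */
static int toy_rows_until_limit(int rejectlimit, int bad_rows, bool double_count)
{
    ToySreh s = {0, rejectlimit};

    for (int i = 0; i < bad_rows; i++)
    {
        toy_handle_error(&s);
        toy_remove_invalid_data(&s, double_count);
        if (s.rejectcount >= s.rejectlimit)
            return i + 1;
    }
    return -1;
}
```

In this model, `toy_rows_until_limit(10, 10, true)` stops after 5 bad rows, while `toy_rows_until_limit(10, 10, false)` stops after 10.
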
   
   Fix by removing the duplicate rejectcount++ from RemoveInvalidDataInBuf(), 
removing the !need_transcoding guard that blocked SREH for transcoding, and 
adding proper buffer cleanup for the transcoding case (advance raw_buf past the 
bad line using FindEolInUnverifyRawBuf).
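
The buffer cleanup for the transcoding case amounts to moving the read position past the rest of the bad line. A minimal sketch of that idea follows. The helper `toy_skip_bad_line` is hypothetical; it is not FindEolInUnverifyRawBuf, whose signature and end-of-line rules are not shown in this description. It also assumes lines end in `\n`, `\r`, or `\r\n`.

```c
#include <stddef.h>

/*
 * Hypothetical helper: scan the raw (not yet verified) bytes from `start`
 * and return the index of the first byte after the next line terminator.
 * Returns `len` when no terminator is found in the buffer.
 */
static size_t toy_skip_bad_line(const char *buf, size_t len, size_t start)
{
    size_t i = start;

    /* Scan byte by byte; the invalid sequence itself is never decoded. */
    while (i < len && buf[i] != '\n' && buf[i] != '\r')
        i++;

    if (i >= len)
        return len;                 /* no end of line in this buffer */

    /* Treat "\r\n" as a single terminator. */
    if (buf[i] == '\r' && i + 1 < len && buf[i + 1] == '\n')
        return i + 2;

    return i + 1;
}
```

A caller would set its raw-buffer index to the returned value so that parsing resumes at the start of the next row. A real implementation would also need to handle a line that continues into the next buffer load, which this sketch does not attempt.
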
   
   Add regression tests covering both non-transcoding (invalid UTF-8) and 
transcoding (invalid EUC_CN to UTF-8) cases with various reject limits.
   
   Fixes https://github.com/apache/cloudberry/issues/1425
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

