yzeng1618 opened a new pull request, #9270:
URL: https://github.com/apache/seatunnel/pull/9270

   
   ### Purpose of this pull request
   https://github.com/apache/seatunnel/issues/9268
   
   This PR fixes an issue in the JDBC connector where Oracle BLOB content is not 
preserved during synchronization. Currently, when BLOB fields (containing text, 
XML, HTML, etc.) are transferred from Oracle to a target system such as Doris, 
the data arrives as Base64-encoded strings and can no longer be used in its 
original format.
   
   The implementation enhances the OracleTypeConverter to properly handle BLOB 
data based on the `handle_blob_as_string` configuration parameter:
   1. When `handle_blob_as_string=true`, BLOB data is treated as STRING type, 
preserving the original content format
   2. When `handle_blob_as_string=false` (default), BLOB data is treated as 
BYTES type as before
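
The two cases above can be sketched roughly as follows. This is an illustration only, not the code in this PR: `BlobMappingSketch`, `TargetType`, `resolveBlobType` and `convertBlobValue` are hypothetical names, and decoding the bytes as UTF-8 is an assumption (the PR description does not state which charset is used).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; this is NOT SeaTunnel's OracleTypeConverter.
final class BlobMappingSketch {

    // Stand-in for the target column types discussed in the PR.
    enum TargetType { STRING, BYTES }

    // Chooses the target type for an Oracle BLOB column from the flag value.
    static TargetType resolveBlobType(boolean handleBlobAsString) {
        return handleBlobAsString ? TargetType.STRING : TargetType.BYTES;
    }

    // Converts a BLOB payload to the value written downstream.
    // Assumption: the BLOB holds UTF-8 encoded text when the flag is true.
    static Object convertBlobValue(byte[] blobBytes, boolean handleBlobAsString) {
        if (blobBytes == null) {
            return null;
        }
        return handleBlobAsString
                ? new String(blobBytes, StandardCharsets.UTF_8)
                : blobBytes;
    }

    public static void main(String[] args) {
        byte[] payload = "<a>1</a>".getBytes(StandardCharsets.UTF_8);
        System.out.println(resolveBlobType(true));            // STRING
        System.out.println(convertBlobValue(payload, true));  // <a>1</a>
        System.out.println(resolveBlobType(false));           // BYTES
    }
}
```

With the flag off the raw bytes pass through unchanged, so existing jobs would keep their current behaviour under this sketch.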
   
   ### Does this PR introduce _any_ user-facing change?
   Yes, this PR introduces a user-facing change in how Oracle BLOB data is 
handled during synchronization. Users can now configure the JDBC connector to 
preserve the original format of BLOB data by setting 
`handle_blob_as_string=true` in their connector configuration.
   
   Before this change, Oracle BLOB data containing structured content like XML 
or HTML would be converted to Base64-encoded strings in the target system, 
making it difficult to use. With this change, users can choose to preserve the 
original content format.
   
   ### How was this patch tested?
   
   
   1. Manually tested with Oracle tables whose BLOB fields hold various 
content types (text, XML, HTML)
   2. Verified that the original content is preserved when 
`handle_blob_as_string=true`
   
   Local verification example: with `handle_blob_as_string=false` (or with the 
parameter not set), the Oracle source table (TEST_BLOB_TABLE) contains BLOB 
data with different content types:
   
   Row 1: Simple text "Hello, World!"
   
   Row 2: XML content
   
   Row 3: HTML content
   
   However, after synchronization to the Doris target table, all BLOB data is 
converted to Base64-encoded strings:
   
   Row 1: "SGVsbG8sIFdvcmxkIQ=="
   
   Row 2: "PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0iVVRGLTgiPz4..."
   
   Row 3: "PCFET0NUWVBFIGh0bWw+PGh0bWwgc3R5bGU9Im92..."
   
   When the parameter is set to `true`, the BLOB fields in the Doris target 
table contain the same content as the source data.
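
The Row 1 value can be reproduced with the JDK alone. The snippet below is a small stand-alone sketch (the class `BlobBase64Demo` and its methods are hypothetical and not part of the connector); it assumes the BLOB bytes are UTF-8 encoded text.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical demo class; it only illustrates the two representations of the same bytes.
final class BlobBase64Demo {

    // What the target receives when the bytes are rendered as a Base64 string.
    static String asBase64(byte[] blobBytes) {
        return Base64.getEncoder().encodeToString(blobBytes);
    }

    // What the target receives when the bytes are decoded as text (UTF-8 assumed).
    static String asText(byte[] blobBytes) {
        return new String(blobBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] row1 = "Hello, World!".getBytes(StandardCharsets.UTF_8);
        System.out.println(asBase64(row1)); // prints SGVsbG8sIFdvcmxkIQ==
        System.out.println(asText(row1));   // prints Hello, World!
    }
}
```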
   
   
   ### Check list
   
   * [ ] If your PR adds any new Jar binary package, please add a License 
Notice according to the
     [New License 
Guide](https://github.com/apache/seatunnel/blob/dev/docs/en/contribution/new-license.md)
   * [x] If necessary, please update the documentation to describe the new 
feature. https://github.com/apache/seatunnel/tree/dev/docs
   * [ ] If you are contributing the connector code, please check that the 
following files are updated:
     1. Update 
[plugin-mapping.properties](https://github.com/apache/seatunnel/blob/dev/plugin-mapping.properties)
 and add new connector information in it
     2. Update the pom file of 
[seatunnel-dist](https://github.com/apache/seatunnel/blob/dev/seatunnel-dist/pom.xml)
     4. Add ci label in 
[label-scope-conf](https://github.com/apache/seatunnel/blob/dev/.github/workflows/labeler/label-scope-conf.yml)
     5. Add e2e testcase in 
[seatunnel-e2e](https://github.com/apache/seatunnel/tree/dev/seatunnel-e2e/seatunnel-connector-v2-e2e/)
     6. Update connector 
[plugin_config](https://github.com/apache/seatunnel/blob/dev/config/plugin_config)
   * [x] Update the 
[`release-note`](https://github.com/apache/seatunnel/blob/dev/release-note.md).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.