hubgeter opened a new pull request, #44848:
URL: https://github.com/apache/doris/pull/44848

   Backport of #43469.
   
   Problem Summary:
   Support reading Hive tables stored in JSON format, for example:
   ```mysql
   mysql> show create table basic_json_table;
   CREATE TABLE `basic_json_table`(
     `id` int,
     `name` string,
     `age` tinyint,
     `salary` float,
     `is_active` boolean,
     `join_date` date,
     `last_login` timestamp,
     `height` double,
     `profile` binary,
     `rating` decimal(10,2))
   ROW FORMAT SERDE
     'org.apache.hive.hcatalog.data.JsonSerDe'
   ```
   
   Behavior changed:
   To implement this feature, this PR modifies `new_json_reader`. Previously, 
`new_json_reader` could only insert data into string columns (`ColumnString`). 
To support inserting data into columns of other types, `DataTypeSerDe` is now 
used to insert the data. To stay compatible with previous versions, the changes 
in this PR take effect only when reading Hive JSON tables.
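
The difference between the two insertion paths can be sketched as follows. This is an illustration in Python, not Doris code: `DESERIALIZERS`, `insert_as_string`, and `insert_typed` are hypothetical names, and the per-type table only stands in for the role `DataTypeSerDe` plays.

```python
import json
from datetime import date
from decimal import Decimal

# Hypothetical per-type deserializers (assumption: not the Doris API).
DESERIALIZERS = {
    "int": int,
    "string": str,
    "double": float,
    "date": date.fromisoformat,
    "decimal": lambda v: Decimal(str(v)),
}

def insert_as_string(record, schema):
    # Old path in this sketch: every value ends up as text.
    return {name: str(record[name]) for name in schema}

def insert_typed(record, schema):
    # New path in this sketch: dispatch on the declared column type.
    return {
        name: DESERIALIZERS[type_name](record[name])
        for name, type_name in schema.items()
    }

schema = {"id": "int", "name": "string", "join_date": "date", "rating": "decimal"}
record = json.loads(
    '{"id": 7, "name": "ann", "join_date": "2024-01-15", "rating": 4.25}'
)

as_strings = insert_as_string(record, schema)
typed = insert_typed(record, schema)
print(as_strings)
print(typed)
```

In the sketch, the string-only path keeps `join_date` as the text `2024-01-15`, while the typed path produces a real date value; the actual reader applies the typed path only for Hive JSON tables.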
   
   Limitations:
   1. Currently, only queries are supported; writing is not supported.
   2. Currently, only the `ROW FORMAT SERDE 
'org.apache.hive.hcatalog.data.JsonSerDe';` scenario is supported. Some 
properties specified in `WITH SERDEPROPERTIES` have no effect in Doris.
   3. Hive does not allow columns whose names differ only in case when creating 
a table in JSON format (including inside a Struct). Therefore, when reading a 
JSON data file, we convert the field names in the JSON data to lowercase and 
match them against the columns by their lowercase names. If several field names 
in the data become identical after conversion to lowercase, the value of the 
last such field is used (consistent with Hive behavior).
   Example:
   ```
   create table json_table(
       column int
   )ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe';
   
   a.json:
   {"column":1,"COLumn":2,"COLUMN":3}
   {"column":10,"COLumn":20}
   {"column":100}
   In Hive: load a.json into table json_table
   
   in Doris query:
   ---
   3
   20
   100
   ---
   ```
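
The matching rule in the example above can be sketched in a few lines of Python. This is only an illustration of the lowercase, last-value-wins rule, not the reader's actual implementation; `read_row` is a hypothetical helper, and the sketch handles flat objects only.

```python
import json

def read_row(line, schema_columns):
    """Parse one JSON line and match fields to columns by lowercase name."""
    # object_pairs_hook keeps every key/value pair in file order,
    # including exact duplicates that a plain dict would collapse.
    pairs = json.loads(line, object_pairs_hook=lambda kv: kv)
    lowered = {}
    for key, value in pairs:
        lowered[key.lower()] = value  # a later field overwrites an earlier one
    # Columns missing from the data come back as None.
    return [lowered.get(col.lower()) for col in schema_columns]

lines = [
    '{"column":1,"COLumn":2,"COLUMN":3}',
    '{"column":10,"COLumn":20}',
    '{"column":100}',
]
results = [read_row(line, ["column"])[0] for line in lines]
print(results)  # [3, 20, 100]
```

The printed values match the query result shown in the example: for each row, the last field whose lowercase name equals `column` supplies the value.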
   
   Todo (in the next PR):
   Merge `serde` and `json_reader`, because their logic conflicts.
   
   The Hive catalog supports reading JSON format tables.
   
   ### What problem does this PR solve?
   
   Issue Number: close #xxx
   
   Related PR: #xxx
   
   Problem Summary:
   
   ### Release note
   
   None
   
   ### Check List (For Author)
   
   - Test <!-- At least one of them must be included. -->
       - [ ] Regression test
       - [ ] Unit Test
       - [ ] Manual test (add detailed scripts or steps below)
       - [ ] No need to test or manual test. Explain why:
           - [ ] This is a refactor/code format and no logic has been changed.
           - [ ] Previous test can cover this change.
           - [ ] No code files have been changed.
           - [ ] Other reason <!-- Add your reason?  -->
   
   - Behavior changed:
       - [ ] No.
       - [ ] Yes. <!-- Explain the behavior change -->
   
   - Does this need documentation?
       - [ ] No.
       - [ ] Yes. <!-- Add document PR link here. eg: 
https://github.com/apache/doris-website/pull/1214 -->
   
   ### Check List (For Reviewer who merge this PR)
   
   - [ ] Confirm the release note
   - [ ] Confirm test cases
   - [ ] Confirm document
   - [ ] Add branch pick label <!-- Add branch pick label that this PR should 
merge into -->
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

