Re: [PR] [FLINK-35167][cdc-connector] Introduce MaxCompute pipeline DataSink [flink-cdc]

via GitHub Thu, 13 Jun 2024 23:02:25 -0700


dingxin-tech commented on code in PR #3254:
URL: https://github.com/apache/flink-cdc/pull/3254#discussion_r1639306007



##########
flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-maxcompute/src/main/java/org/apache/flink/cdc/connectors/maxcompute/utils/TypeConvertUtils.java:
##########
@@ -0,0 +1,540 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.cdc.connectors.maxcompute.utils;
+
+import org.apache.flink.cdc.common.data.ArrayData;
+import org.apache.flink.cdc.common.data.MapData;
+import org.apache.flink.cdc.common.data.RecordData;
+import org.apache.flink.cdc.common.schema.Schema;
+import org.apache.flink.cdc.common.types.ArrayType;
+import org.apache.flink.cdc.common.types.DataType;
+import org.apache.flink.cdc.common.types.DecimalType;
+import org.apache.flink.cdc.common.types.MapType;
+import org.apache.flink.cdc.common.types.RowType;
+import org.apache.flink.cdc.common.utils.SchemaUtils;
+
+import com.aliyun.odps.Column;
+import com.aliyun.odps.OdpsType;
+import com.aliyun.odps.TableSchema;
+import com.aliyun.odps.data.ArrayRecord;
+import com.aliyun.odps.data.Binary;
+import com.aliyun.odps.data.SimpleStruct;
+import com.aliyun.odps.data.Struct;
+import com.aliyun.odps.table.utils.Preconditions;
+import com.aliyun.odps.type.StructTypeInfo;
+import com.aliyun.odps.type.TypeInfo;
+import com.aliyun.odps.type.TypeInfoFactory;
+
+import java.math.BigDecimal;
+import java.time.Instant;
+import java.time.LocalDate;
+import java.time.LocalDateTime;
+import java.time.LocalTime;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import java.util.stream.Collectors;
+
+import static org.apache.flink.cdc.common.types.DataTypeChecks.getFieldCount;
+import static org.apache.flink.cdc.common.types.DataTypeChecks.getPrecision;
+import static org.apache.flink.cdc.common.types.DataTypeChecks.getScale;
+
+/**
+ * Data type mapping table This table shows the mapping relationship from Flink types to MaxCompute
+ * types and the corresponding Java type representation.
+ *
+ * <pre>
+ * | Flink Type                        | MaxCompute Type| Flink Java Type     | MaxCompute Java Type |
+ * |-----------------------------------|----------------|---------------------|----------------------|
+ * | CHAR/VARCHAR/STRING               | STRING         | StringData          | String               |
+ * | BOOLEAN                           | BOOLEAN        | Boolean             | Boolean              |
+ * | BINARY/VARBINARY                  | BINARY         | byte[]              | odps.data.Binary     |
+ * | DECIMAL                           | DECIMAL        | DecimalData         | BigDecimal           |
+ * | TINYINT                           | TINYINT        | Byte                | Byte                 |
+ * | SMALLINT                          | SMALLINT       | Short               | Short                |
+ * | INTEGER                           | INTEGER        | Integer             | Integer              |
+ * | BIGINT                            | BIGINT         | Long                | Long                 |
+ * | FLOAT                             | FLOAT          | Float               | Float                |
+ * | DOUBLE                            | DOUBLE         | Double              | Double               |
+ * | TIME_WITHOUT_TIME_ZONE            | STRING         | Integer             | String               |
+ * | DATE                              | DATE           | Integer             | LocalDate            |
+ * | TIMESTAMP_WITHOUT_TIME_ZONE       | TIMESTAMP_NTZ  | TimestampData       | LocalDateTime        |
+ * | TIMESTAMP_WITH_LOCAL_TIME_ZONE    | TIMESTAMP      | LocalZonedTimestampData | Instant          |
+ * | TIMESTAMP_WITH_TIME_ZONE          | TIMESTAMP      | ZonedTimestampData  | Instant              |
+ * | ARRAY                             | ARRAY          | ArrayData           | ArrayList            |
+ * | MAP                               | MAP            | MapData             | HashMap              |
+ * | ROW                               | STRUCT         | RowData             | odps.data.SimpleStruct|
+ * </pre>
+ *
+ * <p>When converting, put the Flink Type Name into the Column comment to facilitate conversion
+ * back.
+ */
+public class TypeConvertUtils {
+
+    public static TableSchema toMaxCompute(Schema flinkSchema) {
+        Preconditions.checkNotNull(flinkSchema, "flink Schema");
+        TableSchema tableSchema = new TableSchema();
+        Set<String> primaryKeys = new HashSet<>(flinkSchema.primaryKeys());
+        Set<String> partitionKeys = new HashSet<>(flinkSchema.partitionKeys());
+        List<org.apache.flink.cdc.common.schema.Column> columns = flinkSchema.getColumns();
+        for (int i = 0; i < flinkSchema.getColumnCount(); i++) {
+            org.apache.flink.cdc.common.schema.Column flinkColumn = columns.get(i);
+            Column odpsColumn =
+                    toMaxCompute(flinkColumn, primaryKeys.contains(flinkColumn.getName()));
+            if (partitionKeys.contains(flinkColumn.getName())) {
+                tableSchema.addPartitionColumn(odpsColumn);
+            } else {
+                tableSchema.addColumn(odpsColumn);
+            }
+        }
+        return tableSchema;
+    }
+
+    public static Column toMaxCompute(
+            org.apache.flink.cdc.common.schema.Column flinkColumn, boolean isPrimaryKey) {
+        Preconditions.checkNotNull(flinkColumn, "flink Schema Column");
+        DataType type = flinkColumn.getType();
+        Column.ColumnBuilder columnBuilder =
+                Column.newBuilder(flinkColumn.getName(), toMaxCompute(type))
+                        .withComment(type.asSummaryString());
+        if (isPrimaryKey) {
+            columnBuilder.primaryKey();
+        }
+        return columnBuilder.build();
+    }
+
+    public static TypeInfo toMaxCompute(DataType type) {
+        switch (type.getTypeRoot()) {
+            case CHAR:
+            case VARCHAR:
+            case TIME_WITHOUT_TIME_ZONE:
+                return TypeInfoFactory.STRING;

Review Comment:
   Yes, you are correct.
   
   Additionally, I discovered that the MaxCompute SDK has an issue when
   creating tables with primary keys: it ignores the user-specified primary
   key order during table creation. I plan to fix this next week.
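
   To make the ordering concern concrete, here is a minimal sketch in plain
   Java. It is not the MaxCompute SDK's code and does not claim to show the
   cause of the SDK issue; the class and method names (PrimaryKeyOrder,
   inColumnOrder, inDeclaredOrder) are hypothetical. It only contrasts
   deriving the key list from column order with keeping the order in which
   the user declared the keys.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper for illustration; not part of the MaxCompute SDK or this PR. */
public class PrimaryKeyOrder {

    /** Collects primary keys by walking the columns, so the user's key order is lost. */
    public static List<String> inColumnOrder(List<String> columns, List<String> primaryKeys) {
        Set<String> keySet = new HashSet<>(primaryKeys);
        List<String> result = new ArrayList<>();
        for (String column : columns) {
            if (keySet.contains(column)) {
                result.add(column);
            }
        }
        return result;
    }

    /** Collects primary keys by walking the declared key list, preserving the user's order. */
    public static List<String> inDeclaredOrder(List<String> columns, List<String> primaryKeys) {
        Set<String> columnSet = new HashSet<>(columns);
        List<String> result = new ArrayList<>();
        for (String key : primaryKeys) {
            if (!columnSet.contains(key)) {
                throw new IllegalArgumentException("Unknown primary key column: " + key);
            }
            result.add(key);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> columns = Arrays.asList("id", "region", "ts");
        List<String> primaryKeys = Arrays.asList("region", "id");
        System.out.println(inColumnOrder(columns, primaryKeys));   // [id, region]
        System.out.println(inDeclaredOrder(columns, primaryKeys)); // [region, id]
    }
}
```

   With the key declared as (region, id), the first method yields
   [id, region] and the second yields [region, id], which is the order the
   user asked for.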



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
