[ https://issues.apache.org/jira/browse/FLINK-16818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17068362#comment-17068362 ]
Kurt Young commented on FLINK-16818:
------------------------------------

Can you share the execution graph? To be more specific, is there a shuffle by partition value before writing to Hive?

> Optimize data skew when Flink writes data to a Hive dynamic partition table
> ----------------------------------------------------------------------------
>
>                 Key: FLINK-16818
>                 URL: https://issues.apache.org/jira/browse/FLINK-16818
>             Project: Flink
>          Issue Type: Improvement
>          Components: Connectors / Hive
>    Affects Versions: 1.10.0
>            Reporter: Jun Zhang
>            Priority: Major
>             Fix For: 1.11.0
>
>
> I read the data of a Hive source table through Flink SQL and then write it into a Hive target table. The target table is a partitioned table. When one partition holds much more data than the others, data skew occurs, resulting in a particularly long execution time.
> With the default configuration, for the same SQL, Hive on Spark takes five minutes while Flink takes about 40 minutes.
> example:
>
> {code:java}
> -- the schema of myparttable
> name string,
> age int,
> PARTITIONED BY (
>   type string,
>   day string
> )
>
> INSERT OVERWRITE myparttable SELECT name, age, type, day FROM sourcetable;
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
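Kurt's question gets at the likely cause: if rows are shuffled (keyed) by the dynamic partition value before the Hive writers, every row of a hot partition is routed to the same writer subtask, so the job runs only as fast as that one task. A minimal sketch of this effect, not Flink code; the partition names, row counts, and hash function are made-up illustration:

```python
import zlib
from collections import Counter

def assign_subtask(partition_value: str, parallelism: int) -> int:
    """Keyed shuffle: all rows of one partition map to one writer subtask.
    (Deterministic CRC32 stands in for whatever hash the shuffle uses.)"""
    return zlib.crc32(partition_value.encode()) % parallelism

# Hypothetical row counts per dynamic partition: type=a/day=01 is "hot".
rows_per_partition = {
    "type=a/day=01": 1_000_000,
    "type=b/day=01": 1_000,
    "type=c/day=01": 2_000,
}

parallelism = 4
load = Counter()
for partition, n_rows in rows_per_partition.items():
    load[assign_subtask(partition, parallelism)] += n_rows

# The subtask that owns the hot partition receives at least 1,000,000 rows,
# while the others stay mostly idle - that imbalance is the observed skew.
print(dict(load))
```

If instead the data were split across writers without keying by partition value, each subtask would open files for the hot partition and the load would even out, at the cost of more open files per partition.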