[ https://issues.apache.org/jira/browse/SPARK-40170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17582935#comment-17582935 ]
caican edited comment on SPARK-40170 at 8/22/22 12:13 PM:
----------------------------------------------------------
[~kabhwan] My program code is very simple, as shown below:

```
// .rdd converts every internal row into an external Row (createExternalRow), decoding each string field
val rdd = spark.sql("select triggerId,adMetadata,userData from iceberg_my_cloud.mydb.myTable where date = 20220801").rdd
println(rdd.count())
```

In addition to string decoding, the conversion of Tuple2 to Map is slow. I have submitted a patch to optimize it (https://github.com/apache/spark/pull/37609), but right now I don't have a good way to optimize string decoding.

> StringCoding UTF8 decode slowly
> -------------------------------
>
> Key: SPARK-40170
> URL: https://issues.apache.org/jira/browse/SPARK-40170
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.1.2, 3.2.0, 3.1.3, 3.2.1, 3.3.0, 3.2.2, 3.3.1
> Reporter: caican
> Priority: Major
> Attachments: image-2022-08-22-10-56-54-768.png, image-2022-08-22-10-57-11-744.png
>
>
> When `UnsafeRow` is converted to `Row` in
> `org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificSafeProjection.createExternalRow`,
> UTF8String decoding and the copyMemory process are very slow.
> Does anyone have any ideas for optimization?
> !image-2022-08-22-10-56-54-768.png!
>
> !image-2022-08-22-10-57-11-744.png!
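The two hot spots in the profile match the two steps needed to turn a binary string field into a `java.lang.String`. In Spark's source, `UTF8String.toString()` is roughly `new String(getBytes(), UTF_8)`. The `getBytes()` call copies the bytes out of the row with `copyMemory` unless the string already owns a standalone byte array.

The Scala sketch below is a simplified model of that path and uses only the JDK. `StringSlice`, `RowConversionSketch`, `pack`, `decodeAll` and `toExternalString` are hypothetical names, not Spark classes. `System.arraycopy` stands in for Spark's `Platform.copyMemory`.

```scala
import java.nio.charset.StandardCharsets

// Hypothetical model, not Spark's real classes: a string field is a slice
// (offset, length) of UTF-8 bytes inside one buffer shared by the whole row.
final case class StringSlice(buffer: Array[Byte], offset: Int, length: Int) {
  // Step 1: copy the field's bytes out of the shared buffer
  //         (stand-in for the copyMemory step in the profile).
  // Step 2: decode them into a java.lang.String
  //         (the StringCoding UTF-8 decode step).
  // Both steps allocate a new object on every call.
  def toExternalString: String = {
    val bytes = new Array[Byte](length)
    System.arraycopy(buffer, offset, bytes, 0, length)
    new String(bytes, StandardCharsets.UTF_8)
  }
}

object RowConversionSketch {
  // Packs strings into a single UTF-8 buffer, the way a binary row keeps
  // its variable-length data in one contiguous region.
  def pack(values: Seq[String]): Seq[StringSlice] = {
    val encoded = values.toVector.map(_.getBytes(StandardCharsets.UTF_8))
    val buffer = new Array[Byte](encoded.map(_.length).sum)
    var pos = 0
    encoded.map { bytes =>
      System.arraycopy(bytes, 0, buffer, pos, bytes.length)
      val slice = StringSlice(buffer, pos, bytes.length)
      pos += bytes.length
      slice
    }
  }

  // Per-row conversion to external values: one copy plus one decode per field.
  def decodeAll(slices: Seq[StringSlice]): Seq[String] =
    slices.map(_.toExternalString)
}
```

Calling `.rdd` on a `DataFrame` deserializes every row into an external `Row`. So the snippet above pays this copy-and-decode cost for every string field of every row, even though it only uses the row count.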