[ https://issues.apache.org/jira/browse/SPARK-21582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16108310#comment-16108310 ]
Liang-Chi Hsieh commented on SPARK-21582:
-----------------------------------------

Please call the toDF API with the renamed column names. It could save a lot of time.

> DataFrame.withColumnRenamed cause huge performance overhead
> -----------------------------------------------------------
>
>                 Key: SPARK-21582
>                 URL: https://issues.apache.org/jira/browse/SPARK-21582
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.1.0
>            Reporter: GuangFancui(ISCAS)
>         Attachments: 4654.stack
>
>
> Table "item_feature" (DataFrame) has over 900 columns.
> When I use
> {code:java}
> val nameSequeceExcept = Set("gid","category_name","merchant_id")
> val df1 = spark.table("item_feature")
> val newdf1 = df1.schema.map(_.name)
>   .filter(name => !nameSequeceExcept.contains(name))
>   .foldLeft(df1)((df1, name) => df1.withColumnRenamed(name, name + "_1"))
> {code}
> it took over 30 minutes.
> *PID* in the stack file is *0x126d*.
> It seems that _transform_ took too long.
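
A minimal sketch of the toDF suggestion, for illustration only. It computes all the new column names first and then renames them in one toDF call, instead of chaining one withColumnRenamed call per column. The object name RenameWithToDF, the helpers renamedColumns and renameExcept, and the small in-memory DataFrame with a "price" column are assumptions made up for this example; they stand in for the reporter's "item_feature" table and are not part of Spark or of the original report. The likely reason this is cheaper is that each withColumnRenamed call builds a new DataFrame whose plan has to be analyzed again, but that explanation is an assumption, not something confirmed in this thread.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object RenameWithToDF {

  // Pure helper (hypothetical name): append `suffix` to every column name
  // that is not listed in `keep`, preserving the original column order.
  def renamedColumns(columns: Seq[String], keep: Set[String], suffix: String): Seq[String] =
    columns.map(name => if (keep.contains(name)) name else name + suffix)

  // Rename all columns in a single toDF call rather than one
  // withColumnRenamed call per column.
  def renameExcept(df: DataFrame, keep: Set[String], suffix: String): DataFrame =
    df.toDF(renamedColumns(df.columns.toSeq, keep, suffix): _*)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("rename-with-toDF")
      .getOrCreate()
    import spark.implicits._

    // Tiny stand-in for the 900-column table from the report (assumed data).
    val df1 = Seq((1L, "books", 7L, 0.5))
      .toDF("gid", "category_name", "merchant_id", "price")

    val nameSequeceExcept = Set("gid", "category_name", "merchant_id")
    val newdf1 = renameExcept(df1, nameSequeceExcept, "_1")

    newdf1.printSchema()
    spark.stop()
  }
}
```

With the reporter's table, the equivalent call would be renameExcept(spark.table("item_feature"), nameSequeceExcept, "_1"). The number of columns and their order must match the original DataFrame, which is why the helper maps over df.columns instead of filtering it.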