GuangFancui(ISCAS) created SPARK-21582: ------------------------------------------
Summary: DataFrame.withColumnRenamed causes huge performance overhead
Key: SPARK-21582
URL: https://issues.apache.org/jira/browse/SPARK-21582
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 2.1.0
Reporter: GuangFancui(ISCAS)

Table "item_feature" (DataFrame) has over 900 columns. When I rename every column except three with the following code, it takes over 30 minutes:

{code:java}
val nameSequeceExcept = Set("gid", "category_name", "merchant_id")
val df1 = spark.table("item_feature")
// Rename every column not in the exclusion set by appending "_1",
// one withColumnRenamed call per column (900+ calls in total).
val newdf1 = df1.schema.map(_.name)
  .filter(name => !nameSequeceExcept.contains(name))
  .foldLeft(df1)((acc, name) => acc.withColumnRenamed(name, name + "_1"))
{code}

*PID* in the stack file is *0x126d*.

It seems that _transform_ takes too long.
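A possible workaround, not part of the original report: rename all columns in a single projection instead of folding 900+ withColumnRenamed calls. Presumably each of those calls wraps the plan in another Project that is then analyzed again, so the total cost grows much faster than the column count. The sketch assumes Dataset.toDF(colNames: String*), which in Spark 2.x renames columns positionally through one select; the object RenameInOneProjection and the helper renameAllExcept are hypothetical names used only for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object RenameInOneProjection {
  // Hypothetical helper (not a Spark API): appends `suffix` to every column
  // name not in `keep`, building the complete new name list up front.
  def renameAllExcept(df: DataFrame, keep: Set[String], suffix: String): DataFrame = {
    val newNames = df.columns.map(name => if (keep.contains(name)) name else name + suffix)
    // toDF(names: _*) renames positionally in one projection, so the plan
    // grows by a single Project no matter how many columns are renamed.
    df.toDF(newNames: _*)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("rename-demo").getOrCreate()
    import spark.implicits._
    // Small stand-in for the 900-column "item_feature" table.
    val df = Seq((1, "books", 7L, 0.5)).toDF("gid", "category_name", "merchant_id", "price")
    val out = renameAllExcept(df, Set("gid", "category_name", "merchant_id"), "_1")
    println(out.columns.mkString(","))  // prints gid,category_name,merchant_id,price_1
    spark.stop()
  }
}
```

Applied to the report, this would be renameAllExcept(spark.table("item_feature"), nameSequeceExcept, "_1"). Building the whole name list first and passing it to one call keeps the logical plan shallow, which is the design choice that matters here.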