Jiayi Liao created SPARK-26182:
----------------------------------

             Summary: Cost increases when optimizing scalaUDF
                 Key: SPARK-26182
                 URL: https://issues.apache.org/jira/browse/SPARK-26182
             Project: Spark
          Issue Type: Bug
          Components: Optimizer
    Affects Versions: 2.4.0
            Reporter: Jiayi Liao

Let's assume that we have a UDF called splitUDF which returns a map. The query

{code:sql}
select g['a'], g['b']
from (select splitUDF(x) as g from table) tbl
{code}

is optimized to the same logical plan as

{code:sql}
select splitUDF(x)['a'], splitUDF(x)['b'] from table
{code}

which means that splitUDF is executed twice per row instead of once. The rewrite comes from the CollapseProject rule.

I'm not sure whether this is a bug or not. Please correct me if I am wrong about this.
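
To make the extra cost concrete, here is a small toy model in plain Python. It is only a sketch of the effect described above, not Spark's CollapseProject code: `split_udf`, `run_nested`, `run_collapsed`, and the `"a=1,b=2"` input format are all hypothetical names and assumptions chosen for illustration. It counts how often the UDF runs when the alias g is kept versus when it is inlined into each reference.

```python
# Toy model of collapsing two projections; NOT Spark's actual implementation.
calls = {"split_udf": 0}


def split_udf(x):
    """Hypothetical stand-in for splitUDF: parses 'a=1,b=2' into a dict."""
    calls["split_udf"] += 1
    return dict(pair.split("=", 1) for pair in x.split(","))


def run_nested(rows):
    """Inner projection computes g once per row; outer projection reads g twice."""
    inner = [{"g": split_udf(x)} for x in rows]
    return [(r["g"]["a"], r["g"]["b"]) for r in inner]


def run_collapsed(rows):
    """After the alias g is inlined, each reference becomes its own UDF call."""
    return [(split_udf(x)["a"], split_udf(x)["b"]) for x in rows]


def count_calls(plan, rows):
    """Run a plan and report its result together with the number of UDF calls."""
    calls["split_udf"] = 0
    result = plan(rows)
    return result, calls["split_udf"]


rows = ["a=1,b=2", "a=3,b=4"]
nested_result, nested_calls = count_calls(run_nested, rows)
collapsed_result, collapsed_calls = count_calls(run_collapsed, rows)
```

Both plans return the same rows, but in this model the collapsed form calls the UDF once per reference to g rather than once per input row, which is the cost increase reported here.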