[ 
https://issues.apache.org/jira/browse/SPARK-35500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17350458#comment-17350458
 ] 

Takeshi Yamamuro commented on SPARK-35500:
------------------------------------------

Which version did you use? v3.1.0 does not exist, so do you mean v3.1.1? I ran 
the queries on v3.1.1 to try to reproduce the issue, but could not:
{code:java}
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.1.1
      /_/
         
Using Scala version 2.12.10 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_181)

scala> sql("create table test_code_gen(a array<int>)")
scala> sql("insert into test_code_gen values (array(1, 1))")
scala> sc.setLogLevel("debug")

// The first run
scala> sql("select * from test_code_gen").collect()
...
21/05/24 23:14:00 DEBUG GenerateSafeProjection: code for 
createexternalrow(staticinvoke(class scala.collection.mutable.WrappedArray$, 
ObjectType(interface scala.collection.Seq), make, 
mapobjects(lambdavariable(MapObject, IntegerType, true, -1), 
lambdavariable(MapObject, IntegerType, true, -1), input[0, array<int>, true], 
None).array, true, false), StructField(a,ArrayType(IntegerType,true),true)):
/* 001 */ public java.lang.Object generate(Object[] references) {
/* 002 */   return new SpecificSafeProjection(references);
/* 003 */ }
/* 004 */
/* 005 */ class SpecificSafeProjection extends 
org.apache.spark.sql.catalyst.expressions.codegen.BaseProjection {
/* 006 */
/* 007 */   private Object[] references;
/* 008 */   private InternalRow mutableRow;
/* 009 */   private boolean resultIsNull_0;
/* 010 */   private int value_MapObject_lambda_variable_1;
/* 011 */   private boolean isNull_MapObject_lambda_variable_1;
/* 012 */   private boolean globalIsNull_0;
/* 013 */   private java.lang.Object[] mutableStateArray_0 = new 
java.lang.Object[1];
/* 014 */
...


// The second run
scala> sql("select * from test_code_gen").collect()
...
21/05/24 23:14:28 DEBUG GenerateSafeProjection: code for 
createexternalrow(staticinvoke(class scala.collection.mutable.WrappedArray$, 
ObjectType(interface scala.collection.Seq), make, 
mapobjects(lambdavariable(MapObject, IntegerType, true, -1), 
lambdavariable(MapObject, IntegerType, true, -1), input[0, array<int>, true], 
None).array, true, false), StructField(a,ArrayType(IntegerType,true),true)):
/* 001 */ public java.lang.Object generate(Object[] references) {
/* 002 */   return new SpecificSafeProjection(references);
/* 003 */ }
/* 004 */
/* 005 */ class SpecificSafeProjection extends 
org.apache.spark.sql.catalyst.expressions.codegen.BaseProjection {
/* 006 */
/* 007 */   private Object[] references;
/* 008 */   private InternalRow mutableRow;
/* 009 */   private boolean resultIsNull_0;
/* 010 */   private int value_MapObject_lambda_variable_1;
/* 011 */   private boolean isNull_MapObject_lambda_variable_1;
/* 012 */   private boolean globalIsNull_0;
/* 013 */   private java.lang.Object[] mutableStateArray_0 = new 
java.lang.Object[1];
...
 {code}
Actually, this issue should already have been fixed by 
SPARK-27871 ([https://github.com/apache/spark/pull/24735]). Or am I missing 
something?

> GenerateSafeProjection.generate will generate SpecificSafeProjection class, 
> but if column is array type or map type, the code cannot be reused which 
> impact the query performance
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-35500
>                 URL: https://issues.apache.org/jira/browse/SPARK-35500
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: Yahui Liu
>            Priority: Minor
>              Labels: codegen
>
> Reproduce steps:
>  # create a new table with array type: create table test_code_gen(a 
> array<int>);
>  # Add 
> log4j.logger.org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator 
> = DEBUG to log4j.properties;
>  # Enter spark-shell, fire a query: spark.sql("select * from 
> test_code_gen").collect
>  # Every time Dataset.collect is called, the SpecificSafeProjection class is 
> regenerated, but its code cannot be reused because the ids of two variables 
> in the generated class change on every run: MapObjects_loopValue 
> and MapObjects_loopIsNull. So even though the previously generated class is 
> cached, the new code does not match the cache key and has to be 
> compiled again, which costs time. Compilation time grows 
> with the number of columns; for a wide table it can exceed 2s. 
> {code:java}
> object MapObjects {
>   private val curId = new java.util.concurrent.atomic.AtomicInteger()
>  val id = curId.getAndIncrement()
>  val loopValue = s"MapObjects_loopValue$id"
>  val loopIsNull = if (elementNullable) {
>    s"MapObjects_loopIsNull$id"
>  } else {
>    "false"
>  }
> {code}
> First time run: 
> {code:java}
> class SpecificSafeProjection extends 
> org.apache.spark.sql.catalyst.expressions.codegen.BaseProjection {
>  private int MapObjects_loopValue1;
>  private boolean MapObjects_loopIsNull1;
>  private UTF8String MapObjects_loopValue2;
>  private boolean MapObjects_loopIsNull2;
> }
> {code}
> Second time run:
> {code:java}
> class SpecificSafeProjection extends 
> org.apache.spark.sql.catalyst.expressions.codegen.BaseProjection {
>  private int MapObjects_loopValue3;
>  private boolean MapObjects_loopIsNull3;
>  private UTF8String MapObjects_loopValue4;
>  private boolean MapObjects_loopIsNull4;
> }{code}
> Expectation:
> The code generated by GenerateSafeProjection should be reusable when the 
> query is the same.
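
The cache-miss mechanism described in the quoted report can be illustrated with a small standalone sketch. This is not Spark code: the class `CodeCacheSketch`, its source-keyed `cache`, and both generator methods are hypothetical stand-ins, assuming only that compiled classes are looked up by their generated source text. It contrasts a process-wide id counter (as in the quoted `MapObjects` snippet) with a counter that restarts for each code-generation pass.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; none of these names exist in Spark.
class CodeCacheSketch {
    // Stand-in for a compiled-class cache keyed by the generated source text.
    static final Map<String, Object> cache = new HashMap<>();
    // Counts how many times a "compilation" actually happened (cache misses).
    static int compileCount = 0;

    // Returns the cached entry for this source, "compiling" only on a miss.
    static Object compile(String source) {
        return cache.computeIfAbsent(source, s -> {
            compileCount++;
            return new Object();
        });
    }

    // Process-wide counter: every generation pass sees a new id.
    static final AtomicInteger globalId = new AtomicInteger();

    // Variable name embeds the global id, so the source text differs per call.
    static String generateWithGlobalId() {
        int id = globalId.getAndIncrement();
        return "private int MapObjects_loopValue" + id + ";";
    }

    // Counter restarts for each generation pass, so the source text is stable.
    static String generateWithLocalId() {
        AtomicInteger localId = new AtomicInteger();
        int id = localId.getAndIncrement();
        return "private int loopValue_" + id + ";";
    }

    public static void main(String[] args) {
        compile(generateWithGlobalId());
        compile(generateWithGlobalId());
        System.out.println("compilations with global ids: " + compileCount);

        int before = compileCount;
        compile(generateWithLocalId());
        compile(generateWithLocalId());
        System.out.println("compilations with per-pass ids: " + (compileCount - before));
    }
}
```

With the global counter, the two generated sources differ only in the variable suffix, yet that is enough to produce two distinct cache keys. With names that are stable across passes, the second lookup hits the cache, which matches the identical variable names seen in the two v3.1.1 runs above.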


