[jira] [Commented] (SPARK-29372) Codegen grows beyond 64 KB for more columns in case of SupportsScanColumnarBatch

Razibul Hossain (Jira) Tue, 29 Oct 2019 20:59:48 -0700


    [ 
https://issues.apache.org/jira/browse/SPARK-29372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16962663#comment-16962663
 ]


Razibul Hossain commented on SPARK-29372:
-----------------------------------------

I kind of face same issue following some error details

CodeGenerator:91 - failed to compile: 
org.codehaus.janino.InternalCompilerException: Compiling "GeneratedClass": Code 
of method "processNext()V" of class 
"org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage0"
 grows beyond 64 KB
org.codehaus.janino.InternalCompilerException: Compiling "GeneratedClass": Code 
of method "processNext()V" of class 
"org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage0"
 grows beyond 64 KB
 at org.codehaus.janino.UnitCompiler.compileUnit(UnitCompiler.java:361)
 at org.codehaus.janino.SimpleCompiler.cook(SimpleCompiler.java:234)
 at 
org.codehaus.janino.SimpleCompiler.compileToClassLoader(SimpleCompiler.java:446)
 at 
org.codehaus.janino.ClassBodyEvaluator.compileToClass(ClassBodyEvaluator.java:313)
 at org.codehaus.janino.ClassBodyEvaluator.cook(ClassBodyEvaluator.java:235)
 at org.codehaus.janino.SimpleCompiler.cook(SimpleCompiler.java:204)
 at org.codehaus.commons.compiler.Cookable.cook(Cookable.java:80)
 at 
org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.org$apache$spark$sql$catalyst$expressions$codegen$CodeGenerator$$doCompile(CodeGenerator.scala:1417)
 at 
org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:1493)
 at 
org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:1490)
 at 
org.spark_project.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
 at 
org.spark_project.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)
 at 
org.spark_project.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
 at org.spark_project.guava.cache.LocalCache$Segment.get(LocalCache.java:2257)
 at org.spark_project.guava.cache.LocalCache.get(LocalCache.java:4000)
 at org.spark_project.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004)
 at 
org.spark_project.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
 at 
org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.compile(CodeGenerator.scala:1365)
 at 
org.apache.spark.sql.execution.WholeStageCodegenExec.liftedTree1$1(WholeStageCodegenExec.scala:579)
 at 
org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:578)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
 at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
 at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
 at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
 at 
org.apache.spark.sql.execution.datasources.v2.DataSourceV2ScanExec.doExecute(DataSourceV2ScanExec.scala:90)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
 at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
 at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
 at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
 at 
org.apache.spark.sql.execution.WholeStageCodegenExec.liftedTree1$1(WholeStageCodegenExec.scala:584)
 at 
org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:578)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
 at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
 at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
 at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
 at 
org.apache.spark.sql.execution.datasources.v2.DataSourceV2ScanExec.doExecute(DataSourceV2ScanExec.scala:90)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
 at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
 at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
 at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
 at 
org.apache.spark.sql.execution.WholeStageCodegenExec.liftedTree1$1(WholeStageCodegenExec.scala:584)
 at 
org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:578)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
 at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
 at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
 at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
 at 
org.apache.spark.sql.execution.datasources.v2.DataSourceV2ScanExec.doExecute(DataSourceV2ScanExec.scala:90)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
 at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
 at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
 at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
 at 
org.apache.spark.sql.execution.WholeStageCodegenExec.liftedTree1$1(WholeStageCodegenExec.scala:584)
 at 
org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:578)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
 at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
 at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
 at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
 at 
org.apache.spark.sql.execution.datasources.v2.DataSourceV2ScanExec.doExecute(DataSourceV2ScanExec.scala:90)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
 at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
 at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
 at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
 at 
org.apache.spark.sql.execution.WholeStageCodegenExec.liftedTree1$1(WholeStageCodegenExec.scala:584)
 at 
org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:578)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
 at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
 at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
 at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
 at 
org.apache.spark.sql.execution.datasources.v2.DataSourceV2ScanExec.doExecute(DataSourceV2ScanExec.scala:90)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
 at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
 at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
 at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
 at 
org.apache.spark.sql.execution.WholeStageCodegenExec.liftedTree1$1(WholeStageCodegenExec.scala:584)
 at 
org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:578)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
 at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
 at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
 at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
 at 
org.apache.spark.sql.execution.datasources.v2.DataSourceV2ScanExec.doExecute(DataSourceV2ScanExec.scala:90)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
 at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
 at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
 at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
 at 
org.apache.spark.sql.execution.WholeStageCodegenExec.liftedTree1$1(WholeStageCodegenExec.scala:584)
 at 
org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:578)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
 at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
 at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
 at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
 at 
org.apache.spark.sql.execution.datasources.v2.DataSourceV2ScanExec.doExecute(DataSourceV2ScanExec.scala:90)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
 at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
 at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
 at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
 at 
org.apache.spark.sql.execution.WholeStageCodegenExec.liftedTree1$1(WholeStageCodegenExec.scala:584)
 at 
org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:578)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
 at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
 at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
 at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
 at 
org.apache.spark.sql.execution.datasources.v2.DataSourceV2ScanExec.doExecute(DataSourceV2ScanExec.scala:90)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
 at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
 at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
 at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
 at 
org.apache.spark.sql.execution.WholeStageCodegenExec.liftedTree1$1(WholeStageCodegenExec.scala:584)
 at 
org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:578)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
 at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
 at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
 at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
 at 
org.apache.spark.sql.execution.datasources.v2.DataSourceV2ScanExec.doExecute(DataSourceV2ScanExec.scala:90)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
 at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
 at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
 at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
 at 
org.apache.spark.sql.execution.WholeStageCodegenExec.liftedTree1$1(WholeStageCodegenExec.scala:584)
 at 
org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:578)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
 at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
 at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
 at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
 at 
org.apache.spark.sql.execution.datasources.v2.DataSourceV2ScanExec.doExecute(DataSourceV2ScanExec.scala:90)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
 at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
 at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
 at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
 at 
org.apache.spark.sql.execution.WholeStageCodegenExec.liftedTree1$1(WholeStageCodegenExec.scala:584)
 at 
org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:578)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
 at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
 at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
 at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
 at 
org.apache.spark.sql.execution.datasources.v2.DataSourceV2ScanExec.doExecute(DataSourceV2ScanExec.scala:90)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
 at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
 at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
 at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
 at 
org.apache.spark.sql.execution.WholeStageCodegenExec.liftedTree1$1(WholeStageCodegenExec.scala:584)
 at 
org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:578)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
 at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
 at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
 at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
 at 
org.apache.spark.sql.execution.datasources.v2.DataSourceV2ScanExec.doExecute(DataSourceV2ScanExec.scala:90)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
 at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
 at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
 at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
 at 
org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.prepareShuffleDependency(ShuffleExchangeExec.scala:92)
 at 
org.apache.spark.sql.execution.exchange.ShuffleExchangeExec$$anonfun$doExecute$1.apply(ShuffleExchangeExec.scala:128)
 at 
org.apache.spark.sql.execution.exchange.ShuffleExchangeExec$$anonfun$doExecute$1.apply(ShuffleExchangeExec.scala:119)
 at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:52)
 at 
org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.doExecute(ShuffleExchangeExec.scala:119)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
 at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
 at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
 at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
 at 
org.apache.spark.sql.execution.SparkPlan.getByteArrayRdd(SparkPlan.scala:247)
 at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:337)
 at 
org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:38)
 at 
org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collectFromPlan(Dataset.scala:3278)
 at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2489)
 at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2489)
 at org.apache.spark.sql.Dataset$$anonfun$52.apply(Dataset.scala:3259)
 at 
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
 at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3258)
 at org.apache.spark.sql.Dataset.head(Dataset.scala:2489)
 at org.apache.spark.sql.Dataset.take(Dataset.scala:2703)
 at org.apache.spark.sql.Dataset.showString(Dataset.scala:254)
 at org.apache.spark.sql.Dataset.show(Dataset.scala:723)

> Codegen grows beyond 64 KB for more columns in case of 
> SupportsScanColumnarBatch
> --------------------------------------------------------------------------------
>
>                 Key: SPARK-29372
>                 URL: https://issues.apache.org/jira/browse/SPARK-29372
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, SQL
>    Affects Versions: 2.3.2
>            Reporter: Shubham Chaurasia
>            Priority: Major
>
> In case of vectorized DSv2 readers i.e. if it implements 
> {{SupportsScanColumnarBatch}} and number of columns is around(or greater 
> than) 1000 then it throws
> {code:java}
> Caused by: org.codehaus.janino.InternalCompilerException: Code of method 
> "processNext()V" of class 
> "org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage0"
>  grows beyond 64 KB
>       at org.codehaus.janino.CodeContext.makeSpace(CodeContext.java:990)
>       at org.codehaus.janino.CodeContext.write(CodeContext.java:899)
>       at org.codehaus.janino.CodeContext.writeBranch(CodeContext.java:1016)
>       at org.codehaus.janino.UnitCompiler.writeBranch(UnitCompiler.java:11911)
>       at 
> org.codehaus.janino.UnitCompiler.compileBoolean2(UnitCompiler.java:3675)
>       at org.codehaus.janino.UnitCompiler.access$5500(UnitCompiler.java:212)
> {code}
> I can see from logs that it tries to disable Whole-stage codegen but it's 
> failing even after that on each retry.
> {code}
> 19/10/07 20:49:35 WARN WholeStageCodegenExec: Whole-stage codegen disabled 
> for plan (id=0):
>  *(0) DataSourceV2Scan [column_0#3558, column_1#3559, column_2#3560, 
> column_3#3561, column_4#3562, column_5#3563, column_6#3564, column_7#3565, 
> column_8#3566, column_9#3567, column_10#3568, column_11#3569, column_12#3570, 
> column_13#3571, column_14#3572, column_15#3573, column_16#3574, 
> column_17#3575, column_18#3576, column_19#3577, column_20#3578, 
> column_21#3579, column_22#3580, column_23#3581, ... 976 more fields], 
> com.shubham.reader.MyDataSourceReader@5c7673b8
> {code}
> Repro code for a simple reader can be: 
> {code:java}
> public class MyDataSourceReader implements DataSourceReader, 
> SupportsScanColumnarBatch {
>   private StructType schema;
>   private int numCols = 10;
>   private int numRows = 10;
>   private int numReaders = 1;
>   public MyDataSourceReader(Map<String, String> options) {
>     initOptions(options);
>     System.out.println("MyDataSourceReader.MyDataSourceReader: 
> Instantiated...." + this);
>   }
>   private void initOptions(Map<String, String> options) {
>     String numColumns = options.get("num_columns");
>     if (numColumns != null) {
>       numCols = Integer.parseInt(numColumns);
>     }
>     String numRowsOption = options.get("num_rows_per_reader");
>     if (numRowsOption != null) {
>       numRows = Integer.parseInt(numRowsOption);
>     }
>     String readersOption = options.get("num_readers");
>     if (readersOption != null) {
>       numReaders = Integer.parseInt(readersOption);
>     }
>   }
>   @Override public StructType readSchema() {
>     final String colPrefix = "column_";
>     StructField[] fields = new StructField[numCols];
>     for (int i = 0; i < numCols; i++) {
>       fields[i] = new StructField(colPrefix + i, DataTypes.IntegerType, true, 
> Metadata.empty());
>     }
>     schema = new StructType(fields);
>     return schema;
>   }
>   @Override public List<DataReaderFactory<ColumnarBatch>> 
> createBatchDataReaderFactories() {
>     System.out.println("MyDataSourceReader.createDataReaderFactories: " + 
> numReaders);
>     return new ArrayList<>();
>   }
> }
> {code}
> If I pass {{num_columns}} 1000 or greater, the issue appears.
> {code:java}
> spark.read.format("com.shubham.MyDataSource").option("num_columns", 
> "1000").option("num_rows_per_reader", 2).option("num_readers", 1).load.show
> {code}
> Any fixes/workarounds for this? 
> SPARK-16845 and SPARK-17092 are resolved but looks like they don't deal with 
> the vectorized part.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Commented] (SPARK-29372) Codegen grows beyond 64 KB for more columns in case of SupportsScanColumnarBatch

Reply via email to