Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/19865

@mgaido91 @viirya As you can see, we hit an assertion failure. Here is evidence that we do pass a global variable to the arguments of a split function; in practice we have never guaranteed that we do not. A [`value`](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala) was declared as a global variable, then passed as `ExprCode.value`, and finally `value` was passed as an argument to the function split out by `CodegenContext.splitExpressions`. Fortunately, these `expressions` did not update the global variable, so the result was still functionally correct. In general, though, it is hard to ensure that the `expressions` perform no update, and of course we would not want to rely on regular expressions to detect it. As you said, how do we ensure that we never pass a global variable?

```
**********************************************************************
File "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/ml/feature.py", line 1205, in __main__.MinHashLSH
Failed example:
    ...
Caused by: java.lang.AssertionError: assertion failed: smj_value16 in arguments should not be declared as a global variable
    at scala.Predef$.assert(Predef.scala:170)
    at org.apache.spark.sql.catalyst.expressions.codegen.CodegenContext.org$apache$spark$sql$catalyst$expressions$codegen$CodegenContext$$isDeclaredMutableState(CodeGenerator.scala:226)
    at org.apache.spark.sql.catalyst.expressions.codegen.CodegenContext$$anonfun$9.apply(CodeGenerator.scala:854)
    at org.apache.spark.sql.catalyst.expressions.codegen.CodegenContext$$anonfun$9.apply(CodeGenerator.scala:854)
    at scala.collection.TraversableLike$$anonfun$filterImpl$1.apply(TraversableLike.scala:248)
    at scala.collection.immutable.List.foreach(List.scala:381)
    at scala.collection.TraversableLike$class.filterImpl(TraversableLike.scala:247)
    at scala.collection.TraversableLike$class.filter(TraversableLike.scala:259)
    at scala.collection.AbstractTraversable.filter(Traversable.scala:104)
    at org.apache.spark.sql.catalyst.expressions.codegen.CodegenContext.splitExpressions(CodeGenerator.scala:853)
    at org.apache.spark.sql.catalyst.expressions.HashExpression.genHashForStruct(hash.scala:395)
    at org.apache.spark.sql.catalyst.expressions.HashExpression.computeHashWithTailRec(hash.scala:421)
    at org.apache.spark.sql.catalyst.expressions.HashExpression.computeHash(hash.scala:429)
    at org.apache.spark.sql.catalyst.expressions.HashExpression$$anonfun$1.apply(hash.scala:276)
    at org.apache.spark.sql.catalyst.expressions.HashExpression$$anonfun$1.apply(hash.scala:273)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
    at scala.collection.AbstractTraversable.map(Traversable.scala:104)
    at org.apache.spark.sql.catalyst.expressions.HashExpression.doGenCode(hash.scala:273)
    at org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$genCode$2.apply(Expression.scala:107)
    at org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$genCode$2.apply(Expression.scala:104)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.sql.catalyst.expressions.Expression.genCode(Expression.scala:104)
    at org.apache.spark.sql.execution.aggregate.HashAggregateExec.doConsumeWithKeys(HashAggregateExec.scala:772)
    at org.apache.spark.sql.execution.aggregate.HashAggregateExec.doConsume(HashAggregateExec.scala:173)
    at org.apache.spark.sql.execution.CodegenSupport$class.consume(WholeStageCodegenExec.scala:162)
    at org.apache.spark.sql.execution.ProjectExec.consume(basicPhysicalOperators.scala:35)
    at org.apache.spark.sql.execution.ProjectExec.doConsume(basicPhysicalOperators.scala:65)
    at org.apache.spark.sql.execution.CodegenSupport$class.consume(WholeStageCodegenExec.scala:162)
    at org.apache.spark.sql.execution.joins.SortMergeJoinExec.consume(SortMergeJoinExec.scala:36)
    at org.apache.spark.sql.execution.joins.SortMergeJoinExec.doProduce(SortMergeJoinExec.scala:626)
    at org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:85)
    at org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:80)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:141)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:138)
    at org.apache.spark.sql.execution.CodegenSupport$class.produce(WholeStageCodegenExec.scala:80)
    at org.apache.spark.sql.execution.joins.SortMergeJoinExec.produce(SortMergeJoinExec.scala:36)
    at org.apache.spark.sql.execution.ProjectExec.doProduce(basicPhysicalOperators.scala:45)
    at org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:85)
    at org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:80)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:141)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:138)
    at org.apache.spark.sql.execution.CodegenSupport$class.produce(WholeStageCodegenExec.scala:80)
    at org.apache.spark.sql.execution.ProjectExec.produce(basicPhysicalOperators.scala:35)
    at org.apache.spark.sql.execution.aggregate.HashAggregateExec.doProduceWithKeys(HashAggregateExec.scala:647)
    at org.apache.spark.sql.execution.aggregate.HashAggregateExec.doProduce(HashAggregateExec.scala:165)
    at org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:85)
    at org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:80)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:141)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:138)
    at org.apache.spark.sql.execution.CodegenSupport$class.produce(WholeStageCodegenExec.scala:80)
    at org.apache.spark.sql.execution.aggregate.HashAggregateExec.produce(HashAggregateExec.scala:39)
    at org.apache.spark.sql.execution.WholeStageCodegenExec.doCodeGen(WholeStageCodegenExec.scala:374)
    at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:422)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:113)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:141)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:138)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)
    at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.prepareShuffleDependency(ShuffleExchangeExec.scala:89)
    at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec$$anonfun$doExecute$1.apply(ShuffleExchangeExec.scala:125)
    at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec$$anonfun$doExecute$1.apply(ShuffleExchangeExec.scala:116)
    at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:52)
    ...
```
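To make the hazard concrete, here is a minimal, self-contained sketch in plain Scala (not Spark's actual codegen API; the names `smjValue` and `splitBody` are made up for illustration) of why an update made through a split-function argument would be lost:

```scala
// A generated class keeps codegen "global" variables as instance fields.
// If such a field is handed to a split-out function as an argument, any
// "update" inside that function only touches a local copy, not the field.
object GlobalAsArgumentSketch {
  // Stands in for a variable registered as mutable state (e.g. smj_value16).
  private var smjValue: Int = 0

  // Stands in for a function produced by splitting expressions, which
  // receives the global variable as a plain argument.
  private def splitBody(value: Int): Unit = {
    var local = value          // the argument is just a copy of the field's value
    local += 1                 // this "update" never reaches smjValue
    println(s"inside split function: $local")  // prints 1
  }

  def main(args: Array[String]): Unit = {
    splitBody(smjValue)
    println(s"global after call: $smjValue")   // prints 0: the field is unchanged
  }
}
```

If the split-out `expressions` ever assigned to such an argument, behavior would silently diverge like this, which is the pattern the new assertion flags.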