[GitHub] spark pull request #21770: [SPARK-24806][SQL] Brush up generated code so tha...
Github user maropu commented on a diff in the pull request:
https://github.com/apache/spark/pull/21770#discussion_r203261248

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala ---
@@ -318,7 +318,8 @@ case class SampleExec(
      v => s"""
         | $v = new $samplerClass($lowerBound, $upperBound, false);
         | $v.setSeed(${seed}L + partitionIndex);
-       """.stripMargin.trim)
+       """.stripMargin.trim,
+        forceInline = true)
--- End diff --

Aha, OK. Let's wait for the other developers' comments.

---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
Github user mgaido91 commented on a diff in the pull request:
https://github.com/apache/spark/pull/21770#discussion_r202623270

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala ---
@@ -318,7 +318,8 @@ case class SampleExec(
      v => s"""
         | $v = new $samplerClass($lowerBound, $upperBound, false);
         | $v.setSeed(${seed}L + partitionIndex);
-       """.stripMargin.trim)
+       """.stripMargin.trim,
+        forceInline = true)
--- End diff --

This may cause problems related to the 64KB method size limit, though. So if we are not including the JDK support, I'd be against this change...
Github user maropu commented on a diff in the pull request:
https://github.com/apache/spark/pull/21770#discussion_r202563865

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
@@ -1588,6 +1588,15 @@ object CodeGenerator extends Logging {

   def primitiveTypeName(dt: DataType): String = primitiveTypeName(javaType(dt))

+  def getClassName(cls: Class[_]): String = {
+    try {
+      return Option(cls.getCanonicalName).getOrElse(cls.getName)
+    } catch {
+      case err: InternalError =>
--- End diff --

Anyway, `return` removed.
Github user maropu commented on a diff in the pull request:
https://github.com/apache/spark/pull/21770#discussion_r202563817

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
@@ -1588,6 +1588,15 @@ object CodeGenerator extends Logging {

   def primitiveTypeName(dt: DataType): String = primitiveTypeName(javaType(dt))

+  def getClassName(cls: Class[_]): String = {
+    try {
+      return Option(cls.getCanonicalName).getOrElse(cls.getName)
+    } catch {
+      case err: InternalError =>
--- End diff --

Please see the commit https://github.com/apache/spark/commit/cc88d7fad16e8b5cbf7b6b9bfe412908782b4a45; this is the same fix as [Utils.getSimpleName](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/Utils.scala#L2732).

Yea, I see, and I think this is a workaround for now. Also, the doc says that `getCanonicalName` can return null in some cases;

    /**
     * Returns the canonical name of the underlying class as
     * defined by the Java Language Specification.  Returns null if
     * the underlying class does not have a canonical name (i.e., if
     * it is a local or anonymous class or an array whose component
     * type does not have a canonical name).
     * @return the canonical name of the underlying class if it exists, and
     * {@code null} otherwise.
     * @since 1.5
     */
    public String getCanonicalName() {
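A minimal, standalone Java sketch (not Spark code; names here are illustrative) of the fallback the diff above implements: `getCanonicalName()` returns null for anonymous classes, so we fall back to `getName()`:

```java
public class CanonicalNameDemo {
    // Mirrors the intent of CodeGenerator.getClassName: prefer the canonical
    // name, but fall back to the binary name when it is null.
    public static String getClassName(Class<?> cls) {
        String canonical = cls.getCanonicalName();
        return canonical != null ? canonical : cls.getName();
    }

    public static void main(String[] args) {
        Runnable anon = new Runnable() {
            @Override
            public void run() {}
        };
        // Anonymous class: no canonical name, so getCanonicalName() is null...
        System.out.println(anon.getClass().getCanonicalName()); // null
        // ...and the fallback yields the binary name, e.g. CanonicalNameDemo$1.
        System.out.println(getClassName(anon.getClass()));
        // Named class: the canonical name is available.
        System.out.println(getClassName(String.class)); // java.lang.String
    }
}
```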
Github user maropu commented on a diff in the pull request:
https://github.com/apache/spark/pull/21770#discussion_r202562989

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ---
@@ -890,7 +890,7 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String
     case StringType =>
       val intOpt = ctx.freshName("intOpt")
       (c, evPrim, evNull) => s"""
-        scala.Option<Integer> $intOpt =
+        scala.Option $intOpt =
--- End diff --

I haven't looked into the bytecode that JDK compilers generate, but the type of the return value seems to be erased in the bytecode;

    ...
    /* 045 */         scala.Option intOpt_0 =
    /* 046 */           org.apache.spark.sql.catalyst.util.DateTimeUtils.stringToDate(((UTF8String) references[0] /* literal */));
    /* 047 */         if (intOpt_0.isDefined()) {
    /* 048 */           value_1 = ((Integer) intOpt_0.get()).intValue();
    /* 049 */         } else {
    /* 050 */           isNull_1 = true;
    /* 051 */         }
    /* 052 */ ...

    - Day / DayOfMonth *** FAILED ***
      Code generation of dayofmonth(cast(2000-02-29 as date)) failed:
      java.util.concurrent.ExecutionException: org.codehaus.commons.compiler.CompileException: failed to compile: (Line 67, Column 62) incompatible types: scala.Option cannot be converted to scala.Option
      java.util.concurrent.ExecutionException: org.codehaus.commons.compiler.CompileException: failed to compile: (Line 67, Column 62) incompatible types: scala.Option cannot be converted to scala.Option
      at com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:306)
      at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:293)
      at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
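The fixed code above declares the variable with the raw `scala.Option` type and recovers the element with an explicit cast. A minimal, standalone Java sketch of that raw-type-plus-cast pattern (using `java.util.List` as a stand-in, since `scala.Option` isn't on a plain Java classpath):

```java
import java.util.ArrayList;
import java.util.List;

public class RawTypeDemo {
    // Returns a raw List, playing the role of a value whose generic type
    // is effectively erased in the bytecode.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static List makeRaw() {
        List l = new ArrayList();
        l.add(42);
        return l;
    }

    @SuppressWarnings("rawtypes")
    public static void main(String[] args) {
        // Declaring the variable with the raw type always compiles under javac...
        List raw = makeRaw();
        // ...and the value is recovered with an explicit cast, just like the
        // generated `((Integer) intOpt_0.get()).intValue()` pattern above.
        int v = ((Integer) raw.get(0)).intValue();
        System.out.println(v); // 42
    }
}
```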
Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/21770#discussion_r202548522

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ---
@@ -890,7 +890,7 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String
     case StringType =>
       val intOpt = ctx.freshName("intOpt")
       (c, evPrim, evNull) => s"""
-        scala.Option<Integer> $intOpt =
+        scala.Option $intOpt =
--- End diff --

Out of curiosity, why is this change needed? Is it because it's really a Scala Int that's returned?
Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/21770#discussion_r202548556

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
@@ -1588,6 +1588,15 @@ object CodeGenerator extends Logging {

   def primitiveTypeName(dt: DataType): String = primitiveTypeName(javaType(dt))

+  def getClassName(cls: Class[_]): String = {
+    try {
+      return Option(cls.getCanonicalName).getOrElse(cls.getName)
+    } catch {
+      case err: InternalError =>
--- End diff --

What would cause `InternalError` here? Just wondering if there's a cleaner way. Will `getCanonicalName` ever return null? Otherwise it seems like you don't need to fall back to `getName` above. Also, `return` isn't needed below?
Github user maropu commented on a diff in the pull request:
https://github.com/apache/spark/pull/21770#discussion_r202547003

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -3758,7 +3758,14 @@ case class ArrayUnion(left: Expression, right: Expression) extends ArraySetLike
     } else {
       val arrayUnion = classOf[ArrayUnion].getName
       val et = ctx.addReferenceObj("elementTypeUnion", elementType)
-      val order = ctx.addReferenceObj("orderingUnion", ordering)
+      // Some data types (e.g., `BinaryType`) have anonymous classes for ordering and
+      // `addReferenceObj` generates code below for anonymous classes;
+      //
+      //   (org.apache.spark.sql.types.BinaryType$$anon$1) references[0];
+      //
+      // Janino can compile this statement, but JDK Java compilers can't. So, we explicitly
+      // set `className` here.
+      val order = ctx.addReferenceObj("orderingUnion", ordering, "scala.math.Ordering")
--- End diff --

@mgaido91 I misunderstood a bit, so I updated the comment. I don't know why this fails in JDK compilers now, so I think I need to dig into this more. cc: @rednaxelafx
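A minimal, standalone Java sketch (not Spark code) of the idea behind passing an explicit `className`: an anonymous class has no canonical name that source code can cast to, but referencing the object through its named interface type always works:

```java
import java.util.Comparator;

public class AnonOrderingDemo {
    // An anonymous comparator, analogous to BinaryType's anonymous Ordering.
    static final Comparator<byte[]> ORDER = new Comparator<byte[]>() {
        @Override
        public int compare(byte[] a, byte[] b) {
            return Integer.compare(a.length, b.length);
        }
    };

    public static void main(String[] args) {
        // The anonymous class has no canonical name, so generated source code
        // cannot legally cast to something like `AnonOrderingDemo$1`...
        System.out.println(ORDER.getClass().getCanonicalName()); // null
        // ...but using the object through the named interface type is always
        // valid, which is what an explicit className gives the generated code.
        Comparator<byte[]> order = ORDER;
        System.out.println(order.compare(new byte[1], new byte[2]) < 0); // true
    }
}
```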
Github user maropu commented on a diff in the pull request:
https://github.com/apache/spark/pull/21770#discussion_r202543135

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
@@ -1585,6 +1585,9 @@ case class InitializeJavaBean(beanInstance: Expression, setters: Map[String, Exp
   }

   override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+    // Resolves setters before compilation
+    require(resolvedSetters.nonEmpty)
--- End diff --

In the master, [this test](https://github.com/apache/spark/blob/69993217fc4f5e5e41a297702389e86fe534dc2f/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ObjectExpressionsSuite.scala#L217) currently depends on the message of Janino compilation errors. So, if we used JDK Java compilers, the test could fail (because those compilers throw exceptions with different messages). To fix this, this change checks the setters before compilation.
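A hypothetical Java sketch of the point above (the names `genSetterCalls` and `resolvedSetters` are illustrative, not Spark's API): validating a precondition before emitting code turns a compiler-specific error message into a deterministic failure that tests can rely on:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PreconditionDemo {
    // Emits one setter call per resolved setter; fails fast if nothing was
    // resolved, instead of letting the backend compiler produce a
    // compiler-dependent error later.
    static String genSetterCalls(Map<String, String> resolvedSetters) {
        if (resolvedSetters.isEmpty()) {
            throw new IllegalStateException("setters must be resolved before codegen");
        }
        StringBuilder code = new StringBuilder();
        for (Map.Entry<String, String> e : resolvedSetters.entrySet()) {
            code.append("bean.").append(e.getKey())
                .append("(").append(e.getValue()).append(");\n");
        }
        return code.toString();
    }

    public static void main(String[] args) {
        Map<String, String> setters = new LinkedHashMap<>();
        setters.put("setName", "\"abc\"");
        System.out.print(genSetterCalls(setters)); // bean.setName("abc");
    }
}
```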
Github user maropu commented on a diff in the pull request:
https://github.com/apache/spark/pull/21770#discussion_r202542918

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala ---
@@ -318,7 +318,8 @@ case class SampleExec(
      v => s"""
         | $v = new $samplerClass($lowerBound, $upperBound, false);
         | $v.setSeed(${seed}L + partitionIndex);
-       """.stripMargin.trim)
+       """.stripMargin.trim,
+        forceInline = true)
--- End diff --

Generic arrays not allowed;

    /* 019 */   private org.apache.spark.util.random.BernoulliCellSampler[] sample_mutableStateArray_0 = new org.apache.spark.util.random.BernoulliCellSampler[1];
    ---
    Cause: java.util.concurrent.ExecutionException: org.codehaus.commons.compiler.CompileException: failed to compile: (Line 40, Column 101) generic array creation
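A minimal, standalone Java sketch (not Spark code) of the JDK restriction behind this: javac rejects creating an array of a parameterized type outright, and the conventional escape hatch is a raw array plus an unchecked cast, which is why inlining the state (rather than storing it in a mutable state array) sidesteps the problem:

```java
import java.util.ArrayList;
import java.util.List;

public class GenericArrayDemo {
    // javac rejects `new List<String>[1]` with "generic array creation";
    // the usual workaround is a raw array plus an unchecked cast.
    @SuppressWarnings({"unchecked", "rawtypes"})
    static List<String>[] makeArray() {
        return (List<String>[]) new List[1];
    }

    public static void main(String[] args) {
        List<String>[] arr = makeArray();
        arr[0] = new ArrayList<>();
        arr[0].add("sampled");
        System.out.println(arr[0].get(0)); // sampled
    }
}
```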
Github user mgaido91 commented on a diff in the pull request:
https://github.com/apache/spark/pull/21770#discussion_r202534510

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
@@ -1585,6 +1585,9 @@ case class InitializeJavaBean(beanInstance: Expression, setters: Map[String, Exp
   }

   override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+    // Resolves setters before compilation
+    require(resolvedSetters.nonEmpty)
--- End diff --

Why do we need to add this?
Github user mgaido91 commented on a diff in the pull request:
https://github.com/apache/spark/pull/21770#discussion_r202534515

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala ---
@@ -318,7 +318,8 @@ case class SampleExec(
      v => s"""
         | $v = new $samplerClass($lowerBound, $upperBound, false);
         | $v.setSeed(${seed}L + partitionIndex);
-       """.stripMargin.trim)
+       """.stripMargin.trim,
+        forceInline = true)
--- End diff --

Why do we need this?
Github user mgaido91 commented on a diff in the pull request:
https://github.com/apache/spark/pull/21770#discussion_r202534499

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -3758,7 +3758,10 @@ case class ArrayUnion(left: Expression, right: Expression) extends ArraySetLike
     } else {
       val arrayUnion = classOf[ArrayUnion].getName
       val et = ctx.addReferenceObj("elementTypeUnion", elementType)
-      val order = ctx.addReferenceObj("orderingUnion", ordering)
+      // Some data types (e.g., `BinaryType`) have anonymous classes for ordering and
+      // `getCanonicalName` returns null in these classes. Therefore, we need to
+      // explicitly set `className` here.
--- End diff --

nit: as we are adding this comment, shall we also mention that Janino works anyway, but JDK complains here?
GitHub user maropu opened a pull request:

https://github.com/apache/spark/pull/21770

[SPARK-24806][SQL] Brush up generated code so that JDK compilers can handle it

## What changes were proposed in this pull request?

This pr brushes up generated code so that JDK compilers can handle it, because the master sometimes generates Java code that only Janino can compile. This is an issue I found while working on SPARK-24498.

## How was this patch tested?

Existing tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/maropu/spark FixJavaCompilerErrors

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21770.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #21770

commit 08fddcbb755b8b4e84238a2e3d96c2e0967daa99
Author: Takeshi Yamamuro
Date:   2018-06-18T12:33:43Z

    Fix