[GitHub] spark pull request #21537: [SPARK-24505][SQL] Convert strings in codegen to ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21537 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21537: [SPARK-24505][SQL] Convert strings in codegen to ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21537#discussion_r209464356 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala --- @@ -1024,26 +1033,29 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String private[this] def castToIntervalCode(from: DataType): CastFunction = from match { case StringType => (c, evPrim, evNull) => -s"""$evPrim = CalendarInterval.fromString($c.toString()); +code"""$evPrim = CalendarInterval.fromString($c.toString()); if(${evPrim} == null) { ${evNull} = true; } """.stripMargin } - private[this] def decimalToTimestampCode(d: String): String = -s"($d.toBigDecimal().bigDecimal().multiply(new java.math.BigDecimal(100L))).longValue()" - private[this] def longToTimeStampCode(l: String): String = s"$l * 100L" - private[this] def timestampToIntegerCode(ts: String): String = -s"java.lang.Math.floor((double) $ts / 100L)" - private[this] def timestampToDoubleCode(ts: String): String = s"$ts / 100.0" + private[this] def decimalToTimestampCode(d: ExprValue): Block = { +val block = code"new java.math.BigDecimal(100L)" --- End diff -- `JavaCode.expression` is used to create `ExprValue` we need to track it in a code block. We don't need to track this `BigDecimal` variable. Here we just interpolate it into next code block. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21537: [SPARK-24505][SQL] Convert strings in codegen to ...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/21537#discussion_r209353035 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala --- @@ -1024,26 +1033,29 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String private[this] def castToIntervalCode(from: DataType): CastFunction = from match { case StringType => (c, evPrim, evNull) => -s"""$evPrim = CalendarInterval.fromString($c.toString()); +code"""$evPrim = CalendarInterval.fromString($c.toString()); if(${evPrim} == null) { ${evNull} = true; } """.stripMargin } - private[this] def decimalToTimestampCode(d: String): String = -s"($d.toBigDecimal().bigDecimal().multiply(new java.math.BigDecimal(100L))).longValue()" - private[this] def longToTimeStampCode(l: String): String = s"$l * 100L" - private[this] def timestampToIntegerCode(ts: String): String = -s"java.lang.Math.floor((double) $ts / 100L)" - private[this] def timestampToDoubleCode(ts: String): String = s"$ts / 100.0" + private[this] def decimalToTimestampCode(d: ExprValue): Block = { +val block = code"new java.math.BigDecimal(100L)" --- End diff -- maybe a `JavaCode.expression` then for this? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21537: [SPARK-24505][SQL] Convert strings in codegen to ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21537#discussion_r202839264 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala --- @@ -1024,26 +1033,29 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String private[this] def castToIntervalCode(from: DataType): CastFunction = from match { case StringType => (c, evPrim, evNull) => -s"""$evPrim = CalendarInterval.fromString($c.toString()); +code"""$evPrim = CalendarInterval.fromString($c.toString()); if(${evPrim} == null) { ${evNull} = true; } """.stripMargin } - private[this] def decimalToTimestampCode(d: String): String = -s"($d.toBigDecimal().bigDecimal().multiply(new java.math.BigDecimal(100L))).longValue()" - private[this] def longToTimeStampCode(l: String): String = s"$l * 100L" - private[this] def timestampToIntegerCode(ts: String): String = -s"java.lang.Math.floor((double) $ts / 100L)" - private[this] def timestampToDoubleCode(ts: String): String = s"$ts / 100.0" + private[this] def decimalToTimestampCode(d: ExprValue): Block = { +val block = code"new java.math.BigDecimal(100L)" --- End diff -- This piece of code that creates a `BigDecimal` object is simply used as an argument to `multiply` method. I think we may not need to track it as an `ExprValue`. Actually I also think here we can just `inline` it. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21537: [SPARK-24505][SQL] Convert strings in codegen to ...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/21537#discussion_r202816019 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala --- @@ -1024,26 +1033,29 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String private[this] def castToIntervalCode(from: DataType): CastFunction = from match { case StringType => (c, evPrim, evNull) => -s"""$evPrim = CalendarInterval.fromString($c.toString()); +code"""$evPrim = CalendarInterval.fromString($c.toString()); if(${evPrim} == null) { ${evNull} = true; } """.stripMargin } - private[this] def decimalToTimestampCode(d: String): String = -s"($d.toBigDecimal().bigDecimal().multiply(new java.math.BigDecimal(100L))).longValue()" - private[this] def longToTimeStampCode(l: String): String = s"$l * 100L" - private[this] def timestampToIntegerCode(ts: String): String = -s"java.lang.Math.floor((double) $ts / 100L)" - private[this] def timestampToDoubleCode(ts: String): String = s"$ts / 100.0" + private[this] def decimalToTimestampCode(d: ExprValue): Block = { +val block = code"new java.math.BigDecimal(100L)" --- End diff -- Does it matter? I think a literal is a piece of code (small) which is a left value and does not contain any reference to any reference to variables and can be put anywhere in the code returning always the same result. In the use case of splitting function, for instance, I imagine a literal is just reused everywhere needed without need to pass it to the new methods. So for these reasons IMHO this is a literal. WDYT? Also @cloud-fan WDYT? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21537: [SPARK-24505][SQL] Convert strings in codegen to ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21537#discussion_r202800594 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala --- @@ -1024,26 +1033,29 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String private[this] def castToIntervalCode(from: DataType): CastFunction = from match { case StringType => (c, evPrim, evNull) => -s"""$evPrim = CalendarInterval.fromString($c.toString()); +code"""$evPrim = CalendarInterval.fromString($c.toString()); if(${evPrim} == null) { ${evNull} = true; } """.stripMargin } - private[this] def decimalToTimestampCode(d: String): String = -s"($d.toBigDecimal().bigDecimal().multiply(new java.math.BigDecimal(100L))).longValue()" - private[this] def longToTimeStampCode(l: String): String = s"$l * 100L" - private[this] def timestampToIntegerCode(ts: String): String = -s"java.lang.Math.floor((double) $ts / 100L)" - private[this] def timestampToDoubleCode(ts: String): String = s"$ts / 100.0" + private[this] def decimalToTimestampCode(d: ExprValue): Block = { +val block = code"new java.math.BigDecimal(100L)" --- End diff -- It looks like a statement to create `BigDecimal` object instead of just a `BigDecimal` literal? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21537: [SPARK-24505][SQL] Convert strings in codegen to ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21537#discussion_r202800647 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/javaCode.scala --- @@ -196,7 +221,7 @@ object Block { EmptyBlock } else { args.foreach { - case _: ExprValue => + case _: ExprValue | _: Inline => --- End diff -- Moved. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21537: [SPARK-24505][SQL] Convert strings in codegen to ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21537#discussion_r202778928 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala --- @@ -825,43 +832,43 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String private[this] def castToStringCode(from: DataType, ctx: CodegenContext): CastFunction = { from match { case BinaryType => -(c, evPrim, evNull) => s"$evPrim = UTF8String.fromBytes($c);" +(c, evPrim, evNull) => code"$evPrim = UTF8String.fromBytes($c);" case DateType => -(c, evPrim, evNull) => s"""$evPrim = UTF8String.fromString( +(c, evPrim, evNull) => code"""$evPrim = UTF8String.fromString( org.apache.spark.sql.catalyst.util.DateTimeUtils.dateToString($c));""" case TimestampType => -val tz = ctx.addReferenceObj("timeZone", timeZone) -(c, evPrim, evNull) => s"""$evPrim = UTF8String.fromString( +val tz = JavaCode.global(ctx.addReferenceObj("timeZone", timeZone), timeZone.getClass) --- End diff -- It should be the same case as `ctx.addNewFunction`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21537: [SPARK-24505][SQL] Convert strings in codegen to ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21537#discussion_r202778833 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala --- @@ -740,31 +739,37 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String private def writeMapToStringBuilder( kt: DataType, vt: DataType, - map: String, - buffer: String, - ctx: CodegenContext): String = { + map: ExprValue, + buffer: ExprValue, + ctx: CodegenContext): Block = { def dataToStringFunc(func: String, dataType: DataType) = { val funcName = ctx.freshName(func) val dataToStringCode = castToStringCode(dataType, ctx) - ctx.addNewFunction(funcName, + val data = JavaCode.variable("data", dataType) + val dataStr = JavaCode.variable("dataStr", StringType) + val functionCall = ctx.addNewFunction(funcName, --- End diff -- We need to do it later. `ctx.addNewFunction` is used by too many places. We need to change its return type at all. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21537: [SPARK-24505][SQL] Convert strings in codegen to ...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/21537#discussion_r202743580 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala --- @@ -1024,26 +1033,29 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String private[this] def castToIntervalCode(from: DataType): CastFunction = from match { case StringType => (c, evPrim, evNull) => -s"""$evPrim = CalendarInterval.fromString($c.toString()); +code"""$evPrim = CalendarInterval.fromString($c.toString()); if(${evPrim} == null) { ${evNull} = true; } """.stripMargin } - private[this] def decimalToTimestampCode(d: String): String = -s"($d.toBigDecimal().bigDecimal().multiply(new java.math.BigDecimal(100L))).longValue()" - private[this] def longToTimeStampCode(l: String): String = s"$l * 100L" - private[this] def timestampToIntegerCode(ts: String): String = -s"java.lang.Math.floor((double) $ts / 100L)" - private[this] def timestampToDoubleCode(ts: String): String = s"$ts / 100.0" + private[this] def decimalToTimestampCode(d: ExprValue): Block = { +val block = code"new java.math.BigDecimal(100L)" --- End diff -- nit: why isn't this a literal? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21537: [SPARK-24505][SQL] Convert strings in codegen to ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21537#discussion_r202718372 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala --- @@ -740,31 +739,37 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String private def writeMapToStringBuilder( kt: DataType, vt: DataType, - map: String, - buffer: String, - ctx: CodegenContext): String = { + map: ExprValue, + buffer: ExprValue, + ctx: CodegenContext): Block = { def dataToStringFunc(func: String, dataType: DataType) = { val funcName = ctx.freshName(func) val dataToStringCode = castToStringCode(dataType, ctx) - ctx.addNewFunction(funcName, + val data = JavaCode.variable("data", dataType) + val dataStr = JavaCode.variable("dataStr", StringType) + val functionCall = ctx.addNewFunction(funcName, --- End diff -- can we make `ctx.addNewFunction` return JavaCode? Or we will do it later? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21537: [SPARK-24505][SQL] Convert strings in codegen to ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21537#discussion_r202597179 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/javaCode.scala --- @@ -196,7 +221,7 @@ object Block { EmptyBlock } else { args.foreach { - case _: ExprValue => + case _: ExprValue | _: Inline => --- End diff -- nit: also move `_: Block` here? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21537: [SPARK-24505][SQL] Convert strings in codegen to ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21537#discussion_r202718932 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala --- @@ -825,43 +832,43 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String private[this] def castToStringCode(from: DataType, ctx: CodegenContext): CastFunction = { from match { case BinaryType => -(c, evPrim, evNull) => s"$evPrim = UTF8String.fromBytes($c);" +(c, evPrim, evNull) => code"$evPrim = UTF8String.fromBytes($c);" case DateType => -(c, evPrim, evNull) => s"""$evPrim = UTF8String.fromString( +(c, evPrim, evNull) => code"""$evPrim = UTF8String.fromString( org.apache.spark.sql.catalyst.util.DateTimeUtils.dateToString($c));""" case TimestampType => -val tz = ctx.addReferenceObj("timeZone", timeZone) -(c, evPrim, evNull) => s"""$evPrim = UTF8String.fromString( +val tz = JavaCode.global(ctx.addReferenceObj("timeZone", timeZone), timeZone.getClass) --- End diff -- ditto, will `ctx.addReferenceObj` return JavaCode eventually? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21537: [SPARK-24505][SQL] Convert strings in codegen to ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21537#discussion_r202582231 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala --- @@ -720,31 +719,36 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String private def writeMapToStringBuilder( kt: DataType, vt: DataType, - map: String, - buffer: String, - ctx: CodegenContext): String = { + map: ExprValue, + buffer: ExprValue, + ctx: CodegenContext): Block = { def dataToStringFunc(func: String, dataType: DataType) = { val funcName = ctx.freshName(func) val dataToStringCode = castToStringCode(dataType, ctx) + val data = JavaCode.variable("data", dataType) + val dataStr = JavaCode.variable("dataStr", StringType) ctx.addNewFunction(funcName, --- End diff -- Ok. Sounds good. Will update later. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21537: [SPARK-24505][SQL] Convert strings in codegen to ...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/21537#discussion_r202271473 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -579,6 +579,18 @@ class CodegenContext { s"${fullName}_$id" } + /** + * Creates an `ExprValue` representing a local java variable of required data type. + */ + def freshVariable(name: String, dt: DataType): VariableValue = +JavaCode.variable(freshName(name), dt) + + /** + * Creates an `ExprValue` representing a local java variable of required data type. --- End diff -- nit: `data type` -> `Java class` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21537: [SPARK-24505][SQL] Convert strings in codegen to ...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/21537#discussion_r202269577 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala --- @@ -720,31 +719,36 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String private def writeMapToStringBuilder( kt: DataType, vt: DataType, - map: String, - buffer: String, - ctx: CodegenContext): String = { + map: ExprValue, + buffer: ExprValue, + ctx: CodegenContext): Block = { def dataToStringFunc(func: String, dataType: DataType) = { val funcName = ctx.freshName(func) val dataToStringCode = castToStringCode(dataType, ctx) + val data = JavaCode.variable("data", dataType) + val dataStr = JavaCode.variable("dataStr", StringType) ctx.addNewFunction(funcName, --- End diff -- Since this method `dataToStringFunc()` is not used in other files, it would be good to address it in this PR. WDYT? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21537: [SPARK-24505][SQL] Convert strings in codegen to ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21537#discussion_r200563441 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala --- @@ -1004,26 +1012,29 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String private[this] def castToIntervalCode(from: DataType): CastFunction = from match { case StringType => (c, evPrim, evNull) => -s"""$evPrim = CalendarInterval.fromString($c.toString()); +code"""$evPrim = CalendarInterval.fromString($c.toString()); if(${evPrim} == null) { ${evNull} = true; } """.stripMargin } - private[this] def decimalToTimestampCode(d: String): String = -s"($d.toBigDecimal().bigDecimal().multiply(new java.math.BigDecimal(100L))).longValue()" - private[this] def longToTimeStampCode(l: String): String = s"$l * 100L" - private[this] def timestampToIntegerCode(ts: String): String = -s"java.lang.Math.floor((double) $ts / 100L)" - private[this] def timestampToDoubleCode(ts: String): String = s"$ts / 100.0" + private[this] def decimalToTimestampCode(d: ExprValue): Block = { +val block = code"new java.math.BigDecimal(100L)" +code"($d.toBigDecimal().bigDecimal().multiply($block)).longValue()" + } + private[this] def longToTimeStampCode(l: ExprValue): Block = code"$l * 100L" + private[this] def timestampToIntegerCode(ts: ExprValue): Block = +code"java.lang.Math.floor((double) $ts / 100L)" + private[this] def timestampToDoubleCode(ts: ExprValue): Block = +code"$ts / 100.0" private[this] def castToBooleanCode(from: DataType): CastFunction = from match { case StringType => - val stringUtils = StringUtils.getClass.getName.stripSuffix("$") + val stringUtils = inline"${StringUtils.getClass.getName.stripSuffix("$")}" --- End diff -- I made inline not a `Block` but just a `JavaCode` now. After rethinking it, inline is used for embedding a piece of string into code block because we don't allow direct string interpolation. So inline as a `Block` doesn't make such sense. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21537: [SPARK-24505][SQL] Convert strings in codegen to ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21537#discussion_r197352302 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/javaCode.scala --- @@ -256,6 +283,22 @@ object EmptyBlock extends Block with Serializable { override def + (other: Block): Block = other } +/** + * A block inlines all types of input arguments into a string without + * tracking any reference of `JavaCode` instances. + */ +case class InlineBlock(block: String) extends Block { + override val code: String = block + override val exprValues: Set[ExprValue] = Set.empty + + override def + (other: Block): Block = other match { +case c: CodeBlock => Blocks(Seq(this, c)) +case i: InlineBlock => InlineBlock(block + i.block) +case b: Blocks => Blocks(Seq(this) ++ b.blocks) --- End diff -- Ok. I will do that PR first. Will ping you on the PR when it's ready. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21537: [SPARK-24505][SQL] Convert strings in codegen to ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21537#discussion_r197352254 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala --- @@ -1004,26 +1012,29 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String private[this] def castToIntervalCode(from: DataType): CastFunction = from match { case StringType => (c, evPrim, evNull) => -s"""$evPrim = CalendarInterval.fromString($c.toString()); +code"""$evPrim = CalendarInterval.fromString($c.toString()); if(${evPrim} == null) { ${evNull} = true; } """.stripMargin } - private[this] def decimalToTimestampCode(d: String): String = -s"($d.toBigDecimal().bigDecimal().multiply(new java.math.BigDecimal(100L))).longValue()" - private[this] def longToTimeStampCode(l: String): String = s"$l * 100L" - private[this] def timestampToIntegerCode(ts: String): String = -s"java.lang.Math.floor((double) $ts / 100L)" - private[this] def timestampToDoubleCode(ts: String): String = s"$ts / 100.0" + private[this] def decimalToTimestampCode(d: ExprValue): Block = { +val block = code"new java.math.BigDecimal(100L)" +code"($d.toBigDecimal().bigDecimal().multiply($block)).longValue()" + } + private[this] def longToTimeStampCode(l: ExprValue): Block = code"$l * 100L" + private[this] def timestampToIntegerCode(ts: ExprValue): Block = +code"java.lang.Math.floor((double) $ts / 100L)" + private[this] def timestampToDoubleCode(ts: ExprValue): Block = +code"$ts / 100.0" private[this] def castToBooleanCode(from: DataType): CastFunction = from match { case StringType => - val stringUtils = StringUtils.getClass.getName.stripSuffix("$") + val stringUtils = inline"${StringUtils.getClass.getName.stripSuffix("$")}" --- End diff -- inline is just used as a wrapper for string, as we disallow silent string interpolation. We expand the content of an inline into string into a code block. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21537: [SPARK-24505][SQL] Convert strings in codegen to ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21537#discussion_r197245629 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/javaCode.scala --- @@ -256,6 +283,22 @@ object EmptyBlock extends Block with Serializable { override def + (other: Block): Block = other } +/** + * A block inlines all types of input arguments into a string without + * tracking any reference of `JavaCode` instances. + */ +case class InlineBlock(block: String) extends Block { + override val code: String = block + override val exprValues: Set[ExprValue] = Set.empty + + override def + (other: Block): Block = other match { +case c: CodeBlock => Blocks(Seq(this, c)) +case i: InlineBlock => InlineBlock(block + i.block) +case b: Blocks => Blocks(Seq(this) ++ b.blocks) --- End diff -- shall we do that PR first? I feel it's easier to review this PR after we clean up the `Block` framework. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21537: [SPARK-24505][SQL] Convert strings in codegen to ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21537#discussion_r197245172 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala --- @@ -1004,26 +1012,29 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String private[this] def castToIntervalCode(from: DataType): CastFunction = from match { case StringType => (c, evPrim, evNull) => -s"""$evPrim = CalendarInterval.fromString($c.toString()); +code"""$evPrim = CalendarInterval.fromString($c.toString()); if(${evPrim} == null) { ${evNull} = true; } """.stripMargin } - private[this] def decimalToTimestampCode(d: String): String = -s"($d.toBigDecimal().bigDecimal().multiply(new java.math.BigDecimal(100L))).longValue()" - private[this] def longToTimeStampCode(l: String): String = s"$l * 100L" - private[this] def timestampToIntegerCode(ts: String): String = -s"java.lang.Math.floor((double) $ts / 100L)" - private[this] def timestampToDoubleCode(ts: String): String = s"$ts / 100.0" + private[this] def decimalToTimestampCode(d: ExprValue): Block = { +val block = code"new java.math.BigDecimal(100L)" +code"($d.toBigDecimal().bigDecimal().multiply($block)).longValue()" + } + private[this] def longToTimeStampCode(l: ExprValue): Block = code"$l * 100L" + private[this] def timestampToIntegerCode(ts: ExprValue): Block = +code"java.lang.Math.floor((double) $ts / 100L)" + private[this] def timestampToDoubleCode(ts: ExprValue): Block = +code"$ts / 100.0" private[this] def castToBooleanCode(from: DataType): CastFunction = from match { case StringType => - val stringUtils = StringUtils.getClass.getName.stripSuffix("$") + val stringUtils = inline"${StringUtils.getClass.getName.stripSuffix("$")}" --- End diff -- what's the difference between inline and code? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21537: [SPARK-24505][SQL] Convert strings in codegen to ...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/21537#discussion_r197079333 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/javaCode.scala --- @@ -256,6 +283,22 @@ object EmptyBlock extends Block with Serializable { override def + (other: Block): Block = other } +/** + * A block inlines all types of input arguments into a string without + * tracking any reference of `JavaCode` instances. + */ +case class InlineBlock(block: String) extends Block { + override val code: String = block + override val exprValues: Set[ExprValue] = Set.empty + + override def + (other: Block): Block = other match { +case c: CodeBlock => Blocks(Seq(this, c)) +case i: InlineBlock => InlineBlock(block + i.block) --- End diff -- as of now probably we don't have... @cloud-fan @kiszk what do you think? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21537: [SPARK-24505][SQL] Convert strings in codegen to ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21537#discussion_r197065708 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/javaCode.scala --- @@ -256,6 +283,22 @@ object EmptyBlock extends Block with Serializable { override def + (other: Block): Block = other } +/** + * A block inlines all types of input arguments into a string without + * tracking any reference of `JavaCode` instances. + */ +case class InlineBlock(block: String) extends Block { + override val code: String = block + override val exprValues: Set[ExprValue] = Set.empty + + override def + (other: Block): Block = other match { +case c: CodeBlock => Blocks(Seq(this, c)) +case i: InlineBlock => InlineBlock(block + i.block) --- End diff -- That said, I don't think we should have an usage like: ``` val inline1 = InlineBlock(".") val inline2 = inline1 + InlineBlock(".") code""" | $inline2 | ... """.stripMargin ``` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21537: [SPARK-24505][SQL] Convert strings in codegen to ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21537#discussion_r196808968 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/javaCode.scala --- @@ -113,6 +113,21 @@ object JavaCode { def isNullExpression(code: String): SimpleExprValue = { expression(code, BooleanType) } + + /** + * Create an `InlineBlock` for Java Class name. + */ + def className(javaClass: Class[_]): InlineBlock = InlineBlock(javaClass.getName) --- End diff -- Ok. Looks good to me. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21537: [SPARK-24505][SQL] Convert strings in codegen to ...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/21537#discussion_r196807818 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/javaCode.scala --- @@ -113,6 +113,21 @@ object JavaCode { def isNullExpression(code: String): SimpleExprValue = { expression(code, BooleanType) } + + /** + * Create an `InlineBlock` for Java Class name. + */ + def className(javaClass: Class[_]): InlineBlock = InlineBlock(javaClass.getName) --- End diff -- what about `javaType` as the method below? They return the same thing, just with a different input, don't they? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21537: [SPARK-24505][SQL] Convert strings in codegen to ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21537#discussion_r196769413 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/javaCode.scala --- @@ -113,6 +113,21 @@ object JavaCode { def isNullExpression(code: String): SimpleExprValue = { expression(code, BooleanType) } + + /** + * Create an `InlineBlock` for Java Class name. + */ + def className(javaClass: Class[_]): InlineBlock = InlineBlock(javaClass.getName) --- End diff -- Is `javaClass` better for you? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21537: [SPARK-24505][SQL] Convert strings in codegen to ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21537#discussion_r196769133 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/javaCode.scala --- @@ -256,6 +283,22 @@ object EmptyBlock extends Block with Serializable { override def + (other: Block): Block = other } +/** + * A block inlines all types of input arguments into a string without + * tracking any reference of `JavaCode` instances. + */ +case class InlineBlock(block: String) extends Block { + override val code: String = block + override val exprValues: Set[ExprValue] = Set.empty + + override def + (other: Block): Block = other match { +case c: CodeBlock => Blocks(Seq(this, c)) +case i: InlineBlock => InlineBlock(block + i.block) --- End diff -- I think we won't have too many cases of concatenating `InlineBlock`. You can see `InlineBlock` is mostly used to wrap a small piece of code like a java class name. I'm not sure if we need to add a newline here. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21537: [SPARK-24505][SQL] Convert strings in codegen to ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21537#discussion_r196029083 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/javaCode.scala --- @@ -256,6 +283,22 @@ object EmptyBlock extends Block with Serializable { override def + (other: Block): Block = other } +/** + * A block inlines all types of input arguments into a string without + * tracking any reference of `JavaCode` instances. + */ +case class InlineBlock(block: String) extends Block { + override val code: String = block + override val exprValues: Set[ExprValue] = Set.empty + + override def + (other: Block): Block = other match { +case c: CodeBlock => Blocks(Seq(this, c)) +case i: InlineBlock => InlineBlock(block + i.block) +case b: Blocks => Blocks(Seq(this) ++ b.blocks) --- End diff -- After discussed with @cloud-fan, I think this abstraction `Blocks` is not necessary. It can be replaced with `CodeBlock`. I will submit a PR to do that. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21537: [SPARK-24505][SQL] Convert strings in codegen to ...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/21537#discussion_r196024658 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/javaCode.scala --- @@ -256,6 +283,22 @@ object EmptyBlock extends Block with Serializable { override def + (other: Block): Block = other } +/** + * A block inlines all types of input arguments into a string without + * tracking any reference of `JavaCode` instances. + */ +case class InlineBlock(block: String) extends Block { + override val code: String = block + override val exprValues: Set[ExprValue] = Set.empty + + override def + (other: Block): Block = other match { +case c: CodeBlock => Blocks(Seq(this, c)) +case i: InlineBlock => InlineBlock(block + i.block) --- End diff -- does this mean that if we have to consecutive `InlineBlock`s we are concatenating them without any space? Is this ok? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21537: [SPARK-24505][SQL] Convert strings in codegen to ...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/21537#discussion_r196025077 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/javaCode.scala --- @@ -256,6 +283,22 @@ object EmptyBlock extends Block with Serializable { override def + (other: Block): Block = other } +/** + * A block inlines all types of input arguments into a string without + * tracking any reference of `JavaCode` instances. + */ +case class InlineBlock(block: String) extends Block { + override val code: String = block + override val exprValues: Set[ExprValue] = Set.empty + + override def + (other: Block): Block = other match { +case c: CodeBlock => Blocks(Seq(this, c)) +case i: InlineBlock => InlineBlock(block + i.block) +case b: Blocks => Blocks(Seq(this) ++ b.blocks) --- End diff -- nit: `this +: b.blocks`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21537: [SPARK-24505][SQL] Convert strings in codegen to ...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/21537#discussion_r196025306 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/javaCode.scala --- @@ -113,6 +113,21 @@ object JavaCode { def isNullExpression(code: String): SimpleExprValue = { expression(code, BooleanType) } + + /** + * Create an `InlineBlock` for Java Class name. + */ + def className(javaClass: Class[_]): InlineBlock = InlineBlock(javaClass.getName) --- End diff -- I am not sure about the names of these 3 methods... They are doing kind of the same thing but this name is pretty different form the other 2... What do you think? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21537: [SPARK-24505][SQL] Convert strings in codegen to ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21537#discussion_r195628609 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala --- @@ -720,31 +719,36 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String private def writeMapToStringBuilder( kt: DataType, vt: DataType, - map: String, - buffer: String, - ctx: CodegenContext): String = { + map: ExprValue, + buffer: ExprValue, + ctx: CodegenContext): Block = { def dataToStringFunc(func: String, dataType: DataType) = { val funcName = ctx.freshName(func) val dataToStringCode = castToStringCode(dataType, ctx) + val data = JavaCode.variable("data", dataType) + val dataStr = JavaCode.variable("dataStr", StringType) ctx.addNewFunction(funcName, --- End diff -- Maybe we can consider this in follow-ups. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21537: [SPARK-24505][SQL] Convert strings in codegen to ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21537#discussion_r195626998 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/javaCode.scala --- @@ -155,6 +170,17 @@ object Block { val CODE_BLOCK_BUFFER_LENGTH: Int = 512 + /** + * A custom string interpolator which inlines all types of input arguments into a string without --- End diff -- I will also do some improvement of document. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21537: [SPARK-24505][SQL] Convert strings in codegen to ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21537#discussion_r195626960 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/javaCode.scala --- @@ -155,6 +170,17 @@ object Block { val CODE_BLOCK_BUFFER_LENGTH: Int = 512 + /** + * A custom string interpolator which inlines all types of input arguments into a string without --- End diff -- I think part of this comment can be moved to `InlineBlock`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21537: [SPARK-24505][SQL] Convert strings in codegen to ...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/21537#discussion_r195356119 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala --- @@ -805,43 +811,43 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String private[this] def castToStringCode(from: DataType, ctx: CodegenContext): CastFunction = { from match { case BinaryType => -(c, evPrim, evNull) => s"$evPrim = UTF8String.fromBytes($c);" +(c, evPrim, evNull) => code"$evPrim = UTF8String.fromBytes($c);" case DateType => -(c, evPrim, evNull) => s"""$evPrim = UTF8String.fromString( +(c, evPrim, evNull) => code"""$evPrim = UTF8String.fromString( org.apache.spark.sql.catalyst.util.DateTimeUtils.dateToString($c));""" case TimestampType => -val tz = ctx.addReferenceObj("timeZone", timeZone) -(c, evPrim, evNull) => s"""$evPrim = UTF8String.fromString( +val tz = JavaCode.global(ctx.addReferenceObj("timeZone", timeZone), timeZone.getClass) +(c, evPrim, evNull) => code"""$evPrim = UTF8String.fromString( org.apache.spark.sql.catalyst.util.DateTimeUtils.timestampToString($c, $tz));""" case ArrayType(et, _) => (c, evPrim, evNull) => { - val buffer = ctx.freshName("buffer") - val bufferClass = classOf[UTF8StringBuilder].getName + val buffer = ctx.freshVariable("buffer", classOf[UTF8StringBuilder]) + val bufferClass = JavaCode.className(classOf[UTF8StringBuilder]) --- End diff -- It is fine with me to address this in another PR. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21537: [SPARK-24505][SQL] Convert strings in codegen to ...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/21537#discussion_r195333092 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala --- @@ -805,43 +811,43 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String private[this] def castToStringCode(from: DataType, ctx: CodegenContext): CastFunction = { from match { case BinaryType => -(c, evPrim, evNull) => s"$evPrim = UTF8String.fromBytes($c);" +(c, evPrim, evNull) => code"$evPrim = UTF8String.fromBytes($c);" case DateType => -(c, evPrim, evNull) => s"""$evPrim = UTF8String.fromString( +(c, evPrim, evNull) => code"""$evPrim = UTF8String.fromString( org.apache.spark.sql.catalyst.util.DateTimeUtils.dateToString($c));""" case TimestampType => -val tz = ctx.addReferenceObj("timeZone", timeZone) -(c, evPrim, evNull) => s"""$evPrim = UTF8String.fromString( +val tz = JavaCode.global(ctx.addReferenceObj("timeZone", timeZone), timeZone.getClass) +(c, evPrim, evNull) => code"""$evPrim = UTF8String.fromString( org.apache.spark.sql.catalyst.util.DateTimeUtils.timestampToString($c, $tz));""" case ArrayType(et, _) => (c, evPrim, evNull) => { - val buffer = ctx.freshName("buffer") - val bufferClass = classOf[UTF8StringBuilder].getName + val buffer = ctx.freshVariable("buffer", classOf[UTF8StringBuilder]) + val bufferClass = JavaCode.className(classOf[UTF8StringBuilder]) --- End diff -- I think we could add a class `Variable` which `GlobalVariable` and `LocalVariable` inherit from having a `declare` method taking an optional parameter `initialValue` so we can just invoke it to declare each variable. But maybe this can also be a followup. What do you think? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21537: [SPARK-24505][SQL] Convert strings in codegen to ...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/21537#discussion_r195328074 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala --- @@ -720,31 +719,36 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String private def writeMapToStringBuilder( kt: DataType, vt: DataType, - map: String, - buffer: String, - ctx: CodegenContext): String = { + map: ExprValue, + buffer: ExprValue, + ctx: CodegenContext): Block = { def dataToStringFunc(func: String, dataType: DataType) = { val funcName = ctx.freshName(func) val dataToStringCode = castToStringCode(dataType, ctx) + val data = JavaCode.variable("data", dataType) + val dataStr = JavaCode.variable("dataStr", StringType) ctx.addNewFunction(funcName, --- End diff -- nit: probably we can return this as `inline`, so we don't have to put it everywhere we use it --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21537: [SPARK-24505][SQL] Convert strings in codegen to ...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/21537#discussion_r195328479 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/javaCode.scala --- @@ -155,6 +170,17 @@ object Block { val CODE_BLOCK_BUFFER_LENGTH: Int = 512 + /** + * A custom string interpolator which inlines all types of input arguments into a string without --- End diff -- nit: maybe this comment is better to be put to the `InlineBlock` class, do you agree? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21537: [SPARK-24505][SQL] Convert strings in codegen to ...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/21537#discussion_r195331403 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala --- @@ -805,43 +811,43 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String private[this] def castToStringCode(from: DataType, ctx: CodegenContext): CastFunction = { from match { case BinaryType => -(c, evPrim, evNull) => s"$evPrim = UTF8String.fromBytes($c);" +(c, evPrim, evNull) => code"$evPrim = UTF8String.fromBytes($c);" case DateType => -(c, evPrim, evNull) => s"""$evPrim = UTF8String.fromString( +(c, evPrim, evNull) => code"""$evPrim = UTF8String.fromString( org.apache.spark.sql.catalyst.util.DateTimeUtils.dateToString($c));""" case TimestampType => -val tz = ctx.addReferenceObj("timeZone", timeZone) -(c, evPrim, evNull) => s"""$evPrim = UTF8String.fromString( +val tz = JavaCode.global(ctx.addReferenceObj("timeZone", timeZone), timeZone.getClass) +(c, evPrim, evNull) => code"""$evPrim = UTF8String.fromString( org.apache.spark.sql.catalyst.util.DateTimeUtils.timestampToString($c, $tz));""" case ArrayType(et, _) => (c, evPrim, evNull) => { - val buffer = ctx.freshName("buffer") - val bufferClass = classOf[UTF8StringBuilder].getName + val buffer = ctx.freshVariable("buffer", classOf[UTF8StringBuilder]) + val bufferClass = JavaCode.className(classOf[UTF8StringBuilder]) --- End diff -- Now, each variable defined by `freshVariable` has a type. We can get a type or its class name from the variable (e.g. `bufffer`). Therefore, it looks redundant to declare a name of each variable again (e.g. bufferClass). Do we have an API to get a type of the variable or define an API to get a name of the class? This is because this pattern is very common. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21537: [SPARK-24505][SQL] Convert strings in codegen to ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21537#discussion_r195307036 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -579,6 +579,22 @@ class CodegenContext { s"${fullName}_$id" } + /** + * Creates an `ExprValue` representing a local java variable of required data type. + */ + def freshName(name: String, dt: DataType): VariableValue = JavaCode.variable(freshName(name), dt) + + /** + * Creates an `ExprValue` representing a local java variable of required data type. + */ + def freshName(name: String, javaClass: Class[_]): VariableValue = +JavaCode.variable(freshName(name), javaClass) + + /** + * Creates an `ExprValue` representing a local boolean java variable. + */ + def isNullFreshName(name: String): VariableValue = JavaCode.isNullVariable(freshName(name)) --- End diff -- Ok. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21537: [SPARK-24505][SQL] Convert strings in codegen to ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21537#discussion_r195306844 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -579,6 +579,22 @@ class CodegenContext { s"${fullName}_$id" } + /** + * Creates an `ExprValue` representing a local java variable of required data type. + */ + def freshName(name: String, dt: DataType): VariableValue = JavaCode.variable(freshName(name), dt) + + /** + * Creates an `ExprValue` representing a local java variable of required data type. + */ + def freshName(name: String, javaClass: Class[_]): VariableValue = +JavaCode.variable(freshName(name), javaClass) + + /** + * Creates an `ExprValue` representing a local boolean java variable. + */ + def isNullFreshName(name: String): VariableValue = JavaCode.isNullVariable(freshName(name)) --- End diff -- `isNullFreshName` is new, we don't need it and can just call `freshName(name, BooleanType)` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21537: [SPARK-24505][SQL] Convert strings in codegen to ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21537#discussion_r195306721 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -579,6 +579,22 @@ class CodegenContext { s"${fullName}_$id" } + /** + * Creates an `ExprValue` representing a local java variable of required data type. + */ + def freshName(name: String, dt: DataType): VariableValue = JavaCode.variable(freshName(name), dt) --- End diff -- oh I missed the ctx parameter thing, let's leave it. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21537: [SPARK-24505][SQL] Convert strings in codegen to ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21537#discussion_r195284656 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -579,6 +579,22 @@ class CodegenContext { s"${fullName}_$id" } + /** + * Creates an `ExprValue` representing a local java variable of required data type. + */ + def freshName(name: String, dt: DataType): VariableValue = JavaCode.variable(freshName(name), dt) + + /** + * Creates an `ExprValue` representing a local java variable of required data type. + */ + def freshName(name: String, javaClass: Class[_]): VariableValue = +JavaCode.variable(freshName(name), javaClass) + + /** + * Creates an `ExprValue` representing a local boolean java variable. + */ + def isNullFreshName(name: String): VariableValue = JavaCode.isNullVariable(freshName(name)) --- End diff -- Rename `JavaCode.isNullVariable`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21537: [SPARK-24505][SQL] Convert strings in codegen to ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21537#discussion_r195284269 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -579,6 +579,22 @@ class CodegenContext { s"${fullName}_$id" } + /** + * Creates an `ExprValue` representing a local java variable of required data type. + */ + def freshName(name: String, dt: DataType): VariableValue = JavaCode.variable(freshName(name), dt) --- End diff -- Putting it in `CodeGenerator` is just for convenience that we can save a `ctx` parameter. I'm fine to let it in `JavaCode` too if you don't think it's verbose. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21537: [SPARK-24505][SQL] Convert strings in codegen to ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21537#discussion_r195184136 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -579,6 +579,22 @@ class CodegenContext { s"${fullName}_$id" } + /** + * Creates an `ExprValue` representing a local java variable of required data type. + */ + def freshName(name: String, dt: DataType): VariableValue = JavaCode.variable(freshName(name), dt) --- End diff -- how about we move it to `JavaCode` and follow the name convention there and call it `freshVariable`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21537: [SPARK-24505][SQL] Convert strings in codegen to ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21537#discussion_r195181950 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -579,6 +579,22 @@ class CodegenContext { s"${fullName}_$id" } + /** + * Creates an `ExprValue` representing a local java variable of required data type. + */ + def freshName(name: String, dt: DataType): VariableValue = JavaCode.variable(freshName(name), dt) + + /** + * Creates an `ExprValue` representing a local java variable of required data type. + */ + def freshName(name: String, javaClass: Class[_]): VariableValue = +JavaCode.variable(freshName(name), javaClass) + + /** + * Creates an `ExprValue` representing a local boolean java variable. + */ + def isNullFreshName(name: String): VariableValue = JavaCode.isNullVariable(freshName(name)) --- End diff -- change the new one in this PR but open another PR to rename `JavaCode.isNullVariable` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21537: [SPARK-24505][SQL] Convert strings in codegen to ...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/21537#discussion_r195051547 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -579,6 +579,22 @@ class CodegenContext { s"${fullName}_$id" } + /** + * Creates an `ExprValue` representing a local java variable of required data type. + */ + def freshName(name: String, dt: DataType): VariableValue = JavaCode.variable(freshName(name), dt) --- End diff -- I think either we decide to postpone the introduction of this API to a followup PR or we can change its name now that we are introducing it... --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21537: [SPARK-24505][SQL] Convert strings in codegen to ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21537#discussion_r194961281 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala --- @@ -625,25 +625,23 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String val eval = child.genCode(ctx) val nullSafeCast = nullSafeCastFunction(child.dataType, dataType, ctx) -ev.copy(code = +ev.copy(code = eval.code + code""" --- End diff -- Oh. Yes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21537: [SPARK-24505][SQL] Convert strings in codegen to ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21537#discussion_r194960933 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -579,6 +579,22 @@ class CodegenContext { s"${fullName}_$id" } + /** + * Creates an `ExprValue` representing a local java variable of required data type. + */ + def freshName(name: String, dt: DataType): VariableValue = JavaCode.variable(freshName(name), dt) --- End diff -- Sounds good. Not sure if @cloud-fan wants me to change it in this PR or other https://github.com/apache/spark/pull/21537#discussion_r194841055. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21537: [SPARK-24505][SQL] Convert strings in codegen to ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21537#discussion_r194958976 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -579,6 +579,22 @@ class CodegenContext { s"${fullName}_$id" } + /** + * Creates an `ExprValue` representing a local java variable of required data type. + */ + def freshName(name: String, dt: DataType): VariableValue = JavaCode.variable(freshName(name), dt) + + /** + * Creates an `ExprValue` representing a local java variable of required data type. + */ + def freshName(name: String, javaClass: Class[_]): VariableValue = +JavaCode.variable(freshName(name), javaClass) + + /** + * Creates an `ExprValue` representing a local boolean java variable. + */ + def isNullFreshName(name: String): VariableValue = JavaCode.isNullVariable(freshName(name)) --- End diff -- hmm, don't we want to do this in this PR? I can change it to `freshName(name, BooleanType)` together with the changes for other comments. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21537: [SPARK-24505][SQL] Convert strings in codegen to ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21537#discussion_r194953280 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -579,6 +579,22 @@ class CodegenContext { s"${fullName}_$id" } + /** + * Creates an `ExprValue` representing a local java variable of required data type. + */ + def freshName(name: String, dt: DataType): VariableValue = JavaCode.variable(freshName(name), dt) + + /** + * Creates an `ExprValue` representing a local java variable of required data type. + */ + def freshName(name: String, javaClass: Class[_]): VariableValue = +JavaCode.variable(freshName(name), javaClass) + + /** + * Creates an `ExprValue` representing a local boolean java variable. + */ + def isNullFreshName(name: String): VariableValue = JavaCode.isNullVariable(freshName(name)) --- End diff -- yea, but we should open another PR to do this. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21537: [SPARK-24505][SQL] Convert strings in codegen to ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21537#discussion_r194946640 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala --- @@ -1004,26 +1014,30 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String private[this] def castToIntervalCode(from: DataType): CastFunction = from match { case StringType => (c, evPrim, evNull) => -s"""$evPrim = CalendarInterval.fromString($c.toString()); +code"""$evPrim = CalendarInterval.fromString($c.toString()); if(${evPrim} == null) { ${evNull} = true; } """.stripMargin } - private[this] def decimalToTimestampCode(d: String): String = -s"($d.toBigDecimal().bigDecimal().multiply(new java.math.BigDecimal(100L))).longValue()" - private[this] def longToTimeStampCode(l: String): String = s"$l * 100L" - private[this] def timestampToIntegerCode(ts: String): String = -s"java.lang.Math.floor((double) $ts / 100L)" - private[this] def timestampToDoubleCode(ts: String): String = s"$ts / 100.0" + private[this] def decimalToTimestampCode(d: ExprValue): ExprValue = +JavaCode.expression( + s"($d.toBigDecimal().bigDecimal().multiply(new java.math.BigDecimal(100L))).longValue()", --- End diff -- hmm, here we should use `Block`. Fixed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21537: [SPARK-24505][SQL] Convert strings in codegen to ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21537#discussion_r194908404 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -579,6 +579,22 @@ class CodegenContext { s"${fullName}_$id" } + /** + * Creates an `ExprValue` representing a local java variable of required data type. + */ + def freshName(name: String, dt: DataType): VariableValue = JavaCode.variable(freshName(name), dt) + + /** + * Creates an `ExprValue` representing a local java variable of required data type. + */ + def freshName(name: String, javaClass: Class[_]): VariableValue = +JavaCode.variable(freshName(name), javaClass) + + /** + * Creates an `ExprValue` representing a local boolean java variable. + */ + def isNullFreshName(name: String): VariableValue = JavaCode.isNullVariable(freshName(name)) --- End diff -- haha, indeed. At least there is only one parameter. Maybe we should just use `freshName(name, BooleanType)`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21537: [SPARK-24505][SQL] Convert strings in codegen to ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21537#discussion_r194841055 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -579,6 +579,22 @@ class CodegenContext { s"${fullName}_$id" } + /** + * Creates an `ExprValue` representing a local java variable of required data type. + */ + def freshName(name: String, dt: DataType): VariableValue = JavaCode.variable(freshName(name), dt) + + /** + * Creates an `ExprValue` representing a local java variable of required data type. + */ + def freshName(name: String, javaClass: Class[_]): VariableValue = +JavaCode.variable(freshName(name), javaClass) + + /** + * Creates an `ExprValue` representing a local boolean java variable. + */ + def isNullFreshName(name: String): VariableValue = JavaCode.isNullVariable(freshName(name)) --- End diff -- hmmm, now looking at this, `isNullFreshName(name)` doesn't save much typing compared to `freshName(name, BooleanType)`... --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21537: [SPARK-24505][SQL] Convert strings in codegen to ...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/21537#discussion_r194703775 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala --- @@ -1004,26 +1014,30 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String private[this] def castToIntervalCode(from: DataType): CastFunction = from match { case StringType => (c, evPrim, evNull) => -s"""$evPrim = CalendarInterval.fromString($c.toString()); +code"""$evPrim = CalendarInterval.fromString($c.toString()); if(${evPrim} == null) { ${evNull} = true; } """.stripMargin } - private[this] def decimalToTimestampCode(d: String): String = -s"($d.toBigDecimal().bigDecimal().multiply(new java.math.BigDecimal(100L))).longValue()" - private[this] def longToTimeStampCode(l: String): String = s"$l * 100L" - private[this] def timestampToIntegerCode(ts: String): String = -s"java.lang.Math.floor((double) $ts / 100L)" - private[this] def timestampToDoubleCode(ts: String): String = s"$ts / 100.0" + private[this] def decimalToTimestampCode(d: ExprValue): ExprValue = +JavaCode.expression( + s"($d.toBigDecimal().bigDecimal().multiply(new java.math.BigDecimal(100L))).longValue()", --- End diff -- is it safe to do this? aren't we loosing the reference to `$d`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21537: [SPARK-24505][SQL] Convert strings in codegen to ...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/21537#discussion_r194704155 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -579,6 +579,22 @@ class CodegenContext { s"${fullName}_$id" } + /** + * Creates an `ExprValue` representing a local java variable of required data type. + */ + def freshName(name: String, dt: DataType): VariableValue = JavaCode.variable(freshName(name), dt) --- End diff -- What about some better name like `getLocalVariable` or something similar? this is not just a fresh name... --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21537: [SPARK-24505][SQL] Convert strings in codegen to ...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/21537#discussion_r194697467 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala --- @@ -625,25 +625,23 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String val eval = child.genCode(ctx) val nullSafeCast = nullSafeCastFunction(child.dataType, dataType, ctx) -ev.copy(code = +ev.copy(code = eval.code + code""" --- End diff -- this can be removed, can't it? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21537: [SPARK-24505][SQL] Convert strings in codegen to ...
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/21537 [SPARK-24505][SQL] Convert strings in codegen to blocks: Cast and BoundAttribute ## What changes were proposed in this pull request? This is split from #21520. This includes changes of `BoundAttribute` and `Cast`. This patch also adds few convenient APIs: ```scala CodeGenerator.freshName(name: String, dt: DataType): VariableValue CodeGenerator.freshName(name: String, javaClass: Class[_]): VariableValue CodeGenerator.isNullFreshName(name: String): VariableValue JavaCode.className(javaClass: Class[_]): InlineBlock JavaCode.javaType(dataType: DataType): InlineBlock JavaCode.boxedType(dataType: DataType): InlineBlock ``` ## How was this patch tested? Existing tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/viirya/spark-1 SPARK-24505-1 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21537.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21537 commit 89d025225b557689389d16c207be8a25f5e82fa5 Author: Liang-Chi Hsieh Date: 2018-06-12T08:40:20Z Convert strings in codegen to blocks. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org