[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21912 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/21912#discussion_r214392521 --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeArrayData.java --- @@ -463,23 +463,35 @@ public static UnsafeArrayData fromPrimitiveArray( final long[] data = new long[(int)totalSizeInLongs]; Platform.putLong(data, Platform.LONG_ARRAY_OFFSET, length); -if (arr != null) { --- End diff -- Originally, this if is not here. It is fine with me to keep this for the safety. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21912#discussion_r214332656 --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeArrayData.java --- @@ -463,23 +463,35 @@ public static UnsafeArrayData fromPrimitiveArray( final long[] data = new long[(int)totalSizeInLongs]; Platform.putLong(data, Platform.LONG_ARRAY_OFFSET, length); -if (arr != null) { --- End diff -- shall we keep this if? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21912#discussion_r214253824 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ArrayData.scala --- @@ -34,6 +36,32 @@ object ArrayData { case a: Array[Double] => UnsafeArrayData.fromPrimitiveArray(a) case other => new GenericArrayData(other) } + + + /** + * Allocate [[UnsafeArrayData]] or [[GenericArrayData]] based on given parameters. + * + * @param elementSize a size of an element in bytes + * @param numElements the number of elements the array should contain + * @param isPrimitiveType whether the type of an element is primitive type + * @param additionalErrorMessage string to include in the error message + */ + def allocateArrayData( + elementSize: Int, --- End diff -- ah it's called in the generated code. Maybe we can use elementSize `-1` to create a generic array. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21912#discussion_r214253479 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ArrayData.scala --- @@ -34,6 +36,32 @@ object ArrayData { case a: Array[Double] => UnsafeArrayData.fromPrimitiveArray(a) case other => new GenericArrayData(other) } + + + /** + * Allocate [[UnsafeArrayData]] or [[GenericArrayData]] based on given parameters. + * + * @param elementSize a size of an element in bytes + * @param numElements the number of elements the array should contain + * @param isPrimitiveType whether the type of an element is primitive type + * @param additionalErrorMessage string to include in the error message + */ + def allocateArrayData( + elementSize: Int, --- End diff -- `elementSize` is only used when creating unsafe array. I think we just have a `elementSize: Option[Int]` and remove the `isPrimitiveType` parameter. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21912#discussion_r214253006 --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeArrayData.java --- @@ -452,6 +452,16 @@ public UnsafeArrayData copy() { public static UnsafeArrayData fromPrimitiveArray( Object arr, int offset, int length, int elementSize) { +UnsafeArrayData result = createFreshArray(length, elementSize); +final long headerInBytes = calculateHeaderPortionInBytes(length); +final long valueRegionInBytes = (long)elementSize * length; +final Object data = result.getBaseObject(); --- End diff -- now the data is `Object` instead of `long[]`. Can we duplicate the code for now and think of how to deduplicate them later? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/21912#discussion_r214145047 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -385,107 +385,124 @@ case class MapEntries(child: Expression) extends UnaryExpression with ExpectsInp override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { nullSafeCodeGen(ctx, ev, c => { + val arrayData = ctx.freshName("arrayData") val numElements = ctx.freshName("numElements") val keys = ctx.freshName("keys") val values = ctx.freshName("values") val isKeyPrimitive = CodeGenerator.isPrimitiveType(childDataType.keyType) val isValuePrimitive = CodeGenerator.isPrimitiveType(childDataType.valueType) + + val wordSize = UnsafeRow.WORD_SIZE + val structSize = UnsafeRow.calculateBitSetWidthInBytes(2) + wordSize * 2 + val (isPrimitive, elementSize) = if (isKeyPrimitive && isValuePrimitive) { +(false, structSize + wordSize) --- End diff -- Good catch, thank you!! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21912#discussion_r213884198 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -385,107 +385,124 @@ case class MapEntries(child: Expression) extends UnaryExpression with ExpectsInp override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { nullSafeCodeGen(ctx, ev, c => { + val arrayData = ctx.freshName("arrayData") val numElements = ctx.freshName("numElements") val keys = ctx.freshName("keys") val values = ctx.freshName("values") val isKeyPrimitive = CodeGenerator.isPrimitiveType(childDataType.keyType) val isValuePrimitive = CodeGenerator.isPrimitiveType(childDataType.valueType) + + val wordSize = UnsafeRow.WORD_SIZE + val structSize = UnsafeRow.calculateBitSetWidthInBytes(2) + wordSize * 2 + val (isPrimitive, elementSize) = if (isKeyPrimitive && isValuePrimitive) { +(false, structSize + wordSize) --- End diff -- should be `true`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21912#discussion_r213883747 --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeArrayData.java --- @@ -452,6 +452,16 @@ public UnsafeArrayData copy() { public static UnsafeArrayData fromPrimitiveArray( Object arr, int offset, int length, int elementSize) { +UnsafeArrayData result = forPrimitiveArray(length, elementSize); +final long headerInBytes = calculateHeaderPortionInBytes(length); +final long valueRegionInBytes = (long)elementSize * length; +final Object data = result.getBaseObject(); +Platform.copyMemory(arr, offset, data, + Platform.LONG_ARRAY_OFFSET + headerInBytes, valueRegionInBytes); +return result; + } + + public static UnsafeArrayData forPrimitiveArray(int length, int elementSize) { --- End diff -- how about `createFreshArray`? It's not for primitive only, e.g. we use it to store a struct of fixed size fields. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/21912#discussion_r213707288 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -385,107 +385,120 @@ case class MapEntries(child: Expression) extends UnaryExpression with ExpectsInp override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { nullSafeCodeGen(ctx, ev, c => { + val arrayData = ctx.freshName("arrayData") val numElements = ctx.freshName("numElements") val keys = ctx.freshName("keys") val values = ctx.freshName("values") val isKeyPrimitive = CodeGenerator.isPrimitiveType(childDataType.keyType) val isValuePrimitive = CodeGenerator.isPrimitiveType(childDataType.valueType) + + val wordSize = UnsafeRow.WORD_SIZE + val structSize = UnsafeRow.calculateBitSetWidthInBytes(2) + wordSize * 2 + val elementSize = if (isKeyPrimitive && isValuePrimitive) { +Some(structSize + wordSize) + } else { +None + } + + val allocation = CodeGenerator.createArrayData(arrayData, childDataType.keyType, numElements, +s" $prettyName failed.", elementSize = elementSize) + val code = if (isKeyPrimitive && isValuePrimitive) { -genCodeForPrimitiveElements(ctx, keys, values, ev.value, numElements) +val genCodeForPrimitive = genCodeForPrimitiveElements( + ctx, arrayData, keys, values, ev.value, numElements, structSize) +s""" + |if ($arrayData instanceof UnsafeArrayData) { + | $genCodeForPrimitive + |} else { + | ${genCodeForAnyElements(ctx, arrayData, keys, values, ev.value, numElements)} + |} + """.stripMargin } else { -genCodeForAnyElements(ctx, keys, values, ev.value, numElements) +s"${genCodeForAnyElements(ctx, arrayData, keys, values, ev.value, numElements)}" } + s""" |final int $numElements = $c.numElements(); |final ArrayData $keys = $c.keyArray(); |final ArrayData $values = $c.valueArray(); + |$allocation |$code """.stripMargin }) } - private def getKey(varName: String) = CodeGenerator.getValue(varName, childDataType.keyType, "z") + private def getKey(varName: String, index: String) = +CodeGenerator.getValue(varName, childDataType.keyType, index) - private def getValue(varName: String) = { -CodeGenerator.getValue(varName, childDataType.valueType, "z") - } + private def getValue(varName: String, index: String) = +CodeGenerator.getValue(varName, childDataType.valueType, index) private def genCodeForPrimitiveElements( ctx: CodegenContext, + arrayData: String, keys: String, values: String, - arrayData: String, - numElements: String): String = { -val unsafeRow = ctx.freshName("unsafeRow") + resultArrayData: String, + numElements: String, + structSize: Int): String = { val unsafeArrayData = ctx.freshName("unsafeArrayData") +val baseObject = ctx.freshName("baseObject") +val unsafeRow = ctx.freshName("unsafeRow") val structsOffset = ctx.freshName("structsOffset") +val offset = ctx.freshName("offset") +val z = ctx.freshName("z") val calculateHeader = "UnsafeArrayData.calculateHeaderPortionInBytes" val baseOffset = Platform.BYTE_ARRAY_OFFSET val wordSize = UnsafeRow.WORD_SIZE -val structSize = UnsafeRow.calculateBitSetWidthInBytes(2) + wordSize * 2 -val structSizeAsLong = structSize + "L" -val keyTypeName = CodeGenerator.primitiveTypeName(childDataType.keyType) -val valueTypeName = CodeGenerator.primitiveTypeName(childDataType.valueType) - -val valueAssignment = s"$unsafeRow.set$valueTypeName(1, ${getValue(values)});" -val valueAssignmentChecked = if (childDataType.valueContainsNull) { - s""" - |if ($values.isNullAt(z)) { - | $unsafeRow.setNullAt(1); - |} else { - | $valueAssignment - |} - """.stripMargin -} else { - valueAssignment -} +val structSizeAsLong = s"${structSize}L" -val assignmentLoop = (byteArray: String) => - s""" - |final int $structsOffset = $calculateHeader($numElements) + $numElements * $wordSize; - |UnsafeRow $unsafeRow = new UnsafeRow(2); - |for (int z = 0; z < $numElements; z++) { - | long offset = $structsOffset + z * $structSizeAsLong;
[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21912#discussion_r213542353 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -385,107 +385,120 @@ case class MapEntries(child: Expression) extends UnaryExpression with ExpectsInp override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { nullSafeCodeGen(ctx, ev, c => { + val arrayData = ctx.freshName("arrayData") val numElements = ctx.freshName("numElements") val keys = ctx.freshName("keys") val values = ctx.freshName("values") val isKeyPrimitive = CodeGenerator.isPrimitiveType(childDataType.keyType) val isValuePrimitive = CodeGenerator.isPrimitiveType(childDataType.valueType) + + val wordSize = UnsafeRow.WORD_SIZE + val structSize = UnsafeRow.calculateBitSetWidthInBytes(2) + wordSize * 2 + val elementSize = if (isKeyPrimitive && isValuePrimitive) { +Some(structSize + wordSize) + } else { +None + } + + val allocation = CodeGenerator.createArrayData(arrayData, childDataType.keyType, numElements, --- End diff -- this is hacky. Actually we want to create an array of struct, but here we lied and say we want to create an array of key type. I think we should call `ArrayData.allocateArrayData` here directly. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21912#discussion_r213541643 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -385,107 +385,120 @@ case class MapEntries(child: Expression) extends UnaryExpression with ExpectsInp override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { nullSafeCodeGen(ctx, ev, c => { + val arrayData = ctx.freshName("arrayData") val numElements = ctx.freshName("numElements") val keys = ctx.freshName("keys") val values = ctx.freshName("values") val isKeyPrimitive = CodeGenerator.isPrimitiveType(childDataType.keyType) val isValuePrimitive = CodeGenerator.isPrimitiveType(childDataType.valueType) + + val wordSize = UnsafeRow.WORD_SIZE + val structSize = UnsafeRow.calculateBitSetWidthInBytes(2) + wordSize * 2 + val elementSize = if (isKeyPrimitive && isValuePrimitive) { +Some(structSize + wordSize) + } else { +None + } + + val allocation = CodeGenerator.createArrayData(arrayData, childDataType.keyType, numElements, +s" $prettyName failed.", elementSize = elementSize) + val code = if (isKeyPrimitive && isValuePrimitive) { -genCodeForPrimitiveElements(ctx, keys, values, ev.value, numElements) +val genCodeForPrimitive = genCodeForPrimitiveElements( + ctx, arrayData, keys, values, ev.value, numElements, structSize) +s""" + |if ($arrayData instanceof UnsafeArrayData) { + | $genCodeForPrimitive + |} else { + | ${genCodeForAnyElements(ctx, arrayData, keys, values, ev.value, numElements)} + |} + """.stripMargin } else { -genCodeForAnyElements(ctx, keys, values, ev.value, numElements) +s"${genCodeForAnyElements(ctx, arrayData, keys, values, ev.value, numElements)}" } + s""" |final int $numElements = $c.numElements(); |final ArrayData $keys = $c.keyArray(); |final ArrayData $values = $c.valueArray(); + |$allocation |$code """.stripMargin }) } - private def getKey(varName: String) = CodeGenerator.getValue(varName, childDataType.keyType, "z") + private def getKey(varName: String, index: String) = +CodeGenerator.getValue(varName, childDataType.keyType, index) - private def getValue(varName: String) = { -CodeGenerator.getValue(varName, childDataType.valueType, "z") - } + private def getValue(varName: String, index: String) = +CodeGenerator.getValue(varName, childDataType.valueType, index) private def genCodeForPrimitiveElements( ctx: CodegenContext, + arrayData: String, keys: String, values: String, - arrayData: String, - numElements: String): String = { -val unsafeRow = ctx.freshName("unsafeRow") + resultArrayData: String, + numElements: String, + structSize: Int): String = { val unsafeArrayData = ctx.freshName("unsafeArrayData") +val baseObject = ctx.freshName("baseObject") +val unsafeRow = ctx.freshName("unsafeRow") val structsOffset = ctx.freshName("structsOffset") +val offset = ctx.freshName("offset") +val z = ctx.freshName("z") val calculateHeader = "UnsafeArrayData.calculateHeaderPortionInBytes" val baseOffset = Platform.BYTE_ARRAY_OFFSET val wordSize = UnsafeRow.WORD_SIZE -val structSize = UnsafeRow.calculateBitSetWidthInBytes(2) + wordSize * 2 -val structSizeAsLong = structSize + "L" -val keyTypeName = CodeGenerator.primitiveTypeName(childDataType.keyType) -val valueTypeName = CodeGenerator.primitiveTypeName(childDataType.valueType) - -val valueAssignment = s"$unsafeRow.set$valueTypeName(1, ${getValue(values)});" -val valueAssignmentChecked = if (childDataType.valueContainsNull) { - s""" - |if ($values.isNullAt(z)) { - | $unsafeRow.setNullAt(1); - |} else { - | $valueAssignment - |} - """.stripMargin -} else { - valueAssignment -} +val structSizeAsLong = s"${structSize}L" -val assignmentLoop = (byteArray: String) => - s""" - |final int $structsOffset = $calculateHeader($numElements) + $numElements * $wordSize; - |UnsafeRow $unsafeRow = new UnsafeRow(2); - |for (int z = 0; z < $numElements; z++) { - | long offset = $structsOffset + z * $structSizeAsLong;
[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21912#discussion_r213540868 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -1490,6 +1423,63 @@ object CodeGenerator extends Logging { } } + /** + * Generates code creating a [[UnsafeArrayData]] or [[GenericArrayData]] based on + * given parameters. + * + * @param arrayName name of the array to create + * @param elementType data type of the elements in source array + * @param numElements code representing the number of elements the array should contain + * @param additionalErrorMessage string to include in the error message + * @param elementSize optional value which shows the size of an element of the allocated + *[[UnsafeArrayData]] or [[GenericArrayData]] + * + * @return code representing the allocation of [[ArrayData]] + */ + def createArrayData( + arrayName: String, + elementType: DataType, + numElements: String, + additionalErrorMessage: String, + elementSize: Option[Int] = None): String = { +val (isPrimitiveType, elemSize) = if (elementSize.isDefined) { + (false, elementSize.get) +} else { + (CodeGenerator.isPrimitiveType(elementType), elementType.defaultSize) +} + +s""" + |ArrayData $arrayName = ArrayData.allocateArrayData( + | $elemSize, $numElements, $isPrimitiveType, "$additionalErrorMessage"); + """.stripMargin + } + + /** + * Generates assignment code for an [[ArrayData]] --- End diff -- shall we mention that the returned code should be put inside a loop? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21912#discussion_r213540562 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -1490,6 +1423,63 @@ object CodeGenerator extends Logging { } } + /** + * Generates code creating a [[UnsafeArrayData]] or [[GenericArrayData]] based on + * given parameters. + * + * @param arrayName name of the array to create + * @param elementType data type of the elements in source array + * @param numElements code representing the number of elements the array should contain + * @param additionalErrorMessage string to include in the error message + * @param elementSize optional value which shows the size of an element of the allocated + *[[UnsafeArrayData]] or [[GenericArrayData]] + * + * @return code representing the allocation of [[ArrayData]] + */ + def createArrayData( + arrayName: String, + elementType: DataType, + numElements: String, + additionalErrorMessage: String, + elementSize: Option[Int] = None): String = { +val (isPrimitiveType, elemSize) = if (elementSize.isDefined) { + (false, elementSize.get) +} else { + (CodeGenerator.isPrimitiveType(elementType), elementType.defaultSize) +} + +s""" + |ArrayData $arrayName = ArrayData.allocateArrayData( + | $elemSize, $numElements, $isPrimitiveType, "$additionalErrorMessage"); + """.stripMargin + } + + /** + * Generates assignment code for an [[ArrayData]] + * + * @param arrayName name of the array to create + * @param elementType data type of the elements in destination and source arrays + * @param srcArray code representing the number of elements the array should contain --- End diff -- rename it to `length`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21912#discussion_r213540440 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala --- @@ -1314,9 +1314,9 @@ class CollectionExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper } test("Array Distinct") { -val a0 = Literal.create(Seq(2, 1, 2, 3, 4, 4, 5), ArrayType(IntegerType)) +val a0 = Literal.create(Seq(2, 1, 2, 3, 4, 4, 5), ArrayType(IntegerType, false)) --- End diff -- why this change? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21912#discussion_r213540247 --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeArrayData.java --- @@ -473,13 +473,13 @@ public static UnsafeArrayData fromPrimitiveArray( return result; } - public static UnsafeArrayData forPrimitiveArray(int offset, int length, int elementSize) { -return fromPrimitiveArray(null, offset, length, elementSize); + public static UnsafeArrayData forPrimitiveArray(int length, int elementSize) { +return fromPrimitiveArray(null, 0, length, elementSize); --- End diff -- so this is used to create a new unsafe array with no data. This saves duplicated code, but we will do unnecessary memory copy. Shall we just add a new method in `UnsafeArrayData` to create a fresh array? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21912#discussion_r213539798 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -1490,6 +1423,63 @@ object CodeGenerator extends Logging { } } + /** + * Generates code creating a [[UnsafeArrayData]] or [[GenericArrayData]] based on + * given parameters. + * + * @param arrayName name of the array to create + * @param elementType data type of the elements in source array + * @param numElements code representing the number of elements the array should contain + * @param additionalErrorMessage string to include in the error message + * @param elementSize optional value which shows the size of an element of the allocated --- End diff -- it's better to give an example when we will set it. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21912#discussion_r213539696 --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeArrayData.java --- @@ -473,13 +473,13 @@ public static UnsafeArrayData fromPrimitiveArray( return result; } - public static UnsafeArrayData forPrimitiveArray(int offset, int length, int elementSize) { -return fromPrimitiveArray(null, offset, length, elementSize); + public static UnsafeArrayData forPrimitiveArray(int length, int elementSize) { +return fromPrimitiveArray(null, 0, length, elementSize); --- End diff -- is it safe? I vaguely remember the address of offheap memory block is usually not 0. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/21912#discussion_r211126450 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -735,70 +735,98 @@ class CodegenContext { } /** - * Generates code creating a [[UnsafeArrayData]]. + * Generates code creating a [[UnsafeArrayData]] or [[GenericArrayData]] based on + * given parameters. * * @param arrayName name of the array to create + * @param elementType data type of the elements in source array * @param numElements code representing the number of elements the array should contain - * @param elementType data type of the elements in the array * @param additionalErrorMessage string to include in the error message + * @param elementSize optional value which shows the size of an element of the allocated + *[[UnsafeArrayData]] or [[GenericArrayData]] + * + * @return code representing the allocation of [[ArrayData]] + * code representing a setter of an assignment for the generated array */ - def createUnsafeArray( + def createArrayData( --- End diff -- I think it would be good to have this since this utility method will be used at other places where `new UnsafeArrayData` or `new GenericArrayData` is generated as Java code. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/21912#discussion_r211096868 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ArrayData.scala --- @@ -34,6 +36,32 @@ object ArrayData { case a: Array[Double] => UnsafeArrayData.fromPrimitiveArray(a) case other => new GenericArrayData(other) } + + + /** + * Allocate [[UnsafeArrayData]] or [[GenericArrayData]] based on given parameters. + * + * @param elementSize a size of an element in bytes + * @param numElements the number of elements the array should contain + * @param isPrimitiveType whether the type of an element is primitive type + * @param additionalErrorMessage string to include in the error message + */ + def allocateArrayData( + elementSize: Int, + numElements : Long, + isPrimitiveType: Boolean, + additionalErrorMessage: String) : ArrayData = { +if (isPrimitiveType && !UnsafeArrayData.shouldUseGenericArrayData(elementSize, numElements)) { + UnsafeArrayData.forPrimitiveArray(Platform.BYTE_ARRAY_OFFSET, numElements.toInt, elementSize) --- End diff -- Good catch --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21912#discussion_r210543909 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ArrayData.scala --- @@ -34,6 +36,32 @@ object ArrayData { case a: Array[Double] => UnsafeArrayData.fromPrimitiveArray(a) case other => new GenericArrayData(other) } + + + /** + * Allocate [[UnsafeArrayData]] or [[GenericArrayData]] based on given parameters. + * + * @param elementSize a size of an element in bytes + * @param numElements the number of elements the array should contain + * @param isPrimitiveType whether the type of an element is primitive type + * @param additionalErrorMessage string to include in the error message + */ + def allocateArrayData( + elementSize: Int, + numElements : Long, --- End diff -- nit: extra space after `numElements`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21912#discussion_r210543965 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ArrayData.scala --- @@ -34,6 +36,32 @@ object ArrayData { case a: Array[Double] => UnsafeArrayData.fromPrimitiveArray(a) case other => new GenericArrayData(other) } + + + /** + * Allocate [[UnsafeArrayData]] or [[GenericArrayData]] based on given parameters. + * + * @param elementSize a size of an element in bytes + * @param numElements the number of elements the array should contain + * @param isPrimitiveType whether the type of an element is primitive type + * @param additionalErrorMessage string to include in the error message + */ + def allocateArrayData( + elementSize: Int, + numElements : Long, + isPrimitiveType: Boolean, + additionalErrorMessage: String) : ArrayData = { --- End diff -- nit: extra space after `)`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21912#discussion_r210539389 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -735,70 +735,92 @@ class CodegenContext { } /** - * Generates code creating a [[UnsafeArrayData]]. + * Generates code of setter for an [[ArrayData]] + * + * @param elementType data type of the elements in an [[ArrayData]] + * + * @return code representing a setter of an assignment for an [[ArrayData]] + */ + def setArrayElementFunc(dataType: DataType): String = { +val isPrimitiveType = CodeGenerator.isPrimitiveType(dataType) +if (isPrimitiveType) { + s"set${CodeGenerator.primitiveTypeName(dataType)}" +} else { + "update" +} + } + + /** + * Generates code creating a [[UnsafeArrayData]] or [[GenericArrayData]] based on + * given parameters. * * @param arrayName name of the array to create + * @param elementType data type of the elements in source array * @param numElements code representing the number of elements the array should contain - * @param elementType data type of the elements in the array * @param additionalErrorMessage string to include in the error message + * @param elementSize optional value which shows the size of an element of the allocated + *[[UnsafeArrayData]] or [[GenericArrayData]] + * + * @return code representing the allocation of [[ArrayData]] + * code representing a setter of an assignment for the generated array */ - def createUnsafeArray( + def createArrayData( arrayName: String, - numElements: String, elementType: DataType, - additionalErrorMessage: String): String = { -val arraySize = freshName("size") -val arrayBytes = freshName("arrayBytes") + numElements: String, + additionalErrorMessage: String, + elementSize: Option[Int] = None): String = { +val (isPrimitiveType, elemSize) = if (elementSize.isDefined) { + (false, elementSize.get) +} else { + (CodeGenerator.isPrimitiveType(elementType), elementType.defaultSize) +} s""" - |long $arraySize = UnsafeArrayData.calculateSizeOfUnderlyingByteArray( - | $numElements, - | ${elementType.defaultSize}); - |if ($arraySize > ${ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH}) { - | throw new RuntimeException("Unsuccessful try create array with " + $arraySize + - |" bytes of data due to exceeding the limit " + - |"${ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH} bytes for UnsafeArrayData." + - |"$additionalErrorMessage"); - |} - |byte[] $arrayBytes = new byte[(int)$arraySize]; - |UnsafeArrayData $arrayName = new UnsafeArrayData(); - |Platform.putLong($arrayBytes, ${Platform.BYTE_ARRAY_OFFSET}, $numElements); - |$arrayName.pointTo($arrayBytes, ${Platform.BYTE_ARRAY_OFFSET}, (int)$arraySize); - """.stripMargin + |ArrayData $arrayName = ArrayData.allocateArrayData( + | $elemSize, $numElements, $isPrimitiveType, "$additionalErrorMessage"); + """.stripMargin } /** - * Generates code creating a [[UnsafeArrayData]]. The generated code executes - * a provided fallback when the size of backing array would exceed the array size limit. - * @param arrayName a name of the array to create - * @param numElements a piece of code representing the number of elements the array should contain - * @param elementSize a size of an element in bytes - * @param bodyCode a function generating code that fills up the [[UnsafeArrayData]] - * and getting the backing array as a parameter - * @param fallbackCode a piece of code executed when the array size limit is exceeded + * Generates assignment code for a [[ArrayData]] + * + * @param arrayName name of the array to create + * @param setFunc string to include in the error message + * @param elementType data type of the elements in source array + * @param srcArray code representing the number of elements the array should contain + * @param needNullCheck value which shows whether a nullcheck is required for the returning + * assignment + * @param rhsValue optional expression for the right-hand side of the returning assignment + * + * @return code representing an assignment to each element of the [[ArrayData]], which requires + * a pair of destination and source loop index variables */ - def
[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21912#discussion_r210539249 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -735,70 +735,92 @@ class CodegenContext { } /** - * Generates code creating a [[UnsafeArrayData]]. + * Generates code of setter for an [[ArrayData]] + * + * @param elementType data type of the elements in an [[ArrayData]] + * + * @return code representing a setter of an assignment for an [[ArrayData]] + */ + def setArrayElementFunc(dataType: DataType): String = { --- End diff -- We should have this in `object CodeGenerator`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21912#discussion_r210533929 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -385,53 +385,81 @@ case class MapEntries(child: Expression) extends UnaryExpression with ExpectsInp override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { nullSafeCodeGen(ctx, ev, c => { + val arrayData = ctx.freshName("arrayData") val numElements = ctx.freshName("numElements") val keys = ctx.freshName("keys") val values = ctx.freshName("values") val isKeyPrimitive = CodeGenerator.isPrimitiveType(childDataType.keyType) val isValuePrimitive = CodeGenerator.isPrimitiveType(childDataType.valueType) + + val wordSize = UnsafeRow.WORD_SIZE + val structSize = UnsafeRow.calculateBitSetWidthInBytes(2) + wordSize * 2 + val elementSize = if (isKeyPrimitive && isValuePrimitive) { +Some(structSize + wordSize) + } else { +None + } + + val allocation = ctx.createArrayData(arrayData, childDataType.keyType, numElements, +s" $prettyName failed.", elementSize = elementSize) + val setFunc = ctx.setArrayElementFunc(childDataType.keyType) + val code = if (isKeyPrimitive && isValuePrimitive) { -genCodeForPrimitiveElements(ctx, keys, values, ev.value, numElements) +val genCodeForPrimitive = genCodeForPrimitiveElements( + ctx, arrayData, keys, values, ev.value, numElements, structSize, setFunc) +s""" + |if ($arrayData instanceof UnsafeArrayData) { + | $genCodeForPrimitive + |} else { + | ${genCodeForAnyElements(ctx, arrayData, keys, values, ev.value, numElements)} + |} + """.stripMargin } else { -genCodeForAnyElements(ctx, keys, values, ev.value, numElements) +s"${genCodeForAnyElements(ctx, arrayData, keys, values, ev.value, numElements)}" } + s""" |final int $numElements = $c.numElements(); |final ArrayData $keys = $c.keyArray(); |final ArrayData $values = $c.valueArray(); + |$allocation |$code """.stripMargin }) } - private def getKey(varName: String) = CodeGenerator.getValue(varName, childDataType.keyType, "z") + private def getKey(varName: String, index: String) = +CodeGenerator.getValue(varName, childDataType.keyType, index) - private def getValue(varName: String) = { -CodeGenerator.getValue(varName, childDataType.valueType, "z") - } + private def getValue(varName: String, index: String) = +CodeGenerator.getValue(varName, childDataType.valueType, index) private def genCodeForPrimitiveElements( ctx: CodegenContext, + arrayData: String, keys: String, values: String, - arrayData: String, - numElements: String): String = { -val unsafeRow = ctx.freshName("unsafeRow") + resultArrayData: String, + numElements: String, + structSize: Int, + setFuncForKey: String): String = { val unsafeArrayData = ctx.freshName("unsafeArrayData") +val baseObject = ctx.freshName("baseObject") +val unsafeRow = ctx.freshName("unsafeRow") val structsOffset = ctx.freshName("structsOffset") +val offset = ctx.freshName("offset") +val z = ctx.freshName("z") val calculateHeader = "UnsafeArrayData.calculateHeaderPortionInBytes" val baseOffset = Platform.BYTE_ARRAY_OFFSET val wordSize = UnsafeRow.WORD_SIZE -val structSize = UnsafeRow.calculateBitSetWidthInBytes(2) + wordSize * 2 -val structSizeAsLong = structSize + "L" -val keyTypeName = CodeGenerator.primitiveTypeName(childDataType.keyType) +val structSizeAsLong = s"${structSize}L" val valueTypeName = CodeGenerator.primitiveTypeName(childDataType.valueType) -val valueAssignment = s"$unsafeRow.set$valueTypeName(1, ${getValue(values)});" +val valueAssignment = s"$unsafeRow.set$valueTypeName(1, ${getValue(values, z)});" --- End diff -- We should use `ctx.setArrayElementFunc` to get the func name? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21912#discussion_r210531207 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -735,70 +735,92 @@ class CodegenContext { } /** - * Generates code creating a [[UnsafeArrayData]]. + * Generates code of setter for an [[ArrayData]] + * + * @param elementType data type of the elements in an [[ArrayData]] + * + * @return code representing a setter of an assignment for an [[ArrayData]] + */ + def setArrayElementFunc(dataType: DataType): String = { +val isPrimitiveType = CodeGenerator.isPrimitiveType(dataType) +if (isPrimitiveType) { + s"set${CodeGenerator.primitiveTypeName(dataType)}" +} else { + "update" +} + } + + /** + * Generates code creating a [[UnsafeArrayData]] or [[GenericArrayData]] based on + * given parameters. * * @param arrayName name of the array to create + * @param elementType data type of the elements in source array * @param numElements code representing the number of elements the array should contain - * @param elementType data type of the elements in the array * @param additionalErrorMessage string to include in the error message + * @param elementSize optional value which shows the size of an element of the allocated + *[[UnsafeArrayData]] or [[GenericArrayData]] + * + * @return code representing the allocation of [[ArrayData]] + * code representing a setter of an assignment for the generated array */ - def createUnsafeArray( + def createArrayData( arrayName: String, - numElements: String, elementType: DataType, - additionalErrorMessage: String): String = { -val arraySize = freshName("size") -val arrayBytes = freshName("arrayBytes") + numElements: String, + additionalErrorMessage: String, + elementSize: Option[Int] = None): String = { +val (isPrimitiveType, elemSize) = if (elementSize.isDefined) { + (false, elementSize.get) +} else { + (CodeGenerator.isPrimitiveType(elementType), elementType.defaultSize) +} s""" - |long $arraySize = UnsafeArrayData.calculateSizeOfUnderlyingByteArray( - | $numElements, - | ${elementType.defaultSize}); - |if ($arraySize > ${ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH}) { - | throw new RuntimeException("Unsuccessful try create array with " + $arraySize + - |" bytes of data due to exceeding the limit " + - |"${ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH} bytes for UnsafeArrayData." + - |"$additionalErrorMessage"); - |} - |byte[] $arrayBytes = new byte[(int)$arraySize]; - |UnsafeArrayData $arrayName = new UnsafeArrayData(); - |Platform.putLong($arrayBytes, ${Platform.BYTE_ARRAY_OFFSET}, $numElements); - |$arrayName.pointTo($arrayBytes, ${Platform.BYTE_ARRAY_OFFSET}, (int)$arraySize); - """.stripMargin + |ArrayData $arrayName = ArrayData.allocateArrayData( + | $elemSize, $numElements, $isPrimitiveType, "$additionalErrorMessage"); + """.stripMargin } /** - * Generates code creating a [[UnsafeArrayData]]. The generated code executes - * a provided fallback when the size of backing array would exceed the array size limit. - * @param arrayName a name of the array to create - * @param numElements a piece of code representing the number of elements the array should contain - * @param elementSize a size of an element in bytes - * @param bodyCode a function generating code that fills up the [[UnsafeArrayData]] - * and getting the backing array as a parameter - * @param fallbackCode a piece of code executed when the array size limit is exceeded + * Generates assignment code for a [[ArrayData]] + * + * @param arrayName name of the array to create + * @param setFunc string to include in the error message + * @param elementType data type of the elements in source array + * @param srcArray code representing the number of elements the array should contain + * @param needNullCheck value which shows whether a nullcheck is required for the returning + * assignment + * @param rhsValue optional expression for the right-hand side of the returning assignment + * + * @return code representing an assignment to each element of the [[ArrayData]], which requires + * a pair of destination and source loop index variables */ - def
[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21912#discussion_r210530380 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -735,70 +735,98 @@ class CodegenContext { } /** - * Generates code creating a [[UnsafeArrayData]]. + * Generates code creating a [[UnsafeArrayData]] or [[GenericArrayData]] based on + * given parameters. * * @param arrayName name of the array to create + * @param elementType data type of the elements in source array * @param numElements code representing the number of elements the array should contain - * @param elementType data type of the elements in the array * @param additionalErrorMessage string to include in the error message + * @param elementSize optional value which shows the size of an element of the allocated + *[[UnsafeArrayData]] or [[GenericArrayData]] + * + * @return code representing the allocation of [[ArrayData]] + * code representing a setter of an assignment for the generated array */ - def createUnsafeArray( + def createArrayData( --- End diff -- I'm still not sure if it would be good to have this. I think the method is small enough to inline. cc @cloud-fan --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21912#discussion_r210529856 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ArrayData.scala --- @@ -34,6 +36,32 @@ object ArrayData { case a: Array[Double] => UnsafeArrayData.fromPrimitiveArray(a) case other => new GenericArrayData(other) } + + + /** + * Allocate [[UnsafeArrayData]] or [[GenericArrayData]] based on given parameters. + * + * @param elementSize a size of an element in bytes + * @param numElements the number of elements the array should contain + * @param isPrimitiveType whether the type of an element is primitive type + * @param additionalErrorMessage string to include in the error message + */ + def allocateArrayData( + elementSize: Int, + numElements : Long, + isPrimitiveType: Boolean, + additionalErrorMessage: String) : ArrayData = { +if (isPrimitiveType && !UnsafeArrayData.shouldUseGenericArrayData(elementSize, numElements)) { + UnsafeArrayData.forPrimitiveArray(Platform.BYTE_ARRAY_OFFSET, numElements.toInt, elementSize) --- End diff -- Do we need `Platform.BYTE_ARRAY_OFFSET` here? Seems like the method doesn't use the `offset` argument? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21912#discussion_r210533651 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -385,53 +385,81 @@ case class MapEntries(child: Expression) extends UnaryExpression with ExpectsInp override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { nullSafeCodeGen(ctx, ev, c => { + val arrayData = ctx.freshName("arrayData") val numElements = ctx.freshName("numElements") val keys = ctx.freshName("keys") val values = ctx.freshName("values") val isKeyPrimitive = CodeGenerator.isPrimitiveType(childDataType.keyType) val isValuePrimitive = CodeGenerator.isPrimitiveType(childDataType.valueType) + + val wordSize = UnsafeRow.WORD_SIZE + val structSize = UnsafeRow.calculateBitSetWidthInBytes(2) + wordSize * 2 + val elementSize = if (isKeyPrimitive && isValuePrimitive) { +Some(structSize + wordSize) + } else { +None + } + + val allocation = ctx.createArrayData(arrayData, childDataType.keyType, numElements, +s" $prettyName failed.", elementSize = elementSize) + val setFunc = ctx.setArrayElementFunc(childDataType.keyType) --- End diff -- We can delay this into `genCodeForPrimitiveElements`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/21912#discussion_r209878827 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -735,70 +735,98 @@ class CodegenContext { } /** - * Generates code creating a [[UnsafeArrayData]]. + * Generates code creating a [[UnsafeArrayData]] or [[GenericArrayData]] based on + * given parameters. * * @param arrayName name of the array to create + * @param elementType data type of the elements in source array * @param numElements code representing the number of elements the array should contain - * @param elementType data type of the elements in the array * @param additionalErrorMessage string to include in the error message + * @param elementSize optional value which shows the size of an element of the allocated + *[[UnsafeArrayData]] or [[GenericArrayData]] + * + * @return code representing the allocation of [[ArrayData]] + * code representing a setter of an assignment for the generated array */ - def createUnsafeArray( + def createArrayData( --- End diff -- I think that it would be good to have a utility method to generate this call. WDYT? On the other hand, I agree that we have a method `setArrayElementFunc` or `setArrayElement`. Thus, we can split this method into two methods that have only one return value. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/21912#discussion_r209877426 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ArrayData.scala --- @@ -34,6 +36,37 @@ object ArrayData { case a: Array[Double] => UnsafeArrayData.fromPrimitiveArray(a) case other => new GenericArrayData(other) } + + + /** + * Allocate [[UnsafeArrayData]] or [[GenericArrayData]] based on given parameters. + * + * @param elementSize a size of an element in bytes + * @param numElements the number of elements the array should contain + * @param isPrimitiveType whether the type of an element is primitive type + * @param additionalErrorMessage string to include in the error message + */ + def allocateArrayData( + elementSize: Int, + numElements : Long, + isPrimitiveType: Boolean, + additionalErrorMessage: String) : ArrayData = { +val arraySize = UnsafeArrayData.calculateSizeOfUnderlyingByteArray(numElements, elementSize) +if (isPrimitiveType && !UnsafeArrayData.shouldUseGenericArrayData(elementSize, numElements)) { --- End diff -- When `UnsafeArrayData` can be used, `GenericArrayData` is also used. However, if the element size is large, `GenericArrayData` should be used. But, `UnsafeArrayData` cannot be used. Thus, I think that it would be good to use the current name `shouldUseGenericArrayData`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21912#discussion_r209855565 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ArrayData.scala --- @@ -34,6 +36,37 @@ object ArrayData { case a: Array[Double] => UnsafeArrayData.fromPrimitiveArray(a) case other => new GenericArrayData(other) } + + + /** + * Allocate [[UnsafeArrayData]] or [[GenericArrayData]] based on given parameters. + * + * @param elementSize a size of an element in bytes + * @param numElements the number of elements the array should contain + * @param isPrimitiveType whether the type of an element is primitive type + * @param additionalErrorMessage string to include in the error message + */ + def allocateArrayData( + elementSize: Int, + numElements : Long, + isPrimitiveType: Boolean, + additionalErrorMessage: String) : ArrayData = { +val arraySize = UnsafeArrayData.calculateSizeOfUnderlyingByteArray(numElements, elementSize) +if (isPrimitiveType && !UnsafeArrayData.shouldUseGenericArrayData(elementSize, numElements)) { + val arrayBytes = new Array[Byte](arraySize.toInt) + val arrayName = new UnsafeArrayData() + Platform.putLong(arrayBytes, Platform.BYTE_ARRAY_OFFSET, numElements) + arrayName.pointTo(arrayBytes, Platform.BYTE_ARRAY_OFFSET, arraySize.toInt) + arrayName --- End diff -- Can't we use `UnsafeArrayData.forPrimitiveArray()`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21912#discussion_r209854249 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ArrayData.scala --- @@ -34,6 +36,37 @@ object ArrayData { case a: Array[Double] => UnsafeArrayData.fromPrimitiveArray(a) case other => new GenericArrayData(other) } + + + /** + * Allocate [[UnsafeArrayData]] or [[GenericArrayData]] based on given parameters. + * + * @param elementSize a size of an element in bytes + * @param numElements the number of elements the array should contain + * @param isPrimitiveType whether the type of an element is primitive type + * @param additionalErrorMessage string to include in the error message + */ + def allocateArrayData( + elementSize: Int, + numElements : Long, + isPrimitiveType: Boolean, + additionalErrorMessage: String) : ArrayData = { +val arraySize = UnsafeArrayData.calculateSizeOfUnderlyingByteArray(numElements, elementSize) +if (isPrimitiveType && !UnsafeArrayData.shouldUseGenericArrayData(elementSize, numElements)) { + val arrayBytes = new Array[Byte](arraySize.toInt) + val arrayName = new UnsafeArrayData() + Platform.putLong(arrayBytes, Platform.BYTE_ARRAY_OFFSET, numElements) + arrayName.pointTo(arrayBytes, Platform.BYTE_ARRAY_OFFSET, arraySize.toInt) + arrayName +} else if (numElements <= ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH.toLong) { + new GenericArrayData(new Array[Any](numElements.toInt)) +} else { + throw new RuntimeException(s"Cannot create array with $numElements " + +"elements of data due to exceeding the limit " + +s"${ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH} elements for ArrayData." + +s"$additionalErrorMessage") --- End diff -- nit: no need to surround with `s""`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21912#discussion_r209854002 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ArrayData.scala --- @@ -34,6 +36,37 @@ object ArrayData { case a: Array[Double] => UnsafeArrayData.fromPrimitiveArray(a) case other => new GenericArrayData(other) } + + + /** + * Allocate [[UnsafeArrayData]] or [[GenericArrayData]] based on given parameters. + * + * @param elementSize a size of an element in bytes + * @param numElements the number of elements the array should contain + * @param isPrimitiveType whether the type of an element is primitive type + * @param additionalErrorMessage string to include in the error message + */ + def allocateArrayData( + elementSize: Int, + numElements : Long, + isPrimitiveType: Boolean, + additionalErrorMessage: String) : ArrayData = { +val arraySize = UnsafeArrayData.calculateSizeOfUnderlyingByteArray(numElements, elementSize) --- End diff -- Move into the following if-clause? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21912#discussion_r209856692 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -735,70 +735,98 @@ class CodegenContext { } /** - * Generates code creating a [[UnsafeArrayData]]. + * Generates code creating a [[UnsafeArrayData]] or [[GenericArrayData]] based on + * given parameters. * * @param arrayName name of the array to create + * @param elementType data type of the elements in source array * @param numElements code representing the number of elements the array should contain - * @param elementType data type of the elements in the array * @param additionalErrorMessage string to include in the error message + * @param elementSize optional value which shows the size of an element of the allocated + *[[UnsafeArrayData]] or [[GenericArrayData]] + * + * @return code representing the allocation of [[ArrayData]] + * code representing a setter of an assignment for the generated array */ - def createUnsafeArray( + def createArrayData( arrayName: String, - numElements: String, elementType: DataType, - additionalErrorMessage: String): String = { -val arraySize = freshName("size") -val arrayBytes = freshName("arrayBytes") + numElements: String, + additionalErrorMessage: String, + elementSize: Option[Int] = None): (String, String) = { +val isPrimitiveType = if (elementSize.isDefined) { + false +} else { + CodeGenerator.isPrimitiveType(elementType) +} -s""" - |long $arraySize = UnsafeArrayData.calculateSizeOfUnderlyingByteArray( - | $numElements, - | ${elementType.defaultSize}); - |if ($arraySize > ${ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH}) { - | throw new RuntimeException("Unsuccessful try create array with " + $arraySize + - |" bytes of data due to exceeding the limit " + - |"${ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH} bytes for UnsafeArrayData." + - |"$additionalErrorMessage"); - |} - |byte[] $arrayBytes = new byte[(int)$arraySize]; - |UnsafeArrayData $arrayName = new UnsafeArrayData(); - |Platform.putLong($arrayBytes, ${Platform.BYTE_ARRAY_OFFSET}, $numElements); - |$arrayName.pointTo($arrayBytes, ${Platform.BYTE_ARRAY_OFFSET}, (int)$arraySize); - """.stripMargin +val setFunc = if (isPrimitiveType) { + s"set${CodeGenerator.primitiveTypeName(elementType)}" +} else { + "update" +} + +val elemSize = if (elementSize.isDefined) { + elementSize.get +} else { + elementType.defaultSize +} +val allocation = + s""" + |ArrayData $arrayName = ArrayData.allocateArrayData( + | $elemSize, $numElements, $isPrimitiveType, "$additionalErrorMessage"); + """.stripMargin + +(allocation, setFunc) } /** - * Generates code creating a [[UnsafeArrayData]]. The generated code executes - * a provided fallback when the size of backing array would exceed the array size limit. - * @param arrayName a name of the array to create - * @param numElements a piece of code representing the number of elements the array should contain - * @param elementSize a size of an element in bytes - * @param bodyCode a function generating code that fills up the [[UnsafeArrayData]] - * and getting the backing array as a parameter - * @param fallbackCode a piece of code executed when the array size limit is exceeded + * Generates assignment code for a [[ArrayData]] + * + * @param arrayName name of the array to create + * @param dataType data type of the result array + * @param elementType data type of the elements in source array + * @param srcArray code representing the number of elements the array should contain + * @param setFunc string to include in the error message + * @param rhsValue optional expression for the right-hand side of the returning assignment + * @param checkForNull optional value which shows whether a nullcheck is required for + * the returning assignment + * + * @return code representing an assignment to each element of the [[ArrayData]], which requires + * a pair of destination and source loop index variables */ - def createUnsafeArrayWithFallback( + def createArrayAssignment( arrayName: String, - numElements: String, -
[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21912#discussion_r209858514 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ArrayData.scala --- @@ -34,6 +36,37 @@ object ArrayData { case a: Array[Double] => UnsafeArrayData.fromPrimitiveArray(a) case other => new GenericArrayData(other) } + + + /** + * Allocate [[UnsafeArrayData]] or [[GenericArrayData]] based on given parameters. + * + * @param elementSize a size of an element in bytes + * @param numElements the number of elements the array should contain + * @param isPrimitiveType whether the type of an element is primitive type + * @param additionalErrorMessage string to include in the error message + */ + def allocateArrayData( + elementSize: Int, + numElements : Long, + isPrimitiveType: Boolean, + additionalErrorMessage: String) : ArrayData = { +val arraySize = UnsafeArrayData.calculateSizeOfUnderlyingByteArray(numElements, elementSize) +if (isPrimitiveType && !UnsafeArrayData.shouldUseGenericArrayData(elementSize, numElements)) { --- End diff -- The method `UnsafeArrayData.shouldUseGenericArrayData()` should be `UnsafeArrayData.shouldUseUnsafeArrayData()`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21912#discussion_r209853481 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -735,70 +735,98 @@ class CodegenContext { } /** - * Generates code creating a [[UnsafeArrayData]]. + * Generates code creating a [[UnsafeArrayData]] or [[GenericArrayData]] based on + * given parameters. * * @param arrayName name of the array to create + * @param elementType data type of the elements in source array * @param numElements code representing the number of elements the array should contain - * @param elementType data type of the elements in the array * @param additionalErrorMessage string to include in the error message + * @param elementSize optional value which shows the size of an element of the allocated + *[[UnsafeArrayData]] or [[GenericArrayData]] + * + * @return code representing the allocation of [[ArrayData]] + * code representing a setter of an assignment for the generated array */ - def createUnsafeArray( + def createArrayData( arrayName: String, - numElements: String, elementType: DataType, - additionalErrorMessage: String): String = { -val arraySize = freshName("size") -val arrayBytes = freshName("arrayBytes") + numElements: String, + additionalErrorMessage: String, + elementSize: Option[Int] = None): (String, String) = { +val isPrimitiveType = if (elementSize.isDefined) { + false +} else { + CodeGenerator.isPrimitiveType(elementType) +} -s""" - |long $arraySize = UnsafeArrayData.calculateSizeOfUnderlyingByteArray( - | $numElements, - | ${elementType.defaultSize}); - |if ($arraySize > ${ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH}) { - | throw new RuntimeException("Unsuccessful try create array with " + $arraySize + - |" bytes of data due to exceeding the limit " + - |"${ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH} bytes for UnsafeArrayData." + - |"$additionalErrorMessage"); - |} - |byte[] $arrayBytes = new byte[(int)$arraySize]; - |UnsafeArrayData $arrayName = new UnsafeArrayData(); - |Platform.putLong($arrayBytes, ${Platform.BYTE_ARRAY_OFFSET}, $numElements); - |$arrayName.pointTo($arrayBytes, ${Platform.BYTE_ARRAY_OFFSET}, (int)$arraySize); - """.stripMargin +val setFunc = if (isPrimitiveType) { + s"set${CodeGenerator.primitiveTypeName(elementType)}" +} else { + "update" +} --- End diff -- Maybe only this is good to have. `def setArrayElementFunc(dt: DataType): String`? Or how about `def setArrayElement(array: String, dataType: DataType, ordinal: Int, value: String): String` like other set utilities? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21912#discussion_r209862522 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -3330,50 +3136,123 @@ case class ArrayRemove(left: Expression, right: Expression) val pos = ctx.freshName("pos") val getValue = CodeGenerator.getValue(inputArray, elementType, i) val isEqual = ctx.genEqual(elementType, value, getValue) -if (!CodeGenerator.isPrimitiveType(elementType)) { - val arrayClass = classOf[GenericArrayData].getName + +val (allocation, setFunc) = + ctx.createArrayData(values, elementType, newArraySize, s" $prettyName failed.") +val assignment = ctx.createArrayAssignment( + values, dataType, elementType, inputArray, setFunc, checkForNull = Some(false)) + +s""" + |$allocation + |int $pos = 0; + |for (int $i = 0; $i < $inputArray.numElements(); $i ++) { + | if ($inputArray.isNullAt($i)) { + |$values.setNullAt($pos); + |$pos = $pos + 1; + | } + | else { + |if (!($isEqual)) { + | ${assignment(pos, i)} + | $pos = $pos + 1; + |} + | } + |} + |${ev.value} = $values; + """.stripMargin + } + + override def prettyName: String = "array_remove" +} + +/** + * Will become common base class for [[ArrayDistinct]], [[ArrayUnion]], [[ArrayIntersect]], + * and [[ArrayExcept]]. + */ +trait ArraySetLike { + @transient protected lazy val dt: DataType = NullType + @transient protected lazy val et: DataType = NullType --- End diff -- How about: `protected def dt: ArrayType` and `protected def et: DataType`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21912#discussion_r209852203 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -735,70 +735,98 @@ class CodegenContext { } /** - * Generates code creating a [[UnsafeArrayData]]. + * Generates code creating a [[UnsafeArrayData]] or [[GenericArrayData]] based on + * given parameters. * * @param arrayName name of the array to create + * @param elementType data type of the elements in source array * @param numElements code representing the number of elements the array should contain - * @param elementType data type of the elements in the array * @param additionalErrorMessage string to include in the error message + * @param elementSize optional value which shows the size of an element of the allocated + *[[UnsafeArrayData]] or [[GenericArrayData]] + * + * @return code representing the allocation of [[ArrayData]] + * code representing a setter of an assignment for the generated array */ - def createUnsafeArray( + def createArrayData( --- End diff -- Now that we have `ArrayData.allocateArrayData()`, do we need this method? How about calling it directly instead? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/21912#discussion_r209643599 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -735,70 +735,100 @@ class CodegenContext { } /** - * Generates code creating a [[UnsafeArrayData]]. + * Generates code creating a [[UnsafeArrayData]] or [[GenericArrayData]] based on + * given parameters. * * @param arrayName name of the array to create + * @param elementType data type of the elements in source array * @param numElements code representing the number of elements the array should contain - * @param elementType data type of the elements in the array * @param additionalErrorMessage string to include in the error message + * @param elementSize optional value which shows the size of an element of the allocated + *[[UnsafeArrayData]] or [[GenericArrayData]] + * + * @return code representing the allocation of [[ArrayData]] + * code representing a setter of an assignment for the generated array */ - def createUnsafeArray( + def createArrayData( arrayName: String, - numElements: String, elementType: DataType, - additionalErrorMessage: String): String = { -val arraySize = freshName("size") -val arrayBytes = freshName("arrayBytes") + numElements: String, + additionalErrorMessage: String, + elementSize: Option[Int] = None): (String, String) = { --- End diff -- Yeah, they are exclusive. However, when `elementType` is not used, the integer value is required. `MapEntries` stores `UnsafeRow` instead of primitive data. [This](https://github.com/apache/spark/blob/c15630014c7ef850fa21c1247fa5edd2b1bac81b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala#L405) requires non-fixed size unlike `elementType.defaultSize`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21912#discussion_r209574445 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -735,70 +735,100 @@ class CodegenContext { } /** - * Generates code creating a [[UnsafeArrayData]]. + * Generates code creating a [[UnsafeArrayData]] or [[GenericArrayData]] based on + * given parameters. * * @param arrayName name of the array to create + * @param elementType data type of the elements in source array * @param numElements code representing the number of elements the array should contain - * @param elementType data type of the elements in the array * @param additionalErrorMessage string to include in the error message + * @param elementSize optional value which shows the size of an element of the allocated + *[[UnsafeArrayData]] or [[GenericArrayData]] + * + * @return code representing the allocation of [[ArrayData]] + * code representing a setter of an assignment for the generated array */ - def createUnsafeArray( + def createArrayData( arrayName: String, - numElements: String, elementType: DataType, - additionalErrorMessage: String): String = { -val arraySize = freshName("size") -val arrayBytes = freshName("arrayBytes") + numElements: String, + additionalErrorMessage: String, + elementSize: Option[Int] = None): (String, String) = { +val isPrimitiveType = if (elementSize.isDefined) { + false +} else { + CodeGenerator.isPrimitiveType(elementType) +} -s""" - |long $arraySize = UnsafeArrayData.calculateSizeOfUnderlyingByteArray( - | $numElements, - | ${elementType.defaultSize}); - |if ($arraySize > ${ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH}) { - | throw new RuntimeException("Unsuccessful try create array with " + $arraySize + - |" bytes of data due to exceeding the limit " + - |"${ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH} bytes for UnsafeArrayData." + - |"$additionalErrorMessage"); - |} - |byte[] $arrayBytes = new byte[(int)$arraySize]; - |UnsafeArrayData $arrayName = new UnsafeArrayData(); - |Platform.putLong($arrayBytes, ${Platform.BYTE_ARRAY_OFFSET}, $numElements); - |$arrayName.pointTo($arrayBytes, ${Platform.BYTE_ARRAY_OFFSET}, (int)$arraySize); - """.stripMargin +val setFunc = if (isPrimitiveType) { + s"set${CodeGenerator.primitiveTypeName(elementType)}" +} else { + "update" +} + +val elemSize = if (elementSize.isDefined) { + elementSize.get +} else { + elementType.defaultSize +} +val arrayData = classOf[ArrayData].getName +val allocation = + s""" + |ArrayData $arrayName = $arrayData$$.MODULE$$.allocateArrayData( + | $elemSize, $numElements, $isPrimitiveType, "$additionalErrorMessage"); + """.stripMargin + +(allocation, setFunc) } /** - * Generates code creating a [[UnsafeArrayData]]. The generated code executes - * a provided fallback when the size of backing array would exceed the array size limit. - * @param arrayName a name of the array to create - * @param numElements a piece of code representing the number of elements the array should contain - * @param elementSize a size of an element in bytes - * @param bodyCode a function generating code that fills up the [[UnsafeArrayData]] - * and getting the backing array as a parameter - * @param fallbackCode a piece of code executed when the array size limit is exceeded + * Generates assignment code for a [[ArrayData]] + * + * @param arrayName name of the array to create + * @param dataType data type of the result array + * @param elementType data type of the elements in source array + * @param srcArray code representing the number of elements the array should contain + * @param setFunc string to include in the error message + * @param rhsValue an optionally specified expression for the right-hand side of the returning + * assignment + * @param checkForNull optional value which shows whether a nullcheck is required for + * the returning assignment + * + * @return code representing an assignment to each element of the [[ArrayData]], which requires + * a pair of destination and source loop index variables */ - def createUnsafeArrayWithFallback(
[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21912#discussion_r209573854 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -735,70 +735,100 @@ class CodegenContext { } /** - * Generates code creating a [[UnsafeArrayData]]. + * Generates code creating a [[UnsafeArrayData]] or [[GenericArrayData]] based on + * given parameters. * * @param arrayName name of the array to create + * @param elementType data type of the elements in source array * @param numElements code representing the number of elements the array should contain - * @param elementType data type of the elements in the array * @param additionalErrorMessage string to include in the error message + * @param elementSize optional value which shows the size of an element of the allocated + *[[UnsafeArrayData]] or [[GenericArrayData]] + * + * @return code representing the allocation of [[ArrayData]] + * code representing a setter of an assignment for the generated array */ - def createUnsafeArray( + def createArrayData( arrayName: String, - numElements: String, elementType: DataType, - additionalErrorMessage: String): String = { -val arraySize = freshName("size") -val arrayBytes = freshName("arrayBytes") + numElements: String, + additionalErrorMessage: String, + elementSize: Option[Int] = None): (String, String) = { +val isPrimitiveType = if (elementSize.isDefined) { + false +} else { + CodeGenerator.isPrimitiveType(elementType) +} -s""" - |long $arraySize = UnsafeArrayData.calculateSizeOfUnderlyingByteArray( - | $numElements, - | ${elementType.defaultSize}); - |if ($arraySize > ${ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH}) { - | throw new RuntimeException("Unsuccessful try create array with " + $arraySize + - |" bytes of data due to exceeding the limit " + - |"${ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH} bytes for UnsafeArrayData." + - |"$additionalErrorMessage"); - |} - |byte[] $arrayBytes = new byte[(int)$arraySize]; - |UnsafeArrayData $arrayName = new UnsafeArrayData(); - |Platform.putLong($arrayBytes, ${Platform.BYTE_ARRAY_OFFSET}, $numElements); - |$arrayName.pointTo($arrayBytes, ${Platform.BYTE_ARRAY_OFFSET}, (int)$arraySize); - """.stripMargin +val setFunc = if (isPrimitiveType) { + s"set${CodeGenerator.primitiveTypeName(elementType)}" +} else { + "update" +} + +val elemSize = if (elementSize.isDefined) { + elementSize.get +} else { + elementType.defaultSize +} +val arrayData = classOf[ArrayData].getName +val allocation = + s""" + |ArrayData $arrayName = $arrayData$$.MODULE$$.allocateArrayData( --- End diff -- IIRC java can call methods of scala object directly, can you verify if `ArrayData.allocateArrayData` works? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21912#discussion_r209573448 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -735,70 +735,100 @@ class CodegenContext { } /** - * Generates code creating a [[UnsafeArrayData]]. + * Generates code creating a [[UnsafeArrayData]] or [[GenericArrayData]] based on + * given parameters. * * @param arrayName name of the array to create + * @param elementType data type of the elements in source array * @param numElements code representing the number of elements the array should contain - * @param elementType data type of the elements in the array * @param additionalErrorMessage string to include in the error message + * @param elementSize optional value which shows the size of an element of the allocated + *[[UnsafeArrayData]] or [[GenericArrayData]] + * + * @return code representing the allocation of [[ArrayData]] + * code representing a setter of an assignment for the generated array */ - def createUnsafeArray( + def createArrayData( arrayName: String, - numElements: String, elementType: DataType, - additionalErrorMessage: String): String = { -val arraySize = freshName("size") -val arrayBytes = freshName("arrayBytes") + numElements: String, + additionalErrorMessage: String, + elementSize: Option[Int] = None): (String, String) = { --- End diff -- why do we need both `elementType` and `elementSize`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org