[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...

2018-09-04 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21912


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...

2018-08-31 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/21912#discussion_r214392521
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeArrayData.java
 ---
@@ -463,23 +463,35 @@ public static UnsafeArrayData fromPrimitiveArray(
 final long[] data = new long[(int)totalSizeInLongs];
 
 Platform.putLong(data, Platform.LONG_ARRAY_OFFSET, length);
-if (arr != null) {
--- End diff --

Originally, this if is not here. It is fine with me to keep this for the 
safety.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...

2018-08-31 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21912#discussion_r214332656
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeArrayData.java
 ---
@@ -463,23 +463,35 @@ public static UnsafeArrayData fromPrimitiveArray(
 final long[] data = new long[(int)totalSizeInLongs];
 
 Platform.putLong(data, Platform.LONG_ARRAY_OFFSET, length);
-if (arr != null) {
--- End diff --

shall we keep this if?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...

2018-08-31 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21912#discussion_r214253824
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ArrayData.scala 
---
@@ -34,6 +36,32 @@ object ArrayData {
 case a: Array[Double] => UnsafeArrayData.fromPrimitiveArray(a)
 case other => new GenericArrayData(other)
   }
+
+
+  /**
+   * Allocate [[UnsafeArrayData]] or [[GenericArrayData]] based on given 
parameters.
+   *
+   * @param elementSize a size of an element in bytes
+   * @param numElements the number of elements the array should contain
+   * @param isPrimitiveType whether the type of an element is primitive 
type
+   * @param additionalErrorMessage string to include in the error message
+   */
+  def allocateArrayData(
+  elementSize: Int,
--- End diff --

ah it's called in the generated code. Maybe we can use elementSize `-1` to 
create a generic array.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...

2018-08-31 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21912#discussion_r214253479
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ArrayData.scala 
---
@@ -34,6 +36,32 @@ object ArrayData {
 case a: Array[Double] => UnsafeArrayData.fromPrimitiveArray(a)
 case other => new GenericArrayData(other)
   }
+
+
+  /**
+   * Allocate [[UnsafeArrayData]] or [[GenericArrayData]] based on given 
parameters.
+   *
+   * @param elementSize a size of an element in bytes
+   * @param numElements the number of elements the array should contain
+   * @param isPrimitiveType whether the type of an element is primitive 
type
+   * @param additionalErrorMessage string to include in the error message
+   */
+  def allocateArrayData(
+  elementSize: Int,
--- End diff --

`elementSize` is only used when creating unsafe array. I think we just have 
a `elementSize: Option[Int]` and remove the `isPrimitiveType` parameter.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...

2018-08-31 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21912#discussion_r214253006
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeArrayData.java
 ---
@@ -452,6 +452,16 @@ public UnsafeArrayData copy() {
 
   public static UnsafeArrayData fromPrimitiveArray(
Object arr, int offset, int length, int elementSize) {
+UnsafeArrayData result = createFreshArray(length, elementSize);
+final long headerInBytes = calculateHeaderPortionInBytes(length);
+final long valueRegionInBytes = (long)elementSize * length;
+final Object data = result.getBaseObject();
--- End diff --

now the data is `Object` instead of `long[]`. Can we duplicate the code for 
now and think of how to deduplicate them later?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...

2018-08-30 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/21912#discussion_r214145047
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -385,107 +385,124 @@ case class MapEntries(child: Expression) extends 
UnaryExpression with ExpectsInp
 
   override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
 nullSafeCodeGen(ctx, ev, c => {
+  val arrayData = ctx.freshName("arrayData")
   val numElements = ctx.freshName("numElements")
   val keys = ctx.freshName("keys")
   val values = ctx.freshName("values")
   val isKeyPrimitive = 
CodeGenerator.isPrimitiveType(childDataType.keyType)
   val isValuePrimitive = 
CodeGenerator.isPrimitiveType(childDataType.valueType)
+
+  val wordSize = UnsafeRow.WORD_SIZE
+  val structSize = UnsafeRow.calculateBitSetWidthInBytes(2) + wordSize 
* 2
+  val (isPrimitive, elementSize) = if (isKeyPrimitive && 
isValuePrimitive) {
+(false, structSize + wordSize)
--- End diff --

Good catch, thank you!!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...

2018-08-29 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21912#discussion_r213884198
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -385,107 +385,124 @@ case class MapEntries(child: Expression) extends 
UnaryExpression with ExpectsInp
 
   override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
 nullSafeCodeGen(ctx, ev, c => {
+  val arrayData = ctx.freshName("arrayData")
   val numElements = ctx.freshName("numElements")
   val keys = ctx.freshName("keys")
   val values = ctx.freshName("values")
   val isKeyPrimitive = 
CodeGenerator.isPrimitiveType(childDataType.keyType)
   val isValuePrimitive = 
CodeGenerator.isPrimitiveType(childDataType.valueType)
+
+  val wordSize = UnsafeRow.WORD_SIZE
+  val structSize = UnsafeRow.calculateBitSetWidthInBytes(2) + wordSize 
* 2
+  val (isPrimitive, elementSize) = if (isKeyPrimitive && 
isValuePrimitive) {
+(false, structSize + wordSize)
--- End diff --

should be `true`?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...

2018-08-29 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21912#discussion_r213883747
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeArrayData.java
 ---
@@ -452,6 +452,16 @@ public UnsafeArrayData copy() {
 
   public static UnsafeArrayData fromPrimitiveArray(
Object arr, int offset, int length, int elementSize) {
+UnsafeArrayData result = forPrimitiveArray(length, elementSize);
+final long headerInBytes = calculateHeaderPortionInBytes(length);
+final long valueRegionInBytes = (long)elementSize * length;
+final Object data = result.getBaseObject();
+Platform.copyMemory(arr, offset, data,
+  Platform.LONG_ARRAY_OFFSET + headerInBytes, valueRegionInBytes);
+return result;
+  }
+
+  public static UnsafeArrayData forPrimitiveArray(int length, int 
elementSize) {
--- End diff --

how about `createFreshArray`? It's not for primitive only, e.g. we use it 
to store a struct of fixed size fields.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...

2018-08-29 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/21912#discussion_r213707288
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -385,107 +385,120 @@ case class MapEntries(child: Expression) extends 
UnaryExpression with ExpectsInp
 
   override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
 nullSafeCodeGen(ctx, ev, c => {
+  val arrayData = ctx.freshName("arrayData")
   val numElements = ctx.freshName("numElements")
   val keys = ctx.freshName("keys")
   val values = ctx.freshName("values")
   val isKeyPrimitive = 
CodeGenerator.isPrimitiveType(childDataType.keyType)
   val isValuePrimitive = 
CodeGenerator.isPrimitiveType(childDataType.valueType)
+
+  val wordSize = UnsafeRow.WORD_SIZE
+  val structSize = UnsafeRow.calculateBitSetWidthInBytes(2) + wordSize 
* 2
+  val elementSize = if (isKeyPrimitive && isValuePrimitive) {
+Some(structSize + wordSize)
+  } else {
+None
+  }
+
+  val allocation = CodeGenerator.createArrayData(arrayData, 
childDataType.keyType, numElements,
+s" $prettyName failed.", elementSize = elementSize)
+
   val code = if (isKeyPrimitive && isValuePrimitive) {
-genCodeForPrimitiveElements(ctx, keys, values, ev.value, 
numElements)
+val genCodeForPrimitive = genCodeForPrimitiveElements(
+  ctx, arrayData, keys, values, ev.value, numElements, structSize)
+s"""
+   |if ($arrayData instanceof UnsafeArrayData) {
+   |  $genCodeForPrimitive
+   |} else {
+   |  ${genCodeForAnyElements(ctx, arrayData, keys, values, 
ev.value, numElements)}
+   |}
+ """.stripMargin
   } else {
-genCodeForAnyElements(ctx, keys, values, ev.value, numElements)
+s"${genCodeForAnyElements(ctx, arrayData, keys, values, ev.value, 
numElements)}"
   }
+
   s"""
  |final int $numElements = $c.numElements();
  |final ArrayData $keys = $c.keyArray();
  |final ArrayData $values = $c.valueArray();
+ |$allocation
  |$code
""".stripMargin
 })
   }
 
-  private def getKey(varName: String) = CodeGenerator.getValue(varName, 
childDataType.keyType, "z")
+  private def getKey(varName: String, index: String) =
+CodeGenerator.getValue(varName, childDataType.keyType, index)
 
-  private def getValue(varName: String) = {
-CodeGenerator.getValue(varName, childDataType.valueType, "z")
-  }
+  private def getValue(varName: String, index: String) =
+CodeGenerator.getValue(varName, childDataType.valueType, index)
 
   private def genCodeForPrimitiveElements(
   ctx: CodegenContext,
+  arrayData: String,
   keys: String,
   values: String,
-  arrayData: String,
-  numElements: String): String = {
-val unsafeRow = ctx.freshName("unsafeRow")
+  resultArrayData: String,
+  numElements: String,
+  structSize: Int): String = {
 val unsafeArrayData = ctx.freshName("unsafeArrayData")
+val baseObject = ctx.freshName("baseObject")
+val unsafeRow = ctx.freshName("unsafeRow")
 val structsOffset = ctx.freshName("structsOffset")
+val offset = ctx.freshName("offset")
+val z = ctx.freshName("z")
 val calculateHeader = "UnsafeArrayData.calculateHeaderPortionInBytes"
 
 val baseOffset = Platform.BYTE_ARRAY_OFFSET
 val wordSize = UnsafeRow.WORD_SIZE
-val structSize = UnsafeRow.calculateBitSetWidthInBytes(2) + wordSize * 
2
-val structSizeAsLong = structSize + "L"
-val keyTypeName = 
CodeGenerator.primitiveTypeName(childDataType.keyType)
-val valueTypeName = 
CodeGenerator.primitiveTypeName(childDataType.valueType)
-
-val valueAssignment = s"$unsafeRow.set$valueTypeName(1, 
${getValue(values)});"
-val valueAssignmentChecked = if (childDataType.valueContainsNull) {
-  s"""
- |if ($values.isNullAt(z)) {
- |  $unsafeRow.setNullAt(1);
- |} else {
- |  $valueAssignment
- |}
-   """.stripMargin
-} else {
-  valueAssignment
-}
+val structSizeAsLong = s"${structSize}L"
 
-val assignmentLoop = (byteArray: String) =>
-  s"""
- |final int $structsOffset = $calculateHeader($numElements) + 
$numElements * $wordSize;
- |UnsafeRow $unsafeRow = new UnsafeRow(2);
- |for (int z = 0; z < $numElements; z++) {
- |  long offset = $structsOffset + z * $structSizeAsLong;

[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...

2018-08-28 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21912#discussion_r213542353
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -385,107 +385,120 @@ case class MapEntries(child: Expression) extends 
UnaryExpression with ExpectsInp
 
   override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
 nullSafeCodeGen(ctx, ev, c => {
+  val arrayData = ctx.freshName("arrayData")
   val numElements = ctx.freshName("numElements")
   val keys = ctx.freshName("keys")
   val values = ctx.freshName("values")
   val isKeyPrimitive = 
CodeGenerator.isPrimitiveType(childDataType.keyType)
   val isValuePrimitive = 
CodeGenerator.isPrimitiveType(childDataType.valueType)
+
+  val wordSize = UnsafeRow.WORD_SIZE
+  val structSize = UnsafeRow.calculateBitSetWidthInBytes(2) + wordSize 
* 2
+  val elementSize = if (isKeyPrimitive && isValuePrimitive) {
+Some(structSize + wordSize)
+  } else {
+None
+  }
+
+  val allocation = CodeGenerator.createArrayData(arrayData, 
childDataType.keyType, numElements,
--- End diff --

this is hacky. Actually we want to create an array of struct, but here we 
lied and say we want to create an array of key type.

I think we should call `ArrayData.allocateArrayData` here directly.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...

2018-08-28 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21912#discussion_r213541643
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -385,107 +385,120 @@ case class MapEntries(child: Expression) extends 
UnaryExpression with ExpectsInp
 
   override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
 nullSafeCodeGen(ctx, ev, c => {
+  val arrayData = ctx.freshName("arrayData")
   val numElements = ctx.freshName("numElements")
   val keys = ctx.freshName("keys")
   val values = ctx.freshName("values")
   val isKeyPrimitive = 
CodeGenerator.isPrimitiveType(childDataType.keyType)
   val isValuePrimitive = 
CodeGenerator.isPrimitiveType(childDataType.valueType)
+
+  val wordSize = UnsafeRow.WORD_SIZE
+  val structSize = UnsafeRow.calculateBitSetWidthInBytes(2) + wordSize 
* 2
+  val elementSize = if (isKeyPrimitive && isValuePrimitive) {
+Some(structSize + wordSize)
+  } else {
+None
+  }
+
+  val allocation = CodeGenerator.createArrayData(arrayData, 
childDataType.keyType, numElements,
+s" $prettyName failed.", elementSize = elementSize)
+
   val code = if (isKeyPrimitive && isValuePrimitive) {
-genCodeForPrimitiveElements(ctx, keys, values, ev.value, 
numElements)
+val genCodeForPrimitive = genCodeForPrimitiveElements(
+  ctx, arrayData, keys, values, ev.value, numElements, structSize)
+s"""
+   |if ($arrayData instanceof UnsafeArrayData) {
+   |  $genCodeForPrimitive
+   |} else {
+   |  ${genCodeForAnyElements(ctx, arrayData, keys, values, 
ev.value, numElements)}
+   |}
+ """.stripMargin
   } else {
-genCodeForAnyElements(ctx, keys, values, ev.value, numElements)
+s"${genCodeForAnyElements(ctx, arrayData, keys, values, ev.value, 
numElements)}"
   }
+
   s"""
  |final int $numElements = $c.numElements();
  |final ArrayData $keys = $c.keyArray();
  |final ArrayData $values = $c.valueArray();
+ |$allocation
  |$code
""".stripMargin
 })
   }
 
-  private def getKey(varName: String) = CodeGenerator.getValue(varName, 
childDataType.keyType, "z")
+  private def getKey(varName: String, index: String) =
+CodeGenerator.getValue(varName, childDataType.keyType, index)
 
-  private def getValue(varName: String) = {
-CodeGenerator.getValue(varName, childDataType.valueType, "z")
-  }
+  private def getValue(varName: String, index: String) =
+CodeGenerator.getValue(varName, childDataType.valueType, index)
 
   private def genCodeForPrimitiveElements(
   ctx: CodegenContext,
+  arrayData: String,
   keys: String,
   values: String,
-  arrayData: String,
-  numElements: String): String = {
-val unsafeRow = ctx.freshName("unsafeRow")
+  resultArrayData: String,
+  numElements: String,
+  structSize: Int): String = {
 val unsafeArrayData = ctx.freshName("unsafeArrayData")
+val baseObject = ctx.freshName("baseObject")
+val unsafeRow = ctx.freshName("unsafeRow")
 val structsOffset = ctx.freshName("structsOffset")
+val offset = ctx.freshName("offset")
+val z = ctx.freshName("z")
 val calculateHeader = "UnsafeArrayData.calculateHeaderPortionInBytes"
 
 val baseOffset = Platform.BYTE_ARRAY_OFFSET
 val wordSize = UnsafeRow.WORD_SIZE
-val structSize = UnsafeRow.calculateBitSetWidthInBytes(2) + wordSize * 
2
-val structSizeAsLong = structSize + "L"
-val keyTypeName = 
CodeGenerator.primitiveTypeName(childDataType.keyType)
-val valueTypeName = 
CodeGenerator.primitiveTypeName(childDataType.valueType)
-
-val valueAssignment = s"$unsafeRow.set$valueTypeName(1, 
${getValue(values)});"
-val valueAssignmentChecked = if (childDataType.valueContainsNull) {
-  s"""
- |if ($values.isNullAt(z)) {
- |  $unsafeRow.setNullAt(1);
- |} else {
- |  $valueAssignment
- |}
-   """.stripMargin
-} else {
-  valueAssignment
-}
+val structSizeAsLong = s"${structSize}L"
 
-val assignmentLoop = (byteArray: String) =>
-  s"""
- |final int $structsOffset = $calculateHeader($numElements) + 
$numElements * $wordSize;
- |UnsafeRow $unsafeRow = new UnsafeRow(2);
- |for (int z = 0; z < $numElements; z++) {
- |  long offset = $structsOffset + z * $structSizeAsLong;
 

[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...

2018-08-28 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21912#discussion_r213540868
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
 ---
@@ -1490,6 +1423,63 @@ object CodeGenerator extends Logging {
 }
   }
 
+  /**
+   * Generates code creating a [[UnsafeArrayData]] or [[GenericArrayData]] 
based on
+   * given parameters.
+   *
+   * @param arrayName name of the array to create
+   * @param elementType data type of the elements in source array
+   * @param numElements code representing the number of elements the array 
should contain
+   * @param additionalErrorMessage string to include in the error message
+   * @param elementSize optional value which shows the size of an element 
of the allocated
+   *[[UnsafeArrayData]] or [[GenericArrayData]]
+   *
+   * @return code representing the allocation of [[ArrayData]]
+   */
+  def createArrayData(
+  arrayName: String,
+  elementType: DataType,
+  numElements: String,
+  additionalErrorMessage: String,
+  elementSize: Option[Int] = None): String = {
+val (isPrimitiveType, elemSize) = if (elementSize.isDefined) {
+  (false, elementSize.get)
+} else {
+  (CodeGenerator.isPrimitiveType(elementType), elementType.defaultSize)
+}
+
+s"""
+   |ArrayData $arrayName = ArrayData.allocateArrayData(
+   |  $elemSize, $numElements, $isPrimitiveType, 
"$additionalErrorMessage");
+ """.stripMargin
+  }
+
+  /**
+   * Generates assignment code for an [[ArrayData]]
--- End diff --

shall we mention that the returned code should be put inside a loop?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...

2018-08-28 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21912#discussion_r213540562
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
 ---
@@ -1490,6 +1423,63 @@ object CodeGenerator extends Logging {
 }
   }
 
+  /**
+   * Generates code creating a [[UnsafeArrayData]] or [[GenericArrayData]] 
based on
+   * given parameters.
+   *
+   * @param arrayName name of the array to create
+   * @param elementType data type of the elements in source array
+   * @param numElements code representing the number of elements the array 
should contain
+   * @param additionalErrorMessage string to include in the error message
+   * @param elementSize optional value which shows the size of an element 
of the allocated
+   *[[UnsafeArrayData]] or [[GenericArrayData]]
+   *
+   * @return code representing the allocation of [[ArrayData]]
+   */
+  def createArrayData(
+  arrayName: String,
+  elementType: DataType,
+  numElements: String,
+  additionalErrorMessage: String,
+  elementSize: Option[Int] = None): String = {
+val (isPrimitiveType, elemSize) = if (elementSize.isDefined) {
+  (false, elementSize.get)
+} else {
+  (CodeGenerator.isPrimitiveType(elementType), elementType.defaultSize)
+}
+
+s"""
+   |ArrayData $arrayName = ArrayData.allocateArrayData(
+   |  $elemSize, $numElements, $isPrimitiveType, 
"$additionalErrorMessage");
+ """.stripMargin
+  }
+
+  /**
+   * Generates assignment code for an [[ArrayData]]
+   *
+   * @param arrayName name of the array to create
+   * @param elementType data type of the elements in destination and 
source arrays
+   * @param srcArray code representing the number of elements the array 
should contain
--- End diff --

rename it to `length`.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...

2018-08-28 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21912#discussion_r213540440
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala
 ---
@@ -1314,9 +1314,9 @@ class CollectionExpressionsSuite extends 
SparkFunSuite with ExpressionEvalHelper
   }
 
   test("Array Distinct") {
-val a0 = Literal.create(Seq(2, 1, 2, 3, 4, 4, 5), 
ArrayType(IntegerType))
+val a0 = Literal.create(Seq(2, 1, 2, 3, 4, 4, 5), 
ArrayType(IntegerType, false))
--- End diff --

why this change?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...

2018-08-28 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21912#discussion_r213540247
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeArrayData.java
 ---
@@ -473,13 +473,13 @@ public static UnsafeArrayData fromPrimitiveArray(
 return result;
   }
 
-  public static UnsafeArrayData forPrimitiveArray(int offset, int length, 
int elementSize) {
-return fromPrimitiveArray(null, offset, length, elementSize);
+  public static UnsafeArrayData forPrimitiveArray(int length, int 
elementSize) {
+return fromPrimitiveArray(null, 0, length, elementSize);
--- End diff --

so this is used to create a new unsafe array with no data. This saves 
duplicated code, but we will do unnecessary memory copy.

Shall we just add a new method in `UnsafeArrayData` to create a fresh array?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...

2018-08-28 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21912#discussion_r213539798
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
 ---
@@ -1490,6 +1423,63 @@ object CodeGenerator extends Logging {
 }
   }
 
+  /**
+   * Generates code creating a [[UnsafeArrayData]] or [[GenericArrayData]] 
based on
+   * given parameters.
+   *
+   * @param arrayName name of the array to create
+   * @param elementType data type of the elements in source array
+   * @param numElements code representing the number of elements the array 
should contain
+   * @param additionalErrorMessage string to include in the error message
+   * @param elementSize optional value which shows the size of an element 
of the allocated
--- End diff --

it's better to give an example when we will set it.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...

2018-08-28 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21912#discussion_r213539696
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeArrayData.java
 ---
@@ -473,13 +473,13 @@ public static UnsafeArrayData fromPrimitiveArray(
 return result;
   }
 
-  public static UnsafeArrayData forPrimitiveArray(int offset, int length, 
int elementSize) {
-return fromPrimitiveArray(null, offset, length, elementSize);
+  public static UnsafeArrayData forPrimitiveArray(int length, int 
elementSize) {
+return fromPrimitiveArray(null, 0, length, elementSize);
--- End diff --

is it safe? I vaguely remember the address of offheap memory block is 
usually not 0.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...

2018-08-19 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/21912#discussion_r211126450
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
 ---
@@ -735,70 +735,98 @@ class CodegenContext {
   }
 
   /**
-   * Generates code creating a [[UnsafeArrayData]].
+   * Generates code creating a [[UnsafeArrayData]] or [[GenericArrayData]] 
based on
+   * given parameters.
*
* @param arrayName name of the array to create
+   * @param elementType data type of the elements in source array
* @param numElements code representing the number of elements the array 
should contain
-   * @param elementType data type of the elements in the array
* @param additionalErrorMessage string to include in the error message
+   * @param elementSize optional value which shows the size of an element 
of the allocated
+   *[[UnsafeArrayData]] or [[GenericArrayData]]
+   *
+   * @return code representing the allocation of [[ArrayData]]
+   * code representing a setter of an assignment for the generated 
array
*/
-  def createUnsafeArray(
+  def createArrayData(
--- End diff --

I think it would be good to have this since this utility method will be 
used at other places where `new UnsafeArrayData` or `new GenericArrayData` is 
generated as Java code.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...

2018-08-19 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/21912#discussion_r211096868
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ArrayData.scala 
---
@@ -34,6 +36,32 @@ object ArrayData {
 case a: Array[Double] => UnsafeArrayData.fromPrimitiveArray(a)
 case other => new GenericArrayData(other)
   }
+
+
+  /**
+   * Allocate [[UnsafeArrayData]] or [[GenericArrayData]] based on given 
parameters.
+   *
+   * @param elementSize a size of an element in bytes
+   * @param numElements the number of elements the array should contain
+   * @param isPrimitiveType whether the type of an element is primitive 
type
+   * @param additionalErrorMessage string to include in the error message
+   */
+  def allocateArrayData(
+  elementSize: Int,
+  numElements : Long,
+  isPrimitiveType: Boolean,
+  additionalErrorMessage: String) : ArrayData = {
+if (isPrimitiveType && 
!UnsafeArrayData.shouldUseGenericArrayData(elementSize, numElements)) {
+  UnsafeArrayData.forPrimitiveArray(Platform.BYTE_ARRAY_OFFSET, 
numElements.toInt, elementSize)
--- End diff --

Good catch


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...

2018-08-16 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21912#discussion_r210543909
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ArrayData.scala 
---
@@ -34,6 +36,32 @@ object ArrayData {
 case a: Array[Double] => UnsafeArrayData.fromPrimitiveArray(a)
 case other => new GenericArrayData(other)
   }
+
+
+  /**
+   * Allocate [[UnsafeArrayData]] or [[GenericArrayData]] based on given 
parameters.
+   *
+   * @param elementSize a size of an element in bytes
+   * @param numElements the number of elements the array should contain
+   * @param isPrimitiveType whether the type of an element is primitive 
type
+   * @param additionalErrorMessage string to include in the error message
+   */
+  def allocateArrayData(
+  elementSize: Int,
+  numElements : Long,
--- End diff --

nit: extra space after `numElements`.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...

2018-08-16 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21912#discussion_r210543965
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ArrayData.scala 
---
@@ -34,6 +36,32 @@ object ArrayData {
 case a: Array[Double] => UnsafeArrayData.fromPrimitiveArray(a)
 case other => new GenericArrayData(other)
   }
+
+
+  /**
+   * Allocate [[UnsafeArrayData]] or [[GenericArrayData]] based on given 
parameters.
+   *
+   * @param elementSize a size of an element in bytes
+   * @param numElements the number of elements the array should contain
+   * @param isPrimitiveType whether the type of an element is primitive 
type
+   * @param additionalErrorMessage string to include in the error message
+   */
+  def allocateArrayData(
+  elementSize: Int,
+  numElements : Long,
+  isPrimitiveType: Boolean,
+  additionalErrorMessage: String) : ArrayData = {
--- End diff --

nit: extra space after `)`.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...

2018-08-16 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21912#discussion_r210539389
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
 ---
@@ -735,70 +735,92 @@ class CodegenContext {
   }
 
   /**
-   * Generates code creating a [[UnsafeArrayData]].
+   * Generates code of setter for an [[ArrayData]]
+   *
+   * @param elementType data type of the elements in an [[ArrayData]]
+   *
+   * @return code representing a setter of an assignment for an 
[[ArrayData]]
+   */
+  def setArrayElementFunc(dataType: DataType): String = {
+val isPrimitiveType = CodeGenerator.isPrimitiveType(dataType)
+if (isPrimitiveType) {
+  s"set${CodeGenerator.primitiveTypeName(dataType)}"
+} else {
+  "update"
+}
+  }
+
+  /**
+   * Generates code creating a [[UnsafeArrayData]] or [[GenericArrayData]] 
based on
+   * given parameters.
*
* @param arrayName name of the array to create
+   * @param elementType data type of the elements in source array
* @param numElements code representing the number of elements the array 
should contain
-   * @param elementType data type of the elements in the array
* @param additionalErrorMessage string to include in the error message
+   * @param elementSize optional value which shows the size of an element 
of the allocated
+   *[[UnsafeArrayData]] or [[GenericArrayData]]
+   *
+   * @return code representing the allocation of [[ArrayData]]
+   * code representing a setter of an assignment for the generated 
array
*/
-  def createUnsafeArray(
+  def createArrayData(
   arrayName: String,
-  numElements: String,
   elementType: DataType,
-  additionalErrorMessage: String): String = {
-val arraySize = freshName("size")
-val arrayBytes = freshName("arrayBytes")
+  numElements: String,
+  additionalErrorMessage: String,
+  elementSize: Option[Int] = None): String = {
+val (isPrimitiveType, elemSize) = if (elementSize.isDefined) {
+  (false, elementSize.get)
+} else {
+  (CodeGenerator.isPrimitiveType(elementType), elementType.defaultSize)
+}
 
 s"""
-   |long $arraySize = 
UnsafeArrayData.calculateSizeOfUnderlyingByteArray(
-   |  $numElements,
-   |  ${elementType.defaultSize});
-   |if ($arraySize > ${ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH}) {
-   |  throw new RuntimeException("Unsuccessful try create array with " 
+ $arraySize +
-   |" bytes of data due to exceeding the limit " +
-   |"${ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH} bytes for 
UnsafeArrayData." +
-   |"$additionalErrorMessage");
-   |}
-   |byte[] $arrayBytes = new byte[(int)$arraySize];
-   |UnsafeArrayData $arrayName = new UnsafeArrayData();
-   |Platform.putLong($arrayBytes, ${Platform.BYTE_ARRAY_OFFSET}, 
$numElements);
-   |$arrayName.pointTo($arrayBytes, ${Platform.BYTE_ARRAY_OFFSET}, 
(int)$arraySize);
-  """.stripMargin
+   |ArrayData $arrayName = ArrayData.allocateArrayData(
+   |  $elemSize, $numElements, $isPrimitiveType, 
"$additionalErrorMessage");
+ """.stripMargin
   }
 
   /**
-   * Generates code creating a [[UnsafeArrayData]]. The generated code 
executes
-   * a provided fallback when the size of backing array would exceed the 
array size limit.
-   * @param arrayName a name of the array to create
-   * @param numElements a piece of code representing the number of 
elements the array should contain
-   * @param elementSize a size of an element in bytes
-   * @param bodyCode a function generating code that fills up the 
[[UnsafeArrayData]]
-   * and getting the backing array as a parameter
-   * @param fallbackCode a piece of code executed when the array size 
limit is exceeded
+   * Generates assignment code for a [[ArrayData]]
+   *
+   * @param arrayName name of the array to create
+   * @param setFunc string to include in the error message
+   * @param elementType data type of the elements in source array
+   * @param srcArray code representing the number of elements the array 
should contain
+   * @param needNullCheck value which shows whether a nullcheck is 
required for the returning
+   *  assignment
+   * @param rhsValue optional expression for the right-hand side of the 
returning assignment
+   *
+   * @return code representing an assignment to each element of the 
[[ArrayData]], which requires
+   * a pair of destination and source loop index variables
*/
-  def 

[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...

2018-08-16 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21912#discussion_r210539249
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
 ---
@@ -735,70 +735,92 @@ class CodegenContext {
   }
 
   /**
-   * Generates code creating a [[UnsafeArrayData]].
+   * Generates code of setter for an [[ArrayData]]
+   *
+   * @param elementType data type of the elements in an [[ArrayData]]
+   *
+   * @return code representing a setter of an assignment for an 
[[ArrayData]]
+   */
+  def setArrayElementFunc(dataType: DataType): String = {
--- End diff --

We should have this in `object CodeGenerator`?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...

2018-08-16 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21912#discussion_r210533929
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -385,53 +385,81 @@ case class MapEntries(child: Expression) extends 
UnaryExpression with ExpectsInp
 
   override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
 nullSafeCodeGen(ctx, ev, c => {
+  val arrayData = ctx.freshName("arrayData")
   val numElements = ctx.freshName("numElements")
   val keys = ctx.freshName("keys")
   val values = ctx.freshName("values")
   val isKeyPrimitive = 
CodeGenerator.isPrimitiveType(childDataType.keyType)
   val isValuePrimitive = 
CodeGenerator.isPrimitiveType(childDataType.valueType)
+
+  val wordSize = UnsafeRow.WORD_SIZE
+  val structSize = UnsafeRow.calculateBitSetWidthInBytes(2) + wordSize 
* 2
+  val elementSize = if (isKeyPrimitive && isValuePrimitive) {
+Some(structSize + wordSize)
+  } else {
+None
+  }
+
+  val allocation = ctx.createArrayData(arrayData, 
childDataType.keyType, numElements,
+s" $prettyName failed.", elementSize = elementSize)
+  val setFunc = ctx.setArrayElementFunc(childDataType.keyType)
+
   val code = if (isKeyPrimitive && isValuePrimitive) {
-genCodeForPrimitiveElements(ctx, keys, values, ev.value, 
numElements)
+val genCodeForPrimitive = genCodeForPrimitiveElements(
+  ctx, arrayData, keys, values, ev.value, numElements, structSize, 
setFunc)
+s"""
+   |if ($arrayData instanceof UnsafeArrayData) {
+   |  $genCodeForPrimitive
+   |} else {
+   |  ${genCodeForAnyElements(ctx, arrayData, keys, values, 
ev.value, numElements)}
+   |}
+ """.stripMargin
   } else {
-genCodeForAnyElements(ctx, keys, values, ev.value, numElements)
+s"${genCodeForAnyElements(ctx, arrayData, keys, values, ev.value, 
numElements)}"
   }
+
   s"""
  |final int $numElements = $c.numElements();
  |final ArrayData $keys = $c.keyArray();
  |final ArrayData $values = $c.valueArray();
+ |$allocation
  |$code
""".stripMargin
 })
   }
 
-  private def getKey(varName: String) = CodeGenerator.getValue(varName, 
childDataType.keyType, "z")
+  private def getKey(varName: String, index: String) =
+CodeGenerator.getValue(varName, childDataType.keyType, index)
 
-  private def getValue(varName: String) = {
-CodeGenerator.getValue(varName, childDataType.valueType, "z")
-  }
+  private def getValue(varName: String, index: String) =
+CodeGenerator.getValue(varName, childDataType.valueType, index)
 
   private def genCodeForPrimitiveElements(
   ctx: CodegenContext,
+  arrayData: String,
   keys: String,
   values: String,
-  arrayData: String,
-  numElements: String): String = {
-val unsafeRow = ctx.freshName("unsafeRow")
+  resultArrayData: String,
+  numElements: String,
+  structSize: Int,
+  setFuncForKey: String): String = {
 val unsafeArrayData = ctx.freshName("unsafeArrayData")
+val baseObject = ctx.freshName("baseObject")
+val unsafeRow = ctx.freshName("unsafeRow")
 val structsOffset = ctx.freshName("structsOffset")
+val offset = ctx.freshName("offset")
+val z = ctx.freshName("z")
 val calculateHeader = "UnsafeArrayData.calculateHeaderPortionInBytes"
 
 val baseOffset = Platform.BYTE_ARRAY_OFFSET
 val wordSize = UnsafeRow.WORD_SIZE
-val structSize = UnsafeRow.calculateBitSetWidthInBytes(2) + wordSize * 
2
-val structSizeAsLong = structSize + "L"
-val keyTypeName = 
CodeGenerator.primitiveTypeName(childDataType.keyType)
+val structSizeAsLong = s"${structSize}L"
 val valueTypeName = 
CodeGenerator.primitiveTypeName(childDataType.valueType)
 
-val valueAssignment = s"$unsafeRow.set$valueTypeName(1, 
${getValue(values)});"
+val valueAssignment = s"$unsafeRow.set$valueTypeName(1, 
${getValue(values, z)});"
--- End diff --

We should use `ctx.setArrayElementFunc` to get the func name?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...

2018-08-16 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21912#discussion_r210531207
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
 ---
@@ -735,70 +735,92 @@ class CodegenContext {
   }
 
   /**
-   * Generates code creating a [[UnsafeArrayData]].
+   * Generates code of setter for an [[ArrayData]]
+   *
+   * @param elementType data type of the elements in an [[ArrayData]]
+   *
+   * @return code representing a setter of an assignment for an 
[[ArrayData]]
+   */
+  def setArrayElementFunc(dataType: DataType): String = {
+val isPrimitiveType = CodeGenerator.isPrimitiveType(dataType)
+if (isPrimitiveType) {
+  s"set${CodeGenerator.primitiveTypeName(dataType)}"
+} else {
+  "update"
+}
+  }
+
+  /**
+   * Generates code creating a [[UnsafeArrayData]] or [[GenericArrayData]] 
based on
+   * given parameters.
*
* @param arrayName name of the array to create
+   * @param elementType data type of the elements in source array
* @param numElements code representing the number of elements the array 
should contain
-   * @param elementType data type of the elements in the array
* @param additionalErrorMessage string to include in the error message
+   * @param elementSize optional value which shows the size of an element 
of the allocated
+   *[[UnsafeArrayData]] or [[GenericArrayData]]
+   *
+   * @return code representing the allocation of [[ArrayData]]
+   * code representing a setter of an assignment for the generated 
array
*/
-  def createUnsafeArray(
+  def createArrayData(
   arrayName: String,
-  numElements: String,
   elementType: DataType,
-  additionalErrorMessage: String): String = {
-val arraySize = freshName("size")
-val arrayBytes = freshName("arrayBytes")
+  numElements: String,
+  additionalErrorMessage: String,
+  elementSize: Option[Int] = None): String = {
+val (isPrimitiveType, elemSize) = if (elementSize.isDefined) {
+  (false, elementSize.get)
+} else {
+  (CodeGenerator.isPrimitiveType(elementType), elementType.defaultSize)
+}
 
 s"""
-   |long $arraySize = 
UnsafeArrayData.calculateSizeOfUnderlyingByteArray(
-   |  $numElements,
-   |  ${elementType.defaultSize});
-   |if ($arraySize > ${ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH}) {
-   |  throw new RuntimeException("Unsuccessful try create array with " 
+ $arraySize +
-   |" bytes of data due to exceeding the limit " +
-   |"${ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH} bytes for 
UnsafeArrayData." +
-   |"$additionalErrorMessage");
-   |}
-   |byte[] $arrayBytes = new byte[(int)$arraySize];
-   |UnsafeArrayData $arrayName = new UnsafeArrayData();
-   |Platform.putLong($arrayBytes, ${Platform.BYTE_ARRAY_OFFSET}, 
$numElements);
-   |$arrayName.pointTo($arrayBytes, ${Platform.BYTE_ARRAY_OFFSET}, 
(int)$arraySize);
-  """.stripMargin
+   |ArrayData $arrayName = ArrayData.allocateArrayData(
+   |  $elemSize, $numElements, $isPrimitiveType, 
"$additionalErrorMessage");
+ """.stripMargin
   }
 
   /**
-   * Generates code creating a [[UnsafeArrayData]]. The generated code 
executes
-   * a provided fallback when the size of backing array would exceed the 
array size limit.
-   * @param arrayName a name of the array to create
-   * @param numElements a piece of code representing the number of 
elements the array should contain
-   * @param elementSize a size of an element in bytes
-   * @param bodyCode a function generating code that fills up the 
[[UnsafeArrayData]]
-   * and getting the backing array as a parameter
-   * @param fallbackCode a piece of code executed when the array size 
limit is exceeded
+   * Generates assignment code for a [[ArrayData]]
+   *
+   * @param arrayName name of the array to create
+   * @param setFunc string to include in the error message
+   * @param elementType data type of the elements in source array
+   * @param srcArray code representing the number of elements the array 
should contain
+   * @param needNullCheck value which shows whether a nullcheck is 
required for the returning
+   *  assignment
+   * @param rhsValue optional expression for the right-hand side of the 
returning assignment
+   *
+   * @return code representing an assignment to each element of the 
[[ArrayData]], which requires
+   * a pair of destination and source loop index variables
*/
-  def 

[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...

2018-08-16 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21912#discussion_r210530380
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
 ---
@@ -735,70 +735,98 @@ class CodegenContext {
   }
 
   /**
-   * Generates code creating a [[UnsafeArrayData]].
+   * Generates code creating a [[UnsafeArrayData]] or [[GenericArrayData]] 
based on
+   * given parameters.
*
* @param arrayName name of the array to create
+   * @param elementType data type of the elements in source array
* @param numElements code representing the number of elements the array 
should contain
-   * @param elementType data type of the elements in the array
* @param additionalErrorMessage string to include in the error message
+   * @param elementSize optional value which shows the size of an element 
of the allocated
+   *[[UnsafeArrayData]] or [[GenericArrayData]]
+   *
+   * @return code representing the allocation of [[ArrayData]]
+   * code representing a setter of an assignment for the generated 
array
*/
-  def createUnsafeArray(
+  def createArrayData(
--- End diff --

I'm still not sure if it would be good to have this. I think the method is 
small enough to inline. cc @cloud-fan 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...

2018-08-16 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21912#discussion_r210529856
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ArrayData.scala 
---
@@ -34,6 +36,32 @@ object ArrayData {
 case a: Array[Double] => UnsafeArrayData.fromPrimitiveArray(a)
 case other => new GenericArrayData(other)
   }
+
+
+  /**
+   * Allocate [[UnsafeArrayData]] or [[GenericArrayData]] based on given 
parameters.
+   *
+   * @param elementSize a size of an element in bytes
+   * @param numElements the number of elements the array should contain
+   * @param isPrimitiveType whether the type of an element is primitive 
type
+   * @param additionalErrorMessage string to include in the error message
+   */
+  def allocateArrayData(
+  elementSize: Int,
+  numElements : Long,
+  isPrimitiveType: Boolean,
+  additionalErrorMessage: String) : ArrayData = {
+if (isPrimitiveType && 
!UnsafeArrayData.shouldUseGenericArrayData(elementSize, numElements)) {
+  UnsafeArrayData.forPrimitiveArray(Platform.BYTE_ARRAY_OFFSET, 
numElements.toInt, elementSize)
--- End diff --

Do we need `Platform.BYTE_ARRAY_OFFSET` here? Seems like the method doesn't 
use the `offset` argument?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...

2018-08-16 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21912#discussion_r210533651
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -385,53 +385,81 @@ case class MapEntries(child: Expression) extends 
UnaryExpression with ExpectsInp
 
   override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
 nullSafeCodeGen(ctx, ev, c => {
+  val arrayData = ctx.freshName("arrayData")
   val numElements = ctx.freshName("numElements")
   val keys = ctx.freshName("keys")
   val values = ctx.freshName("values")
   val isKeyPrimitive = 
CodeGenerator.isPrimitiveType(childDataType.keyType)
   val isValuePrimitive = 
CodeGenerator.isPrimitiveType(childDataType.valueType)
+
+  val wordSize = UnsafeRow.WORD_SIZE
+  val structSize = UnsafeRow.calculateBitSetWidthInBytes(2) + wordSize 
* 2
+  val elementSize = if (isKeyPrimitive && isValuePrimitive) {
+Some(structSize + wordSize)
+  } else {
+None
+  }
+
+  val allocation = ctx.createArrayData(arrayData, 
childDataType.keyType, numElements,
+s" $prettyName failed.", elementSize = elementSize)
+  val setFunc = ctx.setArrayElementFunc(childDataType.keyType)
--- End diff --

We can delay this into `genCodeForPrimitiveElements`?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...

2018-08-14 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/21912#discussion_r209878827
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
 ---
@@ -735,70 +735,98 @@ class CodegenContext {
   }
 
   /**
-   * Generates code creating a [[UnsafeArrayData]].
+   * Generates code creating a [[UnsafeArrayData]] or [[GenericArrayData]] 
based on
+   * given parameters.
*
* @param arrayName name of the array to create
+   * @param elementType data type of the elements in source array
* @param numElements code representing the number of elements the array 
should contain
-   * @param elementType data type of the elements in the array
* @param additionalErrorMessage string to include in the error message
+   * @param elementSize optional value which shows the size of an element 
of the allocated
+   *[[UnsafeArrayData]] or [[GenericArrayData]]
+   *
+   * @return code representing the allocation of [[ArrayData]]
+   * code representing a setter of an assignment for the generated 
array
*/
-  def createUnsafeArray(
+  def createArrayData(
--- End diff --

I think that it would be good to have a utility method to generate this 
call. WDYT?
On the other hand, I agree that we have a method `setArrayElementFunc` or 
`setArrayElement`. Thus, we can split this method into two methods that have 
only one return value.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...

2018-08-14 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/21912#discussion_r209877426
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ArrayData.scala 
---
@@ -34,6 +36,37 @@ object ArrayData {
 case a: Array[Double] => UnsafeArrayData.fromPrimitiveArray(a)
 case other => new GenericArrayData(other)
   }
+
+
+  /**
+   * Allocate [[UnsafeArrayData]] or [[GenericArrayData]] based on given 
parameters.
+   *
+   * @param elementSize a size of an element in bytes
+   * @param numElements the number of elements the array should contain
+   * @param isPrimitiveType whether the type of an element is primitive 
type
+   * @param additionalErrorMessage string to include in the error message
+   */
+  def allocateArrayData(
+  elementSize: Int,
+  numElements : Long,
+  isPrimitiveType: Boolean,
+  additionalErrorMessage: String) : ArrayData = {
+val arraySize = 
UnsafeArrayData.calculateSizeOfUnderlyingByteArray(numElements, elementSize)
+if (isPrimitiveType && 
!UnsafeArrayData.shouldUseGenericArrayData(elementSize, numElements)) {
--- End diff --

When `UnsafeArrayData` can be used, `GenericArrayData` is also used. 
However, if the element size is large, `GenericArrayData` should be used. But, 
`UnsafeArrayData` cannot be used.

Thus, I think that it would be good to use the current name 
`shouldUseGenericArrayData`.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...

2018-08-14 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21912#discussion_r209855565
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ArrayData.scala 
---
@@ -34,6 +36,37 @@ object ArrayData {
 case a: Array[Double] => UnsafeArrayData.fromPrimitiveArray(a)
 case other => new GenericArrayData(other)
   }
+
+
+  /**
+   * Allocate [[UnsafeArrayData]] or [[GenericArrayData]] based on given 
parameters.
+   *
+   * @param elementSize a size of an element in bytes
+   * @param numElements the number of elements the array should contain
+   * @param isPrimitiveType whether the type of an element is primitive 
type
+   * @param additionalErrorMessage string to include in the error message
+   */
+  def allocateArrayData(
+  elementSize: Int,
+  numElements : Long,
+  isPrimitiveType: Boolean,
+  additionalErrorMessage: String) : ArrayData = {
+val arraySize = 
UnsafeArrayData.calculateSizeOfUnderlyingByteArray(numElements, elementSize)
+if (isPrimitiveType && 
!UnsafeArrayData.shouldUseGenericArrayData(elementSize, numElements)) {
+  val arrayBytes = new Array[Byte](arraySize.toInt)
+  val arrayName = new UnsafeArrayData()
+  Platform.putLong(arrayBytes, Platform.BYTE_ARRAY_OFFSET, numElements)
+  arrayName.pointTo(arrayBytes, Platform.BYTE_ARRAY_OFFSET, 
arraySize.toInt)
+  arrayName
--- End diff --

Can't we use `UnsafeArrayData.forPrimitiveArray()`?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...

2018-08-14 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21912#discussion_r209854249
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ArrayData.scala 
---
@@ -34,6 +36,37 @@ object ArrayData {
 case a: Array[Double] => UnsafeArrayData.fromPrimitiveArray(a)
 case other => new GenericArrayData(other)
   }
+
+
+  /**
+   * Allocate [[UnsafeArrayData]] or [[GenericArrayData]] based on given 
parameters.
+   *
+   * @param elementSize a size of an element in bytes
+   * @param numElements the number of elements the array should contain
+   * @param isPrimitiveType whether the type of an element is primitive 
type
+   * @param additionalErrorMessage string to include in the error message
+   */
+  def allocateArrayData(
+  elementSize: Int,
+  numElements : Long,
+  isPrimitiveType: Boolean,
+  additionalErrorMessage: String) : ArrayData = {
+val arraySize = 
UnsafeArrayData.calculateSizeOfUnderlyingByteArray(numElements, elementSize)
+if (isPrimitiveType && 
!UnsafeArrayData.shouldUseGenericArrayData(elementSize, numElements)) {
+  val arrayBytes = new Array[Byte](arraySize.toInt)
+  val arrayName = new UnsafeArrayData()
+  Platform.putLong(arrayBytes, Platform.BYTE_ARRAY_OFFSET, numElements)
+  arrayName.pointTo(arrayBytes, Platform.BYTE_ARRAY_OFFSET, 
arraySize.toInt)
+  arrayName
+} else if (numElements <= 
ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH.toLong) {
+  new GenericArrayData(new Array[Any](numElements.toInt))
+} else {
+  throw new RuntimeException(s"Cannot create array with $numElements " 
+
+"elements of data due to exceeding the limit " +
+s"${ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH} elements for 
ArrayData." +
+s"$additionalErrorMessage")
--- End diff --

nit: no need to surround with `s""`?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...

2018-08-14 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21912#discussion_r209854002
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ArrayData.scala 
---
@@ -34,6 +36,37 @@ object ArrayData {
 case a: Array[Double] => UnsafeArrayData.fromPrimitiveArray(a)
 case other => new GenericArrayData(other)
   }
+
+
+  /**
+   * Allocate [[UnsafeArrayData]] or [[GenericArrayData]] based on given 
parameters.
+   *
+   * @param elementSize a size of an element in bytes
+   * @param numElements the number of elements the array should contain
+   * @param isPrimitiveType whether the type of an element is primitive 
type
+   * @param additionalErrorMessage string to include in the error message
+   */
+  def allocateArrayData(
+  elementSize: Int,
+  numElements : Long,
+  isPrimitiveType: Boolean,
+  additionalErrorMessage: String) : ArrayData = {
+val arraySize = 
UnsafeArrayData.calculateSizeOfUnderlyingByteArray(numElements, elementSize)
--- End diff --

Move into the following if-clause?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...

2018-08-14 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21912#discussion_r209856692
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
 ---
@@ -735,70 +735,98 @@ class CodegenContext {
   }
 
   /**
-   * Generates code creating a [[UnsafeArrayData]].
+   * Generates code creating a [[UnsafeArrayData]] or [[GenericArrayData]] 
based on
+   * given parameters.
*
* @param arrayName name of the array to create
+   * @param elementType data type of the elements in source array
* @param numElements code representing the number of elements the array 
should contain
-   * @param elementType data type of the elements in the array
* @param additionalErrorMessage string to include in the error message
+   * @param elementSize optional value which shows the size of an element 
of the allocated
+   *[[UnsafeArrayData]] or [[GenericArrayData]]
+   *
+   * @return code representing the allocation of [[ArrayData]]
+   * code representing a setter of an assignment for the generated 
array
*/
-  def createUnsafeArray(
+  def createArrayData(
   arrayName: String,
-  numElements: String,
   elementType: DataType,
-  additionalErrorMessage: String): String = {
-val arraySize = freshName("size")
-val arrayBytes = freshName("arrayBytes")
+  numElements: String,
+  additionalErrorMessage: String,
+  elementSize: Option[Int] = None): (String, String) = {
+val isPrimitiveType = if (elementSize.isDefined) {
+  false
+} else {
+  CodeGenerator.isPrimitiveType(elementType)
+}
 
-s"""
-   |long $arraySize = 
UnsafeArrayData.calculateSizeOfUnderlyingByteArray(
-   |  $numElements,
-   |  ${elementType.defaultSize});
-   |if ($arraySize > ${ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH}) {
-   |  throw new RuntimeException("Unsuccessful try create array with " 
+ $arraySize +
-   |" bytes of data due to exceeding the limit " +
-   |"${ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH} bytes for 
UnsafeArrayData." +
-   |"$additionalErrorMessage");
-   |}
-   |byte[] $arrayBytes = new byte[(int)$arraySize];
-   |UnsafeArrayData $arrayName = new UnsafeArrayData();
-   |Platform.putLong($arrayBytes, ${Platform.BYTE_ARRAY_OFFSET}, 
$numElements);
-   |$arrayName.pointTo($arrayBytes, ${Platform.BYTE_ARRAY_OFFSET}, 
(int)$arraySize);
-  """.stripMargin
+val setFunc = if (isPrimitiveType) {
+  s"set${CodeGenerator.primitiveTypeName(elementType)}"
+} else {
+  "update"
+}
+
+val elemSize = if (elementSize.isDefined) {
+  elementSize.get
+} else {
+  elementType.defaultSize
+}
+val allocation =
+  s"""
+ |ArrayData $arrayName = ArrayData.allocateArrayData(
+ |  $elemSize, $numElements, $isPrimitiveType, 
"$additionalErrorMessage");
+   """.stripMargin
+
+(allocation, setFunc)
   }
 
   /**
-   * Generates code creating a [[UnsafeArrayData]]. The generated code 
executes
-   * a provided fallback when the size of backing array would exceed the 
array size limit.
-   * @param arrayName a name of the array to create
-   * @param numElements a piece of code representing the number of 
elements the array should contain
-   * @param elementSize a size of an element in bytes
-   * @param bodyCode a function generating code that fills up the 
[[UnsafeArrayData]]
-   * and getting the backing array as a parameter
-   * @param fallbackCode a piece of code executed when the array size 
limit is exceeded
+   * Generates assignment code for a [[ArrayData]]
+   *
+   * @param arrayName name of the array to create
+   * @param dataType data type of the result array
+   * @param elementType data type of the elements in source array
+   * @param srcArray code representing the number of elements the array 
should contain
+   * @param setFunc string to include in the error message
+   * @param rhsValue optional expression for the right-hand side of the 
returning assignment
+   * @param checkForNull optional value which shows whether a nullcheck is 
required for
+   * the returning assignment
+   *
+   * @return code representing an assignment to each element of the 
[[ArrayData]], which requires
+   * a pair of destination and source loop index variables
*/
-  def createUnsafeArrayWithFallback(
+  def createArrayAssignment(
   arrayName: String,
-  numElements: String,
-  

[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...

2018-08-14 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21912#discussion_r209858514
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ArrayData.scala 
---
@@ -34,6 +36,37 @@ object ArrayData {
 case a: Array[Double] => UnsafeArrayData.fromPrimitiveArray(a)
 case other => new GenericArrayData(other)
   }
+
+
+  /**
+   * Allocate [[UnsafeArrayData]] or [[GenericArrayData]] based on given 
parameters.
+   *
+   * @param elementSize a size of an element in bytes
+   * @param numElements the number of elements the array should contain
+   * @param isPrimitiveType whether the type of an element is primitive 
type
+   * @param additionalErrorMessage string to include in the error message
+   */
+  def allocateArrayData(
+  elementSize: Int,
+  numElements : Long,
+  isPrimitiveType: Boolean,
+  additionalErrorMessage: String) : ArrayData = {
+val arraySize = 
UnsafeArrayData.calculateSizeOfUnderlyingByteArray(numElements, elementSize)
+if (isPrimitiveType && 
!UnsafeArrayData.shouldUseGenericArrayData(elementSize, numElements)) {
--- End diff --

The method `UnsafeArrayData.shouldUseGenericArrayData()` should be 
`UnsafeArrayData.shouldUseUnsafeArrayData()`?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...

2018-08-14 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21912#discussion_r209853481
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
 ---
@@ -735,70 +735,98 @@ class CodegenContext {
   }
 
   /**
-   * Generates code creating a [[UnsafeArrayData]].
+   * Generates code creating a [[UnsafeArrayData]] or [[GenericArrayData]] 
based on
+   * given parameters.
*
* @param arrayName name of the array to create
+   * @param elementType data type of the elements in source array
* @param numElements code representing the number of elements the array 
should contain
-   * @param elementType data type of the elements in the array
* @param additionalErrorMessage string to include in the error message
+   * @param elementSize optional value which shows the size of an element 
of the allocated
+   *[[UnsafeArrayData]] or [[GenericArrayData]]
+   *
+   * @return code representing the allocation of [[ArrayData]]
+   * code representing a setter of an assignment for the generated 
array
*/
-  def createUnsafeArray(
+  def createArrayData(
   arrayName: String,
-  numElements: String,
   elementType: DataType,
-  additionalErrorMessage: String): String = {
-val arraySize = freshName("size")
-val arrayBytes = freshName("arrayBytes")
+  numElements: String,
+  additionalErrorMessage: String,
+  elementSize: Option[Int] = None): (String, String) = {
+val isPrimitiveType = if (elementSize.isDefined) {
+  false
+} else {
+  CodeGenerator.isPrimitiveType(elementType)
+}
 
-s"""
-   |long $arraySize = 
UnsafeArrayData.calculateSizeOfUnderlyingByteArray(
-   |  $numElements,
-   |  ${elementType.defaultSize});
-   |if ($arraySize > ${ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH}) {
-   |  throw new RuntimeException("Unsuccessful try create array with " 
+ $arraySize +
-   |" bytes of data due to exceeding the limit " +
-   |"${ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH} bytes for 
UnsafeArrayData." +
-   |"$additionalErrorMessage");
-   |}
-   |byte[] $arrayBytes = new byte[(int)$arraySize];
-   |UnsafeArrayData $arrayName = new UnsafeArrayData();
-   |Platform.putLong($arrayBytes, ${Platform.BYTE_ARRAY_OFFSET}, 
$numElements);
-   |$arrayName.pointTo($arrayBytes, ${Platform.BYTE_ARRAY_OFFSET}, 
(int)$arraySize);
-  """.stripMargin
+val setFunc = if (isPrimitiveType) {
+  s"set${CodeGenerator.primitiveTypeName(elementType)}"
+} else {
+  "update"
+}
--- End diff --

Maybe only this is good to have. `def setArrayElementFunc(dt: DataType): 
String`?
Or how about `def setArrayElement(array: String, dataType: DataType, 
ordinal: Int, value: String): String` like other set utilities?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...

2018-08-14 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21912#discussion_r209862522
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -3330,50 +3136,123 @@ case class ArrayRemove(left: Expression, right: 
Expression)
 val pos = ctx.freshName("pos")
 val getValue = CodeGenerator.getValue(inputArray, elementType, i)
 val isEqual = ctx.genEqual(elementType, value, getValue)
-if (!CodeGenerator.isPrimitiveType(elementType)) {
-  val arrayClass = classOf[GenericArrayData].getName
+
+val (allocation, setFunc) =
+  ctx.createArrayData(values, elementType, newArraySize, s" 
$prettyName failed.")
+val assignment = ctx.createArrayAssignment(
+  values, dataType, elementType, inputArray, setFunc, checkForNull = 
Some(false))
+
+s"""
+   |$allocation
+   |int $pos = 0;
+   |for (int $i = 0; $i < $inputArray.numElements(); $i ++) {
+   |  if ($inputArray.isNullAt($i)) {
+   |$values.setNullAt($pos);
+   |$pos = $pos + 1;
+   |  }
+   |  else {
+   |if (!($isEqual)) {
+   |  ${assignment(pos, i)}
+   |  $pos = $pos + 1;
+   |}
+   |  }
+   |}
+   |${ev.value} = $values;
+ """.stripMargin
+  }
+
+  override def prettyName: String = "array_remove"
+}
+
+/**
+ * Will become common base class for [[ArrayDistinct]], [[ArrayUnion]], 
[[ArrayIntersect]],
+ * and [[ArrayExcept]].
+ */
+trait ArraySetLike {
+  @transient protected lazy val dt: DataType = NullType
+  @transient protected lazy val et: DataType = NullType
--- End diff --

How about: `protected def dt: ArrayType` and `protected def et: DataType`?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...

2018-08-14 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21912#discussion_r209852203
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
 ---
@@ -735,70 +735,98 @@ class CodegenContext {
   }
 
   /**
-   * Generates code creating a [[UnsafeArrayData]].
+   * Generates code creating a [[UnsafeArrayData]] or [[GenericArrayData]] 
based on
+   * given parameters.
*
* @param arrayName name of the array to create
+   * @param elementType data type of the elements in source array
* @param numElements code representing the number of elements the array 
should contain
-   * @param elementType data type of the elements in the array
* @param additionalErrorMessage string to include in the error message
+   * @param elementSize optional value which shows the size of an element 
of the allocated
+   *[[UnsafeArrayData]] or [[GenericArrayData]]
+   *
+   * @return code representing the allocation of [[ArrayData]]
+   * code representing a setter of an assignment for the generated 
array
*/
-  def createUnsafeArray(
+  def createArrayData(
--- End diff --

Now that we have `ArrayData.allocateArrayData()`, do we need this method? 
How about calling it directly instead?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...

2018-08-13 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/21912#discussion_r209643599
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
 ---
@@ -735,70 +735,100 @@ class CodegenContext {
   }
 
   /**
-   * Generates code creating a [[UnsafeArrayData]].
+   * Generates code creating a [[UnsafeArrayData]] or [[GenericArrayData]] 
based on
+   * given parameters.
*
* @param arrayName name of the array to create
+   * @param elementType data type of the elements in source array
* @param numElements code representing the number of elements the array 
should contain
-   * @param elementType data type of the elements in the array
* @param additionalErrorMessage string to include in the error message
+   * @param elementSize optional value which shows the size of an element 
of the allocated
+   *[[UnsafeArrayData]] or [[GenericArrayData]]
+   *
+   * @return code representing the allocation of [[ArrayData]]
+   * code representing a setter of an assignment for the generated 
array
*/
-  def createUnsafeArray(
+  def createArrayData(
   arrayName: String,
-  numElements: String,
   elementType: DataType,
-  additionalErrorMessage: String): String = {
-val arraySize = freshName("size")
-val arrayBytes = freshName("arrayBytes")
+  numElements: String,
+  additionalErrorMessage: String,
+  elementSize: Option[Int] = None): (String, String) = {
--- End diff --

Yeah, they are exclusive. However, when `elementType` is not used, the 
integer value is required.

`MapEntries` stores `UnsafeRow` instead of primitive data. 
[This](https://github.com/apache/spark/blob/c15630014c7ef850fa21c1247fa5edd2b1bac81b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala#L405)
 requires non-fixed size unlike `elementType.defaultSize`.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...

2018-08-13 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21912#discussion_r209574445
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
 ---
@@ -735,70 +735,100 @@ class CodegenContext {
   }
 
   /**
-   * Generates code creating a [[UnsafeArrayData]].
+   * Generates code creating a [[UnsafeArrayData]] or [[GenericArrayData]] 
based on
+   * given parameters.
*
* @param arrayName name of the array to create
+   * @param elementType data type of the elements in source array
* @param numElements code representing the number of elements the array 
should contain
-   * @param elementType data type of the elements in the array
* @param additionalErrorMessage string to include in the error message
+   * @param elementSize optional value which shows the size of an element 
of the allocated
+   *[[UnsafeArrayData]] or [[GenericArrayData]]
+   *
+   * @return code representing the allocation of [[ArrayData]]
+   * code representing a setter of an assignment for the generated 
array
*/
-  def createUnsafeArray(
+  def createArrayData(
   arrayName: String,
-  numElements: String,
   elementType: DataType,
-  additionalErrorMessage: String): String = {
-val arraySize = freshName("size")
-val arrayBytes = freshName("arrayBytes")
+  numElements: String,
+  additionalErrorMessage: String,
+  elementSize: Option[Int] = None): (String, String) = {
+val isPrimitiveType = if (elementSize.isDefined) {
+  false
+} else {
+  CodeGenerator.isPrimitiveType(elementType)
+}
 
-s"""
-   |long $arraySize = 
UnsafeArrayData.calculateSizeOfUnderlyingByteArray(
-   |  $numElements,
-   |  ${elementType.defaultSize});
-   |if ($arraySize > ${ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH}) {
-   |  throw new RuntimeException("Unsuccessful try create array with " 
+ $arraySize +
-   |" bytes of data due to exceeding the limit " +
-   |"${ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH} bytes for 
UnsafeArrayData." +
-   |"$additionalErrorMessage");
-   |}
-   |byte[] $arrayBytes = new byte[(int)$arraySize];
-   |UnsafeArrayData $arrayName = new UnsafeArrayData();
-   |Platform.putLong($arrayBytes, ${Platform.BYTE_ARRAY_OFFSET}, 
$numElements);
-   |$arrayName.pointTo($arrayBytes, ${Platform.BYTE_ARRAY_OFFSET}, 
(int)$arraySize);
-  """.stripMargin
+val setFunc = if (isPrimitiveType) {
+  s"set${CodeGenerator.primitiveTypeName(elementType)}"
+} else {
+  "update"
+}
+
+val elemSize = if (elementSize.isDefined) {
+  elementSize.get
+} else {
+  elementType.defaultSize
+}
+val arrayData = classOf[ArrayData].getName
+val allocation =
+  s"""
+ |ArrayData $arrayName = $arrayData$$.MODULE$$.allocateArrayData(
+ |  $elemSize, $numElements, $isPrimitiveType, 
"$additionalErrorMessage");
+   """.stripMargin
+
+(allocation, setFunc)
   }
 
   /**
-   * Generates code creating a [[UnsafeArrayData]]. The generated code 
executes
-   * a provided fallback when the size of backing array would exceed the 
array size limit.
-   * @param arrayName a name of the array to create
-   * @param numElements a piece of code representing the number of 
elements the array should contain
-   * @param elementSize a size of an element in bytes
-   * @param bodyCode a function generating code that fills up the 
[[UnsafeArrayData]]
-   * and getting the backing array as a parameter
-   * @param fallbackCode a piece of code executed when the array size 
limit is exceeded
+   * Generates assignment code for a [[ArrayData]]
+   *
+   * @param arrayName name of the array to create
+   * @param dataType data type of the result array
+   * @param elementType data type of the elements in source array
+   * @param srcArray code representing the number of elements the array 
should contain
+   * @param setFunc string to include in the error message
+   * @param rhsValue an optionally specified expression for the right-hand 
side of the returning
+   * assignment
+   * @param checkForNull optional value which shows whether a nullcheck is 
required for
+   * the returning assignment
+   *
+   * @return code representing an assignment to each element of the 
[[ArrayData]], which requires
+   * a pair of destination and source loop index variables
*/
-  def createUnsafeArrayWithFallback(

[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...

2018-08-13 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21912#discussion_r209573854
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
 ---
@@ -735,70 +735,100 @@ class CodegenContext {
   }
 
   /**
-   * Generates code creating a [[UnsafeArrayData]].
+   * Generates code creating a [[UnsafeArrayData]] or [[GenericArrayData]] 
based on
+   * given parameters.
*
* @param arrayName name of the array to create
+   * @param elementType data type of the elements in source array
* @param numElements code representing the number of elements the array 
should contain
-   * @param elementType data type of the elements in the array
* @param additionalErrorMessage string to include in the error message
+   * @param elementSize optional value which shows the size of an element 
of the allocated
+   *[[UnsafeArrayData]] or [[GenericArrayData]]
+   *
+   * @return code representing the allocation of [[ArrayData]]
+   * code representing a setter of an assignment for the generated 
array
*/
-  def createUnsafeArray(
+  def createArrayData(
   arrayName: String,
-  numElements: String,
   elementType: DataType,
-  additionalErrorMessage: String): String = {
-val arraySize = freshName("size")
-val arrayBytes = freshName("arrayBytes")
+  numElements: String,
+  additionalErrorMessage: String,
+  elementSize: Option[Int] = None): (String, String) = {
+val isPrimitiveType = if (elementSize.isDefined) {
+  false
+} else {
+  CodeGenerator.isPrimitiveType(elementType)
+}
 
-s"""
-   |long $arraySize = 
UnsafeArrayData.calculateSizeOfUnderlyingByteArray(
-   |  $numElements,
-   |  ${elementType.defaultSize});
-   |if ($arraySize > ${ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH}) {
-   |  throw new RuntimeException("Unsuccessful try create array with " 
+ $arraySize +
-   |" bytes of data due to exceeding the limit " +
-   |"${ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH} bytes for 
UnsafeArrayData." +
-   |"$additionalErrorMessage");
-   |}
-   |byte[] $arrayBytes = new byte[(int)$arraySize];
-   |UnsafeArrayData $arrayName = new UnsafeArrayData();
-   |Platform.putLong($arrayBytes, ${Platform.BYTE_ARRAY_OFFSET}, 
$numElements);
-   |$arrayName.pointTo($arrayBytes, ${Platform.BYTE_ARRAY_OFFSET}, 
(int)$arraySize);
-  """.stripMargin
+val setFunc = if (isPrimitiveType) {
+  s"set${CodeGenerator.primitiveTypeName(elementType)}"
+} else {
+  "update"
+}
+
+val elemSize = if (elementSize.isDefined) {
+  elementSize.get
+} else {
+  elementType.defaultSize
+}
+val arrayData = classOf[ArrayData].getName
+val allocation =
+  s"""
+ |ArrayData $arrayName = $arrayData$$.MODULE$$.allocateArrayData(
--- End diff --

IIRC java can call methods of scala object directly, can you verify if 
`ArrayData.allocateArrayData` works?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...

2018-08-13 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21912#discussion_r209573448
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
 ---
@@ -735,70 +735,100 @@ class CodegenContext {
   }
 
   /**
-   * Generates code creating a [[UnsafeArrayData]].
+   * Generates code creating a [[UnsafeArrayData]] or [[GenericArrayData]] 
based on
+   * given parameters.
*
* @param arrayName name of the array to create
+   * @param elementType data type of the elements in source array
* @param numElements code representing the number of elements the array 
should contain
-   * @param elementType data type of the elements in the array
* @param additionalErrorMessage string to include in the error message
+   * @param elementSize optional value which shows the size of an element 
of the allocated
+   *[[UnsafeArrayData]] or [[GenericArrayData]]
+   *
+   * @return code representing the allocation of [[ArrayData]]
+   * code representing a setter of an assignment for the generated 
array
*/
-  def createUnsafeArray(
+  def createArrayData(
   arrayName: String,
-  numElements: String,
   elementType: DataType,
-  additionalErrorMessage: String): String = {
-val arraySize = freshName("size")
-val arrayBytes = freshName("arrayBytes")
+  numElements: String,
+  additionalErrorMessage: String,
+  elementSize: Option[Int] = None): (String, String) = {
--- End diff --

why do we need both `elementType` and `elementSize`.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org