Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21912#discussion_r209877426

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ArrayData.scala ---
    @@ -34,6 +36,37 @@ object ArrayData {
         case a: Array[Double] => UnsafeArrayData.fromPrimitiveArray(a)
         case other => new GenericArrayData(other)
       }
    +
    +
    +  /**
    +   * Allocate [[UnsafeArrayData]] or [[GenericArrayData]] based on given parameters.
    +   *
    +   * @param elementSize a size of an element in bytes
    +   * @param numElements the number of elements the array should contain
    +   * @param isPrimitiveType whether the type of an element is primitive type
    +   * @param additionalErrorMessage string to include in the error message
    +   */
    +  def allocateArrayData(
    +      elementSize: Int,
    +      numElements : Long,
    +      isPrimitiveType: Boolean,
    +      additionalErrorMessage: String) : ArrayData = {
    +    val arraySize = UnsafeArrayData.calculateSizeOfUnderlyingByteArray(numElements, elementSize)
    +    if (isPrimitiveType && !UnsafeArrayData.shouldUseGenericArrayData(elementSize, numElements)) {
    --- End diff --

    Whenever `UnsafeArrayData` can be used, `GenericArrayData` can also be used. However, when the element size is large, `UnsafeArrayData` cannot be used, so `GenericArrayData` should be used instead. Thus, I think it would be good to keep the current name, `shouldUseGenericArrayData`.
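
    To illustrate the naming argument, here is a minimal, self-contained sketch of the selection logic. It does not use Spark's classes: `AllocationSketch`, `Backing`, `UnsafeBacking`, `GenericBacking`, `MaxUnsafeBytes`, `shouldUseGeneric`, and `choose` are hypothetical names, and the size cap is an assumed value rather than the limit Spark actually applies.

```scala
// Hypothetical stand-ins for illustration only; not Spark's actual classes or limits.
object AllocationSketch {
  sealed trait Backing
  case object UnsafeBacking extends Backing   // stands in for UnsafeArrayData
  case object GenericBacking extends Backing  // stands in for GenericArrayData

  // Assumed cap on the payload size (in bytes) of a single backing array.
  val MaxUnsafeBytes: Long = Int.MaxValue.toLong

  // True when elementSize * numElements would exceed the assumed cap, so the
  // generic representation is the only option left. Written as a division to
  // avoid Long overflow; assumes elementSize > 0.
  def shouldUseGeneric(elementSize: Int, numElements: Long): Boolean =
    numElements > MaxUnsafeBytes / elementSize

  // The generic backing always works; the unsafe backing is chosen only for
  // primitive elements whose total size fits under the cap.
  def choose(elementSize: Int, numElements: Long, isPrimitive: Boolean): Backing =
    if (isPrimitive && !shouldUseGeneric(elementSize, numElements)) UnsafeBacking
    else GenericBacking
}
```

    In this sketch the predicate names the fallback condition: the generic backing is always a valid choice, and the check reports only the case where it is the sole valid choice.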