[GitHub] spark pull request #20984: [SPARK-23875][SQL] Add IndexedSeq wrapper for Arr...

2018-04-17 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20984


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20984: [SPARK-23875][SQL] Add IndexedSeq wrapper for Arr...

2018-04-13 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20984#discussion_r181339660
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ArrayData.scala ---
@@ -164,3 +167,46 @@ abstract class ArrayData extends SpecializedGetters with Serializable {
 }
   }
 }
+
+/**
+ * Implements an `IndexedSeq` interface for `ArrayData`. Notice that if the original `ArrayData`
+ * is a primitive array and contains null elements, it is better to ask for `IndexedSeq[Any]`,
+ * instead of `IndexedSeq[Int]`, in order to keep the null elements.
+ */
+class ArrayDataIndexedSeq[T](arrayData: ArrayData, dataType: DataType) extends IndexedSeq[T] {
+
+  private def getAccessor(dataType: DataType): (Int) => Any = dataType match {
+    case BooleanType => (idx: Int) => arrayData.getBoolean(idx)
+    case ByteType => (idx: Int) => arrayData.getByte(idx)
+    case ShortType => (idx: Int) => arrayData.getShort(idx)
+    case IntegerType => (idx: Int) => arrayData.getInt(idx)
--- End diff --

I'd like to reuse the accessor getter in #20981, which covers `DateType` and `TimestampType`.
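
For context, a standalone sketch of why `DateType` and `TimestampType` can fold into the existing primitive accessors: Catalyst physically stores dates as `Int` (days since epoch) and timestamps as `Long` (microseconds since epoch). The model types below are illustrative stand-ins, not Spark's classes.

```scala
// Illustrative stand-ins for Catalyst's DataType hierarchy (not Spark code).
sealed trait DType
case object IntT extends DType
case object LongT extends DType
case object DateT extends DType      // physically an Int: days since epoch
case object TimestampT extends DType // physically a Long: microseconds since epoch

// The date/timestamp cases can reuse the Int/Long getters, which is why a
// shared accessor getter covering them is attractive.
def accessor(data: Array[Any], dt: DType): Int => Any = dt match {
  case IntT | DateT       => idx => data(idx).asInstanceOf[Int]
  case LongT | TimestampT => idx => data(idx).asInstanceOf[Long]
}

assert(accessor(Array[Any](17636), DateT)(0) == 17636)
assert(accessor(Array[Any](1523577600000000L), TimestampT)(0) == 1523577600000000L)
```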


---




[GitHub] spark pull request #20984: [SPARK-23875][SQL] Add IndexedSeq wrapper for Arr...

2018-04-12 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/20984#discussion_r181140913
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/ArrayDataIndexedSeqSuite.scala ---
@@ -0,0 +1,69 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.util
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder
+import org.apache.spark.sql.catalyst.expressions.UnsafeArrayData
+import org.apache.spark.sql.types.StringType
+import org.apache.spark.unsafe.types.UTF8String
+
+class ArrayDataIndexedSeqSuite extends SparkFunSuite {
+  def utf8(str: String): UTF8String = UTF8String.fromString(str)
+  val stringArray = Array("1", "10", "100", null)
+
+  private def testArrayData(arrayData: ArrayData): Unit = {
+    assert(arrayData.numElements == stringArray.length)
+    stringArray.zipWithIndex.map { case (e, i) =>
+      if (e != null) {
+        assert(arrayData.getUTF8String(i).toString().equals(e))
+      } else {
+        assert(arrayData.isNullAt(i))
+      }
+    }
+
+    val seq = arrayData.toSeq[UTF8String](StringType)
+    stringArray.zipWithIndex.map { case (e, i) =>
+      if (e != null) {
+        assert(seq(i).toString().equals(e))
+      } else {
+        assert(seq(i) == null)
+      }
+    }
+
+    intercept[IndexOutOfBoundsException] {
+      seq(-1)
+    }.getMessage().contains("must be between 0 and the length of the ArrayData.")
+
+    intercept[IndexOutOfBoundsException] {
+      seq(seq.length)
+    }.getMessage().contains("must be between 0 and the length of the ArrayData.")
+  }
+
+  test("ArrayDataIndexedSeq can work on GenericArrayData") {
+    val arrayData = new GenericArrayData(stringArray.map(utf8(_)))
+    testArrayData(arrayData)
+  }
+
+  test("ArrayDataIndexedSeq can work on UnsafeArrayData") {
+    val unsafeArrayData = ExpressionEncoder[Array[String]].resolveAndBind().
+      toRow(stringArray).getArray(0)
+    assert(unsafeArrayData.isInstanceOf[UnsafeArrayData])
+    testArrayData(unsafeArrayData)
--- End diff --

+1 for the all-types test case. :)


---




[GitHub] spark pull request #20984: [SPARK-23875][SQL] Add IndexedSeq wrapper for Arr...

2018-04-12 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/20984#discussion_r181140698
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ArrayData.scala ---
@@ -164,3 +167,46 @@ abstract class ArrayData extends SpecializedGetters with Serializable {
 }
   }
 }
+
+/**
+ * Implements an `IndexedSeq` interface for `ArrayData`. Notice that if the original `ArrayData`
+ * is a primitive array and contains null elements, it is better to ask for `IndexedSeq[Any]`,
+ * instead of `IndexedSeq[Int]`, in order to keep the null elements.
+ */
+class ArrayDataIndexedSeq[T](arrayData: ArrayData, dataType: DataType) extends IndexedSeq[T] {
+
+  private def getAccessor(dataType: DataType): (Int) => Any = dataType match {
+    case BooleanType => (idx: Int) => arrayData.getBoolean(idx)
+    case ByteType => (idx: Int) => arrayData.getByte(idx)
+    case ShortType => (idx: Int) => arrayData.getShort(idx)
+    case IntegerType => (idx: Int) => arrayData.getInt(idx)
--- End diff --

`DateType` and `TimestampType`? 


---




[GitHub] spark pull request #20984: [SPARK-23875][SQL] Add IndexedSeq wrapper for Arr...

2018-04-10 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20984#discussion_r180420612
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ArrayData.scala ---
@@ -164,3 +167,46 @@ abstract class ArrayData extends SpecializedGetters with Serializable {
 }
   }
 }
+
+/**
+ * Implements an `IndexedSeq` interface for `ArrayData`. Notice that if the original `ArrayData`
+ * is a primitive array and contains null elements, it is better to ask for `IndexedSeq[Any]`,
+ * instead of `IndexedSeq[Int]`, in order to keep the null elements.
+ */
+class ArrayDataIndexedSeq[T](arrayData: ArrayData, dataType: DataType) extends IndexedSeq[T] {
+
+  private def getAccessor(dataType: DataType): (Int) => Any = dataType match {
--- End diff --

Ok. I also want to reuse the accessor getter in #20981.


---




[GitHub] spark pull request #20984: [SPARK-23875][SQL] Add IndexedSeq wrapper for Arr...

2018-04-10 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/20984#discussion_r180364984
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ArrayData.scala ---
@@ -164,3 +167,46 @@ abstract class ArrayData extends SpecializedGetters with Serializable {
 }
   }
 }
+
+/**
+ * Implements an `IndexedSeq` interface for `ArrayData`. Notice that if the original `ArrayData`
+ * is a primitive array and contains null elements, it is better to ask for `IndexedSeq[Any]`,
+ * instead of `IndexedSeq[Int]`, in order to keep the null elements.
+ */
+class ArrayDataIndexedSeq[T](arrayData: ArrayData, dataType: DataType) extends IndexedSeq[T] {
+
+  private def getAccessor(dataType: DataType): (Int) => Any = dataType match {
--- End diff --

Suggestion for a small follow-up: We could also use the `accessor` to 
improve the `foreach` construct.
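
A standalone sketch of that follow-up idea (a plain `Array[Any]` stands in for `ArrayData`; all names are illustrative, not the merged code): resolve the per-type getter once outside the loop, so `foreach` makes one call per element instead of re-dispatching on the `DataType` every iteration.

```scala
// `get` stands in for the accessor resolved once from the DataType;
// the loop itself never pattern-matches on the element type again.
def foreachElem(data: Array[Any], get: Int => Any)(f: Any => Unit): Unit = {
  var i = 0
  while (i < data.length) {
    f(get(i)) // one indirect call per element, no per-element type dispatch
    i += 1
  }
}

val data = Array[Any](1, 2, 3)
val get: Int => Any = idx => data(idx) // a trivial accessor for the demo
val out = scala.collection.mutable.ArrayBuffer.empty[Any]
foreachElem(data, get)(out += _)
assert(out.toList == List(1, 2, 3))
```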


---




[GitHub] spark pull request #20984: [SPARK-23875][SQL] Add IndexedSeq wrapper for Arr...

2018-04-10 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20984#discussion_r180361913
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ArrayData.scala ---
@@ -164,3 +167,40 @@ abstract class ArrayData extends SpecializedGetters with Serializable {
 }
   }
 }
+
+class ArrayDataIndexedSeq[T](arrayData: ArrayData, dataType: DataType) extends IndexedSeq[T] {
+
+  private def getAccessor(dataType: DataType): (Int) => Any = dataType match {
+    case BooleanType => (idx: Int) => arrayData.getBoolean(idx)
+    case ByteType => (idx: Int) => arrayData.getByte(idx)
+    case ShortType => (idx: Int) => arrayData.getShort(idx)
+    case IntegerType => (idx: Int) => arrayData.getInt(idx)
+    case LongType => (idx: Int) => arrayData.getLong(idx)
+    case FloatType => (idx: Int) => arrayData.getFloat(idx)
+    case DoubleType => (idx: Int) => arrayData.getDouble(idx)
+    case d: DecimalType => (idx: Int) => arrayData.getDecimal(idx, d.precision, d.scale)
+    case CalendarIntervalType => (idx: Int) => arrayData.getInterval(idx)
+    case StringType => (idx: Int) => arrayData.getUTF8String(idx)
+    case BinaryType => (idx: Int) => arrayData.getBinary(idx)
+    case s: StructType => (idx: Int) => arrayData.getStruct(idx, s.length)
+    case _: ArrayType => (idx: Int) => arrayData.getArray(idx)
+    case _: MapType => (idx: Int) => arrayData.getMap(idx)
+    case u: UserDefinedType[_] => getAccessor(u.sqlType)
+    case _ => (idx: Int) => arrayData.get(idx, dataType)
+  }
+
+  private val accessor: (Int) => Any = getAccessor(dataType)
+
+  override def apply(idx: Int): T = if (0 <= idx && idx < arrayData.numElements()) {
--- End diff --

Ok.


---




[GitHub] spark pull request #20984: [SPARK-23875][SQL] Add IndexedSeq wrapper for Arr...

2018-04-10 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/20984#discussion_r180360146
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ArrayData.scala ---
@@ -164,3 +167,40 @@ abstract class ArrayData extends SpecializedGetters with Serializable {
 }
   }
 }
+
+class ArrayDataIndexedSeq[T](arrayData: ArrayData, dataType: DataType) extends IndexedSeq[T] {
+
+  private def getAccessor(dataType: DataType): (Int) => Any = dataType match {
+    case BooleanType => (idx: Int) => arrayData.getBoolean(idx)
+    case ByteType => (idx: Int) => arrayData.getByte(idx)
+    case ShortType => (idx: Int) => arrayData.getShort(idx)
+    case IntegerType => (idx: Int) => arrayData.getInt(idx)
+    case LongType => (idx: Int) => arrayData.getLong(idx)
+    case FloatType => (idx: Int) => arrayData.getFloat(idx)
+    case DoubleType => (idx: Int) => arrayData.getDouble(idx)
+    case d: DecimalType => (idx: Int) => arrayData.getDecimal(idx, d.precision, d.scale)
+    case CalendarIntervalType => (idx: Int) => arrayData.getInterval(idx)
+    case StringType => (idx: Int) => arrayData.getUTF8String(idx)
+    case BinaryType => (idx: Int) => arrayData.getBinary(idx)
+    case s: StructType => (idx: Int) => arrayData.getStruct(idx, s.length)
+    case _: ArrayType => (idx: Int) => arrayData.getArray(idx)
+    case _: MapType => (idx: Int) => arrayData.getMap(idx)
+    case u: UserDefinedType[_] => getAccessor(u.sqlType)
+    case _ => (idx: Int) => arrayData.get(idx, dataType)
+  }
+
+  private val accessor: (Int) => Any = getAccessor(dataType)
+
+  override def apply(idx: Int): T = if (0 <= idx && idx < arrayData.numElements()) {
--- End diff --

NITish: Can you put the if statement on a separate line? This is kinda hard 
to read.
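
What the requested shape could look like, as a standalone sketch (a plain `Array` stands in for the wrapped `ArrayData`, and the message mirrors the one discussed in the thread; this is not the merged code):

```scala
// Bounds check first, on its own lines, then the element read.
def applyChecked[T](backing: Array[T], idx: Int): T = {
  if (idx < 0 || idx >= backing.length) {
    throw new IndexOutOfBoundsException(
      s"Index $idx must be between 0 and the length of the ArrayData.")
  }
  backing(idx)
}

assert(applyChecked(Array("a", "b"), 1) == "b")
val msg =
  try { applyChecked(Array("a", "b"), -1); "no exception" }
  catch { case e: IndexOutOfBoundsException => e.getMessage }
assert(msg.contains("must be between 0 and the length"))
```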


---




[GitHub] spark pull request #20984: [SPARK-23875][SQL] Add IndexedSeq wrapper for Arr...

2018-04-10 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20984#discussion_r180358272
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/ArrayDataIndexedSeqSuite.scala ---
@@ -0,0 +1,69 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.util
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder
+import org.apache.spark.sql.catalyst.expressions.UnsafeArrayData
+import org.apache.spark.sql.types.StringType
+import org.apache.spark.unsafe.types.UTF8String
+
+class ArrayDataIndexedSeqSuite extends SparkFunSuite {
+  def utf8(str: String): UTF8String = UTF8String.fromString(str)
+  val stringArray = Array("1", "10", "100", null)
+
+  private def testArrayData(arrayData: ArrayData): Unit = {
+    assert(arrayData.numElements == stringArray.length)
+    stringArray.zipWithIndex.map { case (e, i) =>
+      if (e != null) {
+        assert(arrayData.getUTF8String(i).toString().equals(e))
+      } else {
+        assert(arrayData.isNullAt(i))
+      }
+    }
+
+    val seq = arrayData.toSeq[UTF8String](StringType)
+    stringArray.zipWithIndex.map { case (e, i) =>
+      if (e != null) {
+        assert(seq(i).toString().equals(e))
+      } else {
+        assert(seq(i) == null)
+      }
+    }
+
+    intercept[IndexOutOfBoundsException] {
+      seq(-1)
+    }.getMessage().contains("must be between 0 and the length of the ArrayData.")
+
+    intercept[IndexOutOfBoundsException] {
+      seq(seq.length)
+    }.getMessage().contains("must be between 0 and the length of the ArrayData.")
+  }
+
+  test("ArrayDataIndexedSeq can work on GenericArrayData") {
+    val arrayData = new GenericArrayData(stringArray.map(utf8(_)))
+    testArrayData(arrayData)
+  }
+
+  test("ArrayDataIndexedSeq can work on UnsafeArrayData") {
+    val unsafeArrayData = ExpressionEncoder[Array[String]].resolveAndBind().
+      toRow(stringArray).getArray(0)
+    assert(unsafeArrayData.isInstanceOf[UnsafeArrayData])
+    testArrayData(unsafeArrayData)
--- End diff --

This test may need polish to test against all possible types.


---




[GitHub] spark pull request #20984: [SPARK-23875][SQL] Add IndexedSeq wrapper for Arr...

2018-04-10 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20984#discussion_r180357829
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ArrayData.scala ---
@@ -164,3 +167,37 @@ abstract class ArrayData extends SpecializedGetters with Serializable {
 }
   }
 }
+
+class ArrayDataIndexedSeq[T](arrayData: ArrayData, dataType: DataType) extends IndexedSeq[T] {
+
+  private lazy val accessor: (Int) => Any = dataType match {
+    case BooleanType => (idx: Int) => arrayData.getBoolean(idx)
+    case ByteType => (idx: Int) => arrayData.getByte(idx)
+    case ShortType => (idx: Int) => arrayData.getShort(idx)
+    case IntegerType => (idx: Int) => arrayData.getInt(idx)
+    case LongType => (idx: Int) => arrayData.getLong(idx)
+    case FloatType => (idx: Int) => arrayData.getFloat(idx)
+    case DoubleType => (idx: Int) => arrayData.getDouble(idx)
+    case d: DecimalType => (idx: Int) => arrayData.getDecimal(idx, d.precision, d.scale)
+    case CalendarIntervalType => (idx: Int) => arrayData.getInterval(idx)
+    case StringType => (idx: Int) => arrayData.getUTF8String(idx)
+    case BinaryType => (idx: Int) => arrayData.getBinary(idx)
+    case s: StructType => (idx: Int) => arrayData.getStruct(idx, s.length)
+    case _: ArrayType => (idx: Int) => arrayData.getArray(idx)
+    case _: MapType => (idx: Int) => arrayData.getMap(idx)
+    case _ => (idx: Int) => arrayData.get(idx, dataType)
+  }
+
+  override def apply(idx: Int): T = if (0 <= idx && idx < arrayData.numElements()) {
+    if (arrayData.isNullAt(idx)) {
+      null.asInstanceOf[T]
--- End diff --

For primitive ArrayData, this seems to be an issue for null elements...
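
The underlying Scala behavior that makes this lossy, in a runnable nutshell: casting `null` to a primitive type does not throw, it silently yields the type's default value, which is why the scaladoc steers callers with nullable primitive arrays toward `IndexedSeq[Any]`.

```scala
// Casting null to a primitive silently produces the default value.
val asInt: Int = null.asInstanceOf[Int]
val asAny: Any = null.asInstanceOf[Any]
assert(asInt == 0)     // the null has silently become 0 and is lost
assert(asAny == null)  // with Any as the element type, the null survives
```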


---




[GitHub] spark pull request #20984: [SPARK-23875][SQL] Add IndexedSeq wrapper for Arr...

2018-04-09 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20984#discussion_r180292435
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ArrayData.scala ---
@@ -164,3 +167,32 @@ abstract class ArrayData extends SpecializedGetters with Serializable {
 }
   }
 }
+
+class ArrayDataIndexedSeq[T](arrayData: ArrayData, dataType: DataType) extends IndexedSeq[T] {
+
+  private lazy val accessor: (Int) => Any = dataType match {
+    case BooleanType => (idx: Int) => arrayData.getBoolean(idx)
+    case ByteType => (idx: Int) => arrayData.getByte(idx)
+    case ShortType => (idx: Int) => arrayData.getShort(idx)
+    case IntegerType => (idx: Int) => arrayData.getInt(idx)
+    case LongType => (idx: Int) => arrayData.getLong(idx)
+    case FloatType => (idx: Int) => arrayData.getFloat(idx)
+    case DoubleType => (idx: Int) => arrayData.getDouble(idx)
+    case d: DecimalType => (idx: Int) => arrayData.getDecimal(idx, d.precision, d.scale)
+    case CalendarIntervalType => (idx: Int) => arrayData.getInterval(idx)
+    case StringType => (idx: Int) => arrayData.getUTF8String(idx)
+    case BinaryType => (idx: Int) => arrayData.getBinary(idx)
+    case s: StructType => (idx: Int) => arrayData.getStruct(idx, s.length)
+    case _: ArrayType => (idx: Int) => arrayData.getArray(idx)
+    case _: MapType => (idx: Int) => arrayData.getMap(idx)
+    case _ => (idx: Int) => arrayData.get(idx, dataType)
+  }
+
+  override def apply(idx: Int): T = if (idx < arrayData.numElements()) {
--- End diff --

Thanks. Added the check now.


---




[GitHub] spark pull request #20984: [SPARK-23875][SQL] Add IndexedSeq wrapper for Arr...

2018-04-07 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/20984#discussion_r179920210
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ArrayData.scala ---
@@ -164,3 +167,32 @@ abstract class ArrayData extends SpecializedGetters with Serializable {
 }
   }
 }
+
+class ArrayDataIndexedSeq[T](arrayData: ArrayData, dataType: DataType) extends IndexedSeq[T] {
+
+  private lazy val accessor: (Int) => Any = dataType match {
+    case BooleanType => (idx: Int) => arrayData.getBoolean(idx)
+    case ByteType => (idx: Int) => arrayData.getByte(idx)
+    case ShortType => (idx: Int) => arrayData.getShort(idx)
+    case IntegerType => (idx: Int) => arrayData.getInt(idx)
+    case LongType => (idx: Int) => arrayData.getLong(idx)
+    case FloatType => (idx: Int) => arrayData.getFloat(idx)
+    case DoubleType => (idx: Int) => arrayData.getDouble(idx)
+    case d: DecimalType => (idx: Int) => arrayData.getDecimal(idx, d.precision, d.scale)
+    case CalendarIntervalType => (idx: Int) => arrayData.getInterval(idx)
+    case StringType => (idx: Int) => arrayData.getUTF8String(idx)
+    case BinaryType => (idx: Int) => arrayData.getBinary(idx)
+    case s: StructType => (idx: Int) => arrayData.getStruct(idx, s.length)
+    case _: ArrayType => (idx: Int) => arrayData.getArray(idx)
+    case _: MapType => (idx: Int) => arrayData.getMap(idx)
+    case _ => (idx: Int) => arrayData.get(idx, dataType)
+  }
+
+  override def apply(idx: Int): T = if (idx < arrayData.numElements()) {
--- End diff --

Do we need a check `0 <= idx`, too? If so, it would be good to update the message in the exception.


---




[GitHub] spark pull request #20984: [SPARK-23875][SQL] Add IndexedSeq wrapper for Arr...

2018-04-05 Thread viirya
GitHub user viirya opened a pull request:

https://github.com/apache/spark/pull/20984

[SPARK-23875][SQL] Add IndexedSeq wrapper for ArrayData

## What changes were proposed in this pull request?

We don't have a good way to sequentially access `UnsafeArrayData` through a common interface such as `Seq`. An example is `MapObjects`, where we need to handle several sequence collection types together, but `UnsafeArrayData` doesn't implement `ArrayData.array`, and calling `toArray` copies the entire array. We can provide an `IndexedSeq` wrapper for `ArrayData` to avoid copying the entire array.
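
The idea can be sketched standalone (a plain `Array` stands in for `ArrayData`; `ArrayWrapper` is an illustrative name, not the PR's class): extending `IndexedSeq` hands callers the full `Seq` interface over the existing storage, with no `toArray`-style copy.

```scala
// Only apply/length are implemented; every other Seq operation is derived
// from them, reading straight from the backing store without copying it.
final class ArrayWrapper[T](backing: Array[T]) extends IndexedSeq[T] {
  override def length: Int = backing.length
  override def apply(idx: Int): T = backing(idx)
}

val seq: IndexedSeq[String] = new ArrayWrapper(Array("1", "10", "100"))
assert(seq(1) == "10")                // direct indexed access, no copy
assert(seq.count(_.length == 3) == 1) // standard Seq operations just work
```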

## How was this patch tested?

TBD.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/viirya/spark-1 SPARK-23875

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20984.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20984


commit 27eddd764bc60f9f41156c3548e98e6aea68fd02
Author: Liang-Chi Hsieh 
Date:   2018-04-05T14:33:39Z

Add IndexedSeq wrapper for ArrayData.




---
