[GitHub] spark pull request #21053: [SPARK-23924][SQL] Add element_at function

2018-04-19 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21053





[GitHub] spark pull request #21053: [SPARK-23924][SQL] Add element_at function

2018-04-18 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/21053#discussion_r182486210
  
--- Diff: python/pyspark/sql/functions.py ---
@@ -1846,6 +1846,28 @@ def array_contains(col, value):
 return Column(sc._jvm.functions.array_contains(_to_java_column(col), value))
 
 
+@ignore_unicode_prefix
+@since(2.4)
+def element_at(col, value):
--- End diff --

Sure, done





[GitHub] spark pull request #21053: [SPARK-23924][SQL] Add element_at function

2018-04-18 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21053#discussion_r182362454
  
--- Diff: python/pyspark/sql/functions.py ---
@@ -1846,6 +1846,28 @@ def array_contains(col, value):
 return Column(sc._jvm.functions.array_contains(_to_java_column(col), value))
 
 
+@ignore_unicode_prefix
+@since(2.4)
+def element_at(col, value):
--- End diff --

How about `extraction`, which comes from `Column.apply(extraction)`? cc @gatorsmile
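For reference, `Column.apply(extraction)` is the method behind the bracket syntax on complex columns; a self-contained sketch (note that `apply`, unlike the proposed `element_at`, uses 0-based array indices):

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical local session, for illustration only.
val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

val df = Seq((Seq(1, 2, 3), Map("k" -> "v"))).toDF("arr", "m")
// Column.apply(extraction): index extraction on arrays, key extraction on maps.
df.select(df("arr")(0), df("m")("k")).show()  // 1 (0-based first element) and "v"
```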





[GitHub] spark pull request #21053: [SPARK-23924][SQL] Add element_at function

2018-04-18 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/21053#discussion_r182350430
  
--- Diff: python/pyspark/sql/functions.py ---
@@ -1846,6 +1846,28 @@ def array_contains(col, value):
 return Column(sc._jvm.functions.array_contains(_to_java_column(col), value))
 
 
+@ignore_unicode_prefix
+@since(2.4)
+def element_at(col, value):
--- End diff --

Umm, the 2nd argument acts as an `index` for an array or a `key` for a map; that is why I used `value`.
Can I use `index`?
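For illustration, the dual role of the second argument through the Scala API (a sketch, assuming the `element_at(column: Column, value: Any)` signature this PR family settles on):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.element_at

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

val df = Seq((Seq("a", "b", "c"), Map(1 -> "x"))).toDF("arr", "m")
df.select(element_at($"arr", 2)).show()  // second argument is a 1-based index -> "b"
df.select(element_at($"m", 1)).show()    // second argument is a map key       -> "x"
```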





[GitHub] spark pull request #21053: [SPARK-23924][SQL] Add element_at function

2018-04-18 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21053#discussion_r182325781
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -417,3 +417,106 @@ case class ArrayMax(child: Expression) extends UnaryExpression with ImplicitCast
 
   override def prettyName: String = "array_max"
 }
+
+/**
+ * Returns the value of index `right` in Array `left` or the value for key `right` in Map `left`.
+ */
+@ExpressionDescription(
+  usage = """
+_FUNC_(array, index) - Returns element of array at given index. If index < 0, accesses elements
+  from the last to the first. Returns NULL if the index exceeds the length of the array.
--- End diff --

ditto.





[GitHub] spark pull request #21053: [SPARK-23924][SQL] Add element_at function

2018-04-18 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21053#discussion_r182327230
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -417,3 +417,106 @@ case class ArrayMax(child: Expression) extends UnaryExpression with ImplicitCast
 
   override def prettyName: String = "array_max"
 }
+
+/**
+ * Returns the value of index `right` in Array `left` or the value for key `right` in Map `left`.
+ */
+@ExpressionDescription(
+  usage = """
+_FUNC_(array, index) - Returns element of array at given index. If index < 0, accesses elements
+  from the last to the first. Returns NULL if the index exceeds the length of the array.
+
+_FUNC_(map, key) - Returns value for given key, or NULL if the key is not contained in the map
+  """,
+  examples = """
+Examples:
+  > SELECT _FUNC_(array(1, 2, 3), 2);
+   2
+  > SELECT _FUNC_(map(1, 'a', 2, 'b'), 2);
+   "b"
+  """,
+  since = "2.4.0")
+case class ElementAt(left: Expression, right: Expression) extends GetMapValueUtil {
+
+  override def dataType: DataType = left.dataType match {
+case _: ArrayType => left.dataType.asInstanceOf[ArrayType].elementType
+case _: MapType => left.dataType.asInstanceOf[MapType].valueType
+  }
--- End diff --

nit: how about:

```scala
override def dataType: DataType = left.dataType match {
  case ArrayType(elementType, _) => elementType
  case MapType(_, valueType, _) => valueType
}
```






[GitHub] spark pull request #21053: [SPARK-23924][SQL] Add element_at function

2018-04-18 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21053#discussion_r182326476
  
--- Diff: python/pyspark/sql/functions.py ---
@@ -1846,6 +1846,28 @@ def array_contains(col, value):
 return Column(sc._jvm.functions.array_contains(_to_java_column(col), value))
 
 
+@ignore_unicode_prefix
+@since(2.4)
+def element_at(col, value):
--- End diff --

The name `value` is confusing, because this argument is actually an index or a key, isn't it?





[GitHub] spark pull request #21053: [SPARK-23924][SQL] Add element_at function

2018-04-18 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21053#discussion_r182323290
  
--- Diff: python/pyspark/sql/functions.py ---
@@ -1846,6 +1846,28 @@ def array_contains(col, value):
 return Column(sc._jvm.functions.array_contains(_to_java_column(col), value))
 
 
+@ignore_unicode_prefix
+@since(2.4)
+def element_at(col, value):
+"""
+Collection function: returns element of array at given index in value if col is array.
+returns value for the given key in value if col is map.
+
+:param col: name of column containing array or map
+:param value: value to check for in array or key to check for in map
--- End diff --

Can you add a note or something to point out that the index is 1-based?
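For context, the 1-based (and negative) indexing being requested for the docs, shown in SQL form (a sketch assuming an active SparkSession `spark` and the semantics proposed in this PR):

```scala
// Assumes an active SparkSession `spark` with this patch applied.
spark.sql("SELECT element_at(array(10, 20, 30), 1)").show()   // 10: indices are 1-based
spark.sql("SELECT element_at(array(10, 20, 30), -1)").show()  // 30: negative indices count from the end
spark.sql("SELECT element_at(array(10, 20, 30), 4)").show()   // NULL: index exceeds the array length
```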





[GitHub] spark pull request #21053: [SPARK-23924][SQL] Add element_at function

2018-04-16 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/21053#discussion_r181793795
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -287,3 +287,106 @@ case class ArrayContains(left: Expression, right: Expression)
 
   override def prettyName: String = "array_contains"
 }
+
+/**
+ * Returns the value of index `right` in Array `left` or the value for key `right` in Map `left`.
+ */
+@ExpressionDescription(
+  usage = """
+_FUNC_(array, index) - Returns element of array at given index. If index < 0, accesses elements
+  from the last to the first.
+
+_FUNC_(map, key) - Returns value for given key, or NULL if the key is not contained in the map
+  """,
+  examples = """
+Examples:
+  > SELECT _FUNC_(array(1, 2, 3), 2);
+   2
+  > SELECT _FUNC_(map(1, 'a', 2, 'b'), 2);
+   "b"
+  """,
+  since = "2.4.0")
+case class ElementAt(left: Expression, right: Expression) extends GetMapValueUtil {
+
+  override def dataType: DataType = left.dataType match {
+case _: ArrayType => left.dataType.asInstanceOf[ArrayType].elementType
+case _: MapType => left.dataType.asInstanceOf[MapType].valueType
+  }
+
+  override def inputTypes: Seq[AbstractDataType] = {
+Seq(TypeCollection(ArrayType, MapType),
+  left.dataType match {
+case _: ArrayType => IntegerType
+case _: MapType => left.dataType.asInstanceOf[MapType].keyType
+  }
+)
+  }
+
+  override def nullable: Boolean = true
--- End diff --

yeah, it may depend on the `right` value, too.





[GitHub] spark pull request #21053: [SPARK-23924][SQL] Add element_at function

2018-04-16 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/21053#discussion_r181790679
  
--- Diff: python/pyspark/sql/functions.py ---
@@ -1846,6 +1846,27 @@ def array_contains(col, value):
 return Column(sc._jvm.functions.array_contains(_to_java_column(col), value))
 
 
+@since(2.4)
--- End diff --

I see, thanks





[GitHub] spark pull request #21053: [SPARK-23924][SQL] Add element_at function

2018-04-16 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/21053#discussion_r181687320
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -287,3 +287,106 @@ case class ArrayContains(left: Expression, right: Expression)
 
   override def prettyName: String = "array_contains"
 }
+
+/**
+ * Returns the value of index `right` in Array `left` or the value for key `right` in Map `left`.
+ */
+@ExpressionDescription(
+  usage = """
+_FUNC_(array, index) - Returns element of array at given index. If index < 0, accesses elements
+  from the last to the first.
+
+_FUNC_(map, key) - Returns value for given key, or NULL if the key is not contained in the map
+  """,
+  examples = """
+Examples:
+  > SELECT _FUNC_(array(1, 2, 3), 2);
+   2
+  > SELECT _FUNC_(map(1, 'a', 2, 'b'), 2);
+   "b"
+  """,
+  since = "2.4.0")
+case class ElementAt(left: Expression, right: Expression) extends GetMapValueUtil {
+
+  override def dataType: DataType = left.dataType match {
+case _: ArrayType => left.dataType.asInstanceOf[ArrayType].elementType
+case _: MapType => left.dataType.asInstanceOf[MapType].valueType
+  }
+
+  override def inputTypes: Seq[AbstractDataType] = {
+Seq(TypeCollection(ArrayType, MapType),
+  left.dataType match {
+case _: ArrayType => IntegerType
+case _: MapType => left.dataType.asInstanceOf[MapType].keyType
+  }
+)
+  }
+
+  override def nullable: Boolean = true
+
+  override def nullSafeEval(value: Any, ordinal: Any): Any = {
+left.dataType match {
+  case _: ArrayType =>
+val array = value.asInstanceOf[ArrayData]
+val index = ordinal.asInstanceOf[Int]
+if (array.numElements() < math.abs(index)) {
+  null
+} else {
+  val idx = if (index == 0) {
+throw new ArrayIndexOutOfBoundsException("SQL array indices start at 1")
+  } else if (index > 0) {
+index - 1
+  } else {
+array.numElements() + index
+  }
+  if (array.isNullAt(idx)) {
+null
+  } else {
+array.get(idx, dataType)
+  }
+}
+  case _: MapType =>
+getValueEval(value, ordinal, left.dataType.asInstanceOf[MapType].keyType)
+}
+  }
+
+  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+left.dataType match {
+  case _: ArrayType =>
+nullSafeCodeGen(ctx, ev, (eval1, eval2) => {
+  val index = ctx.freshName("elementAtIndex")
+  val nullCheck = if (left.dataType.asInstanceOf[ArrayType].containsNull) {
+s"""
+   |if ($eval1.isNullAt($index)) {
+   |  ${ev.isNull} = true;
+   |} else
+ """
+  } else {
+""
+  }
+  s"""
+ |int $index = (int) $eval2;
+ |if ($eval1.numElements() < Math.abs($index)) {
+ |  ${ev.isNull} = true;
+ |} else {
+ |  if ($index == 0) {
+ |throw new ArrayIndexOutOfBoundsException("SQL array indices start at 1");
+ |  } else if ($index > 0) {
+ |$index--;
+ |  } else {
+ |$index += $eval1.numElements();
+ |  }
+ |  $nullCheck
+ |  {
+ |${ev.value} = ${CodeGenerator.getValue(eval1, dataType, index)};
+ |  }
+ |}
+   """
--- End diff --

good catch, thanks





[GitHub] spark pull request #21053: [SPARK-23924][SQL] Add element_at function

2018-04-15 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21053#discussion_r181624253
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -287,3 +287,106 @@ case class ArrayContains(left: Expression, right: Expression)
 
   override def prettyName: String = "array_contains"
 }
+
+/**
+ * Returns the value of index `right` in Array `left` or the value for key `right` in Map `left`.
+ */
+@ExpressionDescription(
+  usage = """
+_FUNC_(array, index) - Returns element of array at given index. If index < 0, accesses elements
+  from the last to the first.
+
+_FUNC_(map, key) - Returns value for given key, or NULL if the key is not contained in the map
+  """,
+  examples = """
+Examples:
+  > SELECT _FUNC_(array(1, 2, 3), 2);
+   2
+  > SELECT _FUNC_(map(1, 'a', 2, 'b'), 2);
+   "b"
+  """,
+  since = "2.4.0")
+case class ElementAt(left: Expression, right: Expression) extends GetMapValueUtil {
+
+  override def dataType: DataType = left.dataType match {
+case _: ArrayType => left.dataType.asInstanceOf[ArrayType].elementType
+case _: MapType => left.dataType.asInstanceOf[MapType].valueType
+  }
+
+  override def inputTypes: Seq[AbstractDataType] = {
+Seq(TypeCollection(ArrayType, MapType),
+  left.dataType match {
+case _: ArrayType => IntegerType
+case _: MapType => left.dataType.asInstanceOf[MapType].keyType
+  }
+)
+  }
+
+  override def nullable: Boolean = true
--- End diff --

ah, I see. An invalid `right` can cause a null result, too.





[GitHub] spark pull request #21053: [SPARK-23924][SQL] Add element_at function

2018-04-15 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21053#discussion_r181623803
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -287,3 +287,106 @@ case class ArrayContains(left: Expression, right: Expression)
 
   override def prettyName: String = "array_contains"
 }
+
+/**
+ * Returns the value of index `right` in Array `left` or the value for key `right` in Map `left`.
+ */
+@ExpressionDescription(
+  usage = """
+_FUNC_(array, index) - Returns element of array at given index. If index < 0, accesses elements
+  from the last to the first.
+
+_FUNC_(map, key) - Returns value for given key, or NULL if the key is not contained in the map
+  """,
+  examples = """
+Examples:
+  > SELECT _FUNC_(array(1, 2, 3), 2);
+   2
+  > SELECT _FUNC_(map(1, 'a', 2, 'b'), 2);
+   "b"
+  """,
+  since = "2.4.0")
+case class ElementAt(left: Expression, right: Expression) extends GetMapValueUtil {
+
+  override def dataType: DataType = left.dataType match {
+case _: ArrayType => left.dataType.asInstanceOf[ArrayType].elementType
+case _: MapType => left.dataType.asInstanceOf[MapType].valueType
+  }
+
+  override def inputTypes: Seq[AbstractDataType] = {
+Seq(TypeCollection(ArrayType, MapType),
+  left.dataType match {
+case _: ArrayType => IntegerType
+case _: MapType => left.dataType.asInstanceOf[MapType].keyType
+  }
+)
+  }
+
+  override def nullable: Boolean = true
--- End diff --

I'm afraid it's wrong because this returns `null` when the given index is "out of bounds" (`array.numElements() < math.abs(index)`) for an array type, or when the given key doesn't exist for a map type.
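To spell out the two cases (a sketch, assuming an active SparkSession `spark` and the semantics proposed in this PR):

```scala
// Both return NULL even though containsNull / valueContainsNull is false here:
spark.sql("SELECT element_at(array(1, 2, 3), 5)").show()  // NULL: |index| > numElements
spark.sql("SELECT element_at(map(1, 'a'), 2)").show()     // NULL: key 2 is not in the map
// Hence `nullable` has to stay true regardless of the child types.
```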





[GitHub] spark pull request #21053: [SPARK-23924][SQL] Add element_at function

2018-04-15 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21053#discussion_r181623729
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -287,3 +287,106 @@ case class ArrayContains(left: Expression, right: Expression)
 
   override def prettyName: String = "array_contains"
 }
+
+/**
+ * Returns the value of index `right` in Array `left` or the value for key `right` in Map `left`.
+ */
+@ExpressionDescription(
+  usage = """
+_FUNC_(array, index) - Returns element of array at given index. If index < 0, accesses elements
+  from the last to the first.
+
+_FUNC_(map, key) - Returns value for given key, or NULL if the key is not contained in the map
+  """,
+  examples = """
+Examples:
+  > SELECT _FUNC_(array(1, 2, 3), 2);
+   2
+  > SELECT _FUNC_(map(1, 'a', 2, 'b'), 2);
+   "b"
+  """,
+  since = "2.4.0")
+case class ElementAt(left: Expression, right: Expression) extends GetMapValueUtil {
+
+  override def dataType: DataType = left.dataType match {
+case _: ArrayType => left.dataType.asInstanceOf[ArrayType].elementType
+case _: MapType => left.dataType.asInstanceOf[MapType].valueType
+  }
+
+  override def inputTypes: Seq[AbstractDataType] = {
+Seq(TypeCollection(ArrayType, MapType),
+  left.dataType match {
+case _: ArrayType => IntegerType
+case _: MapType => left.dataType.asInstanceOf[MapType].keyType
+  }
+)
+  }
+
+  override def nullable: Boolean = true
+
+  override def nullSafeEval(value: Any, ordinal: Any): Any = {
+left.dataType match {
+  case _: ArrayType =>
+val array = value.asInstanceOf[ArrayData]
+val index = ordinal.asInstanceOf[Int]
+if (array.numElements() < math.abs(index)) {
+  null
+} else {
+  val idx = if (index == 0) {
+throw new ArrayIndexOutOfBoundsException("SQL array indices start at 1")
+  } else if (index > 0) {
+index - 1
+  } else {
+array.numElements() + index
+  }
+  if (array.isNullAt(idx)) {
--- End diff --

nit:
```scala
if (left.dataType.asInstanceOf[ArrayType].containsNull && array.isNullAt(idx)) {
  ...
}
```





[GitHub] spark pull request #21053: [SPARK-23924][SQL] Add element_at function

2018-04-15 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21053#discussion_r181623519
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -287,3 +287,106 @@ case class ArrayContains(left: Expression, right: Expression)
 
   override def prettyName: String = "array_contains"
 }
+
+/**
+ * Returns the value of index `right` in Array `left` or the value for key `right` in Map `left`.
+ */
+@ExpressionDescription(
+  usage = """
+_FUNC_(array, index) - Returns element of array at given index. If index < 0, accesses elements
+  from the last to the first.
--- End diff --

nit: `Returns NULL if the index exceeds the length of the array`.





[GitHub] spark pull request #21053: [SPARK-23924][SQL] Add element_at function

2018-04-15 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21053#discussion_r181622879
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -287,3 +287,106 @@ case class ArrayContains(left: Expression, right: Expression)
 
   override def prettyName: String = "array_contains"
 }
+
+/**
+ * Returns the value of index `right` in Array `left` or the value for key `right` in Map `left`.
+ */
+@ExpressionDescription(
+  usage = """
+_FUNC_(array, index) - Returns element of array at given index. If index < 0, accesses elements
+  from the last to the first.
+
+_FUNC_(map, key) - Returns value for given key, or NULL if the key is not contained in the map
+  """,
+  examples = """
+Examples:
+  > SELECT _FUNC_(array(1, 2, 3), 2);
+   2
+  > SELECT _FUNC_(map(1, 'a', 2, 'b'), 2);
+   "b"
+  """,
+  since = "2.4.0")
+case class ElementAt(left: Expression, right: Expression) extends GetMapValueUtil {
+
+  override def dataType: DataType = left.dataType match {
+case _: ArrayType => left.dataType.asInstanceOf[ArrayType].elementType
+case _: MapType => left.dataType.asInstanceOf[MapType].valueType
+  }
+
+  override def inputTypes: Seq[AbstractDataType] = {
+Seq(TypeCollection(ArrayType, MapType),
+  left.dataType match {
+case _: ArrayType => IntegerType
+case _: MapType => left.dataType.asInstanceOf[MapType].keyType
+  }
+)
+  }
+
+  override def nullable: Boolean = true
--- End diff --

Maybe? (parenthesizing the `match` so the `||` chain parses)
```scala
override def nullable: Boolean = (left.dataType match {
  case a: ArrayType => a.containsNull
  case m: MapType => m.valueContainsNull
}) || left.nullable || right.nullable
```





[GitHub] spark pull request #21053: [SPARK-23924][SQL] Add element_at function

2018-04-15 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21053#discussion_r181619426
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -287,3 +287,106 @@ case class ArrayContains(left: Expression, right: Expression)
 
   override def prettyName: String = "array_contains"
 }
+
+/**
+ * Returns the value of index `right` in Array `left` or the value for key `right` in Map `left`.
+ */
+@ExpressionDescription(
+  usage = """
+_FUNC_(array, index) - Returns element of array at given index. If index < 0, accesses elements
+  from the last to the first.
+
+_FUNC_(map, key) - Returns value for given key, or NULL if the key is not contained in the map
+  """,
+  examples = """
+Examples:
+  > SELECT _FUNC_(array(1, 2, 3), 2);
+   2
+  > SELECT _FUNC_(map(1, 'a', 2, 'b'), 2);
+   "b"
+  """,
+  since = "2.4.0")
+case class ElementAt(left: Expression, right: Expression) extends GetMapValueUtil {
+
+  override def dataType: DataType = left.dataType match {
+case _: ArrayType => left.dataType.asInstanceOf[ArrayType].elementType
+case _: MapType => left.dataType.asInstanceOf[MapType].valueType
+  }
+
+  override def inputTypes: Seq[AbstractDataType] = {
+Seq(TypeCollection(ArrayType, MapType),
+  left.dataType match {
+case _: ArrayType => IntegerType
+case _: MapType => left.dataType.asInstanceOf[MapType].keyType
+  }
+)
+  }
+
+  override def nullable: Boolean = true
+
+  override def nullSafeEval(value: Any, ordinal: Any): Any = {
+left.dataType match {
+  case _: ArrayType =>
+val array = value.asInstanceOf[ArrayData]
+val index = ordinal.asInstanceOf[Int]
+if (array.numElements() < math.abs(index)) {
+  null
+} else {
+  val idx = if (index == 0) {
+throw new ArrayIndexOutOfBoundsException("SQL array indices start at 1")
+  } else if (index > 0) {
+index - 1
+  } else {
+array.numElements() + index
+  }
+  if (array.isNullAt(idx)) {
+null
+  } else {
+array.get(idx, dataType)
+  }
+}
+  case _: MapType =>
+getValueEval(value, ordinal, left.dataType.asInstanceOf[MapType].keyType)
+}
+  }
+
+  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+left.dataType match {
+  case _: ArrayType =>
+nullSafeCodeGen(ctx, ev, (eval1, eval2) => {
+  val index = ctx.freshName("elementAtIndex")
+  val nullCheck = if (left.dataType.asInstanceOf[ArrayType].containsNull) {
+s"""
+   |if ($eval1.isNullAt($index)) {
+   |  ${ev.isNull} = true;
+   |} else
+ """
+  } else {
+""
+  }
+  s"""
+ |int $index = (int) $eval2;
+ |if ($eval1.numElements() < Math.abs($index)) {
+ |  ${ev.isNull} = true;
+ |} else {
+ |  if ($index == 0) {
+ |throw new ArrayIndexOutOfBoundsException("SQL array indices start at 1");
+ |  } else if ($index > 0) {
+ |$index--;
+ |  } else {
+ |$index += $eval1.numElements();
+ |  }
+ |  $nullCheck
+ |  {
+ |${ev.value} = ${CodeGenerator.getValue(eval1, dataType, index)};
+ |  }
+ |}
+   """
--- End diff --

`stripMargin` is missing?
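(For reference: `stripMargin` removes the leading whitespace-plus-`|` margin from each line; without it the `|` characters end up verbatim in the generated Java source. A plain-Scala illustration:)

```scala
val raw =
  """
    |int i = 0;
  """
println(raw)              // the "    |" margin is still part of the string
println(raw.stripMargin)  // margin stripped: the line is just "int i = 0;"
```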





[GitHub] spark pull request #21053: [SPARK-23924][SQL] Add element_at function

2018-04-15 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21053#discussion_r181618055
  
--- Diff: python/pyspark/sql/functions.py ---
@@ -1846,6 +1846,27 @@ def array_contains(col, value):
 return Column(sc._jvm.functions.array_contains(_to_java_column(col), value))
 
 
+@since(2.4)
--- End diff --

Do we need to annotate this with `@ignore_unicode_prefix`?





[GitHub] spark pull request #21053: [SPARK-23924][SQL] Add element_at function

2018-04-15 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21053#discussion_r181617303
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeExtractors.scala ---
@@ -354,3 +336,37 @@ case class GetMapValue(child: Expression, key: Expression)
 })
   }
 }
+
+/**
+ * Returns the value of key `key` in Map `child`.
+ *
+ * We need to do type checking here as `key` expression maybe unresolved.
+ */
+case class GetMapValue(child: Expression, key: Expression) extends GetMapValueUtil
+  with ExtractValue with NullIntolerant {
--- End diff --

nit: maybe `extends ...` should be on this line.
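i.e., presumably a layout like:

```scala
case class GetMapValue(child: Expression, key: Expression)
  extends GetMapValueUtil with ExtractValue with NullIntolerant {
```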





[GitHub] spark pull request #21053: [SPARK-23924][SQL] Add element_at function

2018-04-15 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21053#discussion_r181619450
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -287,3 +287,106 @@ case class ArrayContains(left: Expression, right: Expression)
 
   override def prettyName: String = "array_contains"
 }
+
+/**
+ * Returns the value of index `right` in Array `left` or the value for key `right` in Map `left`.
+ */
+@ExpressionDescription(
+  usage = """
+_FUNC_(array, index) - Returns element of array at given index. If index < 0, accesses elements
+  from the last to the first.
+
+_FUNC_(map, key) - Returns value for given key, or NULL if the key is not contained in the map
+  """,
+  examples = """
+Examples:
+  > SELECT _FUNC_(array(1, 2, 3), 2);
+   2
+  > SELECT _FUNC_(map(1, 'a', 2, 'b'), 2);
+   "b"
+  """,
+  since = "2.4.0")
+case class ElementAt(left: Expression, right: Expression) extends GetMapValueUtil {
+
+  override def dataType: DataType = left.dataType match {
+case _: ArrayType => left.dataType.asInstanceOf[ArrayType].elementType
+case _: MapType => left.dataType.asInstanceOf[MapType].valueType
+  }
+
+  override def inputTypes: Seq[AbstractDataType] = {
+Seq(TypeCollection(ArrayType, MapType),
+  left.dataType match {
+case _: ArrayType => IntegerType
+case _: MapType => left.dataType.asInstanceOf[MapType].keyType
+  }
+)
+  }
+
+  override def nullable: Boolean = true
+
+  override def nullSafeEval(value: Any, ordinal: Any): Any = {
+left.dataType match {
+  case _: ArrayType =>
+val array = value.asInstanceOf[ArrayData]
+val index = ordinal.asInstanceOf[Int]
+if (array.numElements() < math.abs(index)) {
+  null
+} else {
+  val idx = if (index == 0) {
+throw new ArrayIndexOutOfBoundsException("SQL array indices start at 1")
+  } else if (index > 0) {
+index - 1
+  } else {
+array.numElements() + index
+  }
+  if (array.isNullAt(idx)) {
+null
+  } else {
+array.get(idx, dataType)
+  }
+}
+  case _: MapType =>
+getValueEval(value, ordinal, left.dataType.asInstanceOf[MapType].keyType)
+}
+  }
+
+  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+left.dataType match {
+  case _: ArrayType =>
+nullSafeCodeGen(ctx, ev, (eval1, eval2) => {
+  val index = ctx.freshName("elementAtIndex")
+  val nullCheck = if (left.dataType.asInstanceOf[ArrayType].containsNull) {
+s"""
+   |if ($eval1.isNullAt($index)) {
+   |  ${ev.isNull} = true;
+   |} else
+ """
--- End diff --

`stripMargin` is missing?





[GitHub] spark pull request #21053: [SPARK-23924][SQL] Add element_at function

2018-04-15 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/21053#discussion_r181610494
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala ---
@@ -413,6 +413,78 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSQLContext {
 )
   }
 
+  test("element at function") {
--- End diff --

good catch, thanks





[GitHub] spark pull request #21053: [SPARK-23924][SQL] Add element_at function

2018-04-13 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21053#discussion_r181529978
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala ---
@@ -413,6 +413,78 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSQLContext {
 )
   }
 
+  test("element at function") {
--- End diff --

also the function is element_at, not "element at" ...






[GitHub] spark pull request #21053: [SPARK-23924][SQL] Add element_at function

2018-04-13 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21053#discussion_r181529901
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala ---
@@ -413,6 +413,78 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSQLContext {
 )
   }
 
+  test("element at function") {
--- End diff --

why do we need so many test cases here? this is just to verify the api works end to end.





[GitHub] spark pull request #21053: [SPARK-23924][SQL] Add element_at function

2018-04-13 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/21053#discussion_r181475737
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -287,3 +287,106 @@ case class ArrayContains(left: Expression, right: Expression)
 
   override def prettyName: String = "array_contains"
 }
+
+/**
+ * Returns the value of index `right` in Array `left` or key right` in Map `left`.
--- End diff --

good catch, thanks





[GitHub] spark pull request #21053: [SPARK-23924][SQL] Add element_at function

2018-04-13 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21053#discussion_r181376088
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -287,3 +287,106 @@ case class ArrayContains(left: Expression, right: Expression)
 
   override def prettyName: String = "array_contains"
 }
+
+/**
+ * Returns the value of index `right` in Array `left` or key right` in Map `left`.
--- End diff --

We can improve this doc. There is also a broken backtick quote around `right`.





[GitHub] spark pull request #21053: [SPARK-23924][SQL] Add element_at function

2018-04-12 Thread kiszk
GitHub user kiszk opened a pull request:

https://github.com/apache/spark/pull/21053

[SPARK-23924][SQL] Add element_at function

## What changes were proposed in this pull request?

This PR adds the SQL function `element_at`. The behavior of the function is based on Presto's.

This function returns the element of the array at the given index if the column is an array, or the value for the given key if the column is a map.
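In a nutshell, the intended behavior (a sketch assuming an active SparkSession `spark` with this patch applied; the examples mirror the `ExpressionDescription` in the diff):

```scala
spark.sql("SELECT element_at(array(1, 2, 3), 2)").show()       // 2: 1-based array index
spark.sql("SELECT element_at(map(1, 'a', 2, 'b'), 2)").show()  // b: map key lookup
```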

## How was this patch tested?

Added UTs

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kiszk/spark SPARK-23924

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21053.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #21053


commit d7ec6ccc177d07a8090ac27ce1659f427d1cf50a
Author: Kazuaki Ishizaki 
Date:   2018-04-12T13:21:22Z

initial commit



