[GitHub] spark pull request #22408: [SPARK-25417][SQL] ArrayContains function may ret...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22408 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/22408#discussion_r219039607

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala ---

@@ -735,6 +735,60 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSQLContext {
       df.selectExpr("array_contains(array(1, null), array(1, null)[0])"),
       Seq(Row(true), Row(true))
     )
+
+    checkAnswer(
+      df.selectExpr("array_contains(array(1), 1.23D)"),
+      Seq(Row(false), Row(false))
+    )
+
+    checkAnswer(
+      df.selectExpr("array_contains(array(1), 1.0D)"),
+      Seq(Row(true), Row(true))
+    )
+
+    checkAnswer(
+      df.selectExpr("array_contains(array(1.0D), 1)"),
+      Seq(Row(true), Row(true))
+    )
+
+    checkAnswer(
+      df.selectExpr("array_contains(array(1.23D), 1)"),
+      Seq(Row(false), Row(false))
+    )
+
+    checkAnswer(
+      df.selectExpr("array_contains(array(array(1)), array(1.0D))"),
+      Seq(Row(true), Row(true))
+    )
+
+    checkAnswer(
+      df.selectExpr("array_contains(array(array(1)), array(1.23D))"),
+      Seq(Row(false), Row(false))
+    )
+
+    checkAnswer(
+      df.selectExpr("array_contains(array(array(1)), array(1.23))"),

--- End diff --

@cloud-fan Yes, it should :-) I think I had changed this test case to verify the fix to tightestCommonType and pushed it by mistake. Sorry about that.
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22408#discussion_r219039187

--- Diff: docs/sql-programming-guide.md ---

@@ -1879,6 +1879,66 @@ working with timestamps in `pandas_udf`s to get the best performance, see

 ## Upgrading From Spark SQL 2.3 to 2.4

+  - In Spark version 2.3 and earlier, the second parameter to array_contains function is implicitly promoted to the element type of first array type parameter. This type promotion can be lossy and may cause `array_contains` function to return wrong result. This problem has been addressed in 2.4 by employing a safer type promotion mechanism. This can cause some change in behavior and are illustrated in the table below.
+
+    | Query | Result Spark 2.3 or Prior | Result Spark 2.4 | Remarks |
+    | SELECT array_contains(array(1), 1.34D); | true | false | In Spark 2.4, both left and right parameters are promoted to array(double) and double type respectively. |
+    | SELECT array_contains(array(1), '1'); | true | AnalysisException is thrown since integer type can not be promoted to string type in a loss-less manner. | |

--- End diff --

Ah then it's fine, we don't need to change anything here.
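The behavior change quoted in the table above can be modeled outside Spark. The following is a hedged, self-contained Python sketch, not Spark's implementation: `array_contains_v23` mimics the 2.3-era rule of casting the search value to the array's element type (which can silently lose the fractional part), while `array_contains_v24` mimics the safer 2.4-style rule of promoting both sides to a common wider type. The function names and the use of `float` as the common type are illustrative assumptions.

```python
# Hypothetical model of the coercion bug under discussion -- not Spark code.

def array_contains_v23(arr, needle):
    """Spark 2.3-style: cast the needle to the array's element type (lossy)."""
    elem_type = type(arr[0])          # e.g. int for array(1)
    return elem_type(needle) in arr   # int(1.34) == 1 -> spurious match

def array_contains_v24(arr, needle):
    """Spark 2.4-style: promote both sides to a common wider type (double here)."""
    return any(float(x) == float(needle) for x in arr)

# Models SELECT array_contains(array(1), 1.34D);
print(array_contains_v23([1], 1.34))  # True  -- the wrong 2.3 answer
print(array_contains_v24([1], 1.34))  # False -- the corrected 2.4 answer
```

The sketch shows why the promotion direction matters: casting the needle down to `int` manufactures a match that does not exist in the data, while widening both sides preserves the comparison.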
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22408#discussion_r219038798

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala ---

@@ -735,6 +735,60 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSQLContext {
       df.selectExpr("array_contains(array(1, null), array(1, null)[0])"),
       Seq(Row(true), Row(true))
     )
+
+    checkAnswer(
+      df.selectExpr("array_contains(array(1), 1.23D)"),
+      Seq(Row(false), Row(false))
+    )
+
+    checkAnswer(
+      df.selectExpr("array_contains(array(1), 1.0D)"),
+      Seq(Row(true), Row(true))
+    )
+
+    checkAnswer(
+      df.selectExpr("array_contains(array(1.0D), 1)"),
+      Seq(Row(true), Row(true))
+    )
+
+    checkAnswer(
+      df.selectExpr("array_contains(array(1.23D), 1)"),
+      Seq(Row(false), Row(false))
+    )
+
+    checkAnswer(
+      df.selectExpr("array_contains(array(array(1)), array(1.0D))"),
+      Seq(Row(true), Row(true))
+    )
+
+    checkAnswer(
+      df.selectExpr("array_contains(array(array(1)), array(1.23D))"),
+      Seq(Row(false), Row(false))
+    )
+
+    checkAnswer(
+      df.selectExpr("array_contains(array(array(1)), array(1.23))"),

--- End diff --

Hmm? Shouldn't this fail because of the bug?
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22408#discussion_r219038350

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala ---

@@ -735,6 +735,60 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSQLContext {
       df.selectExpr("array_contains(array(1, null), array(1, null)[0])"),
       Seq(Row(true), Row(true))
     )
+
+    checkAnswer(
+      df.selectExpr("array_contains(array(1), 1.23D)"),

--- End diff --

This query doesn't read any data from `df`, so the two result rows are always the same. Can we use `OneRowRelation` here?
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/22408#discussion_r219037732

--- Diff: docs/sql-programming-guide.md ---

@@ -1879,6 +1879,66 @@ working with timestamps in `pandas_udf`s to get the best performance, see

 ## Upgrading From Spark SQL 2.3 to 2.4

+  - In Spark version 2.3 and earlier, the second parameter to array_contains function is implicitly promoted to the element type of first array type parameter. This type promotion can be lossy and may cause `array_contains` function to return wrong result. This problem has been addressed in 2.4 by employing a safer type promotion mechanism. This can cause some change in behavior and are illustrated in the table below.
+
+    | Query | Result Spark 2.3 or Prior | Result Spark 2.4 | Remarks |
+    | SELECT array_contains(array(1), 1.34D); | true | false | In Spark 2.4, both left and right parameters are promoted to array(double) and double type respectively. |
+    | SELECT array_contains(array(1), '1'); | true | AnalysisException is thrown since integer type can not be promoted to string type in a loss-less manner. | |

--- End diff --

@cloud-fan Yeah, Presto gives an error. Please refer to my earlier comment showing the Presto output. Did you want anything changed in the description?
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22408#discussion_r219037301

--- Diff: docs/sql-programming-guide.md ---

@@ -1879,6 +1879,66 @@ working with timestamps in `pandas_udf`s to get the best performance, see

 ## Upgrading From Spark SQL 2.3 to 2.4

+  - In Spark version 2.3 and earlier, the second parameter to array_contains function is implicitly promoted to the element type of first array type parameter. This type promotion can be lossy and may cause `array_contains` function to return wrong result. This problem has been addressed in 2.4 by employing a safer type promotion mechanism. This can cause some change in behavior and are illustrated in the table below.
+
+    | Query | Result Spark 2.3 or Prior | Result Spark 2.4 | Remarks |
+    | SELECT array_contains(array(1), 1.34D); | true | false | In Spark 2.4, both left and right parameters are promoted to array(double) and double type respectively. |
+    | SELECT array_contains(array(1), '1'); | true | AnalysisException is thrown since integer type can not be promoted to string type in a loss-less manner. | |

--- End diff --

If Presto doesn't do it, we should follow it.
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22408#discussion_r219037194

--- Diff: docs/sql-programming-guide.md ---

@@ -1879,6 +1879,66 @@ working with timestamps in `pandas_udf`s to get the best performance, see

 ## Upgrading From Spark SQL 2.3 to 2.4

+  - In Spark version 2.3 and earlier, the second parameter to array_contains function is implicitly promoted to the element type of first array type parameter. This type promotion can be lossy and may cause `array_contains` function to return wrong result. This problem has been addressed in 2.4 by employing a safer type promotion mechanism. This can cause some change in behavior and are illustrated in the table below.
+
+    | Query | Result Spark 2.3 or Prior | Result Spark 2.4 | Remarks |
+    | SELECT array_contains(array(1), 1.34D); | true | false | In Spark 2.4, both left and right parameters are promoted to array(double) and double type respectively. |
+    | SELECT array_contains(array(1), '1'); | true | AnalysisException is thrown since integer type can not be promoted to string type in a loss-less manner. | |

--- End diff --

We can promote `int` to `string`, but I'm not sure that's a common behavior in other databases.
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22408#discussion_r219037086

--- Diff: docs/sql-programming-guide.md ---

@@ -1879,6 +1879,66 @@ working with timestamps in `pandas_udf`s to get the best performance, see

 ## Upgrading From Spark SQL 2.3 to 2.4

+  - In Spark version 2.3 and earlier, the second parameter to array_contains function is implicitly promoted to the element type of first array type parameter. This type promotion can be lossy and may cause `array_contains` function to return wrong result. This problem has been addressed in 2.4 by employing a safer type promotion mechanism. This can cause some change in behavior and are illustrated in the table below.
+
+    | Query | Result Spark 2.3 or Prior | Result Spark 2.4 | Remarks |
+    | SELECT array_contains(array(1), 1.34D); | true | false | In Spark 2.4, both left and right parameters are promoted to array(double) and double type respectively. |

--- End diff --

Remove `both`.
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/22408#discussion_r219028470

--- Diff: docs/sql-programming-guide.md ---

@@ -1879,6 +1879,80 @@ working with timestamps in `pandas_udf`s to get the best performance, see

 ## Upgrading From Spark SQL 2.3 to 2.4

+  - In Spark version 2.3 and earlier, the second parameter to array_contains function is implicitly promoted to the element type of first array type parameter. This type promotion can be lossy and may cause `array_contains` function to return wrong result. This problem has been addressed in 2.4 by employing a safer type promotion mechanism. This can cause some change in behavior and are illustrated in the table below.
+
+    | Query | Result Spark 2.3 or Prior | Result Spark 2.4 | Remarks |
+    | SELECT array_contains(array(1), 1.34D); | true | false | In Spark 2.4, both left and right parameters are promoted to array(double) and double type respectively. |
+    | SELECT array_contains(array(1), 1.34); |

--- End diff --

@cloud-fan OK.
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22408#discussion_r219019453

--- Diff: docs/sql-programming-guide.md ---

@@ -1879,6 +1879,80 @@ working with timestamps in `pandas_udf`s to get the best performance, see

 ## Upgrading From Spark SQL 2.3 to 2.4

+  - In Spark version 2.3 and earlier, the second parameter to array_contains function is implicitly promoted to the element type of first array type parameter. This type promotion can be lossy and may cause `array_contains` function to return wrong result. This problem has been addressed in 2.4 by employing a safer type promotion mechanism. This can cause some change in behavior and are illustrated in the table below.
+
+    | Query | Result Spark 2.3 or Prior | Result Spark 2.4 | Remarks |
+    | SELECT array_contains(array(1), 1.34D); | true | false | In Spark 2.4, both left and right parameters are promoted to array(double) and double type respectively. |
+    | SELECT array_contains(array(1), 1.34); |

--- End diff --

Can we remove this example, or pick a different one that can work? Basically it should work if we fix the bug with https://github.com/apache/spark/pull/22448
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22408#discussion_r218295350

--- Diff: docs/sql-programming-guide.md ---

@@ -1879,6 +1879,80 @@ working with timestamps in `pandas_udf`s to get the best performance, see

 ## Upgrading From Spark SQL 2.3 to 2.4

+  - In Spark version 2.3 and earlier, the second parameter to array_contains function is implicitly promoted to the element type of first array type parameter. This type promotion can be lossy and may cause `array_contains` function to return wrong result. This problem has been addressed in 2.4 by employing a safer type promotion mechanism. This can cause some change in behavior and are illustrated in the table below.
+
+    | Query | Result Spark 2.3 or Prior | Result Spark 2.4 | Remarks |
+    | SELECT array_contains(array(1), 1.34D); | true | false | In Spark 2.4, both left and right parameters are promoted to array(double) and double type respectively. |
+    | SELECT array_contains(array(1), 1.34); | true | AnalysisException is thrown since integer type can not be promoted to decimal type in a loss-less manner. | |

--- End diff --

I left a few comments. Please send a PR, thanks!
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/22408#discussion_r218227611

--- Diff: docs/sql-programming-guide.md ---

@@ -1879,6 +1879,80 @@ working with timestamps in `pandas_udf`s to get the best performance, see

 ## Upgrading From Spark SQL 2.3 to 2.4

+  - In Spark version 2.3 and earlier, the second parameter to array_contains function is implicitly promoted to the element type of first array type parameter. This type promotion can be lossy and may cause `array_contains` function to return wrong result. This problem has been addressed in 2.4 by employing a safer type promotion mechanism. This can cause some change in behavior and are illustrated in the table below.
+
+    | Query | Result Spark 2.3 or Prior | Result Spark 2.4 | Remarks |
+    | SELECT array_contains(array(1), 1.34D); | true | false | In Spark 2.4, both left and right parameters are promoted to array(double) and double type respectively. |
+    | SELECT array_contains(array(1), 1.34); | true | AnalysisException is thrown since integer type can not be promoted to decimal type in a loss-less manner. | |

--- End diff --

@cloud-fan I have taken a stab at fixing this at https://github.com/dilipbiswal/spark/tree/find_tightest May I request you to please take a look? I have run all the tests and they look OK. If you are okay, I will open a PR for this, and once that gets in, we can look at this PR.
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/22408#discussion_r218125861

--- Diff: docs/sql-programming-guide.md ---

@@ -1879,6 +1879,80 @@ working with timestamps in `pandas_udf`s to get the best performance, see

 ## Upgrading From Spark SQL 2.3 to 2.4

+  - In Spark version 2.3 and earlier, the second parameter to array_contains function is implicitly promoted to the element type of first array type parameter. This type promotion can be lossy and may cause `array_contains` function to return wrong result. This problem has been addressed in 2.4 by employing a safer type promotion mechanism. This can cause some change in behavior and are illustrated in the table below.
+
+    | Query | Result Spark 2.3 or Prior | Result Spark 2.4 | Remarks |
+    | SELECT array_contains(array(1), 1.34D); | true | false | In Spark 2.4, both left and right parameters are promoted to array(double) and double type respectively. |
+    | SELECT array_contains(array(1), 1.34); | true | AnalysisException is thrown since integer type can not be promoted to decimal type in a loss-less manner. | |

--- End diff --

@cloud-fan Yes. Actually, I think the logic to handle int and decimal promotion can be improved. I like Presto's behaviour here:

    int, decimal(3, 2)  => decimal(12, 2)
    int, decimal(4, 3)  => decimal(13, 3)
    int, decimal(11, 3) => decimal(14, 3)
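The loss-less int/decimal widening quoted above can be sketched with a small formula: a 32-bit integer needs up to 10 decimal digits, so the common type of `int` and `decimal(p, s)` must keep at least `max(p - s, 10)` integer digits plus the original `s` fraction digits. This is a hedged Python illustration (not Presto's or Spark's actual code), and the formula is an assumption: it reproduces the first two Presto results quoted above, though not necessarily every case.

```python
# Illustrative sketch of loss-less widening between int and decimal(p, s).
# Not taken from Presto or Spark source; the formula is an assumption.

INT_DIGITS = 10  # decimal digits needed to hold any 32-bit integer

def widen_int_decimal(p, s):
    """Common type of int and decimal(p, s), as a (precision, scale) pair:
    keep enough integer digits for both sides, plus the decimal's scale."""
    return (max(p - s, INT_DIGITS) + s, s)

print(widen_int_decimal(3, 2))  # (12, 2) -- matches the quoted Presto output
print(widen_int_decimal(4, 3))  # (13, 3) -- matches the quoted Presto output
```

The point of the rule is that neither side loses information: every int fits in 10 integer digits, and the decimal keeps all of its original scale.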
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22408#discussion_r218052897

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---

@@ -1331,23 +1331,27 @@ case class ArrayContains(left: Expression, right: Expression)
   @transient private lazy val ordering: Ordering[Any] = TypeUtils.getInterpretedOrdering(right.dataType)

-  override def inputTypes: Seq[AbstractDataType] = right.dataType match {
-    case NullType => Seq.empty
-    case _ => left.dataType match {
-      case n @ ArrayType(element, _) => Seq(n, element)
+  override def inputTypes: Seq[AbstractDataType] = {
+    (left.dataType, right.dataType) match {
+      case (_, NullType) => Seq.empty
+      case (ArrayType(e1, hasNull), e2) =>
+        TypeCoercion.findTightestCommonType(e1, e2) match {

--- End diff --

I think we have a bug in `findTightestCommonType`. For an int and a decimal, `findTightestCommonType` can return a value if the decimal's precision is bigger than int's, but it can't return a value if the decimal's precision is small.
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22408#discussion_r218043239

--- Diff: docs/sql-programming-guide.md ---

@@ -1879,6 +1879,80 @@ working with timestamps in `pandas_udf`s to get the best performance, see

 ## Upgrading From Spark SQL 2.3 to 2.4

+  - In Spark version 2.3 and earlier, the second parameter to array_contains function is implicitly promoted to the element type of first array type parameter. This type promotion can be lossy and may cause `array_contains` function to return wrong result. This problem has been addressed in 2.4 by employing a safer type promotion mechanism. This can cause some change in behavior and are illustrated in the table below.
+
+    | Query | Result Spark 2.3 or Prior | Result Spark 2.4 | Remarks |
+    | SELECT array_contains(array(1), 1.34D); | true | false | In Spark 2.4, both left and right parameters are promoted to array(double) and double type respectively. |
+    | SELECT array_contains(array(1), 1.34); | true | AnalysisException is thrown since integer type can not be promoted to decimal type in a loss-less manner. | |

--- End diff --

Hmm, I think we can convert int to `DecimalType(10, 0)` without losing precision?
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/22408#discussion_r217603789

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala ---

@@ -735,6 +735,44 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSQLContext {
       df.selectExpr("array_contains(array(1, null), array(1, null)[0])"),
       Seq(Row(true), Row(true))
     )
+
+    checkAnswer(
+      df.selectExpr("array_contains(array(1), 1.23D)"),
+      Seq(Row(false), Row(false))
+    )
+
+    checkAnswer(
+      df.selectExpr("array_contains(array(1), 1.0D)"),
+      Seq(Row(true), Row(true))
+    )
+
+    checkAnswer(
+      df.selectExpr("array_contains(array(1.0D), 1)"),
+      Seq(Row(true), Row(true))
+    )
+
+    checkAnswer(
+      df.selectExpr("array_contains(array(1.23D), 1)"),
+      Seq(Row(false), Row(false))
+    )
+
+    checkAnswer(
+      df.selectExpr("array_contains(array(array(1)), array(1.0D))"),
+      Seq(Row(true), Row(true))
+    )
+
+    checkAnswer(
+      df.selectExpr("array_contains(array(array(1)), array(1.23D))"),
+      Seq(Row(false), Row(false))
+    )
+
+    intercept[AnalysisException] {
+      df.selectExpr("array_contains(array(1), 1.23)")

--- End diff --

@ueshin Sure. We could use `findWiderCommonType`. My thinking was that since we are injecting this cast implicitly, we should pick the safest cast so we don't see data-dependent surprises. Users can always specify an explicit cast and take responsibility for the result :-) However, I don't have a strong opinion. I will change it to use `findWiderCommonType`.
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/22408#discussion_r217601459

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala ---

@@ -735,6 +735,44 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSQLContext {
       df.selectExpr("array_contains(array(1, null), array(1, null)[0])"),
       Seq(Row(true), Row(true))
     )
+
+    checkAnswer(
+      df.selectExpr("array_contains(array(1), 1.23D)"),
+      Seq(Row(false), Row(false))
+    )
+
+    checkAnswer(
+      df.selectExpr("array_contains(array(1), 1.0D)"),
+      Seq(Row(true), Row(true))
+    )
+
+    checkAnswer(
+      df.selectExpr("array_contains(array(1.0D), 1)"),
+      Seq(Row(true), Row(true))
+    )
+
+    checkAnswer(
+      df.selectExpr("array_contains(array(1.23D), 1)"),
+      Seq(Row(false), Row(false))
+    )
+
+    checkAnswer(
+      df.selectExpr("array_contains(array(array(1)), array(1.0D))"),
+      Seq(Row(true), Row(true))
+    )
+
+    checkAnswer(
+      df.selectExpr("array_contains(array(array(1)), array(1.23D))"),
+      Seq(Row(false), Row(false))
+    )
+
+    intercept[AnalysisException] {
+      df.selectExpr("array_contains(array(1), 1.23)")

--- End diff --

Hmm, if we can do `array_contains(array(1), '1')` in 2.3 as https://github.com/apache/spark/pull/22408#issuecomment-421223501, we should use `findWiderCommonType`?
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/22408#discussion_r217593872

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala ---

@@ -735,6 +735,44 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSQLContext {
       df.selectExpr("array_contains(array(1, null), array(1, null)[0])"),
       Seq(Row(true), Row(true))
     )
+
+    checkAnswer(
+      df.selectExpr("array_contains(array(1), 1.23D)"),
+      Seq(Row(false), Row(false))
+    )
+
+    checkAnswer(
+      df.selectExpr("array_contains(array(1), 1.0D)"),
+      Seq(Row(true), Row(true))
+    )
+
+    checkAnswer(
+      df.selectExpr("array_contains(array(1.0D), 1)"),
+      Seq(Row(true), Row(true))
+    )
+
+    checkAnswer(
+      df.selectExpr("array_contains(array(1.23D), 1)"),
+      Seq(Row(false), Row(false))
+    )
+
+    checkAnswer(
+      df.selectExpr("array_contains(array(array(1)), array(1.0D))"),
+      Seq(Row(true), Row(true))
+    )
+
+    checkAnswer(
+      df.selectExpr("array_contains(array(array(1)), array(1.23D))"),
+      Seq(Row(false), Row(false))
+    )
+
+    intercept[AnalysisException] {
+      df.selectExpr("array_contains(array(1), 1.23)")

--- End diff --

Good point. Yes, it can do a lossy conversion. It seems `BinaryComparison` uses the wider DecimalType, so we could follow that behavior and use `findWiderTypeWithoutStringPromotion`. cc @gatorsmile @cloud-fan
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/22408#discussion_r217287140

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala ---

@@ -735,6 +735,44 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSQLContext {
       df.selectExpr("array_contains(array(1, null), array(1, null)[0])"),
       Seq(Row(true), Row(true))
     )
+
+    checkAnswer(
+      df.selectExpr("array_contains(array(1), 1.23D)"),
+      Seq(Row(false), Row(false))
+    )
+
+    checkAnswer(
+      df.selectExpr("array_contains(array(1), 1.0D)"),
+      Seq(Row(true), Row(true))
+    )
+
+    checkAnswer(
+      df.selectExpr("array_contains(array(1.0D), 1)"),
+      Seq(Row(true), Row(true))
+    )
+
+    checkAnswer(
+      df.selectExpr("array_contains(array(1.23D), 1)"),
+      Seq(Row(false), Row(false))
+    )
+
+    checkAnswer(
+      df.selectExpr("array_contains(array(array(1)), array(1.0D))"),
+      Seq(Row(true), Row(true))
+    )
+
+    checkAnswer(
+      df.selectExpr("array_contains(array(array(1)), array(1.23D))"),
+      Seq(Row(false), Row(false))
+    )
+
+    intercept[AnalysisException] {
+      df.selectExpr("array_contains(array(1), 1.23)")

--- End diff --

@ueshin FYI, I just pushed the changes addressing your comments. I have kept `findTightestCommonType` as is for now. I will change it based on your input and adjust the tests accordingly.
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/22408#discussion_r217279564

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala ---

@@ -735,6 +735,44 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSQLContext {
       df.selectExpr("array_contains(array(1, null), array(1, null)[0])"),
       Seq(Row(true), Row(true))
     )
+
+    checkAnswer(
+      df.selectExpr("array_contains(array(1), 1.23D)"),
+      Seq(Row(false), Row(false))
+    )
+
+    checkAnswer(
+      df.selectExpr("array_contains(array(1), 1.0D)"),
+      Seq(Row(true), Row(true))
+    )
+
+    checkAnswer(
+      df.selectExpr("array_contains(array(1.0D), 1)"),
+      Seq(Row(true), Row(true))
+    )
+
+    checkAnswer(
+      df.selectExpr("array_contains(array(1.23D), 1)"),
+      Seq(Row(false), Row(false))
+    )
+
+    checkAnswer(
+      df.selectExpr("array_contains(array(array(1)), array(1.0D))"),
+      Seq(Row(true), Row(true))
+    )
+
+    checkAnswer(
+      df.selectExpr("array_contains(array(array(1)), array(1.23D))"),
+      Seq(Row(false), Row(false))
+    )
+
+    intercept[AnalysisException] {
+      df.selectExpr("array_contains(array(1), 1.23)")

--- End diff --

@ueshin Thank you. I also thought of using `findWiderTypeWithoutStringPromotion` but later changed it to `findTightestCommonType`. One question I had: can `findWiderTypeWithoutStringPromotion` do a lossy conversion? From the class description, it seems only `findTightestCommonType` does absolutely safe casting. Please let me know what you think.
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/22408#discussion_r217277655

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala ---

@@ -735,6 +735,44 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSQLContext {
       df.selectExpr("array_contains(array(1, null), array(1, null)[0])"),
       Seq(Row(true), Row(true))
     )
+
+    checkAnswer(
+      df.selectExpr("array_contains(array(1), 1.23D)"),
+      Seq(Row(false), Row(false))
+    )
+
+    checkAnswer(
+      df.selectExpr("array_contains(array(1), 1.0D)"),
+      Seq(Row(true), Row(true))
+    )
+
+    checkAnswer(
+      df.selectExpr("array_contains(array(1.0D), 1)"),
+      Seq(Row(true), Row(true))
+    )
+
+    checkAnswer(
+      df.selectExpr("array_contains(array(1.23D), 1)"),
+      Seq(Row(false), Row(false))
+    )
+
+    checkAnswer(
+      df.selectExpr("array_contains(array(array(1)), array(1.0D))"),
+      Seq(Row(true), Row(true))
+    )
+
+    checkAnswer(
+      df.selectExpr("array_contains(array(array(1)), array(1.23D))"),
+      Seq(Row(false), Row(false))
+    )
+
+    intercept[AnalysisException] {
+      df.selectExpr("array_contains(array(1), 1.23)")

--- End diff --

As for the behavior, we might be better off using `findWiderTypeWithoutStringPromotion` instead of `findTightestCommonType`. And as for checking the error message, just those you added.
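The trade-off between the two coercion strategies discussed here can be sketched as a toy model. This is a hedged Python illustration only, not Spark's `TypeCoercion` code: the type names, the lossless-pair table, and the numeric ranking are all simplifying assumptions. "Tightest" admits only promotions where neither side can lose information, while "wider without string promotion" also allows lossy numeric widening but still refuses to coerce between numbers and strings.

```python
# Toy model of the two strategies discussed above -- not Spark's TypeCoercion.

# Pairs where one type can hold every value of the other without loss
# (e.g. a double represents every 32-bit int exactly).
LOSSLESS = {frozenset(("int", "long")): "long",
            frozenset(("int", "double")): "double"}

NUMERIC_ORDER = ["int", "long", "decimal", "double"]  # simplified ranking

def find_tightest_common_type(t1, t2):
    """Return a common type only when the promotion cannot lose information."""
    if t1 == t2:
        return t1
    return LOSSLESS.get(frozenset((t1, t2)))  # None when any loss is possible

def find_wider_type_without_string_promotion(t1, t2):
    """Allow lossy numeric widening, but never promote across string/number."""
    if "string" in (t1, t2):
        return t1 if t1 == t2 else None
    if t1 in NUMERIC_ORDER and t2 in NUMERIC_ORDER:
        return max(t1, t2, key=NUMERIC_ORDER.index)  # may lose precision
    return None

print(find_tightest_common_type("int", "decimal"))                 # None
print(find_wider_type_without_string_promotion("int", "decimal"))  # decimal
print(find_wider_type_without_string_promotion("int", "string"))   # None
```

Under this model, `array_contains(array(1), 1.23)` fails analysis with the tightest strategy (no safe common type for int and a small-precision decimal) but type-checks under the wider strategy, which matches the direction of the discussion above.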
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/22408#discussion_r217274643

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala ---

@@ -735,6 +735,44 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSQLContext {
       df.selectExpr("array_contains(array(1, null), array(1, null)[0])"),
       Seq(Row(true), Row(true))
     )
+
+    checkAnswer(
+      df.selectExpr("array_contains(array(1), 1.23D)"),
+      Seq(Row(false), Row(false))
+    )
+
+    checkAnswer(
+      df.selectExpr("array_contains(array(1), 1.0D)"),
+      Seq(Row(true), Row(true))
+    )
+
+    checkAnswer(
+      df.selectExpr("array_contains(array(1.0D), 1)"),
+      Seq(Row(true), Row(true))
+    )
+
+    checkAnswer(
+      df.selectExpr("array_contains(array(1.23D), 1)"),
+      Seq(Row(false), Row(false))
+    )
+
+    checkAnswer(
+      df.selectExpr("array_contains(array(array(1)), array(1.0D))"),
+      Seq(Row(true), Row(true))
+    )
+
+    checkAnswer(
+      df.selectExpr("array_contains(array(array(1)), array(1.23D))"),
+      Seq(Row(false), Row(false))
+    )
+
+    intercept[AnalysisException] {
+      df.selectExpr("array_contains(array(1), 1.23)")

--- End diff --

@ueshin Did you want me to check the error message for all the cases, or just a couple of them?
Github user dilipbiswal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22408#discussion_r217274487

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
    @@ -1331,26 +1331,31 @@ case class ArrayContains(left: Expression, right: Expression)
       @transient private lazy val ordering: Ordering[Any] =
         TypeUtils.getInterpretedOrdering(right.dataType)

    -  override def inputTypes: Seq[AbstractDataType] = right.dataType match {
    -    case NullType => Seq.empty
    -    case _ => left.dataType match {
    -      case n @ ArrayType(element, _) => Seq(n, element)
    +  override def inputTypes: Seq[AbstractDataType] = {
    +    (left.dataType, right.dataType) match {
    +      case (_, NullType) => Seq.empty
    +      case (ArrayType(e1, hasNull), e2) =>
    +        TypeCoercion.findTightestCommonType(e1, e2) match {
    +          case Some(dt) => Seq(ArrayType(dt, hasNull), dt)
    +          case _ => Seq.empty
    +        }
           case _ => Seq.empty
         }
       }

       override def checkInputDataTypes(): TypeCheckResult = {
    -    if (right.dataType == NullType) {
    -      TypeCheckResult.TypeCheckFailure("Null typed values cannot be used as arguments")
    -    } else if (!left.dataType.isInstanceOf[ArrayType]
    -        || !left.dataType.asInstanceOf[ArrayType].elementType.sameType(right.dataType)) {
    -      TypeCheckResult.TypeCheckFailure(
    -        "Arguments must be an array followed by a value of same type as the array members")
    -    } else {
    -      TypeUtils.checkForOrderingExpr(right.dataType, s"function $prettyName")
    +    (left.dataType, right.dataType) match {
    +      case (_, NullType) =>
    +        TypeCheckResult.TypeCheckFailure("Null typed values cannot be used as arguments")
    +      case (ArrayType(e1, _), e2) if e1.sameType(e2) =>
    +        TypeUtils.checkForOrderingExpr(right.dataType, s"function $prettyName")
    +      case _ => TypeCheckResult.TypeCheckFailure(s"Input to function $prettyName should have " +
    +        s"been ${ArrayType.simpleString} followed by a value with same element type, but it's " +
    +        s"[${left.dataType.catalogString}, ${right.dataType.catalogString}].")
         }
       }
    +
    --- End diff --

    @ueshin Sure. Sorry, I didn't notice it.
Github user dilipbiswal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22408#discussion_r217274391

    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala ---
    @@ -735,6 +735,44 @@ (same hunk as quoted above, ending with)
    +    intercept[AnalysisException] {
    +      df.selectExpr("array_contains(array(1), 1.23)")
    --- End diff --

    @ueshin This fails because 1.23 is parsed as decimal(3, 2), and `findTightestCommonType` cannot find a common type between integer and decimal(3, 2). I specifically added this case to draw your attention to it, to see if we are okay with this behaviour :-). If we had used `1.23D`, a common type would have been found.
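The parsing behavior dilipbiswal describes can be sketched with a hypothetical helper (`literal_type` below is purely illustrative, not a Spark API): an exact numeric literal such as `1.23` is typed from its digit counts as `decimal(precision, scale)`, while a `D` suffix forces a double:

```python
# Illustrative model of how a SQL parser types a numeric literal.
# This is a hypothetical helper, not Spark's parser.
def literal_type(text):
    if text.upper().endswith("D"):
        return "double"          # explicit double suffix, e.g. "1.23D"
    if "." in text:
        digits = text.replace("-", "").replace(".", "")
        frac = text.split(".")[1]
        # precision = total significant digits, scale = fractional digits
        return f"decimal({len(digits)},{len(frac)})"
    return "int"
```

So `1.23` becomes `decimal(3,2)`, for which `findTightestCommonType` has no common type with `int`, whereas `1.23D` is simply a `double` and widens cleanly.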
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22408#discussion_r217272407

    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala ---
    @@ -735,6 +735,44 @@ (same hunk as quoted above, ending with)
    +    intercept[AnalysisException] {
    +      df.selectExpr("array_contains(array(1), 1.23)")
    --- End diff --

    Also, could you check the error message?
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22408#discussion_r217269231

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
    @@ -1331,26 +1331,31 @@ (same collectionOperations.scala hunk as quoted above, ending with the added blank line)
    +
    --- End diff --

    nit: Can you revert this change?
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22408#discussion_r217272398

    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala ---
    @@ -735,6 +735,44 @@ (same hunk as quoted above, ending with)
    +    intercept[AnalysisException] {
    +      df.selectExpr("array_contains(array(1), 1.23)")
    --- End diff --

    Why does this fail?
GitHub user dilipbiswal opened a pull request:

    https://github.com/apache/spark/pull/22408

    [SPARK-25417][SQL] ArrayContains function may return incorrect result when right expression is implicitly down casted

    ## What changes were proposed in this pull request?
    In ArrayContains, we currently cast the right-hand-side expression to match the element type of the left-hand-side array. This may down-cast the value and return a wrong or questionable result.

    Example:
    ```SQL
    spark-sql> select array_contains(array(1), 1.34);
    true
    ```
    ```SQL
    spark-sql> select array_contains(array(1), 'foo');
    null
    ```

    We should safely coerce both the left- and right-hand-side expressions instead.

    ## How was this patch tested?
    Added tests in DataFrameFunctionsSuite.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dilipbiswal/spark SPARK-25417

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22408.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #22408

----
commit 2c6e8dc16c3f7418374dcabea92f5dcf966c5469
Author: Dilip Biswal
Date:   2018-09-12T19:06:14Z

    [SPARK-25417] ArrayContains function may return incorrect result when right expression is implicitly down casted
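The bug in the PR description comes down to which side gets cast. A minimal sketch (plain Python, not Spark's implementation) contrasts the old lossy down-cast with the safer common-type comparison the patch proposes:

```python
# Pre-fix behavior (lossy): cast the searched value down to the array's element
# type. int(1.34) == 1, so array_contains(array(1), 1.34) wrongly returns true.
def array_contains_naive(arr, value, elem_cast):
    return any(x == elem_cast(value) for x in arr)


# Safer behavior: promote BOTH sides to a common wider type before comparing,
# so no information is lost in the cast.
def array_contains_safe(arr, value, common_cast):
    return any(common_cast(x) == common_cast(value) for x in arr)
```

With the array `[1]` and the value `1.34`, the naive version compares `1 == int(1.34)` and answers true, while the safe version compares `1.0 == 1.34` and answers false, matching the Spark 2.4 behavior described in the migration guide.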