[GitHub] spark pull request #19602: [SPARK-22384][SQL] Refine partition pruning when ...
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/19602#discussion_r192550679 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HiveClientSuite.scala --- @@ -207,65 +271,68 @@ class HiveClientSuite(version: String) } private def testMetastorePartitionFiltering( - filterString: String, + table: String, + filterExpr: Expression, expectedDs: Seq[Int], expectedH: Seq[Int], expectedChunks: Seq[String]): Unit = { testMetastorePartitionFiltering( - filterString, - (expectedDs, expectedH, expectedChunks) :: Nil, + table, + filterExpr, + Map("ds" -> expectedDs, "h" -> expectedH, "chunk" -> expectedChunks) :: Nil, identity) } private def testMetastorePartitionFiltering( - filterString: String, + table: String, + filterExpr: Expression, expectedDs: Seq[Int], expectedH: Seq[Int], expectedChunks: Seq[String], transform: Expression => Expression): Unit = { testMetastorePartitionFiltering( - filterString, - (expectedDs, expectedH, expectedChunks) :: Nil, + table, + filterExpr, + Map("ds" -> expectedDs, "h" -> expectedH, "chunk" -> expectedChunks) :: Nil, identity) } private def testMetastorePartitionFiltering( - filterString: String, - expectedPartitionCubes: Seq[(Seq[Int], Seq[Int], Seq[String])]): Unit = { -testMetastorePartitionFiltering(filterString, expectedPartitionCubes, identity) + table: String, + filterExpr: Expression, + expectedPartitionCubes: Seq[Map[String, Seq[Any]]]): Unit = { +testMetastorePartitionFiltering(table, filterExpr, expectedPartitionCubes, identity) } private def testMetastorePartitionFiltering( - filterString: String, - expectedPartitionCubes: Seq[(Seq[Int], Seq[Int], Seq[String])], + table: String, + filterExpr: Expression, + expectedPartitionCubes: Seq[Map[String, Seq[Any]]], transform: Expression => Expression): Unit = { -val filteredPartitions = client.getPartitionsByFilter(client.getTable("default", "test"), +val filteredPartitions = 
client.getPartitionsByFilter(client.getTable("default", table), Seq( -transform(parseExpression(filterString)) +transform(filterExpr) )) -val expectedPartitionCount = expectedPartitionCubes.map { - case (expectedDs, expectedH, expectedChunks) => -expectedDs.size * expectedH.size * expectedChunks.size -}.sum - -val expectedPartitions = expectedPartitionCubes.map { - case (expectedDs, expectedH, expectedChunks) => -for { - ds <- expectedDs - h <- expectedH - chunk <- expectedChunks -} yield Set( - "ds" -> ds.toString, - "h" -> h.toString, - "chunk" -> chunk -) -}.reduce(_ ++ _) +val expectedPartitionCount = expectedPartitionCubes.map(_.map(_._2.size).product).sum + +val expectedPartitions = expectedPartitionCubes.map(getPartitionsFromCube(_)).reduce(_ ++ _) val actualFilteredPartitionCount = filteredPartitions.size assert(actualFilteredPartitionCount == expectedPartitionCount, s"Expected $expectedPartitionCount partitions but got $actualFilteredPartitionCount") -assert(filteredPartitions.map(_.spec.toSet).toSet == expectedPartitions.toSet) +assert(filteredPartitions.map(_.spec).toSet == expectedPartitions.toSet) + } + + private def getPartitionsFromCube(cube: Map[String, Seq[Any]]): Seq[Map[String, String]] = { +cube.map { + case (k: String, pts: Seq[Any]) => pts.map(pt => (k, pt.toString)) +}.foldLeft(Seq(Seq[(String, String)]()))((seq0, seq1) => { --- End diff -- Hmm, it's a recursion problem. I tried to use loop state directly, but it didn't become more readable. In the current change, I extracted a `PartitionSpec` type and added a comment. I think it's better now? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
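[Editor's note] The helper under review expands a partition "cube" (a map of column name to candidate values) into every concrete partition spec. A minimal sketch of that cross-product logic, in Python for illustration rather than the Scala `getPartitionsFromCube` being reviewed (`partitions_from_cube` is a hypothetical stand-in, using `itertools.product` where the Scala code uses `foldLeft`):

```python
from itertools import product

def partitions_from_cube(cube):
    """Expand a cube {column: [values]} into the cross product of
    per-column values, each combination rendered as a partition spec
    {column: str(value)}."""
    keys = list(cube)
    return [
        {k: str(v) for k, v in zip(keys, combo)}
        for combo in product(*(cube[k] for k in keys))
    ]

# 1 * 2 * 2 = 4 specs, matching the expectedPartitionCount computation
specs = partitions_from_cube({"ds": [20170101], "h": [0, 1], "chunk": ["aa", "ab"]})
```

The product of the per-column list sizes gives the expected partition count, which is exactly what `expectedPartitionCubes.map(_.map(_._2.size).product).sum` computes in the diff.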
[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19602 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3775/ Test PASSed.
[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19602 Merged build finished. Test PASSed.
[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19602 **[Test build #91413 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91413/testReport)** for PR 19602 at commit [`e4c6e1f`](https://github.com/apache/spark/commit/e4c6e1ff713a7033b0a60dabaca5071b480d7600).
[GitHub] spark pull request #19602: [SPARK-22384][SQL] Refine partition pruning when ...
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/19602#discussion_r192550486 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HiveClientSuite.scala --- @@ -59,38 +61,62 @@ class HiveClientSuite(version: String) "h" -> h.toString, "chunk" -> chunk ), storageFormat) -assert(partitions.size == testPartitionCount) +assert(partitions0.size == testPartitionCount0) client.createPartitions( - "default", "test", partitions, ignoreIfExists = false) + "default", "test0", partitions0, ignoreIfExists = false) + +val partitions1 = + for { +pt <- 0 until 10 +chunk <- Seq("aa", "ab", "ba", "bb") + } yield CatalogTablePartition(Map( +"pt" -> pt.toString, +"chunk" -> chunk + ), storageFormat) +assert(partitions1.size == testPartitionCount1) + +client.createPartitions( + "default", "test1", partitions1, ignoreIfExists = false) + client } + private def pAttr(table: String, name: String): Attribute = { +val partTypes = client.getTable("default", table).partitionSchema.fields +.map(field => (field.name, field.dataType)).toMap +partTypes.get(name) match { + case Some(dt) => AttributeReference(name, dt)() + case None => +fail(s"Illegal name of partition attribute: $name") +} + } + override def beforeAll() { super.beforeAll() client = init(true) } test(s"getPartitionsByFilter returns all partitions when $tryDirectSqlKey=false") { val client = init(false) -val filteredPartitions = client.getPartitionsByFilter(client.getTable("default", "test"), - Seq(parseExpression("ds=20170101"))) +val filteredPartitions = client.getPartitionsByFilter(client.getTable("default", "test0"), + Seq(EqualTo(pAttr("test0", "ds"), Literal(20170101, IntegerType --- End diff -- Thanks, with `org.apache.spark.sql.catalyst.dsl.expressions._`, code can be much cleaner. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21479: [SPARK-23903][SQL] Add support for date extract
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21479 **[Test build #91412 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91412/testReport)** for PR 21479 at commit [`0ad3dd7`](https://github.com/apache/spark/commit/0ad3dd75bc1a74ca88c9ace8899fd2729aaa16b5).
[GitHub] spark issue #21479: [SPARK-23903][SQL] Add support for date extract
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21479 Merged build finished. Test PASSed.
[GitHub] spark issue #21479: [SPARK-23903][SQL] Add support for date extract
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21479 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3774/ Test PASSed.
[GitHub] spark issue #21479: [SPARK-23903][SQL] Add support for date extract
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/21479 Jenkins, retest this please.
[GitHub] spark issue #21483: [SPARK-24454][ML][PYTHON] Imports image module in ml/__i...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21483 Merged build finished. Test PASSed.
[GitHub] spark issue #21483: [SPARK-24454][ML][PYTHON] Imports image module in ml/__i...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21483 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91411/ Test PASSed.
[GitHub] spark issue #21483: [SPARK-24454][ML][PYTHON] Imports image module in ml/__i...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21483 **[Test build #91411 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91411/testReport)** for PR 21483 at commit [`4b0be58`](https://github.com/apache/spark/commit/4b0be58db843609c2f7fece7becb5187b9086155). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21483: [SPARK-24454][ML][PYTHON] Imports image module in ml/__i...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21483 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3773/ Test PASSed.
[GitHub] spark issue #21483: [SPARK-24454][ML][PYTHON] Imports image module in ml/__i...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21483 Merged build finished. Test PASSed.
[GitHub] spark pull request #21482: [SPARK-24393][SQL] SQL builtin: isinf
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21482#discussion_r192549119 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/nullExpressions.scala --- @@ -199,6 +199,50 @@ case class Nvl2(expr1: Expression, expr2: Expression, expr3: Expression, child: override def sql: String = s"$prettyName(${expr1.sql}, ${expr2.sql}, ${expr3.sql})" } +/** + * Evaluates to `true` iff it's Infinity. + */ +@ExpressionDescription( + usage = "_FUNC_(expr) - Returns True if expr evaluates to infinite else returns False ", --- End diff -- True -> true, False -> false to be consistent
[GitHub] spark pull request #21482: [SPARK-24393][SQL] SQL builtin: isinf
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21482#discussion_r192549111 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/nullExpressions.scala --- @@ -199,6 +199,50 @@ case class Nvl2(expr1: Expression, expr2: Expression, expr3: Expression, child: override def sql: String = s"$prettyName(${expr1.sql}, ${expr2.sql}, ${expr3.sql})" } +/** + * Evaluates to `true` iff it's Infinity. + */ +@ExpressionDescription( + usage = "_FUNC_(expr) - Returns True if expr evaluates to infinite else returns False ", + examples = """ +Examples: + > SELECT _FUNC_(1/0); + True --- End diff -- Can you run the example and check the results?
[GitHub] spark issue #21483: [SPARK-24454][ML][PYTHON] Imports image module in ml/__i...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21483 **[Test build #91411 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91411/testReport)** for PR 21483 at commit [`4b0be58`](https://github.com/apache/spark/commit/4b0be58db843609c2f7fece7becb5187b9086155).
[GitHub] spark issue #21483: [SPARK-24454][ML][PYTHON] Imports image module in ml/__i...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21483 cc @mengxr, I guess this change is what you actually intended?
[GitHub] spark pull request #21483: [SPARK-24454][ML][PYTHON] Imports image module in...
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/21483 [SPARK-24454][ML][PYTHON] Imports image module in ml/__init__.py and add ImageSchema into __all__ ## What changes were proposed in this pull request? This PR attaches image APIs to ml module too to more expose this. ## How was this patch tested? Before: ```python >>> from pyspark import ml >>> ml.image Traceback (most recent call last): File "<stdin>", line 1, in <module> AttributeError: 'module' object has no attribute 'image' >>> ml.image.ImageSchema Traceback (most recent call last): File "<stdin>", line 1, in <module> AttributeError: 'module' object has no attribute 'image' ``` ```python >>> "ImageSchema" in globals() False >>> from pyspark.ml import * >>> "ImageSchema" in globals() False >>> ImageSchema Traceback (most recent call last): File "<stdin>", line 1, in <module> NameError: name 'ImageSchema' is not defined ``` After: ```python >>> from pyspark import ml >>> ml.image >>> ml.image.ImageSchema ``` ```python >>> "ImageSchema" in globals() False >>> from pyspark.ml import * >>> "ImageSchema" in globals() True >>> ImageSchema ``` You can merge this pull request into a Git repository by running: $ git pull https://github.com/HyukjinKwon/spark SPARK-24454 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21483.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21483 commit 4b0be58db843609c2f7fece7becb5187b9086155 Author: hyukjinkwon Date: 2018-06-02T04:07:20Z Imports image module in ml/__init__.py and add ImageSchema into __all__
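[Editor's note] The mechanism behind this fix can be shown with stdlib modules alone (using `unittest`/`unittest.mock` as stand-ins for `pyspark.ml`/`pyspark.ml.image`, since pyspark is not assumed to be installed): a submodule only appears as an attribute of its parent package once something imports it, which is what adding `from . import image` to `ml/__init__.py` accomplishes.

```python
import importlib

# Importing a parent package does not, by itself, expose its submodules.
parent = importlib.import_module("unittest")

# Importing the submodule binds it as an attribute on the parent package,
# just as `from . import image` in ml/__init__.py binds `ml.image`.
sub = importlib.import_module("unittest.mock")
assert parent.mock is sub
```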
[GitHub] spark issue #21479: [SPARK-23903][SQL] Add support for date extract
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21479 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91409/ Test FAILed.
[GitHub] spark issue #21479: [SPARK-23903][SQL] Add support for date extract
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21479 Merged build finished. Test FAILed.
[GitHub] spark issue #21479: [SPARK-23903][SQL] Add support for date extract
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21479 **[Test build #91409 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91409/testReport)** for PR 21479 at commit [`0ad3dd7`](https://github.com/apache/spark/commit/0ad3dd75bc1a74ca88c9ace8899fd2729aaa16b5). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21370: [SPARK-24215][PySpark] Implement _repr_html_ for datafra...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21370 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3772/ Test PASSed.
[GitHub] spark issue #21370: [SPARK-24215][PySpark] Implement _repr_html_ for datafra...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21370 Merged build finished. Test PASSed.
[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r192548464 --- Diff: docs/configuration.md --- @@ -456,6 +456,33 @@ Apart from these, the following properties are also available, and may be useful from JVM to Python worker for every task. + + spark.sql.repl.eagerEval.enabled + false + +Enable eager evaluation or not. If true and REPL you are using supports eager evaluation, +dataframe will be ran automatically. HTML table will feedback the queries user have defined if --- End diff -- The HTML table is generated by `_repr_html_`, which isn't a Jupyter-only term. `_repr_html_` is the rich display support for IPython in notebooks and the Qt console. I think it can be used in other places, but currently I have only tested this in Jupyter. I rewrote the doc; please check whether it is appropriate, thanks.
[GitHub] spark issue #21370: [SPARK-24215][PySpark] Implement _repr_html_ for datafra...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21370 **[Test build #91410 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91410/testReport)** for PR 21370 at commit [`5b36604`](https://github.com/apache/spark/commit/5b3660458945eb318b51b327fcaf10dc94dde82e).
[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r192548359 --- Diff: docs/configuration.md --- @@ -456,6 +456,33 @@ Apart from these, the following properties are also available, and may be useful from JVM to Python worker for every task. + + spark.sql.repl.eagerEval.enabled + false + +Enable eager evaluation or not. If true and REPL you are using supports eager evaluation, +dataframe will be ran automatically. HTML table will feedback the queries user have defined if +_repl_html_ called by notebooks like Jupyter, otherwise for plain Python REPL, output +will be shown like dataframe.show() --- End diff -- Thanks, done in 5b36604.
[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r192548352 --- Diff: docs/configuration.md --- @@ -456,6 +456,33 @@ Apart from these, the following properties are also available, and may be useful from JVM to Python worker for every task. + + spark.sql.repl.eagerEval.enabled + false + +Enable eager evaluation or not. If true and REPL you are using supports eager evaluation, +dataframe will be ran automatically. HTML table will feedback the queries user have defined if +_repl_html_ called by notebooks like Jupyter, otherwise for plain Python REPL, output +will be shown like dataframe.show() +(see https://issues.apache.org/jira/browse/SPARK-24215;>SPARK-24215 for more details). + + + + spark.sql.repl.eagerEval.maxNumRows + 20 + +Default number of rows in eager evaluation output HTML table generated by _repr_html_ or plain text, +this only take effect when spark.sql.repl.eagerEval.enabled set to true. --- End diff -- Thanks, done in 5b36604.
[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r192548361 --- Diff: docs/configuration.md --- @@ -456,6 +456,33 @@ Apart from these, the following properties are also available, and may be useful from JVM to Python worker for every task. + + spark.sql.repl.eagerEval.enabled + false + +Enable eager evaluation or not. If true and REPL you are using supports eager evaluation, --- End diff -- Thanks, done in 5b36604.
[GitHub] spark pull request #21258: [SPARK-23933][SQL] Add map_from_arrays function
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/21258#discussion_r192548230 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala --- @@ -235,6 +235,86 @@ case class CreateMap(children: Seq[Expression]) extends Expression { override def prettyName: String = "map" } +/** + * Returns a catalyst Map containing the two arrays in children expressions as keys and values. + */ +@ExpressionDescription( + usage = """ +_FUNC_(keys, values) - Creates a map with a pair of the given key/value arrays. All elements + in keys should not be null""", + examples = """ +Examples: + > SELECT _FUNC_([1.0, 3.0], ['2', '4']); + {1.0:"2",3.0:"4"} + """, since = "2.4.0") +case class CreateMapFromArrays(left: Expression, right: Expression) +extends BinaryExpression with ExpectsInputTypes { + + override def inputTypes: Seq[AbstractDataType] = Seq(ArrayType, ArrayType) + + override def checkInputDataTypes(): TypeCheckResult = { +(left.dataType, right.dataType) match { + case (ArrayType(_, _), ArrayType(_, _)) => +TypeCheckResult.TypeCheckSuccess + case _ => +TypeCheckResult.TypeCheckFailure("The given two arguments should be an array") +} + } + + override def dataType: DataType = { +MapType( + keyType = left.dataType.asInstanceOf[ArrayType].elementType, + valueType = right.dataType.asInstanceOf[ArrayType].elementType, + valueContainsNull = right.dataType.asInstanceOf[ArrayType].containsNull) + } + + override def nullable: Boolean = left.nullable || right.nullable + + override def nullSafeEval(keyArray: Any, valueArray: Any): Any = { +val keyArrayData = keyArray.asInstanceOf[ArrayData] +val valueArrayData = valueArray.asInstanceOf[ArrayData] +if (keyArrayData.numElements != valueArrayData.numElements) { + throw new RuntimeException("The given two arrays should have the same length") +} +val leftArrayType = left.dataType.asInstanceOf[ArrayType] +if (leftArrayType.containsNull) { + if 
(keyArrayData.toArray(leftArrayType.elementType).contains(null)) { +throw new RuntimeException("Cannot use null as map key!") + } +} +new ArrayBasedMapData(keyArrayData.copy(), valueArrayData.copy()) + } + + override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +nullSafeCodeGen(ctx, ev, (keyArrayData, valueArrayData) => { + val arrayBasedMapData = classOf[ArrayBasedMapData].getName + val leftArrayType = left.dataType.asInstanceOf[ArrayType] + val keyArrayElemNullCheck = if (!leftArrayType.containsNull) "" else { +val leftArrayTypeTerm = ctx.addReferenceObj("leftArrayType", leftArrayType.elementType) +val array = ctx.freshName("array") +val i = ctx.freshName("i") +s""" + |Object[] $array = $keyArrayData.toObjectArray($leftArrayTypeTerm); + |for (int $i = 0; $i < $array.length; $i++) { + | if ($array[$i] == null) { + |throw new RuntimeException("Cannot use null as map key!"); + | } + |} --- End diff -- good catch, thanks
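[Editor's note] The semantics being reviewed for `map_from_arrays` (reject mismatched array lengths, reject null keys, then pair keys with values) can be sketched in a few lines of Python; this is an illustration of the contract, not the Scala/codegen implementation:

```python
def map_from_arrays(keys, values):
    """Build a map from parallel key/value arrays, mirroring the checks
    in the reviewed CreateMapFromArrays expression."""
    if len(keys) != len(values):
        raise ValueError("The given two arrays should have the same length")
    if any(k is None for k in keys):
        raise ValueError("Cannot use null as map key!")
    return dict(zip(keys, values))

result = map_from_arrays([1.0, 3.0], ["2", "4"])  # {1.0: '2', 3.0: '4'}
```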
[GitHub] spark pull request #21258: [SPARK-23933][SQL] Add map_from_arrays function
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/21258#discussion_r192548103 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala --- @@ -235,6 +235,86 @@ case class CreateMap(children: Seq[Expression]) extends Expression { override def prettyName: String = "map" } +/** + * Returns a catalyst Map containing the two arrays in children expressions as keys and values. + */ +@ExpressionDescription( + usage = """ +_FUNC_(keys, values) - Creates a map with a pair of the given key/value arrays. All elements + in keys should not be null""", + examples = """ +Examples: + > SELECT _FUNC_([1.0, 3.0], ['2', '4']); + {1.0:"2",3.0:"4"} + """, since = "2.4.0") +case class CreateMapFromArrays(left: Expression, right: Expression) --- End diff -- In existing convention, `"map" -> "CreateMap"`. How about `"map_from_arrays" -> ???`? I am neutral on `MapFromArrays` or `CreateMapFromArrays`. WDYT? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21258: [SPARK-23933][SQL] Add map_from_arrays function
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/21258#discussion_r192547906 --- Diff: python/pyspark/sql/functions.py --- @@ -1798,6 +1798,22 @@ def create_map(*cols): return Column(jc) +@ignore_unicode_prefix +@since(2.4) +def create_map_from_arrays(col1, col2): --- End diff -- Sure
[GitHub] spark pull request #21482: [SPARK-24393][SQL] SQL builtin: isinf
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21482#discussion_r192547440 --- Diff: R/pkg/R/functions.R --- @@ -907,6 +907,30 @@ setMethod("initcap", column(jc) }) +#' @details +#' \code{isinf}: Returns true if the column is Infinity. +#' @rdname column_nonaggregate_functions +#' @aliases isnan isnan,Column-method +#' @note isinf since 2.4.0 +setMethod("isinf", + signature(x = "Column"), + function(x) { +jc <- callJStatic("org.apache.spark.sql.functions", "isinf", x@jc) +column(jc) + }) + +#' @details +#' \code{isInf}: Returns true if the column is Infinity. +#' @rdname column_nonaggregate_functions +#' @aliases isnan isnan,Column-method +#' @note isinf since 2.4.0 +setMethod("isInf", --- End diff -- R has `is.infinite`. Can we match the behaviour and rename it?
[GitHub] spark pull request #21482: [SPARK-24393][SQL] SQL builtin: isinf
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21482#discussion_r192547376 --- Diff: R/pkg/R/functions.R --- @@ -907,6 +907,30 @@ setMethod("initcap", column(jc) }) +#' @details +#' \code{isinf}: Returns true if the column is Infinity. +#' @rdname column_nonaggregate_functions +#' @aliases isnan isnan,Column-method +#' @note isinf since 2.4.0 +setMethod("isinf", --- End diff -- R has `is.infinite`. Can we match the behaviour and rename it?
[GitHub] spark pull request #21482: [SPARK-24393][SQL] SQL builtin: isinf
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21482#discussion_r192547364 --- Diff: python/pyspark/sql/functions.py --- @@ -468,6 +468,18 @@ def input_file_name(): return Column(sc._jvm.functions.input_file_name()) +@since(2.4) +def isinf(col): +"""An expression that returns true iff the column is NaN. --- End diff -- Ditto. Is this the same as `isnan`?
[GitHub] spark pull request #21482: [SPARK-24393][SQL] SQL builtin: isinf
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21482#discussion_r192547355 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Column.scala --- @@ -557,6 +557,14 @@ class Column(val expr: Expression) extends Logging { (this >= lowerBound) && (this <= upperBound) } + /** + * True if the current expression is NaN. --- End diff -- Is this the same as `isNaN`?
[GitHub] spark pull request #21482: [SPARK-24393][SQL] SQL builtin: isinf
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21482#discussion_r192547274 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala --- @@ -1107,6 +1107,14 @@ object functions { */ def input_file_name(): Column = withExpr { InputFileName() } + /** + * Return true iff the column is Infinity. + * + * @group normal_funcs + * @since 2.4.0 + */ + def isinf(e: Column): Column = withExpr { IsInf(e.expr) } --- End diff -- Mind if I ask you to elaborate on `isinf` vs `isInf` across the APIs?
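[Editor's note] Several of the review comments above flag docstrings that say "is NaN" on the new infinity check. The distinction matters because infinity and NaN are different IEEE-754 special values, illustrated here with Python's `math` module rather than the Spark functions under review:

```python
import math

# Infinity and NaN are disjoint special values: isinf is true only for
# +/-inf, isnan only for NaN, and neither for ordinary finite numbers.
values = [float("inf"), float("-inf"), float("nan"), 1.0]
inf_flags = [math.isinf(v) for v in values]  # [True, True, False, False]
nan_flags = [math.isnan(v) for v in values]  # [False, False, True, False]
```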
[GitHub] spark issue #21479: [SPARK-23903][SQL] Add support for date extract
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21479 Merged build finished. Test PASSed.
[GitHub] spark issue #21479: [SPARK-23903][SQL] Add support for date extract
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21479 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91407/ Test PASSed.
[GitHub] spark issue #21479: [SPARK-23903][SQL] Add support for date extract
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21479 **[Test build #91407 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91407/testReport)** for PR 21479 at commit [`c9d2bc3`](https://github.com/apache/spark/commit/c9d2bc348495669bd4347679547f1437f35367f1). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20697 Merged build finished. Test PASSed.
[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20697 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91406/
[GitHub] spark pull request #21061: [SPARK-23914][SQL] Add array_union function
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/21061#discussion_r192546226
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -2189,3 +2189,302 @@ case class ArrayRemove(left: Expression, right: Expression)
   override def prettyName: String = "array_remove"
 }
+
+object ArraySetLike {
+  private val MAX_ARRAY_LENGTH: Int = ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH
+
+  def toArrayDataInt(hs: OpenHashSet[Int]): ArrayData = {
+    val array = new Array[Int](hs.size)
+    var pos = hs.nextPos(0)
+    var i = 0
+    while (pos != OpenHashSet.INVALID_POS) {
+      array(i) = hs.getValue(pos)
+      pos = hs.nextPos(pos + 1)
+      i += 1
+    }
+
+    if (useGenericArrayData(LongType.defaultSize, array.length)) {
+      new GenericArrayData(array)
+    } else {
+      UnsafeArrayData.fromPrimitiveArray(array)
+    }
+  }
+
+  def toArrayDataLong(hs: OpenHashSet[Long]): ArrayData = {
+    val array = new Array[Long](hs.size)
+    var pos = hs.nextPos(0)
+    var i = 0
+    while (pos != OpenHashSet.INVALID_POS) {
+      array(i) = hs.getValue(pos)
+      pos = hs.nextPos(pos + 1)
+      i += 1
+    }
+
+    if (useGenericArrayData(LongType.defaultSize, array.length)) {
+      new GenericArrayData(array)
+    } else {
+      UnsafeArrayData.fromPrimitiveArray(array)
+    }
+  }
+
+  def useGenericArrayData(elementSize: Int, length: Int): Boolean = {
--- End diff --
Although I tried it, I stopped reusing it: `UnsafeArrayData.fromPrimitiveArray()` also uses variables (e.g. `headerInBytes` and `valueRegionInBytes`) calculated in this method, and there is no typical way to return multiple values from a function. We could move this method into `UnsafeArrayData`, but it is still not easy to reuse. WDYT?
```
private static UnsafeArrayData fromPrimitiveArray(
    Object arr, int offset, int length, int elementSize) {
  final long headerInBytes = calculateHeaderPortionInBytes(length);
  final long valueRegionInBytes = elementSize * length;
  final long totalSizeInLongs = (headerInBytes + valueRegionInBytes + 7) / 8;
  if (totalSizeInLongs > Integer.MAX_VALUE / 8) {
    throw new UnsupportedOperationException("Cannot convert this array to unsafe format as " +
      "it's too big.");
  }

  final long[] data = new long[(int) totalSizeInLongs];

  Platform.putLong(data, Platform.LONG_ARRAY_OFFSET, length);
  Platform.copyMemory(arr, offset, data,
    Platform.LONG_ARRAY_OFFSET + headerInBytes, valueRegionInBytes);

  UnsafeArrayData result = new UnsafeArrayData();
  result.pointTo(data, Platform.LONG_ARRAY_OFFSET, (int) totalSizeInLongs * 8);
  return result;
}
```
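The overflow guard in the `fromPrimitiveArray` snippet above can be modeled in Python. The header layout used here (an 8-byte element count plus a null bitmap rounded up to whole 8-byte words) is an assumption about `calculateHeaderPortionInBytes`, so treat this as a sketch of the guard's logic rather than the authoritative format.

```python
INT_MAX = 2**31 - 1  # Java Integer.MAX_VALUE

def header_portion_in_bytes(length):
    # Assumed layout: 8 bytes for numElements, plus a null bitmap of
    # one bit per element rounded up to whole 8-byte words.
    return 8 + ((length + 63) // 64) * 8

def fits_in_unsafe_array(element_size, length):
    """Replicates the guard: the total size in 8-byte longs must not
    exceed Integer.MAX_VALUE / 8, otherwise the array is 'too big'
    to convert to the unsafe format."""
    header_in_bytes = header_portion_in_bytes(length)
    value_region_in_bytes = element_size * length
    total_size_in_longs = (header_in_bytes + value_region_in_bytes + 7) // 8
    return total_size_in_longs <= INT_MAX // 8
```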
[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20697 **[Test build #91406 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91406/testReport)** for PR 20697 at commit [`4c5677a`](https://github.com/apache/spark/commit/4c5677a61fd940b818d81469e6640cb45f00ce58). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21482: [SPARK-24393][SQL] SQL builtin: isinf
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21482 Merged build finished. Test PASSed.
[GitHub] spark issue #21482: [SPARK-24393][SQL] SQL builtin: isinf
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21482 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91408/
[GitHub] spark issue #21482: [SPARK-24393][SQL] SQL builtin: isinf
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21482 **[Test build #91408 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91408/testReport)** for PR 21482 at commit [`9ab0eb2`](https://github.com/apache/spark/commit/9ab0eb24295c20e564817d69b3b3315d9b2a3359). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21481: [SPARK-24452][SQL][Core] Avoid possible overflow in int ...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/21481 cc @ueshin @hvanhovell
[GitHub] spark issue #21482: [SPARK-24393][SQL] SQL builtin: isinf
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21482 Merged build finished. Test PASSed.
[GitHub] spark issue #21482: [SPARK-24393][SQL] SQL builtin: isinf
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21482 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91405/
[GitHub] spark issue #21482: [SPARK-24393][SQL] SQL builtin: isinf
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21482 **[Test build #91405 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91405/testReport)** for PR 21482 at commit [`bcdaab2`](https://github.com/apache/spark/commit/bcdaab2f8c9c5afc877d3a54f658296aba78fdf0). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class IsInf(child: Expression) extends UnaryExpression`
[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total size of states in HDFSBac...
Github user jose-torres commented on the issue: https://github.com/apache/spark/pull/21469 LGTM. To clarify the description: do we expect the memory footprint to be much larger than what the query status reports in situations where the state store is getting a lot of updates?
[GitHub] spark issue #21479: [SPARK-23903][SQL] Add support for date extract
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21479 Merged build finished. Test PASSed.
[GitHub] spark issue #21481: [SPARK-24452][SQL][Core] Avoid possible overflow in int ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21481 Merged build finished. Test PASSed.
[GitHub] spark issue #21481: [SPARK-24452][SQL][Core] Avoid possible overflow in int ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21481 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91404/
[GitHub] spark issue #21479: [SPARK-23903][SQL] Add support for date extract
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21479 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3771/
[GitHub] spark issue #21481: [SPARK-24452][SQL][Core] Avoid possible overflow in int ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21481 **[Test build #91404 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91404/testReport)** for PR 21481 at commit [`324fd5c`](https://github.com/apache/spark/commit/324fd5ccb73c8017f5537031db21b687ac1ca27a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21479: [SPARK-23903][SQL] Add support for date extract
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21479 **[Test build #91409 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91409/testReport)** for PR 21479 at commit [`0ad3dd7`](https://github.com/apache/spark/commit/0ad3dd75bc1a74ca88c9ace8899fd2729aaa16b5).
[GitHub] spark issue #21346: [SPARK-6237][NETWORK] Network-layer changes to allow str...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21346 **[Test build #4194 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4194/testReport)** for PR 21346 at commit [`83c3271`](https://github.com/apache/spark/commit/83c3271d2f45bbef18d865bddbc6807e9fbd2503). * This patch **fails Spark unit tests**. * This patch **does not merge cleanly**. * This patch adds no public classes.
[GitHub] spark pull request #21482: [SPARK-24393][SQL] SQL builtin: isinf
Github user NihalHarish commented on a diff in the pull request: https://github.com/apache/spark/pull/21482#discussion_r192541712 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/nullExpressions.scala --- @@ -199,6 +199,50 @@ case class Nvl2(expr1: Expression, expr2: Expression, expr3: Expression, child: override def sql: String = s"$prettyName(${expr1.sql}, ${expr2.sql}, ${expr3.sql})" } +/** + * Evaluates to `true` iff it's Infinity. + */ +@ExpressionDescription( + usage = "_FUNC_(expr) - Returns True evaluates to infinite else returns False ", + examples = """ +Examples: + > SELECT _FUNC_(1/0); + True + > SELECT _FUNC_(5); + False + """) +case class IsInf(child: Expression) extends UnaryExpression + with Predicate with ImplicitCastInputTypes { + + override def inputTypes: Seq[AbstractDataType] = Seq(TypeCollection(DoubleType, FloatType)) + + override def nullable: Boolean = false + + override def eval(input: InternalRow): Boolean = { +val value = child.eval(input) +if (value == null) { + false +} else { + child.dataType match { +case DoubleType => value.asInstanceOf[Double].isInfinity +case FloatType => value.asInstanceOf[Float].isInfinity + } +} + } + + + override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +val eval = child.genCode(ctx) +child.dataType match { + case DoubleType | FloatType => +ev.copy(code = code""" + ${eval.code} + ${CodeGenerator.javaType(dataType)} ${ev.value} = ${CodeGenerator.defaultValue(dataType)}; + ${ev.value} = !${eval.isNull} && Double.isInfinite(${eval.value});""", --- End diff -- The non-codegen version uses the isInfinity method defined for scala's Double and Float, whereas the codegen version uses java's static method "isInfinite" defined for the classes Double and Float.
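The point above, that the codegen path calls `Double.isInfinite` even for `FloatType` children, is safe because widening a single-precision value to double precision preserves infiniteness. A small Python sketch (purely illustrative; the `as_float32` helper is a hypothetical stand-in for a FloatType value) can check that property:

```python
import math
import struct

def as_float32(x):
    """Round-trip x through IEEE-754 single precision, modeling a
    FloatType value before it is widened to double precision."""
    return struct.unpack('f', struct.pack('f', x))[0]

def is_infinite_after_widening(x):
    # Widening float -> double preserves infinity (and NaN stays NaN),
    # so one double-precision infinity check covers both input types.
    return math.isinf(float(as_float32(x)))
```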
[GitHub] spark pull request #21467: [SPARK-23754][PYTHON][FOLLOWUP] Move UDF stop ite...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21467#discussion_r192539912 --- Diff: python/pyspark/util.py --- @@ -53,16 +53,11 @@ def _get_argspec(f): """ Get argspec of a function. Supports both Python 2 and Python 3. """ - -if hasattr(f, '_argspec'): -# only used for pandas UDF: they wrap the user function, losing its signature -# workers need this signature, so UDF saves it here -argspec = f._argspec -elif sys.version_info[0] < 3: +# `getargspec` is deprecated since python3.0 (incompatible with function annotations). --- End diff -- yea, I think the comment is for the else block.
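For context on the comment-placement discussion above, the version branch it documents looks roughly like the following sketch (the real helper lives in `pyspark/util.py`; this simplified version omits the pandas UDF `_argspec` handling):

```python
import inspect
import sys

def get_argspec(f):
    """Get the argspec of a function on both Python 2 and Python 3.

    `inspect.getargspec` is deprecated since Python 3.0 because it
    cannot represent function annotations or keyword-only arguments,
    so the Python 3 branch uses `getfullargspec` instead.
    """
    if sys.version_info[0] < 3:
        return inspect.getargspec(f)  # Python 2 only
    return inspect.getfullargspec(f)
```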
[GitHub] spark issue #21467: [SPARK-23754][PYTHON][FOLLOWUP] Move UDF stop iteration ...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/21467 @e-dorigatti I see. Thanks.
[GitHub] spark pull request #21258: [SPARK-23933][SQL] Add map_from_arrays function
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21258#discussion_r192536919 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ComplexTypeSuite.scala --- @@ -186,6 +186,50 @@ class ComplexTypeSuite extends SparkFunSuite with ExpressionEvalHelper { } } + test("CreateMapFromArrays") { --- End diff -- `MapFromArrays`?
[GitHub] spark pull request #21258: [SPARK-23933][SQL] Add map_from_arrays function
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21258#discussion_r192535551 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala --- @@ -235,6 +235,86 @@ case class CreateMap(children: Seq[Expression]) extends Expression { override def prettyName: String = "map" } +/** + * Returns a catalyst Map containing the two arrays in children expressions as keys and values. + */ +@ExpressionDescription( + usage = """ +_FUNC_(keys, values) - Creates a map with a pair of the given key/value arrays. All elements + in keys should not be null""", + examples = """ +Examples: + > SELECT _FUNC_([1.0, 3.0], ['2', '4']); + {1.0:"2",3.0:"4"} + """, since = "2.4.0") +case class CreateMapFromArrays(left: Expression, right: Expression) --- End diff -- `MapFromArrays`?
[GitHub] spark pull request #21258: [SPARK-23933][SQL] Add map_from_arrays function
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21258#discussion_r192535592 --- Diff: python/pyspark/sql/functions.py --- @@ -1798,6 +1798,22 @@ def create_map(*cols): return Column(jc) +@ignore_unicode_prefix +@since(2.4) +def create_map_from_arrays(col1, col2): --- End diff -- `map_from_arrays`?
[GitHub] spark pull request #21258: [SPARK-23933][SQL] Add map_from_arrays function
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21258#discussion_r192536842 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala --- @@ -235,6 +235,86 @@ case class CreateMap(children: Seq[Expression]) extends Expression { override def prettyName: String = "map" } +/** + * Returns a catalyst Map containing the two arrays in children expressions as keys and values. + */ +@ExpressionDescription( + usage = """ +_FUNC_(keys, values) - Creates a map with a pair of the given key/value arrays. All elements + in keys should not be null""", + examples = """ +Examples: + > SELECT _FUNC_([1.0, 3.0], ['2', '4']); + {1.0:"2",3.0:"4"} + """, since = "2.4.0") +case class CreateMapFromArrays(left: Expression, right: Expression) +extends BinaryExpression with ExpectsInputTypes { + + override def inputTypes: Seq[AbstractDataType] = Seq(ArrayType, ArrayType) + + override def checkInputDataTypes(): TypeCheckResult = { +(left.dataType, right.dataType) match { + case (ArrayType(_, _), ArrayType(_, _)) => +TypeCheckResult.TypeCheckSuccess + case _ => +TypeCheckResult.TypeCheckFailure("The given two arguments should be an array") +} + } + + override def dataType: DataType = { +MapType( + keyType = left.dataType.asInstanceOf[ArrayType].elementType, + valueType = right.dataType.asInstanceOf[ArrayType].elementType, + valueContainsNull = right.dataType.asInstanceOf[ArrayType].containsNull) + } + + override def nullable: Boolean = left.nullable || right.nullable + + override def nullSafeEval(keyArray: Any, valueArray: Any): Any = { +val keyArrayData = keyArray.asInstanceOf[ArrayData] +val valueArrayData = valueArray.asInstanceOf[ArrayData] +if (keyArrayData.numElements != valueArrayData.numElements) { + throw new RuntimeException("The given two arrays should have the same length") +} +val leftArrayType = left.dataType.asInstanceOf[ArrayType] +if (leftArrayType.containsNull) { + if (keyArrayData.toArray(leftArrayType.elementType).contains(null)) { +throw new RuntimeException("Cannot use null as map key!") + } --- End diff -- Can we use a loop to null-check without converting to an object array?
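The null-check-with-a-loop suggestion above, together with the documented contract of the new function, can be sketched as a Python reference model of the `map_from_arrays` semantics (illustrative only; the function name mirrors the SQL builtin, not pyspark's API):

```python
def map_from_arrays(keys, values):
    """Build a map from parallel key/value arrays, mirroring the
    documented contract: both arrays must have equal length and
    no key may be null (None)."""
    if len(keys) != len(values):
        raise ValueError("The given two arrays should have the same length")
    # Null-check with a plain loop, rather than first materializing
    # the key array as an object array just to call contains(null).
    for k in keys:
        if k is None:
            raise ValueError("Cannot use null as map key!")
    return dict(zip(keys, values))
```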
[GitHub] spark pull request #21258: [SPARK-23933][SQL] Add map_from_arrays function
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21258#discussion_r192535941 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala --- @@ -235,6 +235,86 @@ case class CreateMap(children: Seq[Expression]) extends Expression { override def prettyName: String = "map" } +/** + * Returns a catalyst Map containing the two arrays in children expressions as keys and values. + */ +@ExpressionDescription( + usage = """ +_FUNC_(keys, values) - Creates a map with a pair of the given key/value arrays. All elements + in keys should not be null""", + examples = """ +Examples: + > SELECT _FUNC_([1.0, 3.0], ['2', '4']); + {1.0:"2",3.0:"4"} + """, since = "2.4.0") +case class CreateMapFromArrays(left: Expression, right: Expression) +extends BinaryExpression with ExpectsInputTypes { + + override def inputTypes: Seq[AbstractDataType] = Seq(ArrayType, ArrayType) + + override def checkInputDataTypes(): TypeCheckResult = { +(left.dataType, right.dataType) match { + case (ArrayType(_, _), ArrayType(_, _)) => +TypeCheckResult.TypeCheckSuccess + case _ => +TypeCheckResult.TypeCheckFailure("The given two arguments should be an array") +} + } + + override def dataType: DataType = { +MapType( + keyType = left.dataType.asInstanceOf[ArrayType].elementType, + valueType = right.dataType.asInstanceOf[ArrayType].elementType, + valueContainsNull = right.dataType.asInstanceOf[ArrayType].containsNull) + } + + override def nullable: Boolean = left.nullable || right.nullable + + override def nullSafeEval(keyArray: Any, valueArray: Any): Any = { +val keyArrayData = keyArray.asInstanceOf[ArrayData] +val valueArrayData = valueArray.asInstanceOf[ArrayData] +if (keyArrayData.numElements != valueArrayData.numElements) { + throw new RuntimeException("The given two arrays should have the same length") +} +val leftArrayType = left.dataType.asInstanceOf[ArrayType] +if (leftArrayType.containsNull) { + if (keyArrayData.toArray(leftArrayType.elementType).contains(null)) { +throw new RuntimeException("Cannot use null as map key!") + } +} +new ArrayBasedMapData(keyArrayData.copy(), valueArrayData.copy()) + } + + override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +nullSafeCodeGen(ctx, ev, (keyArrayData, valueArrayData) => { + val arrayBasedMapData = classOf[ArrayBasedMapData].getName + val leftArrayType = left.dataType.asInstanceOf[ArrayType] + val keyArrayElemNullCheck = if (!leftArrayType.containsNull) "" else { +val leftArrayTypeTerm = ctx.addReferenceObj("leftArrayType", leftArrayType.elementType) +val array = ctx.freshName("array") +val i = ctx.freshName("i") +s""" + |Object[] $array = $keyArrayData.toObjectArray($leftArrayTypeTerm); + |for (int $i = 0; $i < $array.length; $i++) { + | if ($array[$i] == null) { +throw new RuntimeException("Cannot use null as map key!"); + | } + |} --- End diff -- We can null-check without converting to an object array.
[GitHub] spark issue #21061: [SPARK-23914][SQL] Add array_union function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21061 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91401/
[GitHub] spark issue #21061: [SPARK-23914][SQL] Add array_union function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21061 Merged build finished. Test PASSed.
[GitHub] spark issue #21061: [SPARK-23914][SQL] Add array_union function
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21061 **[Test build #91401 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91401/testReport)** for PR 21061 at commit [`adc68cc`](https://github.com/apache/spark/commit/adc68cc033dec8b26be23e861eb53b466f35ad38). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #21479: [SPARK-23903][SQL] Add support for date extract
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21479#discussion_r192534591 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala --- @@ -1206,6 +1206,41 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging new StringLocate(expression(ctx.substr), expression(ctx.str)) } + /** + * Create a Extract expression. + */ + override def visitExtract(ctx: ExtractContext): Expression = withOrigin(ctx) { +val extractType = ctx.field.getText.toUpperCase(Locale.ROOT) +try { + extractType match { +case "YEAR" => + Year(expression(ctx.source)) +case "QUARTER" => + Quarter(expression(ctx.source)) +case "MONTH" => + Month(expression(ctx.source)) +case "WEEK" => + WeekOfYear(expression(ctx.source)) +case "DAY" => + DayOfMonth(expression(ctx.source)) +case "DOW" => + DayOfWeek(expression(ctx.source)) +case "HOUR" => + Hour(expression(ctx.source)) +case "MINUTE" => + Minute(expression(ctx.source)) +case "SECOND" => + Second(expression(ctx.source)) +case other => + throw new ParseException(s"Literals of type '$other' are currently not supported.", ctx) + } +} catch { + case e: IllegalArgumentException => --- End diff -- Do we need this try-catch?
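The `visitExtract` dispatch in the diff above is a straight field-name-to-expression mapping; a table-driven sketch (Python here, with stand-in functions instead of Catalyst expressions) makes the supported fields and the error path easy to audit. The lambdas and the Sunday=0 `DOW` convention are assumptions for illustration, not Spark's definitions.

```python
import datetime

# Stand-ins for the Catalyst expressions; the real parser builds
# Year(...), Quarter(...), etc. instead of computing values directly.
_EXTRACT_FIELDS = {
    "YEAR": lambda d: d.year,
    "QUARTER": lambda d: (d.month - 1) // 3 + 1,
    "MONTH": lambda d: d.month,
    "WEEK": lambda d: d.isocalendar()[1],
    "DAY": lambda d: d.day,
    "DOW": lambda d: d.isoweekday() % 7,  # Sunday=0, one common convention
    "HOUR": lambda d: d.hour,
    "MINUTE": lambda d: d.minute,
    "SECOND": lambda d: d.second,
}

def extract(field, source):
    fn = _EXTRACT_FIELDS.get(field.upper())
    if fn is None:
        # Mirrors the parser's explicit error path for unknown fields.
        raise ValueError("Literals of type '%s' are currently not supported." % field)
    return fn(source)
```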
[GitHub] spark pull request #21479: [SPARK-23903][SQL] Add support for date extract
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21479#discussion_r192534446 --- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 --- @@ -739,6 +740,7 @@ nonReserved | VIEW | REPLACE | IF | POSITION +| EXTRACT | YEAR | QUARTER | MONTH | WEEK | DAY | DOW | HOUR | MINUTE | SECOND --- End diff -- We can remove each term except for `EXTRACT`.
[GitHub] spark pull request #21479: [SPARK-23903][SQL] Add support for date extract
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21479#discussion_r192534696 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala --- @@ -1206,6 +1206,41 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging new StringLocate(expression(ctx.substr), expression(ctx.str)) } + /** + * Create a Extract expression. + */ + override def visitExtract(ctx: ExtractContext): Expression = withOrigin(ctx) { +val extractType = ctx.field.getText.toUpperCase(Locale.ROOT) +try { + extractType match { +case "YEAR" => + Year(expression(ctx.source)) +case "QUARTER" => + Quarter(expression(ctx.source)) +case "MONTH" => + Month(expression(ctx.source)) +case "WEEK" => + WeekOfYear(expression(ctx.source)) +case "DAY" => + DayOfMonth(expression(ctx.source)) +case "DOW" => --- End diff -- `"DAYOFWEEK"` ?
[GitHub] spark issue #21479: [SPARK-23903][SQL] Add support for date extract
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21479 Merged build finished. Test PASSed.
[GitHub] spark issue #21479: [SPARK-23903][SQL] Add support for date extract
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21479 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3770/
[GitHub] spark issue #21482: [SPARK-24393][SQL] SQL builtin: isinf
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21482 **[Test build #91408 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91408/testReport)** for PR 21482 at commit [`9ab0eb2`](https://github.com/apache/spark/commit/9ab0eb24295c20e564817d69b3b3315d9b2a3359).
[GitHub] spark issue #21479: [SPARK-23903][SQL] Add support for date extract
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21479 **[Test build #91407 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91407/testReport)** for PR 21479 at commit [`c9d2bc3`](https://github.com/apache/spark/commit/c9d2bc348495669bd4347679547f1437f35367f1).
[GitHub] spark pull request #21282: [SPARK-23934][SQL] Adding map_from_entries functi...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21282#discussion_r192531374 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -118,6 +120,229 @@ case class MapValues(child: Expression) override def prettyName: String = "map_values" } +/** + * Returns a map created from the given array of entries. + */ +@ExpressionDescription( + usage = "_FUNC_(arrayOfEntries) - Returns a map created from the given array of entries.", + examples = """ +Examples: + > SELECT _FUNC_(array(struct(1, 'a'), struct(2, 'b'))); + {1:"a",2:"b"} + """, + since = "2.4.0") +case class MapFromEntries(child: Expression) extends UnaryExpression +{ + private lazy val resolvedDataType: Option[MapType] = child.dataType match { +case ArrayType( + StructType(Array( +StructField(_, keyType, false, _), +StructField(_, valueType, valueNullable, _))), + false) => Some(MapType(keyType, valueType, valueNullable)) +case _ => None + } + + override def dataType: MapType = resolvedDataType.get + + override def checkInputDataTypes(): TypeCheckResult = resolvedDataType match { +case Some(_) => TypeCheckResult.TypeCheckSuccess +case None => TypeCheckResult.TypeCheckFailure(s"'${child.sql}' is of " + + s"${child.dataType.simpleString} type. $prettyName accepts only null-free arrays " + + "of pair structs. Values of the first struct field can't contain nulls and produce " + + "duplicates.") + } + + override protected def nullSafeEval(input: Any): Any = { +val arrayData = input.asInstanceOf[ArrayData] +val length = arrayData.numElements() +val keyArray = new Array[AnyRef](length) +val keySet = new OpenHashSet[AnyRef]() +val valueArray = new Array[AnyRef](length) +var i = 0; +while (i < length) { + val entry = arrayData.getStruct(i, 2) + val key = entry.get(0, dataType.keyType) + if (key == null) { +throw new RuntimeException("The first field from a struct (key) can't be null.") + } + if (keySet.contains(key)) { --- End diff -- I'm sorry for the super delay. Let's just ignore the duplicated key like `CreateMap` for now. We will need to discuss map-related topics, such as duplicate keys, equality or ordering, etc. --- 
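The resolution above — ignore duplicated keys the way `CreateMap` does instead of raising — can be modeled with a small Python sketch of the `map_from_entries` semantics. Whether the first or the last value wins is exactly the open question the reviewer defers; this sketch lets the last entry win (plain dict insertion), and that policy is an assumption, not Spark's documented behavior.

```python
def map_from_entries(entries):
    """Build a map from (key, value) pairs. Null (None) keys are
    rejected; duplicate keys are silently overwritten rather than
    raising, so later entries win (an assumed CreateMap-like policy)."""
    result = {}
    for key, value in entries:
        if key is None:
            raise ValueError("The first field from a struct (key) can't be null.")
        result[key] = value
    return result
```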
[GitHub] spark issue #21400: [SPARK-24351][SS]offsetLog/commitLog purge thresholdBatc...
Github user ivoson commented on the issue: https://github.com/apache/spark/pull/21400 @jose-torres @xuanyuanking @zsxwing Thanks for reviewing this.
[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20697 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/3631/
[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20697 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3769/ Test PASSed.
[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20697 Merged build finished. Test PASSed.
[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20697 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/3631/
[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20697 **[Test build #91406 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91406/testReport)** for PR 20697 at commit [`4c5677a`](https://github.com/apache/spark/commit/4c5677a61fd940b818d81469e6640cb45f00ce58).
[GitHub] spark issue #20894: [SPARK-23786][SQL] Checking column names of csv headers
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20894 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91399/ Test PASSed.
[GitHub] spark issue #20894: [SPARK-23786][SQL] Checking column names of csv headers
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20894 Merged build finished. Test PASSed.
[GitHub] spark issue #20894: [SPARK-23786][SQL] Checking column names of csv headers
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20894 **[Test build #91399 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91399/testReport)** for PR 20894 at commit [`3b37712`](https://github.com/apache/spark/commit/3b37712ded664aaf716306574f50072e58b9bbd1). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #21482: [SPARK-24393][SQL] SQL builtin: isinf
Github user henryr commented on a diff in the pull request: https://github.com/apache/spark/pull/21482#discussion_r192522027 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/nullExpressions.scala --- @@ -199,6 +199,50 @@ case class Nvl2(expr1: Expression, expr2: Expression, expr3: Expression, child: override def sql: String = s"$prettyName(${expr1.sql}, ${expr2.sql}, ${expr3.sql})" } +/** + * Evaluates to `true` iff it's Infinity. + */ +@ExpressionDescription( + usage = "_FUNC_(expr) - Returns True evaluates to infinite else returns False ", --- End diff -- "True evaluates" -> "True if expr evaluates"
[GitHub] spark pull request #21482: [SPARK-24393][SQL] SQL builtin: isinf
Github user henryr commented on a diff in the pull request: https://github.com/apache/spark/pull/21482#discussion_r192520713 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/NullExpressionsSuite.scala --- @@ -56,6 +56,16 @@ class NullExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper { assert(ex.contains("Null value appeared in non-nullable field")) } + test("IsInf") { +checkEvaluation(IsInf(Literal(Double.PositiveInfinity)), true) +checkEvaluation(IsInf(Literal(Double.NegativeInfinity)), true) +checkEvaluation(IsInf(Literal(Float.PositiveInfinity)), true) +checkEvaluation(IsInf(Literal(Float.NegativeInfinity)), true) +checkEvaluation(IsInf(Literal.create(null, DoubleType)), false) +checkEvaluation(IsInf(Literal(Float.MaxValue)), false) +checkEvaluation(IsInf(Literal(5.5f)), false) --- End diff -- check NaN as well?
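The missing NaN case the reviewer asks about is worth pinning down: NaN is neither positive nor negative infinity, so an `IsInf` built on `isInfinite` returns false for it, alongside the existing null case. A quick Java check of those JVM semantics (class name is mine):

```java
public class NanCases {
    public static void main(String[] args) {
        // NaN is not infinite: IsInf should evaluate to false for it.
        System.out.println(Double.isInfinite(Double.NaN));  // false
        System.out.println(Float.isInfinite(Float.NaN));    // false
        // 0.0 / 0.0 produces NaN, not infinity.
        System.out.println(Double.isInfinite(0.0 / 0.0));   // false
        // 1.0 / 0.0 produces positive infinity.
        System.out.println(Double.isInfinite(1.0 / 0.0));   // true
    }
}
```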
[GitHub] spark pull request #21482: [SPARK-24393][SQL] SQL builtin: isinf
Github user henryr commented on a diff in the pull request: https://github.com/apache/spark/pull/21482#discussion_r192521881 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/nullExpressions.scala --- @@ -199,6 +199,50 @@ case class Nvl2(expr1: Expression, expr2: Expression, expr3: Expression, child: override def sql: String = s"$prettyName(${expr1.sql}, ${expr2.sql}, ${expr3.sql})" } +/** + * Evaluates to `true` iff it's Infinity. + */ +@ExpressionDescription( + usage = "_FUNC_(expr) - Returns True evaluates to infinite else returns False ", + examples = """ +Examples: + > SELECT _FUNC_(1/0); + True + > SELECT _FUNC_(5); + False + """) +case class IsInf(child: Expression) extends UnaryExpression + with Predicate with ImplicitCastInputTypes { + + override def inputTypes: Seq[AbstractDataType] = Seq(TypeCollection(DoubleType, FloatType)) + + override def nullable: Boolean = false + + override def eval(input: InternalRow): Boolean = { +val value = child.eval(input) +if (value == null) { + false +} else { + child.dataType match { +case DoubleType => value.asInstanceOf[Double].isInfinity +case FloatType => value.asInstanceOf[Float].isInfinity + } +} + } + + + override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +val eval = child.genCode(ctx) +child.dataType match { + case DoubleType | FloatType => +ev.copy(code = code""" + ${eval.code} + ${CodeGenerator.javaType(dataType)} ${ev.value} = ${CodeGenerator.defaultValue(dataType)}; + ${ev.value} = !${eval.isNull} && Double.isInfinite(${eval.value});""", --- End diff -- out of interest, why use `Double.isInfinite` here, but `value.isInfinity` in the non-codegen version?
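On the reviewer's question: the interpreted path calls Scala's `value.isInfinity` while the generated Java calls the static `Double.isInfinite`; on the JVM these agree, and the static form is the only option inside generated code operating on a primitive. A small Java check of that equivalence (method names here are mine, not Spark's):

```java
public class InfiniteForms {
    // Mirrors the interpreted path: instance-style isInfinite on a boxed value
    // (Scala's `d.isInfinity` compiles down to the same primitive check).
    static boolean instanceStyle(double d) {
        return Double.valueOf(d).isInfinite();
    }

    // Mirrors the generated-code path: the static helper, callable on a bare
    // primitive. It also accepts a float via widening, which is why a single
    // Double.isInfinite call can cover both DoubleType and FloatType inputs.
    static boolean staticStyle(double d) {
        return Double.isInfinite(d);
    }
}
```

Both forms return the same result for every input, so the asymmetry in the diff is cosmetic rather than behavioral.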
[GitHub] spark pull request #21482: [SPARK-24393][SQL] SQL builtin: isinf
Github user henryr commented on a diff in the pull request: https://github.com/apache/spark/pull/21482#discussion_r192520834 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala --- @@ -1107,6 +1107,14 @@ object functions { */ def input_file_name(): Column = withExpr { InputFileName() } + /** + * Return true iff the column is Infinity. + * + * @group normal_funcs + * @since 1.6.0 --- End diff -- Need to fix these versions, here and elsewhere. This change would land in Spark 2.4.0.
[GitHub] spark pull request #21482: [SPARK-24393][SQL] SQL builtin: isinf
Github user henryr commented on a diff in the pull request: https://github.com/apache/spark/pull/21482#discussion_r192520566 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/NullExpressionsSuite.scala --- @@ -24,7 +24,7 @@ import org.apache.spark.sql.catalyst.expressions.objects.AssertNotNull import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, Project} import org.apache.spark.sql.types._ -class NullExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper { + class NullExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper { --- End diff -- Revert this?
[GitHub] spark issue #21482: [SPARK-24393][SQL] SQL builtin: isinf
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21482 **[Test build #91405 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91405/testReport)** for PR 21482 at commit [`bcdaab2`](https://github.com/apache/spark/commit/bcdaab2f8c9c5afc877d3a54f658296aba78fdf0).
[GitHub] spark pull request #21061: [SPARK-23914][SQL] Add array_union function
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21061#discussion_r192520463 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -2189,3 +2189,302 @@ case class ArrayRemove(left: Expression, right: Expression) override def prettyName: String = "array_remove" } + +object ArraySetLike { + private val MAX_ARRAY_LENGTH: Int = ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH + + def toArrayDataInt(hs: OpenHashSet[Int]): ArrayData = { +val array = new Array[Int](hs.size) +var pos = hs.nextPos(0) +var i = 0 +while (pos != OpenHashSet.INVALID_POS) { + array(i) = hs.getValue(pos) + pos = hs.nextPos(pos + 1) + i += 1 +} + +if (useGenericArrayData(LongType.defaultSize, array.length)) { + new GenericArrayData(array) +} else { + UnsafeArrayData.fromPrimitiveArray(array) +} + } + + def toArrayDataLong(hs: OpenHashSet[Long]): ArrayData = { +val array = new Array[Long](hs.size) +var pos = hs.nextPos(0) +var i = 0 +while (pos != OpenHashSet.INVALID_POS) { + array(i) = hs.getValue(pos) + pos = hs.nextPos(pos + 1) + i += 1 +} + +if (useGenericArrayData(LongType.defaultSize, array.length)) { + new GenericArrayData(array) +} else { + UnsafeArrayData.fromPrimitiveArray(array) +} + } + + def useGenericArrayData(elementSize: Int, length: Int): Boolean = { --- End diff -- Shall we move this to `UnsafeArrayData` and reuse it? Maybe the name should be modified to fit the case. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
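The pattern in this diff is: deduplicate elements through a hash set, collect the survivors into a primitive array, then pick the generic (boxed) representation only when the fixed-width unsafe layout would overflow. A rough Java sketch of both halves, with `LinkedHashSet` standing in for `OpenHashSet`; the header arithmetic and the constant value are assumptions for illustration, not `UnsafeArrayData`'s exact layout:

```java
import java.util.LinkedHashSet;

public class ArraySetLikeSketch {
    // Hypothetical stand-in for ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH.
    static final long MAX_ROUNDED_ARRAY_LENGTH = Integer.MAX_VALUE - 15;

    // Mirrors useGenericArrayData: fall back to the generic representation
    // when header + fixed-width values would exceed the maximum array size.
    // The 8-byte header plus one null bit per element is an assumed layout.
    static boolean useGenericArrayData(int elementSize, int length) {
        long headerInBytes = 8L + ((length + 63L) / 64L) * 8L;
        long valueRegionInBytes = (long) elementSize * length;
        return headerInBytes + valueRegionInBytes > MAX_ROUNDED_ARRAY_LENGTH;
    }

    // Mirrors the OpenHashSet-to-primitive-array collection loop: keep each
    // distinct value once, in first-seen order, then copy into an int[].
    static int[] dedup(int[] input) {
        LinkedHashSet<Integer> seen = new LinkedHashSet<>();
        for (int v : input) seen.add(v);
        int[] out = new int[seen.size()];
        int i = 0;
        for (int v : seen) out[i++] = v;
        return out;
    }
}
```

The reviewer's point stands either way: the size check depends only on element size and length, so it belongs on the array container, not on the set-operation helper.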
[GitHub] spark issue #21482: [SPARK-24393][SQL] SQL builtin: isinf
Github user squito commented on the issue: https://github.com/apache/spark/pull/21482 Jenkins, ok to test
[GitHub] spark issue #21482: [SPARK-24393][SQL] SQL builtin: isinf
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/21482 ok to test
[GitHub] spark issue #21482: [SPARK-24393][SQL] SQL builtin: isinf
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21482 Can one of the admins verify this patch?
[GitHub] spark pull request #21482: [SPARK-24393][SQL] SQL builtin: isinf
GitHub user NihalHarish opened a pull request: https://github.com/apache/spark/pull/21482 [SPARK-24393][SQL] SQL builtin: isinf ## What changes were proposed in this pull request? Implemented isinf to test if a float or double value is Infinity. ## How was this patch tested? Unit tests have been added to sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/NullExpressionsSuite.scala You can merge this pull request into a Git repository by running: $ git pull https://github.com/NihalHarish/spark SPARK-24393-SQL-builtin-isinf Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21482.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21482 commit bcdaab2f8c9c5afc877d3a54f658296aba78fdf0 Author: Nihal Harish Date: 2018-06-01T21:23:24Z [SPARK-24393][SQL] SQL builtin: isinf
[GitHub] spark pull request #21479: [SPARK-23903][SQL] Add support for date extract
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21479#discussion_r192516776 --- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 --- @@ -592,6 +592,7 @@ primaryExpression | identifier #columnReference | base=primaryExpression '.' fieldName=identifier #dereference | '(' expression ')' #parenthesizedExpression +| EXTRACT '(' field=(YEAR | QUARTER | MONTH | WEEK | DAY | HOUR | MINUTE | SECOND) FROM source=valueExpression ')' #extract --- End diff -- How about `EXTRACT '(' field=identifier FROM source=valueExpression ')'` instead of introducing each term? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
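The suggestion is to accept any identifier in the grammar and resolve the field by name afterwards, so adding a new field extends a name lookup rather than the parser token set. A sketch of that resolve-by-name step in Java, using `java.time` as a stand-in for Spark's date expressions; the supported field list and error message are illustrative only:

```java
import java.time.LocalDate;
import java.time.temporal.IsoFields;
import java.util.Locale;

public class ExtractSketch {
    // Resolve the EXTRACT field by (case-insensitive) name at analysis time,
    // instead of enumerating YEAR | QUARTER | ... as grammar tokens.
    static int extract(String field, LocalDate date) {
        switch (field.toUpperCase(Locale.ROOT)) {
            case "YEAR":    return date.getYear();
            case "QUARTER": return date.get(IsoFields.QUARTER_OF_YEAR);
            case "MONTH":   return date.getMonthValue();
            case "DAY":     return date.getDayOfMonth();
            default:
                // Unknown names become an analysis-time error, not a parse error.
                throw new IllegalArgumentException("Unsupported extract field: " + field);
        }
    }

    // Convenience overload so callers don't need to construct a LocalDate.
    static int extract(String field, int year, int month, int day) {
        return extract(field, LocalDate.of(year, month, day));
    }
}
```

For example, `extract("quarter", 2018, 6, 1)` resolves the lowercase identifier and returns the second quarter.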