[GitHub] spark pull request #19602: [SPARK-22384][SQL] Refine partition pruning when ...
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/19602#discussion_r192550679 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HiveClientSuite.scala --- @@ -207,65 +271,68 @@ class HiveClientSuite(version: String) } private def testMetastorePartitionFiltering( - filterString: String, + table: String, + filterExpr: Expression, expectedDs: Seq[Int], expectedH: Seq[Int], expectedChunks: Seq[String]): Unit = { testMetastorePartitionFiltering( - filterString, - (expectedDs, expectedH, expectedChunks) :: Nil, + table, + filterExpr, + Map("ds" -> expectedDs, "h" -> expectedH, "chunk" -> expectedChunks) :: Nil, identity) } private def testMetastorePartitionFiltering( - filterString: String, + table: String, + filterExpr: Expression, expectedDs: Seq[Int], expectedH: Seq[Int], expectedChunks: Seq[String], transform: Expression => Expression): Unit = { testMetastorePartitionFiltering( - filterString, - (expectedDs, expectedH, expectedChunks) :: Nil, + table, + filterExpr, + Map("ds" -> expectedDs, "h" -> expectedH, "chunk" -> expectedChunks) :: Nil, identity) } private def testMetastorePartitionFiltering( - filterString: String, - expectedPartitionCubes: Seq[(Seq[Int], Seq[Int], Seq[String])]): Unit = { -testMetastorePartitionFiltering(filterString, expectedPartitionCubes, identity) + table: String, + filterExpr: Expression, + expectedPartitionCubes: Seq[Map[String, Seq[Any]]]): Unit = { +testMetastorePartitionFiltering(table, filterExpr, expectedPartitionCubes, identity) } private def testMetastorePartitionFiltering( - filterString: String, - expectedPartitionCubes: Seq[(Seq[Int], Seq[Int], Seq[String])], + table: String, + filterExpr: Expression, + expectedPartitionCubes: Seq[Map[String, Seq[Any]]], transform: Expression => Expression): Unit = { -val filteredPartitions = client.getPartitionsByFilter(client.getTable("default", "test"), +val filteredPartitions = 
client.getPartitionsByFilter(client.getTable("default", table), Seq( -transform(parseExpression(filterString)) +transform(filterExpr) )) -val expectedPartitionCount = expectedPartitionCubes.map { - case (expectedDs, expectedH, expectedChunks) => -expectedDs.size * expectedH.size * expectedChunks.size -}.sum - -val expectedPartitions = expectedPartitionCubes.map { - case (expectedDs, expectedH, expectedChunks) => -for { - ds <- expectedDs - h <- expectedH - chunk <- expectedChunks -} yield Set( - "ds" -> ds.toString, - "h" -> h.toString, - "chunk" -> chunk -) -}.reduce(_ ++ _) +val expectedPartitionCount = expectedPartitionCubes.map(_.map(_._2.size).product).sum + +val expectedPartitions = expectedPartitionCubes.map(getPartitionsFromCube(_)).reduce(_ ++ _) val actualFilteredPartitionCount = filteredPartitions.size assert(actualFilteredPartitionCount == expectedPartitionCount, s"Expected $expectedPartitionCount partitions but got $actualFilteredPartitionCount") -assert(filteredPartitions.map(_.spec.toSet).toSet == expectedPartitions.toSet) +assert(filteredPartitions.map(_.spec).toSet == expectedPartitions.toSet) + } + + private def getPartitionsFromCube(cube: Map[String, Seq[Any]]): Seq[Map[String, String]] = { +cube.map { + case (k: String, pts: Seq[Any]) => pts.map(pt => (k, pt.toString)) +}.foldLeft(Seq(Seq[(String, String)]()))((seq0, seq1) => { --- End diff -- Hmm, it's a recursion problem. I tried to use loop state directly, but it didn't become more readable. In the current change, I extracted a `PartitionSpec` type and added a comment. I think it's better now? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
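[Editor's note] The helper under review expands a partition "cube" (a map of column name to candidate values) into every concrete partition spec. A minimal sketch of that cross-product logic, in Python for illustration rather than the Scala `getPartitionsFromCube` being reviewed (`partitions_from_cube` is a hypothetical stand-in, using `itertools.product` where the Scala code uses `foldLeft`):

```python
from itertools import product

def partitions_from_cube(cube):
    """Expand a cube {column: [values]} into the cross product of
    per-column values, each combination rendered as a partition spec
    {column: str(value)}."""
    keys = list(cube)
    return [
        {k: str(v) for k, v in zip(keys, combo)}
        for combo in product(*(cube[k] for k in keys))
    ]

# 1 * 2 * 2 = 4 specs, matching the expectedPartitionCount computation
specs = partitions_from_cube({"ds": [20170101], "h": [0, 1], "chunk": ["aa", "ab"]})
```

The product of the per-column list sizes gives the expected partition count, which is exactly what `expectedPartitionCubes.map(_.map(_._2.size).product).sum` computes in the diff.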
[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19602 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3775/ Test PASSed.
[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19602 Merged build finished. Test PASSed.
[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19602 **[Test build #91413 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91413/testReport)** for PR 19602 at commit [`e4c6e1f`](https://github.com/apache/spark/commit/e4c6e1ff713a7033b0a60dabaca5071b480d7600).
[GitHub] spark pull request #19602: [SPARK-22384][SQL] Refine partition pruning when ...
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/19602#discussion_r192550486 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HiveClientSuite.scala --- @@ -59,38 +61,62 @@ class HiveClientSuite(version: String) "h" -> h.toString, "chunk" -> chunk ), storageFormat) -assert(partitions.size == testPartitionCount) +assert(partitions0.size == testPartitionCount0) client.createPartitions( - "default", "test", partitions, ignoreIfExists = false) + "default", "test0", partitions0, ignoreIfExists = false) + +val partitions1 = + for { +pt <- 0 until 10 +chunk <- Seq("aa", "ab", "ba", "bb") + } yield CatalogTablePartition(Map( +"pt" -> pt.toString, +"chunk" -> chunk + ), storageFormat) +assert(partitions1.size == testPartitionCount1) + +client.createPartitions( + "default", "test1", partitions1, ignoreIfExists = false) + client } + private def pAttr(table: String, name: String): Attribute = { +val partTypes = client.getTable("default", table).partitionSchema.fields +.map(field => (field.name, field.dataType)).toMap +partTypes.get(name) match { + case Some(dt) => AttributeReference(name, dt)() + case None => +fail(s"Illegal name of partition attribute: $name") +} + } + override def beforeAll() { super.beforeAll() client = init(true) } test(s"getPartitionsByFilter returns all partitions when $tryDirectSqlKey=false") { val client = init(false) -val filteredPartitions = client.getPartitionsByFilter(client.getTable("default", "test"), - Seq(parseExpression("ds=20170101"))) +val filteredPartitions = client.getPartitionsByFilter(client.getTable("default", "test0"), + Seq(EqualTo(pAttr("test0", "ds"), Literal(20170101, IntegerType --- End diff -- Thanks, with `org.apache.spark.sql.catalyst.dsl.expressions._`, code can be much cleaner. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21479: [SPARK-23903][SQL] Add support for date extract
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21479 **[Test build #91412 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91412/testReport)** for PR 21479 at commit [`0ad3dd7`](https://github.com/apache/spark/commit/0ad3dd75bc1a74ca88c9ace8899fd2729aaa16b5).
[GitHub] spark issue #21479: [SPARK-23903][SQL] Add support for date extract
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21479 Merged build finished. Test PASSed.
[GitHub] spark issue #21479: [SPARK-23903][SQL] Add support for date extract
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21479 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3774/ Test PASSed.
[GitHub] spark issue #21479: [SPARK-23903][SQL] Add support for date extract
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/21479 Jenkins, retest this please.
[GitHub] spark issue #21483: [SPARK-24454][ML][PYTHON] Imports image module in ml/__i...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21483 Merged build finished. Test PASSed.
[GitHub] spark issue #21483: [SPARK-24454][ML][PYTHON] Imports image module in ml/__i...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21483 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91411/ Test PASSed.
[GitHub] spark issue #21483: [SPARK-24454][ML][PYTHON] Imports image module in ml/__i...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21483 **[Test build #91411 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91411/testReport)** for PR 21483 at commit [`4b0be58`](https://github.com/apache/spark/commit/4b0be58db843609c2f7fece7becb5187b9086155). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21483: [SPARK-24454][ML][PYTHON] Imports image module in ml/__i...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21483 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3773/ Test PASSed.
[GitHub] spark issue #21483: [SPARK-24454][ML][PYTHON] Imports image module in ml/__i...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21483 Merged build finished. Test PASSed.
[GitHub] spark pull request #21482: [SPARK-24393][SQL] SQL builtin: isinf
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21482#discussion_r192549119 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/nullExpressions.scala --- @@ -199,6 +199,50 @@ case class Nvl2(expr1: Expression, expr2: Expression, expr3: Expression, child: override def sql: String = s"$prettyName(${expr1.sql}, ${expr2.sql}, ${expr3.sql})" } +/** + * Evaluates to `true` iff it's Infinity. + */ +@ExpressionDescription( + usage = "_FUNC_(expr) - Returns True if expr evaluates to infinite else returns False ", --- End diff -- True -> true, False -> false to be consistent
[GitHub] spark pull request #21482: [SPARK-24393][SQL] SQL builtin: isinf
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21482#discussion_r192549111 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/nullExpressions.scala --- @@ -199,6 +199,50 @@ case class Nvl2(expr1: Expression, expr2: Expression, expr3: Expression, child: override def sql: String = s"$prettyName(${expr1.sql}, ${expr2.sql}, ${expr3.sql})" } +/** + * Evaluates to `true` iff it's Infinity. + */ +@ExpressionDescription( + usage = "_FUNC_(expr) - Returns True if expr evaluates to infinite else returns False ", + examples = """ +Examples: + > SELECT _FUNC_(1/0); + True --- End diff -- Can you run the example and check the results?
[GitHub] spark issue #21483: [SPARK-24454][ML][PYTHON] Imports image module in ml/__i...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21483 **[Test build #91411 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91411/testReport)** for PR 21483 at commit [`4b0be58`](https://github.com/apache/spark/commit/4b0be58db843609c2f7fece7becb5187b9086155).
[GitHub] spark issue #21483: [SPARK-24454][ML][PYTHON] Imports image module in ml/__i...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21483 cc @mengxr, I guess this change is what you actually intended?
[GitHub] spark pull request #21483: [SPARK-24454][ML][PYTHON] Imports image module in...
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/21483 [SPARK-24454][ML][PYTHON] Imports image module in ml/__init__.py and add ImageSchema into __all__ ## What changes were proposed in this pull request? This PR attaches image APIs to ml module too to more expose this. ## How was this patch tested? Before: ```python >>> from pyspark import ml >>> ml.image Traceback (most recent call last): File "<stdin>", line 1, in <module> AttributeError: 'module' object has no attribute 'image' >>> ml.image.ImageSchema Traceback (most recent call last): File "<stdin>", line 1, in <module> AttributeError: 'module' object has no attribute 'image' ``` ```python >>> "ImageSchema" in globals() False >>> from pyspark.ml import * >>> "ImageSchema" in globals() False >>> ImageSchema Traceback (most recent call last): File "<stdin>", line 1, in <module> NameError: name 'ImageSchema' is not defined ``` After: ```python >>> from pyspark import ml >>> ml.image >>> ml.image.ImageSchema ``` ```python >>> "ImageSchema" in globals() False >>> from pyspark.ml import * >>> "ImageSchema" in globals() True >>> ImageSchema ``` You can merge this pull request into a Git repository by running: $ git pull https://github.com/HyukjinKwon/spark SPARK-24454 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21483.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21483 commit 4b0be58db843609c2f7fece7becb5187b9086155 Author: hyukjinkwon Date: 2018-06-02T04:07:20Z Imports image module in ml/__init__.py and add ImageSchema into __all__
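[Editor's note] The mechanism behind this fix can be shown with stdlib modules alone (using `unittest`/`unittest.mock` as stand-ins for `pyspark.ml`/`pyspark.ml.image`, since pyspark is not assumed to be installed): a submodule only appears as an attribute of its parent package once something imports it, which is what adding `from . import image` to `ml/__init__.py` accomplishes.

```python
import importlib

# Importing a parent package does not, by itself, expose its submodules.
parent = importlib.import_module("unittest")

# Importing the submodule binds it as an attribute on the parent package,
# just as `from . import image` in ml/__init__.py binds `ml.image`.
sub = importlib.import_module("unittest.mock")
assert parent.mock is sub
```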
[GitHub] spark issue #21479: [SPARK-23903][SQL] Add support for date extract
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21479 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91409/ Test FAILed.
[GitHub] spark issue #21479: [SPARK-23903][SQL] Add support for date extract
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21479 Merged build finished. Test FAILed.
[GitHub] spark issue #21479: [SPARK-23903][SQL] Add support for date extract
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21479 **[Test build #91409 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91409/testReport)** for PR 21479 at commit [`0ad3dd7`](https://github.com/apache/spark/commit/0ad3dd75bc1a74ca88c9ace8899fd2729aaa16b5). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21370: [SPARK-24215][PySpark] Implement _repr_html_ for datafra...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21370 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3772/ Test PASSed.
[GitHub] spark issue #21370: [SPARK-24215][PySpark] Implement _repr_html_ for datafra...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21370 Merged build finished. Test PASSed.
[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r192548464 --- Diff: docs/configuration.md --- @@ -456,6 +456,33 @@ Apart from these, the following properties are also available, and may be useful from JVM to Python worker for every task. + + spark.sql.repl.eagerEval.enabled + false + +Enable eager evaluation or not. If true and REPL you are using supports eager evaluation, +dataframe will be ran automatically. HTML table will feedback the queries user have defined if --- End diff -- The HTML table is generated by `_repr_html_`, which isn't a Jupyter-only term. `_repr_html_` is the rich display support for IPython in notebooks and the Qt console. I think it can be used in other places, but currently I have only tested this in Jupyter. I rewrote the doc; please check whether it is appropriate, thanks.
[GitHub] spark issue #21370: [SPARK-24215][PySpark] Implement _repr_html_ for datafra...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21370 **[Test build #91410 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91410/testReport)** for PR 21370 at commit [`5b36604`](https://github.com/apache/spark/commit/5b3660458945eb318b51b327fcaf10dc94dde82e).
[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r192548359 --- Diff: docs/configuration.md --- @@ -456,6 +456,33 @@ Apart from these, the following properties are also available, and may be useful from JVM to Python worker for every task. + + spark.sql.repl.eagerEval.enabled + false + +Enable eager evaluation or not. If true and REPL you are using supports eager evaluation, +dataframe will be ran automatically. HTML table will feedback the queries user have defined if +_repl_html_ called by notebooks like Jupyter, otherwise for plain Python REPL, output +will be shown like dataframe.show() --- End diff -- Thanks, done in 5b36604.
[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r192548352 --- Diff: docs/configuration.md --- @@ -456,6 +456,33 @@ Apart from these, the following properties are also available, and may be useful from JVM to Python worker for every task. + + spark.sql.repl.eagerEval.enabled + false + +Enable eager evaluation or not. If true and REPL you are using supports eager evaluation, +dataframe will be ran automatically. HTML table will feedback the queries user have defined if +_repl_html_ called by notebooks like Jupyter, otherwise for plain Python REPL, output +will be shown like dataframe.show() +(see https://issues.apache.org/jira/browse/SPARK-24215;>SPARK-24215 for more details). + + + + spark.sql.repl.eagerEval.maxNumRows + 20 + +Default number of rows in eager evaluation output HTML table generated by _repr_html_ or plain text, +this only take effect when spark.sql.repl.eagerEval.enabled set to true. --- End diff -- Thanks, done in 5b36604.
[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r192548361 --- Diff: docs/configuration.md --- @@ -456,6 +456,33 @@ Apart from these, the following properties are also available, and may be useful from JVM to Python worker for every task. + + spark.sql.repl.eagerEval.enabled + false + +Enable eager evaluation or not. If true and REPL you are using supports eager evaluation, --- End diff -- Thanks, done in 5b36604.
[GitHub] spark pull request #21258: [SPARK-23933][SQL] Add map_from_arrays function
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/21258#discussion_r192548230 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala --- @@ -235,6 +235,86 @@ case class CreateMap(children: Seq[Expression]) extends Expression { override def prettyName: String = "map" } +/** + * Returns a catalyst Map containing the two arrays in children expressions as keys and values. + */ +@ExpressionDescription( + usage = """ +_FUNC_(keys, values) - Creates a map with a pair of the given key/value arrays. All elements + in keys should not be null""", + examples = """ +Examples: + > SELECT _FUNC_([1.0, 3.0], ['2', '4']); + {1.0:"2",3.0:"4"} + """, since = "2.4.0") +case class CreateMapFromArrays(left: Expression, right: Expression) +extends BinaryExpression with ExpectsInputTypes { + + override def inputTypes: Seq[AbstractDataType] = Seq(ArrayType, ArrayType) + + override def checkInputDataTypes(): TypeCheckResult = { +(left.dataType, right.dataType) match { + case (ArrayType(_, _), ArrayType(_, _)) => +TypeCheckResult.TypeCheckSuccess + case _ => +TypeCheckResult.TypeCheckFailure("The given two arguments should be an array") +} + } + + override def dataType: DataType = { +MapType( + keyType = left.dataType.asInstanceOf[ArrayType].elementType, + valueType = right.dataType.asInstanceOf[ArrayType].elementType, + valueContainsNull = right.dataType.asInstanceOf[ArrayType].containsNull) + } + + override def nullable: Boolean = left.nullable || right.nullable + + override def nullSafeEval(keyArray: Any, valueArray: Any): Any = { +val keyArrayData = keyArray.asInstanceOf[ArrayData] +val valueArrayData = valueArray.asInstanceOf[ArrayData] +if (keyArrayData.numElements != valueArrayData.numElements) { + throw new RuntimeException("The given two arrays should have the same length") +} +val leftArrayType = left.dataType.asInstanceOf[ArrayType] +if (leftArrayType.containsNull) { + if 
(keyArrayData.toArray(leftArrayType.elementType).contains(null)) { +throw new RuntimeException("Cannot use null as map key!") + } +} +new ArrayBasedMapData(keyArrayData.copy(), valueArrayData.copy()) + } + + override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +nullSafeCodeGen(ctx, ev, (keyArrayData, valueArrayData) => { + val arrayBasedMapData = classOf[ArrayBasedMapData].getName + val leftArrayType = left.dataType.asInstanceOf[ArrayType] + val keyArrayElemNullCheck = if (!leftArrayType.containsNull) "" else { +val leftArrayTypeTerm = ctx.addReferenceObj("leftArrayType", leftArrayType.elementType) +val array = ctx.freshName("array") +val i = ctx.freshName("i") +s""" + |Object[] $array = $keyArrayData.toObjectArray($leftArrayTypeTerm); + |for (int $i = 0; $i < $array.length; $i++) { + | if ($array[$i] == null) { + |throw new RuntimeException("Cannot use null as map key!"); + | } + |} --- End diff -- good catch, thanks
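[Editor's note] The semantics being reviewed for `map_from_arrays` (reject mismatched array lengths, reject null keys, then pair keys with values) can be sketched in a few lines of Python; this is an illustration of the contract, not the Scala/codegen implementation:

```python
def map_from_arrays(keys, values):
    """Build a map from parallel key/value arrays, mirroring the checks
    in the reviewed CreateMapFromArrays expression."""
    if len(keys) != len(values):
        raise ValueError("The given two arrays should have the same length")
    if any(k is None for k in keys):
        raise ValueError("Cannot use null as map key!")
    return dict(zip(keys, values))

result = map_from_arrays([1.0, 3.0], ["2", "4"])  # {1.0: '2', 3.0: '4'}
```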
[GitHub] spark pull request #21258: [SPARK-23933][SQL] Add map_from_arrays function
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/21258#discussion_r192548103 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala --- @@ -235,6 +235,86 @@ case class CreateMap(children: Seq[Expression]) extends Expression { override def prettyName: String = "map" } +/** + * Returns a catalyst Map containing the two arrays in children expressions as keys and values. + */ +@ExpressionDescription( + usage = """ +_FUNC_(keys, values) - Creates a map with a pair of the given key/value arrays. All elements + in keys should not be null""", + examples = """ +Examples: + > SELECT _FUNC_([1.0, 3.0], ['2', '4']); + {1.0:"2",3.0:"4"} + """, since = "2.4.0") +case class CreateMapFromArrays(left: Expression, right: Expression) --- End diff -- In existing convention, `"map" -> "CreateMap"`. How about `"map_from_arrays" -> ???`? I am neutral on `MapFromArrays` or `CreateMapFromArrays`. WDYT? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21258: [SPARK-23933][SQL] Add map_from_arrays function
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/21258#discussion_r192547906 --- Diff: python/pyspark/sql/functions.py --- @@ -1798,6 +1798,22 @@ def create_map(*cols): return Column(jc) +@ignore_unicode_prefix +@since(2.4) +def create_map_from_arrays(col1, col2): --- End diff -- Sure
[GitHub] spark pull request #21482: [SPARK-24393][SQL] SQL builtin: isinf
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21482#discussion_r192547440 --- Diff: R/pkg/R/functions.R --- @@ -907,6 +907,30 @@ setMethod("initcap", column(jc) }) +#' @details +#' \code{isinf}: Returns true if the column is Infinity. +#' @rdname column_nonaggregate_functions +#' @aliases isnan isnan,Column-method +#' @note isinf since 2.4.0 +setMethod("isinf", + signature(x = "Column"), + function(x) { +jc <- callJStatic("org.apache.spark.sql.functions", "isinf", x@jc) +column(jc) + }) + +#' @details +#' \code{isInf}: Returns true if the column is Infinity. +#' @rdname column_nonaggregate_functions +#' @aliases isnan isnan,Column-method +#' @note isinf since 2.4.0 +setMethod("isInf", --- End diff -- R has `is.infinite`. Can we match the behaviour and rename it?
[GitHub] spark pull request #21482: [SPARK-24393][SQL] SQL builtin: isinf
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21482#discussion_r192547376 --- Diff: R/pkg/R/functions.R --- @@ -907,6 +907,30 @@ setMethod("initcap", column(jc) }) +#' @details +#' \code{isinf}: Returns true if the column is Infinity. +#' @rdname column_nonaggregate_functions +#' @aliases isnan isnan,Column-method +#' @note isinf since 2.4.0 +setMethod("isinf", --- End diff -- R has `is.infinite`. Can we match the behaviour and rename it?
[GitHub] spark pull request #21482: [SPARK-24393][SQL] SQL builtin: isinf
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21482#discussion_r192547364 --- Diff: python/pyspark/sql/functions.py --- @@ -468,6 +468,18 @@ def input_file_name(): return Column(sc._jvm.functions.input_file_name()) +@since(2.4) +def isinf(col): +"""An expression that returns true iff the column is NaN. --- End diff -- Ditto. Is this the same as `isnan`?
[GitHub] spark pull request #21482: [SPARK-24393][SQL] SQL builtin: isinf
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21482#discussion_r192547355 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Column.scala --- @@ -557,6 +557,14 @@ class Column(val expr: Expression) extends Logging { (this >= lowerBound) && (this <= upperBound) } + /** + * True if the current expression is NaN. --- End diff -- Is this the same as `isNaN`?
[GitHub] spark pull request #21482: [SPARK-24393][SQL] SQL builtin: isinf
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21482#discussion_r192547274 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala --- @@ -1107,6 +1107,14 @@ object functions { */ def input_file_name(): Column = withExpr { InputFileName() } + /** + * Return true iff the column is Infinity. + * + * @group normal_funcs + * @since 2.4.0 + */ + def isinf(e: Column): Column = withExpr { IsInf(e.expr) } --- End diff -- Mind if I ask you to elaborate on `isinf` vs `isInf` across the APIs?
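[Editor's note] Several of the review comments above flag docstrings that say "is NaN" on the new infinity check. The distinction matters because infinity and NaN are different IEEE-754 special values, illustrated here with Python's `math` module rather than the Spark functions under review:

```python
import math

# Infinity and NaN are disjoint special values: isinf is true only for
# +/-inf, isnan only for NaN, and neither for ordinary finite numbers.
values = [float("inf"), float("-inf"), float("nan"), 1.0]
inf_flags = [math.isinf(v) for v in values]  # [True, True, False, False]
nan_flags = [math.isnan(v) for v in values]  # [False, False, True, False]
```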
[GitHub] spark issue #21479: [SPARK-23903][SQL] Add support for date extract
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21479 Merged build finished. Test PASSed.
[GitHub] spark issue #21479: [SPARK-23903][SQL] Add support for date extract
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21479 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91407/ Test PASSed.
[GitHub] spark issue #21479: [SPARK-23903][SQL] Add support for date extract
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21479 **[Test build #91407 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91407/testReport)** for PR 21479 at commit [`c9d2bc3`](https://github.com/apache/spark/commit/c9d2bc348495669bd4347679547f1437f35367f1). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20697 Merged build finished. Test PASSed.
[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20697 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91406/
[GitHub] spark pull request #21061: [SPARK-23914][SQL] Add array_union function
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/21061#discussion_r192546226
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -2189,3 +2189,302 @@ case class ArrayRemove(left: Expression, right: Expression)
   override def prettyName: String = "array_remove"
 }
+
+object ArraySetLike {
+  private val MAX_ARRAY_LENGTH: Int = ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH
+
+  def toArrayDataInt(hs: OpenHashSet[Int]): ArrayData = {
+    val array = new Array[Int](hs.size)
+    var pos = hs.nextPos(0)
+    var i = 0
+    while (pos != OpenHashSet.INVALID_POS) {
+      array(i) = hs.getValue(pos)
+      pos = hs.nextPos(pos + 1)
+      i += 1
+    }
+
+    if (useGenericArrayData(LongType.defaultSize, array.length)) {
+      new GenericArrayData(array)
+    } else {
+      UnsafeArrayData.fromPrimitiveArray(array)
+    }
+  }
+
+  def toArrayDataLong(hs: OpenHashSet[Long]): ArrayData = {
+    val array = new Array[Long](hs.size)
+    var pos = hs.nextPos(0)
+    var i = 0
+    while (pos != OpenHashSet.INVALID_POS) {
+      array(i) = hs.getValue(pos)
+      pos = hs.nextPos(pos + 1)
+      i += 1
+    }
+
+    if (useGenericArrayData(LongType.defaultSize, array.length)) {
+      new GenericArrayData(array)
+    } else {
+      UnsafeArrayData.fromPrimitiveArray(array)
+    }
+  }
+
+  def useGenericArrayData(elementSize: Int, length: Int): Boolean = {
--- End diff --
Although I tried it, I stopped reusing it: `UnsafeArrayData.fromPrimitiveArray()` also uses variables (e.g. `headerInBytes` and `valueRegionInBytes`) calculated in this method, and there is no typical way to return multiple values from a function. We could move this method into `UnsafeArrayData`, but it is still not easy to reuse. WDYT?
```
private static UnsafeArrayData fromPrimitiveArray(
    Object arr, int offset, int length, int elementSize) {
  final long headerInBytes = calculateHeaderPortionInBytes(length);
  final long valueRegionInBytes = elementSize * length;
  final long totalSizeInLongs = (headerInBytes + valueRegionInBytes + 7) / 8;
  if (totalSizeInLongs > Integer.MAX_VALUE / 8) {
    throw new UnsupportedOperationException("Cannot convert this array to unsafe format as " +
      "it's too big.");
  }

  final long[] data = new long[(int) totalSizeInLongs];

  Platform.putLong(data, Platform.LONG_ARRAY_OFFSET, length);
  Platform.copyMemory(arr, offset, data,
    Platform.LONG_ARRAY_OFFSET + headerInBytes, valueRegionInBytes);

  UnsafeArrayData result = new UnsafeArrayData();
  result.pointTo(data, Platform.LONG_ARRAY_OFFSET, (int) totalSizeInLongs * 8);
  return result;
}
```
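The overflow guard in the `fromPrimitiveArray` snippet above can be modeled in Python. The header layout used here (an 8-byte element count plus a null bitmap rounded up to whole 8-byte words) is an assumption about `calculateHeaderPortionInBytes`, so treat this as a sketch of the guard's logic rather than the authoritative format.

```python
INT_MAX = 2**31 - 1  # Java Integer.MAX_VALUE

def header_portion_in_bytes(length):
    # Assumed layout: 8 bytes for numElements, plus a null bitmap of
    # one bit per element rounded up to whole 8-byte words.
    return 8 + ((length + 63) // 64) * 8

def fits_in_unsafe_array(element_size, length):
    """Replicates the guard: the total size in 8-byte longs must not
    exceed Integer.MAX_VALUE / 8, otherwise the array is 'too big'
    to convert to the unsafe format."""
    header_in_bytes = header_portion_in_bytes(length)
    value_region_in_bytes = element_size * length
    total_size_in_longs = (header_in_bytes + value_region_in_bytes + 7) // 8
    return total_size_in_longs <= INT_MAX // 8
```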
[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20697 **[Test build #91406 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91406/testReport)** for PR 20697 at commit [`4c5677a`](https://github.com/apache/spark/commit/4c5677a61fd940b818d81469e6640cb45f00ce58). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21482: [SPARK-24393][SQL] SQL builtin: isinf
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21482 Merged build finished. Test PASSed.
[GitHub] spark issue #21482: [SPARK-24393][SQL] SQL builtin: isinf
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21482 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91408/
[GitHub] spark issue #21482: [SPARK-24393][SQL] SQL builtin: isinf
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21482 **[Test build #91408 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91408/testReport)** for PR 21482 at commit [`9ab0eb2`](https://github.com/apache/spark/commit/9ab0eb24295c20e564817d69b3b3315d9b2a3359). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21481: [SPARK-24452][SQL][Core] Avoid possible overflow in int ...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/21481 cc @ueshin @hvanhovell
[GitHub] spark issue #21482: [SPARK-24393][SQL] SQL builtin: isinf
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21482 Merged build finished. Test PASSed.
[GitHub] spark issue #21482: [SPARK-24393][SQL] SQL builtin: isinf
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21482 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91405/
[GitHub] spark issue #21482: [SPARK-24393][SQL] SQL builtin: isinf
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21482 **[Test build #91405 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91405/testReport)** for PR 21482 at commit [`bcdaab2`](https://github.com/apache/spark/commit/bcdaab2f8c9c5afc877d3a54f658296aba78fdf0). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class IsInf(child: Expression) extends UnaryExpression`
[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total size of states in HDFSBac...
Github user jose-torres commented on the issue: https://github.com/apache/spark/pull/21469 LGTM. To clarify the description: do we expect the memory footprint to be much larger than what the query status reports in situations where the state store is getting a lot of updates?
[GitHub] spark issue #21479: [SPARK-23903][SQL] Add support for date extract
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21479 Merged build finished. Test PASSed.
[GitHub] spark issue #21481: [SPARK-24452][SQL][Core] Avoid possible overflow in int ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21481 Merged build finished. Test PASSed.
[GitHub] spark issue #21481: [SPARK-24452][SQL][Core] Avoid possible overflow in int ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21481 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91404/
[GitHub] spark issue #21479: [SPARK-23903][SQL] Add support for date extract
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21479 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3771/
[GitHub] spark issue #21481: [SPARK-24452][SQL][Core] Avoid possible overflow in int ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21481 **[Test build #91404 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91404/testReport)** for PR 21481 at commit [`324fd5c`](https://github.com/apache/spark/commit/324fd5ccb73c8017f5537031db21b687ac1ca27a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21479: [SPARK-23903][SQL] Add support for date extract
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21479 **[Test build #91409 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91409/testReport)** for PR 21479 at commit [`0ad3dd7`](https://github.com/apache/spark/commit/0ad3dd75bc1a74ca88c9ace8899fd2729aaa16b5).
[GitHub] spark issue #21346: [SPARK-6237][NETWORK] Network-layer changes to allow str...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21346 **[Test build #4194 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4194/testReport)** for PR 21346 at commit [`83c3271`](https://github.com/apache/spark/commit/83c3271d2f45bbef18d865bddbc6807e9fbd2503). * This patch **fails Spark unit tests**. * This patch **does not merge cleanly**. * This patch adds no public classes.
[GitHub] spark pull request #21482: [SPARK-24393][SQL] SQL builtin: isinf
Github user NihalHarish commented on a diff in the pull request: https://github.com/apache/spark/pull/21482#discussion_r192541712 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/nullExpressions.scala --- @@ -199,6 +199,50 @@ case class Nvl2(expr1: Expression, expr2: Expression, expr3: Expression, child: override def sql: String = s"$prettyName(${expr1.sql}, ${expr2.sql}, ${expr3.sql})" } +/** + * Evaluates to `true` iff it's Infinity. + */ +@ExpressionDescription( + usage = "_FUNC_(expr) - Returns True evaluates to infinite else returns False ", + examples = """ +Examples: + > SELECT _FUNC_(1/0); + True + > SELECT _FUNC_(5); + False + """) +case class IsInf(child: Expression) extends UnaryExpression + with Predicate with ImplicitCastInputTypes { + + override def inputTypes: Seq[AbstractDataType] = Seq(TypeCollection(DoubleType, FloatType)) + + override def nullable: Boolean = false + + override def eval(input: InternalRow): Boolean = { +val value = child.eval(input) +if (value == null) { + false +} else { + child.dataType match { +case DoubleType => value.asInstanceOf[Double].isInfinity +case FloatType => value.asInstanceOf[Float].isInfinity + } +} + } + + + override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +val eval = child.genCode(ctx) +child.dataType match { + case DoubleType | FloatType => +ev.copy(code = code""" + ${eval.code} + ${CodeGenerator.javaType(dataType)} ${ev.value} = ${CodeGenerator.defaultValue(dataType)}; + ${ev.value} = !${eval.isNull} && Double.isInfinite(${eval.value});""", --- End diff -- The non-codegen version uses the isInfinity method defined for scala's Double and Float, whereas the codegen version uses java's static method "isInfinite" defined for the classes Double and Float.
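The point above, that the codegen path calls `Double.isInfinite` even for `FloatType` children, is safe because widening a single-precision value to double precision preserves infiniteness. A small Python sketch (purely illustrative; the `as_float32` helper is a hypothetical stand-in for a FloatType value) can check that property:

```python
import math
import struct

def as_float32(x):
    """Round-trip x through IEEE-754 single precision, modeling a
    FloatType value before it is widened to double precision."""
    return struct.unpack('f', struct.pack('f', x))[0]

def is_infinite_after_widening(x):
    # Widening float -> double preserves infinity (and NaN stays NaN),
    # so one double-precision infinity check covers both input types.
    return math.isinf(float(as_float32(x)))
```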
[GitHub] spark pull request #21467: [SPARK-23754][PYTHON][FOLLOWUP] Move UDF stop ite...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21467#discussion_r192539912 --- Diff: python/pyspark/util.py --- @@ -53,16 +53,11 @@ def _get_argspec(f): """ Get argspec of a function. Supports both Python 2 and Python 3. """ - -if hasattr(f, '_argspec'): -# only used for pandas UDF: they wrap the user function, losing its signature -# workers need this signature, so UDF saves it here -argspec = f._argspec -elif sys.version_info[0] < 3: +# `getargspec` is deprecated since python3.0 (incompatible with function annotations). --- End diff -- yea, I think the comment is for the else block.
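For context on the comment-placement discussion above, the version branch it documents looks roughly like the following sketch (the real helper lives in `pyspark/util.py`; this simplified version omits the pandas UDF `_argspec` handling):

```python
import inspect
import sys

def get_argspec(f):
    """Get the argspec of a function on both Python 2 and Python 3.

    `inspect.getargspec` is deprecated since Python 3.0 because it
    cannot represent function annotations or keyword-only arguments,
    so the Python 3 branch uses `getfullargspec` instead.
    """
    if sys.version_info[0] < 3:
        return inspect.getargspec(f)  # Python 2 only
    return inspect.getfullargspec(f)
```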
[GitHub] spark issue #21467: [SPARK-23754][PYTHON][FOLLOWUP] Move UDF stop iteration ...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/21467 @e-dorigatti I see. Thanks.
[GitHub] spark pull request #21258: [SPARK-23933][SQL] Add map_from_arrays function
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21258#discussion_r192536919 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ComplexTypeSuite.scala --- @@ -186,6 +186,50 @@ class ComplexTypeSuite extends SparkFunSuite with ExpressionEvalHelper { } } + test("CreateMapFromArrays") { --- End diff -- `MapFromArrays`?
[GitHub] spark pull request #21258: [SPARK-23933][SQL] Add map_from_arrays function
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21258#discussion_r192535551 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala --- @@ -235,6 +235,86 @@ case class CreateMap(children: Seq[Expression]) extends Expression { override def prettyName: String = "map" } +/** + * Returns a catalyst Map containing the two arrays in children expressions as keys and values. + */ +@ExpressionDescription( + usage = """ +_FUNC_(keys, values) - Creates a map with a pair of the given key/value arrays. All elements + in keys should not be null""", + examples = """ +Examples: + > SELECT _FUNC_([1.0, 3.0], ['2', '4']); + {1.0:"2",3.0:"4"} + """, since = "2.4.0") +case class CreateMapFromArrays(left: Expression, right: Expression) --- End diff -- `MapFromArrays`?
[GitHub] spark pull request #21258: [SPARK-23933][SQL] Add map_from_arrays function
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21258#discussion_r192535592 --- Diff: python/pyspark/sql/functions.py --- @@ -1798,6 +1798,22 @@ def create_map(*cols): return Column(jc) +@ignore_unicode_prefix +@since(2.4) +def create_map_from_arrays(col1, col2): --- End diff -- `map_from_arrays`?
[GitHub] spark pull request #21258: [SPARK-23933][SQL] Add map_from_arrays function
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21258#discussion_r192536842 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala --- @@ -235,6 +235,86 @@ case class CreateMap(children: Seq[Expression]) extends Expression { override def prettyName: String = "map" } +/** + * Returns a catalyst Map containing the two arrays in children expressions as keys and values. + */ +@ExpressionDescription( + usage = """ +_FUNC_(keys, values) - Creates a map with a pair of the given key/value arrays. All elements + in keys should not be null""", + examples = """ +Examples: + > SELECT _FUNC_([1.0, 3.0], ['2', '4']); + {1.0:"2",3.0:"4"} + """, since = "2.4.0") +case class CreateMapFromArrays(left: Expression, right: Expression) +extends BinaryExpression with ExpectsInputTypes { + + override def inputTypes: Seq[AbstractDataType] = Seq(ArrayType, ArrayType) + + override def checkInputDataTypes(): TypeCheckResult = { +(left.dataType, right.dataType) match { + case (ArrayType(_, _), ArrayType(_, _)) => +TypeCheckResult.TypeCheckSuccess + case _ => +TypeCheckResult.TypeCheckFailure("The given two arguments should be an array") +} + } + + override def dataType: DataType = { +MapType( + keyType = left.dataType.asInstanceOf[ArrayType].elementType, + valueType = right.dataType.asInstanceOf[ArrayType].elementType, + valueContainsNull = right.dataType.asInstanceOf[ArrayType].containsNull) + } + + override def nullable: Boolean = left.nullable || right.nullable + + override def nullSafeEval(keyArray: Any, valueArray: Any): Any = { +val keyArrayData = keyArray.asInstanceOf[ArrayData] +val valueArrayData = valueArray.asInstanceOf[ArrayData] +if (keyArrayData.numElements != valueArrayData.numElements) { + throw new RuntimeException("The given two arrays should have the same length") +} +val leftArrayType = left.dataType.asInstanceOf[ArrayType] +if (leftArrayType.containsNull) { + if (keyArrayData.toArray(leftArrayType.elementType).contains(null)) { +throw new RuntimeException("Cannot use null as map key!") + } --- End diff -- Can we use a loop to null-check without converting to an object array?
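The null-check-with-a-loop suggestion above, together with the documented contract of the new function, can be sketched as a Python reference model of the `map_from_arrays` semantics (illustrative only; the function name mirrors the SQL builtin, not pyspark's API):

```python
def map_from_arrays(keys, values):
    """Build a map from parallel key/value arrays, mirroring the
    documented contract: both arrays must have equal length and
    no key may be null (None)."""
    if len(keys) != len(values):
        raise ValueError("The given two arrays should have the same length")
    # Null-check with a plain loop, rather than first materializing
    # the key array as an object array just to call contains(null).
    for k in keys:
        if k is None:
            raise ValueError("Cannot use null as map key!")
    return dict(zip(keys, values))
```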
[GitHub] spark pull request #21258: [SPARK-23933][SQL] Add map_from_arrays function
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21258#discussion_r192535941 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala --- @@ -235,6 +235,86 @@ case class CreateMap(children: Seq[Expression]) extends Expression { override def prettyName: String = "map" } +/** + * Returns a catalyst Map containing the two arrays in children expressions as keys and values. + */ +@ExpressionDescription( + usage = """ +_FUNC_(keys, values) - Creates a map with a pair of the given key/value arrays. All elements + in keys should not be null""", + examples = """ +Examples: + > SELECT _FUNC_([1.0, 3.0], ['2', '4']); + {1.0:"2",3.0:"4"} + """, since = "2.4.0") +case class CreateMapFromArrays(left: Expression, right: Expression) +extends BinaryExpression with ExpectsInputTypes { + + override def inputTypes: Seq[AbstractDataType] = Seq(ArrayType, ArrayType) + + override def checkInputDataTypes(): TypeCheckResult = { +(left.dataType, right.dataType) match { + case (ArrayType(_, _), ArrayType(_, _)) => +TypeCheckResult.TypeCheckSuccess + case _ => +TypeCheckResult.TypeCheckFailure("The given two arguments should be an array") +} + } + + override def dataType: DataType = { +MapType( + keyType = left.dataType.asInstanceOf[ArrayType].elementType, + valueType = right.dataType.asInstanceOf[ArrayType].elementType, + valueContainsNull = right.dataType.asInstanceOf[ArrayType].containsNull) + } + + override def nullable: Boolean = left.nullable || right.nullable + + override def nullSafeEval(keyArray: Any, valueArray: Any): Any = { +val keyArrayData = keyArray.asInstanceOf[ArrayData] +val valueArrayData = valueArray.asInstanceOf[ArrayData] +if (keyArrayData.numElements != valueArrayData.numElements) { + throw new RuntimeException("The given two arrays should have the same length") +} +val leftArrayType = left.dataType.asInstanceOf[ArrayType] +if (leftArrayType.containsNull) { + if (keyArrayData.toArray(leftArrayType.elementType).contains(null)) { +throw new RuntimeException("Cannot use null as map key!") + } +} +new ArrayBasedMapData(keyArrayData.copy(), valueArrayData.copy()) + } + + override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +nullSafeCodeGen(ctx, ev, (keyArrayData, valueArrayData) => { + val arrayBasedMapData = classOf[ArrayBasedMapData].getName + val leftArrayType = left.dataType.asInstanceOf[ArrayType] + val keyArrayElemNullCheck = if (!leftArrayType.containsNull) "" else { +val leftArrayTypeTerm = ctx.addReferenceObj("leftArrayType", leftArrayType.elementType) +val array = ctx.freshName("array") +val i = ctx.freshName("i") +s""" + |Object[] $array = $keyArrayData.toObjectArray($leftArrayTypeTerm); + |for (int $i = 0; $i < $array.length; $i++) { + | if ($array[$i] == null) { +throw new RuntimeException("Cannot use null as map key!"); + | } + |} --- End diff -- We can null-check without converting to an object array.
[GitHub] spark issue #21061: [SPARK-23914][SQL] Add array_union function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21061 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91401/
[GitHub] spark issue #21061: [SPARK-23914][SQL] Add array_union function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21061 Merged build finished. Test PASSed.
[GitHub] spark issue #21061: [SPARK-23914][SQL] Add array_union function
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21061 **[Test build #91401 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91401/testReport)** for PR 21061 at commit [`adc68cc`](https://github.com/apache/spark/commit/adc68cc033dec8b26be23e861eb53b466f35ad38). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #21479: [SPARK-23903][SQL] Add support for date extract
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21479#discussion_r192534591 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala --- @@ -1206,6 +1206,41 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging new StringLocate(expression(ctx.substr), expression(ctx.str)) } + /** + * Create a Extract expression. + */ + override def visitExtract(ctx: ExtractContext): Expression = withOrigin(ctx) { +val extractType = ctx.field.getText.toUpperCase(Locale.ROOT) +try { + extractType match { +case "YEAR" => + Year(expression(ctx.source)) +case "QUARTER" => + Quarter(expression(ctx.source)) +case "MONTH" => + Month(expression(ctx.source)) +case "WEEK" => + WeekOfYear(expression(ctx.source)) +case "DAY" => + DayOfMonth(expression(ctx.source)) +case "DOW" => + DayOfWeek(expression(ctx.source)) +case "HOUR" => + Hour(expression(ctx.source)) +case "MINUTE" => + Minute(expression(ctx.source)) +case "SECOND" => + Second(expression(ctx.source)) +case other => + throw new ParseException(s"Literals of type '$other' are currently not supported.", ctx) + } +} catch { + case e: IllegalArgumentException => --- End diff -- Do we need this try-catch?
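The `visitExtract` dispatch in the diff above is a straight field-name-to-expression mapping; a table-driven sketch (Python here, with stand-in functions instead of Catalyst expressions) makes the supported fields and the error path easy to audit. The lambdas and the Sunday=0 `DOW` convention are assumptions for illustration, not Spark's definitions.

```python
import datetime

# Stand-ins for the Catalyst expressions; the real parser builds
# Year(...), Quarter(...), etc. instead of computing values directly.
_EXTRACT_FIELDS = {
    "YEAR": lambda d: d.year,
    "QUARTER": lambda d: (d.month - 1) // 3 + 1,
    "MONTH": lambda d: d.month,
    "WEEK": lambda d: d.isocalendar()[1],
    "DAY": lambda d: d.day,
    "DOW": lambda d: d.isoweekday() % 7,  # Sunday=0, one common convention
    "HOUR": lambda d: d.hour,
    "MINUTE": lambda d: d.minute,
    "SECOND": lambda d: d.second,
}

def extract(field, source):
    fn = _EXTRACT_FIELDS.get(field.upper())
    if fn is None:
        # Mirrors the parser's explicit error path for unknown fields.
        raise ValueError("Literals of type '%s' are currently not supported." % field)
    return fn(source)
```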
[GitHub] spark pull request #21479: [SPARK-23903][SQL] Add support for date extract
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21479#discussion_r192534446 --- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 --- @@ -739,6 +740,7 @@ nonReserved | VIEW | REPLACE | IF | POSITION +| EXTRACT | YEAR | QUARTER | MONTH | WEEK | DAY | DOW | HOUR | MINUTE | SECOND --- End diff -- We can remove each term except for `EXTRACT`.
[GitHub] spark pull request #21479: [SPARK-23903][SQL] Add support for date extract
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21479#discussion_r192534696 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala --- @@ -1206,6 +1206,41 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging new StringLocate(expression(ctx.substr), expression(ctx.str)) } + /** + * Create a Extract expression. + */ + override def visitExtract(ctx: ExtractContext): Expression = withOrigin(ctx) { +val extractType = ctx.field.getText.toUpperCase(Locale.ROOT) +try { + extractType match { +case "YEAR" => + Year(expression(ctx.source)) +case "QUARTER" => + Quarter(expression(ctx.source)) +case "MONTH" => + Month(expression(ctx.source)) +case "WEEK" => + WeekOfYear(expression(ctx.source)) +case "DAY" => + DayOfMonth(expression(ctx.source)) +case "DOW" => --- End diff -- `"DAYOFWEEK"` ?
[GitHub] spark issue #21479: [SPARK-23903][SQL] Add support for date extract
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21479 Merged build finished. Test PASSed.
[GitHub] spark issue #21479: [SPARK-23903][SQL] Add support for date extract
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21479 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3770/
[GitHub] spark issue #21482: [SPARK-24393][SQL] SQL builtin: isinf
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21482 **[Test build #91408 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91408/testReport)** for PR 21482 at commit [`9ab0eb2`](https://github.com/apache/spark/commit/9ab0eb24295c20e564817d69b3b3315d9b2a3359).
[GitHub] spark issue #21479: [SPARK-23903][SQL] Add support for date extract
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21479 **[Test build #91407 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91407/testReport)** for PR 21479 at commit [`c9d2bc3`](https://github.com/apache/spark/commit/c9d2bc348495669bd4347679547f1437f35367f1).
[GitHub] spark pull request #21282: [SPARK-23934][SQL] Adding map_from_entries functi...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21282#discussion_r192531374 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -118,6 +120,229 @@ case class MapValues(child: Expression) override def prettyName: String = "map_values" } +/** + * Returns a map created from the given array of entries. + */ +@ExpressionDescription( + usage = "_FUNC_(arrayOfEntries) - Returns a map created from the given array of entries.", + examples = """ +Examples: + > SELECT _FUNC_(array(struct(1, 'a'), struct(2, 'b'))); + {1:"a",2:"b"} + """, + since = "2.4.0") +case class MapFromEntries(child: Expression) extends UnaryExpression +{ + private lazy val resolvedDataType: Option[MapType] = child.dataType match { +case ArrayType( + StructType(Array( +StructField(_, keyType, false, _), +StructField(_, valueType, valueNullable, _))), + false) => Some(MapType(keyType, valueType, valueNullable)) +case _ => None + } + + override def dataType: MapType = resolvedDataType.get + + override def checkInputDataTypes(): TypeCheckResult = resolvedDataType match { +case Some(_) => TypeCheckResult.TypeCheckSuccess +case None => TypeCheckResult.TypeCheckFailure(s"'${child.sql}' is of " + + s"${child.dataType.simpleString} type. $prettyName accepts only null-free arrays " + + "of pair structs. Values of the first struct field can't contain nulls and produce " + + "duplicates.") + } + + override protected def nullSafeEval(input: Any): Any = { +val arrayData = input.asInstanceOf[ArrayData] +val length = arrayData.numElements() +val keyArray = new Array[AnyRef](length) +val keySet = new OpenHashSet[AnyRef]() +val valueArray = new Array[AnyRef](length) +var i = 0; +while (i < length) { + val entry = arrayData.getStruct(i, 2) + val key = entry.get(0, dataType.keyType) + if (key == null) { +throw new RuntimeException("The first field from a struct (key) can't be null.") + } + if (keySet.contains(key)) { --- End diff -- I'm sorry for the super delay. Let's just ignore the duplicated key like `CreateMap` for now. We will need to discuss map-related topics, such as duplicate keys, equality or ordering, etc. --- 
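The resolution above — ignore duplicated keys the way `CreateMap` does instead of raising — can be modeled with a small Python sketch of the `map_from_entries` semantics. Whether the first or the last value wins is exactly the open question the reviewer defers; this sketch lets the last entry win (plain dict insertion), and that policy is an assumption, not Spark's documented behavior.

```python
def map_from_entries(entries):
    """Build a map from (key, value) pairs. Null (None) keys are
    rejected; duplicate keys are silently overwritten rather than
    raising, so later entries win (an assumed CreateMap-like policy)."""
    result = {}
    for key, value in entries:
        if key is None:
            raise ValueError("The first field from a struct (key) can't be null.")
        result[key] = value
    return result
```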
[GitHub] spark issue #21400: [SPARK-24351][SS]offsetLog/commitLog purge thresholdBatc...
Github user ivoson commented on the issue: https://github.com/apache/spark/pull/21400 @jose-torres @xuanyuanking @zsxwing Thanks for reviewing this.
[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20697 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/3631/
[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20697 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3769/ Test PASSed.
[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20697 Merged build finished. Test PASSed.
[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20697 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/3631/
[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20697 **[Test build #91406 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91406/testReport)** for PR 20697 at commit [`4c5677a`](https://github.com/apache/spark/commit/4c5677a61fd940b818d81469e6640cb45f00ce58).
[GitHub] spark issue #20894: [SPARK-23786][SQL] Checking column names of csv headers
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20894 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91399/ Test PASSed.
[GitHub] spark issue #20894: [SPARK-23786][SQL] Checking column names of csv headers
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20894 Merged build finished. Test PASSed.
[GitHub] spark issue #20894: [SPARK-23786][SQL] Checking column names of csv headers
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20894 **[Test build #91399 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91399/testReport)** for PR 20894 at commit [`3b37712`](https://github.com/apache/spark/commit/3b37712ded664aaf716306574f50072e58b9bbd1). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #21482: [SPARK-24393][SQL] SQL builtin: isinf
Github user henryr commented on a diff in the pull request: https://github.com/apache/spark/pull/21482#discussion_r192522027 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/nullExpressions.scala --- @@ -199,6 +199,50 @@ case class Nvl2(expr1: Expression, expr2: Expression, expr3: Expression, child: override def sql: String = s"$prettyName(${expr1.sql}, ${expr2.sql}, ${expr3.sql})" } +/** + * Evaluates to `true` iff it's Infinity. + */ +@ExpressionDescription( + usage = "_FUNC_(expr) - Returns True evaluates to infinite else returns False ", --- End diff -- "True evaluates" -> "True if expr evaluates"
[GitHub] spark pull request #21482: [SPARK-24393][SQL] SQL builtin: isinf
Github user henryr commented on a diff in the pull request: https://github.com/apache/spark/pull/21482#discussion_r192520713 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/NullExpressionsSuite.scala --- @@ -56,6 +56,16 @@ class NullExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper { assert(ex.contains("Null value appeared in non-nullable field")) } + test("IsInf") { +checkEvaluation(IsInf(Literal(Double.PositiveInfinity)), true) +checkEvaluation(IsInf(Literal(Double.NegativeInfinity)), true) +checkEvaluation(IsInf(Literal(Float.PositiveInfinity)), true) +checkEvaluation(IsInf(Literal(Float.NegativeInfinity)), true) +checkEvaluation(IsInf(Literal.create(null, DoubleType)), false) +checkEvaluation(IsInf(Literal(Float.MaxValue)), false) +checkEvaluation(IsInf(Literal(5.5f)), false) --- End diff -- check NaN as well?
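The missing NaN case the reviewer asks about is worth pinning down: NaN is neither positive nor negative infinity, so an `IsInf` built on `isInfinite` returns false for it, alongside the existing null case. A quick Java check of those JVM semantics (class name is mine):

```java
public class NanCases {
    public static void main(String[] args) {
        // NaN is not infinite: IsInf should evaluate to false for it.
        System.out.println(Double.isInfinite(Double.NaN));  // false
        System.out.println(Float.isInfinite(Float.NaN));    // false
        // 0.0 / 0.0 produces NaN, not infinity.
        System.out.println(Double.isInfinite(0.0 / 0.0));   // false
        // 1.0 / 0.0 produces positive infinity.
        System.out.println(Double.isInfinite(1.0 / 0.0));   // true
    }
}
```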
[GitHub] spark pull request #21482: [SPARK-24393][SQL] SQL builtin: isinf
Github user henryr commented on a diff in the pull request: https://github.com/apache/spark/pull/21482#discussion_r192521881 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/nullExpressions.scala --- @@ -199,6 +199,50 @@ case class Nvl2(expr1: Expression, expr2: Expression, expr3: Expression, child: override def sql: String = s"$prettyName(${expr1.sql}, ${expr2.sql}, ${expr3.sql})" } +/** + * Evaluates to `true` iff it's Infinity. + */ +@ExpressionDescription( + usage = "_FUNC_(expr) - Returns True evaluates to infinite else returns False ", + examples = """ +Examples: + > SELECT _FUNC_(1/0); + True + > SELECT _FUNC_(5); + False + """) +case class IsInf(child: Expression) extends UnaryExpression + with Predicate with ImplicitCastInputTypes { + + override def inputTypes: Seq[AbstractDataType] = Seq(TypeCollection(DoubleType, FloatType)) + + override def nullable: Boolean = false + + override def eval(input: InternalRow): Boolean = { +val value = child.eval(input) +if (value == null) { + false +} else { + child.dataType match { +case DoubleType => value.asInstanceOf[Double].isInfinity +case FloatType => value.asInstanceOf[Float].isInfinity + } +} + } + + + override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +val eval = child.genCode(ctx) +child.dataType match { + case DoubleType | FloatType => +ev.copy(code = code""" + ${eval.code} + ${CodeGenerator.javaType(dataType)} ${ev.value} = ${CodeGenerator.defaultValue(dataType)}; + ${ev.value} = !${eval.isNull} && Double.isInfinite(${eval.value});""", --- End diff -- out of interest, why use `Double.isInfinite` here, but `value.isInfinity` in the non-codegen version?
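On the reviewer's question: the interpreted path calls Scala's `value.isInfinity` while the generated Java calls the static `Double.isInfinite`; on the JVM these agree, and the static form is the only option inside generated code operating on a primitive. A small Java check of that equivalence (method names here are mine, not Spark's):

```java
public class InfiniteForms {
    // Mirrors the interpreted path: instance-style isInfinite on a boxed value
    // (Scala's `d.isInfinity` compiles down to the same primitive check).
    static boolean instanceStyle(double d) {
        return Double.valueOf(d).isInfinite();
    }

    // Mirrors the generated-code path: the static helper, callable on a bare
    // primitive. It also accepts a float via widening, which is why a single
    // Double.isInfinite call can cover both DoubleType and FloatType inputs.
    static boolean staticStyle(double d) {
        return Double.isInfinite(d);
    }
}
```

Both forms return the same result for every input, so the asymmetry in the diff is cosmetic rather than behavioral.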
[GitHub] spark pull request #21482: [SPARK-24393][SQL] SQL builtin: isinf
Github user henryr commented on a diff in the pull request: https://github.com/apache/spark/pull/21482#discussion_r192520834 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala --- @@ -1107,6 +1107,14 @@ object functions { */ def input_file_name(): Column = withExpr { InputFileName() } + /** + * Return true iff the column is Infinity. + * + * @group normal_funcs + * @since 1.6.0 --- End diff -- Need to fix these versions, here and elsewhere. This change would land in Spark 2.4.0.
[GitHub] spark pull request #21482: [SPARK-24393][SQL] SQL builtin: isinf
Github user henryr commented on a diff in the pull request: https://github.com/apache/spark/pull/21482#discussion_r192520566 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/NullExpressionsSuite.scala --- @@ -24,7 +24,7 @@ import org.apache.spark.sql.catalyst.expressions.objects.AssertNotNull import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, Project} import org.apache.spark.sql.types._ -class NullExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper { + class NullExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper { --- End diff -- Revert this?
[GitHub] spark issue #21482: [SPARK-24393][SQL] SQL builtin: isinf
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21482 **[Test build #91405 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91405/testReport)** for PR 21482 at commit [`bcdaab2`](https://github.com/apache/spark/commit/bcdaab2f8c9c5afc877d3a54f658296aba78fdf0).
[GitHub] spark pull request #21061: [SPARK-23914][SQL] Add array_union function
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21061#discussion_r192520463 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -2189,3 +2189,302 @@ case class ArrayRemove(left: Expression, right: Expression) override def prettyName: String = "array_remove" } + +object ArraySetLike { + private val MAX_ARRAY_LENGTH: Int = ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH + + def toArrayDataInt(hs: OpenHashSet[Int]): ArrayData = { +val array = new Array[Int](hs.size) +var pos = hs.nextPos(0) +var i = 0 +while (pos != OpenHashSet.INVALID_POS) { + array(i) = hs.getValue(pos) + pos = hs.nextPos(pos + 1) + i += 1 +} + +if (useGenericArrayData(LongType.defaultSize, array.length)) { + new GenericArrayData(array) +} else { + UnsafeArrayData.fromPrimitiveArray(array) +} + } + + def toArrayDataLong(hs: OpenHashSet[Long]): ArrayData = { +val array = new Array[Long](hs.size) +var pos = hs.nextPos(0) +var i = 0 +while (pos != OpenHashSet.INVALID_POS) { + array(i) = hs.getValue(pos) + pos = hs.nextPos(pos + 1) + i += 1 +} + +if (useGenericArrayData(LongType.defaultSize, array.length)) { + new GenericArrayData(array) +} else { + UnsafeArrayData.fromPrimitiveArray(array) +} + } + + def useGenericArrayData(elementSize: Int, length: Int): Boolean = { --- End diff -- Shall we move this to `UnsafeArrayData` and reuse it? Maybe the name should be modified to fit the case. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
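The pattern in this diff is: deduplicate elements through a hash set, collect the survivors into a primitive array, then pick the generic (boxed) representation only when the fixed-width unsafe layout would overflow. A rough Java sketch of both halves, with `LinkedHashSet` standing in for `OpenHashSet`; the header arithmetic and the constant value are assumptions for illustration, not `UnsafeArrayData`'s exact layout:

```java
import java.util.LinkedHashSet;

public class ArraySetLikeSketch {
    // Hypothetical stand-in for ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH.
    static final long MAX_ROUNDED_ARRAY_LENGTH = Integer.MAX_VALUE - 15;

    // Mirrors useGenericArrayData: fall back to the generic representation
    // when header + fixed-width values would exceed the maximum array size.
    // The 8-byte header plus one null bit per element is an assumed layout.
    static boolean useGenericArrayData(int elementSize, int length) {
        long headerInBytes = 8L + ((length + 63L) / 64L) * 8L;
        long valueRegionInBytes = (long) elementSize * length;
        return headerInBytes + valueRegionInBytes > MAX_ROUNDED_ARRAY_LENGTH;
    }

    // Mirrors the OpenHashSet-to-primitive-array collection loop: keep each
    // distinct value once, in first-seen order, then copy into an int[].
    static int[] dedup(int[] input) {
        LinkedHashSet<Integer> seen = new LinkedHashSet<>();
        for (int v : input) seen.add(v);
        int[] out = new int[seen.size()];
        int i = 0;
        for (int v : seen) out[i++] = v;
        return out;
    }
}
```

The reviewer's point stands either way: the size check depends only on element size and length, so it belongs on the array container, not on the set-operation helper.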
[GitHub] spark issue #21482: [SPARK-24393][SQL] SQL builtin: isinf
Github user squito commented on the issue: https://github.com/apache/spark/pull/21482 Jenkins, ok to test
[GitHub] spark issue #21482: [SPARK-24393][SQL] SQL builtin: isinf
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/21482 ok to test
[GitHub] spark issue #21482: [SPARK-24393][SQL] SQL builtin: isinf
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21482 Can one of the admins verify this patch?
[GitHub] spark pull request #21482: [SPARK-24393][SQL] SQL builtin: isinf
GitHub user NihalHarish opened a pull request: https://github.com/apache/spark/pull/21482 [SPARK-24393][SQL] SQL builtin: isinf ## What changes were proposed in this pull request? Implemented isinf to test if a float or double value is Infinity. ## How was this patch tested? Unit tests have been added to sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/NullExpressionsSuite.scala You can merge this pull request into a Git repository by running: $ git pull https://github.com/NihalHarish/spark SPARK-24393-SQL-builtin-isinf Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21482.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21482 commit bcdaab2f8c9c5afc877d3a54f658296aba78fdf0 Author: Nihal Harish Date: 2018-06-01T21:23:24Z [SPARK-24393][SQL] SQL builtin: isinf
[GitHub] spark pull request #21479: [SPARK-23903][SQL] Add support for date extract
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21479#discussion_r192516776 --- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 --- @@ -592,6 +592,7 @@ primaryExpression | identifier #columnReference | base=primaryExpression '.' fieldName=identifier #dereference | '(' expression ')' #parenthesizedExpression +| EXTRACT '(' field=(YEAR | QUARTER | MONTH | WEEK | DAY | HOUR | MINUTE | SECOND) FROM source=valueExpression ')' #extract --- End diff -- How about `EXTRACT '(' field=identifier FROM source=valueExpression ')'` instead of introducing each term? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
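The suggestion is to accept any identifier in the grammar and resolve the field by name afterwards, so adding a new field extends a name lookup rather than the parser token set. A sketch of that resolve-by-name step in Java, using `java.time` as a stand-in for Spark's date expressions; the supported field list and error message are illustrative only:

```java
import java.time.LocalDate;
import java.time.temporal.IsoFields;
import java.util.Locale;

public class ExtractSketch {
    // Resolve the EXTRACT field by (case-insensitive) name at analysis time,
    // instead of enumerating YEAR | QUARTER | ... as grammar tokens.
    static int extract(String field, LocalDate date) {
        switch (field.toUpperCase(Locale.ROOT)) {
            case "YEAR":    return date.getYear();
            case "QUARTER": return date.get(IsoFields.QUARTER_OF_YEAR);
            case "MONTH":   return date.getMonthValue();
            case "DAY":     return date.getDayOfMonth();
            default:
                // Unknown names become an analysis-time error, not a parse error.
                throw new IllegalArgumentException("Unsupported extract field: " + field);
        }
    }

    // Convenience overload so callers don't need to construct a LocalDate.
    static int extract(String field, int year, int month, int day) {
        return extract(field, LocalDate.of(year, month, day));
    }
}
```

For example, `extract("quarter", 2018, 6, 1)` resolves the lowercase identifier and returns the second quarter.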