[GitHub] spark issue #21948: [SPARK-24991][SQL] use InternalRow in DataSourceWriter
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21948 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94263/ Test PASSed.
[GitHub] spark issue #21948: [SPARK-24991][SQL] use InternalRow in DataSourceWriter
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21948 Merged build finished. Test PASSed.
[GitHub] spark issue #21948: [SPARK-24991][SQL] use InternalRow in DataSourceWriter
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21948

**[Test build #94263 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94263/testReport)** for PR 21948 at commit [`86817c7`](https://github.com/apache/spark/commit/86817c7ee36f1344e977bb5af14aeb56232c17d5).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class MemoryWriterCommitMessage(partition: Int, data: Seq[Row])`
  * `case class MemoryWriterFactory(outputMode: OutputMode, schema: StructType)`
  * `class MemoryDataWriter(partition: Int, outputMode: OutputMode, schema: StructType)`
[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21102 **[Test build #94266 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94266/testReport)** for PR 21102 at commit [`33781b6`](https://github.com/apache/spark/commit/33781b640ed447d9a73a93b63e1834dd9360e72a).
[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21102 Merged build finished. Test PASSed.
[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21102 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1833/ Test PASSed.
[GitHub] spark issue #21991: [SPARK-25018] [Infra] Use `Co-authored-by` and `Signed-o...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21991 **[Test build #94265 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94265/testReport)** for PR 21991 at commit [`fbd8cb4`](https://github.com/apache/spark/commit/fbd8cb4c25dc867d1c08eef83ac935bf3af816e3).
[GitHub] spark issue #21991: [SPARK-25018] [Infra] Use `Co-authored-by` and `Signed-o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21991 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1832/ Test PASSed.
[GitHub] spark issue #21991: [SPARK-25018] [Infra] Use `Co-authored-by` and `Signed-o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21991 Merged build finished. Test PASSed.
[GitHub] spark pull request #21102: [SPARK-23913][SQL] Add array_intersect function
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21102#discussion_r207781744

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -3965,6 +4034,248 @@ object ArrayUnion {
   }
 }

+/**
+ * Returns an array of the elements in the intersect of x and y, without duplicates
+ */
+@ExpressionDescription(
+  usage = """
+    _FUNC_(array1, array2) - Returns an array of the elements in the intersection of array1 and
+      array2, without duplicates.
+  """,
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3), array(1, 3, 5));
+       array(1, 3)
+  """,
+  since = "2.4.0")
+case class ArrayIntersect(left: Expression, right: Expression) extends ArraySetLike
+  with ComplexTypeMergingExpression {
+  override def dataType: DataType = {
+    dataTypeCheck
+    ArrayType(elementType,
+      left.dataType.asInstanceOf[ArrayType].containsNull &&
+        right.dataType.asInstanceOf[ArrayType].containsNull)
+  }
+
+  @transient lazy val evalIntersect: (ArrayData, ArrayData) => ArrayData = {
+    if (elementTypeSupportEquals) {
+      (array1, array2) =>
+        if (array1.numElements() != 0 && array2.numElements() != 0) {
+          val hs = new OpenHashSet[Any]
+          val hsResult = new OpenHashSet[Any]
+          var foundNullElement = false
+          var i = 0
+          while (i < array2.numElements()) {
+            if (array2.isNullAt(i)) {
+              foundNullElement = true
+            } else {
+              val elem = array2.get(i, elementType)
+              hs.add(elem)
+            }
+            i += 1
+          }
+          val arrayBuffer = new scala.collection.mutable.ArrayBuffer[Any]
+          i = 0
+          while (i < array1.numElements()) {
+            if (array1.isNullAt(i)) {
+              if (foundNullElement) {
+                arrayBuffer += null
+                foundNullElement = false
+              }
+            } else {
+              val elem = array1.get(i, elementType)
+              if (hs.contains(elem) && !hsResult.contains(elem)) {
+                arrayBuffer += elem
+                hsResult.add(elem)
+              }
+            }
+            i += 1
+          }
+          new GenericArrayData(arrayBuffer)
+        } else {
+          new GenericArrayData(Seq.empty)
--- End diff --

nit: `Array.empty` or `Array.emptyObjectArray`?
[GitHub] spark pull request #21991: [SPARK-25018] [Infra] Use `Co-authored-by` and `S...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/21991#discussion_r207781137

--- Diff: dev/merge_spark_pr.py ---
@@ -154,20 +154,22 @@ def merge_pr(pr_num, target_ref, title, body, pr_repo_desc):
     # to people every time someone creates a public fork of Spark.
     merge_message_flags += ["-m", body.replace("@", "")]

-    authors = "\n".join(["Author: %s" % a for a in distinct_authors])
-
-    merge_message_flags += ["-m", authors]
+    committer_name = run_cmd("git config --get user.name").strip()
+    committer_email = run_cmd("git config --get user.email").strip()

     if had_conflicts:
-        committer_name = run_cmd("git config --get user.name").strip()
-        committer_email = run_cmd("git config --get user.email").strip()
         message = "This patch had conflicts when merged, resolved by\nCommitter: %s <%s>" % (
             committer_name, committer_email)
         merge_message_flags += ["-m", message]

     # The string "Closes #%s" string is required for GitHub to correctly close the PR
     merge_message_flags += ["-m", "Closes #%s from %s." % (pr_num, pr_repo_desc)]

+    authors = "\n".join(["Co-authored-by: %s" % a for a in distinct_authors])
--- End diff --

It seems those keys have been widely used in the [Linux community](https://git.wiki.kernel.org/index.php/CommitMessageConventions) for a long time, and GitHub adopted `Co-authored-by` to include the work of co-authors in the profile contributions graph and the repository's statistics. These trailers have to be at the end of the message, and GitHub will simply ignore any keys it doesn't recognize. If we keep both the multiple `Author:` lines and the `Co-authored-by:` lines, the information will be duplicated. I would like to propose keeping just `Co-authored-by:`. For a single-developer commit, we can use `Authored-by:` for consistency.
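For illustration, a merge commit message produced by the updated script could end like this; the JIRA number, branch, and names below are hypothetical, and only the trailer block at the very end is what GitHub parses for the contributions graph:

```
[SPARK-XXXXX][EXAMPLE] Some change merged from a pull request

This patch had conflicts when merged, resolved by
Committer: Carol Committer <carol@example.com>

Closes #21991 from contributor/branch-name.

Authored-by: Alice Example <alice@example.com>
Co-authored-by: Bob Example <bob@example.com>
Signed-off-by: Carol Committer <carol@example.com>
```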
[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21102 **[Test build #94264 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94264/testReport)** for PR 21102 at commit [`ce755e2`](https://github.com/apache/spark/commit/ce755e2b049ca000d6da754654b792e181e6d904).
[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21102 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1831/ Test PASSed.
[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21102 Merged build finished. Test PASSed.
[GitHub] spark issue #21939: [SPARK-23874][SQL][PYTHON] Upgrade Apache Arrow to 0.10....
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/21939

> i'm ready to pull the trigger on the update to arrow... i'd much prefer a pip dist, but would be ok w/a conda package. :)

Thanks @shaneknapp ! So for those suggesting we keep the existing minimum pyarrow version of 0.8.0, does that mean we will need to add triple tests to support 0.9.0 and 0.10.0?
[GitHub] spark issue #21939: [SPARK-23874][SQL][PYTHON] Upgrade Apache Arrow to 0.10....
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/21939 @gatorsmile , there is an RC1 vote up now, so it should be very soon
[GitHub] spark issue #21939: [SPARK-23874][SQL][PYTHON] Upgrade Apache Arrow to 0.10....
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/21939 @cloud-fan , we have BinaryType support in Java already, but it has not been added to Python due to an issue (see the related JIRAs that @HyukjinKwon mentioned). Arrow 0.10.0 has a bug fix that makes it possible to add it to Python.
[GitHub] spark issue #21608: [SPARK-24626] [SQL] Improve location size calculation in...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21608 Merged build finished. Test PASSed.
[GitHub] spark issue #21608: [SPARK-24626] [SQL] Improve location size calculation in...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21608 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94259/ Test PASSed.
[GitHub] spark issue #21608: [SPARK-24626] [SQL] Improve location size calculation in...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21608

**[Test build #94259 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94259/testReport)** for PR 21608 at commit [`253af70`](https://github.com/apache/spark/commit/253af70cd4cd525db4951db67fce01a5ca1f0014).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #21998: [SPARK-24940][SQL] Use IntegerLiteral in ResolveCoalesce...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21998 Merged build finished. Test PASSed.
[GitHub] spark issue #21622: [SPARK-24637][SS] Add metrics regarding state and waterm...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21622 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94261/ Test PASSed.
[GitHub] spark issue #21998: [SPARK-24940][SQL] Use IntegerLiteral in ResolveCoalesce...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21998 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94258/ Test PASSed.
[GitHub] spark issue #21622: [SPARK-24637][SS] Add metrics regarding state and waterm...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21622 Merged build finished. Test PASSed.
[GitHub] spark issue #21998: [SPARK-24940][SQL] Use IntegerLiteral in ResolveCoalesce...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21998

**[Test build #94258 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94258/testReport)** for PR 21998 at commit [`8f0c578`](https://github.com/apache/spark/commit/8f0c57804e75b2a74f7573a21f6c7c63f7b85e03).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #21622: [SPARK-24637][SS] Add metrics regarding state and waterm...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21622

**[Test build #94261 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94261/testReport)** for PR 21622 at commit [`722e6a0`](https://github.com/apache/spark/commit/722e6a0f7506440f260126d841d0cb27cf744100).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #21305: [SPARK-24251][SQL] Add AppendData logical plan.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21305 Merged build finished. Test PASSed.
[GitHub] spark issue #21305: [SPARK-24251][SQL] Add AppendData logical plan.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21305 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94256/ Test PASSed.
[GitHub] spark issue #21305: [SPARK-24251][SQL] Add AppendData logical plan.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21305

**[Test build #94256 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94256/testReport)** for PR 21305 at commit [`ae0d242`](https://github.com/apache/spark/commit/ae0d2424315634760d46be0f21e0e98160bada5a).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #21937: [WIP][SPARK-23914][SQL][follow-up] refactor ArrayUnion
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/21937 I'll revisit after the conflict is fixed.
[GitHub] spark pull request #21937: [WIP][SPARK-23914][SQL][follow-up] refactor Array...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21937#discussion_r207767260

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -3698,230 +3767,162 @@ object ArraySetLike {
   """,
   since = "2.4.0")
 case class ArrayUnion(left: Expression, right: Expression) extends ArraySetLike
-    with ComplexTypeMergingExpression {
-  var hsInt: OpenHashSet[Int] = _
-  var hsLong: OpenHashSet[Long] = _
-
-  def assignInt(array: ArrayData, idx: Int, resultArray: ArrayData, pos: Int): Boolean = {
-    val elem = array.getInt(idx)
-    if (!hsInt.contains(elem)) {
-      if (resultArray != null) {
-        resultArray.setInt(pos, elem)
-      }
-      hsInt.add(elem)
-      true
-    } else {
-      false
-    }
-  }
-
-  def assignLong(array: ArrayData, idx: Int, resultArray: ArrayData, pos: Int): Boolean = {
-    val elem = array.getLong(idx)
-    if (!hsLong.contains(elem)) {
-      if (resultArray != null) {
-        resultArray.setLong(pos, elem)
-      }
-      hsLong.add(elem)
-      true
-    } else {
-      false
-    }
-  }
+  with ComplexTypeMergingExpression {

-  def evalIntLongPrimitiveType(
-      array1: ArrayData,
-      array2: ArrayData,
-      resultArray: ArrayData,
-      isLongType: Boolean): Int = {
-    // store elements into resultArray
-    var nullElementSize = 0
-    var pos = 0
-    Seq(array1, array2).foreach { array =>
-      var i = 0
-      while (i < array.numElements()) {
-        val size = if (!isLongType) hsInt.size else hsLong.size
-        if (size + nullElementSize > ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH) {
-          ArraySetLike.throwUnionLengthOverflowException(size)
-        }
-        if (array.isNullAt(i)) {
-          if (nullElementSize == 0) {
-            if (resultArray != null) {
-              resultArray.setNullAt(pos)
+  @transient lazy val evalUnion: (ArrayData, ArrayData) => ArrayData = {
+    if (elementTypeSupportEquals) {
+      (array1, array2) =>
+        val arrayBuffer = new scala.collection.mutable.ArrayBuffer[Any]
+        val hs = new OpenHashSet[Any]
+        var foundNullElement = false
+        Seq(array1, array2).foreach { array =>
+          var i = 0
+          while (i < array.numElements()) {
+            if (array.isNullAt(i)) {
+              if (!foundNullElement) {
+                arrayBuffer += null
+                foundNullElement = true
+              }
+            } else {
+              val elem = array.get(i, elementType)
+              if (!hs.contains(elem)) {
+                if (arrayBuffer.size > ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH) {
+                  ArraySetLike.throwUnionLengthOverflowException(arrayBuffer.size)
+                }
+                arrayBuffer += elem
+                hs.add(elem)
+              }
             }
-            pos += 1
-            nullElementSize = 1
+            i += 1
           }
-        } else {
-          val assigned = if (!isLongType) {
-            assignInt(array, i, resultArray, pos)
+        }
+        new GenericArrayData(arrayBuffer)
+    } else {
+      (array1, array2) =>
+        val arrayBuffer = new scala.collection.mutable.ArrayBuffer[Any]
+        var alreadyIncludeNull = false
+        Seq(array1, array2).foreach(_.foreach(elementType, (_, elem) => {
+          var found = false
+          if (elem == null) {
+            if (alreadyIncludeNull) {
+              found = true
+            } else {
+              alreadyIncludeNull = true
+            }
           } else {
-            assignLong(array, i, resultArray, pos)
+            // check elem is already stored in arrayBuffer or not?
+            var j = 0
+            while (!found && j < arrayBuffer.size) {
+              val va = arrayBuffer(j)
+              if (va != null && ordering.equiv(va, elem)) {
+                found = true
+              }
+              j = j + 1
+            }
           }
-          if (assigned) {
-            pos += 1
+          if (!found) {
+            if (arrayBuffer.length > ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH) {
+              ArraySetLike.throwUnionLengthOverflowException(arrayBuffer.length)
+            }
+            arrayBuffer += elem
           }
-        }
-        i += 1
-      }
+        }))
+        new GenericArrayData(arrayBuffer)
     }
-    pos
   }

   override def nullSafeEval(input1: Any, input2: Any):
[GitHub] spark pull request #21937: [WIP][SPARK-23914][SQL][follow-up] refactor Array...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21937#discussion_r207767113

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -3698,230 +3767,162 @@ object ArraySetLike {
   """,
   since = "2.4.0")
 case class ArrayUnion(left: Expression, right: Expression) extends ArraySetLike
-    with ComplexTypeMergingExpression {
-  var hsInt: OpenHashSet[Int] = _
-  var hsLong: OpenHashSet[Long] = _
-
-  def assignInt(array: ArrayData, idx: Int, resultArray: ArrayData, pos: Int): Boolean = {
-    val elem = array.getInt(idx)
-    if (!hsInt.contains(elem)) {
-      if (resultArray != null) {
-        resultArray.setInt(pos, elem)
-      }
-      hsInt.add(elem)
-      true
-    } else {
-      false
-    }
-  }
-
-  def assignLong(array: ArrayData, idx: Int, resultArray: ArrayData, pos: Int): Boolean = {
-    val elem = array.getLong(idx)
-    if (!hsLong.contains(elem)) {
-      if (resultArray != null) {
-        resultArray.setLong(pos, elem)
-      }
-      hsLong.add(elem)
-      true
-    } else {
-      false
-    }
-  }
+  with ComplexTypeMergingExpression {

-  def evalIntLongPrimitiveType(
-      array1: ArrayData,
-      array2: ArrayData,
-      resultArray: ArrayData,
-      isLongType: Boolean): Int = {
-    // store elements into resultArray
-    var nullElementSize = 0
-    var pos = 0
-    Seq(array1, array2).foreach { array =>
-      var i = 0
-      while (i < array.numElements()) {
-        val size = if (!isLongType) hsInt.size else hsLong.size
-        if (size + nullElementSize > ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH) {
-          ArraySetLike.throwUnionLengthOverflowException(size)
-        }
-        if (array.isNullAt(i)) {
-          if (nullElementSize == 0) {
-            if (resultArray != null) {
-              resultArray.setNullAt(pos)
+  @transient lazy val evalUnion: (ArrayData, ArrayData) => ArrayData = {
+    if (elementTypeSupportEquals) {
+      (array1, array2) =>
+        val arrayBuffer = new scala.collection.mutable.ArrayBuffer[Any]
+        val hs = new OpenHashSet[Any]
+        var foundNullElement = false
+        Seq(array1, array2).foreach { array =>
+          var i = 0
+          while (i < array.numElements()) {
+            if (array.isNullAt(i)) {
+              if (!foundNullElement) {
+                arrayBuffer += null
+                foundNullElement = true
+              }
+            } else {
+              val elem = array.get(i, elementType)
+              if (!hs.contains(elem)) {
+                if (arrayBuffer.size > ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH) {
+                  ArraySetLike.throwUnionLengthOverflowException(arrayBuffer.size)
+                }
+                arrayBuffer += elem
+                hs.add(elem)
+              }
             }
-            pos += 1
-            nullElementSize = 1
+            i += 1
           }
-        } else {
-          val assigned = if (!isLongType) {
-            assignInt(array, i, resultArray, pos)
+        }
+        new GenericArrayData(arrayBuffer)
+    } else {
+      (array1, array2) =>
+        val arrayBuffer = new scala.collection.mutable.ArrayBuffer[Any]
+        var alreadyIncludeNull = false
+        Seq(array1, array2).foreach(_.foreach(elementType, (_, elem) => {
+          var found = false
+          if (elem == null) {
+            if (alreadyIncludeNull) {
+              found = true
+            } else {
+              alreadyIncludeNull = true
+            }
           } else {
-            assignLong(array, i, resultArray, pos)
+            // check elem is already stored in arrayBuffer or not?
+            var j = 0
+            while (!found && j < arrayBuffer.size) {
+              val va = arrayBuffer(j)
+              if (va != null && ordering.equiv(va, elem)) {
+                found = true
+              }
+              j = j + 1
+            }
           }
-          if (assigned) {
-            pos += 1
+          if (!found) {
+            if (arrayBuffer.length > ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH) {
+              ArraySetLike.throwUnionLengthOverflowException(arrayBuffer.length)
+            }
+            arrayBuffer += elem
           }
-        }
-        i += 1
-      }
+        }))
+        new GenericArrayData(arrayBuffer)
     }
-    pos
   }

   override def nullSafeEval(input1: Any, input2: Any):
[GitHub] spark pull request #21102: [SPARK-23913][SQL] Add array_intersect function
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21102#discussion_r207767226

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -3965,6 +4034,242 @@ object ArrayUnion {
   }
 }

+/**
+ * Returns an array of the elements in the intersect of x and y, without duplicates
+ */
+@ExpressionDescription(
+  usage = """
+    _FUNC_(array1, array2) - Returns an array of the elements in the intersection of array1 and
+      array2, without duplicates.
+  """,
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3), array(1, 3, 5));
+       array(1, 3)
+  """,
+  since = "2.4.0")
+case class ArrayIntersect(left: Expression, right: Expression) extends ArraySetLike
+  with ComplexTypeMergingExpression {
+  override def dataType: DataType = {
+    dataTypeCheck
+    ArrayType(elementType,
+      left.dataType.asInstanceOf[ArrayType].containsNull &&
+        right.dataType.asInstanceOf[ArrayType].containsNull)
+  }
+
+  @transient lazy val evalIntersect: (ArrayData, ArrayData) => ArrayData = {
+    if (elementTypeSupportEquals) {
+      (array1, array2) =>
+        val hs = new OpenHashSet[Any]
+        val hsResult = new OpenHashSet[Any]
+        var foundNullElement = false
+        var i = 0
+        while (i < array2.numElements()) {
+          if (array2.isNullAt(i)) {
+            foundNullElement = true
+          } else {
+            val elem = array2.get(i, elementType)
+            hs.add(elem)
+          }
+          i += 1
+        }
+        val arrayBuffer = new scala.collection.mutable.ArrayBuffer[Any]
+        i = 0
+        while (i < array1.numElements()) {
+          if (array1.isNullAt(i)) {
+            if (foundNullElement) {
+              arrayBuffer += null
+              foundNullElement = false
+            }
+          } else {
+            val elem = array1.get(i, elementType)
+            if (hs.contains(elem) && !hsResult.contains(elem)) {
+              arrayBuffer += elem
+              hsResult.add(elem)
+            }
+          }
+          i += 1
+        }
+        new GenericArrayData(arrayBuffer)
+    } else {
+      (array1, array2) =>
+        val arrayBuffer = new scala.collection.mutable.ArrayBuffer[Any]
+        var alreadySeenNull = false
+        var i = 0
+        while (i < array1.numElements()) {
+          var found = false
+          val elem1 = array1.get(i, elementType)
+          if (array1.isNullAt(i)) {
+            if (!alreadySeenNull) {
+              var j = 0
+              while (!found && j < array2.numElements()) {
+                found = array2.isNullAt(j)
+                j += 1
+              }
+              // array2 is scanned only once for null element
+              alreadySeenNull = true
+            }
+          } else {
+            var j = 0
+            while (!found && j < array2.numElements()) {
+              if (!array2.isNullAt(j)) {
+                val elem2 = array2.get(j, elementType)
+                if (ordering.equiv(elem1, elem2)) {
+                  // check whether elem1 is already stored in arrayBuffer
+                  var foundArrayBuffer = false
+                  var k = 0
+                  while (!foundArrayBuffer && k < arrayBuffer.size) {
+                    val va = arrayBuffer(k)
+                    foundArrayBuffer = (va != null) && ordering.equiv(va, elem1)
+                    k += 1
+                  }
+                  found = !foundArrayBuffer
+                }
+              }
+              j += 1
+            }
+          }
+          if (found) {
+            arrayBuffer += elem1
+          }
+          i += 1
+        }
+        new GenericArrayData(arrayBuffer)
+    }
+  }
+
+  override def nullSafeEval(input1: Any, input2: Any): Any = {
+    val array1 = input1.asInstanceOf[ArrayData]
+    val array2 = input2.asInstanceOf[ArrayData]
+
+    evalIntersect(array1, array2)
+  }
+
+  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+    val arrayData = classOf[ArrayData].getName
+    val i = ctx.freshName("i")
+    val value = ctx.freshName("value")
+    val size = ctx.freshName("size")
+    if (canUseSpecializedHashSet) {
+      val jt = CodeGenerator.javaType(elementType)
+      val ptName = CodeGenerator.primitiveTypeName(jt)
+
+      nullSafeCodeGen(ctx, ev, (array1, array2) => {
+        val foundNullElement = ctx.freshName("foundNullEleme
[GitHub] spark issue #21305: [SPARK-24251][SQL] Add AppendData logical plan.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21305 Merged build finished. Test PASSed.
[GitHub] spark issue #21305: [SPARK-24251][SQL] Add AppendData logical plan.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21305 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94257/ Test PASSed.
[GitHub] spark issue #21305: [SPARK-24251][SQL] Add AppendData logical plan.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21305

**[Test build #94257 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94257/testReport)** for PR 21305 at commit [`618a79d`](https://github.com/apache/spark/commit/618a79dfebf52710e3d86abbde35c65910b91a81).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #22004: [WIP][SPARK-25029][TESTS] Scala 2.12 issues: TaskNotSeri...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22004 Merged build finished. Test PASSed.
[GitHub] spark issue #22004: [WIP][SPARK-25029][TESTS] Scala 2.12 issues: TaskNotSeri...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22004 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94255/ Test PASSed.
[GitHub] spark issue #22004: [WIP][SPARK-25029][TESTS] Scala 2.12 issues: TaskNotSeri...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22004

**[Test build #94255 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94255/testReport)** for PR 22004 at commit [`626e7bd`](https://github.com/apache/spark/commit/626e7bd16769ee6dc42d7d04df10981e719c530d).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class MyData(val i: Int) extends Serializable`
[GitHub] spark pull request #21102: [SPARK-23913][SQL] Add array_intersect function
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21102#discussion_r207766511

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -3965,6 +4034,242 @@ object ArrayUnion {
   }
 }

+/**
+ * Returns an array of the elements in the intersect of x and y, without duplicates
+ */
+@ExpressionDescription(
+  usage = """
+    _FUNC_(array1, array2) - Returns an array of the elements in the intersection of array1 and
+      array2, without duplicates.
+  """,
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3), array(1, 3, 5));
+       array(1, 3)
+  """,
+  since = "2.4.0")
+case class ArrayIntersect(left: Expression, right: Expression) extends ArraySetLike
+  with ComplexTypeMergingExpression {
+  override def dataType: DataType = {
+    dataTypeCheck
+    ArrayType(elementType,
+      left.dataType.asInstanceOf[ArrayType].containsNull &&
+        right.dataType.asInstanceOf[ArrayType].containsNull)
+  }
+
+  @transient lazy val evalIntersect: (ArrayData, ArrayData) => ArrayData = {
+    if (elementTypeSupportEquals) {
+      (array1, array2) =>
+        val hs = new OpenHashSet[Any]
+        val hsResult = new OpenHashSet[Any]
+        var foundNullElement = false
+        var i = 0
+        while (i < array2.numElements()) {
+          if (array2.isNullAt(i)) {
+            foundNullElement = true
+          } else {
+            val elem = array2.get(i, elementType)
+            hs.add(elem)
+          }
+          i += 1
+        }
+        val arrayBuffer = new scala.collection.mutable.ArrayBuffer[Any]
+        i = 0
+        while (i < array1.numElements()) {
+          if (array1.isNullAt(i)) {
+            if (foundNullElement) {
+              arrayBuffer += null
+              foundNullElement = false
+            }
+          } else {
+            val elem = array1.get(i, elementType)
+            if (hs.contains(elem) && !hsResult.contains(elem)) {
+              arrayBuffer += elem
+              hsResult.add(elem)
+            }
+          }
+          i += 1
+        }
+        new GenericArrayData(arrayBuffer)
+    } else {
+      (array1, array2) =>
+        val arrayBuffer = new scala.collection.mutable.ArrayBuffer[Any]
+        var alreadySeenNull = false
+        var i = 0
+        while (i < array1.numElements()) {
+          var found = false
+          val elem1 = array1.get(i, elementType)
+          if (array1.isNullAt(i)) {
+            if (!alreadySeenNull) {
+              var j = 0
+              while (!found && j < array2.numElements()) {
+                found = array2.isNullAt(j)
+                j += 1
+              }
+              // array2 is scanned only once for null element
+              alreadySeenNull = true
+            }
+          } else {
+            var j = 0
+            while (!found && j < array2.numElements()) {
+              if (!array2.isNullAt(j)) {
+                val elem2 = array2.get(j, elementType)
+                if (ordering.equiv(elem1, elem2)) {
+                  // check whether elem1 is already stored in arrayBuffer
+                  var foundArrayBuffer = false
+                  var k = 0
+                  while (!foundArrayBuffer && k < arrayBuffer.size) {
+                    val va = arrayBuffer(k)
+                    foundArrayBuffer = (va != null) && ordering.equiv(va, elem1)
+                    k += 1
+                  }
+                  found = !foundArrayBuffer
+                }
+              }
+              j += 1
+            }
+          }
+          if (found) {
+            arrayBuffer += elem1
+          }
+          i += 1
+        }
+        new GenericArrayData(arrayBuffer)
+    }
+  }
+
+  override def nullSafeEval(input1: Any, input2: Any): Any = {
+    val array1 = input1.asInstanceOf[ArrayData]
+    val array2 = input2.asInstanceOf[ArrayData]
+
+    evalIntersect(array1, array2)
+  }
+
+  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+    val arrayData = classOf[ArrayData].getName
+    val i = ctx.freshName("i")
+    val value = ctx.freshName("value")
+    val size = ctx.freshName("size")
+    if (canUseSpecializedHashSet) {
+      val jt = CodeGenerator.javaType(elementType)
+      val ptName = CodeGenerator.primitiveTypeName(jt)
+
+      nullSafeCodeGen(ctx, ev, (array1, array2) => {
+        val foundNullElement = ctx.freshName("foundNullEleme
[GitHub] spark issue #21948: [SPARK-24991][SQL] use InternalRow in DataSourceWriter
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21948 **[Test build #94263 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94263/testReport)** for PR 21948 at commit [`86817c7`](https://github.com/apache/spark/commit/86817c7ee36f1344e977bb5af14aeb56232c17d5).
[GitHub] spark pull request #21102: [SPARK-23913][SQL] Add array_intersect function
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21102#discussion_r207765490

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -3965,6 +4034,242 @@ object ArrayUnion {
   }
 }

+/**
+ * Returns an array of the elements in the intersect of x and y, without duplicates
+ */
+@ExpressionDescription(
+  usage = """
+    _FUNC_(array1, array2) - Returns an array of the elements in the intersection of array1 and
+      array2, without duplicates.
+  """,
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3), array(1, 3, 5));
+       array(1, 3)
+  """,
+  since = "2.4.0")
+case class ArrayIntersect(left: Expression, right: Expression) extends ArraySetLike
+  with ComplexTypeMergingExpression {
+  override def dataType: DataType = {
+    dataTypeCheck
+    ArrayType(elementType,
+      left.dataType.asInstanceOf[ArrayType].containsNull &&
+        right.dataType.asInstanceOf[ArrayType].containsNull)
+  }
+
+  @transient lazy val evalIntersect: (ArrayData, ArrayData) => ArrayData = {
+    if (elementTypeSupportEquals) {
+      (array1, array2) =>
+        val hs = new OpenHashSet[Any]
--- End diff --

How about shortcutting to return an empty array when we find one of the two is empty?
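A minimal sketch of that shortcut, assuming Spark's catalyst `ArrayData`/`GenericArrayData` utilities; `intersectWithShortcut` and `slowPath` are hypothetical names for illustration, standing in for the `evalIntersect` body quoted above:

```scala
import org.apache.spark.sql.catalyst.util.{ArrayData, GenericArrayData}

// If either input is empty, the intersection is necessarily empty, so
// return immediately instead of building the hash sets and scanning both arrays.
def intersectWithShortcut(
    array1: ArrayData,
    array2: ArrayData,
    slowPath: (ArrayData, ArrayData) => ArrayData): ArrayData = {
  if (array1.numElements() == 0 || array2.numElements() == 0) {
    new GenericArrayData(Array.empty[Any])
  } else {
    slowPath(array1, array2) // the hash-set based evaluation quoted above
  }
}
```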
[GitHub] spark issue #21948: [SPARK-24991][SQL] use InternalRow in DataSourceWriter
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21948 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1830/ Test PASSed.
[GitHub] spark issue #21948: [SPARK-24991][SQL] use InternalRow in DataSourceWriter
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21948 Merged build finished. Test PASSed.
[GitHub] spark issue #21948: [SPARK-24991][SQL] use InternalRow in DataSourceWriter
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21948 retest this please
[GitHub] spark pull request #21948: [SPARK-24991][SQL] use InternalRow in DataSourceW...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21948#discussion_r207765500

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/MemorySinkV2Suite.scala ---
@@ -44,16 +46,16 @@ class MemorySinkV2Suite extends StreamTest with BeforeAndAfter {
     val writer = new MemoryStreamWriter(sink, OutputMode.Append(),
       new StructType().add("i", "int"))
     writer.commit(0, Array(
-      MemoryWriterCommitMessage(0, Seq(InternalRow(1), InternalRow(2))),
-      MemoryWriterCommitMessage(1, Seq(InternalRow(3), InternalRow(4))),
-      MemoryWriterCommitMessage(2, Seq(InternalRow(6), InternalRow(7)))
+      MemoryWriterCommitMessage(0, Seq(Row(1), Row(2))),
--- End diff --

because the memory sink needs `Row`s at the end. Instead of collecting `InternalRow`s via copy and then converting them to `Row`s, I think it's more efficient to collect `Row`s directly.
[GitHub] spark issue #22003: [SPARK-25019][BUILD] Fix orc dependency to use the same ...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/22003

Hi, @yhuai. Could you review this PR? The following is the new pom file for `spark-sql_2.11`.

```
<dependency>
  <groupId>org.apache.orc</groupId>
  <artifactId>orc-core</artifactId>
  <version>1.5.2</version>
  <classifier>nohive</classifier>
  <scope>compile</scope>
  <exclusions>
    <exclusion>
      <artifactId>hadoop-common</artifactId>
      <groupId>org.apache.hadoop</groupId>
    </exclusion>
    <exclusion>
      <artifactId>hadoop-hdfs</artifactId>
      <groupId>org.apache.hadoop</groupId>
    </exclusion>
    <exclusion>
      <artifactId>hive-storage-api</artifactId>
      <groupId>org.apache.hive</groupId>
    </exclusion>
  </exclusions>
</dependency>
<dependency>
  <groupId>org.apache.orc</groupId>
  <artifactId>orc-mapreduce</artifactId>
  <version>1.5.2</version>
  <classifier>nohive</classifier>
  <scope>compile</scope>
  <exclusions>
    <exclusion>
      <artifactId>hadoop-common</artifactId>
      <groupId>org.apache.hadoop</groupId>
    </exclusion>
    <exclusion>
      <artifactId>hadoop-mapreduce-client-core</artifactId>
      <groupId>org.apache.hadoop</groupId>
    </exclusion>
    <exclusion>
      <artifactId>orc-core</artifactId>
      <groupId>org.apache.orc</groupId>
    </exclusion>
    <exclusion>
      <artifactId>hive-storage-api</artifactId>
      <groupId>org.apache.hive</groupId>
    </exclusion>
    <exclusion>
      <artifactId>guava</artifactId>
      <groupId>com.google.guava</groupId>
    </exclusion>
  </exclusions>
</dependency>
```
[GitHub] spark pull request #21996: [SPARK-24888][CORE] spark-submit --master spark:/...
Github user devaraj-kavali commented on a diff in the pull request: https://github.com/apache/spark/pull/21996#discussion_r207763630

--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -98,17 +98,24 @@ private[spark] class SparkSubmit extends Logging {
    * Kill an existing submission using the REST protocol. Standalone and Mesos cluster mode only.
    */
   private def kill(args: SparkSubmitArguments): Unit = {
-    new RestSubmissionClient(args.master)
-      .killSubmission(args.submissionToKill)
+    createRestSubmissionClient(args).killSubmission(args.submissionToKill)
   }

   /**
    * Request the status of an existing submission using the REST protocol.
    * Standalone and Mesos cluster mode only.
    */
   private def requestStatus(args: SparkSubmitArguments): Unit = {
-    new RestSubmissionClient(args.master)
-      .requestSubmissionStatus(args.submissionToRequestStatusFor)
+    createRestSubmissionClient(args).requestSubmissionStatus(args.submissionToRequestStatusFor)
+  }
+
+  /**
+   * Creates RestSubmissionClient with overridden logInfo()
+   */
+  private def createRestSubmissionClient(args: SparkSubmitArguments): RestSubmissionClient = {
+    new RestSubmissionClient(args.master) {
+      override protected def logInfo(msg: => String): Unit = printMessage(msg)
--- End diff --

I agree, the user can change this log level. If users configure the log level as WARN or above (WARN is the default log level), they can't see any update/status from the status and kill commands, and I don't think we can expect users to set the log level to INFO just to run those commands. Please let me know if you have any thoughts on how to fix this better; I can make the changes. Thanks
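The override pattern under discussion, reduced to a self-contained sketch; `Client`, the message text, and the submission id are generic stand-ins for `RestSubmissionClient`, its REST responses, and `printMessage`:

```scala
class Client {
  // By default, output goes through the logging hook, which is silent
  // unless the configured log level is INFO or lower.
  protected def logInfo(msg: => String): Unit = ()

  def requestStatus(submissionId: String): Unit =
    logInfo(s"Submission $submissionId is RUNNING")
}

// Route user-facing status output to the console regardless of the log
// level by overriding the logging hook in an anonymous subclass.
val client = new Client {
  override protected def logInfo(msg: => String): Unit = Console.err.println(msg)
}

client.requestStatus("driver-20180806123456-0001")
```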
[GitHub] spark issue #22004: [WIP][SPARK-25029][TESTS] Scala 2.12 issues: TaskNotSeri...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22004 **[Test build #94262 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94262/testReport)** for PR 22004 at commit [`422c4ab`](https://github.com/apache/spark/commit/422c4ab259b5e27ef12c2d5093a4ae93f2b7f522).
[GitHub] spark issue #22004: [WIP][SPARK-25029][TESTS] Scala 2.12 issues: TaskNotSeri...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22004 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1829/ Test PASSed.
[GitHub] spark issue #22004: [WIP][SPARK-25029][TESTS] Scala 2.12 issues: TaskNotSeri...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22004 Merged build finished. Test PASSed.
[GitHub] spark issue #22003: [SPARK-25019][BUILD] Fix orc dependency to use the same ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22003 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94254/ Test PASSed.
[GitHub] spark issue #22003: [SPARK-25019][BUILD] Fix orc dependency to use the same ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22003 Merged build finished. Test PASSed.
[GitHub] spark issue #22003: [SPARK-25019][BUILD] Fix orc dependency to use the same ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22003

**[Test build #94254 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94254/testReport)** for PR 22003 at commit [`a801498`](https://github.com/apache/spark/commit/a801498f249d7526b64fcb9fe8144325ebb3d4e4).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #22004: [WIP][SPARK-25029][TESTS] Scala 2.12 issues: Task...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/22004#discussion_r207763075

--- Diff: repl/src/test/scala/org/apache/spark/repl/ReplSuite.scala ---
@@ -84,6 +85,7 @@ class ReplSuite extends SparkFunSuite {
     settings = new scala.tools.nsc.Settings
     settings.usejavacp.value = true
     org.apache.spark.repl.Main.interp = this
+    in = SimpleReader()
--- End diff --

This was giving an NPE in 2.12. I think this little hack fixes the issue for purposes of this test.
[GitHub] spark pull request #17185: [SPARK-19602][SQL] Support column resolution of f...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17185#discussion_r207759811

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/package.scala ---
@@ -169,25 +181,50 @@ package object expressions {
     })
   }

-  // Find matches for the given name assuming that the 1st part is a qualifier (i.e. table name,
-  // alias, or subquery alias) and the 2nd part is the actual name. This returns a tuple of
+  // Find matches for the given name assuming that the 1st two parts are qualifier
+  // (i.e. database name and table name) and the 3rd part is the actual column name.
+  //
+  // For example, consider an example where "db1" is the database name, "a" is the table name
+  // and "b" is the column name and "c" is the struct field name.
+  // If the name parts is db1.a.b.c, then Attribute will match
--- End diff --

What I'm talking about is ambiguity. `col.innerField1.innerField2` can fail if `innerField2` doesn't exist. My question is: shall we try all the possible resolution paths and pick the valid one, or define a rule so that we can decide the resolution path ahead of time?
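To make the ambiguity concrete, a hypothetical example using the names from the quoted comment; the schema is invented for illustration and a `SparkSession` named `spark` is assumed:

```scala
// Suppose database `db1` contains table `a`, whose column `b` is a struct
// with a nested field `c`. The reference `db1.a.b.c` then admits more than
// one resolution path:
//   (1) database `db1` -> table `a` -> column `b` -> struct field `c`
//   (2) table or alias `db1` -> column `a` -> field `b` -> nested field `c`
// Path (2) only fails once the nested field lookup fails, which is exactly
// the question above: try every path and pick the valid one, or fix one
// resolution order up front.
spark.sql("SELECT db1.a.b.c FROM db1.a")
```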
[GitHub] spark issue #21984: [SPARK-24772][SQL] Avro: support logical date type
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21984 Merged build finished. Test PASSed.
[GitHub] spark issue #21984: [SPARK-24772][SQL] Avro: support logical date type
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21984 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94260/ Test PASSed.
[GitHub] spark issue #21984: [SPARK-24772][SQL] Avro: support logical date type
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21984

**[Test build #94260 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94260/testReport)** for PR 21984 at commit [`c6b997b`](https://github.com/apache/spark/commit/c6b997b673790870b445f0c1e16941fa244d5dea).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #21622: [SPARK-24637][SS] Add metrics regarding state and waterm...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21622 **[Test build #94261 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94261/testReport)** for PR 21622 at commit [`722e6a0`](https://github.com/apache/spark/commit/722e6a0f7506440f260126d841d0cb27cf744100).
[GitHub] spark issue #21956: [MINOR][DOCS] Fix grammatical error in SortShuffleManage...
Github user deshanxiao commented on the issue: https://github.com/apache/spark/pull/21956 It's ok. Thank you for your suggestions!
[GitHub] spark pull request #21102: [SPARK-23913][SQL] Add array_intersect function
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/21102#discussion_r207758427

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala ---
@@ -1647,6 +1647,60 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSQLContext {
     assert(result10.first.schema(0).dataType === expectedType10)
   }

+  test("array_intersect functions") {
+    val df1 = Seq((Array(1, 2, 4), Array(4, 2))).toDF("a", "b")
+    val ans1 = Row(Seq(2, 4))
+    checkAnswer(df1.select(array_intersect($"a", $"b")), ans1)
+    checkAnswer(df1.selectExpr("array_intersect(a, b)"), ans1)
+
+    val df2 = Seq((Array[Integer](1, 2, null, 4, 5), Array[Integer](-5, 4, null, 2, -1)))
+      .toDF("a", "b")
+    val ans2 = Row(Seq(2, null, 4))
+    checkAnswer(df2.select(array_intersect($"a", $"b")), ans2)
+    checkAnswer(df2.selectExpr("array_intersect(a, b)"), ans2)
+
+    val df3 = Seq((Array(1L, 2L, 4L), Array(4L, 2L))).toDF("a", "b")
+    val ans3 = Row(Seq(2L, 4L))
+    checkAnswer(df3.select(array_intersect($"a", $"b")), ans3)
+    checkAnswer(df3.selectExpr("array_intersect(a, b)"), ans3)
+
+    val df4 = Seq(
+      (Array[java.lang.Long](1L, 2L, null, 4L, 5L), Array[java.lang.Long](-5L, 4L, null, 2L, -1L)))
+      .toDF("a", "b")
+    val ans4 = Row(Seq(2L, null, 4L))
+    checkAnswer(df4.select(array_intersect($"a", $"b")), ans4)
+    checkAnswer(df4.selectExpr("array_intersect(a, b)"), ans4)
+
+    val df5 = Seq((Array("c", null, "a", "f"), Array("b", "a", null, "g"))).toDF("a", "b")
+    val ans5 = Row(Seq(null, "a"))
+    checkAnswer(df5.select(array_intersect($"a", $"b")), ans5)
+    checkAnswer(df5.selectExpr("array_intersect(a, b)"), ans5)
+
+    val df6 = Seq((null, null)).toDF("a", "b")
+    intercept[AnalysisException] {
--- End diff --

Could you also check the error message?
[GitHub] spark issue #21622: [SPARK-24637][SS] Add metrics regarding state and waterm...
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/21622 retest this please
[GitHub] spark issue #21984: [SPARK-24772][SQL] Avro: support logical date type
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21984 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1828/ Test PASSed.
[GitHub] spark issue #21984: [SPARK-24772][SQL] Avro: support logical date type
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21984 **[Test build #94260 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94260/testReport)** for PR 21984 at commit [`c6b997b`](https://github.com/apache/spark/commit/c6b997b673790870b445f0c1e16941fa244d5dea).
[GitHub] spark issue #21984: [SPARK-24772][SQL] Avro: support logical date type
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21984 Merged build finished. Test PASSed.
[GitHub] spark pull request #21898: [SPARK-24817][Core] Implement BarrierTaskContext....
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21898#discussion_r207758158

--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
@@ -566,6 +579,9 @@ private[spark] class TaskSchedulerImpl(
     if (taskResultGetter != null) {
       taskResultGetter.stop()
     }
+    if (barrierCoordinator != null) {
--- End diff --

maybe we should not use `lazy val`, but use `var` and control the initialization ourselves.
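A sketch of that alternative, with `Coordinator` and `SchedulerLike` as hypothetical stand-ins for `BarrierCoordinator` and `TaskSchedulerImpl`; only the lifecycle handling is the point:

```scala
trait Coordinator { def stop(): Unit }

class SchedulerLike(newCoordinator: () => Coordinator) {
  // A null-initialized var with explicit, guarded initialization instead of
  // a lazy val, so the scheduler controls exactly when it is created.
  private var barrierCoordinator: Coordinator = null

  def maybeInitBarrierCoordinator(): Unit = synchronized {
    if (barrierCoordinator == null) {
      barrierCoordinator = newCoordinator()
    }
  }

  def stop(): Unit = synchronized {
    // With a lazy val, merely referencing the field here would instantiate
    // the coordinator just to stop it; with a var, a null check suffices.
    if (barrierCoordinator != null) {
      barrierCoordinator.stop()
    }
  }
}
```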
[GitHub] spark pull request #21898: [SPARK-24817][Core] Implement BarrierTaskContext....
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21898#discussion_r207758062

--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1930,6 +1930,12 @@ class SparkContext(config: SparkConf) extends Logging {
     Utils.tryLogNonFatalError {
       _executorAllocationManager.foreach(_.stop())
     }
+    if (_dagScheduler != null) {
--- End diff --

why this change?
[GitHub] spark pull request #21898: [SPARK-24817][Core] Implement BarrierTaskContext....
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21898#discussion_r207758063

--- Diff: core/src/main/scala/org/apache/spark/BarrierTaskContext.scala ---
@@ -80,7 +101,45 @@ class BarrierTaskContext(
   @Experimental
   @Since("2.4.0")
   def barrier(): Unit = {
-    // TODO SPARK-24817 implement global barrier.
+    val callSite = Utils.getCallSite()
+    logInfo(s"Task $taskAttemptId from Stage $stageId(Attempt $stageAttemptNumber) has entered " +
+      s"the global sync, current barrier epoch is $barrierEpoch.")
+    logTrace(s"Current callSite: $callSite")
--- End diff --

or simpler: `logTrace("Current callSite: " + Utils.getCallSite())`
[GitHub] spark pull request #21898: [SPARK-24817][Core] Implement BarrierTaskContext....
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21898#discussion_r207758008

--- Diff: core/src/main/scala/org/apache/spark/BarrierTaskContext.scala ---
@@ -80,7 +101,45 @@ class BarrierTaskContext(
   @Experimental
   @Since("2.4.0")
   def barrier(): Unit = {
-    // TODO SPARK-24817 implement global barrier.
+    val callSite = Utils.getCallSite()
+    logInfo(s"Task $taskAttemptId from Stage $stageId(Attempt $stageAttemptNumber) has entered " +
+      s"the global sync, current barrier epoch is $barrierEpoch.")
+    logTrace(s"Current callSite: $callSite")
--- End diff --

+1
[GitHub] spark pull request #21898: [SPARK-24817][Core] Implement BarrierTaskContext....
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21898#discussion_r207757953

--- Diff: core/src/main/scala/org/apache/spark/BarrierTaskContext.scala ---
@@ -39,6 +44,22 @@ class BarrierTaskContext(
     extends TaskContextImpl(stageId, stageAttemptNumber, partitionId, taskAttemptId,
       attemptNumber, taskMemoryManager, localProperties, metricsSystem, taskMetrics) {

+  // Find the driver side RPCEndpointRef of the coordinator that handles all the barrier() calls.
+  private val barrierCoordinator: RpcEndpointRef = {
+    val env = SparkEnv.get
+    RpcUtils.makeDriverRef("barrierSync", env.conf, env.rpcEnv)
+  }
+
+  private val timer = new Timer("Barrier task timer for barrier() calls.")
+
+  // Local barrierEpoch that identify a barrier() call from current task, it shall be identical
+  // with the driver side epoch.
+  private var barrierEpoch = 0
+
+  // Number of tasks of the current barrier stage, a barrier() call must collect enough requests
+  // from different tasks within the same barrier stage attempt to succeed.
+  private lazy val numTasks = getTaskInfos().size
--- End diff --

this can be a `def`.
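To illustrate the trade-off behind that suggestion, here is a toy sketch (illustrative only, not the actual BarrierTaskContext code):

```scala
class Example {
  private def lookup(): Int = { println("computed"); 42 }

  // `lazy val`: evaluated once, on first access, then stored in a field;
  // the first access is synchronized.
  lazy val cachedSize: Int = lookup()

  // `def`: re-evaluated on every access; no backing field on the object.
  def currentSize: Int = lookup()
}
```

For a value like `numTasks` that is cheap to derive from `getTaskInfos()`, a `def` avoids carrying an extra field on every task context for little gain from caching.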
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user ajacques commented on the issue: https://github.com/apache/spark/pull/21889 Jenkins build successful. Any PR comments/blockers to merge for phase 1? cc @HyukjinKwon, @gatorsmile, @cloud-fan
[GitHub] spark issue #21608: [SPARK-24626] [SQL] Improve location size calculation in...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21608 **[Test build #94259 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94259/testReport)** for PR 21608 at commit [`253af70`](https://github.com/apache/spark/commit/253af70cd4cd525db4951db67fce01a5ca1f0014).
[GitHub] spark issue #21898: [SPARK-24817][Core] Implement BarrierTaskContext.barrier...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/21898 I see. got it, thanks
[GitHub] spark issue #21608: [SPARK-24626] [SQL] Improve location size calculation in...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/21608 retest this please
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21889 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94252/ Test PASSed.
[GitHub] spark issue #21998: [SPARK-24940][SQL] Use IntegerLiteral in ResolveCoalesce...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21998 **[Test build #94258 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94258/testReport)** for PR 21998 at commit [`8f0c578`](https://github.com/apache/spark/commit/8f0c57804e75b2a74f7573a21f6c7c63f7b85e03).
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21889 Merged build finished. Test PASSed.
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21889 **[Test build #94252 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94252/testReport)** for PR 21889 at commit [`8d7f4bc`](https://github.com/apache/spark/commit/8d7f4bc1874f8ae3c2cda8e5aa96a8647a56128d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #21998: [SPARK-24940][SQL] Use IntegerLiteral in ResolveCoalesce...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21998 ok to test
[GitHub] spark issue #21898: [SPARK-24817][Core] Implement BarrierTaskContext.barrier...
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/21898

> why 94247 is successful while 94241 failed with the same set of test suites, since they are tested using the same source revision.

They are not - I made the variable `timer` lazy for 94247.
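A sketch of what making the field lazy changes (illustrative class names; the real field lives in `BarrierTaskContext`), assuming the point is to defer constructing the timer thread until `barrier()` is actually used:

```scala
import java.util.Timer

class EagerHolder {
  // Created as soon as the enclosing object is constructed: every task
  // pays for a timer thread even if barrier() is never called.
  private val timer = new Timer("Barrier task timer")
}

class LazyHolder {
  // Created only on first use, so constructing the enclosing object
  // stays cheap and free of that side effect.
  private lazy val timer = new Timer("Barrier task timer")
}
```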
[GitHub] spark issue #21898: [SPARK-24817][Core] Implement BarrierTaskContext.barrier...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/21898 Looks good to finish without failure. I am curious why 94247 is successful while 94241 failed with the same set of test suites, since they are tested using the same source revision.
[GitHub] spark issue #21977: SPARK-25004: Add spark.executor.pyspark.memory limit.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21977 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94251/ Test FAILed.
[GitHub] spark issue #21977: SPARK-25004: Add spark.executor.pyspark.memory limit.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21977 Merged build finished. Test FAILed.
[GitHub] spark issue #21977: SPARK-25004: Add spark.executor.pyspark.memory limit.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21977 **[Test build #94251 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94251/testReport)** for PR 21977 at commit [`a5a788b`](https://github.com/apache/spark/commit/a5a788b3dbb3c6285f903fda740fe3f69ab275a9).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request #22004: [WIP][SPARK-25029][TESTS] Scala 2.12 issues: Task...
Github user skonto commented on a diff in the pull request: https://github.com/apache/spark/pull/22004#discussion_r207754816

--- Diff: core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala ---
@@ -2369,39 +2369,12 @@ class DAGSchedulerSuite extends SparkFunSuite with LocalSparkContext with TimeLi
     assert(scheduler.getShuffleDependencies(rddE) === Set(shuffleDepA, shuffleDepC))
   }

-  test("SPARK-17644: After one stage is aborted for too many failed attempts, subsequent stages" +
+  test("SPARK-17644: After one stage is aborted for too many failed attempts, subsequent stages " +
     "still behave correctly on fetch failures") {
-    // Runs a job that always encounters a fetch failure, so should eventually be aborted
--- End diff --

Yeah, I was also wondering if I had to implement this as well, but I feel people need to move to 2.12 with a different mindset, as things have changed. I am not sure it is even possible, so I asked @LRytz in JIRA.
[GitHub] spark issue #22002: [FOLLOW-UP][SPARK-23772][SQL] Provide an option to ignor...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/22002 LGTM cc: @HyukjinKwon
[GitHub] spark pull request #22002: [FOLLOW-UP][SPARK-23772][SQL] Provide an option t...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/22002#discussion_r207754359

--- Diff: python/pyspark/sql/readwriter.py ---
@@ -267,7 +267,7 @@ def json(self, path, schema=None, primitivesAsString=None, prefersDecimal=None,
             mode=mode, columnNameOfCorruptRecord=columnNameOfCorruptRecord, dateFormat=dateFormat,
             timestampFormat=timestampFormat, multiLine=multiLine,
             allowUnquotedControlChars=allowUnquotedControlChars, lineSep=lineSep,
-            samplingRatio=samplingRatio, encoding=encoding)
+            samplingRatio=samplingRatio, dropFieldIfAllNull=dropFieldIfAllNull, encoding=encoding)
--- End diff --

oh... good catch. thanks.
[GitHub] spark pull request #22004: [WIP][SPARK-25029][TESTS] Scala 2.12 issues: Task...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/22004#discussion_r207753917

--- Diff: core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala ---
@@ -2369,39 +2369,12 @@ class DAGSchedulerSuite extends SparkFunSuite with LocalSparkContext with TimeLi
     assert(scheduler.getShuffleDependencies(rddE) === Set(shuffleDepA, shuffleDepC))
   }

-  test("SPARK-17644: After one stage is aborted for too many failed attempts, subsequent stages" +
+  test("SPARK-17644: After one stage is aborted for too many failed attempts, subsequent stages " +
     "still behave correctly on fetch failures") {
-    // Runs a job that always encounters a fetch failure, so should eventually be aborted
--- End diff --

Yeah, you've said it more correctly: it's really the body of the lambda capturing references, and it just happens to be a reference to an enclosing class of the test method in this case. This is the kind of thing I wonder if we *do* have to do, like in the old closure cleaner. Because LMF closures capture far less to begin with, it's much less of an issue. I also remember that closure cleaning such things got dicey because it's not clear when it's just OK to null some object's field. It may not be possible to do reasonably.
[GitHub] spark pull request #22004: [WIP][SPARK-25029][TESTS] Scala 2.12 issues: Task...
Github user skonto commented on a diff in the pull request: https://github.com/apache/spark/pull/22004#discussion_r207753682

--- Diff: core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala ---
@@ -2369,39 +2369,12 @@ class DAGSchedulerSuite extends SparkFunSuite with LocalSparkContext with TimeLi
     assert(scheduler.getShuffleDependencies(rddE) === Set(shuffleDepA, shuffleDepC))
   }

-  test("SPARK-17644: After one stage is aborted for too many failed attempts, subsequent stages" +
+  test("SPARK-17644: After one stage is aborted for too many failed attempts, subsequent stages " +
     "still behave correctly on fetch failures") {
-    // Runs a job that always encounters a fetch failure, so should eventually be aborted
--- End diff --

If you move

```
object FailThisAttempt {
  val _fail = new AtomicBoolean(true)
}
```

outside the tests as a top-level object, the tests pass; there is no need to move the functions to the companion object. Btw, the closure cleaner does not look into the body of a lambda to check whether references to other objects create an issue; that is done only for the old closures. According to the documentation, we only checked for return statements. Also, lambdas don't have outers by definition. Regarding the LegacyAccumulatorWrapper, there is no closure involved.
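A compact sketch of why that move fixes serialization (hypothetical names such as `SomeSuite` and `FailThisAttemptSketch`; this is not the actual DAGSchedulerSuite code). In Scala 2.12, a lambda only captures what its body references, but reading an instance field still means capturing `this`:

```scala
import java.util.concurrent.atomic.AtomicBoolean

// Stand-in for a ScalaTest suite: not Serializable, like the real thing.
class SomeSuite {
  val fail = new AtomicBoolean(true)

  // The body reads a field of the suite, which compiles to `this.fail`,
  // so the lambda captures the whole unserializable suite instance.
  def badTask: Int => Boolean = _ => fail.get()
}

// A top-level object is resolved statically from the lambda body, so
// nothing from the enclosing suite is captured.
object FailThisAttemptSketch {
  val _fail = new AtomicBoolean(true)
}

object TaskDefs {
  // Serializes fine when shipped inside a Spark task.
  val goodTask: Int => Boolean = _ => FailThisAttemptSketch._fail.get()
}
```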
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21889

> Alright to make sure we're all on the same page, it sounds like we're ready to merge this PR pending:
>
> * Successful build by Jenkins
> * Any PR comments from a maintainer
>
> This feature will be merged in disabled state and can't be enabled until the next PR is merged, but we do not expect any regression in behavior in the default disabled state.

I agree.
[GitHub] spark pull request #22004: [WIP][SPARK-25029][TESTS] Scala 2.12 issues: Task...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/22004#discussion_r207752883

--- Diff: core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala ---
@@ -2369,39 +2369,12 @@ class DAGSchedulerSuite extends SparkFunSuite with LocalSparkContext with TimeLi
     assert(scheduler.getShuffleDependencies(rddE) === Set(shuffleDepA, shuffleDepC))
   }

-  test("SPARK-17644: After one stage is aborted for too many failed attempts, subsequent stages" +
+  test("SPARK-17644: After one stage is aborted for too many failed attempts, subsequent stages " +
     "still behave correctly on fetch failures") {
-    // Runs a job that always encounters a fetch failure, so should eventually be aborted
--- End diff --

Just that the task (the rdd.map call's argument) isn't serializable, for the same reason that the LegacyAccumulatorWrapper failed: it captures the test class, which has an unserializable AssertionsHelper field in a ScalaTest superclass. The problem here is capturing the enclosing test class to begin with, as it's not relevant.
[GitHub] spark issue #21305: [SPARK-24251][SQL] Add AppendData logical plan.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21305 Merged build finished. Test PASSed.
[GitHub] spark issue #21305: [SPARK-24251][SQL] Add AppendData logical plan.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21305 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1827/ Test PASSed.
[GitHub] spark pull request #22004: [WIP][SPARK-25029][TESTS] Scala 2.12 issues: Task...
Github user skonto commented on a diff in the pull request: https://github.com/apache/spark/pull/22004#discussion_r207752807

--- Diff: core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala ---
@@ -2369,39 +2369,12 @@ class DAGSchedulerSuite extends SparkFunSuite with LocalSparkContext with TimeLi
     assert(scheduler.getShuffleDependencies(rddE) === Set(shuffleDepA, shuffleDepC))
   }

-  test("SPARK-17644: After one stage is aborted for too many failed attempts, subsequent stages" +
+  test("SPARK-17644: After one stage is aborted for too many failed attempts, subsequent stages " +
     "still behave correctly on fetch failures") {
-    // Runs a job that always encounters a fetch failure, so should eventually be aborted
--- End diff --

OK, I will have a look. Do you have the output of the failure? ScalaTest does not report much. Btw, what I noticed in these tests is that only the last one failed (the failAfter in "A job with one fetch failure should eventually succeed"), so I am not sure whether it is the closure or something else (I need to debug it).
[GitHub] spark issue #21305: [SPARK-24251][SQL] Add AppendData logical plan.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21305 **[Test build #94257 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94257/testReport)** for PR 21305 at commit [`618a79d`](https://github.com/apache/spark/commit/618a79dfebf52710e3d86abbde35c65910b91a81).
[GitHub] spark issue #21305: [SPARK-24251][SQL] Add AppendData logical plan.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21305 Merged build finished. Test PASSed.
[GitHub] spark issue #21305: [SPARK-24251][SQL] Add AppendData logical plan.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21305 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1826/ Test PASSed.