[GitHub] spark pull request #21061: [SPARK-23914][SQL] Add array_union function
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21061#discussion_r196662595

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -2355,3 +2355,297 @@ case class ArrayRemove(left: Expression, right: Expression)
   override def prettyName: String = "array_remove"
 }
+
+object ArraySetLike {
+  def useGenericArrayData(elementSize: Int, length: Int): Boolean = {
+    // Use the same calculation in UnsafeArrayData.fromPrimitiveArray()
+    val headerInBytes = UnsafeArrayData.calculateHeaderPortionInBytes(length)
+    val valueRegionInBytes = elementSize.toLong * length
+    val totalSizeInLongs = (headerInBytes + valueRegionInBytes + 7) / 8
+    totalSizeInLongs > Integer.MAX_VALUE / 8
+  }
+
+  def throwUnionLengthOverflowException(length: Int): Unit = {
+    throw new RuntimeException(s"Unsuccessful try to union arrays with $length " +
+      s"elements due to exceeding the array size limit " +
+      s"${ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH}.")
+  }
+}
+
+
+abstract class ArraySetLike extends BinaryArrayExpressionWithImplicitCast {
+  override def dataType: DataType = left.dataType
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+    val typeCheckResult = super.checkInputDataTypes()
+    if (typeCheckResult.isSuccess) {
+      TypeUtils.checkForOrderingExpr(dataType.asInstanceOf[ArrayType].elementType,
+        s"function $prettyName")
+    } else {
+      typeCheckResult
+    }
+  }
+
+  protected def cn = left.dataType.asInstanceOf[ArrayType].containsNull ||
+    right.dataType.asInstanceOf[ArrayType].containsNull
+
+  @transient protected lazy val ordering: Ordering[Any] =
+    TypeUtils.getInterpretedOrdering(elementType)
+
+  @transient protected lazy val elementTypeSupportEquals = elementType match {
+    case BinaryType => false
+    case _: AtomicType => true
+    case _ => false
+  }
+}
+
+/**
+ * Returns an array of the elements in the union of x and y, without duplicates
+ */
+@ExpressionDescription(
+  usage = """
+    _FUNC_(array1, array2) - Returns an array of the elements in the union of array1 and array2,
+      without duplicates.
+  """,
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3), array(1, 3, 5));
+       array(1, 2, 3, 5)
+  """,
+  since = "2.4.0")
+case class ArrayUnion(left: Expression, right: Expression) extends ArraySetLike {
+
+  override def nullSafeEval(input1: Any, input2: Any): Any = {
+    val array1 = input1.asInstanceOf[ArrayData]
+    val array2 = input2.asInstanceOf[ArrayData]
+
+    if (elementTypeSupportEquals && !cn) {
--- End diff --

Why do we need to check `!cn`? Can't we avoid boxing even if the arrays contain null? How about using `foundNullElement` as `array_distinct` is doing at #21050?
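To make the suggestion concrete: a minimal interpreted sketch, assuming integer elements, of a union that deduplicates through a hash set and tracks nulls with a `foundNullElement`-style flag. `unionWithNullTracking` is an illustrative name; the real code would use Spark's specialized `OpenHashSet` to actually avoid boxing, `mutable.HashSet` stands in here for brevity.

```scala
import scala.collection.mutable

// Dedup non-null values in a hash set and handle nulls with a boolean flag,
// so containsNull alone does not force the slow comparison path.
def unionWithNullTracking(
    array1: Array[java.lang.Integer],
    array2: Array[java.lang.Integer]): Array[java.lang.Integer] = {
  val seen = mutable.HashSet.empty[Int]
  var foundNullElement = false
  val result = mutable.ArrayBuffer.empty[java.lang.Integer]
  for (elem <- array1.iterator ++ array2.iterator) {
    if (elem == null) {
      if (!foundNullElement) {   // emit null at most once
        foundNullElement = true
        result += null
      }
    } else if (seen.add(elem)) { // first occurrence of a non-null value
      result += elem
    }
  }
  result.toArray
}
```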
[GitHub] spark issue #21548: [SPARK-24518][CORE] Using Hadoop credential provider API...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21548 **[Test build #92120 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92120/testReport)** for PR 21548 at commit [`c7ef15e`](https://github.com/apache/spark/commit/c7ef15e47e97e675a63444d0b66b8b8808cccf90).
[GitHub] spark issue #21548: [SPARK-24518][CORE] Using Hadoop credential provider API...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21548 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/4242/ Test PASSed.
[GitHub] spark issue #21548: [SPARK-24518][CORE] Using Hadoop credential provider API...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21548 Merged build finished. Test PASSed.
[GitHub] spark issue #21548: [SPARK-24518][CORE] Using Hadoop credential provider API...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21548 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/346/ Test PASSed.
[GitHub] spark issue #21548: [SPARK-24518][CORE] Using Hadoop credential provider API...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21548 Merged build finished. Test PASSed.
[GitHub] spark pull request #21590: [SPARK-24423][SQL] Add a new option for JDBC sour...
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/21590#discussion_r196658307

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala ---
@@ -65,13 +65,38 @@ class JDBCOptions(
   // Required parameters
   //
   require(parameters.isDefinedAt(JDBC_URL), s"Option '$JDBC_URL' is required.")
-  require(parameters.isDefinedAt(JDBC_TABLE_NAME), s"Option '$JDBC_TABLE_NAME' is required.")
+
   // a JDBC URL
   val url = parameters(JDBC_URL)
-  // name of table
-  val table = parameters(JDBC_TABLE_NAME)
+  val tableName = parameters.get(JDBC_TABLE_NAME)
--- End diff --

Personally I prefer:
```
val tableExpression = if (parameters.isDefinedAt(JDBC_TABLE_NAME)) {
  require(!parameters.isDefinedAt(JDBC_QUERY_STRING), "...")
  parameters(JDBC_TABLE_NAME).trim
} else {
  require(parameters.isDefinedAt(JDBC_QUERY_STRING), "...")
  s"(${parameters(JDBC_QUERY_STRING)}) ${curId.getAndIncrement()}"
}
```
[GitHub] spark pull request #21590: [SPARK-24423][SQL] Add a new option for JDBC sour...
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/21590#discussion_r196657947

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala ---
@@ -65,13 +65,38 @@ class JDBCOptions(
   // Required parameters
   //
   require(parameters.isDefinedAt(JDBC_URL), s"Option '$JDBC_URL' is required.")
-  require(parameters.isDefinedAt(JDBC_TABLE_NAME), s"Option '$JDBC_TABLE_NAME' is required.")
+
   // a JDBC URL
   val url = parameters(JDBC_URL)
-  // name of table
-  val table = parameters(JDBC_TABLE_NAME)
+  val tableName = parameters.get(JDBC_TABLE_NAME)
+  val query = parameters.get(JDBC_QUERY_STRING)
+  // The following two conditions make sure that:
+  //   1. One of the options (dbtable or query) must be specified.
+  //   2. Both of them can not be specified at the same time as they are conflicting in nature.
+  require(
+    tableName.isDefined || query.isDefined,
+    s"Option '$JDBC_TABLE_NAME' or '$JDBC_QUERY_STRING' is required."
+  )
+
+  require(
+    !(tableName.isDefined && query.isDefined),
+    s"Both '$JDBC_TABLE_NAME' and '$JDBC_QUERY_STRING' can not be specified."
+  )
+
+  // table name or a table expression.
+  val tableExpression = tableName.map(_.trim).getOrElse {
+    // We have ensured in the code above that either dbtable or query is specified.
+    query.get match {
+      case subq if subq.nonEmpty => s"(${subq}) spark_gen_${curId.getAndIncrement()}"
+      case subq => subq
+    }
+  }
+
+  require(tableExpression.nonEmpty,
--- End diff --

The error check and error message here are confusing. It seems to tell the user that the two options can both be specified. Maybe we should just check the defined one and improve the error message.
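For illustration, a standalone sketch of the validation shape this review is asking for: check only whichever option is defined, and fail with a message specific to that case. Option keys, messages, and the generated alias are stand-ins for the PR's constants, not the merged code.

```scala
// "dbtable" / "query" mirror JDBC_TABLE_NAME / JDBC_QUERY_STRING.
def resolveTableOrQuery(parameters: Map[String, String]): String = {
  val tableName = parameters.get("dbtable").map(_.trim)
  val query = parameters.get("query").map(_.trim)
  (tableName, query) match {
    case (Some(_), Some(_)) =>
      throw new IllegalArgumentException(
        "Options 'dbtable' and 'query' can not be specified at the same time.")
    case (Some(t), None) if t.nonEmpty => t
    case (None, Some(q)) if q.nonEmpty => s"($q) spark_gen_alias"
    case (None, None) =>
      throw new IllegalArgumentException(
        "One of the options 'dbtable' or 'query' is required.")
    case _ =>
      throw new IllegalArgumentException(
        "The specified option must be a non-empty string.")
  }
}
```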
[GitHub] spark issue #21061: [SPARK-23914][SQL] Add array_union function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21061 Merged build finished. Test PASSed.
[GitHub] spark issue #21061: [SPARK-23914][SQL] Add array_union function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21061 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92113/ Test PASSed.
[GitHub] spark issue #21061: [SPARK-23914][SQL] Add array_union function
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21061 **[Test build #92113 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92113/testReport)** for PR 21061 at commit [`f64e529`](https://github.com/apache/spark/commit/f64e5292ccc9c709ea56614bf70b1fbb83099625). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #21050: [SPARK-23912][SQL]add array_distinct
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21050#discussion_r196654369

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -2355,3 +2356,281 @@ case class ArrayRemove(left: Expression, right: Expression)
   override def prettyName: String = "array_remove"
 }
+
+/**
+ * Removes duplicate values from the array.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(array) - Removes duplicate values from the array.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3, null, 3));
+       [1,2,3,null]
+  """, since = "2.4.0")
+case class ArrayDistinct(child: Expression)
+  extends UnaryExpression with ExpectsInputTypes {
+
+  override def inputTypes: Seq[AbstractDataType] = Seq(ArrayType)
+
+  override def dataType: DataType = child.dataType
+
+  @transient lazy val elementType: DataType = dataType.asInstanceOf[ArrayType].elementType
+
+  @transient private lazy val ordering: Ordering[Any] =
+    TypeUtils.getInterpretedOrdering(elementType)
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+    super.checkInputDataTypes() match {
+      case f: TypeCheckResult.TypeCheckFailure => f
+      case TypeCheckResult.TypeCheckSuccess =>
+        TypeUtils.checkForOrderingExpr(elementType, s"function $prettyName")
+    }
+  }
+
+  @transient private lazy val elementTypeSupportEquals = elementType match {
+    case BinaryType => false
+    case _: AtomicType => true
+    case _ => false
+  }
+
+  override def nullSafeEval(array: Any): Any = {
+    val data = array.asInstanceOf[ArrayData].toArray[AnyRef](elementType)
+    if (elementTypeSupportEquals) {
+      new GenericArrayData(data.distinct.asInstanceOf[Array[Any]])
+    } else {
+      var foundNullElement = false
+      var pos = 0
+      for(i <- 0 until data.length) {
+        if (data(i) == null) {
+          if (!foundNullElement) {
+            foundNullElement = true
+            pos = pos + 1
+          }
+        } else {
+          var j = 0
+          var done = false
+          while (j <= i && !done) {
+            if (data(j) != null && ordering.equiv(data(j), data(i))) {
+              done = true
+            }
+            j = j + 1
+          }
+          if (i == j-1) {
+            pos = pos + 1
+          }
+        }
+      }
+      new GenericArrayData(data.slice(0, pos))
+    }
+  }
+
+  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+    nullSafeCodeGen(ctx, ev, (array) => {
+      val i = ctx.freshName("i")
+      val j = ctx.freshName("j")
+      val sizeOfDistinctArray = ctx.freshName("sizeOfDistinctArray")
+      val getValue1 = CodeGenerator.getValue(array, elementType, i)
+      val getValue2 = CodeGenerator.getValue(array, elementType, j)
+      val foundNullElement = ctx.freshName("foundNullElement")
+      val openHashSet = classOf[OpenHashSet[_]].getName
+      val hs = ctx.freshName("hs")
+      val classTag = s"scala.reflect.ClassTag$$.MODULE$$.Object()"
+      if(elementTypeSupportEquals) {
+        s"""
+           |int $sizeOfDistinctArray = 0;
+           |boolean $foundNullElement = false;
+           |$openHashSet $hs = new $openHashSet($classTag);
+           |for (int $i = 0; $i < $array.numElements(); $i++) {
+           |  if ($array.isNullAt($i)) {
+           |    $foundNullElement = true;
+           |  } else {
+           |    $hs.add($getValue1);
+           |  }
+           |}
+           |$sizeOfDistinctArray = $hs.size() + ($foundNullElement ? 1 : 0);
+           |${genCodeForResult(ctx, ev, array, sizeOfDistinctArray)}
+         """.stripMargin
+      } else {
+        s"""
+           |int $sizeOfDistinctArray = 0;
+           |boolean $foundNullElement = false;
+           |for (int $i = 0; $i < $array.numElements(); $i ++) {
+           |  if ($array.isNullAt($i)) {
+           |    if (!($foundNullElement)) {
+           |      $sizeOfDistinctArray = $sizeOfDistinctArray + 1;
+           |      $foundNullElement = true;
+           |    }
+           |  } else {
+           |    int $j;
+           |    for ($j = 0; $j < $i; $j++) {
+           |      if (!$array.isNullAt($j) && ${ctx.genEqual(elementType, getValue1, getValue2)}) {
+           |        break;
+           |      }
+           |    }
+           |    if ($i == $j) {
+           |      $sizeOfDistinctArray = $sizeOfDistinctArray + 1;
+           |    }
+           |  }
+           |}
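For reference, the interpreted branch above reduces to the following standalone sketch, with the review nits applied and the position bookkeeping replaced by an explicit result buffer; `equiv` stands in for `ordering.equiv`.

```scala
import scala.collection.mutable

// Keep the first occurrence of each element; nulls are deduplicated with a
// flag rather than compared through the ordering.
def distinctKeepingFirst(
    data: Array[AnyRef],
    equiv: (AnyRef, AnyRef) => Boolean): Array[AnyRef] = {
  var foundNullElement = false
  val kept = mutable.ArrayBuffer.empty[AnyRef]
  for (i <- data.indices) {
    if (data(i) == null) {
      if (!foundNullElement) {
        foundNullElement = true
        kept += null
      }
    } else if (!kept.exists(e => e != null && equiv(e, data(i)))) {
      kept += data(i)
    }
  }
  kept.toArray
}
```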
[GitHub] spark pull request #21050: [SPARK-23912][SQL]add array_distinct
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21050#discussion_r196654348

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
(quotes the same ArrayDistinct hunk shown above)
[GitHub] spark pull request #21050: [SPARK-23912][SQL]add array_distinct
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21050#discussion_r196654935

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
(same ArrayDistinct hunk as above; the line under review is)
+      for(i <- 0 until data.length) {
--- End diff --

nit: `for (`?
[GitHub] spark pull request #21050: [SPARK-23912][SQL]add array_distinct
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21050#discussion_r196654756

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
(same ArrayDistinct hunk as above; the line under review is)
+          if (i == j-1) {
--- End diff --

nit: `(i == j - 1)`?
[GitHub] spark pull request #21050: [SPARK-23912][SQL]add array_distinct
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21050#discussion_r196654969

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
(same ArrayDistinct hunk as above; the line under review is)
+      if(elementTypeSupportEquals) {
--- End diff --

nit: `if (`?
[GitHub] spark issue #21595: [MINOR][SQL] Remove invalid comment from SparkStrategies
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21595 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92111/ Test PASSed.
[GitHub] spark issue #21595: [MINOR][SQL] Remove invalid comment from SparkStrategies
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21595 Merged build finished. Test PASSed.
[GitHub] spark issue #21587: [SPARK-24588][SS] streaming join should require HashClus...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21587 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/4241/ Test PASSed.
[GitHub] spark issue #21595: [MINOR][SQL] Remove invalid comment from SparkStrategies
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21595 **[Test build #92111 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92111/testReport)** for PR 21595 at commit [`8afb36b`](https://github.com/apache/spark/commit/8afb36b20aab1bbd1f6a5cf902aef7e0c04c8353). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21587: [SPARK-24588][SS] streaming join should require HashClus...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21587 Merged build finished. Test PASSed.
[GitHub] spark issue #21587: [SPARK-24588][SS] streaming join should require HashClus...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21587 Merged build finished. Test PASSed.
[GitHub] spark issue #21590: [SPARK-24423][SQL] Add a new option for JDBC sources
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21590 Merged build finished. Test PASSed.
[GitHub] spark issue #21590: [SPARK-24423][SQL] Add a new option for JDBC sources
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21590 **[Test build #92118 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92118/testReport)** for PR 21590 at commit [`8920793`](https://github.com/apache/spark/commit/8920793d480de76e3cbe9d25f66e624ad6183503).
[GitHub] spark issue #21590: [SPARK-24423][SQL] Add a new option for JDBC sources
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21590 Merged build finished. Test PASSed.
[GitHub] spark issue #21587: [SPARK-24588][SS] streaming join should require HashClus...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21587 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/345/ Test PASSed.
[GitHub] spark issue #21587: [SPARK-24588][SS] streaming join should require HashClus...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21587 **[Test build #92119 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92119/testReport)** for PR 21587 at commit [`08da2e6`](https://github.com/apache/spark/commit/08da2e6717ce4693fcaf33d021513f478f33d2a4).
[GitHub] spark issue #21590: [SPARK-24423][SQL] Add a new option for JDBC sources
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21590 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/344/ Test PASSed.
[GitHub] spark issue #21590: [SPARK-24423][SQL] Add a new option for JDBC sources
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21590 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/4240/ Test PASSed.
[GitHub] spark issue #21587: [SPARK-24588][SS] streaming join should require HashClus...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21587 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/343/ Test PASSed.
[GitHub] spark issue #21587: [SPARK-24588][SS] streaming join should require HashClus...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21587 Merged build finished. Test PASSed.
[GitHub] spark pull request #21333: [SPARK-23778][CORE] Avoid unneeded shuffle when u...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21333
[GitHub] spark issue #21587: [SPARK-24588][SS] streaming join should require HashClus...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21587 Merged build finished. Test PASSed.
[GitHub] spark issue #21587: [SPARK-24588][SS] streaming join should require HashClus...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21587 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/4239/ Test PASSed.
[GitHub] spark pull request #21587: [SPARK-24588][SS] streaming join should require H...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21587#discussion_r196652632

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/DistributionSuite.scala ---
@@ -41,34 +41,127 @@ class DistributionSuite extends SparkFunSuite {
     }
   }

-  test("HashPartitioning (with nullSafe = true) is the output partitioning") {
-    // Cases which do not need an exchange between two data properties.
+  test("UnspecifiedDistribution and AllTuples") {
--- End diff --

I've reorganized this test suite and added a bunch of new test cases, to improve the test coverage.
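For illustration, one reorganized case might look like the following, assuming the suite's existing `checkSatisfied(inputPartitioning, requiredDistribution, satisfied)` helper; the concrete cases added in the PR may differ.

```scala
test("UnspecifiedDistribution and AllTuples") {
  // Every partitioning satisfies UnspecifiedDistribution...
  checkSatisfied(SinglePartition, UnspecifiedDistribution, satisfied = true)
  // ...but only a single-partition output satisfies AllTuples.
  checkSatisfied(SinglePartition, AllTuples, satisfied = true)
  checkSatisfied(RoundRobinPartitioning(10), AllTuples, satisfied = false)
}
```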
[GitHub] spark pull request #21587: [SPARK-24588][SS] streaming join should require H...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21587#discussion_r196652542

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala ---
@@ -186,9 +180,8 @@ case class RoundRobinPartitioning(numPartitions: Int) extends Partitioning
 case object SinglePartition extends Partitioning {
   val numPartitions = 1

-  override def satisfies(required: Distribution): Boolean = required match {
+  override def satisfies0(required: Distribution): Boolean = required match {
--- End diff --

added in the base class
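The remark implies a template-method shape roughly like this self-contained sketch; names carry a `Sketch` suffix to mark them as stand-ins for the real catalyst types, and the base-class logic shown is an assumption for illustration.

```scala
sealed trait DistributionSketch
case object UnspecifiedDistributionSketch extends DistributionSketch
case object AllTuplesSketch extends DistributionSketch

trait PartitioningSketch {
  def numPartitions: Int

  // The base class handles the cases every partitioning answers the same way,
  // so subclasses only override satisfies0 for their specific distributions.
  final def satisfies(required: DistributionSketch): Boolean = required match {
    case UnspecifiedDistributionSketch => true
    case AllTuplesSketch => numPartitions == 1
    case _ => satisfies0(required)
  }

  protected def satisfies0(required: DistributionSketch): Boolean = false
}
```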
[GitHub] spark issue #21333: [SPARK-23778][CORE] Avoid unneeded shuffle when union ge...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21333 thanks, merging to master!
[GitHub] spark issue #21542: [SPARK-24529][Build][test-maven] Add spotbugs into maven...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21542 Merged build finished. Test PASSed.
[GitHub] spark issue #21587: [SPARK-24588][SS] streaming join should require HashClus...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21587 **[Test build #92117 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92117/testReport)** for PR 21587 at commit [`0795e40`](https://github.com/apache/spark/commit/0795e405398278e0cf86fd06d7cae675194c197d).
[GitHub] spark issue #21542: [SPARK-24529][Build][test-maven] Add spotbugs into maven...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21542 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/4238/ Test PASSed.
[GitHub] spark issue #21542: [SPARK-24529][Build][test-maven] Add spotbugs into maven...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21542 **[Test build #92116 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92116/testReport)** for PR 21542 at commit [`75c9339`](https://github.com/apache/spark/commit/75c9339c948239c4b77899169bd5ac0484e523a1).
[GitHub] spark issue #21542: [SPARK-24529][Build][test-maven] Add spotbugs into maven...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21542 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/342/ Test PASSed.
[GitHub] spark issue #21542: [SPARK-24529][Build][test-maven] Add spotbugs into maven...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21542 Merged build finished. Test PASSed.
[GitHub] spark issue #21542: [SPARK-24529][Build][test-maven] Add spotbugs into maven...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/21542 retest this please
[GitHub] spark pull request #21581: [SPARK-24574][SQL] array_contains, array_position...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/21581#discussion_r196649798

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
@@ -3082,7 +3082,10 @@ object functions {
    * @since 1.5.0
    */
   def array_contains(column: Column, value: Any): Column = withExpr {
-    ArrayContains(column.expr, Literal(value))
+    value match {
--- End diff --

+1
[GitHub] spark pull request #21588: [WIP][SPARK-24590][BUILD] Make Jenkins tests pass...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21588#discussion_r196646202

--- Diff: pom.xml ---
@@ -123,7 +123,7 @@
     1.6.0
     3.4.6
     2.6.0
-    org.spark-project.hive
+    com.github.hyukjinkwon
--- End diff --

I am going to revert this change and the changes in `dev/run-tests.py` back to `org.spark-project.hive` once the tests pass. I redistributed `org.spark-project.hive` with a one-liner fix for Hadoop 3 support in order to use it in this PR.
[GitHub] spark pull request #21588: [WIP][SPARK-24590][BUILD] Make Jenkins tests pass...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21588#discussion_r196646070

--- Diff: project/SparkBuild.scala ---
@@ -464,7 +464,20 @@ object DockerIntegrationTests {
  */
 object DependencyOverrides {
   lazy val settings = Seq(
-    dependencyOverrides += "com.google.guava" % "guava" % "14.0.1")
+    dependencyOverrides += "com.google.guava" % "guava" % "14.0.1",
+    dependencyOverrides += "com.fasterxml.jackson.core" % "jackson-annotations" % "2.6.7",
+    dependencyOverrides += "com.fasterxml.jackson.core" % "jackson-core" % "2.6.7",
+    dependencyOverrides += "com.fasterxml.jackson.core" % "jackson-module-jaxb-annotations" % "2.6.7",
+    dependencyOverrides += "com.fasterxml.jackson.core" % "jackson-databind" % "2.6.7")
+}
+
+/**
+ * Exclusions to work around sbt's dependency resolution being different from Maven's.
+ */
+object ExcludeDependencies {
+  lazy val settings = Seq(
+    excludeDependencies += "com.fasterxml.jackson.jaxrs" % "jackson-jaxrs-json-provider",
+    excludeDependencies += "javax.ws.rs" % "jsr311-api")
--- End diff --

I think this should also have been excluded by Jersey. It seems to be a difference between Maven and SBT, if I am not mistaken.
[GitHub] spark pull request #21588: [WIP][SPARK-24590][BUILD] Make Jenkins tests pass...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21588#discussion_r196645986

--- Diff: project/SparkBuild.scala ---
@@ -464,7 +464,20 @@ object DockerIntegrationTests {
  */
 object DependencyOverrides {
   lazy val settings = Seq(
-    dependencyOverrides += "com.google.guava" % "guava" % "14.0.1")
+    dependencyOverrides += "com.google.guava" % "guava" % "14.0.1",
+    dependencyOverrides += "com.fasterxml.jackson.core" % "jackson-annotations" % "2.6.7",
+    dependencyOverrides += "com.fasterxml.jackson.core" % "jackson-core" % "2.6.7",
+    dependencyOverrides += "com.fasterxml.jackson.core" % "jackson-module-jaxb-annotations" % "2.6.7",
+    dependencyOverrides += "com.fasterxml.jackson.core" % "jackson-databind" % "2.6.7")
--- End diff --

These look like they come from `jackson-jaxrs-json-provider`, where the resolution seems to differ between Maven and SBT. I had to manually override and exclude them.
[GitHub] spark pull request #21588: [WIP][SPARK-24590][BUILD] Make Jenkins tests pass...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21588#discussion_r196645845

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HiveVersionSuite.scala ---
@@ -26,7 +27,6 @@ import org.apache.spark.sql.hive.HiveUtils
 private[client] abstract class HiveVersionSuite(version: String) extends SparkFunSuite {
   override protected val enableAutoThreadAudit = false
-  protected var client: HiveClient = null
--- End diff --

This was only used in `HiveClientSuite.scala`.
[GitHub] spark issue #21588: [WIP][SPARK-24590][BUILD] Make Jenkins tests passed with...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21588 Merged build finished. Test PASSed.
[GitHub] spark issue #21588: [WIP][SPARK-24590][BUILD] Make Jenkins tests passed with...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21588 Merged build finished. Test PASSed.
[GitHub] spark issue #21588: [WIP][SPARK-24590][BUILD] Make Jenkins tests passed with...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21588 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/341/ Test PASSed.
[GitHub] spark issue #21588: [WIP][SPARK-24590][BUILD] Make Jenkins tests passed with...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21588 **[Test build #92115 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92115/testReport)** for PR 21588 at commit [`aec2e71`](https://github.com/apache/spark/commit/aec2e710e7fbb349e7c59c466452ed7aab2e7ca8).
[GitHub] spark issue #21588: [WIP][SPARK-24590][BUILD] Make Jenkins tests passed with...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21588 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/4237/ Test PASSed.
[GitHub] spark issue #21542: [SPARK-24529][Build][test-maven] Add spotbugs into maven...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21542 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92112/ Test FAILed.
[GitHub] spark issue #21542: [SPARK-24529][Build][test-maven] Add spotbugs into maven...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21542 **[Test build #92112 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92112/testReport)** for PR 21542 at commit [`75c9339`](https://github.com/apache/spark/commit/75c9339c948239c4b77899169bd5ac0484e523a1). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21542: [SPARK-24529][Build][test-maven] Add spotbugs into maven...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21542 Merged build finished. Test FAILed.
[GitHub] spark issue #21594: [SPARK-24596][SQL] Non-cascading Cache Invalidation
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21594 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92108/ Test PASSed.
[GitHub] spark issue #21594: [SPARK-24596][SQL] Non-cascading Cache Invalidation
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21594 Merged build finished. Test PASSed.
[GitHub] spark issue #21594: [SPARK-24596][SQL] Non-cascading Cache Invalidation
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21594 **[Test build #92108 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92108/testReport)** for PR 21594 at commit [`4171062`](https://github.com/apache/spark/commit/417106260f329e4de9b0371084f688be435943ce). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21577 Merged build finished. Test FAILed.
[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21577 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92109/ Test FAILed.
[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21577 **[Test build #92109 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92109/testReport)** for PR 21577 at commit [`264c533`](https://github.com/apache/spark/commit/264c533737410786faae24df8cb5b27218f804cd). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #21593: [SPARK-24578][Core] Cap sub-region's size of retu...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/21593#discussion_r196637144

--- Diff: common/network-common/src/main/java/org/apache/spark/network/protocol/MessageWithHeader.java ---
@@ -137,30 +137,15 @@ protected void deallocate() {
   }

   private int copyByteBuf(ByteBuf buf, WritableByteChannel target) throws IOException {
-    ByteBuffer buffer = buf.nioBuffer();
-    int written = (buffer.remaining() <= NIO_BUFFER_LIMIT) ?
-      target.write(buffer) : writeNioBuffer(target, buffer);
+    // SPARK-24578: cap the sub-region's size of returned nio buffer to improve the performance
+    // for the case that the passed-in buffer has too many components.
+    int length = Math.min(buf.readableBytes(), NIO_BUFFER_LIMIT);
+    ByteBuffer buffer = buf.nioBuffer(buf.readerIndex(), length);
--- End diff --

I think you can go one step further here, and call `buf.nioBuffers(int, int)` (plural) https://github.com/netty/netty/blob/4.1/buffer/src/main/java/io/netty/buffer/ByteBuf.java#L2355 that will avoid the copying required to create the merged buffer (though it's a bit complicated, as you have to check for incomplete writes from any single `target.write()` call). Also OK to leave this for now, as this is a pretty important fix.
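A sketch of the plural-`nioBuffers` variant, including the incomplete-write check the comment calls out. This is an assumed shape for the suggestion, not the fix the PR itself makes (which caps a single `nioBuffer`):

```scala
import java.nio.channels.WritableByteChannel
import io.netty.buffer.ByteBuf

// Write the composite buffer's components directly, avoiding the copy that
// merging them into one nio buffer would require.
def copyByteBufComponents(buf: ByteBuf, target: WritableByteChannel, limit: Int): Int = {
  val length = math.min(buf.readableBytes(), limit)
  val buffers = buf.nioBuffers(buf.readerIndex(), length)
  var written = 0
  var i = 0
  var channelFull = false
  while (i < buffers.length && !channelFull) {
    written += target.write(buffers(i))
    // A partial write means the channel is full: stop and let the caller
    // retry later with the bytes still readable in `buf`.
    channelFull = buffers(i).hasRemaining
    i += 1
  }
  written
}
```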
[GitHub] spark pull request #21581: [SPARK-24574][SQL] array_contains, array_position...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21581#discussion_r196636915

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
@@ -3082,7 +3082,10 @@ object functions {
    * @since 1.5.0
    */
   def array_contains(column: Column, value: Any): Column = withExpr {
-    ArrayContains(column.expr, Literal(value))
+    value match {
--- End diff --

Yes, I think so.
[GitHub] spark pull request #21581: [SPARK-24574][SQL] array_contains, array_position...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21581#discussion_r196636085

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
@@ -3082,7 +3082,10 @@ object functions {
    * @since 1.5.0
    */
   def array_contains(column: Column, value: Any): Column = withExpr {
-    ArrayContains(column.expr, Literal(value))
+    value match {
--- End diff --

Yup, that's what I was thinking too from my glance.
[GitHub] spark issue #21593: [SPARK-24578][Core] Cap sub-region's size of returned ni...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21593 **[Test build #92114 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92114/testReport)** for PR 21593 at commit [`a30d4de`](https://github.com/apache/spark/commit/a30d4de019ac4380cf5bfd36ff0cf12ef72d78f7).
[GitHub] spark pull request #21575: [SPARK-24566][CORE] spark.storage.blockManagerSla...
Github user xueyumusic commented on a diff in the pull request: https://github.com/apache/spark/pull/21575#discussion_r196635402

--- Diff: core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala ---
@@ -75,16 +76,18 @@ private[spark] class HeartbeatReceiver(sc: SparkContext, clock: Clock)
   // "spark.network.timeout" uses "seconds", while `spark.storage.blockManagerSlaveTimeoutMs` uses
   // "milliseconds"
   private val slaveTimeoutMs =
-    sc.conf.getTimeAsMs("spark.storage.blockManagerSlaveTimeoutMs", "120s")
+    sc.conf.getTimeAsMs("spark.storage.blockManagerSlaveTimeoutMs",
--- End diff --

I looked at this carefully, and I think you are right, thanks @jiangxb1987. One case that is not relevant to this PR is this: set spark.storage.blockManagerSlaveTimeoutMs=900ms and do not configure spark.network.timeout; then `executorTimeoutMs` will be 0, since getTimeAsSeconds loses precision for milliseconds. Such a config may not be reasonable. If a fix is needed, how about ensuring the value is > 0, or making the minimum value of executorTimeoutMs 1? @jiangxb1987 @zsxwing
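The truncation case is easy to reproduce with `JavaUtils.timeStringAs`, which backs `getTimeAsSeconds`; the lower-bound guard at the end is the comment's proposal sketched out, not the merged fix.

```scala
import java.util.concurrent.TimeUnit
import org.apache.spark.network.util.JavaUtils

// "900ms" parsed at second granularity truncates to 0...
val slaveTimeoutSec = JavaUtils.timeStringAs("900ms", TimeUnit.SECONDS) // == 0
// ...so the derived executor timeout silently becomes 0 ms.
val executorTimeoutMs = slaveTimeoutSec * 1000 // == 0
// Proposed guard: never let the timeout collapse below a positive floor.
val safeExecutorTimeoutMs = math.max(executorTimeoutMs, 1L)
```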
[GitHub] spark issue #21593: [SPARK-24578][Core] Cap sub-region's size of returned ni...
Github user squito commented on the issue: https://github.com/apache/spark/pull/21593 Jenkins, ok to test
[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21577 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92106/ Test PASSed.
[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21577 Merged build finished. Test PASSed.
[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21577 **[Test build #92106 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92106/testReport)** for PR 21577 at commit [`5ece2f1`](https://github.com/apache/spark/commit/5ece2f12a820d6438146758f0e944f3b1c70d489). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #21590: [SPARK-24423][SQL] Add a new option for JDBC sour...
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/21590#discussion_r196634511

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala ---
@@ -1206,4 +1207,92 @@ class JDBCSuite extends SparkFunSuite
     }.getMessage
     assert(errMsg.contains("Statement was canceled or the session timed out"))
   }
+
+  test("query JDBC option - negative tests") {
+    val query = "SELECT * FROM test.people WHERE theid = 1"
+    // load path
+    val e1 = intercept[RuntimeException] {
+      val df = spark.read.format("jdbc")
+        .option("Url", urlWithUserAndPass)
+        .option("query", query)
--- End diff --

@HyukjinKwon Thanks, I will update the doc.
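A sketch of the complementary negative case, both options specified at once, in the same style as the diff above; the intercepted exception type and message fragment are assumptions.

```scala
// Both dbtable and query specified: the options reader should fail fast.
val e3 = intercept[IllegalArgumentException] {
  spark.read.format("jdbc")
    .option("url", urlWithUserAndPass)
    .option("dbtable", "test.people")
    .option("query", query)
    .load()
}.getMessage
assert(e3.contains("can not be specified"))
```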
[GitHub] spark issue #21594: [SPARK-24596][SQL] Non-cascading Cache Invalidation
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21594 Merged build finished. Test PASSed.
[GitHub] spark pull request #21581: [SPARK-24574][SQL] array_contains, array_position...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21581#discussion_r196633968

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala ---
@@ -571,6 +571,10 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSQLContext {
       df.selectExpr("array_contains(a, 1)"),
       Seq(Row(true), Row(false))
     )
+    checkAnswer(
+      df.select(array_contains(df("a"), df("c"))),
+      Seq(Row(true), Row(false))
+    )
--- End diff --

Can you add another test to use `selectExpr`, e.g., `df.selectExpr("array_contains(a, c)")`?
[GitHub] spark issue #21594: [SPARK-24596][SQL] Non-cascading Cache Invalidation
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21594 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92107/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21581: [SPARK-24574][SQL] array_contains, array_position...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21581#discussion_r196633908
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
@@ -3082,7 +3082,10 @@ object functions {
    * @since 1.5.0
    */
  def array_contains(column: Column, value: Any): Column = withExpr {
-    ArrayContains(column.expr, Literal(value))
+    value match {
--- End diff --
On second thought, should we use `lit()` instead, e.g., `ArrayContains(column.expr, lit(value).expr)`? WDYT? @viirya @maropu @HyukjinKwon
--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
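A sketch of what that suggestion amounts to, assuming `lit()` passes an existing `Column` through unchanged and wraps any other value in a `Literal`:

```scala
def array_contains(column: Column, value: Any): Column = withExpr {
  // lit(value) yields `value` itself when it is already a Column and a
  // Literal-backed Column otherwise, so the explicit match in the diff
  // above collapses to a single expression.
  ArrayContains(column.expr, lit(value).expr)
}
```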
[GitHub] spark issue #21594: [SPARK-24596][SQL] Non-cascading Cache Invalidation
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21594 **[Test build #92107 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92107/testReport)** for PR 21594 at commit [`b9f1507`](https://github.com/apache/spark/commit/b9f1507c737c1edb87df9b03bee61218ada42307).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21061: [SPARK-23914][SQL] Add array_union function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21061 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21061: [SPARK-23914][SQL] Add array_union function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21061 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/4236/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21542: [SPARK-24529][Build][test-maven] Add spotbugs into maven...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21542 **[Test build #92112 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92112/testReport)** for PR 21542 at commit [`75c9339`](https://github.com/apache/spark/commit/75c9339c948239c4b77899169bd5ac0484e523a1). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21542: [SPARK-24529][Build][test-maven] Add spotbugs into maven...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21542 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21542: [SPARK-24529][Build][test-maven] Add spotbugs into maven...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21542 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/4235/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21061: [SPARK-23914][SQL] Add array_union function
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21061 **[Test build #92113 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92113/testReport)** for PR 21061 at commit [`f64e529`](https://github.com/apache/spark/commit/f64e5292ccc9c709ea56614bf70b1fbb83099625). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21061: [SPARK-23914][SQL] Add array_union function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21061 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21061: [SPARK-23914][SQL] Add array_union function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21061 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/340/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21542: [SPARK-24529][Build][test-maven] Add spotbugs into maven...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21542 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21542: [SPARK-24529][Build][test-maven] Add spotbugs into maven...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21542 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/339/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21567: [SPARK-24560][CORE][MESOS] Fix some getTimeAsMs as getTi...
Github user xueyumusic commented on the issue: https://github.com/apache/spark/pull/21567 I see. Thanks for your review and guidance, @jiangxb1987 @maropu. I will try to add the related config to the docs and close this PR. Thank you. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21495: [SPARK-24418][Build] Upgrade Scala to 2.11.12 and 2.12.6
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/21495 Is there any work left, or is everything already done? @dbtsai --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21594: [SPARK-24596][SQL] Non-cascading Cache Invalidation
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21594 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21594: [SPARK-24596][SQL] Non-cascading Cache Invalidation
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21594 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92105/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21555: [SPARK-24547][K8S] Allow for building spark on k8s docke...
Github user ifilonenko commented on the issue: https://github.com/apache/spark/pull/21555 @mccheah Good note. Since this is a blocker for other PRs, it is probably best to refactor `docker-image-tool.sh` in a separate PR, as that is not the focus of this PR. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21594: [SPARK-24596][SQL] Non-cascading Cache Invalidation
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21594 **[Test build #92105 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92105/testReport)** for PR 21594 at commit [`71b93ed`](https://github.com/apache/spark/commit/71b93ed598833d760955e972894685c089af297b).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21595: [MINOR][SQL] Remove invalid comment from SparkStrategies
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21595 **[Test build #92111 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92111/testReport)** for PR 21595 at commit [`8afb36b`](https://github.com/apache/spark/commit/8afb36b20aab1bbd1f6a5cf902aef7e0c04c8353). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21555: [SPARK-24547][K8S] Allow for building spark on k8s docke...
Github user mccheah commented on the issue: https://github.com/apache/spark/pull/21555 This change makes it such that using the tool forces building and pushing both the Python and non-Python images, but what if the user wants to build only one of them to save time? I can imagine that being the case in something like a dev-CI workflow. Ideally the tool would allow one to be selective in managing either image or both. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
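One hypothetical shape for that selectivity (the `-o` flag below is invented purely for illustration and is not part of the actual `docker-image-tool.sh` interface; `-r` and `-t` mirror the tool's existing repo and tag options):

```bash
# Hypothetical usage sketch only; `-o` does not exist in the real tool.
./bin/docker-image-tool.sh -r docker.io/myrepo -t dev -o jvm build     # JVM image only
./bin/docker-image-tool.sh -r docker.io/myrepo -t dev -o python build  # Python image only
./bin/docker-image-tool.sh -r docker.io/myrepo -t dev build            # both (current behavior)
```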
[GitHub] spark issue #21595: [MINOR][SQL] Remove invalid comment from SparkStrategies
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21595 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21388: [SPARK-24336][SQL] Support 'pass through' transformation...
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/21388 I just provided a new patch to remove the comment, as it looks like it is no longer the preferred option. https://github.com/apache/spark/pull/21595 Closing this one. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21388: [SPARK-24336][SQL] Support 'pass through' transfo...
Github user HeartSaVioR closed the pull request at: https://github.com/apache/spark/pull/21388 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org