[GitHub] spark pull request #21236: [SPARK-23935][SQL] Adding map_entries function

2018-05-07 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21236#discussion_r186371562
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -118,6 +118,162 @@ case class MapValues(child: Expression)
   override def prettyName: String = "map_values"
 }
 
+/**
+ * Returns an unordered array of all entries in the given map.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(map) - Returns an unordered array of all entries in the given map.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(map(1, 'a', 2, 'b'));
+       [(1,"a"),(2,"b")]
+  """,
+  since = "2.4.0")
+case class MapEntries(child: Expression) extends UnaryExpression with ExpectsInputTypes {
+
+  override def inputTypes: Seq[AbstractDataType] = Seq(MapType)
+
+  lazy val childDataType: MapType = child.dataType.asInstanceOf[MapType]
+
+  override def dataType: DataType = {
+    ArrayType(
+      StructType(
+        StructField("key", childDataType.keyType, false) ::
+        StructField("value", childDataType.valueType, childDataType.valueContainsNull) ::
+        Nil),
+      false)
+  }
+
+  override protected def nullSafeEval(input: Any): Any = {
+    val childMap = input.asInstanceOf[MapData]
+    val keys = childMap.keyArray()
+    val values = childMap.valueArray()
+    val length = childMap.numElements()
+    val resultData = new Array[AnyRef](length)
+    var i = 0
+    while (i < length) {
+      val key = keys.get(i, childDataType.keyType)
+      val value = values.get(i, childDataType.valueType)
+      val row = new GenericInternalRow(Array[Any](key, value))
+      resultData.update(i, row)
+      i += 1
+    }
+    new GenericArrayData(resultData)
+  }
+
+  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+    nullSafeCodeGen(ctx, ev, c => {
+      val numElements = ctx.freshName("numElements")
+      val keys = ctx.freshName("keys")
+      val values = ctx.freshName("values")
+      val isKeyPrimitive = CodeGenerator.isPrimitiveType(childDataType.keyType)
+      val isValuePrimitive = CodeGenerator.isPrimitiveType(childDataType.valueType)
+      val code = if (isKeyPrimitive && isValuePrimitive) {
+        genCodeForPrimitiveElements(ctx, keys, values, ev.value, numElements)
+      } else {
+        genCodeForAnyElements(ctx, keys, values, ev.value, numElements)
+      }
+      s"""
+         |final int $numElements = $c.numElements();
+         |final ArrayData $keys = $c.keyArray();
+         |final ArrayData $values = $c.valueArray();
+         |$code
+       """.stripMargin
+    })
+  }
+
+  private def getKey(varName: String) = CodeGenerator.getValue(varName, childDataType.keyType, "z")
+
+  private def getValue(varName: String) = {
+    CodeGenerator.getValue(varName, childDataType.valueType, "z")
+  }
+
+  private def genCodeForPrimitiveElements(
+      ctx: CodegenContext,
+      keys: String,
+      values: String,
+      arrayData: String,
+      numElements: String): String = {
+    val byteArraySize = ctx.freshName("byteArraySize")
+    val data = ctx.freshName("byteArray")
+    val unsafeRow = ctx.freshName("unsafeRow")
+    val structSize = ctx.freshName("structSize")
+    val unsafeArrayData = ctx.freshName("unsafeArrayData")
+    val structsOffset = ctx.freshName("structsOffset")
+    val calculateArraySize = "UnsafeArrayData.calculateSizeOfUnderlyingByteArray"
+    val calculateHeader = "UnsafeArrayData.calculateHeaderPortionInBytes"
+
+    val baseOffset = Platform.BYTE_ARRAY_OFFSET
+    val longSize = LongType.defaultSize
+    val keyTypeName = CodeGenerator.primitiveTypeName(childDataType.keyType)
+    val valueTypeName = CodeGenerator.primitiveTypeName(childDataType.valueType)
+
+    val valueAssignment = s"$unsafeRow.set$valueTypeName(1, ${getValue(values)});"
+    val valueAssignmentChecked = if (childDataType.valueContainsNull) {
+      s"""
+         |if ($values.isNullAt(z)) {
+         |  $unsafeRow.setNullAt(1);
+         |} else {
+         |  $valueAssignment
+         |}
+       """.stripMargin
+    } else {
+      valueAssignment
+    }
+
+    s"""
+       |final int $structSize = ${UnsafeRow.calculateBitSetWidthInBytes(2) + longSize * 2};
--- End diff --

We can calculate `structSize` beforehand and inline it?
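The arithmetic behind this suggestion can be sketched outside of Spark. This is a hypothetical helper, not Spark code; it assumes the usual UnsafeRow layout of a null bitset rounded up to 8-byte words followed by one 8-byte slot per field, which is why the struct size for a fixed 2-field (key, value) row is a compile-time constant that could be inlined.

```java
// Sketch of why the 2-field struct size is a constant (hypothetical class,
// mirroring the assumed UnsafeRow layout, not Spark API).
public class StructSizeSketch {
    // One null bit per field, rounded up to whole 64-bit words.
    static int bitSetWidthInBytes(int numFields) {
        return ((numFields + 63) / 64) * 8;
    }

    // Null bitset plus one 8-byte slot per field.
    static int structSize(int numFields) {
        return bitSetWidthInBytes(numFields) + 8 * numFields;
    }

    public static void main(String[] args) {
        // (key, value) pair: 8-byte bitset + 2 * 8 data bytes = 24.
        System.out.println(structSize(2));  // prints 24
    }
}
```

Since both inputs to the expression are constants, the whole value could be folded to `24` at code-generation time.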


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: 

[GitHub] spark issue #21028: [SPARK-23922][SQL] Add arrays_overlap function

2018-05-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21028
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90308/
Test FAILed.


---




[GitHub] spark issue #21028: [SPARK-23922][SQL] Add arrays_overlap function

2018-05-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21028
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21028: [SPARK-23922][SQL] Add arrays_overlap function

2018-05-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21028
  
**[Test build #90308 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90308/testReport)** for PR 21028 at commit [`4e37975`](https://github.com/apache/spark/commit/4e37975ba3ce361009a83d248ad7d0b758f86f4c).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21254: [SPARK-23094][SPARK-23723][SPARK-23724][SQL][FOLLOW-UP] ...

2018-05-07 Thread MaxGekk
Github user MaxGekk commented on the issue:

https://github.com/apache/spark/pull/21254
  
> Do we have any behavior change after the previous PR: #20937?

The PR brought the `encoding` (and `charset`) option, but we didn't change behavior when `encoding` is not specified.

As @HyukjinKwon wrote above, PR #21247 eliminates the restrictions on write, but those restrictions don't break previous behavior (before #20937) in any case.


---




[GitHub] spark pull request #14083: [SPARK-16406][SQL] Improve performance of Logical...

2018-05-07 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/14083


---




[GitHub] spark issue #14083: [SPARK-16406][SQL] Improve performance of LogicalPlan.re...

2018-05-07 Thread hvanhovell
Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/14083
  
Merging to master. Thanks for all the reviews!


---




[GitHub] spark issue #21250: [SPARK-23291][SQL][R][BRANCH-2.3] R's substr should not ...

2018-05-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21250
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21250: [SPARK-23291][SQL][R][BRANCH-2.3] R's substr should not ...

2018-05-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21250
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90310/
Test PASSed.


---




[GitHub] spark issue #21250: [SPARK-23291][SQL][R][BRANCH-2.3] R's substr should not ...

2018-05-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21250
  
**[Test build #90310 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90310/testReport)** for PR 21250 at commit [`dd6c329`](https://github.com/apache/spark/commit/dd6c329733924a4fe625473593c7a87b90f2280e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #14083: [SPARK-16406][SQL] Improve performance of LogicalPlan.re...

2018-05-07 Thread hvanhovell
Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/14083
  
Done.


---




[GitHub] spark pull request #21255: [SPARK-24186][SparR][SQL]change reverse and conca...

2018-05-07 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21255#discussion_r186368498
  
--- Diff: R/pkg/tests/fulltests/test_sparkSQL.R ---
@@ -1502,12 +1502,21 @@ test_that("column functions", {
   result <- collect(select(df, sort_array(df[[1]])))[[1]]
   expect_equal(result, list(list(1L, 2L, 3L), list(4L, 5L, 6L)))
 
-  # Test flattern
+  result <- collect(select(df, reverse(df[[1]])))[[1]]
--- End diff --

Seems we don't have a test for `reverse` on strings. Can you add one for it too?


---




[GitHub] spark pull request #21255: [SPARK-24186][SparR][SQL]change reverse and conca...

2018-05-07 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21255#discussion_r186369103
  
--- Diff: R/pkg/R/functions.R ---
@@ -209,6 +209,7 @@ NULL
 #' head(select(tmp, array_max(tmp$v1), array_min(tmp$v1)))
 #' head(select(tmp, array_position(tmp$v1, 21)))
 #' head(select(tmp, flatten(tmp$v1)))
+#' head(select(tmp, reverse(tmp$v1)))
--- End diff --

Also add `concat` here?


---




[GitHub] spark pull request #21249: [SPARK-23291][R][FOLLOWUP] Update SparkR migratio...

2018-05-07 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21249#discussion_r186365513
  
--- Diff: docs/sparkr.md ---
@@ -664,6 +664,6 @@ You can inspect the search path in R with 
[`search()`](https://stat.ethz.ch/R-ma
  - For `summary`, option for statistics to compute has been added. Its 
output is changed from that from `describe`.
  - A warning can be raised if versions of SparkR package and the Spark JVM 
do not match.
 
-## Upgrading to Spark 2.4.0
+## Upgrading to SparkR 2.3.1 and above
 
- - The `start` parameter of `substr` method was wrongly subtracted by one, 
previously. In other words, the index specified by `start` parameter was 
considered as 0-base. This can lead to inconsistent substring results and also 
does not match with the behaviour with `substr` in R. It has been fixed so the 
`start` parameter of `substr` method is now 1-base, e.g., therefore to get the 
same result as `substr(df$a, 2, 5)`, it should be changed to `substr(df$a, 1, 
4)`.
+ - In SparkR 2.3.0 and earlier, the `start` parameter of `substr` method 
was wrongly subtracted by one, previously. In other words, the index specified 
by `start` parameter was considered as 0-base. This can lead to inconsistent 
substring results and also does not match with the behaviour with `substr` in 
R. In version 2.3.1 and later, it has been fixed so the `start` parameter of 
`substr` method is now 1-base. As an example, `substr(lit('abcdef'), 2, 4))` 
would result to `abc` in SparkR 2.3.0, and the result would be `bcd` in SparkR 
2.3.1.
--- End diff --

ok. :)


---




[GitHub] spark pull request #21249: [SPARK-23291][R][FOLLOWUP] Update SparkR migratio...

2018-05-07 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21249#discussion_r186364537
  
--- Diff: docs/sparkr.md ---
@@ -664,6 +664,6 @@ You can inspect the search path in R with 
[`search()`](https://stat.ethz.ch/R-ma
  - For `summary`, option for statistics to compute has been added. Its 
output is changed from that from `describe`.
  - A warning can be raised if versions of SparkR package and the Spark JVM 
do not match.
 
-## Upgrading to Spark 2.4.0
+## Upgrading to SparkR 2.3.1 and above
 
- - The `start` parameter of `substr` method was wrongly subtracted by one, 
previously. In other words, the index specified by `start` parameter was 
considered as 0-base. This can lead to inconsistent substring results and also 
does not match with the behaviour with `substr` in R. It has been fixed so the 
`start` parameter of `substr` method is now 1-base, e.g., therefore to get the 
same result as `substr(df$a, 2, 5)`, it should be changed to `substr(df$a, 1, 
4)`.
+ - In SparkR 2.3.0 and earlier, the `start` parameter of `substr` method 
was wrongly subtracted by one, previously. In other words, the index specified 
by `start` parameter was considered as 0-base. This can lead to inconsistent 
substring results and also does not match with the behaviour with `substr` in 
R. In version 2.3.1 and later, it has been fixed so the `start` parameter of 
`substr` method is now 1-base. As an example, `substr(lit('abcdef'), 2, 4))` 
would result to `abc` in SparkR 2.3.0, and the result would be `bcd` in SparkR 
2.3.1.
--- End diff --

I think it's fine since it's an example ... 


---




[GitHub] spark issue #21249: [SPARK-23291][R][FOLLOWUP] Update SparkR migration note ...

2018-05-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21249
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21249: [SPARK-23291][R][FOLLOWUP] Update SparkR migration note ...

2018-05-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21249
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90309/
Test PASSed.


---




[GitHub] spark issue #21249: [SPARK-23291][R][FOLLOWUP] Update SparkR migration note ...

2018-05-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21249
  
**[Test build #90309 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90309/testReport)** for PR 21249 at commit [`6c4743a`](https://github.com/apache/spark/commit/6c4743a8f33138431c2f3ce3ddd9f2512d72bc66).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21250: [SPARK-23291][SQL][R][BRANCH-2.3] R's substr should not ...

2018-05-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21250
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21250: [SPARK-23291][SQL][R][BRANCH-2.3] R's substr should not ...

2018-05-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21250
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2993/
Test PASSed.


---




[GitHub] spark pull request #21249: [SPARK-23291][R][FOLLOWUP] Update SparkR migratio...

2018-05-07 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21249#discussion_r186361818
  
--- Diff: docs/sparkr.md ---
@@ -664,6 +664,6 @@ You can inspect the search path in R with 
[`search()`](https://stat.ethz.ch/R-ma
  - For `summary`, option for statistics to compute has been added. Its 
output is changed from that from `describe`.
  - A warning can be raised if versions of SparkR package and the Spark JVM 
do not match.
 
-## Upgrading to Spark 2.4.0
+## Upgrading to SparkR 2.3.1 and above
 
- - The `start` parameter of `substr` method was wrongly subtracted by one, 
previously. In other words, the index specified by `start` parameter was 
considered as 0-base. This can lead to inconsistent substring results and also 
does not match with the behaviour with `substr` in R. It has been fixed so the 
`start` parameter of `substr` method is now 1-base, e.g., therefore to get the 
same result as `substr(df$a, 2, 5)`, it should be changed to `substr(df$a, 1, 
4)`.
+ - In SparkR 2.3.0 and earlier, the `start` parameter of `substr` method 
was wrongly subtracted by one, previously. In other words, the index specified 
by `start` parameter was considered as 0-base. This can lead to inconsistent 
substring results and also does not match with the behaviour with `substr` in 
R. In version 2.3.1 and later, it has been fixed so the `start` parameter of 
`substr` method is now 1-base. As an example, `substr(lit('abcdef'), 2, 4))` 
would result to `abc` in SparkR 2.3.0, and the result would be `bcd` in SparkR 
2.3.1.
--- End diff --

nit: ```the result would be `bcd` in SparkR 2.3.1 and above.```
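The 0-base vs 1-base semantics being documented here can be illustrated with a small hypothetical helper (`SubstrExample` is not SparkR or Spark code; it just maps R's 1-based, inclusive `substr` onto Java's 0-based `String.substring`):

```java
// Hypothetical illustration of the migration-note example.
public class SubstrExample {
    // 1-based, inclusive start/stop, matching R's substr and SparkR 2.3.1+.
    static String substr(String s, int start, int stop) {
        return s.substring(start - 1, stop);
    }

    public static void main(String[] args) {
        // SparkR 2.3.1+ (and R itself): substr("abcdef", 2, 4) == "bcd".
        // SparkR 2.3.0 wrongly returned "abc", because `start` was shifted
        // one position too far to the left (treated as 0-based).
        System.out.println(substr("abcdef", 2, 4));  // prints "bcd"
    }
}
```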


---




[GitHub] spark pull request #21193: [SPARK-24121][SQL] Add API for handling expressio...

2018-05-07 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/21193#discussion_r186361480
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/javaCode.scala ---
@@ -112,6 +112,112 @@ object JavaCode {
   def isNullExpression(code: String): SimpleExprValue = {
     expression(code, BooleanType)
   }
+
+  def block(code: String): Block = {
+    CodeBlock(codeParts = Seq(code), blockInputs = Seq.empty)
+  }
+}
+
+/**
+ * A trait representing a block of java code.
+ */
+trait Block extends JavaCode {
+
+  // The expressions to be evaluated inside this block.
+  def exprValues: Seq[ExprValue]
+
+  // This will be called during string interpolation.
+  override def toString: String = _marginChar match {
+    case Some(c) => code.stripMargin(c)
+    case _ => code
+  }
+
+  var _marginChar: Option[Char] = None
+
+  def stripMargin(c: Char): this.type = {
+    _marginChar = Some(c)
+    this
+  }
+
+  def stripMargin: this.type = {
+    _marginChar = Some('|')
+    this
+  }
+
+  def + (other: Block): Block
+}
+
+object Block {
+  implicit def blockToString(block: Block): String = block.toString
+
+  implicit def blocksToBlock(blocks: Seq[Block]): Block = Blocks(blocks)
+
+  implicit class BlockHelper(val sc: StringContext) extends AnyVal {
+    def code(args: Any*): Block = {
+      sc.checkLengths(args)
+      if (sc.parts.length == 0) {
+        EmptyBlock
+      } else {
+        args.foreach {
+          case _: ExprValue =>
+          case _: Int | _: Long | _: Float | _: Double | _: String =>
+          case _: Block =>
+          case other => throw new IllegalArgumentException(
+            s"Can not interpolate ${other.getClass.getName} into code block.")
--- End diff --

+10
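For readers outside the codegen internals, the pattern being endorsed can be sketched in plain Java (hypothetical `CodeBlockSketch` class, not Spark API): only a whitelist of value types may be spliced into a code block, and anything else fails fast with `IllegalArgumentException` instead of silently producing broken generated code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the interpolator's argument check.
public class CodeBlockSketch {
    private final List<String> parts = new ArrayList<>();

    // Accept only the kinds of values the interpolator allows: numeric
    // literals, strings, and nested blocks; reject everything else.
    CodeBlockSketch add(Object arg) {
        if (arg instanceof Integer || arg instanceof Long
                || arg instanceof Float || arg instanceof Double
                || arg instanceof String || arg instanceof CodeBlockSketch) {
            parts.add(String.valueOf(arg));
            return this;
        }
        throw new IllegalArgumentException(
            "Can not interpolate " + arg.getClass().getName() + " into code block.");
    }

    @Override
    public String toString() {
        return String.join("", parts);
    }

    public static void main(String[] args) {
        CodeBlockSketch b = new CodeBlockSketch();
        b.add("int x = ").add(42).add(";");
        System.out.println(b);  // prints "int x = 42;"
    }
}
```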


---




[GitHub] spark issue #21250: [SPARK-23291][SQL][R][BRANCH-2.3] R's substr should not ...

2018-05-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21250
  
**[Test build #90310 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90310/testReport)** for PR 21250 at commit [`dd6c329`](https://github.com/apache/spark/commit/dd6c329733924a4fe625473593c7a87b90f2280e).


---




[GitHub] spark issue #21249: [SPARK-23291][R][FOLLOWUP] Update SparkR migration note ...

2018-05-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21249
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21249: [SPARK-23291][R][FOLLOWUP] Update SparkR migration note ...

2018-05-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21249
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2992/
Test PASSed.


---




[GitHub] spark issue #21249: [SPARK-23291][R][FOLLOWUP] Update SparkR migration note ...

2018-05-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21249
  
**[Test build #90309 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90309/testReport)** for PR 21249 at commit [`6c4743a`](https://github.com/apache/spark/commit/6c4743a8f33138431c2f3ce3ddd9f2512d72bc66).


---




[GitHub] spark pull request #21193: [SPARK-24121][SQL] Add API for handling expressio...

2018-05-07 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21193#discussion_r186359947
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/codegen/ExprValueSuite.scala ---
@@ -17,6 +17,8 @@
 
 package org.apache.spark.sql.catalyst.expressions.codegen
 
+import scala.collection.mutable
--- End diff --

my bad. forgot to remove.


---




[GitHub] spark issue #21255: [SPARK-24186][SparR][SQL]change reverse and concat to co...

2018-05-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21255
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21255: [SPARK-24186][SparR][SQL]change reverse and concat to co...

2018-05-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21255
  
**[Test build #90304 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90304/testReport)** for PR 21255 at commit [`3985285`](https://github.com/apache/spark/commit/3985285089673e42a85a5d1ba3cd7419a6948909).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21255: [SPARK-24186][SparR][SQL]change reverse and concat to co...

2018-05-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21255
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90304/
Test PASSed.


---




[GitHub] spark pull request #21185: [SPARK-23894][CORE][SQL] Defensively clear Active...

2018-05-07 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21185#discussion_r186358733
  
--- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---
@@ -229,6 +229,23 @@ private[spark] class Executor(
 
ManagementFactory.getGarbageCollectorMXBeans.asScala.map(_.getCollectionTime).sum
   }
 
+  /**
+   * Only in local mode, we have to prevent the driver from setting the active SparkSession
+   * in the executor threads. See SPARK-23894.
+   */
+  private lazy val clearActiveSparkSessionMethod = if (Utils.isLocalMaster(conf)) {
--- End diff --

I've added this check in https://github.com/apache/spark/pull/21190


---




[GitHub] spark pull request #21070: [SPARK-23972][BUILD][SQL] Update Parquet to 1.10....

2018-05-07 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21070#discussion_r186358096
  
--- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedPlainValuesReader.java ---
@@ -63,115 +59,157 @@ public final void readBooleans(int total, WritableColumnVector c, int rowId) {
     }
   }

+  private ByteBuffer getBuffer(int length) {
+    try {
+      return in.slice(length).order(ByteOrder.LITTLE_ENDIAN);
--- End diff --

Previously we only called `.order(ByteOrder.LITTLE_ENDIAN)` on big-endian platforms. Is it OK to always call it?
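It should be: `ByteBuffer.order` is a per-buffer decoding setting, independent of the machine's native byte order, and every `ByteBuffer` defaults to big-endian regardless of platform. A small self-contained illustration:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ByteOrderExample {
    public static void main(String[] args) {
        // Four bytes of the little-endian encoding of the int 1.
        ByteBuffer buf = ByteBuffer.wrap(new byte[] {1, 0, 0, 0});

        // ByteBuffers default to BIG_ENDIAN on every platform, so without
        // setting the order we would decode 0x01000000.
        System.out.println(buf.getInt(0));  // prints 16777216

        // order(LITTLE_ENDIAN) only changes how this buffer decodes
        // multi-byte values; calling it unconditionally is safe.
        buf.order(ByteOrder.LITTLE_ENDIAN);
        System.out.println(buf.getInt(0));  // prints 1
    }
}
```

So calling it unconditionally just makes explicit what the old big-endian-only branch achieved implicitly on little-endian machines.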


---




[GitHub] spark pull request #21208: [SPARK-23925][SQL] Add array_repeat collection fu...

2018-05-07 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21208#discussion_r186357798
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -1229,3 +1229,140 @@ case class Flatten(child: Expression) extends UnaryExpression {

   override def prettyName: String = "flatten"
 }
+
+/**
+ * Returns the array containing the given input value (left) count (right) times.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(element, count) - Returns the array containing element count times.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_('123', 2);
+       ['123', '123']
+  """)
+case class ArrayRepeat(left: Expression, right: Expression)
+  extends BinaryExpression with ExpectsInputTypes {
+
+  override def dataType: ArrayType = ArrayType(left.dataType, left.nullable)
+
+  override def inputTypes: Seq[AbstractDataType] = Seq(AnyDataType, IntegerType)
+
+  override def nullable: Boolean = right.nullable
+
+  override def eval(input: InternalRow): Any = {
+    val count = right.eval(input)
+    if (count == null) {
+      null
+    } else {
+      new GenericArrayData(List.fill(count.asInstanceOf[Int])(left.eval(input)))
+    }
+  }
+
+  override def prettyName: String = "array_repeat"
+
+  override def nullSafeCodeGen(ctx: CodegenContext,
+                               ev: ExprCode,
+                               f: (String, String) => String): ExprCode = {
+    val leftGen = left.genCode(ctx)
+    val rightGen = right.genCode(ctx)
+    val resultCode = f(leftGen.value, rightGen.value)
+
+    if (nullable) {
+      val nullSafeEval =
+        leftGen.code +
+          rightGen.code + ctx.nullSafeExec(right.nullable, rightGen.isNull) {
+            s"""
+              ${ev.isNull} = false;
+              $resultCode
+            """
+          }
+
+      ev.copy(code =
+        s"""
+           | boolean ${ev.isNull} = true;
+           | ${CodeGenerator.javaType(dataType)} ${ev.value} =
+           |   ${CodeGenerator.defaultValue(dataType)};
+           | $nullSafeEval
+         """.stripMargin
+      )
+    } else {
+      ev.copy(code =
+        s"""
+           | boolean ${ev.isNull} = false;
+           | ${leftGen.code}
+           | ${rightGen.code}
+           | ${CodeGenerator.javaType(dataType)} ${ev.value} =
+           |   ${CodeGenerator.defaultValue(dataType)};
+           | $resultCode
+         """.stripMargin
+      , isNull = FalseLiteral)
+    }
+
+  }
+
+  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+
+    nullSafeCodeGen(ctx, ev, (l, r) => {
+      val et = dataType.elementType
+      val isPrimitive = CodeGenerator.isPrimitiveType(et)
+
+      val arrayDataName = ctx.freshName("arrayData")
+      val arrayName = ctx.freshName("arrayObject")
+      val numElements = ctx.freshName("numElements")
+
+      val genNumElements =
+        s"""
+           | int $numElements = 0;
+           | if ($r > 0) {
+           |   $numElements = $r;
+           | }
+         """.stripMargin
+
+      val initialization = if (isPrimitive) {
+        val arrayName = ctx.freshName("array")
+        val baseOffset = Platform.BYTE_ARRAY_OFFSET
+        s"""
+           | int numBytes = ${et.defaultSize} * $numElements;
+           | int unsafeArraySizeInBytes =
+           |   UnsafeArrayData.calculateHeaderPortionInBytes($numElements)
+           | + org.apache.spark.unsafe.array.ByteArrayMethods
+           |   .roundNumberOfBytesToNearestWord(numBytes);
+           | byte[] $arrayName = new byte[unsafeArraySizeInBytes];

Maybe we can use `ctx.createUnsafeArray()` now?
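Whatever helper ends up generating it, the size formula in the quoted diff reduces to simple arithmetic. The sketch below is hypothetical (it mirrors the assumed `UnsafeArrayData` layout of an 8-byte element count, a null-tracking bitset rounded to 8-byte words, and word-aligned element data; it is not Spark API):

```java
// Hypothetical sketch of the unsafe array size calculation.
public class UnsafeArraySizeSketch {
    // Header: 8-byte element count plus a null bitset rounded up to
    // whole 8-byte words.
    static int headerInBytes(int numElements) {
        return 8 + ((numElements + 63) / 64) * 8;
    }

    // Round the data region up to the nearest 8-byte word.
    static int roundToWord(int numBytes) {
        return ((numBytes + 7) / 8) * 8;
    }

    static int sizeInBytes(int numElements, int elementSize) {
        return headerInBytes(numElements) + roundToWord(numElements * elementSize);
    }

    public static void main(String[] args) {
        // Two 4-byte ints: 16-byte header + 8 word-aligned data bytes.
        System.out.println(sizeInBytes(2, 4));  // prints 24
    }
}
```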


---




[GitHub] spark pull request #21070: [SPARK-23972][BUILD][SQL] Update Parquet to 1.10....

2018-05-07 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21070#discussion_r186357714
  
--- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedPlainValuesReader.java ---
@@ -63,115 +59,157 @@ public final void readBooleans(int total, WritableColumnVector c, int rowId) {
     }
   }

+  private ByteBuffer getBuffer(int length) {
+    try {
+      return in.slice(length).order(ByteOrder.LITTLE_ENDIAN);
+    } catch (IOException e) {
+      throw new ParquetDecodingException("Failed to read " + length + " bytes", e);
+    }
+  }
+
   @Override
   public final void readIntegers(int total, WritableColumnVector c, int rowId) {
-    c.putIntsLittleEndian(rowId, total, buffer, offset - Platform.BYTE_ARRAY_OFFSET);
-    offset += 4 * total;
+    int requiredBytes = total * 4;
+    ByteBuffer buffer = getBuffer(requiredBytes);
+
+    if (buffer.hasArray()) {
--- End diff --

shall we assert `buffer.hasArray()` is always true?
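One reason not to assert: `hasArray()` is not guaranteed for every `ByteBuffer`. Direct (off-heap) buffers have no backing array, and even a heap buffer loses array access once it is made read-only, so the non-array fallback branch is not necessarily dead code. A self-contained illustration:

```java
import java.nio.ByteBuffer;

public class HasArrayExample {
    public static void main(String[] args) {
        // Heap buffers are backed by an accessible byte[].
        ByteBuffer heap = ByteBuffer.allocate(16);
        System.out.println(heap.hasArray());                    // prints true

        // Direct buffers live outside the Java heap: no backing array.
        ByteBuffer direct = ByteBuffer.allocateDirect(16);
        System.out.println(direct.hasArray());                  // prints false

        // A read-only view of a heap buffer also reports no array.
        System.out.println(heap.asReadOnlyBuffer().hasArray()); // prints false
    }
}
```

Whether Parquet can ever hand this reader a direct or read-only buffer is the real question behind the assert.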


---




[GitHub] spark pull request #21070: [SPARK-23972][BUILD][SQL] Update Parquet to 1.10....

2018-05-07 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21070#discussion_r186357371
  
--- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedPlainValuesReader.java ---
@@ -63,115 +59,157 @@ public final void readBooleans(int total, WritableColumnVector c, int rowId) {
     }
   }

+  private ByteBuffer getBuffer(int length) {
+    try {
+      return in.slice(length).order(ByteOrder.LITTLE_ENDIAN);
--- End diff --

does `in.slice(length)` do copy?
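For reference, `java.nio.ByteBuffer.slice()` itself does not copy bytes — it returns a view over the same backing storage. Whether Parquet's stream-level `slice(length)` copies depends on its implementation; this sketch only demonstrates the NIO contract:

```java
import java.nio.ByteBuffer;

public class SliceDemo {
    public static void main(String[] args) {
        ByteBuffer original = ByteBuffer.allocate(8);
        ByteBuffer view = original.slice();  // view shares the backing storage
        view.put(0, (byte) 42);              // write through the view...
        System.out.println(original.get(0)); // ...is visible in the original: 42
    }
}
```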


---




[GitHub] spark pull request #21208: [SPARK-23925][SQL] Add array_repeat collection fu...

2018-05-07 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21208#discussion_r186356981
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -1229,3 +1229,140 @@ case class Flatten(child: Expression) extends 
UnaryExpression {
 
   override def prettyName: String = "flatten"
 }
+
+/**
+ * Returns the array containing the given input value (left) count (right) 
times.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(element, count) - Returns the array containing element 
count times.",
+  examples = """
+Examples:
+  > SELECT _FUNC_('123', 2);
+   ['123', '123']
+  """)
+case class ArrayRepeat(left: Expression, right: Expression)
+  extends BinaryExpression with ExpectsInputTypes {
+
+  override def dataType: ArrayType = ArrayType(left.dataType, 
left.nullable)
+
+  override def inputTypes: Seq[AbstractDataType] = Seq(AnyDataType, 
IntegerType)
+
+  override def nullable: Boolean = right.nullable
+
+  override def eval(input: InternalRow): Any = {
+val count = right.eval(input)
+if (count == null) {
+  null
+} else {
+  new 
GenericArrayData(List.fill(count.asInstanceOf[Int])(left.eval(input)))
+}
+  }
+
+  override def prettyName: String = "array_repeat"
+
+  override def nullSafeCodeGen(ctx: CodegenContext,
--- End diff --

Yes, overriding `nullSafeCodeGen` is not suitable for this usage.
So I think it would be better to put all the code in `doGenCode`, or to create 
another method instead of overriding `nullSafeCodeGen`.


---




[GitHub] spark pull request #21193: [SPARK-24121][SQL] Add API for handling expressio...

2018-05-07 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/21193#discussion_r186356233
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/codegen/ExprValueSuite.scala
 ---
@@ -17,6 +17,8 @@
 
 package org.apache.spark.sql.catalyst.expressions.codegen
 
+import scala.collection.mutable
--- End diff --

???


---




[GitHub] spark issue #21164: [SPARK-24098][SQL] ScriptTransformationExec should wait ...

2018-05-07 Thread liutang123
Github user liutang123 commented on the issue:

https://github.com/apache/spark/pull/21164
  
@gatorsmile Could you please give some comments when you have time? Thanks 
so much.
In addition, I think this is a critical bug.


---




[GitHub] spark pull request #21028: [SPARK-23922][SQL] Add arrays_overlap function

2018-05-07 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21028#discussion_r186355622
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
@@ -3039,6 +3039,16 @@ object functions {
 ArrayContains(column.expr, Literal(value))
   }
 
+  /**
+   * Returns `true` if `a1` and `a2` have at least one non-null element in 
common. If not and
+   * any of the arrays contains a `null`, it returns `null`. It returns 
`false` otherwise.
+   * @group collection_funcs
+   * @since 2.4.0
+   */
+  def arrays_overlap(a1: Column, a2: Column): Column = withExpr {
+ArraysOverlap(a1.expr, a2.expr)
+   }
--- End diff --

nit: indent


---




[GitHub] spark pull request #21028: [SPARK-23922][SQL] Add arrays_overlap function

2018-05-07 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21028#discussion_r186355288
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -530,6 +560,155 @@ case class ArrayContains(left: Expression, right: 
Expression)
   override def prettyName: String = "array_contains"
 }
 
+/**
+ * Checks if the two arrays contain at least one common element.
+ */
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = "_FUNC_(a1, a2) - Returns true if a1 contains at least an 
element present also in a2. If the arrays have no common element and either of 
them contains a null element null is returned, false otherwise.",
+  examples = """
+Examples:
+  > SELECT _FUNC_(array(1, 2, 3), array(3, 4, 5));
+   true
+  """, since = "2.4.0")
+// scalastyle:off line.size.limit
+case class ArraysOverlap(left: Expression, right: Expression)
+  extends BinaryArrayExpressionWithImplicitCast {
+
+  override def dataType: DataType = BooleanType
+
+  override def nullable: Boolean = {
+left.nullable || right.nullable || 
left.dataType.asInstanceOf[ArrayType].containsNull ||
+  right.dataType.asInstanceOf[ArrayType].containsNull
+  }
+
+  override def nullSafeEval(a1: Any, a2: Any): Any = {
+var hasNull = false
+val arr1 = a1.asInstanceOf[ArrayData]
+val arr2 = a2.asInstanceOf[ArrayData]
+val (biggestArr, smallestArr) = if (arr1.numElements() > 
arr2.numElements()) {
--- End diff --

since there are just two arrays, `smaller` and `bigger` would be better names
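The hash-based overlap check under review can be sketched outside Spark as follows (plain Java ints, no null handling or codegen — names and types are illustrative, not the actual implementation):

```java
import java.util.HashSet;
import java.util.Set;

public class OverlapSketch {
    // Hash the smaller array, then probe with the bigger one, so the set
    // build cost is bounded by the shorter input.
    static boolean arraysOverlap(int[] a1, int[] a2) {
        int[] smaller = a1.length <= a2.length ? a1 : a2;
        int[] bigger  = a1.length <= a2.length ? a2 : a1;
        Set<Integer> seen = new HashSet<>();
        for (int v : smaller) seen.add(v);
        for (int v : bigger) {
            if (seen.contains(v)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(arraysOverlap(new int[]{1, 2, 3}, new int[]{3, 4, 5})); // true
        System.out.println(arraysOverlap(new int[]{1, 2}, new int[]{4, 5}));       // false
    }
}
```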


---




[GitHub] spark pull request #21028: [SPARK-23922][SQL] Add arrays_overlap function

2018-05-07 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21028#discussion_r186355096
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -530,6 +560,155 @@ case class ArrayContains(left: Expression, right: 
Expression)
   override def prettyName: String = "array_contains"
 }
 
+/**
+ * Checks if the two arrays contain at least one common element.
+ */
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = "_FUNC_(a1, a2) - Returns true if a1 contains at least an 
element present also in a2. If the arrays have no common element and either of 
them contains a null element null is returned, false otherwise.",
+  examples = """
+Examples:
+  > SELECT _FUNC_(array(1, 2, 3), array(3, 4, 5));
+   true
+  """, since = "2.4.0")
+// scalastyle:off line.size.limit
+case class ArraysOverlap(left: Expression, right: Expression)
+  extends BinaryArrayExpressionWithImplicitCast {
+
+  override def dataType: DataType = BooleanType
+
+  override def nullable: Boolean = {
+left.nullable || right.nullable || 
left.dataType.asInstanceOf[ArrayType].containsNull ||
+  right.dataType.asInstanceOf[ArrayType].containsNull
+  }
+
+  override def nullSafeEval(a1: Any, a2: Any): Any = {
+var hasNull = false
+val arr1 = a1.asInstanceOf[ArrayData]
+val arr2 = a2.asInstanceOf[ArrayData]
+val (biggestArr, smallestArr) = if (arr1.numElements() > 
arr2.numElements()) {
+  (arr1, arr2)
+} else {
+  (arr2, arr1)
+}
+if (smallestArr.numElements() > 0) {
+  val smallestSet = new mutable.HashSet[Any]
+  smallestArr.foreach(elementType, (_, v) =>
+if (v == null) {
+  hasNull = true
+} else {
+  smallestSet += v
+})
+  biggestArr.foreach(elementType, (_, v1) =>
+if (v1 == null) {
+  hasNull = true
+} else if (smallestSet.contains(v1)) {
+  return true
+}
+  )
+} else if (containsNull(biggestArr, 
right.dataType.asInstanceOf[ArrayType])) {
--- End diff --

`right.dataType.asInstanceOf[ArrayType]` may not match `biggestArr`


---




[GitHub] spark issue #21028: [SPARK-23922][SQL] Add arrays_overlap function

2018-05-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21028
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2991/
Test PASSed.


---




[GitHub] spark issue #21028: [SPARK-23922][SQL] Add arrays_overlap function

2018-05-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21028
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #16478: [SPARK-7768][SQL] Revise user defined types (UDT)

2018-05-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16478
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2990/
Test PASSed.


---




[GitHub] spark issue #16478: [SPARK-7768][SQL] Revise user defined types (UDT)

2018-05-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16478
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #21028: [SPARK-23922][SQL] Add arrays_overlap function

2018-05-07 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21028#discussion_r186354007
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -530,6 +560,155 @@ case class ArrayContains(left: Expression, right: 
Expression)
   override def prettyName: String = "array_contains"
 }
 
+/**
+ * Checks if the two arrays contain at least one common element.
+ */
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = "_FUNC_(a1, a2) - Returns true if a1 contains at least an 
element present also in a2. If the arrays have no common element and either of 
them contains a null element null is returned, false otherwise.",
+  examples = """
+Examples:
+  > SELECT _FUNC_(array(1, 2, 3), array(3, 4, 5));
+   true
+  """, since = "2.4.0")
+// scalastyle:off line.size.limit
+case class ArraysOverlap(left: Expression, right: Expression)
+  extends BinaryArrayExpressionWithImplicitCast {
+
+  override def dataType: DataType = BooleanType
+
+  override def nullable: Boolean = {
+left.nullable || right.nullable || 
left.dataType.asInstanceOf[ArrayType].containsNull ||
+  right.dataType.asInstanceOf[ArrayType].containsNull
+  }
+
+  override def nullSafeEval(a1: Any, a2: Any): Any = {
+var hasNull = false
+val arr1 = a1.asInstanceOf[ArrayData]
+val arr2 = a2.asInstanceOf[ArrayData]
+val (biggestArr, smallestArr) = if (arr1.numElements() > 
arr2.numElements()) {
+  (arr1, arr2)
+} else {
+  (arr2, arr1)
+}
+if (smallestArr.numElements() > 0) {
+  val smallestSet = new mutable.HashSet[Any]
+  smallestArr.foreach(elementType, (_, v) =>
+if (v == null) {
+  hasNull = true
+} else {
+  smallestSet += v
+})
+  biggestArr.foreach(elementType, (_, v1) =>
+if (v1 == null) {
+  hasNull = true
+} else if (smallestSet.contains(v1)) {
+  return true
+}
+  )
+} else if (containsNull(biggestArr, 
right.dataType.asInstanceOf[ArrayType])) {
+  hasNull = true
+}
+if (hasNull) {
+  null
+} else {
+  false
+}
+  }
+
+  def containsNull(arr: ArrayData, dt: ArrayType): Boolean = {
+if (dt.containsNull) {
+  arr.foreach(elementType, (_, v) =>
--- End diff --

```
var i = 0
var hasNull = false
while (i < arr.numElements && !hasNull) {
  hasNull = arr.isNullAt(i)
  i += 1
}
hasNull
```
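The same early-exit scan, as a runnable sketch (a plain `Integer[]` stands in for Spark's `ArrayData`, which is not available outside the codebase):

```java
public class NullScan {
    // Stop at the first null instead of visiting every element.
    static boolean containsNull(Integer[] arr) {
        int i = 0;
        boolean hasNull = false;
        while (i < arr.length && !hasNull) {
            hasNull = (arr[i] == null);
            i += 1;
        }
        return hasNull;
    }

    public static void main(String[] args) {
        System.out.println(containsNull(new Integer[]{1, null, 3})); // true
        System.out.println(containsNull(new Integer[]{1, 2, 3}));    // false
    }
}
```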


---




[GitHub] spark pull request #21028: [SPARK-23922][SQL] Add arrays_overlap function

2018-05-07 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21028#discussion_r186353058
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -28,6 +30,34 @@ import org.apache.spark.unsafe.Platform
 import org.apache.spark.unsafe.array.ByteArrayMethods
 import org.apache.spark.unsafe.types.{ByteArray, UTF8String}
 
+/**
+ * Base trait for [[BinaryExpression]]s with two arrays of the same 
element type and implicit
+ * casting.
+ */
+trait BinaryArrayExpressionWithImplicitCast extends BinaryExpression
+  with ImplicitCastInputTypes {
+
+  protected lazy val elementType: DataType = 
inputTypes.head.asInstanceOf[ArrayType].elementType
--- End diff --

this can be a `def`


---




[GitHub] spark pull request #21028: [SPARK-23922][SQL] Add arrays_overlap function

2018-05-07 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21028#discussion_r186353005
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -28,6 +30,34 @@ import org.apache.spark.unsafe.Platform
 import org.apache.spark.unsafe.array.ByteArrayMethods
 import org.apache.spark.unsafe.types.{ByteArray, UTF8String}
 
+/**
+ * Base trait for [[BinaryExpression]]s with two arrays of the same 
element type and implicit
+ * casting.
+ */
+trait BinaryArrayExpressionWithImplicitCast extends BinaryExpression
+  with ImplicitCastInputTypes {
+
+  protected lazy val elementType: DataType = 
inputTypes.head.asInstanceOf[ArrayType].elementType
+
+  override def inputTypes: Seq[AbstractDataType] = {
+TypeCoercion.findWiderTypeForTwo(left.dataType, right.dataType) match {
--- End diff --

does Presto allow implicit casting to string for these collection 
functions? e.g. can `ArraysOverlap` work for an array of int and an array of string?


---




[GitHub] spark pull request #21040: [SPARK-23930][SQL] Add slice function

2018-05-07 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21040


---




[GitHub] spark pull request #21252: [SPARK-24193] Sort by disk when number of limit i...

2018-05-07 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21252#discussion_r186352935
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -1238,6 +1238,14 @@ object SQLConf {
   .booleanConf
   .createWithDefault(true)
 
+  val SORT_IN_MEM_FOR_LIMIT_THRESHOLD =
+buildConf("spark.sql.limit.sortInMemThreshold")
+  .internal()
+  .doc("In sql like 'select x from t order by y limit m', if m is 
under this threshold, " +
+  "sort in memory, otherwise do a global sort with disk.")
+  .intConf
+  .createWithDefault(2000)
--- End diff --

Yeah, I agree.


---




[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistics to improve ...

2018-05-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16677
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2989/
Test FAILed.


---




[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistics to improve ...

2018-05-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16677
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistics to improve ...

2018-05-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16677
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistics to improve ...

2018-05-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16677
  
**[Test build #90306 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90306/testReport)**
 for PR 16677 at commit 
[`062b8fd`](https://github.com/apache/spark/commit/062b8fd58ae13f252b1e6f61c70b69ed05521715).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistics to improve ...

2018-05-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16677
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90306/
Test FAILed.


---




[GitHub] spark issue #21028: [SPARK-23922][SQL] Add arrays_overlap function

2018-05-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21028
  
**[Test build #90308 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90308/testReport)**
 for PR 21028 at commit 
[`4e37975`](https://github.com/apache/spark/commit/4e37975ba3ce361009a83d248ad7d0b758f86f4c).


---




[GitHub] spark issue #21040: [SPARK-23930][SQL] Add slice function

2018-05-07 Thread ueshin
Github user ueshin commented on the issue:

https://github.com/apache/spark/pull/21040
  
Thanks! merging to master.


---




[GitHub] spark issue #21240: [SPARK-21274][SQL] Add a new generator function replicat...

2018-05-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21240
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2988/
Test PASSed.


---




[GitHub] spark issue #21240: [SPARK-21274][SQL] Add a new generator function replicat...

2018-05-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21240
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21028: [SPARK-23922][SQL] Add arrays_overlap function

2018-05-07 Thread mgaido91
Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/21028
  
retest this please


---




[GitHub] spark issue #16478: [SPARK-7768][SQL] Revise user defined types (UDT)

2018-05-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16478
  
**[Test build #90307 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90307/testReport)**
 for PR 16478 at commit 
[`ae00de1`](https://github.com/apache/spark/commit/ae00de13dd779a2a09b142c54a2fcc144d7f8c23).


---




[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistics to improve ...

2018-05-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16677
  
**[Test build #90306 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90306/testReport)**
 for PR 16677 at commit 
[`062b8fd`](https://github.com/apache/spark/commit/062b8fd58ae13f252b1e6f61c70b69ed05521715).


---




[GitHub] spark issue #21255: [SPARK-24186][SparR][SQL]change reverse and concat to co...

2018-05-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21255
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21255: [SPARK-24186][SparR][SQL]change reverse and concat to co...

2018-05-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21255
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2987/
Test FAILed.


---




[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistics to improve ...

2018-05-07 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/16677
  
retest this please.


---




[GitHub] spark issue #21255: [SPARK-24186][SparR][SQL]change reverse and concat to co...

2018-05-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21255
  
**[Test build #90304 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90304/testReport)**
 for PR 21255 at commit 
[`3985285`](https://github.com/apache/spark/commit/3985285089673e42a85a5d1ba3cd7419a6948909).


---




[GitHub] spark issue #16478: [SPARK-7768][SQL] Revise user defined types (UDT)

2018-05-07 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/16478
  
retest this please.


---




[GitHub] spark issue #21240: [SPARK-21274][SQL] Add a new generator function replicat...

2018-05-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21240
  
**[Test build #90305 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90305/testReport)**
 for PR 21240 at commit 
[`4ab3af0`](https://github.com/apache/spark/commit/4ab3af0c1abfd0ac078c968dbe589bf96091).


---




[GitHub] spark pull request #21256: [SPARK-24160][FOLLOWUP] Fix compilation failure

2018-05-07 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21256


---




[GitHub] spark issue #21240: [SPARK-21274][SQL] Add a new generator function replicat...

2018-05-07 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/21240
  
retest this please.


---




[GitHub] spark pull request #21252: [SPARK-24193] Sort by disk when number of limit i...

2018-05-07 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21252#discussion_r186349524
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -1238,6 +1238,14 @@ object SQLConf {
   .booleanConf
   .createWithDefault(true)
 
+  val SORT_IN_MEM_FOR_LIMIT_THRESHOLD =
+buildConf("spark.sql.limit.sortInMemThreshold")
+  .internal()
+  .doc("In sql like 'select x from t order by y limit m', if m is 
under this threshold, " +
+  "sort in memory, otherwise do a global sort with disk.")
+  .intConf
+  .createWithDefault(2000)
--- End diff --

I would suggest `Int.MaxValue` as the default value, which preserves the 
previous behavior. Users can tune it w.r.t. their workload.


---




[GitHub] spark issue #21255: [SPARK-24186][SparR][SQL]change reverse and concat to co...

2018-05-07 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21255
  
retest this please


---




[GitHub] spark pull request #21252: [SPARK-24193] Sort by disk when number of limit i...

2018-05-07 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21252#discussion_r186348657
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -1238,6 +1238,14 @@ object SQLConf {
   .booleanConf
   .createWithDefault(true)
 
+  val SORT_IN_MEM_FOR_LIMIT_THRESHOLD =
+buildConf("spark.sql.limit.sortInMemThreshold")
+  .internal()
+  .doc("In sql like 'select x from t order by y limit m', if m is 
under this threshold, " +
+  "sort in memory, otherwise do a global sort with disk.")
+  .intConf
+  .createWithDefault(2000)
--- End diff --

Isn't 2000 too small for this?


---




[GitHub] spark issue #21256: [SPARK-24160][FOLLOWUP] Fix compilation failure

2018-05-07 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/21256
  
thanks! I'm merging it to unblock the build, since it already passes the 
compilation.


---




[GitHub] spark pull request #21252: [SPARK-24193] Sort by disk when number of limit i...

2018-05-07 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21252#discussion_r186348278
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -1238,6 +1238,14 @@ object SQLConf {
   .booleanConf
   .createWithDefault(true)
 
+  val SORT_IN_MEM_FOR_LIMIT_THRESHOLD =
+buildConf("spark.sql.limit.sortInMemThreshold")
+  .internal()
+  .doc("In sql like 'select x from t order by y limit m', if m is 
under this threshold, " +
+  "sort in memory, otherwise do a global sort with disk.")
+  .intConf
+  .createWithDefault(2000)
--- End diff --

Oh, yeah, reasonable.


---




[GitHub] spark issue #21256: [SPARK-24160][FOLLOWUP] Fix compilation failure

2018-05-07 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/21256
  
LGTM


---




[GitHub] spark pull request #21250: [SPARK-23291][SQL][R][BRANCH-2.3] R's substr shou...

2018-05-07 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21250#discussion_r186348079
  
--- Diff: docs/sparkr.md ---
@@ -663,3 +663,7 @@ You can inspect the search path in R with 
[`search()`](https://stat.ethz.ch/R-ma
  - The `stringsAsFactors` parameter was previously ignored with `collect`, 
for example, in `collect(createDataFrame(iris), stringsAsFactors = TRUE))`. It 
has been corrected.
  - For `summary`, option for statistics to compute has been added. Its 
output is changed from that from `describe`.
  - A warning can be raised if versions of SparkR package and the Spark JVM 
do not match.
+
+## Upgrading to Spark 2.3.1 and above
+
+ - The `start` parameter of `substr` method was wrongly subtracted by one, 
previously. In other words, the index specified by `start` parameter was 
considered as 0-base. This can lead to inconsistent substring results and also 
does not match with the behaviour with `substr` in R. It has been fixed so the 
`start` parameter of `substr` method is now 1-base, e.g., therefore to get the 
same result as `substr(df$a, 2, 5)`, it should be changed to `substr(df$a, 1, 
4)`.
--- End diff --

we should mention the version more explicitly, e.g.
```
In SparkR 2.3.0 and earlier, the `start` parameter ... In version 2.3.1 and 
later, ... As an example, `substr(lit('abcdef'), 2, 5)` would result to `abc` 
in SparkR 2.3.0, and in SparkR 2.3.1, the result would be ...
```


---




[GitHub] spark issue #16478: [SPARK-7768][SQL] Revise user defined types (UDT)

2018-05-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16478
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2986/
Test PASSed.


---




[GitHub] spark issue #16478: [SPARK-7768][SQL] Revise user defined types (UDT)

2018-05-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16478
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #21250: [SPARK-23291][SQL][R][BRANCH-2.3] R's substr shou...

2018-05-07 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21250#discussion_r186347093
  
--- Diff: docs/sparkr.md ---
@@ -663,3 +663,7 @@ You can inspect the search path in R with 
[`search()`](https://stat.ethz.ch/R-ma
  - The `stringsAsFactors` parameter was previously ignored with `collect`, 
for example, in `collect(createDataFrame(iris), stringsAsFactors = TRUE)`. It 
has been corrected.
  - For `summary`, an option specifying which statistics to compute has been 
added. Its output differs from that of `describe`.
  - A warning can be raised if the versions of the SparkR package and the 
Spark JVM do not match.
+
+## Upgrading to Spark 2.3.1 and above
--- End diff --

`Spark` -> `SparkR`


---




[GitHub] spark pull request #21252: [SPARK-24193] Sort by disk when number of limit i...

2018-05-07 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21252#discussion_r186346827
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -1238,6 +1238,14 @@ object SQLConf {
   .booleanConf
   .createWithDefault(true)
 
+  val SORT_IN_MEM_FOR_LIMIT_THRESHOLD =
+buildConf("spark.sql.limit.sortInMemThreshold")
+  .internal()
+  .doc("In sql like 'select x from t order by y limit m', if m is 
under this threshold, " +
+  "sort in memory, otherwise do a global sort with disk.")
--- End diff --

`with disk` -> `which spills to disk if necessary`


---




[GitHub] spark pull request #21252: [SPARK-24193] Sort by disk when number of limit i...

2018-05-07 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21252#discussion_r186346747
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -1238,6 +1238,14 @@ object SQLConf {
   .booleanConf
   .createWithDefault(true)
 
+  val SORT_IN_MEM_FOR_LIMIT_THRESHOLD =
+buildConf("spark.sql.limit.sortInMemThreshold")
--- End diff --

`spark.sql.execution.combineLimitAfterSortThreshold`?


---




[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistics to improve ...

2018-05-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16677
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistics to improve ...

2018-05-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16677
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2985/
Test PASSed.


---




[GitHub] spark issue #21240: [SPARK-21274][SQL] Add a new generator function replicat...

2018-05-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21240
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90303/
Test FAILed.


---




[GitHub] spark issue #21240: [SPARK-21274][SQL] Add a new generator function replicat...

2018-05-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21240
  
**[Test build #90303 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90303/testReport)**
 for PR 21240 at commit 
[`4ab3af0`](https://github.com/apache/spark/commit/4ab3af0c1abfd0ac078c968dbe589bf96091).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21240: [SPARK-21274][SQL] Add a new generator function replicat...

2018-05-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21240
  
Merged build finished. Test FAILed.


---




[GitHub] spark pull request #21252: [SPARK-24193] Sort by disk when number of limit i...

2018-05-07 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21252#discussion_r186345913
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -1238,6 +1238,14 @@ object SQLConf {
   .booleanConf
   .createWithDefault(true)
 
+  val SORT_IN_MEM_FOR_LIMIT_THRESHOLD =
+buildConf("spark.sql.limit.sortInMemThreshold")
+  .internal()
+  .doc("In sql like 'select x from t order by y limit m', if m is 
under this threshold, " +
+  "sort in memory, otherwise do a global sort with disk.")
+  .intConf
+  .createWithDefault(2000)
--- End diff --

what if users only have a few queries with a large limit and want to disable 
the top-n sort? I feel this config is more flexible than a boolean flag.
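The flexibility argument can be sketched as follows. This is a Python illustration only: the plan-name strings and the `2000` default merely mirror the proposed conf, not actual Spark planner code.

```python
SORT_IN_MEM_FOR_LIMIT_THRESHOLD = 2000  # proposed default for the conf

def plan_order_by_limit(limit, threshold=SORT_IN_MEM_FOR_LIMIT_THRESHOLD):
    """Pick a physical plan for `SELECT ... ORDER BY ... LIMIT limit`."""
    if limit < threshold:
        # Small limit: keep a bounded top-n structure in memory.
        return "in-memory top-n sort"
    # Large limit: global sort that can spill to disk, then apply the limit.
    return "global sort (spillable) + limit"

# A numeric threshold subsumes a boolean flag: setting it to 0 disables the
# in-memory top-n path for every query, while any positive value lets small
# limits stay fast and only large limits fall back to the spillable sort.
assert plan_order_by_limit(100) == "in-memory top-n sort"
assert plan_order_by_limit(1_000_000) == "global sort (spillable) + limit"
assert plan_order_by_limit(100, threshold=0) == "global sort (spillable) + limit"
```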


---




[GitHub] spark issue #21256: [SPARK-24160][FOLLOWUP] Fix compilation failure

2018-05-07 Thread mgaido91
Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/21256
  
cc @cloud-fan @JoshRosen @jinxing64 


---




[GitHub] spark issue #21240: [SPARK-21274][SQL] Add a new generator function replicat...

2018-05-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21240
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2984/
Test PASSed.


---




[GitHub] spark issue #21240: [SPARK-21274][SQL] Add a new generator function replicat...

2018-05-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21240
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #16478: [SPARK-7768][SQL] Revise user defined types (UDT)

2018-05-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16478
  
**[Test build #90302 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90302/testReport)**
 for PR 16478 at commit 
[`ae00de1`](https://github.com/apache/spark/commit/ae00de13dd779a2a09b142c54a2fcc144d7f8c23).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #16478: [SPARK-7768][SQL] Revise user defined types (UDT)

2018-05-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16478
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #16478: [SPARK-7768][SQL] Revise user defined types (UDT)

2018-05-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16478
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90302/
Test FAILed.


---




[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistics to improve ...

2018-05-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16677
  
**[Test build #90301 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90301/testReport)**
 for PR 16677 at commit 
[`062b8fd`](https://github.com/apache/spark/commit/062b8fd58ae13f252b1e6f61c70b69ed05521715).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistics to improve ...

2018-05-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16677
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90301/
Test FAILed.


---




[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistics to improve ...

2018-05-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16677
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21240: [SPARK-21274][SQL] Add a new generator function replicat...

2018-05-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21240
  
**[Test build #90303 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90303/testReport)**
 for PR 21240 at commit 
[`4ab3af0`](https://github.com/apache/spark/commit/4ab3af0c1abfd0ac078c968dbe589bf96091).


---




[GitHub] spark issue #21256: [SPARK-24160][FOLLOWUP] Fix compilation failure

2018-05-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21256
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21256: [SPARK-24160][FOLLOWUP] Fix compilation failure

2018-05-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21256
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2983/
Test PASSed.


---



