[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

2017-12-19 Thread BryanCutler
Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/19884
  
@shaneknapp afaik Spark supports pyarrow with Python 2.7 and we should 
be testing that too, but I'm not sure about PyPy.  Maybe @ueshin or 
@HyukjinKwon can confirm before we start upgrading?


---




[GitHub] spark pull request #19975: [SPARK-22781][SS] Support creating streaming data...

2017-12-19 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/19975


---




[GitHub] spark issue #19975: [SPARK-22781][SS] Support creating streaming dataset wit...

2017-12-19 Thread zsxwing
Github user zsxwing commented on the issue:

https://github.com/apache/spark/pull/19975
  
Thanks! Merging to master!


---




[GitHub] spark issue #19872: WIP: [SPARK-22274][PySpark] User-defined aggregation fun...

2017-12-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19872
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #19872: WIP: [SPARK-22274][PySpark] User-defined aggregation fun...

2017-12-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19872
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85152/
Test FAILed.


---




[GitHub] spark issue #19872: WIP: [SPARK-22274][PySpark] User-defined aggregation fun...

2017-12-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19872
  
**[Test build #85152 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85152/testReport)**
 for PR 19872 at commit 
[`ea5d6f3`](https://github.com/apache/spark/commit/ea5d6f319aa3b1bba20ad86a51e6efb65658e3d2).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

2017-12-19 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request:

https://github.com/apache/spark/pull/19884#discussion_r157952725
  
--- Diff: python/pyspark/sql/types.py ---
@@ -1679,6 +1678,15 @@ def from_arrow_schema(arrow_schema):
  for field in arrow_schema])
 
 
+def _require_minimum_pyarrow_version():
--- End diff --

sounds good


---




[GitHub] spark issue #20019: [SPARK-22361][SQL][TEST] Add unit test for Window Frames

2017-12-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20019
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85153/
Test PASSed.


---




[GitHub] spark pull request #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

2017-12-19 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request:

https://github.com/apache/spark/pull/19884#discussion_r157952778
  
--- Diff: python/pyspark/sql/types.py ---
@@ -1679,6 +1678,15 @@ def from_arrow_schema(arrow_schema):
  for field in arrow_schema])
 
 
+def _require_minimum_pyarrow_version():
+""" Raise ImportError if minimum version of pyarrow is not installed
+"""
+from distutils.version import LooseVersion
+import pyarrow
+if pyarrow.__version__ < LooseVersion('0.8.0'):
--- End diff --

sure, no prob


---




[GitHub] spark issue #20019: [SPARK-22361][SQL][TEST] Add unit test for Window Frames

2017-12-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20019
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20019: [SPARK-22361][SQL][TEST] Add unit test for Window Frames

2017-12-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20019
  
**[Test build #85153 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85153/testReport)**
 for PR 20019 at commit 
[`87e61ce`](https://github.com/apache/spark/commit/87e61cee4c2d8aab2a95fd9123b414736de5d3f1).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class DataFrameWindowFramesSuite extends QueryTest with 
SharedSQLContext `


---




[GitHub] spark issue #20021: [SPARK-22668][SQL] Ensure no global variables in argumen...

2017-12-19 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/20021
  
Here is [another 
example](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/hash.scala#L414).
 This is complicated.

`result` is passed to `arguments` of `ctx.splitExpressions` in 
`genHashForStruct`. The `result` may be `ev.value` in a certain call path that 
starts from 
[here](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/hash.scala#L277).


As this example shows, even when `arguments` in `ctx.splitExpressions` holds a 
local variable, that local variable may be bound to `ev.value` in some calling 
contexts, so it does not seem easy to ensure there is no global variable in 
`arguments`. Thus, IMHO, it would be good to combine localization with adding 
an assert, and to declare the contract `no global variable in arguments` in a 
comment; a sketch of such an assert follows the call-path summary below.

WDYT?


```
// hash.scala, line 277
override def doGenCode(ctx, ev): ExprCode = {
  ...
  computeHash(childGen.value, child.dataType, ev.value, ctx)
  ...
}

// hash.scala, line 448
protected def computeHash(input, dataType, result, ctx): String =
  computeHashWithTailRec(input, dataType, result, ctx)

// hash.scala, line 414
private def computeHashWithTailRec(input, dataType, result, ctx): String =
  dataType match {
    ...
    case StructType(fields) => genHashForStruct(ctx, input, result, fields)
  }

// hash.scala, line 402
protected def genHashForStruct(ctx, input, result, fields) {
  ...
  ctx.splitExpressions(
    ...
    arguments = Seq("InternalRow" -> input, hashResultType -> result),
    ...)
}
```
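
A minimal sketch of that assert, with hypothetical helper names (`isGlobalVariable` is not an actual `CodegenContext` member, just an assumed check against state registered via `ctx.addMutableState`):

```scala
// Sketch: enforce the contract "no global variable in arguments" at the
// split point, instead of relying on every caller to localize first.
def splitExpressions(
    expressions: Seq[String],
    funcName: String,
    arguments: Seq[(String, String)]): String = {
  arguments.foreach { case (_, name) =>
    // isGlobalVariable is hypothetical: true iff `name` was registered
    // as mutable state (e.g. via ctx.addMutableState).
    assert(!isGlobalVariable(name), s"global variable $name passed as a split argument")
  }
  // ... existing expression-splitting logic ...
}
```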



---




[GitHub] spark issue #20020: [SPARK-22834][SQL] Make insertion commands have real chi...

2017-12-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20020
  
**[Test build #85159 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85159/testReport)**
 for PR 20020 at commit 
[`e25a9eb`](https://github.com/apache/spark/commit/e25a9eb285d56a771a56b77534413be59b9f111b).


---




[GitHub] spark issue #19977: [SPARK-22771][SQL] Concatenate binary inputs into a bina...

2017-12-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19977
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85149/
Test PASSed.


---




[GitHub] spark issue #19977: [SPARK-22771][SQL] Concatenate binary inputs into a bina...

2017-12-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19977
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19977: [SPARK-22771][SQL] Concatenate binary inputs into a bina...

2017-12-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19977
  
**[Test build #85149 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85149/testReport)**
 for PR 19977 at commit 
[`cb2a75b`](https://github.com/apache/spark/commit/cb2a75b69c9f82465ba5dccca3e2a22526462b52).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `  case class ConcatCoercion(conf: SQLConf) extends TypeCoercionRule `


---




[GitHub] spark pull request #19872: WIP: [SPARK-22274][PySpark] User-defined aggregat...

2017-12-19 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19872#discussion_r157948426
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/python/ExtractPythonUDFs.scala
 ---
@@ -48,29 +48,46 @@ object ExtractPythonUDFFromAggregate extends 
Rule[LogicalPlan] {
 }.isDefined
   }
 
+  private def isPandasGroupAggUdf(expr: Expression): Boolean = expr match {
+  case PythonUDF(_, _, _, _, PythonEvalType.SQL_PANDAS_GROUP_AGG_UDF) 
=> true
+  case Alias(child, _) => isPandasGroupAggUdf(child)
+  case _ => false
+  }
+
+  private def hasPandasGroupAggUdf(agg: Aggregate): Boolean = {
+val actualAggExpr = 
agg.aggregateExpressions.drop(agg.groupingExpressions.length)
+actualAggExpr.exists(isPandasGroupAggUdf)
+  }
+
+
--- End diff --

nit: remove an extra line.


---




[GitHub] spark pull request #19872: WIP: [SPARK-22274][PySpark] User-defined aggregat...

2017-12-19 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19872#discussion_r157939292
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/python/ExtractPythonUDFs.scala
 ---
@@ -48,29 +48,46 @@ object ExtractPythonUDFFromAggregate extends 
Rule[LogicalPlan] {
 }.isDefined
   }
 
+  private def isPandasGroupAggUdf(expr: Expression): Boolean = expr match {
+  case PythonUDF(_, _, _, _, PythonEvalType.SQL_PANDAS_GROUP_AGG_UDF) 
=> true
+  case Alias(child, _) => isPandasGroupAggUdf(child)
+  case _ => false
+  }
+
+  private def hasPandasGroupAggUdf(agg: Aggregate): Boolean = {
+val actualAggExpr = 
agg.aggregateExpressions.drop(agg.groupingExpressions.length)
--- End diff --

Do we need to drop the grouping expressions?
If we do, we can drop them only when `conf.dataFrameRetainGroupColumns == 
true`; otherwise `aggregateExpressions` doesn't contain `groupingExpressions`. A sketch of that guard is below.
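
A minimal sketch of the suggested guard (hypothetical; assumes a `SQLConf` handle `conf` is reachable from this rule):

```scala
private def hasPandasGroupAggUdf(agg: Aggregate): Boolean = {
  // Grouping columns are present in aggregateExpressions only when they are
  // retained in the output, so drop them only in that case.
  val actualAggExpr = if (conf.dataFrameRetainGroupColumns) {
    agg.aggregateExpressions.drop(agg.groupingExpressions.length)
  } else {
    agg.aggregateExpressions
  }
  actualAggExpr.exists(isPandasGroupAggUdf)
}
```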


---




[GitHub] spark pull request #19872: WIP: [SPARK-22274][PySpark] User-defined aggregat...

2017-12-19 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19872#discussion_r157944969
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/python/AggregateInPandasExec.scala
 ---
@@ -0,0 +1,143 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.python
+
+import java.io.File
+
+import scala.collection.mutable.ArrayBuffer
+
+import org.apache.spark.{SparkEnv, TaskContext}
+import org.apache.spark.api.python.{ChainedPythonFunctions, PythonEvalType}
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.{Ascending, Attribute, 
AttributeSet, Expression, JoinedRow, NamedExpression, PythonUDF, SortOrder, 
UnsafeProjection, UnsafeRow}
+import org.apache.spark.sql.catalyst.plans.physical.{AllTuples, 
ClusteredDistribution, Distribution, Partitioning}
+import org.apache.spark.sql.execution.{GroupedIterator, SparkPlan, 
UnaryExecNode}
+import org.apache.spark.sql.types.{DataType, StructField, StructType}
+import org.apache.spark.util.Utils
+
+case class AggregateInPandasExec(
+groupingExpressions: Seq[Expression],
+udfExpressions: Seq[PythonUDF],
+resultExpressions: Seq[NamedExpression],
+child: SparkPlan)
+  extends UnaryExecNode {
+
+  override def output: Seq[Attribute] = 
resultExpressions.map(_.toAttribute)
+
+  override def outputPartitioning: Partitioning = child.outputPartitioning
+
+  override def producedAttributes: AttributeSet = AttributeSet(output)
+
+  override def requiredChildDistribution: Seq[Distribution] = {
+if (groupingExpressions.isEmpty) {
+  AllTuples :: Nil
+} else {
+  ClusteredDistribution(groupingExpressions) :: Nil
+}
+  }
+
+  private def collectFunctions(udf: PythonUDF): (ChainedPythonFunctions, 
Seq[Expression]) = {
+udf.children match {
+  case Seq(u: PythonUDF) =>
+val (chained, children) = collectFunctions(u)
+(ChainedPythonFunctions(chained.funcs ++ Seq(udf.func)), children)
+  case children =>
+// There should not be any other UDFs, or the children can't be 
evaluated directly.
+assert(children.forall(_.find(_.isInstanceOf[PythonUDF]).isEmpty))
+(ChainedPythonFunctions(Seq(udf.func)), udf.children)
+}
+  }
+
+  override def requiredChildOrdering: Seq[Seq[SortOrder]] =
+Seq(groupingExpressions.map(SortOrder(_, Ascending)))
+
+  override protected def doExecute(): RDD[InternalRow] = {
+val inputRDD = child.execute()
+
+val bufferSize = inputRDD.conf.getInt("spark.buffer.size", 65536)
+val reuseWorker = 
inputRDD.conf.getBoolean("spark.python.worker.reuse", defaultValue = true)
+val sessionLocalTimeZone = conf.sessionLocalTimeZone
+val pandasRespectSessionTimeZone = conf.pandasRespectSessionTimeZone
+
+val (pyFuncs, inputs) = udfExpressions.map(collectFunctions).unzip
+
+val allInputs = new ArrayBuffer[Expression]
+val dataTypes = new ArrayBuffer[DataType]
+
+allInputs.appendAll(groupingExpressions)
+
+val argOffsets = inputs.map { input =>
+  input.map { e =>
+allInputs += e
+dataTypes += e.dataType
+allInputs.length - 1 - groupingExpressions.length
+  }.toArray
+}.toArray
+
+val schema = StructType(dataTypes.zipWithIndex.map { case (dt, i) =>
+  StructField(s"_$i", dt)
+})
+
+inputRDD.mapPartitionsInternal { iter =>
+  val grouped = if (groupingExpressions.isEmpty) {
+Iterator((null, iter))
+  } else {
+val groupedIter = GroupedIterator(iter, groupingExpressions, 
child.output)
+
+val dropGrouping =
+  
UnsafeProjection.create(allInputs.drop(groupingExpressions.length), 
child.output)
+
+groupedI

[GitHub] spark pull request #19872: WIP: [SPARK-22274][PySpark] User-defined aggregat...

2017-12-19 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19872#discussion_r157944622
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/python/AggregateInPandasExec.scala
 ---
@@ -0,0 +1,143 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.python
+
+import java.io.File
+
+import scala.collection.mutable.ArrayBuffer
+
+import org.apache.spark.{SparkEnv, TaskContext}
+import org.apache.spark.api.python.{ChainedPythonFunctions, PythonEvalType}
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.{Ascending, Attribute, 
AttributeSet, Expression, JoinedRow, NamedExpression, PythonUDF, SortOrder, 
UnsafeProjection, UnsafeRow}
+import org.apache.spark.sql.catalyst.plans.physical.{AllTuples, 
ClusteredDistribution, Distribution, Partitioning}
+import org.apache.spark.sql.execution.{GroupedIterator, SparkPlan, 
UnaryExecNode}
+import org.apache.spark.sql.types.{DataType, StructField, StructType}
+import org.apache.spark.util.Utils
+
+case class AggregateInPandasExec(
+groupingExpressions: Seq[Expression],
+udfExpressions: Seq[PythonUDF],
+resultExpressions: Seq[NamedExpression],
+child: SparkPlan)
+  extends UnaryExecNode {
+
+  override def output: Seq[Attribute] = 
resultExpressions.map(_.toAttribute)
+
+  override def outputPartitioning: Partitioning = child.outputPartitioning
+
+  override def producedAttributes: AttributeSet = AttributeSet(output)
+
+  override def requiredChildDistribution: Seq[Distribution] = {
+if (groupingExpressions.isEmpty) {
+  AllTuples :: Nil
+} else {
+  ClusteredDistribution(groupingExpressions) :: Nil
+}
+  }
+
+  private def collectFunctions(udf: PythonUDF): (ChainedPythonFunctions, 
Seq[Expression]) = {
+udf.children match {
+  case Seq(u: PythonUDF) =>
+val (chained, children) = collectFunctions(u)
+(ChainedPythonFunctions(chained.funcs ++ Seq(udf.func)), children)
+  case children =>
+// There should not be any other UDFs, or the children can't be 
evaluated directly.
+assert(children.forall(_.find(_.isInstanceOf[PythonUDF]).isEmpty))
+(ChainedPythonFunctions(Seq(udf.func)), udf.children)
+}
+  }
+
+  override def requiredChildOrdering: Seq[Seq[SortOrder]] =
+Seq(groupingExpressions.map(SortOrder(_, Ascending)))
+
+  override protected def doExecute(): RDD[InternalRow] = {
+val inputRDD = child.execute()
+
+val bufferSize = inputRDD.conf.getInt("spark.buffer.size", 65536)
+val reuseWorker = 
inputRDD.conf.getBoolean("spark.python.worker.reuse", defaultValue = true)
+val sessionLocalTimeZone = conf.sessionLocalTimeZone
+val pandasRespectSessionTimeZone = conf.pandasRespectSessionTimeZone
+
+val (pyFuncs, inputs) = udfExpressions.map(collectFunctions).unzip
+
+val allInputs = new ArrayBuffer[Expression]
+val dataTypes = new ArrayBuffer[DataType]
+
+allInputs.appendAll(groupingExpressions)
--- End diff --

I guess we don't need to append `groupingExpressions`. Seems like they are 
dropped later.


---




[GitHub] spark pull request #19872: WIP: [SPARK-22274][PySpark] User-defined aggregat...

2017-12-19 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19872#discussion_r157938453
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/python/ExtractPythonUDFs.scala
 ---
@@ -48,29 +48,46 @@ object ExtractPythonUDFFromAggregate extends 
Rule[LogicalPlan] {
 }.isDefined
   }
 
+  private def isPandasGroupAggUdf(expr: Expression): Boolean = expr match {
+  case PythonUDF(_, _, _, _, PythonEvalType.SQL_PANDAS_GROUP_AGG_UDF) 
=> true
+  case Alias(child, _) => isPandasGroupAggUdf(child)
+  case _ => false
+  }
+
+  private def hasPandasGroupAggUdf(agg: Aggregate): Boolean = {
+val actualAggExpr = 
agg.aggregateExpressions.drop(agg.groupingExpressions.length)
+actualAggExpr.exists(isPandasGroupAggUdf)
+  }
+
+
   private def extract(agg: Aggregate): LogicalPlan = {
 val projList = new ArrayBuffer[NamedExpression]()
 val aggExpr = new ArrayBuffer[NamedExpression]()
-agg.aggregateExpressions.foreach { expr =>
-  if (hasPythonUdfOverAggregate(expr, agg)) {
-// Python UDF can only be evaluated after aggregate
-val newE = expr transformDown {
-  case e: Expression if belongAggregate(e, agg) =>
-val alias = e match {
-  case a: NamedExpression => a
-  case o => Alias(e, "agg")()
-}
-aggExpr += alias
-alias.toAttribute
+
+if (hasPandasGroupAggUdf(agg)) {
+  Aggregate(agg.groupingExpressions, agg.aggregateExpressions, 
agg.child)
--- End diff --

Do we need to copy?


---




[GitHub] spark issue #20021: [SPARK-22668][SQL] Ensure no global variables in argumen...

2017-12-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20021
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20021: [SPARK-22668][SQL] Ensure no global variables in argumen...

2017-12-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20021
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85150/
Test FAILed.


---




[GitHub] spark issue #20021: [SPARK-22668][SQL] Ensure no global variables in argumen...

2017-12-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20021
  
**[Test build #85150 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85150/testReport)**
 for PR 20021 at commit 
[`3d44195`](https://github.com/apache/spark/commit/3d44195f48c1688d7dc5b87fd0c9f07c1535000b).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19950: [SPARK-22450][Core][MLLib][FollowUp] safely register cla...

2017-12-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19950
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85148/
Test PASSed.


---




[GitHub] spark issue #19950: [SPARK-22450][Core][MLLib][FollowUp] safely register cla...

2017-12-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19950
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19950: [SPARK-22450][Core][MLLib][FollowUp] safely register cla...

2017-12-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19950
  
**[Test build #85148 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85148/testReport)**
 for PR 19950 at commit 
[`604cb7d`](https://github.com/apache/spark/commit/604cb7df1d892010316c964250977c8692799666).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19977: [SPARK-22771][SQL] Concatenate binary inputs into a bina...

2017-12-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19977
  
**[Test build #85158 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85158/testReport)**
 for PR 19977 at commit 
[`fc14aeb`](https://github.com/apache/spark/commit/fc14aeb4e92e67aba1750fc1bc2b0fc9afaa5fac).


---




[GitHub] spark pull request #5604: [SPARK-1442][SQL] Window Function Support for Spar...

2017-12-19 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/5604#discussion_r157945816
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala
 ---
@@ -0,0 +1,340 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions
+
+import org.apache.spark.sql.catalyst.analysis.UnresolvedException
+import org.apache.spark.sql.catalyst.errors.TreeNodeException
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.types.{NumericType, DataType}
+
+/**
+ * The trait of the Window Specification (specified in the OVER clause or 
WINDOW clause) for
+ * Window Functions.
+ */
+sealed trait WindowSpec
+
+/**
+ * The specification for a window function.
+ * @param partitionSpec It defines the way that input rows are partitioned.
+ * @param orderSpec It defines the ordering of rows in a partition.
+ * @param frameSpecification It defines the window frame in a partition.
+ */
+case class WindowSpecDefinition(
+partitionSpec: Seq[Expression],
+orderSpec: Seq[SortOrder],
+frameSpecification: WindowFrame) extends Expression with WindowSpec {
+
+  def validate: Option[String] = frameSpecification match {
+case UnspecifiedFrame =>
+  Some("Found a UnspecifiedFrame. It should be converted to a 
SpecifiedWindowFrame " +
+"during analysis. Please file a bug report.")
+case frame: SpecifiedWindowFrame => frame.validate.orElse {
+  def checkValueBasedBoundaryForRangeFrame(): Option[String] = {
+if (orderSpec.length > 1)  {
+  // It is not allowed to have a value-based PRECEDING and 
FOLLOWING
+  // as the boundary of a Range Window Frame.
+  Some("This Range Window Frame only accepts at most one ORDER BY 
expression.")
+} else if (orderSpec.nonEmpty && 
!orderSpec.head.dataType.isInstanceOf[NumericType]) {
+  Some("The data type of the expression in the ORDER BY clause 
should be a numeric type.")
+} else {
+  None
+}
+  }
+
+  (frame.frameType, frame.frameStart, frame.frameEnd) match {
+case (RangeFrame, vp: ValuePreceding, _) => 
checkValueBasedBoundaryForRangeFrame()
+case (RangeFrame, vf: ValueFollowing, _) => 
checkValueBasedBoundaryForRangeFrame()
+case (RangeFrame, _, vp: ValuePreceding) => 
checkValueBasedBoundaryForRangeFrame()
+case (RangeFrame, _, vf: ValueFollowing) => 
checkValueBasedBoundaryForRangeFrame()
+case (_, _, _) => None
+  }
+}
+  }
+
+  type EvaluatedType = Any
+
+  override def children: Seq[Expression]  = partitionSpec ++ orderSpec
+
+  override lazy val resolved: Boolean =
+childrenResolved && 
frameSpecification.isInstanceOf[SpecifiedWindowFrame]
+
+
+  override def toString: String = simpleString
+
+  override def eval(input: Row): EvaluatedType = throw new 
UnsupportedOperationException
+  override def nullable: Boolean = true
+  override def foldable: Boolean = false
+  override def dataType: DataType = throw new UnsupportedOperationException
+}
+
+/**
+ * A Window specification reference that refers to the 
[[WindowSpecDefinition]] defined
+ * under the name `name`.
+ */
+case class WindowSpecReference(name: String) extends WindowSpec
+
+/**
+ * The trait used to represent the type of a Window Frame.
+ */
+sealed trait FrameType
+
+/**
+ * RowFrame treats rows in a partition individually. When a 
[[ValuePreceding]]
+ * or a [[ValueFollowing]] is used as its [[FrameBoundary]], the value is 
considered
+ * as a physical offset.
+ * For example, `ROW BETWEEN 1 PRECEDING AND 1 FOLLOWING` represents a 
3-row frame,
+ * from the row precedes the current row to the row follows the current 
row.
+ */
+case object

[GitHub] spark issue #19977: [SPARK-22771][SQL] Concatenate binary inputs into a bina...

2017-12-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19977
  
**[Test build #85157 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85157/testReport)**
 for PR 19977 at commit 
[`9887478`](https://github.com/apache/spark/commit/9887478064e81d0efa7d3b2d9d8abdffde9364a4).


---




[GitHub] spark issue #19950: [SPARK-22450][Core][MLLib][FollowUp] safely register cla...

2017-12-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19950
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85147/
Test PASSed.


---




[GitHub] spark issue #19950: [SPARK-22450][Core][MLLib][FollowUp] safely register cla...

2017-12-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19950
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19950: [SPARK-22450][Core][MLLib][FollowUp] safely register cla...

2017-12-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19950
  
**[Test build #85147 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85147/testReport)**
 for PR 19950 at commit 
[`3d1d3db`](https://github.com/apache/spark/commit/3d1d3dba1e78eb9a082fa1f6a2aa2437a9ed99a6).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20008: [SPARK-22822][TEST] Basic tests for WindowFrameCoercion ...

2017-12-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20008
  
**[Test build #85156 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85156/testReport)**
 for PR 20008 at commit 
[`b4c5339`](https://github.com/apache/spark/commit/b4c5339cc44eeed7e75aac30ba6b6aaf06316305).


---




[GitHub] spark pull request #19977: [SPARK-22771][SQL] Concatenate binary inputs into...

2017-12-19 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/19977#discussion_r157943490
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
 ---
@@ -566,6 +568,21 @@ object TypeCoercion {
 }
   }
 
+  /**
+   * When all inputs in [[Concat]] are binary, coerces an output type to 
binary
+   */
+  case class ConcatCoercion(conf: SQLConf) extends TypeCoercionRule {
--- End diff --

Aha, sounds reasonable. I feel we should fix that in a separate PR first, 
though? Or should this PR include the fix, too?


---




[GitHub] spark pull request #20018: SPARK-22833 [Improvement] in SparkHive Scala Exam...

2017-12-19 Thread chetkhatri
Github user chetkhatri commented on a diff in the pull request:

https://github.com/apache/spark/pull/20018#discussion_r157942866
  
--- Diff: 
examples/src/main/scala/org/apache/spark/examples/sql/hive/SparkHiveExample.scala
 ---
@@ -104,6 +103,60 @@ object SparkHiveExample {
 // ...
 // $example off:spark_hive$
--- End diff --

@srowen Can you please review this? cc @holdenk @sameeragarwal 


---




[GitHub] spark issue #20028: [SPARK-19053][ML]Supporting multiple evaluation metrics ...

2017-12-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20028
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85155/
Test PASSed.


---




[GitHub] spark issue #20028: [SPARK-19053][ML]Supporting multiple evaluation metrics ...

2017-12-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20028
  
**[Test build #85155 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85155/testReport)**
 for PR 20028 at commit 
[`5d5978e`](https://github.com/apache/spark/commit/5d5978e5c83bc2037646f40d6db23059af530a15).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20028: [SPARK-19053][ML]Supporting multiple evaluation metrics ...

2017-12-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20028
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #19977: [SPARK-22771][SQL] Concatenate binary inputs into...

2017-12-19 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/19977#discussion_r157942499
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
 ---
@@ -566,6 +568,21 @@ object TypeCoercion {
 }
   }
 
+  /**
+   * When all inputs in [[Concat]] are binary, coerces an output type to 
binary
+   */
+  case class ConcatCoercion(conf: SQLConf) extends TypeCoercionRule {
+override protected def coerceTypes(
+plan: LogicalPlan): LogicalPlan = plan transformExpressionsUp {
--- End diff --

From another viewpoint, would it be a bad idea to drop this rule from 
`typeCoercionRules` when `spark.sql.function.concatBinaryAsString` is true? Something 
like:
```
def typeCoercionRules(conf: SQLConf): List[Rule[LogicalPlan]] =
  if (/* check `spark.sql.function.concatBinaryAsString` */) {
    // rules with ConcatCoercion
  } else {
    // rules without ConcatCoercion
  }
```


---




[GitHub] spark pull request #19954: [SPARK-22757][Kubernetes] Enable use of remote de...

2017-12-19 Thread liyinan926
Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19954#discussion_r157942449
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodFactory.scala
 ---
@@ -209,9 +213,33 @@ private[spark] class ExecutorPodFactoryImpl(sparkConf: 
SparkConf)
 .build()
 }.getOrElse(executorContainer)
 
-new PodBuilder(executorPod)
+val (maybeSecretsMountedPod, maybeSecretsMountedContainer) =
+  mountSecretsBootstrap.map { bootstrap =>
+bootstrap.mountSecrets(executorPod, containerWithLimitCores)
--- End diff --

I created https://issues.apache.org/jira/browse/SPARK-22839.


---




[GitHub] spark pull request #19954: [SPARK-22757][Kubernetes] Enable use of remote de...

2017-12-19 Thread liyinan926
Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19954#discussion_r157942418
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/MountSecretsBootstrap.scala
 ---
@@ -0,0 +1,62 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s
+
+import io.fabric8.kubernetes.api.model.{Container, ContainerBuilder, Pod, 
PodBuilder}
+
+/**
+ * Bootstraps a driver or executor container or an init-container with 
needed secrets mounted.
+ */
+private[spark] class MountSecretsBootstrap(secretNamesToMountPaths: 
Map[String, String]) {
--- End diff --

I created https://issues.apache.org/jira/browse/SPARK-22839 to keep track 
of the refactoring work.


---




[GitHub] spark issue #19975: [SPARK-22781][SS] Support creating streaming dataset wit...

2017-12-19 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/19975
  
Gentle ping! :)


---




[GitHub] spark issue #20015: [SPARK-22829] Add new built-in function date_trunc()

2017-12-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20015
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85146/
Test PASSed.


---




[GitHub] spark issue #20015: [SPARK-22829] Add new built-in function date_trunc()

2017-12-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20015
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20015: [SPARK-22829] Add new built-in function date_trunc()

2017-12-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20015
  
**[Test build #85146 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85146/testReport)**
 for PR 20015 at commit 
[`238d7d4`](https://github.com/apache/spark/commit/238d7d470c583c910bccbca8bbcaa681b67d6025).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `trait TruncInstant extends BinaryExpression with 
ImplicitCastInputTypes `


---




[GitHub] spark issue #20021: [SPARK-22668][SQL] Ensure no global variables in argumen...

2017-12-19 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/20021
  
I checked some call sites. Here is [one 
example](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/hash.scala#L285)
 where `arguments` has `ev.value` instead of a local variable.


---




[GitHub] spark pull request #19977: [SPARK-22771][SQL] Concatenate binary inputs into...

2017-12-19 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/19977#discussion_r157941723
  
--- Diff: 
common/unsafe/src/main/java/org/apache/spark/unsafe/types/ByteArray.java ---
@@ -74,4 +74,29 @@ public static long getPrefix(byte[] bytes) {
 }
 return Arrays.copyOfRange(bytes, start, end);
   }
+
+  public static byte[] concat(byte[]... inputs) {
+// Compute the total length of the result
+int totalLength = 0;
+for (int i = 0; i < inputs.length; i++) {
+  if (inputs[i] != null) {
+totalLength += inputs[i].length;
+  } else {
+return null;
+  }
+}
+
+// Allocate a new byte array, and copy the inputs one by one into it
+final byte[] result = new byte[totalLength];
+int offset = 0;
+for (int i = 0; i < inputs.length; i++) {
+  int len = inputs[i].length;
--- End diff --

nvm, we already check that here: 
https://github.com/apache/spark/pull/19977/files#diff-6df3223f9826f2b5d1d0c8e29a240ae3R82


---




[GitHub] spark pull request #19977: [SPARK-22771][SQL] Concatenate binary inputs into...

2017-12-19 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/19977#discussion_r157941423
  
--- Diff: 
common/unsafe/src/main/java/org/apache/spark/unsafe/types/ByteArray.java ---
@@ -74,4 +74,29 @@ public static long getPrefix(byte[] bytes) {
 }
 return Arrays.copyOfRange(bytes, start, end);
   }
+
+  public static byte[] concat(byte[]... inputs) {
+// Compute the total length of the result
+int totalLength = 0;
+for (int i = 0; i < inputs.length; i++) {
+  if (inputs[i] != null) {
+totalLength += inputs[i].length;
+  } else {
+return null;
+  }
+}
+
+// Allocate a new byte array, and copy the inputs one by one into it
+final byte[] result = new byte[totalLength];
+int offset = 0;
+for (int i = 0; i < inputs.length; i++) {
+  int len = inputs[i].length;
--- End diff --

aha, I see. `UTF8String` seems to need the same null check?


---




[GitHub] spark pull request #19977: [SPARK-22771][SQL] Concatenate binary inputs into...

2017-12-19 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19977#discussion_r157940901
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
 ---
@@ -566,6 +568,21 @@ object TypeCoercion {
 }
   }
 
+  /**
+   * When all inputs in [[Concat]] are binary, coerces an output type to 
binary
+   */
+  case class ConcatCoercion(conf: SQLConf) extends TypeCoercionRule {
--- End diff --

I feel this rule is a little hacky, as it must run before 
`ImplicitTypeCasts`. I think we should not make `Concat` extend 
`ImplicitCastInputTypes`, and instead do type coercion for it manually with a custom 
rule; a hedged sketch is below.
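
A hedged sketch of such a custom rule, assuming the pre-PR single-parameter `Concat(children: Seq[Expression])`; this illustrates the idea and is not the PR's actual rule:

```scala
import org.apache.spark.sql.catalyst.expressions.{Cast, Concat}
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.rules.Rule
import org.apache.spark.sql.types.{BinaryType, StringType}

object ConcatCasts extends Rule[LogicalPlan] {
  override def apply(plan: LogicalPlan): LogicalPlan = plan transformExpressionsUp {
    // All-binary inputs: leave them untouched so Concat can emit binary.
    case c @ Concat(children) if children.nonEmpty &&
        children.forall(_.dataType == BinaryType) => c
    // Otherwise cast every child to string explicitly, replacing the
    // implicit casts that ImplicitCastInputTypes would have inserted.
    case Concat(children) if children.exists(_.dataType != StringType) =>
      Concat(children.map(Cast(_, StringType)))
  }
}
```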


---




[GitHub] spark pull request #19977: [SPARK-22771][SQL] Concatenate binary inputs into...

2017-12-19 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/19977#discussion_r157940623
  
--- Diff: docs/sql-programming-guide.md ---
@@ -1780,6 +1780,8 @@ options.
  
  - Since Spark 2.3, when either broadcast hash join or broadcast nested 
loop join is applicable, we prefer to broadcasting the table that is explicitly 
specified in a broadcast hint. For details, see the section [Broadcast 
Hint](#broadcast-hint-for-sql-queries) and 
[SPARK-22489](https://issues.apache.org/jira/browse/SPARK-22489).
 
 - Since Spark 2.3, when all inputs are binary, `functions.concat()` 
returns its output as binary; otherwise it returns a string. Before Spark 
2.3, it always returned a string regardless of input types.
--- End diff --

ok


---




[GitHub] spark pull request #19977: [SPARK-22771][SQL] Concatenate binary inputs into...

2017-12-19 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19977#discussion_r157940634
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
 ---
@@ -566,6 +568,21 @@ object TypeCoercion {
 }
   }
 
+  /**
+   * When all inputs in [[Concat]] are binary, coerces an output type to 
binary
+   */
+  case class ConcatCoercion(conf: SQLConf) extends TypeCoercionRule {
+override protected def coerceTypes(
+plan: LogicalPlan): LogicalPlan = plan transformExpressionsUp {
--- End diff --

we can add a stop condition so the rule only rewrites nodes where `isBinaryMode` is still `false`, e.g.:
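
A sketch using the PR's `Concat(children, isBinaryMode)` signature shown above (hedged; the exact rule body may differ):

```scala
plan transformExpressionsUp {
  // Stop condition: skip nodes that were already coerced to binary mode.
  case c @ Concat(children, false) if children.forall(_.dataType == BinaryType) =>
    c.copy(isBinaryMode = true)
}
```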


---




[GitHub] spark pull request #19977: [SPARK-22771][SQL] Concatenate binary inputs into...

2017-12-19 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19977#discussion_r157940412
  
--- Diff: docs/sql-programming-guide.md ---
@@ -1780,6 +1780,8 @@ options.
  
  - Since Spark 2.3, when either broadcast hash join or broadcast nested 
loop join is applicable, we prefer to broadcasting the table that is explicitly 
specified in a broadcast hint. For details, see the section [Broadcast 
Hint](#broadcast-hint-for-sql-queries) and 
[SPARK-22489](https://issues.apache.org/jira/browse/SPARK-22489).
 
 - Since Spark 2.3, when all inputs are binary, `functions.concat()` 
returns its output as binary; otherwise it returns a string. Before Spark 
2.3, it always returned a string regardless of input types.
--- End diff --

also mention the config


---




[GitHub] spark pull request #19977: [SPARK-22771][SQL] Concatenate binary inputs into...

2017-12-19 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19977#discussion_r157940334
  
--- Diff: 
common/unsafe/src/main/java/org/apache/spark/unsafe/types/ByteArray.java ---
@@ -74,4 +74,29 @@ public static long getPrefix(byte[] bytes) {
 }
 return Arrays.copyOfRange(bytes, start, end);
   }
+
+  public static byte[] concat(byte[]... inputs) {
+// Compute the total length of the result
+int totalLength = 0;
+for (int i = 0; i < inputs.length; i++) {
+  if (inputs[i] != null) {
+totalLength += inputs[i].length;
+  } else {
+return null;
+  }
+}
+
+// Allocate a new byte array, and copy the inputs one by one into it
+final byte[] result = new byte[totalLength];
+int offset = 0;
+for (int i = 0; i < inputs.length; i++) {
+  int len = inputs[i].length;
--- End diff --

null check here too?


---




[GitHub] spark pull request #19977: [SPARK-22771][SQL] Concatenate binary inputs into...

2017-12-19 Thread gczsjdy
Github user gczsjdy commented on a diff in the pull request:

https://github.com/apache/spark/pull/19977#discussion_r157939820
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
 ---
@@ -48,17 +48,26 @@ import org.apache.spark.unsafe.types.{ByteArray, 
UTF8String}
   > SELECT _FUNC_('Spark', 'SQL');
SparkSQL
   """)
-case class Concat(children: Seq[Expression]) extends Expression with 
ImplicitCastInputTypes {
+case class Concat(children: Seq[Expression], isBinaryMode: Boolean = false)
+  extends Expression with ImplicitCastInputTypes {
 
-  override def inputTypes: Seq[AbstractDataType] = 
Seq.fill(children.size)(StringType)
-  override def dataType: DataType = StringType
+  def this(children: Seq[Expression]) = this(children, false)
--- End diff --

Sorry, I didn't get it. We already have the default parameter `isBinaryMode = 
false`.


---




[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

2017-12-19 Thread wesm
Github user wesm commented on the issue:

https://github.com/apache/spark/pull/19884
  
We need to upgrade to 0.8.0 now because we changed the binary format 


---




[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

2017-12-19 Thread wesm
Github user wesm commented on the issue:

https://github.com/apache/spark/pull/19884
  
We are supporting pyarrow on Python 2.7 except for Windows


---




[GitHub] spark pull request #20026: [SPARK-22838][Core] Avoid unnecessary copying of ...

2017-12-19 Thread ConeyLiu
Github user ConeyLiu commented on a diff in the pull request:

https://github.com/apache/spark/pull/20026#discussion_r157938250
  
--- Diff: core/src/main/scala/org/apache/spark/storage/DiskStore.scala ---
@@ -208,7 +209,7 @@ private class EncryptedBlockData(
 conf: SparkConf,
 key: Array[Byte]) extends BlockData {
 
-  override def toInputStream(): InputStream = 
Channels.newInputStream(open())
+  override def toInputStream(): InputStream = new 
NioBufferedFileInputStream(file)
--- End diff --

You mean the memory buffer? `NioBufferedFileInputStream` has a `close` 
method; see `org.apache.spark.io.NioBufferedFileInputStream`:
```java
@Override
public synchronized void close() throws IOException {
  fileChannel.close();
  StorageUtils.dispose(byteBuffer);
}

@Override
protected void finalize() throws IOException {
  close();
}
```
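
A hedged usage sketch of why explicit closing matters here (relying on `finalize()` delays releasing the direct buffer until GC; `file` is the `java.io.File` from the surrounding `EncryptedBlockData`):

```scala
import org.apache.spark.io.NioBufferedFileInputStream

val in = new NioBufferedFileInputStream(file)
try {
  // ... read the block data from `in` ...
} finally {
  in.close() // closes the FileChannel and disposes the off-heap byte buffer
}
```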


---




[GitHub] spark issue #20028: [SPARK-19053][ML]Supporting multiple evaluation metrics ...

2017-12-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20028
  
**[Test build #85155 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85155/testReport)**
 for PR 20028 at commit 
[`5d5978e`](https://github.com/apache/spark/commit/5d5978e5c83bc2037646f40d6db23059af530a15).


---




[GitHub] spark issue #20027: Branch 2.2

2017-12-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20027
  
Can one of the admins verify this patch?


---




[GitHub] spark pull request #20028: [SPARK-19053][ML]Supporting multiple evaluation m...

2017-12-19 Thread hhbyyh
GitHub user hhbyyh opened a pull request:

https://github.com/apache/spark/pull/20028

[SPARK-19053][ML]Supporting multiple evaluation metrics in DataFrame-based 
API

## What changes were proposed in this pull request?

As an initial step, the PR creates BinaryClassificationMetrics, 
MultiClassClassificationMetrics, RegressionMetrics in ML. The long term target 
is to reach function parity with MLlib Metrics and be able to provide more 
enhancements for DataFrame-based API.

This PR allows the Binary/Multiclass-Classification/Regression Evaluators 
to return a corresponding metrics instance, so users can use the metrics instance 
to access multiple metrics after a single pass, whereas originally an Evaluator 
only exposed a single metric.
```
val evaluator = new 
MulticlassClassificationEvaluator().setMetricName("accuracy")
val metrics = evaluator.getMetrics(df)
metrics.accuracy
metrics.weightedFMeasure
metrics.weightedPrecision
metrics.weightedRecall
```

To make the review easier, the current PR only includes metrics in the 
Evaluator to reach function parity. Plan for further development:
1. Initial API and function parity with ML Evaluators. (This PR)
2. Python API.
3. Function parity with MLlib Metrics.
4. Add requested enhancements such as weight support, per-row metrics, 
and ranking metrics.
5. Reorganize the classification Metrics hierarchy, so Binary Classification 
Metrics can support the metrics in MultiClassMetrics (accuracy, recall, etc.).
6. Possibly use these in training summaries.

The current implementation is still based on MLlib Metrics, which is kept 
completely internal and can be changed to DataFrame-based calculation when 
necessary.
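
As a hedged illustration of what a DataFrame-based calculation could look like internally (illustrative only; `prediction`/`label` column names assumed):

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{avg, col}

// Accuracy as a single DataFrame aggregation, with no RDD-based MLlib pass.
def accuracy(df: DataFrame): Double =
  df.select(avg((col("prediction") === col("label")).cast("double")))
    .head().getDouble(0)
```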


## How was this patch tested?

new unit tests added.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/hhbyyh/spark mlMetrics

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20028.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20028


commit 5d5978e5c83bc2037646f40d6db23059af530a15
Author: Yuhao Yang 
Date:   2017-12-20T02:51:19Z

add ml metrics




---




[GitHub] spark issue #20027: Branch 2.2

2017-12-19 Thread Maple-Wang
Github user Maple-Wang commented on the issue:

https://github.com/apache/spark/pull/20027
  
When I use SparkR in the R shell, the master parameter appears outdated and I 
cannot run "spark-submit --master yarn --deploy-mode client". R is installed on 
all nodes.

When used this way:
if (nchar(Sys.getenv("SPARK_HOME")) < 1) {
  Sys.setenv(SPARK_HOME = "/usr/hdp/2.6.1.0-129/spark2")
}
library(SparkR, lib.loc = c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib")))
sparkR.session(master = "yarn", sparkConfig = list(spark.driver.memory = "10g"))

it fails with:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 
in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 
(TID 4, node23.nuctech.com, executor 1): java.net.SocketTimeoutException: 
Accept timed out
at java.net.PlainSocketImpl.socketAccept(Native Method)
at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:409)
at java.net.ServerSocket.implAccept(ServerSocket.java:545)
at java.net.ServerSocket.accept(ServerSocket.java:513)
at org.apache.spark.api.r.RRunner$.createRWorker(RRunner.scala:372)
at org.apache.spark.api.r.RRunner.compute(RRunner.scala:69)
at org.apache.spark.api.r.BaseRRDD.compute(RRDD.scala:51)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.


---




[GitHub] spark issue #19218: [SPARK-21786][SQL] The 'spark.sql.parquet.compression.co...

2017-12-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19218
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85154/
Test FAILed.


---




[GitHub] spark issue #19218: [SPARK-21786][SQL] The 'spark.sql.parquet.compression.co...

2017-12-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19218
  
**[Test build #85154 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85154/testReport)**
 for PR 19218 at commit 
[`d779ee6`](https://github.com/apache/spark/commit/d779ee6524b199b695c938bbb0c3436c73c64c87).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class ParquetOptions(`


---




[GitHub] spark issue #19218: [SPARK-21786][SQL] The 'spark.sql.parquet.compression.co...

2017-12-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19218
  
Merged build finished. Test FAILed.


---




[GitHub] spark pull request #20027: Branch 2.2

2017-12-19 Thread Maple-Wang
GitHub user Maple-Wang opened a pull request:

https://github.com/apache/spark/pull/20027

Branch 2.2

When I use SparkR in the R shell, the master parameter appears outdated and I 
cannot run "spark-submit --master yarn --deploy-mode client". R is installed on 
all nodes.

When used this way:

if (nchar(Sys.getenv("SPARK_HOME")) < 1) {
  Sys.setenv(SPARK_HOME = "/usr/hdp/2.6.1.0-129/spark2")
}
library(SparkR, lib.loc = c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib")))
sparkR.session(master = "yarn", sparkConfig = list(spark.driver.memory = "10g"))

it fails with:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 
in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 
(TID 4, node23.nuctech.com, executor 1): java.net.SocketTimeoutException: 
Accept timed out
at java.net.PlainSocketImpl.socketAccept(Native Method)
at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:409)
at java.net.ServerSocket.implAccept(ServerSocket.java:545)
at java.net.ServerSocket.accept(ServerSocket.java:513)
at org.apache.spark.api.r.RRunner$.createRWorker(RRunner.scala:372)
at org.apache.spark.api.r.RRunner.compute(RRunner.scala:69)
at org.apache.spark.api.r.BaseRRDD.compute(RRDD.scala:51)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apache/spark branch-2.2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20027.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20027


commit 96c04f1edcd53798d9db5a356482248868a0a905
Author: Marcelo Vanzin 
Date:   2017-06-24T05:23:43Z

[SPARK-21159][CORE] Don't try to connect to launcher in standalone cluster 
mode.

Monitoring for standalone cluster mode is not implemented (see 
SPARK-11033), but
the same scheduler implementation is used, and if it tries to connect to the
launcher it will fail. So fix the scheduler so it only tries that in client 
mode;
cluster mode applications will be correctly launched and will work, but 
monitoring
through the launcher handle will not be available.
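
In essence the fix is a deploy-mode guard; schematically (hypothetical helper, 
not the actual scheduler code):

```scala
// Only client-mode applications have a launcher handle to report back
// to; in standalone cluster mode the connection attempt would just fail.
def maybeConnectToLauncher(deployMode: String)(connect: () => Unit): Unit =
  if (deployMode == "client") connect()

maybeConnectToLauncher("cluster") { () => sys.error("never reached") }
```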

Tested by running a cluster mode app with "SparkLauncher.startApplication".

Author: Marcelo Vanzin 

Closes #18397 from vanzin/SPARK-21159.

(cherry picked from commit bfd73a7c48b87456d1b84d826e04eca938a1be64)
Signed-off-by: Wenchen Fan 

commit ad44ab5cb9cdaff836c7469d10b00a86a3e46adf
Author: gatorsmile 
Date:   2017-06-24T14:35:59Z

[SPARK-21203][SQL] Fix wrong results of insertion of Array of Struct

### What changes were proposed in this pull request?
```SQL
CREATE TABLE `tab1`
(`custom_fields` ARRAY<STRUCT<`id`: BIGINT, `value`: STRING>>)
USING parquet

INSERT INTO `tab1`
SELECT ARRAY(named_struct('id', 1, 'value', 'a'), named_struct('id', 2, 
'value', 'b'))

SELECT custom_fields.id, custom_fields.value FROM tab1
```

The above query always returns the last struct of the array, because the 
rule `SimplifyCasts` incorrectly rewrites the query. The underlying cause is that 
we always use the same `GenericInternalRow` object when doing the cast.
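
The aliasing bug is easy to reproduce outside Spark; a minimal plain-Scala 
sketch with a hypothetical row type standing in for `GenericInternalRow`:

```scala
final class MutableRow(var values: Array[Any])

// Buggy: one mutable row is reused for every element, so every slot of
// the result aliases the last value written.
def castAllShared(input: Seq[Array[Any]]): Seq[MutableRow] = {
  val shared = new MutableRow(null)
  input.map { v => shared.values = v; shared }
}

// Fixed: allocate (or copy) a fresh row per element.
def castAllFresh(input: Seq[Array[Any]]): Seq[MutableRow] =
  input.map(v => new MutableRow(v))

val rows = Seq(Array[Any](1, "a"), Array[Any](2, "b"))
assert(castAllShared(rows).map(_.values(0)) == Seq(2, 2)) // the bug
assert(castAllFresh(rows).map(_.values(0)) == Seq(1, 2))  // the fix
```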

### How was this patch tested?

Author: gatorsmile 

Closes #18412 from gatorsmile/castStruct.

(cherry picked from commit 2e1586f60a77ea0adb6f3f68ba74323f0c242199)
Signed-off-by: Wenchen Fan 

commit d8e3a4af36f85455548e82ae4acd525f5e52f322
Author: Masha Basmanova 
Date:   2017-06-25T05:49:35Z

[SPARK-21079][SQL] Calculate total size of a partition table as a sum of 
individual partitions

## What changes were proposed in this pull request?

The storage URI of a partitioned table may or may not point to a directory 
under which individual partitions are stored; in fact, individual partitions 
may be located in totally unrelated directories. Before this change, the ANALYZE 
TABLE table COMPUTE STATISTICS command calculated the total size of a table by 
adding up the sizes of files found under the table's storage URI. This calculation 
could produce 0 if the partitions are stored elsewhere.

This change uses storage URIs of individual partitions to calculate the 
sizes of all partitions of a table and adds these up to produce the total size 
of a table.
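
Schematically, the change replaces "size of everything under the table root" 
with "sum of the sizes under each partition's own location"; a stand-alone 
sketch using plain `java.nio.file` (hypothetical helper names):

```scala
import java.nio.file.{Files, Path}
import scala.collection.JavaConverters._

// Size of all regular files under one partition's storage location.
def dirSize(dir: Path): Long =
  Files.walk(dir).iterator().asScala
    .filter(p => Files.isRegularFile(p))
    .map(p => Files.size(p))
    .sum

// Total table size: sum over every partition location, each of which
// may live outside the table's own storage URI.
def tableSize(partitionLocations: Seq[Path]): Long =
  partitionLocations.map(dirSize).sum
```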

CC: wzhfy

## How was this patch tested?

Added unit test.

Ran ANALYZE TABLE xxx COMPUTE STATISTICS on a partitioned Hive table and 
verified t

[GitHub] spark issue #19218: [SPARK-21786][SQL] The 'spark.sql.parquet.compression.co...

2017-12-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19218
  
**[Test build #85154 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85154/testReport)**
 for PR 19218 at commit 
[`d779ee6`](https://github.com/apache/spark/commit/d779ee6524b199b695c938bbb0c3436c73c64c87).


---




[GitHub] spark pull request #19977: [SPARK-22771][SQL] Concatenate binary inputs into...

2017-12-19 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/19977#discussion_r157937149
  
--- Diff: 
sql/core/src/test/resources/sql-tests/inputs/typeCoercion/native/concat.sql ---
@@ -0,0 +1,21 @@
+-- Concatenate binary inputs
+SELECT (col1 || col2 || col3 || col4) col
+FROM (
+  SELECT
+encode(string(id), 'utf-8') col1,
+encode(string(id + 1), 'utf-8') col2,
+encode(string(id + 2), 'utf-8') col3,
+encode(string(id + 3), 'utf-8') col4
+  FROM range(10)
+);
+
+-- Concatenate mixed inputs between strings and binary
+SELECT (col1 || col2 || col3 || col4) col
+FROM (
+  SELECT
+string(id) col1,
+string(id + 1) col2,
+encode(string(id + 2), 'utf-8') col3,
+encode(string(id + 3), 'utf-8') col4
+  FROM range(10)
+);
--- End diff --

ok


---




[GitHub] spark issue #20019: [SPARK-22361][SQL][TEST] Add unit test for Window Frames

2017-12-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20019
  
**[Test build #85153 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85153/testReport)**
 for PR 20019 at commit 
[`87e61ce`](https://github.com/apache/spark/commit/87e61cee4c2d8aab2a95fd9123b414736de5d3f1).


---




[GitHub] spark issue #19218: [SPARK-21786][SQL] The 'spark.sql.parquet.compression.co...

2017-12-19 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19218
  
ok to test


---




[GitHub] spark issue #20019: [SPARK-22361][SQL][TEST] Add unit test for Window Frames

2017-12-19 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20019
  
ok to test


---




[GitHub] spark pull request #19805: [SPARK-22649][PYTHON][SQL] Adding localCheckpoint...

2017-12-19 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/19805


---




[GitHub] spark pull request #19977: [SPARK-22771][SQL] Concatenate binary inputs into...

2017-12-19 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/19977#discussion_r157936581
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
 ---
@@ -566,6 +568,21 @@ object TypeCoercion {
 }
   }
 
+  /**
+   * When all inputs in [[Concat]] are binary, coerces the output type to binary.
+   */
+  case class ConcatCoercion(conf: SQLConf) extends TypeCoercionRule {
+    override protected def coerceTypes(
+        plan: LogicalPlan): LogicalPlan = plan transformExpressionsUp {
--- End diff --

This is in the big batch of analyzer rules: `Batch("Resolution")`. This 
rule will be executed without the stop condition. 

cc @cloud-fan Should we do it in `Batch("Finish Analysis")`? Any better 
idea?
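
For readers following along, a stand-alone sketch of the coercion being 
reviewed, with simplified stand-in types rather than Spark's expression tree:

```scala
sealed trait DataType
case object StringType extends DataType
case object BinaryType extends DataType

// A concat whose children are all binary stays binary (when the flag is
// on); any mix of inputs falls back to string.
def concatOutputType(children: Seq[DataType], binaryMode: Boolean): DataType =
  if (binaryMode && children.nonEmpty && children.forall(_ == BinaryType))
    BinaryType
  else StringType

assert(concatOutputType(Seq(BinaryType, BinaryType), binaryMode = true) == BinaryType)
assert(concatOutputType(Seq(StringType, BinaryType), binaryMode = true) == StringType)
```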


---




[GitHub] spark issue #19805: [SPARK-22649][PYTHON][SQL] Adding localCheckpoint to Dat...

2017-12-19 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19805
  
LGTM

Thanks! Merged to master.


---




[GitHub] spark issue #20026: [SPARK-22838][Core] Avoid unnecessary copying of data

2017-12-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20026
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85151/
Test FAILed.


---




[GitHub] spark issue #20026: [SPARK-22838][Core] Avoid unnecessary copying of data

2017-12-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20026
  
**[Test build #85151 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85151/testReport)**
 for PR 20026 at commit 
[`51c32c8`](https://github.com/apache/spark/commit/51c32c8c6cc89a3249b3ef856c41f3c238b59f4a).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20026: [SPARK-22838][Core] Avoid unnecessary copying of data

2017-12-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20026
  
Merged build finished. Test FAILed.


---




[GitHub] spark pull request #20026: [SPARK-22838][Core] Avoid unnecessary copying of ...

2017-12-19 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20026#discussion_r157935911
  
--- Diff: core/src/main/scala/org/apache/spark/storage/DiskStore.scala ---
@@ -208,7 +209,7 @@ private class EncryptedBlockData(
 conf: SparkConf,
 key: Array[Byte]) extends BlockData {
 
-  override def toInputStream(): InputStream = Channels.newInputStream(open())
+  override def toInputStream(): InputStream = new NioBufferedFileInputStream(file)
--- End diff --

will `NioBufferedFileInputStream` release its memory automatically?


---




[GitHub] spark issue #19872: WIP: [SPARK-22274][PySpark] User-defined aggregation fun...

2017-12-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19872
  
**[Test build #85152 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85152/testReport)**
 for PR 19872 at commit 
[`ea5d6f3`](https://github.com/apache/spark/commit/ea5d6f319aa3b1bba20ad86a51e6efb65658e3d2).


---




[GitHub] spark pull request #19977: [SPARK-22771][SQL] Concatenate binary inputs into...

2017-12-19 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/19977#discussion_r157935878
  
--- Diff: 
sql/core/src/test/resources/sql-tests/inputs/typeCoercion/native/concat.sql ---
@@ -0,0 +1,21 @@
+-- Concatenate binary inputs
--- End diff --

Add more cases?


---




[GitHub] spark issue #20026: [SPARK-22838][Core] Avoid unnecessary copying of data

2017-12-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20026
  
**[Test build #85151 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85151/testReport)**
 for PR 20026 at commit 
[`51c32c8`](https://github.com/apache/spark/commit/51c32c8c6cc89a3249b3ef856c41f3c238b59f4a).


---




[GitHub] spark pull request #19977: [SPARK-22771][SQL] Concatenate binary inputs into...

2017-12-19 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/19977#discussion_r157935899
  
--- Diff: 
sql/core/src/test/resources/sql-tests/inputs/typeCoercion/native/concat.sql ---
@@ -0,0 +1,21 @@
+-- Concatenate binary inputs
+SELECT (col1 || col2 || col3 || col4) col
+FROM (
+  SELECT
+encode(string(id), 'utf-8') col1,
+encode(string(id + 1), 'utf-8') col2,
+encode(string(id + 2), 'utf-8') col3,
+encode(string(id + 3), 'utf-8') col4
+  FROM range(10)
+);
+
+-- Concatenate mixed inputs between strings and binary
+SELECT (col1 || col2 || col3 || col4) col
+FROM (
+  SELECT
+string(id) col1,
+string(id + 1) col2,
+encode(string(id + 2), 'utf-8') col3,
+encode(string(id + 3), 'utf-8') col4
+  FROM range(10)
+);
--- End diff --

Add more cases?


---




[GitHub] spark issue #20026: [SPARK-22838][Core] Avoid unnecessary copying of data

2017-12-19 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20026
  
ok to test


---




[GitHub] spark pull request #19977: [SPARK-22771][SQL] Concatenate binary inputs into...

2017-12-19 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/19977#discussion_r157935707
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
 ---
@@ -635,6 +635,11 @@ object SimplifyCaseConversionExpressions extends 
Rule[LogicalPlan] {
 
 /**
  * Combine nested [[Concat]] expressions.
+ *
+ * If `spark.sql.expression.concat.binaryMode.enabled` is true and all 
inputs are binary,
--- End diff --

also update this.


---




[GitHub] spark issue #20026: [SPARK-22838][Core] Avoid unnecessary copying of data

2017-12-19 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20026
  
cc @jiangxb1987 


---




[GitHub] spark issue #20021: [SPARK-22668][SQL] Ensure no global variables in argumen...

2017-12-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20021
  
**[Test build #85150 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85150/testReport)**
 for PR 20021 at commit 
[`3d44195`](https://github.com/apache/spark/commit/3d44195f48c1688d7dc5b87fd0c9f07c1535000b).


---




[GitHub] spark pull request #19977: [SPARK-22771][SQL] Concatenate binary inputs into...

2017-12-19 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/19977#discussion_r157935282
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -1044,6 +1044,12 @@ object SQLConf {
 "When this conf is not set, the value from 
`spark.redaction.string.regex` is used.")
   
.fallbackConf(org.apache.spark.internal.config.STRING_REDACTION_PATTERN)
 
+  val CONCAT_BINARY_MODE_ENABLED = 
buildConf("spark.sql.expression.concat.binaryMode.enabled")
--- End diff --

> spark.sql.function.concatBinaryAsString


---




[GitHub] spark issue #20008: [SPARK-22822][TEST] Basic tests for FunctionArgumentConv...

2017-12-19 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20008
  
functionArgumentConversion.sql has too many combinations. Let us get rid of 
this one?



---




[GitHub] spark pull request #20015: [SPARK-22829] Add new built-in function date_trun...

2017-12-19 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20015


---




[GitHub] spark pull request #20014: [SPARK-22827][CORE] Avoid throwing OutOfMemoryErr...

2017-12-19 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20014


---




[GitHub] spark issue #20014: [SPARK-22827][CORE] Avoid throwing OutOfMemoryError in c...

2017-12-19 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20014
  
thanks, merging to master!


---




[GitHub] spark issue #20015: [SPARK-22829] Add new built-in function date_trunc()

2017-12-19 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20015
  
LGTM


---




[GitHub] spark issue #20015: [SPARK-22829] Add new built-in function date_trunc()

2017-12-19 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20015
  
Thanks! Merged to master


---




[GitHub] spark issue #20008: [SPARK-22822][TEST] Basic tests for FunctionArgumentConv...

2017-12-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20008
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20008: [SPARK-22822][TEST] Basic tests for FunctionArgumentConv...

2017-12-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20008
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85143/
Test PASSed.


---




[GitHub] spark issue #20008: [SPARK-22822][TEST] Basic tests for FunctionArgumentConv...

2017-12-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20008
  
**[Test build #85143 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85143/testReport)**
 for PR 20008 at commit 
[`16f0a99`](https://github.com/apache/spark/commit/16f0a996c694151bbc743bb49b8f1e2439c69a57).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19954: [SPARK-22757][Kubernetes] Enable use of remote dependenc...

2017-12-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19954
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19954: [SPARK-22757][Kubernetes] Enable use of remote dependenc...

2017-12-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19954
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85142/
Test PASSed.


---




[GitHub] spark issue #19954: [SPARK-22757][Kubernetes] Enable use of remote dependenc...

2017-12-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19954
  
**[Test build #85142 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85142/testReport)**
 for PR 19954 at commit 
[`3407d7a`](https://github.com/apache/spark/commit/3407d7af68c44b100558e49e4012a27e41b29dda).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19872: WIP: [SPARK-22274][PySpark] User-defined aggregation fun...

2017-12-19 Thread ueshin
Github user ueshin commented on the issue:

https://github.com/apache/spark/pull/19872
  
@ramacode2014 Hi, I'm not sure why you received notifications from this PR, 
but I guess you can unsubscribe by the "Unsubscribe" button in the right column 
of this page. Sorry for the inconvenience. Thanks!


---



