[GitHub] spark pull request #21193: [SPARK-24121][SQL] Add API for handling expressio...

2018-05-03 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/21193#discussion_r185995523
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/javaCode.scala
 ---
@@ -112,6 +112,112 @@ object JavaCode {
   def isNullExpression(code: String): SimpleExprValue = {
     expression(code, BooleanType)
   }
+
+  def block(code: String): Block = {
+    CodeBlock(codeParts = Seq(code), blockInputs = Seq.empty)
+  }
+}
+
+/**
+ * A trait representing a block of java code.
+ */
+trait Block extends JavaCode {
+
+  // The expressions to be evaluated inside this block.
+  def exprValues: Seq[ExprValue]
+
+  // This will be called during string interpolation.
+  override def toString: String = _marginChar match {
+    case Some(c) => code.stripMargin(c)
+    case _ => code
+  }
+
+  var _marginChar: Option[Char] = None
+
+  def stripMargin(c: Char): this.type = {
+    _marginChar = Some(c)
+    this
+  }
+
+  def stripMargin: this.type = {
+    _marginChar = Some('|')
+    this
+  }
+
+  def + (other: Block): Block
+}
+
+object Block {
+  implicit def blockToString(block: Block): String = block.toString
+
+  implicit def blocksToBlock(blocks: Seq[Block]): Block = Blocks(blocks)
+
+  implicit class BlockHelper(val sc: StringContext) extends AnyVal {
+    def code(args: Any*): Block = {
+      sc.checkLengths(args)
+      if (sc.parts.length == 0) {
+        EmptyBlock
+      } else {
+        args.foreach {
+          case _: ExprValue =>
+          case _: Int | _: Long | _: Float | _: Double | _: String =>
+          case _: Block =>
+          case other => throw new IllegalArgumentException(
+            s"Can not interpolate ${other.getClass.getName} into code block.")
+        }
+        CodeBlock(sc.parts, args)
+      }
+    }
+  }
+}
+
+/**
+ * A block of java code. Including a sequence of code parts and some inputs to this block.
+ * The actual java code is generated by embedding the inputs into the code parts.
+ */
+case class CodeBlock(codeParts: Seq[String], blockInputs: Seq[Any]) extends Block {
+  override def exprValues: Seq[ExprValue] = {
+    blockInputs.flatMap {
+      case b: Block => b.exprValues
+      case e: ExprValue => Seq(e)
+      case _ => Seq.empty
+    }
+  }
+
+  override def code: String = {
+    val strings = codeParts.iterator
+    val inputs = blockInputs.iterator
+    var buf = new StringBuffer(strings.next)
+    while (strings.hasNext) {
+      buf append inputs.next
+      buf append strings.next
+    }
+    buf.toString
+  }
+
+  override def + (other: Block): Block = other match {
+    case c: CodeBlock => Blocks(Seq(this, c))
+    case b: Blocks => Blocks(Seq(this) ++ b.blocks)
+    case EmptyBlock => this
+  }
+}
+
+case class Blocks(blocks: Seq[Block]) extends Block {
+  override def exprValues: Seq[ExprValue] = blocks.flatMap(_.exprValues)
+  override def code: String = blocks.map(_.toString).mkString
--- End diff --

I am just curious whether `mkString` may lead to the 64KB limit issue.
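
(Side note for readers skimming the new API — a minimal usage sketch of the `code` interpolator above. It assumes the pre-existing `JavaCode.variable` / `isNullVariable` helpers in this file; the variable names are made up, not from the PR:)

~~~scala
import org.apache.spark.sql.catalyst.expressions.codegen._
import org.apache.spark.sql.catalyst.expressions.codegen.Block._
import org.apache.spark.sql.types.IntegerType

val isNull = JavaCode.isNullVariable("isNull_0")       // hypothetical variable names
val value = JavaCode.variable("value_0", IntegerType)

// The interpolator keeps track of the embedded ExprValues instead of
// flattening them into plain strings.
val block: Block =
  code"""
     |boolean $isNull = false;
     |int $value = 42;""".stripMargin

block.exprValues  // Seq(isNull, value)
~~~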


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21232: [SPARK-23489][SQL][TEST][BRANCH-2.2] HiveExternalCatalog...

2018-05-03 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/21232
  
Wow. Thank you for informing that, @HyukjinKwon !


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21232: [SPARK-23489][SQL][TEST][BRANCH-2.2] HiveExternal...

2018-05-03 Thread dongjoon-hyun
Github user dongjoon-hyun closed the pull request at:

https://github.com/apache/spark/pull/21232


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21190: [SPARK-22938][SQL][followup] Assert that SQLConf.get is ...

2018-05-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21190
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2899/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21190: [SPARK-22938][SQL][followup] Assert that SQLConf.get is ...

2018-05-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21190
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21061: [SPARK-23914][SQL] Add array_union function

2018-05-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21061
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90171/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21061: [SPARK-23914][SQL] Add array_union function

2018-05-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21061
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21061: [SPARK-23914][SQL] Add array_union function

2018-05-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21061
  
**[Test build #90171 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90171/testReport)**
 for PR 21061 at commit 
[`bc2e4c7`](https://github.com/apache/spark/commit/bc2e4c7b781bbdcad4fe338b4d235887302da791).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21190: [SPARK-22938][SQL][followup] Assert that SQLConf.get is ...

2018-05-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21190
  
**[Test build #90182 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90182/testReport)**
 for PR 21190 at commit 
[`c27267d`](https://github.com/apache/spark/commit/c27267d49679fefb161ba4884bb26f7eed3fc8f1).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21220: [SPARK-24157][SS] Enabled no-data batches in MicroBatchE...

2018-05-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21220
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90172/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21220: [SPARK-24157][SS] Enabled no-data batches in MicroBatchE...

2018-05-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21220
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21220: [SPARK-24157][SS] Enabled no-data batches in MicroBatchE...

2018-05-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21220
  
**[Test build #90172 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90172/testReport)**
 for PR 21220 at commit 
[`da3fd2f`](https://github.com/apache/spark/commit/da3fd2f8510482e3e71cc37a9da2207e3aef1ef0).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21190: [SPARK-22938][SQL][followup] Assert that SQLConf.get is ...

2018-05-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21190
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21229: [SPARK-23697][CORE] LegacyAccumulatorWrapper should defi...

2018-05-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21229
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21229: [SPARK-23697][CORE] LegacyAccumulatorWrapper should defi...

2018-05-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21229
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2897/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21190: [SPARK-22938][SQL][followup] Assert that SQLConf.get is ...

2018-05-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21190
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2898/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21230: [SPARK-24172][SQL] we should not apply operator p...

2018-05-03 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21230#discussion_r185991248
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/PushDownOperatorsToDataSource.scala
 ---
@@ -23,49 +23,46 @@ import org.apache.spark.sql.catalyst.plans.logical.{Filter, LogicalPlan, Project
 import org.apache.spark.sql.catalyst.rules.Rule
 
 object PushDownOperatorsToDataSource extends Rule[LogicalPlan] {
-  override def apply(
-      plan: LogicalPlan): LogicalPlan = plan transformUp {
-    // PhysicalOperation guarantees that filters are deterministic; no need to check
-    case PhysicalOperation(project, newFilters, relation : DataSourceV2Relation) =>
-      // merge the filters
-      val filters = relation.filters match {
-        case Some(existing) =>
-          existing ++ newFilters
-        case _ =>
-          newFilters
-      }
+  override def apply(plan: LogicalPlan): LogicalPlan = {
+    var pushed = false
+    plan transformDown {
+      // PhysicalOperation guarantees that filters are deterministic; no need to check
+      case PhysicalOperation(project, filters, relation: DataSourceV2Relation) if !pushed =>
--- End diff --

`PhysicalOperation` just accumulates the projects and filters above a specific 
node. If we transform down the tree and only transform once, we will never hit 
`PhysicalOperation` more than once.
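
(Condensed, for readers following the thread — a sketch of the one-shot pattern in the new `apply` above; `pushDownAndRebuild` is a hypothetical stand-in for the pushdown logic, not a method in the PR:)

~~~scala
override def apply(plan: LogicalPlan): LogicalPlan = {
  var pushed = false
  plan transformDown {
    // The guard flips after the first (outermost) match, so nested
    // PhysicalOperation matches further down the same tree are skipped.
    case PhysicalOperation(project, filters, relation: DataSourceV2Relation) if !pushed =>
      pushed = true
      pushDownAndRebuild(project, filters, relation)  // hypothetical helper
  }
}
~~~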


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21190: [SPARK-22938][SQL][followup] Assert that SQLConf.get is ...

2018-05-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21190
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21190: [SPARK-22938][SQL][followup] Assert that SQLConf.get is ...

2018-05-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21190
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2896/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21190: [SPARK-22938][SQL][followup] Assert that SQLConf.get is ...

2018-05-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21190
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21190: [SPARK-22938][SQL][followup] Assert that SQLConf.get is ...

2018-05-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21190
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90181/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21190: [SPARK-22938][SQL][followup] Assert that SQLConf.get is ...

2018-05-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21190
  
**[Test build #90181 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90181/testReport)**
 for PR 21190 at commit 
[`c0b1095`](https://github.com/apache/spark/commit/c0b109529b97163edd224bbf770a8caa20246b0a).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `  case class WidenSetOperationTypes(conf: SQLConf) extends 
Rule[LogicalPlan] `
  * `  case class FunctionArgumentConversion(conf: SQLConf) extends 
TypeCoercionRule `
  * `  case class CaseWhenCoercion(conf: SQLConf) extends TypeCoercionRule `
  * `  case class IfCoercion(conf: SQLConf) extends TypeCoercionRule `
  * `  case class ImplicitTypeCasts(conf: SQLConf) extends TypeCoercionRule 
`


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21190: [SPARK-22938][SQL][followup] Assert that SQLConf.get is ...

2018-05-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21190
  
**[Test build #90181 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90181/testReport)**
 for PR 21190 at commit 
[`c0b1095`](https://github.com/apache/spark/commit/c0b109529b97163edd224bbf770a8caa20246b0a).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21224: [SPARK-24167][SQL] ParquetFilters should not access SQLC...

2018-05-03 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/21224
  
I realized #21086 is only in master, so this bug doesn't exist in 2.3


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21229: [SPARK-23697][CORE] LegacyAccumulatorWrapper should defi...

2018-05-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21229
  
**[Test build #90180 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90180/testReport)**
 for PR 21229 at commit 
[`a2d81df`](https://github.com/apache/spark/commit/a2d81dfa5f5b2cf58413f3e39717bf436c956d70).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21229: [SPARK-23697][CORE] LegacyAccumulatorWrapper should defi...

2018-05-03 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/21229
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21190: [SPARK-22938][SQL][followup] Assert that SQLConf.get is ...

2018-05-03 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/21190
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21222: [SPARK-24161][SS] Enable debug package feature on struct...

2018-05-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21222
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90168/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21222: [SPARK-24161][SS] Enable debug package feature on struct...

2018-05-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21222
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21229: [SPARK-23697][CORE] LegacyAccumulatorWrapper should defi...

2018-05-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21229
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21222: [SPARK-24161][SS] Enable debug package feature on struct...

2018-05-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21222
  
**[Test build #90168 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90168/testReport)**
 for PR 21222 at commit 
[`c1ad1c5`](https://github.com/apache/spark/commit/c1ad1c557e6165455457adb6f148d6d9616548a1).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `  implicit class DebugStreamQuery(query: StreamingQuery) extends 
Logging `


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21229: [SPARK-23697][CORE] LegacyAccumulatorWrapper should defi...

2018-05-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21229
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90170/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21229: [SPARK-23697][CORE] LegacyAccumulatorWrapper should defi...

2018-05-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21229
  
**[Test build #90170 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90170/testReport)**
 for PR 21229 at commit 
[`a2d81df`](https://github.com/apache/spark/commit/a2d81dfa5f5b2cf58413f3e39717bf436c956d70).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21232: [SPARK-23489][SQL][TEST][BRANCH-2.2] HiveExternalCatalog...

2018-05-03 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21232
  
and actually I think he already merged this into branch-2.2 - 
https://github.com/apache/spark/commits/branch-2.2 :-)


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21224: [SPARK-24167][SQL] ParquetFilters should not acce...

2018-05-03 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/21224#discussion_r185988219
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala
 ---
@@ -342,6 +342,7 @@ class ParquetFileFormat
       sparkSession.sessionState.conf.parquetFilterPushDown
     // Whole stage codegen (PhysicalRDD) is able to deal with batches directly
     val returningBatch = supportBatch(sparkSession, resultSchema)
+    val pushDownDate = sqlConf.parquetFilterPushDownDate
--- End diff --

Ah, I see. Thank you, @cloud-fan !


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21186: [SPARK-22279][SPARK-24112] Enable `convertMetasto...

2018-05-03 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/21186#discussion_r185988150
  
--- Diff: docs/sql-programming-guide.md ---
@@ -1812,6 +1812,9 @@ working with timestamps in `pandas_udf`s to get the best performance, see
   - Since Spark 2.4, creating a managed table with nonempty location is not allowed. An exception is thrown when attempting to create a managed table with nonempty location. To set `true` to `spark.sql.allowCreatingManagedTableUsingNonemptyLocation` restores the previous behavior. This option will be removed in Spark 3.0.
   - Since Spark 2.4, the type coercion rules can automatically promote the argument types of the variadic SQL functions (e.g., IN/COALESCE) to the widest common type, no matter how the input arguments order. In prior Spark versions, the promotion could fail in some specific orders (e.g., TimestampType, IntegerType and StringType) and throw an exception.
   - In version 2.3 and earlier, `to_utc_timestamp` and `from_utc_timestamp` respect the timezone in the input timestamp string, which breaks the assumption that the input timestamp is in a specific timezone. Therefore, these 2 functions can return unexpected results. In version 2.4 and later, this problem has been fixed. `to_utc_timestamp` and `from_utc_timestamp` will return null if the input timestamp string contains timezone. As an example, `from_utc_timestamp('2000-10-10 00:00:00', 'GMT+1')` will return `2000-10-10 01:00:00` in both Spark 2.3 and 2.4. However, `from_utc_timestamp('2000-10-10 00:00:00+00:00', 'GMT+1')`, assuming a local timezone of GMT+8, will return `2000-10-10 09:00:00` in Spark 2.3 but `null` in 2.4. For people who don't care about this problem and want to retain the previous behaivor to keep their query unchanged, you can set `spark.sql.function.rejectTimezoneInString` to false. This option will be removed in Spark 3.0 and should only be used as a temporary workaround.
+  - Since Spark 2.4, Spark uses its own ORC support by default instead of Hive SerDe for better performance during Hive metastore table access. To set `false` to `spark.sql.hive.convertMetastoreOrc` restores the previous behavior.
+  - Since Spark 2.4, Spark supports table properties while converting Parquet/ORC Hive tables. To set `false` to `spark.sql.hive.convertMetastoreTableProperty` restores the previous behavior.
--- End diff --

Sure!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21232: [SPARK-23489][SQL][TEST][BRANCH-2.2] HiveExternalCatalog...

2018-05-03 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/21232
  
Thank you for review, @gatorsmile and @HyukjinKwon !


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21158: [SPARK-23850][sql] Add separate config for SQL options r...

2018-05-03 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/21158
  
The default value introduced by SPARK-22479 was already part of Apache Spark 2.3, so it is hard to say this is a regression. URIs could contain access keys, and usernames could carry personally identifiable information, so it also makes sense to block them. Although the original JIRA is about JDBC, this also makes things more secure. If users do not want to redact them, they can change the default. I am not comfortable with not redacting them in the current situation, since we already did so in Spark 2.3, and all security issues could be serious. Thus, I think we should leave it unchanged.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21217: [SPARK-24151][SQL] Fix CURRENT_DATE, CURRENT_TIMESTAMP t...

2018-05-03 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/21217
  
@jamesthomp Could you document the behavior change in the migration guide? 
https://github.com/apache/spark/blame/master/docs/sql-programming-guide.md#L1802


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18447: [SPARK-21232][SQL][SparkR][PYSPARK] New built-in SQL fun...

2018-05-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18447
  
**[Test build #90179 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90179/testReport)**
 for PR 18447 at commit 
[`d18d14f`](https://github.com/apache/spark/commit/d18d14fe5ee19161c9d56284e051d5da88414094).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21135: [SPARK-24060][TEST] StreamingSymmetricHashJoinHelperSuit...

2018-05-03 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/21135
  
@pwoody Could you show me how to reproduce the issue?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18447: [SPARK-21232][SQL][SparkR][PYSPARK] New built-in SQL fun...

2018-05-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18447
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18447: [SPARK-21232][SQL][SparkR][PYSPARK] New built-in SQL fun...

2018-05-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18447
  
**[Test build #90178 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90178/testReport)**
 for PR 18447 at commit 
[`704ad80`](https://github.com/apache/spark/commit/704ad80cb19f692ce44d2c509dbdf915e858d79f).
 * This patch **fails Python style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18447: [SPARK-21232][SQL][SparkR][PYSPARK] New built-in SQL fun...

2018-05-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18447
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90178/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18447: [SPARK-21232][SQL][SparkR][PYSPARK] New built-in SQL fun...

2018-05-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18447
  
**[Test build #90178 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90178/testReport)**
 for PR 18447 at commit 
[`704ad80`](https://github.com/apache/spark/commit/704ad80cb19f692ce44d2c509dbdf915e858d79f).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21230: [SPARK-24172][SQL] we should not apply operator p...

2018-05-03 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21230#discussion_r185986217
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/PushDownOperatorsToDataSource.scala
 ---
@@ -23,49 +23,46 @@ import org.apache.spark.sql.catalyst.plans.logical.{Filter, LogicalPlan, Project
 import org.apache.spark.sql.catalyst.rules.Rule
 
 object PushDownOperatorsToDataSource extends Rule[LogicalPlan] {
-  override def apply(
-      plan: LogicalPlan): LogicalPlan = plan transformUp {
-    // PhysicalOperation guarantees that filters are deterministic; no need to check
-    case PhysicalOperation(project, newFilters, relation : DataSourceV2Relation) =>
-      // merge the filters
-      val filters = relation.filters match {
-        case Some(existing) =>
-          existing ++ newFilters
-        case _ =>
-          newFilters
-      }
+  override def apply(plan: LogicalPlan): LogicalPlan = {
+    var pushed = false
+    plan transformDown {
+      // PhysicalOperation guarantees that filters are deterministic; no need to check
+      case PhysicalOperation(project, filters, relation: DataSourceV2Relation) if !pushed =>
--- End diff --

Is it possible that one plan has multiple `PhysicalOperation`s?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21207: [SPARK-24136][SS] Fix MemoryStreamDataReader.next to ski...

2018-05-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21207
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21207: [SPARK-24136][SS] Fix MemoryStreamDataReader.next to ski...

2018-05-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21207
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90165/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21135: [SPARK-24060][TEST] StreamingSymmetricHashJoinHelperSuit...

2018-05-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21135
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90166/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21135: [SPARK-24060][TEST] StreamingSymmetricHashJoinHelperSuit...

2018-05-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21135
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21135: [SPARK-24060][TEST] StreamingSymmetricHashJoinHelperSuit...

2018-05-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21135
  
**[Test build #90166 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90166/testReport)**
 for PR 21135 at commit 
[`dbaa00c`](https://github.com/apache/spark/commit/dbaa00ca99161cd2820b9a74aebac9665c626581).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21207: [SPARK-24136][SS] Fix MemoryStreamDataReader.next to ski...

2018-05-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21207
  
**[Test build #90165 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90165/testReport)**
 for PR 21207 at commit 
[`4daa548`](https://github.com/apache/spark/commit/4daa548c470a6e5d5de477a10661c4d838c520f5).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21208: [SPARK-23925][SQL] Add array_repeat collection function

2018-05-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21208
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90164/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21208: [SPARK-23925][SQL] Add array_repeat collection function

2018-05-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21208
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18447: [SPARK-21232][SQL][SparkR][PYSPARK] New built-in SQL fun...

2018-05-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18447
  
**[Test build #90177 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90177/testReport)**
 for PR 18447 at commit 
[`a1bb258`](https://github.com/apache/spark/commit/a1bb2586f9db2e82167140d7565a73629b3b996b).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21208: [SPARK-23925][SQL] Add array_repeat collection function

2018-05-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21208
  
**[Test build #90164 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90164/testReport)**
 for PR 21208 at commit 
[`7a8610f`](https://github.com/apache/spark/commit/7a8610fa25ed1cec4b5b385694a3e24d66e0ccdb).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21195: [Spark-23975][ML] Add support of array input for ...

2018-05-03 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/21195#discussion_r185984894
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/clustering/LDASuite.scala ---
@@ -323,4 +324,44 @@ class LDASuite extends SparkFunSuite with MLlibTestSparkContext with DefaultRead
       assert(model.getOptimizer === optimizer)
     }
   }
+
+  test("LDA with Array input") {
+    val featuresColNameD = "array_double_features"
+    val featuresColNameF = "array_float_features"
+    val doubleUDF = udf { (features: Vector) =>
+      val featureArray = Array.fill[Double](features.size)(0.0)
+      features.foreachActive((idx, value) => featureArray(idx) = value.toFloat)
+      featureArray
+    }
+    val floatUDF = udf { (features: Vector) =>
+      val featureArray = Array.fill[Float](features.size)(0.0f)
+      features.foreachActive((idx, value) => featureArray(idx) = value.toFloat)
+      featureArray
+    }
+    val newdatasetD = dataset.withColumn(featuresColNameD, doubleUDF(col("features")))
+      .drop("features")
+    val newdatasetF = dataset.withColumn(featuresColNameF, floatUDF(col("features")))
+      .drop("features")
+    assert(newdatasetD.schema(featuresColNameD).dataType.equals(new ArrayType(DoubleType, false)))
+    assert(newdatasetF.schema(featuresColNameF).dataType.equals(new ArrayType(FloatType, false)))
+
+    val ldaD = new LDA().setK(k).setOptimizer("online")
+      .setMaxIter(1).setFeaturesCol(featuresColNameD).setSeed(1)
+    val ldaF = new LDA().setK(k).setOptimizer("online").
+      setMaxIter(1).setFeaturesCol(featuresColNameF).setSeed(1)
+    val modelD = ldaD.fit(newdatasetD)
+    val modelF = ldaF.fit(newdatasetF)
+
+    // logLikelihood, logPerplexity
+    val llD = modelD.logLikelihood(newdatasetD)
+    val llF = modelF.logLikelihood(newdatasetF)
+    // assert(llD == llF)
+    assert(llD <= 0.0 && llD != Double.NegativeInfinity)
+    assert(llF <= 0.0 && llF != Double.NegativeInfinity)
+    val lpD = modelD.logPerplexity(newdatasetD)
+    val lpF = modelF.logPerplexity(newdatasetF)
+    // assert(lpD == lpF)
+    assert(lpD >= 0.0 && lpD != Double.NegativeInfinity)
+    assert(lpF >= 0.0 && lpF != Double.NegativeInfinity)
--- End diff --

ditto


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21021: [SPARK-23921][SQL] Add array_sort function

2018-05-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21021
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21195: [Spark-23975][ML] Add support of array input for ...

2018-05-03 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/21195#discussion_r185983646
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/clustering/BisectingKMeansSuite.scala 
---
@@ -182,6 +184,40 @@ class BisectingKMeansSuite
 
     model.clusterCenters.forall(Vectors.norm(_, 2) == 1.0)
   }
+
+  test("BisectingKMeans with Array input") {
+    val featuresColNameD = "array_double_features"
+    val featuresColNameF = "array_float_features"
+    val doubleUDF = udf { (features: Vector) =>
+      val featureArray = Array.fill[Double](features.size)(0.0)
+      features.foreachActive((idx, value) => featureArray(idx) = value.toFloat)
+      featureArray
+    }
+    val floatUDF = udf { (features: Vector) =>
+      val featureArray = Array.fill[Float](features.size)(0.0f)
+      features.foreachActive((idx, value) => featureArray(idx) = value.toFloat)
+      featureArray
+    }
+    val newdatasetD = dataset.withColumn(featuresColNameD, doubleUDF(col("features")))
+      .drop("features")
--- End diff --

* Unnecessary to drop `features`. Or you can simply replace the features column:

~~~scala
val newdatasetD = dataset.withColumn(FEATURES, doubleUDF(col(FEATURES)))
~~~


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21195: [Spark-23975][ML] Add support of array input for ...

2018-05-03 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/21195#discussion_r185984527
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/clustering/GaussianMixtureSuite.scala 
---
@@ -256,6 +258,42 @@ class GaussianMixtureSuite extends SparkFunSuite with MLlibTestSparkContext
     val expectedMatrix = GaussianMixture.unpackUpperTriangularMatrix(4, triangularValues)
     assert(symmetricMatrix === expectedMatrix)
   }
+
+  test("GaussianMixture with Array input") {
+    val featuresColNameD = "array_double_features"
+    val featuresColNameF = "array_float_features"
+    val doubleUDF = udf { (features: Vector) =>
+      val featureArray = Array.fill[Double](features.size)(0.0)
+      features.foreachActive((idx, value) => featureArray(idx) = value.toFloat)
+      featureArray
+    }
+    val floatUDF = udf { (features: Vector) =>
+      val featureArray = Array.fill[Float](features.size)(0.0f)
+      features.foreachActive((idx, value) => featureArray(idx) = value.toFloat)
+      featureArray
+    }
+    val newdatasetD = dataset.withColumn(featuresColNameD, doubleUDF(col("features")))
+      .drop("features")
+    val newdatasetF = dataset.withColumn(featuresColNameF, floatUDF(col("features")))
+      .drop("features")
+    assert(newdatasetD.schema(featuresColNameD).dataType.equals(new ArrayType(DoubleType, false)))
+    assert(newdatasetF.schema(featuresColNameF).dataType.equals(new ArrayType(FloatType, false)))
+
+    val gmD = new GaussianMixture().setK(k).setMaxIter(1)
+      .setFeaturesCol(featuresColNameD).setSeed(1)
+    val gmF = new GaussianMixture().setK(k).setMaxIter(1)
+      .setFeaturesCol(featuresColNameF).setSeed(1)
+    val modelD = gmD.fit(newdatasetD)
+    val modelF = gmF.fit(newdatasetF)
+    val transformedD = modelD.transform(newdatasetD)
+    val transformedF = modelF.transform(newdatasetF)
+    val predictDifference = transformedD.select("prediction")
+      .except(transformedF.select("prediction"))
+    assert(predictDifference.count() == 0)
+    val probabilityDifference = transformedD.select("probability")
+      .except(transformedF.select("probability"))
+    assert(probabilityDifference.count() == 0)
--- End diff --

ditto


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21195: [Spark-23975][ML] Add support of array input for ...

2018-05-03 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/21195#discussion_r185971647
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/clustering/BisectingKMeansSuite.scala 
---
@@ -182,6 +184,40 @@ class BisectingKMeansSuite
 
     model.clusterCenters.forall(Vectors.norm(_, 2) == 1.0)
   }
+
+  test("BisectingKMeans with Array input") {
+    val featuresColNameD = "array_double_features"
+    val featuresColNameF = "array_float_features"
+    val doubleUDF = udf { (features: Vector) =>
+      val featureArray = Array.fill[Double](features.size)(0.0)
+      features.foreachActive((idx, value) => featureArray(idx) = value.toFloat)
--- End diff --

* If `.toFloat` is to keep the same precision, we should leave an inline comment.
* `features.toArray.map(_.toFloat.toDouble)` should do the work.
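
(A minimal sketch of the suggested UDFs, assuming the imports already used in this suite — `udf`, `col`, and `org.apache.spark.ml.linalg.Vector`:)

~~~scala
// Truncate to single precision first, so the Array[Double] column carries
// exactly the same values as the Array[Float] column and results can match.
val doubleUDF = udf { (features: Vector) => features.toArray.map(_.toFloat.toDouble) }
val floatUDF  = udf { (features: Vector) => features.toArray.map(_.toFloat) }

val newdatasetD = dataset.withColumn(featuresColNameD, doubleUDF(col("features")))
val newdatasetF = dataset.withColumn(featuresColNameF, floatUDF(col("features")))
~~~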


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21195: [Spark-23975][ML] Add support of array input for ...

2018-05-03 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/21195#discussion_r185984500
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/clustering/BisectingKMeansSuite.scala 
---
@@ -182,6 +184,40 @@ class BisectingKMeansSuite
 
     model.clusterCenters.forall(Vectors.norm(_, 2) == 1.0)
   }
+
+  test("BisectingKMeans with Array input") {
+    val featuresColNameD = "array_double_features"
+    val featuresColNameF = "array_float_features"
+    val doubleUDF = udf { (features: Vector) =>
+      val featureArray = Array.fill[Double](features.size)(0.0)
+      features.foreachActive((idx, value) => featureArray(idx) = value.toFloat)
+      featureArray
+    }
+    val floatUDF = udf { (features: Vector) =>
+      val featureArray = Array.fill[Float](features.size)(0.0f)
+      features.foreachActive((idx, value) => featureArray(idx) = value.toFloat)
+      featureArray
+    }
+    val newdatasetD = dataset.withColumn(featuresColNameD, doubleUDF(col("features")))
+      .drop("features")
+    val newdatasetF = dataset.withColumn(featuresColNameF, floatUDF(col("features")))
+      .drop("features")
+    assert(newdatasetD.schema(featuresColNameD).dataType.equals(new ArrayType(DoubleType, false)))
+    assert(newdatasetF.schema(featuresColNameF).dataType.equals(new ArrayType(FloatType, false)))
+
+    val bkmD = new BisectingKMeans()
+      .setK(k).setMaxIter(1).setFeaturesCol(featuresColNameD).setSeed(1)
+    val bkmF = new BisectingKMeans()
+      .setK(k).setMaxIter(1).setFeaturesCol(featuresColNameF).setSeed(1)
+    val modelD = bkmD.fit(newdatasetD)
+    val modelF = bkmF.fit(newdatasetF)
+    val transformedD = modelD.transform(newdatasetD)
+    val transformedF = modelF.transform(newdatasetF)
+    val predictDifference = transformedD.select("prediction")
+      .except(transformedF.select("prediction"))
+    assert(predictDifference.count() == 0)
--- End diff --

This only verifies that it handles `Array[Double]` and `Array[Float]` the same way, but it doesn't guarantee that the result is correct. We can define a method that takes a dataset, applies one iteration, and returns the cost.

~~~scala
def trainAndComputeCost(dataset: DataFrame): Double = {
  val model = new BisectingKMeans()
    .setK(k).setMaxIter(1).setSeed(1)
    .fit(dataset)
  model.computeCost(dataset)
}

val trueCost = trainAndComputeCost(dataset)
val floatArrayCost = trainAndComputeCost(newDatasetF)
assert(floatArrayCost === trueCost)
val doubleArrayCost = trainAndComputeCost(newDatasetD)
assert(doubleArrayCost === trueCost)
~~~

We can map the original dataset to single precision to get an exact match, or we can test equality with a threshold. See 
https://github.com/apache/spark/blob/master/mllib/src/test/scala/org/apache/spark/mllib/util/TestingUtils.scala
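
(A minimal sketch of the thresholded variant, assuming the `TestingUtils` syntax linked above is imported into the suite:)

~~~scala
import org.apache.spark.mllib.util.TestingUtils._

// Compare against the double-precision baseline with a relative tolerance
// instead of requiring an exact match.
val trueCost = trainAndComputeCost(dataset)
val floatArrayCost = trainAndComputeCost(newDatasetF)
assert(floatArrayCost ~== trueCost relTol 1e-6)
~~~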


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21195: [Spark-23975][ML] Add support of array input for ...

2018-05-03 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/21195#discussion_r185971385
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/util/SchemaUtils.scala 
---
@@ -101,4 +102,17 @@ private[spark] object SchemaUtils {
     require(!schema.fieldNames.contains(col.name), s"Column ${col.name} already exists.")
     StructType(schema.fields :+ col)
   }
+
+  /**
+   * Check whether the given column in the schema is one of the supporting vector type: Vector,
+   * Array[Dloat]. Array[Double]
--- End diff --

nit: Float


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21021: [SPARK-23921][SQL] Add array_sort function

2018-05-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21021
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90163/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21021: [SPARK-23921][SQL] Add array_sort function

2018-05-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21021
  
**[Test build #90163 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90163/testReport)**
 for PR 21021 at commit 
[`e3fcaaa`](https://github.com/apache/spark/commit/e3fcaaa7e44c09ead1d395bd5be78de29305c4b2).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21193: [SPARK-24121][SQL] Add API for handling expression code ...

2018-05-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21193
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90169/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21193: [SPARK-24121][SQL] Add API for handling expression code ...

2018-05-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21193
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21193: [SPARK-24121][SQL] Add API for handling expression code ...

2018-05-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21193
  
**[Test build #90169 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90169/testReport)**
 for PR 21193 at commit 
[`53daaf3`](https://github.com/apache/spark/commit/53daaf36587b29b542e10a473f6909c1f7a20b63).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21173: [SPARK-23856][SQL] Add an option `queryTimeout` in JDBCO...

2018-05-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21173
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21173: [SPARK-23856][SQL] Add an option `queryTimeout` in JDBCO...

2018-05-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21173
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90173/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21173: [SPARK-23856][SQL] Add an option `queryTimeout` in JDBCO...

2018-05-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21173
  
**[Test build #90173 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90173/testReport)**
 for PR 21173 at commit 
[`35759ca`](https://github.com/apache/spark/commit/35759ca256df2bd757cb8c85f96ebe4ae9126227).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21193: [SPARK-24121][SQL] Add API for handling expression code ...

2018-05-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21193
  
**[Test build #90176 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90176/testReport)**
 for PR 21193 at commit 
[`d138ee0`](https://github.com/apache/spark/commit/d138ee05541bc1976442c5f1170ba7a7f3f5114c).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21193: [SPARK-24121][SQL] Add API for handling expression code ...

2018-05-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21193
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21193: [SPARK-24121][SQL] Add API for handling expression code ...

2018-05-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21193
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2895/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21193: [SPARK-24121][SQL] Add API for handling expression code ...

2018-05-03 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/21193
  
retest this please.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21222: [SPARK-24161][SS] Enable debug package feature on...

2018-05-03 Thread arunmahadevan
Github user arunmahadevan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21222#discussion_r185981329
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/debug/package.scala ---
@@ -88,23 +100,62 @@ package object debug {
     }
   }
 
+  /**
+   * Get WholeStageCodegenExec subtrees and the codegen in a query plan into one String
+   *
+   * @param query the streaming query for codegen
+   * @return single String containing all WholeStageCodegen subtrees and corresponding codegen
+   */
+  def codegenString(query: StreamingQuery): String = {
+    val msg = query match {
+      case w: StreamExecution if w.lastExecution != null =>
--- End diff --

I think this can be rewritten as `case w: StreamExecution => if (w.lastExecution != null) ... else ...`
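
(A rough sketch of the suggested shape — the message strings and the fallback case are placeholders, and it assumes the existing `codegenString(plan: SparkPlan)` helper in this file:)

~~~scala
val msg = query match {
  case w: StreamExecution =>
    if (w.lastExecution != null) {
      codegenString(w.lastExecution.executedPlan)
    } else {
      "No physical plan yet."                             // placeholder message
    }
  case _ =>
    "Debugging is only supported for StreamExecution."    // placeholder message
}
~~~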


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21212: [SPARK-24143] filter empty blocks when convert mapstatus...

2018-05-03 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/21212
  
> do you mean optimize space usage for MapStatus when there are lots of 
consecutive empty-blocks ?

Yea, something like doing an RLE for the size array in 
`CompressedMapStatus`. But this can be done as a followup.
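
(For illustration only, a minimal sketch of the idea — plain run-length encoding of a block-size array, not Spark's `CompressedMapStatus` code:)

~~~scala
// Collapse runs of identical sizes (e.g. long stretches of empty, 0-byte blocks)
// into (size, runLength) pairs.
def rle(sizes: Array[Long]): Seq[(Long, Int)] = {
  val out = scala.collection.mutable.ArrayBuffer.empty[(Long, Int)]
  for (s <- sizes) out.lastOption match {
    case Some((v, n)) if v == s => out(out.length - 1) = (v, n + 1)
    case _ => out += ((s, 1))
  }
  out
}

rle(Array(0L, 0L, 0L, 1024L, 0L, 0L))  // Seq((0,3), (1024,1), (0,2))
~~~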


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21193: [SPARK-24121][SQL] Add API for handling expression code ...

2018-05-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21193
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21193: [SPARK-24121][SQL] Add API for handling expression code ...

2018-05-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21193
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90174/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21193: [SPARK-24121][SQL] Add API for handling expression code ...

2018-05-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21193
  
**[Test build #90174 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90174/testReport)**
 for PR 21193 at commit 
[`d138ee0`](https://github.com/apache/spark/commit/d138ee05541bc1976442c5f1170ba7a7f3f5114c).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21212: [SPARK-24143] filter empty blocks when convert ma...

2018-05-03 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21212#discussion_r185980680
  
--- Diff: 
core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala 
---
@@ -267,28 +269,28 @@ final class ShuffleBlockFetcherIterator(
 // at most maxBytesInFlight in order to limit the amount of data in 
flight.
 val remoteRequests = new ArrayBuffer[FetchRequest]
 
-// Tracks total number of blocks (including zero sized blocks)
-var totalBlocks = 0
 for ((address, blockInfos) <- blocksByAddress) {
-  totalBlocks += blockInfos.size
   if (address.executorId == blockManager.blockManagerId.executorId) {
-// Filter out zero-sized blocks
-localBlocks ++= blockInfos.filter(_._2 != 0).map(_._1)
+blockInfos.find(_._2 < 0) match {
--- End diff --

shall we use `_._2 < 0` to make sure the 0-size blocks are filtered?
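
For reference, a simplified, self-contained sketch of the filtering being 
discussed here (`String` block ids and `Long` sizes stand in for the real types; 
this is not the actual `ShuffleBlockFetcherIterator` logic):

```scala
// Drop zero-sized blocks so no empty fetches are issued, and fail fast on
// (corrupt) negative sizes.
def nonEmptyBlockIds(blockInfos: Seq[(String, Long)]): Seq[String] = {
  blockInfos.find(_._2 < 0).foreach { case (id, size) =>
    throw new IllegalStateException(s"Block $id has a negative size: $size")
  }
  blockInfos.filter(_._2 > 0).map(_._1)
}
```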


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21095: [SPARK-23529][K8s] Support mounting hostPath volu...

2018-05-03 Thread liyinan926
Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21095#discussion_r185961170
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesVolumeUtils.scala
 ---
@@ -0,0 +1,142 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s
+
+import scala.collection.mutable.HashMap
+
+import io.fabric8.kubernetes.api.model._
+
+import org.apache.spark.SparkConf
+import org.apache.spark.deploy.k8s.Config._
+
+private[spark] object KubernetesVolumeUtils {
+
+  /**
+   * Given hostPath volume specs, add volume to pod and volume mount to 
container.
+   *
+   * @param pod original specification of the pod
+   * @param container original specification of the container
+   * @param sparkConf Spark configuration
+   * @param prefix the prefix for volume configuration
+   * @return a tuple of (pod with the volume(s) added, container with 
mount(s) added)
+   */
+  def addVolumes(
+  pod: Pod,
+  container: Container,
+  sparkConf: SparkConf,
+  prefix : String): (Pod, Container) = {
+val hostPathVolumeSpecs = parseHostPathVolumesWithPrefix(sparkConf, 
prefix)
+addHostPathVolumes(pod, container, hostPathVolumeSpecs)
+  }
+
+  /**
+   * Extract Spark volume configuration properties with a given name 
prefix.
+   *
+   * @param sparkConf Spark configuration
+   * @param prefix the given property name prefix
+   * @param volumeTypeKey the given property name prefix
+   * @return a Map storing with volume name as key and spec as value
+   */
+  def parseVolumesWithPrefix(
+  sparkConf: SparkConf,
+  prefix: String,
+  volumeTypeKey: String): Map[String, KubernetesVolumeSpec] = {
+val volumes = HashMap[String, KubernetesVolumeSpec]()
+val properties = 
sparkConf.getAllWithPrefix(s"$prefix$volumeTypeKey.").toList
+// Extract volume names
+properties.foreach {
+  k =>
+val keys = k._1.split("\\.")
+if (keys.nonEmpty && !volumes.contains(keys(0))) {
+  volumes.update(keys(0), KubernetesVolumeSpec.emptySpec())
+}
+}
+// Populate spec
+volumes.foreach {
+  case (name, spec) =>
+properties.foreach {
+  k =>
+k._1.split("\\.") match {
+  case Array(`name`, KUBERNETES_VOLUMES_MOUNT_KEY, 
KUBERNETES_VOLUMES_PATH_KEY) =>
+spec.mountPath = Some(k._2)
+  case Array(`name`, KUBERNETES_VOLUMES_MOUNT_KEY, 
KUBERNETES_VOLUMES_READONLY_KEY) =>
+spec.mountReadOnly = Some(k._2.toBoolean)
+  case Array(`name`, KUBERNETES_VOLUMES_OPTIONS_KEY, option) =>
+spec.optionsSpec.update(option, k._2)
+  case _ =>
+None
+}
+}
+}
+volumes.toMap
+  }
+
+  /**
+   * Extract Spark hostPath volume configuration properties with a given 
name prefix and
+   * return the result as a Map.
+   *
+   * @param sparkConf Spark configuration
+   * @param prefix the given property name prefix
+   * @return a Map storing with volume name as key and spec as value
+   */
+  def parseHostPathVolumesWithPrefix(
+  sparkConf: SparkConf,
+  prefix: String): Map[String, KubernetesVolumeSpec] = {
+parseVolumesWithPrefix(sparkConf, prefix, 
KUBERNETES_VOLUMES_HOSTPATH_KEY)
+  }
+
+  /**
+   * Given hostPath volume specs, add volume to pod and volume mount to 
container.
+   *
+   * @param pod original specification of the pod
+   * @param container original specification of the container
+   * @param volumes list of named volume specs
+   * @return a tuple of (pod with the volume(s) added, container with 
mount(s) added)
+   */
+  def addHostPathVolumes(
+  pod: Pod,
+

[GitHub] spark pull request #21095: [SPARK-23529][K8s] Support mounting hostPath volu...

2018-05-03 Thread liyinan926
Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21095#discussion_r185961495
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesVolumeUtils.scala
 ---
@@ -0,0 +1,142 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s
+
+import scala.collection.mutable.HashMap
+
+import io.fabric8.kubernetes.api.model._
+
+import org.apache.spark.SparkConf
+import org.apache.spark.deploy.k8s.Config._
+
+private[spark] object KubernetesVolumeUtils {
+
+  /**
+   * Given hostPath volume specs, add volume to pod and volume mount to 
container.
+   *
+   * @param pod original specification of the pod
+   * @param container original specification of the container
+   * @param sparkConf Spark configuration
+   * @param prefix the prefix for volume configuration
+   * @return a tuple of (pod with the volume(s) added, container with 
mount(s) added)
+   */
+  def addVolumes(
+  pod: Pod,
+  container: Container,
+  sparkConf: SparkConf,
+  prefix : String): (Pod, Container) = {
+val hostPathVolumeSpecs = parseHostPathVolumesWithPrefix(sparkConf, 
prefix)
+addHostPathVolumes(pod, container, hostPathVolumeSpecs)
+  }
+
+  /**
+   * Extract Spark volume configuration properties with a given name 
prefix.
+   *
+   * @param sparkConf Spark configuration
+   * @param prefix the given property name prefix
+   * @param volumeTypeKey the given property name prefix
+   * @return a Map storing with volume name as key and spec as value
+   */
+  def parseVolumesWithPrefix(
+  sparkConf: SparkConf,
+  prefix: String,
+  volumeTypeKey: String): Map[String, KubernetesVolumeSpec] = {
+val volumes = HashMap[String, KubernetesVolumeSpec]()
+val properties = 
sparkConf.getAllWithPrefix(s"$prefix$volumeTypeKey.").toList
+// Extract volume names
+properties.foreach {
+  k =>
+val keys = k._1.split("\\.")
+if (keys.nonEmpty && !volumes.contains(keys(0))) {
+  volumes.update(keys(0), KubernetesVolumeSpec.emptySpec())
+}
+}
+// Populate spec
+volumes.foreach {
+  case (name, spec) =>
+properties.foreach {
+  k =>
+k._1.split("\\.") match {
+  case Array(`name`, KUBERNETES_VOLUMES_MOUNT_KEY, 
KUBERNETES_VOLUMES_PATH_KEY) =>
+spec.mountPath = Some(k._2)
+  case Array(`name`, KUBERNETES_VOLUMES_MOUNT_KEY, 
KUBERNETES_VOLUMES_READONLY_KEY) =>
+spec.mountReadOnly = Some(k._2.toBoolean)
+  case Array(`name`, KUBERNETES_VOLUMES_OPTIONS_KEY, option) =>
+spec.optionsSpec.update(option, k._2)
+  case _ =>
+None
+}
+}
+}
+volumes.toMap
+  }
+
+  /**
+   * Extract Spark hostPath volume configuration properties with a given 
name prefix and
+   * return the result as a Map.
+   *
+   * @param sparkConf Spark configuration
+   * @param prefix the given property name prefix
+   * @return a Map storing with volume name as key and spec as value
+   */
+  def parseHostPathVolumesWithPrefix(
+  sparkConf: SparkConf,
+  prefix: String): Map[String, KubernetesVolumeSpec] = {
+parseVolumesWithPrefix(sparkConf, prefix, 
KUBERNETES_VOLUMES_HOSTPATH_KEY)
+  }
+
+  /**
+   * Given hostPath volume specs, add volume to pod and volume mount to 
container.
+   *
+   * @param pod original specification of the pod
+   * @param container original specification of the container
+   * @param volumes list of named volume specs
+   * @return a tuple of (pod with the volume(s) added, container with 
mount(s) added)
+   */
+  def addHostPathVolumes(
+  pod: Pod,
+

[GitHub] spark pull request #21095: [SPARK-23529][K8s] Support mounting hostPath volu...

2018-05-03 Thread liyinan926
Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21095#discussion_r185960781
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesVolumeUtils.scala
 ---
@@ -0,0 +1,142 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s
+
+import scala.collection.mutable.HashMap
+
+import io.fabric8.kubernetes.api.model._
+
+import org.apache.spark.SparkConf
+import org.apache.spark.deploy.k8s.Config._
+
+private[spark] object KubernetesVolumeUtils {
+
+  /**
+   * Given hostPath volume specs, add volume to pod and volume mount to 
container.
+   *
+   * @param pod original specification of the pod
+   * @param container original specification of the container
+   * @param sparkConf Spark configuration
+   * @param prefix the prefix for volume configuration
+   * @return a tuple of (pod with the volume(s) added, container with 
mount(s) added)
+   */
+  def addVolumes(
+  pod: Pod,
+  container: Container,
+  sparkConf: SparkConf,
+  prefix : String): (Pod, Container) = {
+val hostPathVolumeSpecs = parseHostPathVolumesWithPrefix(sparkConf, 
prefix)
+addHostPathVolumes(pod, container, hostPathVolumeSpecs)
+  }
+
+  /**
+   * Extract Spark volume configuration properties with a given name 
prefix.
+   *
+   * @param sparkConf Spark configuration
+   * @param prefix the given property name prefix
+   * @param volumeTypeKey the given property name prefix
+   * @return a Map storing with volume name as key and spec as value
+   */
+  def parseVolumesWithPrefix(
+  sparkConf: SparkConf,
+  prefix: String,
+  volumeTypeKey: String): Map[String, KubernetesVolumeSpec] = {
+val volumes = HashMap[String, KubernetesVolumeSpec]()
+val properties = 
sparkConf.getAllWithPrefix(s"$prefix$volumeTypeKey.").toList
+// Extract volume names
+properties.foreach {
+  k =>
+val keys = k._1.split("\\.")
+if (keys.nonEmpty && !volumes.contains(keys(0))) {
+  volumes.update(keys(0), KubernetesVolumeSpec.emptySpec())
+}
+}
+// Populate spec
+volumes.foreach {
+  case (name, spec) =>
+properties.foreach {
+  k =>
+k._1.split("\\.") match {
+  case Array(`name`, KUBERNETES_VOLUMES_MOUNT_KEY, 
KUBERNETES_VOLUMES_PATH_KEY) =>
+spec.mountPath = Some(k._2)
+  case Array(`name`, KUBERNETES_VOLUMES_MOUNT_KEY, 
KUBERNETES_VOLUMES_READONLY_KEY) =>
+spec.mountReadOnly = Some(k._2.toBoolean)
+  case Array(`name`, KUBERNETES_VOLUMES_OPTIONS_KEY, option) =>
+spec.optionsSpec.update(option, k._2)
+  case _ =>
+None
+}
+}
+}
+volumes.toMap
+  }
+
+  /**
+   * Extract Spark hostPath volume configuration properties with a given 
name prefix and
+   * return the result as a Map.
+   *
+   * @param sparkConf Spark configuration
+   * @param prefix the given property name prefix
+   * @return a Map storing with volume name as key and spec as value
+   */
+  def parseHostPathVolumesWithPrefix(
+  sparkConf: SparkConf,
+  prefix: String): Map[String, KubernetesVolumeSpec] = {
+parseVolumesWithPrefix(sparkConf, prefix, 
KUBERNETES_VOLUMES_HOSTPATH_KEY)
+  }
+
+  /**
+   * Given hostPath volume specs, add volume to pod and volume mount to 
container.
+   *
+   * @param pod original specification of the pod
+   * @param container original specification of the container
+   * @param volumes list of named volume specs
+   * @return a tuple of (pod with the volume(s) added, container with 
mount(s) added)
+   */
+  def addHostPathVolumes(
+  pod: Pod,
+

[GitHub] spark pull request #21095: [SPARK-23529][K8s] Support mounting hostPath volu...

2018-05-03 Thread liyinan926
Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21095#discussion_r185980118
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesVolumeUtils.scala
 ---
@@ -0,0 +1,142 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s
+
+import scala.collection.mutable.HashMap
+
+import io.fabric8.kubernetes.api.model._
+
+import org.apache.spark.SparkConf
+import org.apache.spark.deploy.k8s.Config._
+
+private[spark] object KubernetesVolumeUtils {
+
+  /**
+   * Given hostPath volume specs, add volume to pod and volume mount to 
container.
+   *
+   * @param pod original specification of the pod
+   * @param container original specification of the container
+   * @param sparkConf Spark configuration
+   * @param prefix the prefix for volume configuration
+   * @return a tuple of (pod with the volume(s) added, container with 
mount(s) added)
+   */
+  def addVolumes(
+  pod: Pod,
+  container: Container,
+  sparkConf: SparkConf,
+  prefix : String): (Pod, Container) = {
+val hostPathVolumeSpecs = parseHostPathVolumesWithPrefix(sparkConf, 
prefix)
+addHostPathVolumes(pod, container, hostPathVolumeSpecs)
+  }
+
+  /**
+   * Extract Spark volume configuration properties with a given name 
prefix.
+   *
+   * @param sparkConf Spark configuration
+   * @param prefix the given property name prefix
+   * @param volumeTypeKey the given property name prefix
+   * @return a Map storing with volume name as key and spec as value
+   */
+  def parseVolumesWithPrefix(
+  sparkConf: SparkConf,
+  prefix: String,
+  volumeTypeKey: String): Map[String, KubernetesVolumeSpec] = {
+val volumes = HashMap[String, KubernetesVolumeSpec]()
+val properties = 
sparkConf.getAllWithPrefix(s"$prefix$volumeTypeKey.").toList
+// Extract volume names
+properties.foreach {
+  k =>
+val keys = k._1.split("\\.")
+if (keys.nonEmpty && !volumes.contains(keys(0))) {
+  volumes.update(keys(0), KubernetesVolumeSpec.emptySpec())
+}
+}
+// Populate spec
+volumes.foreach {
+  case (name, spec) =>
+properties.foreach {
+  k =>
+k._1.split("\\.") match {
+  case Array(`name`, KUBERNETES_VOLUMES_MOUNT_KEY, 
KUBERNETES_VOLUMES_PATH_KEY) =>
+spec.mountPath = Some(k._2)
+  case Array(`name`, KUBERNETES_VOLUMES_MOUNT_KEY, 
KUBERNETES_VOLUMES_READONLY_KEY) =>
+spec.mountReadOnly = Some(k._2.toBoolean)
+  case Array(`name`, KUBERNETES_VOLUMES_OPTIONS_KEY, option) =>
+spec.optionsSpec.update(option, k._2)
+  case _ =>
+None
+}
+}
+}
+volumes.toMap
+  }
+
+  /**
+   * Extract Spark hostPath volume configuration properties with a given 
name prefix and
+   * return the result as a Map.
+   *
+   * @param sparkConf Spark configuration
+   * @param prefix the given property name prefix
+   * @return a Map storing with volume name as key and spec as value
+   */
+  def parseHostPathVolumesWithPrefix(
--- End diff --

Looks like you don't really need this function as it's just a wrapper of 
`parseVolumesWithPrefix`.
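
i.e., something along these lines (a standalone sketch; the names mirror the 
diff but the bodies are placeholders):

```scala
// Illustration of the point: the hostPath-specific helper only pins the type
// key, so call sites could use the generic parser directly.
object VolumeParsingSketch {
  // Placeholder for the real parsing logic quoted in the diff.
  def parseVolumesWithPrefix(prefix: String, volumeTypeKey: String): Map[String, String] =
    Map.empty

  // The wrapper adds nothing beyond fixing the type key ...
  def parseHostPathVolumesWithPrefix(prefix: String): Map[String, String] =
    parseVolumesWithPrefix(prefix, "hostPath")

  // ... so callers can just write:
  val specs = parseVolumesWithPrefix("spark.kubernetes.driver.volumes.", "hostPath")
}
```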


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21229: [SPARK-23697][CORE] LegacyAccumulatorWrapper should defi...

2018-05-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21229
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2894/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21229: [SPARK-23697][CORE] LegacyAccumulatorWrapper should defi...

2018-05-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21229
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21095: [SPARK-23529][K8s] Support mounting hostPath volu...

2018-05-03 Thread liyinan926
Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21095#discussion_r185980514
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStep.scala
 ---
@@ -109,7 +109,15 @@ private[spark] class BasicDriverFeatureStep(
 .addToImagePullSecrets(conf.imagePullSecrets(): _*)
 .endSpec()
   .build()
-SparkPod(driverPod, driverContainer)
+
+val (driverPodWithVolumes, driverContainerVolumes) =
--- End diff --

Instead of putting the volume-mounting logic in `BasicDriverFeatureStep` 
and `BasicExecutorFeatureStep`, we should add a new step for mounting volumes, 
similar to how we handle secrets, e.g., a `MountVolumesFeatureStep` that holds 
the `addVolumes` logic. This feature step can then be used for both the 
driver and executors.
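
Roughly, the shape could look like this (the types below are simplified 
stand-ins, not the actual Spark K8s integration interfaces):

```scala
// Sketch of a dedicated feature step for volume mounting, reusable by both the
// driver and the executor builders.
case class PodSpecSketch(volumes: Seq[String], mounts: Seq[String])

trait FeatureStepSketch {
  def configurePod(pod: PodSpecSketch): PodSpecSketch
}

class MountVolumesFeatureStepSketch(volumeNames: Seq[String]) extends FeatureStepSketch {
  override def configurePod(pod: PodSpecSketch): PodSpecSketch = {
    // One volume on the pod and one mount on the container per configured name.
    pod.copy(
      volumes = pod.volumes ++ volumeNames,
      mounts = pod.mounts ++ volumeNames.map(n => s"/mnt/$n"))
  }
}
```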


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21095: [SPARK-23529][K8s] Support mounting hostPath volu...

2018-05-03 Thread liyinan926
Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21095#discussion_r185960630
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesVolumeUtils.scala
 ---
@@ -0,0 +1,142 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s
+
+import scala.collection.mutable.HashMap
+
+import io.fabric8.kubernetes.api.model._
+
+import org.apache.spark.SparkConf
+import org.apache.spark.deploy.k8s.Config._
+
+private[spark] object KubernetesVolumeUtils {
+
+  /**
+   * Given hostPath volume specs, add volume to pod and volume mount to 
container.
+   *
+   * @param pod original specification of the pod
+   * @param container original specification of the container
+   * @param sparkConf Spark configuration
+   * @param prefix the prefix for volume configuration
+   * @return a tuple of (pod with the volume(s) added, container with 
mount(s) added)
+   */
+  def addVolumes(
+  pod: Pod,
+  container: Container,
+  sparkConf: SparkConf,
+  prefix : String): (Pod, Container) = {
+val hostPathVolumeSpecs = parseHostPathVolumesWithPrefix(sparkConf, 
prefix)
+addHostPathVolumes(pod, container, hostPathVolumeSpecs)
+  }
+
+  /**
+   * Extract Spark volume configuration properties with a given name 
prefix.
+   *
+   * @param sparkConf Spark configuration
+   * @param prefix the given property name prefix
+   * @param volumeTypeKey the given property name prefix
+   * @return a Map storing with volume name as key and spec as value
+   */
+  def parseVolumesWithPrefix(
+  sparkConf: SparkConf,
+  prefix: String,
+  volumeTypeKey: String): Map[String, KubernetesVolumeSpec] = {
+val volumes = HashMap[String, KubernetesVolumeSpec]()
+val properties = 
sparkConf.getAllWithPrefix(s"$prefix$volumeTypeKey.").toList
+// Extract volume names
+properties.foreach {
+  k =>
+val keys = k._1.split("\\.")
+if (keys.nonEmpty && !volumes.contains(keys(0))) {
+  volumes.update(keys(0), KubernetesVolumeSpec.emptySpec())
+}
+}
+// Populate spec
+volumes.foreach {
+  case (name, spec) =>
--- End diff --

The `case` line should be merged into the previous line according to the 
Spark code convention, e.g., `volumes.foreach { case (name, spec) =>`. 
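
For clarity, the two layouts side by side (illustrative values):

```scala
val volumes = Map("data" -> "/mnt/data", "logs" -> "/mnt/logs")

// Current layout in the diff:
volumes.foreach {
  case (name, path) =>
    println(s"$name -> $path")
}

// Preferred layout per the Spark code convention:
volumes.foreach { case (name, path) =>
  println(s"$name -> $path")
}
```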


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21095: [SPARK-23529][K8s] Support mounting hostPath volu...

2018-05-03 Thread liyinan926
Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21095#discussion_r185960149
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesVolumeUtils.scala
 ---
@@ -0,0 +1,142 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s
+
+import scala.collection.mutable.HashMap
+
+import io.fabric8.kubernetes.api.model._
+
+import org.apache.spark.SparkConf
+import org.apache.spark.deploy.k8s.Config._
+
+private[spark] object KubernetesVolumeUtils {
+
+  /**
+   * Given hostPath volume specs, add volume to pod and volume mount to 
container.
+   *
+   * @param pod original specification of the pod
+   * @param container original specification of the container
+   * @param sparkConf Spark configuration
+   * @param prefix the prefix for volume configuration
+   * @return a tuple of (pod with the volume(s) added, container with 
mount(s) added)
+   */
+  def addVolumes(
--- End diff --

The Scaladoc should not mention hostPath, as this function is not specific to 
hostPath volumes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21095: [SPARK-23529][K8s] Support mounting hostPath volu...

2018-05-03 Thread liyinan926
Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21095#discussion_r185960658
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesVolumeUtils.scala
 ---
@@ -0,0 +1,142 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s
+
+import scala.collection.mutable.HashMap
+
+import io.fabric8.kubernetes.api.model._
+
+import org.apache.spark.SparkConf
+import org.apache.spark.deploy.k8s.Config._
+
+private[spark] object KubernetesVolumeUtils {
+
+  /**
+   * Given hostPath volume specs, add volume to pod and volume mount to 
container.
+   *
+   * @param pod original specification of the pod
+   * @param container original specification of the container
+   * @param sparkConf Spark configuration
+   * @param prefix the prefix for volume configuration
+   * @return a tuple of (pod with the volume(s) added, container with 
mount(s) added)
+   */
+  def addVolumes(
+  pod: Pod,
+  container: Container,
+  sparkConf: SparkConf,
+  prefix : String): (Pod, Container) = {
+val hostPathVolumeSpecs = parseHostPathVolumesWithPrefix(sparkConf, 
prefix)
+addHostPathVolumes(pod, container, hostPathVolumeSpecs)
+  }
+
+  /**
+   * Extract Spark volume configuration properties with a given name 
prefix.
+   *
+   * @param sparkConf Spark configuration
+   * @param prefix the given property name prefix
+   * @param volumeTypeKey the given property name prefix
+   * @return a Map storing with volume name as key and spec as value
+   */
+  def parseVolumesWithPrefix(
+  sparkConf: SparkConf,
+  prefix: String,
+  volumeTypeKey: String): Map[String, KubernetesVolumeSpec] = {
+val volumes = HashMap[String, KubernetesVolumeSpec]()
+val properties = 
sparkConf.getAllWithPrefix(s"$prefix$volumeTypeKey.").toList
+// Extract volume names
+properties.foreach {
+  k =>
+val keys = k._1.split("\\.")
+if (keys.nonEmpty && !volumes.contains(keys(0))) {
+  volumes.update(keys(0), KubernetesVolumeSpec.emptySpec())
+}
+}
+// Populate spec
+volumes.foreach {
+  case (name, spec) =>
+properties.foreach {
+  k =>
--- End diff --

Ditto.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21186: [SPARK-22279][SPARK-24112] Enable `convertMetasto...

2018-05-03 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21186#discussion_r185980350
  
--- Diff: docs/sql-programming-guide.md ---
@@ -1812,6 +1812,9 @@ working with timestamps in `pandas_udf`s to get the 
best performance, see
   - Since Spark 2.4, creating a managed table with nonempty location is 
not allowed. An exception is thrown when attempting to create a managed table 
with nonempty location. Setting 
`spark.sql.allowCreatingManagedTableUsingNonemptyLocation` to `true` restores the 
previous behavior. This option will be removed in Spark 3.0.
   - Since Spark 2.4, the type coercion rules can automatically promote the 
argument types of the variadic SQL functions (e.g., IN/COALESCE) to the widest 
common type, regardless of the order of the input arguments. In prior Spark versions, 
the promotion could fail in some specific orders (e.g., TimestampType, 
IntegerType and StringType) and throw an exception.
   - In version 2.3 and earlier, `to_utc_timestamp` and 
`from_utc_timestamp` respect the timezone in the input timestamp string, which 
breaks the assumption that the input timestamp is in a specific timezone. 
Therefore, these 2 functions can return unexpected results. In version 2.4 and 
later, this problem has been fixed. `to_utc_timestamp` and `from_utc_timestamp` 
will return null if the input timestamp string contains timezone. As an 
example, `from_utc_timestamp('2000-10-10 00:00:00', 'GMT+1')` will return 
`2000-10-10 01:00:00` in both Spark 2.3 and 2.4. However, 
`from_utc_timestamp('2000-10-10 00:00:00+00:00', 'GMT+1')`, assuming a local 
timezone of GMT+8, will return `2000-10-10 09:00:00` in Spark 2.3 but `null` in 
2.4. For people who don't care about this problem and want to retain the 
previous behavior to keep their query unchanged, you can set 
`spark.sql.function.rejectTimezoneInString` to false. This option will be 
removed in Spark 3.0 and should only be used as a temporary workaround.
+  - Since Spark 2.4, Spark uses its own ORC support by default instead of 
Hive SerDe for better performance during Hive metastore table access. Setting 
`spark.sql.hive.convertMetastoreOrc` to `false` restores the previous behavior.
+  - Since Spark 2.4, Spark supports table properties while converting 
Parquet/ORC Hive tables. Setting 
`spark.sql.hive.convertMetastoreTableProperty` to `false` restores the previous behavior.
--- End diff --

please polish the migration guide w.r.t. 
https://issues.apache.org/jira/browse/SPARK-24175
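
For readers following along, restoring the pre-2.4 behavior described in the 
quoted guide would look roughly like this (config keys as they appear in the 
diff above; they may still change before the PR is merged):

```scala
import org.apache.spark.sql.SparkSession

// Opt back into the pre-2.4 behavior for Hive metastore ORC tables.
// The keys are taken from the quoted migration guide, not a released API.
val spark = SparkSession.builder()
  .appName("hive-orc-compat")
  .enableHiveSupport()
  .config("spark.sql.hive.convertMetastoreOrc", "false")
  .config("spark.sql.hive.convertMetastoreTableProperty", "false")
  .getOrCreate()
```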


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21229: [SPARK-23697][CORE] LegacyAccumulatorWrapper should defi...

2018-05-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21229
  
**[Test build #90175 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90175/testReport)**
 for PR 21229 at commit 
[`a2d81df`](https://github.com/apache/spark/commit/a2d81dfa5f5b2cf58413f3e39717bf436c956d70).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21229: [SPARK-23697][CORE] LegacyAccumulatorWrapper should defi...

2018-05-03 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/21229
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...

2018-05-03 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/21118
  
All the places that use `GenerateUnsafeRowJoiner` assume the input rows are 
`UnsafeRow`s.

`ShuffleExchangeExec` also assumes its input is `UnsafeRow`, because its 
serializer is `UnsafeRowSerializer`.

Note that we don't enforce this at the API level, i.e. `SparkPlan.execute` 
still returns `RDD[InternalRow]`. This is because there are exceptions: the 
object-related operators can return safe rows, but they always appear as a 
group, and the last operator in the group outputs `UnsafeRow`.

That said, even though you may not see `UnsafeRow` in the generated code, 
you will get a `ClassCastException` if you break this rule and output a 
safe row.
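
A self-contained illustration of that failure mode (the consumer function is 
hypothetical; the row classes are the real Catalyst ones):

```scala
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions.{GenericInternalRow, UnsafeRow}

// Hypothetical consumer that, like UnsafeRowSerializer, assumes every incoming
// InternalRow is actually an UnsafeRow.
def consumeAssumingUnsafe(row: InternalRow): UnsafeRow = row.asInstanceOf[UnsafeRow]

val safeRow: InternalRow = new GenericInternalRow(Array[Any](1, 2L))
// consumeAssumingUnsafe(safeRow)  // throws ClassCastException at runtime
```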


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21219: [SPARK-24160] ShuffleBlockFetcherIterator should fail if...

2018-05-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21219
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90161/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21219: [SPARK-24160] ShuffleBlockFetcherIterator should fail if...

2018-05-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21219
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21219: [SPARK-24160] ShuffleBlockFetcherIterator should fail if...

2018-05-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21219
  
**[Test build #90161 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90161/testReport)**
 for PR 21219 at commit 
[`3ecd7da`](https://github.com/apache/spark/commit/3ecd7da67e372f2cbb25c61b269030b2115885dd).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21207: [SPARK-24136][SS] Fix MemoryStreamDataReader.next to ski...

2018-05-03 Thread arunmahadevan
Github user arunmahadevan commented on the issue:

https://github.com/apache/spark/pull/21207
  
@HyukjinKwon thanks for noticing, updated the title.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21193: [SPARK-24121][SQL] Add API for handling expression code ...

2018-05-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21193
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2893/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


