date:20180731

[GitHub] spark issue #21661: [SPARK-24685][build] Restore support for building old Ha...

2018-07-31 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21661
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21930: [SPARK-14540][Core] Fix remaining major issues for Scala...

2018-07-31 Thread felixcheung

Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/21930
  
I think that's a binary-incompatible, breaking API change, right?
For example:
https://github.com/apache/spark/pull/21930/files#diff-2b8f0f66fe5397b169d0f754e99da8d5R64


---


[GitHub] spark pull request #21936: [SPARK-24981][Core] ShutdownHook timeout causes j...

2018-07-31 Thread felixcheung

Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/21936#discussion_r206769869
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -571,7 +571,12 @@ class SparkContext(config: SparkConf) extends Logging {
 _shutdownHookRef = ShutdownHookManager.addShutdownHook(
   ShutdownHookManager.SPARK_CONTEXT_SHUTDOWN_PRIORITY) { () =>
   logInfo("Invoking stop() from shutdown hook")
-  stop()
+  try {
+stop()
+  } catch {
+case e: Throwable =>
+  logWarning("Ignoring Exception while stoping SparkContext. 
Exception: " + e)
--- End diff --

`stoping` -> `stopping`


---


[GitHub] spark pull request #21936: [SPARK-24981][Core] ShutdownHook timeout causes j...

2018-07-31 Thread felixcheung

Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/21936#discussion_r206770131
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -571,7 +571,12 @@ class SparkContext(config: SparkConf) extends Logging {
 _shutdownHookRef = ShutdownHookManager.addShutdownHook(
   ShutdownHookManager.SPARK_CONTEXT_SHUTDOWN_PRIORITY) { () =>
   logInfo("Invoking stop() from shutdown hook")
-  stop()
+  try {
+stop()
+  } catch {
+case e: Throwable =>
+  logWarning("Ignoring Exception while stoping SparkContext. 
Exception: " + e)
--- End diff --

Use this format, passing the exception as the second argument:
`logWarning("", exception)`
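
To illustrate the suggested call shape, here is a minimal sketch. `SketchLogging` is a hypothetical stand-in for Spark's `Logging` trait so the snippet runs without Spark, and `stop()` is a hypothetical method that always fails; only the two-argument `logWarning(message, throwable)` form comes from the comment above.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for Spark's Logging trait; it records warnings in memory.
trait SketchLogging {
  val warnings = ArrayBuffer.empty[(String, Option[Throwable])]
  def logWarning(msg: => String): Unit = warnings += ((msg, None))
  def logWarning(msg: => String, t: Throwable): Unit = warnings += ((msg, Some(t)))
}

object ShutdownSketch extends SketchLogging {
  // Hypothetical stop() that always throws, to exercise the catch branch.
  def stop(): Unit = throw new IllegalStateException("already stopped")

  def runHook(): Unit =
    try stop()
    catch {
      case e: Throwable =>
        // The throwable is passed as its own argument instead of being
        // concatenated into the message, so the logger keeps the stack trace.
        logWarning("Ignoring exception while stopping SparkContext.", e)
    }
}
```

Passing the exception separately keeps the cause and stack trace available to the logging backend, which string concatenation would reduce to `e.toString`.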


---


[GitHub] spark issue #21661: [SPARK-24685][build] Restore support for building old Ha...

2018-07-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21661
  
Merged build finished. Test FAILed.


---


[GitHub] spark issue #21661: [SPARK-24685][build] Restore support for building old Ha...

2018-07-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21661
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93860/
Test FAILed.


---


[GitHub] spark issue #21661: [SPARK-24685][build] Restore support for building old Ha...

2018-07-31 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21661
  
**[Test build #93860 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93860/testReport)**
 for PR 21661 at commit 
[`1db4ab8`](https://github.com/apache/spark/commit/1db4ab8d1781036278329ae313cb7b1bf2c201c7).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #21941: [SPARK-24966][SQL] Implement precedence rules for set op...

2018-07-31 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21941
  
**[Test build #93872 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93872/testReport)**
 for PR 21941 at commit 
[`47cbc5a`](https://github.com/apache/spark/commit/47cbc5a8d77c949674ff97c5763936a8425b0f00).


---


[GitHub] spark issue #21941: [SPARK-24966][SQL] Implement precedence rules for set op...

2018-07-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21941
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #21941: [SPARK-24966][SQL] Implement precedence rules for set op...

2018-07-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21941
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1553/
Test PASSed.


---


[GitHub] spark pull request #21941: [SPARK-24966][SQL] Implement precedence rules for...

2018-07-31 Thread dilipbiswal

Github user dilipbiswal commented on a diff in the pull request:

https://github.com/apache/spark/pull/21941#discussion_r206768063
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -1451,6 +1451,15 @@ object SQLConf {
 .intConf
 .checkValues((1 to 9).toSet + Deflater.DEFAULT_COMPRESSION)
 .createWithDefault(Deflater.DEFAULT_COMPRESSION)
+
+  val SETOPS_PRECEDENCE_ENFORCED =
+buildConf("spark.sql.setops.precedence.enforced")
--- End diff --

@gatorsmile Sure.


---


[GitHub] spark pull request #21622: [SPARK-24637][SS] Add metrics regarding state and...

2018-07-31 Thread HeartSaVioR

Github user HeartSaVioR commented on a diff in the pull request:

https://github.com/apache/spark/pull/21622#discussion_r206766835
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MetricsReporter.scala
 ---
@@ -39,6 +42,23 @@ class MetricsReporter(
   registerGauge("processingRate-total", _.processedRowsPerSecond, 0.0)
   registerGauge("latency", 
_.durationMs.get("triggerExecution").longValue(), 0L)
 
+  private val timestampFormat = new 
SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'") // ISO8601
+  timestampFormat.setTimeZone(DateTimeUtils.getTimeZone("UTC"))
+
+  registerGauge("eventTime-watermark",
+progress => 
convertStringDateToMillis(progress.eventTime.get("watermark")), 0L)
+
+  registerGauge("states-rowsTotal", 
_.stateOperators.map(_.numRowsTotal).sum, 0L)
+  registerGauge("states-usedBytes", 
_.stateOperators.map(_.memoryUsedBytes).sum, 0L)
+
--- End diff --

Thanks for the input! I'll keep the patch as it is.

Could you suggest an approach for extending the maintained metrics? I would like 
to add more, and newer ones might come from custom metrics (e.g. from the 
source and sink), so it might be worth having an extension point.
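
For context on the `eventTime-watermark` gauge in the diff above, here is a minimal sketch of turning an ISO 8601 UTC string into epoch milliseconds. The helper name `toMillis` is hypothetical (the PR's `convertStringDateToMillis` is not shown in the quoted diff), and the pattern `yyyy-MM-dd'T'HH:mm:ss.SSS'Z'` and the `0L` fallback for a missing value are assumptions.

```scala
import java.text.SimpleDateFormat
import java.util.TimeZone

object WatermarkMillisSketch {
  // Hypothetical helper: parses an ISO 8601 UTC timestamp into epoch milliseconds.
  // A new SimpleDateFormat is created per call because the class is not thread-safe.
  def toMillis(isoUtc: String): Long = {
    if (isoUtc == null) {
      0L // assumed default when no watermark has been reported yet
    } else {
      val fmt = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'")
      fmt.setTimeZone(TimeZone.getTimeZone("UTC"))
      fmt.parse(isoUtc).getTime
    }
  }
}
```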


---


[GitHub] spark issue #21756: [SPARK-24764] [CORE] Add ServiceLoader implementation fo...

2018-07-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21756
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93856/
Test PASSed.


---


[GitHub] spark issue #21892: [SPARK-24945][SQL] Switching to uniVocity 2.7.2

2018-07-31 Thread jbax

Github user jbax commented on the issue:

https://github.com/apache/spark/pull/21892
  
Thanks @MaxGekk. I've fixed the error and also made the parser run faster 
than before, in general, when processing fields that were not selected.

Can you please retest with the latest SNAPSHOT build and let me know how it 
goes?


---


[GitHub] spark issue #21756: [SPARK-24764] [CORE] Add ServiceLoader implementation fo...

2018-07-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21756
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #21756: [SPARK-24764] [CORE] Add ServiceLoader implementation fo...

2018-07-31 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21756
  
**[Test build #93856 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93856/testReport)**
 for PR 21756 at commit 
[`6b9edca`](https://github.com/apache/spark/commit/6b9edca76579cd1adfb42eb4085b604b050b552c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark pull request #21941: [SPARK-24966][SQL] Implement precedence rules for...

2018-07-31 Thread dilipbiswal

Github user dilipbiswal commented on a diff in the pull request:

https://github.com/apache/spark/pull/21941#discussion_r206764090
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala ---
@@ -535,14 +535,14 @@ abstract class SparkStrategies extends 
QueryPlanner[SparkPlan] {
   case logical.Intersect(left, right, true) =>
 throw new IllegalStateException(
   "logical intersect operator should have been replaced by union, 
aggregate" +
-"and generate operators in the optimizer")
+" and generate operators in the optimizer")
   case logical.Except(left, right, false) =>
 throw new IllegalStateException(
   "logical except operator should have been replaced by anti-join 
in the optimizer")
   case logical.Except(left, right, true) =>
 throw new IllegalStateException(
   "logical except (all) operator should have been replaced by 
union, aggregate" +
-"and generate operators in the optimizer")
+" and generate operators in the optimizer")
--- End diff --

This is not related to the current PR. This addresses a comment from 
@HyukjinKwon in [21886](https://github.com/apache/spark/pull/21886)
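
For illustration, a minimal sketch of why the leading space matters: `+` joins the two literals with nothing in between, so the original message contained `aggregateand`. The object and value names are hypothetical, and the strings are shortened from the diff above.

```scala
object ConcatSpaceSketch {
  // Before the fix: no space at the boundary between the two literals.
  val before: String = "replaced by union, aggregate" + "and generate operators"

  // After the fix: the second literal starts with a space.
  val after: String = "replaced by union, aggregate" + " and generate operators"
}
```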


---


[GitHub] spark pull request #21941: [SPARK-24966][SQL] Implement precedence rules for...

2018-07-31 Thread dilipbiswal

Github user dilipbiswal commented on a diff in the pull request:

https://github.com/apache/spark/pull/21941#discussion_r206764069
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala ---
@@ -535,14 +535,14 @@ abstract class SparkStrategies extends 
QueryPlanner[SparkPlan] {
   case logical.Intersect(left, right, true) =>
 throw new IllegalStateException(
   "logical intersect operator should have been replaced by union, 
aggregate" +
-"and generate operators in the optimizer")
+" and generate operators in the optimizer")
--- End diff --

This is not related to the current PR. This addresses a comment from 
@HyukjinKwon in [21886](https://github.com/apache/spark/pull/21886)


---


[GitHub] spark pull request #21941: [SPARK-24966][SQL] Implement precedence rules for...

2018-07-31 Thread dilipbiswal

Github user dilipbiswal commented on a diff in the pull request:

https://github.com/apache/spark/pull/21941#discussion_r206764004
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala
 ---
@@ -165,9 +165,9 @@ object SetOperation {
 }
 
 case class Intersect(
-   left: LogicalPlan,
-   right: LogicalPlan,
-   isAll: Boolean = false) extends SetOperation(left, right) {
+left: LogicalPlan,
--- End diff --

This is not related to the current PR. This addresses a comment from 
@HyukjinKwon in [21886](https://github.com/apache/spark/pull/21886)


---


[GitHub] spark pull request #21941: [SPARK-24966][SQL] Implement precedence rules for...

2018-07-31 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21941#discussion_r206763936
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -1451,6 +1451,15 @@ object SQLConf {
 .intConf
 .checkValues((1 to 9).toSet + Deflater.DEFAULT_COMPRESSION)
 .createWithDefault(Deflater.DEFAULT_COMPRESSION)
+
+  val SETOPS_PRECEDENCE_ENFORCED =
+buildConf("spark.sql.setops.precedence.enforced")
--- End diff --

Let me think about the name of the conf.


---


[GitHub] spark pull request #21941: [SPARK-24966][SQL] Implement precedence rules for...

2018-07-31 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21941#discussion_r206763732
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -1451,6 +1451,15 @@ object SQLConf {
 .intConf
 .checkValues((1 to 9).toSet + Deflater.DEFAULT_COMPRESSION)
 .createWithDefault(Deflater.DEFAULT_COMPRESSION)
+
+  val SETOPS_PRECEDENCE_ENFORCED =
+buildConf("spark.sql.setops.precedence.enforced")
+  .doc("When set to true and order of evaluation is not specified by 
parentheses, " +
+"INTERSECT operations are performed before any UNION or EXCEPT 
operations. " +
--- End diff --

also include MINUS


---


[GitHub] spark pull request #21941: [SPARK-24966][SQL] Implement precedence rules for...

2018-07-31 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21941#discussion_r206763501
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/PlanParserSuite.scala
 ---
@@ -676,4 +677,42 @@ class PlanParserSuite extends AnalysisTest {
   OneRowRelation().select('rtrim.function("c&^,.", "bc...,,,&&&ccc"))
 )
   }
+
+  test("precedence of set operations") {
+val a = table("a").select(star())
+val b = table("b").select(star())
+val c = table("c").select(star())
+val d = table("d").select(star())
+
+val query1 =
+  """
+|SELECT * FROM a
+|UNION
+|SELECT * FROM b
+|EXCEPT
+|SELECT * FROM c
+|INTERSECT
+|SELECT * FROM d
+  """.stripMargin
+
+val query2 =
+  """
+|SELECT * FROM a
+|UNION
+|SELECT * FROM b
+|EXCEPT ALL
+|SELECT * FROM c
+|INTERSECT ALL
+|SELECT * FROM d
+  """.stripMargin
+
+assertEqual(query1, Distinct(a.union(b)).except(c.intersect(d)))
--- End diff --

also add `withSQLConf(SQLConf.SETOPS_PRECEDENCE_ENFORCED.key -> "true") {`
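
As a rough sketch of what such a wrapper does: it sets the given keys, runs the body, and restores the previous values afterwards. `withConf` and the in-memory `conf` map below are hypothetical simplifications, not Spark's actual `withSQLConf` or `SQLConf`.

```scala
import scala.collection.mutable

object WithConfSketch {
  // Hypothetical in-memory configuration; Spark's real SQLConf is not used here.
  val conf = mutable.Map.empty[String, String]

  // Sets the given pairs, runs the body, then restores the earlier values
  // (or removes keys that were previously unset), even if the body throws.
  def withConf[T](pairs: (String, String)*)(body: => T): T = {
    val previous = pairs.map { case (k, _) => k -> conf.get(k) }
    pairs.foreach { case (k, v) => conf(k) = v }
    try body
    finally previous.foreach {
      case (k, Some(v)) => conf(k) = v
      case (k, None)    => conf.remove(k)
    }
  }
}
```

Wrapping the assertions this way keeps the setting scoped to the test, so other tests in the suite see the default value.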


---


[GitHub] spark pull request #21941: [SPARK-24966][SQL] Implement precedence rules for...

2018-07-31 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21941#discussion_r206763358
  
--- Diff: 
sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ---
@@ -17,6 +17,12 @@
 grammar SqlBase;
 
 @members {
+  /**
+   * When true, INTERSECT is given precedence over UNION and EXCEPT set 
operations as per
--- End diff --

> When true, INTERSECT is given precedence over UNION and EXCEPT set 
operations as per

->

> When true, INTERSECT is given the greater precedence over the other set 
operations (UNION, EXCEPT and MINUS) as per


---


[GitHub] spark pull request #19084: [SPARK-20711][ML]MultivariateOnlineSummarizer/Sum...

2018-07-31 Thread zhengruifeng

Github user zhengruifeng closed the pull request at:

https://github.com/apache/spark/pull/19084


---


[GitHub] spark issue #21563: [SPARK-24557][ML] ClusteringEvaluator support array inpu...

2018-07-31 Thread zhengruifeng

Github user zhengruifeng commented on the issue:

https://github.com/apache/spark/pull/21563
  
@mengxr I noticed that you opened a ticket for supporting integer type labels 
in ClusteringEvaluator. Would you like to shepherd this PR too?


---


[GitHub] spark issue #21622: [SPARK-24637][SS] Add metrics regarding state and waterm...

2018-07-31 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21622
  
retest this please


---


[GitHub] spark issue #21941: [SPARK-24966][SQL] Implement precedence rules for set op...

2018-07-31 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21941
  
**[Test build #93871 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93871/testReport)**
 for PR 21941 at commit 
[`c0821b6`](https://github.com/apache/spark/commit/c0821b6dd8e713edf2bd1ddd9a27f170d8f8).


---


[GitHub] spark pull request #19449: [SPARK-22219][SQL] Refactor code to get a value f...

2018-07-31 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/19449#discussion_r206760031
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/internal/ExecutorSideSQLConfSuite.scala
 ---
@@ -82,4 +84,22 @@ class ExecutorSideSQLConfSuite extends SparkFunSuite 
with SQLTestUtils {
   assert(checks.forall(_ == true))
 }
   }
+
+  test("SPARK-22219: refactor to control to generate comment") {
+withSQLConf(StaticSQLConf.CODEGEN_COMMENTS.key -> "false") {
+  val res = codegenStringSeq(spark.range(10).groupBy(col("id") * 
2).count()
+.queryExecution.executedPlan)
+  assert(res.length == 2)
+  assert(res.forall{ case (_, code) =>
+!code.contains("* Codegend pipeline") && !code.contains("// 
input[")})
+}
+
+withSQLConf(StaticSQLConf.CODEGEN_COMMENTS.key -> "true") {
+  val res = codegenStringSeq(spark.range(10).groupBy(col("id") * 
2).count()
+.queryExecution.executedPlan)
+  assert(res.length == 2)
+  assert(res.forall{ case (_, code) =>
+code.contains("* Codegend pipeline") && code.contains("// 
input[")})
+}
--- End diff --

combine these two?
```
Seq(true, false).foreach { flag =>
  ...
  if (flag) {
    ...
  } else {
    ...
  }
}
```
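
A runnable toy version of this pattern, as a sketch only: `generate` is a hypothetical stand-in for the codegen output (the real test would call `codegenStringSeq` under `withSQLConf`), and the marker string is taken from the diff above.

```scala
object CombinedFlagCheckSketch {
  // Hypothetical stand-in for generated code: emits the comment only when enabled.
  def generate(commentsEnabled: Boolean): String =
    if (commentsEnabled) "/* Codegend pipeline */ int x = 1;" else "int x = 1;"

  // One loop covers both flag values; only the expectation differs per branch.
  def run(): Unit =
    Seq(true, false).foreach { flag =>
      val code = generate(flag)
      if (flag) {
        assert(code.contains("Codegend pipeline"))
      } else {
        assert(!code.contains("Codegend pipeline"))
      }
    }
}
```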


---


[GitHub] spark issue #21941: [SPARK-24966][SQL] Implement precedence rules for set op...

2018-07-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21941
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1552/
Test PASSed.


---


[GitHub] spark issue #21941: [SPARK-24966][SQL] Implement precedence rules for set op...

2018-07-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21941
  
Merged build finished. Test PASSed.


---


[GitHub] spark pull request #21622: [SPARK-24637][SS] Add metrics regarding state and...

2018-07-31 Thread tdas

Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/21622#discussion_r206761192
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MetricsReporter.scala
 ---
@@ -39,6 +42,23 @@ class MetricsReporter(
   registerGauge("processingRate-total", _.processedRowsPerSecond, 0.0)
   registerGauge("latency", 
_.durationMs.get("triggerExecution").longValue(), 0L)
 
+  private val timestampFormat = new 
SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'") // ISO8601
+  timestampFormat.setTimeZone(DateTimeUtils.getTimeZone("UTC"))
+
+  registerGauge("eventTime-watermark",
+progress => 
convertStringDateToMillis(progress.eventTime.get("watermark")), 0L)
+
+  registerGauge("states-rowsTotal", 
_.stateOperators.map(_.numRowsTotal).sum, 0L)
+  registerGauge("states-usedBytes", 
_.stateOperators.map(_.memoryUsedBytes).sum, 0L)
+
--- End diff --

Those are custom metrics, which may or may not be present depending on the 
state store implementation. I don't recommend adding them here directly.
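
To make the concern concrete, a minimal sketch: a gauge over a custom metric has to decide what to report when the key is absent. `OperatorProgress`, the `customMetrics` map, and the metric name used in the usage note are hypothetical simplifications, not Spark's `StateOperatorProgress` API.

```scala
object OptionalMetricSketch {
  // Hypothetical, simplified stand-in for a state operator's progress.
  final case class OperatorProgress(numRowsTotal: Long, customMetrics: Map[String, Long])

  // Sums a custom metric across operators, treating a missing key as 0.
  def sumCustom(ops: Seq[OperatorProgress], key: String): Long =
    ops.map(_.customMetrics.getOrElse(key, 0L)).sum

  // True only when at least one operator actually reports the metric.
  def isReported(ops: Seq[OperatorProgress], key: String): Boolean =
    ops.exists(_.customMetrics.contains(key))
}
```

With a hypothetical key such as `loadedMapSize`, a sum of 0 cannot be told apart from "not reported" unless something like `isReported` is also checked, which is one reason a hard-coded gauge for such metrics is fragile.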


---


[GitHub] spark issue #21941: [SPARK-24966][SQL] Implement precedence rules for set op...

2018-07-31 Thread holdensmagicalunicorn

Github user holdensmagicalunicorn commented on the issue:

https://github.com/apache/spark/pull/21941
  
@dilipbiswal, thanks! I am a bot who has found some folks who might be able 
to help with the review: @gatorsmile, @rxin and @hvanhovell


---


[GitHub] spark pull request #21941: [SPARK-24966][SQL] Implement precedence rules for...

2018-07-31 Thread dilipbiswal

GitHub user dilipbiswal opened a pull request:

https://github.com/apache/spark/pull/21941

[SPARK-24966][SQL] Implement precedence rules for set operations.

## What changes were proposed in this pull request?

Currently the set operations INTERSECT, UNION and EXCEPT are assigned the 
same precedence. This PR fixes the problem by giving INTERSECT higher 
precedence than UNION and EXCEPT. UNION and EXCEPT operators are evaluated in 
the order in which they appear in the query, from left to right.

This changes behavior, because the order in which the set operators in a 
query are evaluated changes. The old behavior is still available 
under a newly added config parameter.

Query `:`
```
SELECT * FROM t1
UNION 
SELECT * FROM t2
EXCEPT
SELECT * FROM t3
INTERSECT
SELECT * FROM t4
```
Parsed plan before the change `:`
```
== Parsed Logical Plan ==
'Intersect false
:- 'Except false
:  :- 'Distinct
:  :  +- 'Union
:  : :- 'Project [*]
:  : :  +- 'UnresolvedRelation `t1`
:  : +- 'Project [*]
:  :+- 'UnresolvedRelation `t2`
:  +- 'Project [*]
: +- 'UnresolvedRelation `t3`
+- 'Project [*]
   +- 'UnresolvedRelation `t4`
```
Parsed plan after the change `:`
```
== Parsed Logical Plan ==
'Except false
:- 'Distinct
:  +- 'Union
: :- 'Project [*]
: :  +- 'UnresolvedRelation `t1`
: +- 'Project [*]
:+- 'UnresolvedRelation `t2`
+- 'Intersect false
   :- 'Project [*]
   :  +- 'UnresolvedRelation `t3`
   +- 'Project [*]
  +- 'UnresolvedRelation `t4`
```
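As a small illustration of why the grouping changes results, the two plans above can be mimicked with Scala sets. This is only a sketch: the contents of `t1` to `t4` are hypothetical, and `Set` operations stand in for the distinct (non-ALL) SQL operators.

```scala
object SetOpPrecedenceSketch {
  // Hypothetical single-column contents for t1..t4; not taken from the PR.
  val t1: Set[Int] = Set(1, 2)
  val t2: Set[Int] = Set(3)
  val t3: Set[Int] = Set(2, 3)
  val t4: Set[Int] = Set(3, 4)

  // Old plan: equal precedence, so everything is evaluated left to right.
  val leftToRight: Set[Int] = ((t1 union t2) diff t3) intersect t4

  // New plan: INTERSECT is evaluated first, then UNION and EXCEPT left to right.
  val intersectFirst: Set[Int] = (t1 union t2) diff (t3 intersect t4)
}
```

With these inputs the old grouping yields an empty result, while the new grouping yields the rows 1 and 2.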
## How was this patch tested?
Added tests in PlanParserSuite, SQLQueryTestSuite.

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dilipbiswal/spark SPARK-24966

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21941.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21941


commit c0821b6dd8e713edf2bd1ddd9a27f170d8f8
Author: Dilip Biswal 
Date:   2018-07-30T05:10:29Z

[SPARK-24966] Implement precedence rules for set operations.




---


[GitHub] spark issue #21103: [SPARK-23915][SQL] Add array_except function

2018-07-31 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21103
  
**[Test build #93870 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93870/testReport)**
 for PR 21103 at commit 
[`93e7979`](https://github.com/apache/spark/commit/93e7979a1c3fb82c47ecae5b3ed539b31cb99e19).


---


[GitHub] spark issue #21103: [SPARK-23915][SQL] Add array_except function

2018-07-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21103
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #21103: [SPARK-23915][SQL] Add array_except function

2018-07-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21103
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1551/
Test PASSed.


---


[GitHub] spark issue #21103: [SPARK-23915][SQL] Add array_except function

2018-07-31 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/21103
  
retest this please


---


[GitHub] spark issue #21222: [SPARK-24161][SS] Enable debug package feature on struct...

2018-07-31 Thread HeartSaVioR

Github user HeartSaVioR commented on the issue:

https://github.com/apache/spark/pull/21222
  
@zsxwing Kind reminder.


---


[GitHub] spark issue #21622: [SPARK-24637][SS] Add metrics regarding state and waterm...

2018-07-31 Thread HeartSaVioR

Github user HeartSaVioR commented on the issue:

https://github.com/apache/spark/pull/21622
  
Pinging @tdas and @zsxwing for review. It's a small one.


---


[GitHub] spark pull request #21934: [SPARK-24951][SQL] Table valued functions should ...

2018-07-31 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21934


---


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

2018-07-31 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21469
  
**[Test build #93869 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93869/testReport)**
 for PR 21469 at commit 
[`ed072fc`](https://github.com/apache/spark/commit/ed072fcf057f982275d0daf69787ed812f03e87b).


---


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

2018-07-31 Thread HeartSaVioR

Github user HeartSaVioR commented on the issue:

https://github.com/apache/spark/pull/21469
  
@tdas Thanks for the review! Addressed review comments.


---


[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-07-31 Thread ajacques

Github user ajacques commented on the issue:

https://github.com/apache/spark/pull/21889
  
@mallman, sounds good. I'll get this PR updated with your latest changes as 
soon as I can.


---


[GitHub] spark issue #21883: [SPARK-24937][SQL] Datasource partition table should loa...

2018-07-31 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21883
  
**[Test build #93868 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93868/testReport)**
 for PR 21883 at commit 
[`536346e`](https://github.com/apache/spark/commit/536346e60ed24ee447f991aacf58cafe9415a020).


---


[GitHub] spark issue #21883: [SPARK-24937][SQL] Datasource partition table should loa...

2018-07-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21883
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #21883: [SPARK-24937][SQL] Datasource partition table should loa...

2018-07-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21883
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1550/
Test PASSed.


---


[GitHub] spark issue #21883: [SPARK-24937][SQL] Datasource partition table should loa...

2018-07-31 Thread wangyum

Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/21883
  
retest this please.


---


[GitHub] spark issue #21561: [SPARK-24555][ML] logNumExamples in KMeans/BiKM/GMM/AFT/...

2018-07-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21561
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21561: [SPARK-24555][ML] logNumExamples in KMeans/BiKM/GMM/AFT/...

2018-07-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21561
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93866/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21561: [SPARK-24555][ML] logNumExamples in KMeans/BiKM/GMM/AFT/...

2018-07-31 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21561
  
**[Test build #93866 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93866/testReport)**
 for PR 21561 at commit 
[`1a93c34`](https://github.com/apache/spark/commit/1a93c3432f95713e9a086a39e2f605ea4953619a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21469: [SPARK-24441][SS] Expose total estimated size of ...

2018-07-31 Thread HeartSaVioR

Github user HeartSaVioR commented on a diff in the pull request:

https://github.com/apache/spark/pull/21469#discussion_r206755595
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/streaming/progress.scala ---
@@ -48,12 +49,24 @@ class StateOperatorProgress private[sql](
   def prettyJson: String = pretty(render(jsonValue))
 
   private[sql] def copy(newNumRowsUpdated: Long): StateOperatorProgress =
-new StateOperatorProgress(numRowsTotal, newNumRowsUpdated, 
memoryUsedBytes)
+new StateOperatorProgress(numRowsTotal, newNumRowsUpdated, 
memoryUsedBytes, customMetrics)
 
   private[sql] def jsonValue: JValue = {
-("numRowsTotal" -> JInt(numRowsTotal)) ~
-("numRowsUpdated" -> JInt(numRowsUpdated)) ~
-("memoryUsedBytes" -> JInt(memoryUsedBytes))
+def safeMapToJValue[T](map: ju.Map[String, T], valueToJValue: T => 
JValue): JValue = {
+  if (map.isEmpty) return JNothing
+  val keys = map.keySet.asScala.toSeq.sorted
+  keys.map { k => k -> valueToJValue(map.get(k)) : JObject }.reduce(_ 
~ _)
+}
+
+val jsonVal = ("numRowsTotal" -> JInt(numRowsTotal)) ~
+  ("numRowsUpdated" -> JInt(numRowsUpdated)) ~
+  ("memoryUsedBytes" -> JInt(memoryUsedBytes))
+
+if (!customMetrics.isEmpty) {
--- End diff --

Actually didn't notice that. Thanks for letting me know! Will simplify.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21469: [SPARK-24441][SS] Expose total estimated size of ...

2018-07-31 Thread HeartSaVioR

Github user HeartSaVioR commented on a diff in the pull request:

https://github.com/apache/spark/pull/21469#discussion_r206755538
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/streaming/progress.scala ---
@@ -48,12 +49,24 @@ class StateOperatorProgress private[sql](
   def prettyJson: String = pretty(render(jsonValue))
 
   private[sql] def copy(newNumRowsUpdated: Long): StateOperatorProgress =
-new StateOperatorProgress(numRowsTotal, newNumRowsUpdated, 
memoryUsedBytes)
+new StateOperatorProgress(numRowsTotal, newNumRowsUpdated, 
memoryUsedBytes, customMetrics)
 
   private[sql] def jsonValue: JValue = {
-("numRowsTotal" -> JInt(numRowsTotal)) ~
-("numRowsUpdated" -> JInt(numRowsUpdated)) ~
-("memoryUsedBytes" -> JInt(memoryUsedBytes))
+def safeMapToJValue[T](map: ju.Map[String, T], valueToJValue: T => 
JValue): JValue = {
--- End diff --

I first tried to leverage `StreamingQueryProgress.safeMapToJValue` but 
couldn't find a proper place to move it so it could be shared, so I simply copied it. Will 
simplify the code block and inline it.
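
For reference, a minimal sketch of the sorted-key conversion the copied helper performs, written against the standard library only. `SortedMetrics` and `sortedEntries` are hypothetical names, and `None` stands in for json4s's `JNothing`; this is not the code in the PR.

```scala
import java.{util => ju}
import scala.collection.JavaConverters._

object SortedMetrics {
  // Hypothetical stand-in for the JSON rendering: returns the entries in
  // sorted key order, or None for an empty map (playing the role of JNothing).
  def sortedEntries[T, R](map: ju.Map[String, T], convert: T => R): Option[Seq[(String, R)]] = {
    if (map.isEmpty) {
      None
    } else {
      val keys = map.keySet.asScala.toSeq.sorted
      Some(keys.map(k => k -> convert(map.get(k))))
    }
  }
}
```

Sorting the keys keeps the rendered output stable regardless of the map's iteration order.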


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21469: [SPARK-24441][SS] Expose total estimated size of ...

2018-07-31 Thread HeartSaVioR

Github user HeartSaVioR commented on a diff in the pull request:

https://github.com/apache/spark/pull/21469#discussion_r206754359
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala 
---
@@ -81,10 +81,10 @@ class SQLMetric(val metricType: String, initValue: Long 
= 0L) extends Accumulato
 }
 
 object SQLMetrics {
-  private val SUM_METRIC = "sum"
-  private val SIZE_METRIC = "size"
-  private val TIMING_METRIC = "timing"
-  private val AVERAGE_METRIC = "average"
+  val SUM_METRIC = "sum"
+  val SIZE_METRIC = "size"
+  val TIMING_METRIC = "timing"
+  val AVERAGE_METRIC = "average"
--- End diff --

It was to handle an exceptional case while aggregating custom metrics, 
specifically filtering out averages, since they are not aggregated correctly. Since we 
are removing the custom average metric, we no longer need to filter them out. Will revert 
the change as well as the relevant logic.
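
A small sketch of why averages are awkward to aggregate, assuming the problem is the usual one: averaging per-task averages ignores how many values each task saw. `Partial`, `averageOfAverages` and `mergedAverage` are hypothetical names for illustration, not Spark APIs.

```scala
object AverageAggregation {
  // Per-task partial result: the sum of observed values and how many were observed.
  final case class Partial(sum: Long, count: Long) {
    def average: Double = if (count == 0) 0.0 else sum.toDouble / count
  }

  // Naive aggregation: average the per-task averages, ignoring the counts.
  def averageOfAverages(parts: Seq[Partial]): Double =
    if (parts.isEmpty) 0.0 else parts.map(_.average).sum / parts.size

  // Aggregation that stays correct: merge sums and counts first, divide once.
  def mergedAverage(parts: Seq[Partial]): Double = {
    val total = parts.foldLeft(Partial(0L, 0L)) { (a, b) =>
      Partial(a.sum + b.sum, a.count + b.count)
    }
    total.average
  }
}
```

With one task reporting a single value of 10 and another reporting four values summing to 20, the two functions disagree, which is the kind of case that would need filtering if only the averages were available.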


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21883: [SPARK-24937][SQL] Datasource partition table should loa...

2018-07-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21883
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93855/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21883: [SPARK-24937][SQL] Datasource partition table should loa...

2018-07-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21883
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21883: [SPARK-24937][SQL] Datasource partition table should loa...

2018-07-31 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21883
  
**[Test build #93855 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93855/testReport)**
 for PR 21883 at commit 
[`536346e`](https://github.com/apache/spark/commit/536346e60ed24ee447f991aacf58cafe9415a020).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21103: [SPARK-23915][SQL] Add array_except function

2018-07-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21103
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21103: [SPARK-23915][SQL] Add array_except function

2018-07-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21103
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93851/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21722: Spark-24742: Fix NullPointerexception in Field Metadata

2018-07-31 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21722
  
**[Test build #4228 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4228/testReport)**
 for PR 21722 at commit 
[`088e2d7`](https://github.com/apache/spark/commit/088e2d789dad707bd657a72afa8933e957641536).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21103: [SPARK-23915][SQL] Add array_except function

2018-07-31 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21103
  
**[Test build #93851 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93851/testReport)**
 for PR 21103 at commit 
[`93e7979`](https://github.com/apache/spark/commit/93e7979a1c3fb82c47ecae5b3ed539b31cb99e19).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21357: [SPARK-24311][SS] Refactor HDFSBackedStateStoreProvider ...

2018-07-31 Thread HeartSaVioR

Github user HeartSaVioR commented on the issue:

https://github.com/apache/spark/pull/21357
  
@tdas 
The rationale for this patch is to group the functions that deal with 
delta and snapshot files in one place, so that the difference between delta files and 
snapshot files is clearly shown (actually there is no difference other than allowing the 
TOMBSTONE value in delta files) and so that these files are easy to document. 
It's also easier to add tests for delta / snapshot files.

Indeed my underlying rationale is to make the class easier for newcomers to 
understand (I actually found it helpful to group them logically 
to understand the code better), but the file has been getting enough love from 
various contributors, so it may not be worth the effort to make it easier.

I respect the rules of the Spark project, and I'm happy to close this if we don't feel it is 
beneficial to go on. Let's close it and revisit if someone else feels it is 
beneficial. Thanks for providing your voice on this!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21357: [SPARK-24311][SS] Refactor HDFSBackedStateStorePr...

2018-07-31 Thread HeartSaVioR

Github user HeartSaVioR closed the pull request at:

https://github.com/apache/spark/pull/21357


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #19449: [SPARK-22219][SQL] Refactor code to get a value for "spa...

2018-07-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19449
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93852/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #19449: [SPARK-22219][SQL] Refactor code to get a value for "spa...

2018-07-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19449
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #19449: [SPARK-22219][SQL] Refactor code to get a value for "spa...

2018-07-31 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19449
  
**[Test build #93852 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93852/testReport)**
 for PR 19449 at commit 
[`afe889d`](https://github.com/apache/spark/commit/afe889d7cd05f7a293f76103616cd62106b91305).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21563: [SPARK-24557][ML] ClusteringEvaluator support array inpu...

2018-07-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21563
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93863/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21563: [SPARK-24557][ML] ClusteringEvaluator support array inpu...

2018-07-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21563
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21563: [SPARK-24557][ML] ClusteringEvaluator support array inpu...

2018-07-31 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21563
  
**[Test build #93863 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93863/testReport)**
 for PR 21563 at commit 
[`9064e7b`](https://github.com/apache/spark/commit/9064e7bde92f206602ebde9b3d99a861b2a90f8a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21911: [SPARK-24940][SQL] Coalesce Hint for SQL Queries

2018-07-31 Thread jzhuge

Github user jzhuge commented on the issue:

https://github.com/apache/spark/pull/21911
  
@gatorsmile Oracle's [PARALLEL 
Hint](https://docs.oracle.com/en/database/oracle/oracle-database/18/sqlrf/Comments.html#GUID-D25225CE-2DCE-4D9F-8E82-401839690A6E)
 is the closest I can find. And [SET CURRENT 
DEGREE](https://www.ibm.com/support/knowledgecenter/en/SSEPEK_10.0.0/sqlref/src/tpc/db2z_sql_setcurrentdegree.html)
 for parallel processing in DB2.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21933: [SPARK-24917] make chunk size configurable

2018-07-31 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21933
  
**[Test build #93867 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93867/testReport)**
 for PR 21933 at commit 
[`0251bd5`](https://github.com/apache/spark/commit/0251bd517e7fd3e695cb8366ffa03de8c9e2900b).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21933: [SPARK-24917] make chunk size configurable

2018-07-31 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21933
  
ok to test


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21940: Pin tag 210

2018-07-31 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21940
  
@zhangchj1990, this looks like it was opened by mistake. Please close it.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #19186: [SPARK-21972][ML] Add param handlePersistence

2018-07-31 Thread zhengruifeng

Github user zhengruifeng closed the pull request at:

https://github.com/apache/spark/pull/19186


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21561: [SPARK-24555][ML] logNumExamples in KMeans/BiKM/GMM/AFT/...

2018-07-31 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21561
  
**[Test build #93866 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93866/testReport)**
 for PR 21561 at commit 
[`1a93c34`](https://github.com/apache/spark/commit/1a93c3432f95713e9a086a39e2f605ea4953619a).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20918: [SPARK-23805][ML][WIP] Features alg support vecto...

2018-07-31 Thread zhengruifeng

Github user zhengruifeng closed the pull request at:

https://github.com/apache/spark/pull/20918


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21561: [SPARK-24555][ML] logNumExamples in KMeans/BiKM/GMM/AFT/...

2018-07-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21561
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21561: [SPARK-24555][ML] logNumExamples in KMeans/BiKM/GMM/AFT/...

2018-07-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21561
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1549/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21935: [SPARK-24773] Avro: support logical timestamp typ...

2018-07-31 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21935#discussion_r206748626
  
--- Diff: 
external/avro/src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala 
---
@@ -114,7 +121,10 @@ object SchemaConverters {
   case ByteType | ShortType | IntegerType => builder.intType()
   case LongType => builder.longType()
   case DateType => builder.longType()
-  case TimestampType => builder.longType()
+  case TimestampType =>
+// To be consistent with the previous behavior of writing 
Timestamp type with Avro 1.7,
--- End diff --

For now I think writing out timestamp micros should be good


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21752: [SPARK-24788][SQL] fixed UnresolvedException when toStri...

2018-07-31 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/21752
  
ping @c-horn 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21561: [SPARK-24555][ML] logNumExamples in KMeans/BiKM/GMM/AFT/...

2018-07-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21561
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93865/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21561: [SPARK-24555][ML] logNumExamples in KMeans/BiKM/GMM/AFT/...

2018-07-31 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21561
  
**[Test build #93865 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93865/testReport)**
 for PR 21561 at commit 
[`2e48282`](https://github.com/apache/spark/commit/2e48282825a6fb46a50f4497491c550963f2c634).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21561: [SPARK-24555][ML] logNumExamples in KMeans/BiKM/GMM/AFT/...

2018-07-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21561
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21561: [SPARK-24555][ML] logNumExamples in KMeans/BiKM/GMM/AFT/...

2018-07-31 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21561
  
**[Test build #93865 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93865/testReport)**
 for PR 21561 at commit 
[`2e48282`](https://github.com/apache/spark/commit/2e48282825a6fb46a50f4497491c550963f2c634).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21305: [SPARK-24251][SQL] Add AppendData logical plan.

2018-07-31 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21305#discussion_r206748200
  
--- Diff: 
sql/core/src/main/java/org/apache/spark/sql/sources/v2/WriteSupport.java ---
@@ -38,15 +38,16 @@
* If this method fails (by throwing an exception), the action will fail 
and no Spark job will be
* submitted.
*
-   * @param jobId A unique string for the writing job. It's possible that 
there are many writing
-   *  jobs running at the same time, and the returned {@link 
DataSourceWriter} can
-   *  use this job id to distinguish itself from other jobs.
+   * @param writeUUID A unique string for the writing job. It's possible 
that there are many writing
+   *  jobs running at the same time, and the returned 
{@link DataSourceWriter} can
+   *  use this job id to distinguish itself from other 
jobs.
* @param schema the schema of the data to be written.
* @param mode the save mode which determines what to do when the data 
are already in this data
* source, please refer to {@link SaveMode} for more details.
* @param options the options for the returned data source writer, which 
is an immutable
*case-insensitive string-to-string map.
+   * @return a writer to append data to this data source
--- End diff --

non-append cases also call this `createWriter`, shall we remove this line?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21561: [SPARK-24555][ML] logNumExamples in KMeans/BiKM/GMM/AFT/...

2018-07-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21561
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1548/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21561: [SPARK-24555][ML] logNumExamples in KMeans/BiKM/GMM/AFT/...

2018-07-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21561
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #18589: [SPARK-16872][ML] Add Gaussian NB

2018-07-31 Thread zhengruifeng

Github user zhengruifeng closed the pull request at:

https://github.com/apache/spark/pull/18589


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21934: [SPARK-24951][SQL] Table valued functions should throw A...

2018-07-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21934
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #18389: [SPARK-14174][ML] Add minibatch kmeans

2018-07-31 Thread zhengruifeng

Github user zhengruifeng closed the pull request at:

https://github.com/apache/spark/pull/18389


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20636: [SPARK-23415][SQL][TEST] Make behavior of BufferH...

2018-07-31 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20636#discussion_r206748015
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/codegen/BufferHolderSparkSubmitSuite.scala
 ---
@@ -39,8 +39,8 @@ class BufferHolderSparkSubmitSuite
 val argsForSparkSubmit = Seq(
   "--class", 
BufferHolderSparkSubmitSuite.getClass.getName.stripSuffix("$"),
   "--name", "SPARK-2",
-  "--master", "local-cluster[2,1,1024]",
-  "--driver-memory", "4g",
+  "--master", "local-cluster[1,1,7168]",
--- End diff --

I think we support this for debugging purposes since, IIRC, it's going to 
create separate processes for workers.
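
For readers unfamiliar with the master string in the diff, here is a rough sketch of how it can be read, assuming the three fields are the number of workers, cores per worker, and memory per worker in MB. `LocalClusterMaster` and `Spec` are hypothetical names; this is not Spark's own parser.

```scala
object LocalClusterMaster {
  // Matches master URLs of the form local-cluster[workers,coresPerWorker,memoryPerWorkerMB].
  private val Pattern =
    """local-cluster\[\s*([0-9]+)\s*,\s*([0-9]+)\s*,\s*([0-9]+)\s*\]""".r

  final case class Spec(workers: Int, coresPerWorker: Int, memoryPerWorkerMb: Int)

  // Returns the parsed spec, or None when the string is not a local-cluster master.
  def parse(master: String): Option[Spec] = master match {
    case Pattern(w, c, m) => Some(Spec(w.toInt, c.toInt, m.toInt))
    case _ => None
  }
}
```

Under that reading, the change in the diff goes from two workers with 1024 MB each to a single worker with 7168 MB.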


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21934: [SPARK-24951][SQL] Table valued functions should throw A...

2018-07-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21934
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93849/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21934: [SPARK-24951][SQL] Table valued functions should throw A...

2018-07-31 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21934
  
**[Test build #93849 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93849/testReport)**
 for PR 21934 at commit 
[`514fd77`](https://github.com/apache/spark/commit/514fd77501194e43e8029734e4a3669f12fbf749).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21305: [SPARK-24251][SQL] Add AppendData logical plan.

2018-07-31 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21305#discussion_r206747528
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -2217,6 +2218,100 @@ class Analyzer(
 }
   }
 
+  /**
+   * Resolves columns of an output table from the data in a logical plan. 
This rule will:
+   *
+   * - Reorder columns when the write is by name
+   * - Insert safe casts when data types do not match
+   * - Insert aliases when column names do not match
+   * - Detect plans that are not compatible with the output table and 
throw AnalysisException
+   */
+  object ResolveOutputRelation extends Rule[LogicalPlan] {
+override def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+  case append @ AppendData(table, query, isByName)
+  if table.resolved && query.resolved && !append.resolved =>
+val projection = resolveOutputColumns(table.name, table.output, 
query, isByName)
+
+if (projection != query) {
+  append.copy(query = projection)
+} else {
+  append
+}
+}
+
+def resolveOutputColumns(
+tableName: String,
+expected: Seq[Attribute],
+query: LogicalPlan,
+byName: Boolean): LogicalPlan = {
+
+  if (expected.size < query.output.size) {
+throw new AnalysisException(
+  s"""Cannot write to '$tableName', too many data columns:
+ |Table columns: ${expected.map(_.name).mkString(", ")}
+ |Data columns: ${query.output.map(_.name).mkString(", 
")}""".stripMargin)
+  }
+
+  val errors = new mutable.ArrayBuffer[String]()
+  val resolved: Seq[NamedExpression] = if (byName) {
+expected.flatMap { outAttr =>
+  query.resolveQuoted(outAttr.name, resolver) match {
+case Some(inAttr) if inAttr.nullable && !outAttr.nullable =>
+  errors += s"Cannot write nullable values to non-null column 
'${outAttr.name}'"
+  None
+
+case Some(inAttr) if !DataType.canWrite(outAttr.dataType, 
inAttr.dataType, resolver) =>
+  Some(upcast(inAttr, outAttr))
+
+case Some(inAttr) =>
+  Some(inAttr) // matches nullability, datatype, and name
+
+case _ =>
+  errors += s"Cannot find data for output column 
'${outAttr.name}'"
+  None
+  }
+}
+
+  } else {
+if (expected.size > query.output.size) {
+  throw new AnalysisException(
+s"""Cannot write to '$tableName', not enough data columns:
+   |Table columns: ${expected.map(_.name).mkString(", ")}
+   |Data columns: ${query.output.map(_.name).mkString(", 
")}""".stripMargin)
+}
+
+query.output.zip(expected).flatMap {
+  case (inAttr, outAttr) if inAttr.nullable && !outAttr.nullable =>
+errors += s"Cannot write nullable values to non-null column 
'${outAttr.name}'"
+None
+
+  case (inAttr, outAttr)
+if !DataType.canWrite(inAttr.dataType, outAttr.dataType, 
resolver) ||
--- End diff --

Can't we always do upCast? If it can write, the upCast will be a no-op and 
removed by the optimizer.
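
A toy model of the idea, to make the suggestion concrete. The types below (`Attr`, `Cast`, `simplify`) are simplified stand-ins defined only for this sketch, not Catalyst classes, and aliasing for mismatched column names is left out.

```scala
object NoOpCast {
  sealed trait DataType
  case object IntType extends DataType
  case object LongType extends DataType

  sealed trait Expr { def dataType: DataType }
  final case class Attr(name: String, dataType: DataType) extends Expr
  final case class Cast(child: Expr, dataType: DataType) extends Expr

  // Always wrap the input column in a cast to the output column's type.
  def alwaysCast(in: Attr, out: Attr): Expr = Cast(in, out.dataType)

  // Toy optimizer rule: a cast to the type the child already has is dropped.
  def simplify(e: Expr): Expr = e match {
    case Cast(child, dt) if child.dataType == dt => simplify(child)
    case Cast(child, dt) => Cast(simplify(child), dt)
    case other => other
  }
}
```

When the types already match, `simplify(alwaysCast(in, out))` returns the input attribute unchanged, so casting unconditionally costs nothing after optimization in this model.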


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21935: [SPARK-24773] Avro: support logical timestamp typ...

2018-07-31 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21935#discussion_r206747402
  
--- Diff: 
external/avro/src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala 
---
@@ -35,6 +36,12 @@ object SchemaConverters {
* This function takes an avro schema and returns a sql schema.
*/
   def toSqlType(avroSchema: Schema): SchemaType = {
+avroSchema.getLogicalType match {
--- End diff --

ditto


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21935: [SPARK-24773] Avro: support logical timestamp typ...

2018-07-31 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21935#discussion_r206747243
  
--- Diff: 
external/avro/src/main/scala/org/apache/spark/sql/avro/AvroDeserializer.scala 
---
@@ -71,7 +72,15 @@ class AvroDeserializer(rootAvroType: Schema, 
rootCatalystType: DataType) {
   private def newWriter(
   avroType: Schema,
   catalystType: DataType,
-  path: List[String]): (CatalystDataUpdater, Int, Any) => Unit =
+  path: List[String]): (CatalystDataUpdater, Int, Any) => Unit = {
+(avroType.getLogicalType, catalystType) match {
--- End diff --

Can we do it like this?

```scala
case (LONG, TimestampType) => avroType.getLogicalType match {
  case _: TimestampMillis => (updater, ordinal, value) =>
    updater.setLong(ordinal, value.asInstanceOf[Long] * 1000)
  case _: TimestampMicros => (updater, ordinal, value) =>
    updater.setLong(ordinal, value.asInstanceOf[Long])
  case _ => (updater, ordinal, value) =>
    updater.setLong(ordinal, value.asInstanceOf[Long] * 1000)
}
```

It looks like they all have the Avro long type anyway. I think this is easier 
to read, and actually safer and more correct.
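
For reference, the unit conversion behind that suggestion can be sketched without any Avro or Spark dependency. The names below (`ToyLogicalKind`, `MillisKind`, `MicrosKind`, `NoKind`, `TimestampSketch`) are hypothetical stand-ins for the Avro logical types, and treating a plain long as milliseconds simply mirrors the fallback branch in the snippet above. Catalyst's `TimestampType` holds microseconds since the epoch, which is why millisecond values are multiplied by 1000.

```scala
// Hypothetical stand-ins for Avro logical types; not the org.apache.avro classes.
sealed trait ToyLogicalKind
case object MillisKind extends ToyLogicalKind
case object MicrosKind extends ToyLogicalKind
case object NoKind extends ToyLogicalKind

object TimestampSketch {
  // Convert a raw Avro long into microseconds since the epoch.
  def toCatalystMicros(kind: ToyLogicalKind, raw: Long): Long = kind match {
    case MillisKind => raw * 1000L
    case MicrosKind => raw
    // Assumption: a long without a logical type is read as milliseconds,
    // as in the fallback branch of the suggestion.
    case NoKind => raw * 1000L
  }
}
```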


---


[GitHub] spark pull request #21847: [SPARK-24855][SQL][EXTERNAL]: Built-in AVRO suppo...

2018-07-31 Thread lindblombr

Github user lindblombr commented on a diff in the pull request:

https://github.com/apache/spark/pull/21847#discussion_r206746980
  
--- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroSerializer.scala ---
@@ -165,16 +182,118 @@ class AvroSerializer(rootCatalystType: DataType, rootAvroType: Schema, nullable:
   result
   }
 
-  private def resolveNullableType(avroType: Schema, nullable: Boolean): Schema = {
-if (nullable) {
+  // Resolve an Avro union against a supplied DataType, i.e. a LongType compared against
+  // a ["null", "long"] should return a schema of type Schema.Type.LONG
+  // This function also handles resolving a DataType against unions of 2 or more types, i.e.
+  // an IntType resolves against a ["int", "long", "null"] will correctly return a schema of
+  // type Schema.Type.LONG
+  private def resolveUnionType(avroType: Schema, catalystType: DataType,
+  nullable: Boolean): Schema = {
+if (avroType.getType == Type.UNION) {
   // avro uses union to represent nullable type.
-  val fields = avroType.getTypes.asScala
-  assert(fields.length == 2)
-  val actualType = fields.filter(_.getType != NULL)
-  assert(actualType.length == 1)
+  val fieldTypes = avroType.getTypes.asScala
+
+  // If we're nullable, we need to have at least two types.  Cases with more than two types
+  // are captured in test("read read-write, read-write w/ schema, read") w/ test.avro input
+  if (nullable && fieldTypes.length < 2) {
+throw new IncompatibleSchemaException(
+  s"Cannot resolve nullable ${catalystType} against union type ${avroType}")
+  }
+
+  val actualType = catalystType match {
+case NullType => fieldTypes.filter(_.getType == Type.NULL)
+case BooleanType => fieldTypes.filter(_.getType == Type.BOOLEAN)
+case ByteType => fieldTypes.filter(_.getType == Type.INT)
+case BinaryType =>
+  val at = fieldTypes.filter(x => x.getType == Type.BYTES || x.getType == Type.FIXED)
+  if (at.length > 1) {
+throw new IncompatibleSchemaException(
+  s"Cannot resolve schema of ${catalystType} against union ${avroType.toString}")
+  } else {
+at
+  }
+case ShortType | IntegerType => fieldTypes.filter(_.getType == Type.INT)
+case LongType => fieldTypes.filter(_.getType == Type.LONG)
+case FloatType => fieldTypes.filter(_.getType == Type.FLOAT)
+case DoubleType => fieldTypes.filter(_.getType == Type.DOUBLE)
+case d: DecimalType => fieldTypes.filter(_.getType == Type.STRING)
+case StringType => fieldTypes
+  .filter(x => x.getType == Type.STRING || x.getType == Type.ENUM)
+case DateType => fieldTypes.filter(x => x.getType == Type.INT || x.getType == Type.LONG)
+case TimestampType => fieldTypes.filter(_.getType == Type.LONG)
+case ArrayType(et, containsNull) =>
+  // Find array that matches the element type specified
+  fieldTypes.filter(x => x.getType == Type.ARRAY
+&& typeMatchesSchema(et, x.getElementType))
+case st: StructType => // Find the matching record!
+  val recordTypes = fieldTypes.filter(x => x.getType == Type.RECORD)
+  if (recordTypes.length > 1) {
+throw new IncompatibleSchemaException(
+  "Unions of multiple record types are NOT supported with user-specified schema")
+  }
+  recordTypes
+case MapType(kt, vt, valueContainsNull) =>
+  // Find the map that matches the value type.  Maps in Avro are always key type string
+  fieldTypes.filter(x => x.getType == Type.MAP && typeMatchesSchema(vt, x.getValueType))
--- End diff --

In `SchemaConverters.toAvro`, the expectation is that Maps are keyed only 
with `StringType`:

case MapType(StringType, vt, valueContainsNull) =>
  builder.map().values(toAvroType(vt, valueContainsNull, recordName, prevNameSpace))

When you attempt the following trivial test case, it fails:
```
test("SPARK-24855: Maps with kv not string") {
  withTempPath { dir =>
    val someData = Seq(
      Row("a", Map(
          1 -> "foo",
          2 -> "bar",
          3 -> "baz"
        )
      ),
      Row("b", Map(
          1 -> "foo",
          2 -> "bar",
          3 -> "baz"
        )
      )
    )

    val someSchema = StructType(Seq(
        StructField("id", StringType, true),
        StructField("map", MapType(IntegerType, StringType), true)
      )
    )

[GitHub] spark pull request #21758: [SPARK-24795][CORE] Implement barrier execution m...

2018-07-31 Thread mengxr

Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/21758#discussion_r206746905
  
--- Diff: core/src/main/scala/org/apache/spark/BarrierTaskContext.scala ---
@@ -0,0 +1,42 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark
+
+import org.apache.spark.annotation.{Experimental, Since}
+
+/** A [[TaskContext]] with extra info and tooling for a barrier stage. */
+trait BarrierTaskContext extends TaskContext {
--- End diff --

Please check the generated JavaDoc. I think it becomes a Java interface 
with only two methods defined here. We might want to define `class 
BarrierTaskContext` directly.


---


[GitHub] spark pull request #21305: [SPARK-24251][SQL] Add AppendData logical plan.

2018-07-31 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21305#discussion_r206746478
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -2217,6 +2218,100 @@ class Analyzer(
 }
   }
 
+  /**
+   * Resolves columns of an output table from the data in a logical plan. This rule will:
+   *
+   * - Reorder columns when the write is by name
+   * - Insert safe casts when data types do not match
+   * - Insert aliases when column names do not match
+   * - Detect plans that are not compatible with the output table and throw AnalysisException
+   */
+  object ResolveOutputRelation extends Rule[LogicalPlan] {
+override def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+  case append @ AppendData(table, query, isByName)
+  if table.resolved && query.resolved && !append.resolved =>
+val projection = resolveOutputColumns(table.name, table.output, query, isByName)
+
+if (projection != query) {
+  append.copy(query = projection)
+} else {
+  append
+}
+}
+
+def resolveOutputColumns(
+tableName: String,
+expected: Seq[Attribute],
+query: LogicalPlan,
+byName: Boolean): LogicalPlan = {
+
+  if (expected.size < query.output.size) {
+throw new AnalysisException(
+  s"""Cannot write to '$tableName', too many data columns:
+ |Table columns: ${expected.map(_.name).mkString(", ")}
+ |Data columns: ${query.output.map(_.name).mkString(", ")}""".stripMargin)
+  }
+
+  val errors = new mutable.ArrayBuffer[String]()
+  val resolved: Seq[NamedExpression] = if (byName) {
+expected.flatMap { outAttr =>
+  query.resolveQuoted(outAttr.name, resolver) match {
+case Some(inAttr) if inAttr.nullable && !outAttr.nullable =>
--- End diff --

Shall we also check the nullability of nested fields?
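
A minimal sketch of what a nested check could look like, using a toy schema model rather than Spark's `StructType` (the names `ToyDataType`, `ToyString`, `ToyStruct`, `ToyField`, and `NullabilitySketch` are hypothetical, and fields are matched by position only). It reports every path where a nullable input field would be written into a non-nullable output field, including fields inside structs.

```scala
// Toy schema model; hypothetical names, not Spark's DataType hierarchy.
sealed trait ToyDataType
case object ToyString extends ToyDataType
case class ToyStruct(fields: Seq[ToyField]) extends ToyDataType
case class ToyField(name: String, dataType: ToyDataType, nullable: Boolean)

object NullabilitySketch {
  // Returns dotted paths where a nullable input meets a non-nullable output.
  def violations(in: ToyField, out: ToyField, prefix: String = ""): Seq[String] = {
    val path = prefix + out.name
    val here = if (in.nullable && !out.nullable) Seq(path) else Seq.empty[String]
    val nested = (in.dataType, out.dataType) match {
      // Recurse into struct fields, pairing them by position.
      case (ToyStruct(inFields), ToyStruct(outFields)) =>
        inFields.zip(outFields).flatMap { case (i, o) => violations(i, o, path + ".") }
      case _ => Seq.empty[String]
    }
    here ++ nested
  }
}
```

A top-level check alone would miss a case where the struct column itself is non-nullable on both sides but one of its inner fields is nullable only in the input.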


---


[GitHub] spark pull request #21305: [SPARK-24251][SQL] Add AppendData logical plan.

2018-07-31 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21305#discussion_r206746383
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---
@@ -352,6 +351,36 @@ case class Join(
   }
 }
 
+/**
+ * Append data to an existing table.
+ */
+case class AppendData(
+table: NamedRelation,
+query: LogicalPlan,
+isByName: Boolean) extends LogicalPlan {
+  override def children: Seq[LogicalPlan] = Seq(query)
--- End diff --

Why is `table` not a child? As written, we can't transform the table relation.
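
To make the concern concrete, here is a toy tree in plain Scala (hypothetical names `ToyPlan`, `ToyRelation`, `ToyAppend`; this is a simplified stand-in, not Catalyst's `TreeNode`). The traversal only descends into `children`, so a node held in a field that is not listed there is never visited by the rule.

```scala
// Toy plan tree; hypothetical names, a simplified stand-in for a tree transform.
sealed trait ToyPlan {
  def children: Seq[ToyPlan]
  def withChildren(newChildren: Seq[ToyPlan]): ToyPlan

  // Rewrite children first, then apply the rule to this node if it matches.
  def transformUp(rule: PartialFunction[ToyPlan, ToyPlan]): ToyPlan = {
    val rewritten = withChildren(children.map(_.transformUp(rule)))
    rule.applyOrElse(rewritten, (p: ToyPlan) => p)
  }
}

case class ToyRelation(name: String) extends ToyPlan {
  def children: Seq[ToyPlan] = Nil
  def withChildren(newChildren: Seq[ToyPlan]): ToyPlan = this
}

case class ToyAppend(table: ToyPlan, query: ToyPlan) extends ToyPlan {
  // Only `query` is exposed as a child, as in the diff above.
  def children: Seq[ToyPlan] = Seq(query)
  def withChildren(newChildren: Seq[ToyPlan]): ToyPlan = copy(query = newChildren.head)
}
```

With a rule that upper-cases relation names, `ToyAppend(ToyRelation("t"), ToyRelation("q"))` comes back with `query` rewritten to `ToyRelation("Q")` while `table` is left as `ToyRelation("t")`.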


---


[GitHub] spark issue #21854: [SPARK-24896][SQL] Uuid should produce different values ...

2018-07-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21854
  
Merged build finished. Test PASSed.


---

