[GitHub] spark issue #23268: [Hive][Minor] Refactor on HiveShim and Add Unit Tests

2018-12-09 Thread sadhen
Github user sadhen commented on the issue:

https://github.com/apache/spark/pull/23268
  
OK, I will modify the PR in a few hours.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23268: [Hive][Minor] Refactor on HiveShim and Add Unit T...

2018-12-09 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/23268#discussion_r240110126
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveShim.scala ---
@@ -53,19 +53,12 @@ private[hive] object HiveShim {
    * This function in hive-0.13 become private, but we have to do this to work around hive bug
    */
   private def appendReadColumnNames(conf: Configuration, cols: Seq[String]) {
-    val old: String = conf.get(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, "")
-    val result: StringBuilder = new StringBuilder(old)
-    var first: Boolean = old.isEmpty
-
-    for (col <- cols) {
-      if (first) {
-        first = false
-      } else {
-        result.append(',')
-      }
-      result.append(col)
-    }
-    conf.set(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, result.toString)
+    val key = ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR
+    val value = Option(conf.get(key, null))
+      .map(old => cols.+:(old))
+      .getOrElse(cols)
+      .mkString(",")
--- End diff --

Thanks. It doesn't matter which company it is. All the PRs are equally and 
reasonably reviewed, and merged per the same criteria.


---




[GitHub] spark issue #23270: [SPARK-26317][BUILD] Upgrade SBT to 0.13.18

2018-12-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23270
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5911/
Test PASSed.


---




[GitHub] spark issue #23269: [SPARK-26316] Currently the wrong implementation in the ...

2018-12-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23269
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #23270: [SPARK-26317][BUILD] Upgrade SBT to 0.13.18

2018-12-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23270
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #23269: [SPARK-26316] Currently the wrong implementation in the ...

2018-12-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23269
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99896/
Test FAILed.


---




[GitHub] spark issue #23269: [SPARK-26316] Currently the wrong implementation in the ...

2018-12-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23269
  
**[Test build #99896 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99896/testReport)** for PR 23269 at commit [`03cfe2b`](https://github.com/apache/spark/commit/03cfe2b7506f5c5421aaf2858f3f31f2153db8fb).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #23270: [SPARK-26317][BUILD] Upgrade SBT to 0.13.18

2018-12-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23270
  
**[Test build #99897 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99897/testReport)** for PR 23270 at commit [`0885c94`](https://github.com/apache/spark/commit/0885c947d3b4561df2d39f1bc9a35a06d7f0ed0c).


---




[GitHub] spark pull request #23270: [SPARK-26317][BUILD] Upgrade SBT to 0.13.18

2018-12-09 Thread dongjoon-hyun
GitHub user dongjoon-hyun opened a pull request:

https://github.com/apache/spark/pull/23270

[SPARK-26317][BUILD] Upgrade SBT to 0.13.18

## What changes were proposed in this pull request?

SBT 0.13.14 ~ 1.1.1 has a bug accessing `java.util.Base64.getDecoder` on JDK9+. It was fixed in 1.1.2 and backported to [0.13.18 (released on Nov 28th)](https://github.com/sbt/sbt/releases/tag/v0.13.18). This PR aims to update SBT.
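For context, a minimal standalone sketch of the JDK 8+ API in question (plain `java.util.Base64` usage, unrelated to sbt internals):

```scala
import java.nio.charset.StandardCharsets

// Decode a Base64 string with the JDK 8+ API that the affected sbt
// versions had trouble accessing on JDK 9+.
val decoded = java.util.Base64.getDecoder.decode("aGVsbG8=")
val text = new String(decoded, StandardCharsets.UTF_8)
// text == "hello"
```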

## How was this patch tested?

Pass the Jenkins with the building and existing tests.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dongjoon-hyun/spark SPARK-26317

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/23270.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #23270


commit 0885c947d3b4561df2d39f1bc9a35a06d7f0ed0c
Author: Dongjoon Hyun 
Date:   2018-12-10T07:48:16Z

[SPARK-26317][BUILD] Upgrade SBT to 0.13.18




---




[GitHub] spark pull request #23268: [Hive][Minor] Refactor on HiveShim and Add Unit T...

2018-12-09 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/23268#discussion_r240107064
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveShim.scala ---
@@ -53,19 +53,12 @@ private[hive] object HiveShim {
    * This function in hive-0.13 become private, but we have to do this to work around hive bug
    */
   private def appendReadColumnNames(conf: Configuration, cols: Seq[String]) {
-    val old: String = conf.get(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, "")
-    val result: StringBuilder = new StringBuilder(old)
-    var first: Boolean = old.isEmpty
-
-    for (col <- cols) {
-      if (first) {
-        first = false
-      } else {
-        result.append(',')
-      }
-      result.append(col)
-    }
-    conf.set(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, result.toString)
+    val key = ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR
+    val value = Option(conf.get(key, null))
+      .map(old => cols.+:(old))
--- End diff --

Ah, it's `:+` not `+:`. Yea, it's confusing.
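For readers following the thread, a minimal standalone sketch of the two operators being confused (plain Scala, not the HiveShim code itself):

```scala
val cols = Seq("b", "c")

// `+:` prepends the element. Because the operator name ends in `:`,
// `old +: cols` desugars to `cols.+:(old)`, which is what the diff wrote.
val prepended = "a" +: cols
// prepended == Seq("a", "b", "c")

// `:+` appends the element to the end of the sequence.
val appended = cols :+ "a"
// appended == Seq("b", "c", "a")
```

So `cols.+:(old)` puts the old value in front of the new columns, while `cols :+ old` puts it at the end, which is why the two spellings produce different output.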


---




[GitHub] spark pull request #23268: [Hive][Minor] Refactor on HiveShim and Add Unit T...

2018-12-09 Thread maomaoChibei
Github user maomaoChibei commented on a diff in the pull request:

https://github.com/apache/spark/pull/23268#discussion_r240107003
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveShim.scala ---
@@ -53,19 +53,12 @@ private[hive] object HiveShim {
    * This function in hive-0.13 become private, but we have to do this to work around hive bug
    */
   private def appendReadColumnNames(conf: Configuration, cols: Seq[String]) {
-    val old: String = conf.get(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, "")
-    val result: StringBuilder = new StringBuilder(old)
-    var first: Boolean = old.isEmpty
-
-    for (col <- cols) {
-      if (first) {
-        first = false
-      } else {
-        result.append(',')
-      }
-      result.append(col)
-    }
-    conf.set(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, result.toString)
+    val key = ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR
+    val value = Option(conf.get(key, null))
+      .map(old => cols.+:(old))
+      .getOrElse(cols)
+      .mkString(",")
--- End diff --

Thanks HyukjinKwon for the kind explanation. We are from an Alibaba subsidiary internet company working on real-time computation over petabyte-scale data. We currently have a real-time engine built on this Spark SQL, at least from the SQL-engine point of view. Its throughput is over 1k QPS with response latency under 100 ms. It is a non-stop, 24*7 online platform serving over ten million active users. If this market scale and platform scale meet the criteria, let me know; we are looking for further cooperation. Thanks again. maomao



---




[GitHub] spark issue #23132: [SPARK-26163][SQL] Parsing decimals from JSON using loca...

2018-12-09 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/23132
  
`spark.sql.legacy.decimalParsing.enabled` is still shown in the PR 
description and commit messages. 


---




[GitHub] spark pull request #23268: [Hive][Minor] Refactor on HiveShim and Add Unit T...

2018-12-09 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/23268#discussion_r240105070
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveShim.scala ---
@@ -53,19 +53,12 @@ private[hive] object HiveShim {
    * This function in hive-0.13 become private, but we have to do this to work around hive bug
    */
   private def appendReadColumnNames(conf: Configuration, cols: Seq[String]) {
-    val old: String = conf.get(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, "")
-    val result: StringBuilder = new StringBuilder(old)
-    var first: Boolean = old.isEmpty
-
-    for (col <- cols) {
-      if (first) {
-        first = false
-      } else {
-        result.append(',')
-      }
-      result.append(col)
-    }
-    conf.set(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, result.toString)
+    val key = ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR
+    val value = Option(conf.get(key, null))
+      .map(old => cols.+:(old))
--- End diff --

? output is different, no?


---




[GitHub] spark pull request #23124: [SPARK-25829][SQL] remove duplicated map keys wit...

2018-12-09 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/23124#discussion_r240104815
  
--- Diff: docs/sql-migration-guide-upgrade.md ---
@@ -27,6 +27,8 @@ displayTitle: Spark SQL Upgrading Guide
 
   - In Spark version 2.4 and earlier, float/double -0.0 is semantically equal to 0.0, but users can still distinguish them via `Dataset.show`, `Dataset.collect` etc. Since Spark 3.0, float/double -0.0 is replaced by 0.0 internally, and users can't distinguish them any more.
 
+  - In Spark version 2.4 and earlier, users can create a map with duplicated keys via built-in functions like `CreateMap`, `StringToMap`, etc. The behavior of map with duplicated keys is undefined, e.g. map look up respects the duplicated key appears first, `Dataset.collect` only keeps the duplicated key appears last, `MapKeys` returns duplicated keys, etc. Since Spark 3.0, these built-in functions will remove duplicated map keys with last wins policy. Users may still read map values with duplicated keys from data sources which do not enforce it (e.g. Parquet), the behavior will be udefined.
--- End diff --

A few typos. How about?

```
In Spark version 2.4 and earlier, users can create a map with duplicate keys via built-in functions like `CreateMap` and `StringToMap`. The behavior of map with duplicate keys is undefined. For example, the map lookup respects the duplicate key that appears first, `Dataset.collect` only keeps the duplicate key that appears last, and `MapKeys` returns duplicate keys. Since Spark 3.0, these built-in functions will remove duplicate map keys using the last-one-wins policy. Users may still read map values with duplicate keys from the data sources that do not enforce it (e.g. Parquet), but the behavior will be undefined.
```
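As a plain-Scala illustration (not Spark code) of the last-one-wins policy the suggested wording describes, Scala's own `toMap` already keeps the last value seen for each key:

```scala
// Duplicate key "a" appears twice; the later value overwrites the earlier one.
val entries = Seq("a" -> 1, "b" -> 2, "a" -> 3)
val deduped = entries.toMap
// deduped == Map("a" -> 3, "b" -> 2)
```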


---




[GitHub] spark pull request #23204: Revert "[SPARK-21052][SQL] Add hash map metrics t...

2018-12-09 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/23204#discussion_r240104812
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashJoin.scala ---
@@ -213,10 +213,6 @@ trait HashJoin {
         s"BroadcastHashJoin should not take $x as the JoinType")
     }
 
-    // At the end of the task, we update the avg hash probe.
-    TaskContext.get().addTaskCompletionListener[Unit](_ =>
--- End diff --

In this file, the `join` method takes `avgHashProbe: SQLMetric`; we should remove it.


---




[GitHub] spark pull request #23225: [SPARK-26287][CORE]Don't need to create an empty ...

2018-12-09 Thread wangjiaochun
Github user wangjiaochun commented on a diff in the pull request:

https://github.com/apache/spark/pull/23225#discussion_r239990587
  
--- Diff: core/src/main/java/org/apache/spark/shuffle/sort/ShuffleExternalSorter.java ---
@@ -161,6 +161,10 @@ private void writeSortedFile(boolean isLastFile) {
     final ShuffleInMemorySorter.ShuffleSorterIterator sortedRecords =
       inMemSorter.getSortedIterator();
 
+    // If there are no sorted records, so we don't need to create an empty spill file.
+    if (!sortedRecords.hasNext()) {
+      return;
+    }
--- End diff --

Okay, I will make the changes.


---




[GitHub] spark pull request #23225: [SPARK-26287][CORE]Don't need to create an empty ...

2018-12-09 Thread wangjiaochun
Github user wangjiaochun commented on a diff in the pull request:

https://github.com/apache/spark/pull/23225#discussion_r239990083
  
--- Diff: core/src/test/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriterSuite.java ---
@@ -235,6 +235,7 @@ public void writeEmptyIterator() throws Exception {
     final Option mapStatus = writer.stop(true);
     assertTrue(mapStatus.isDefined());
     assertTrue(mergedOutputFile.exists());
+    assertEquals(0, spillFilesCreated.size());
--- End diff --

I think it's unnecessary to add a new test case; instead we can delete lines 239~242 of this writeEmptyIterator test because they always pass.


---




[GitHub] spark pull request #23268: [Hive][Minor] Refactor on HiveShim and Add Unit T...

2018-12-09 Thread sadhen
Github user sadhen commented on a diff in the pull request:

https://github.com/apache/spark/pull/23268#discussion_r240103941
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveShim.scala ---
@@ -53,19 +53,12 @@ private[hive] object HiveShim {
    * This function in hive-0.13 become private, but we have to do this to work around hive bug
    */
   private def appendReadColumnNames(conf: Configuration, cols: Seq[String]) {
-    val old: String = conf.get(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, "")
-    val result: StringBuilder = new StringBuilder(old)
-    var first: Boolean = old.isEmpty
-
-    for (col <- cols) {
-      if (first) {
-        first = false
-      } else {
-        result.append(',')
-      }
-      result.append(col)
-    }
-    conf.set(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, result.toString)
+    val key = ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR
+    val value = Option(conf.get(key, null))
+      .map(old => cols.+:(old))
--- End diff --

Just an idiomatic way to write Scala.


---




[GitHub] spark issue #18784: [SPARK-21559][Mesos] remove mesos fine-grained mode

2018-12-09 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/18784
  
Looks inactive. @srowen and @felixcheung, do you know anyone who might be 
interested in this? 


---




[GitHub] spark issue #23269: [SPARK-26316] partial revert 21052 because of the perfor...

2018-12-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23269
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5910/
Test PASSed.


---




[GitHub] spark issue #23269: [SPARK-26316] partial revert 21052 because of the perfor...

2018-12-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23269
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #23269: [SPARK-26316] partial revert 21052 because of the perfor...

2018-12-09 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/23269
  
@JkSelf. Since the PR description already mentions the background and the partial revert, could you update the PR title to describe your solution more directly? `partial revert` doesn't give a clear idea.


---




[GitHub] spark pull request #23268: [Hive][Minor] Refactor on HiveShim and Add Unit T...

2018-12-09 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/23268#discussion_r240101927
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveShim.scala ---
@@ -53,19 +53,12 @@ private[hive] object HiveShim {
    * This function in hive-0.13 become private, but we have to do this to work around hive bug
    */
   private def appendReadColumnNames(conf: Configuration, cols: Seq[String]) {
-    val old: String = conf.get(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, "")
-    val result: StringBuilder = new StringBuilder(old)
-    var first: Boolean = old.isEmpty
-
-    for (col <- cols) {
-      if (first) {
-        first = false
-      } else {
-        result.append(',')
-      }
-      result.append(col)
-    }
-    conf.set(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, result.toString)
+    val key = ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR
+    val value = Option(conf.get(key, null))
+      .map(old => cols.+:(old))
+      .getOrElse(cols)
+      .mkString(",")
--- End diff --

Right, but I don't think it's more readable. It's a non-critical path, so the performance concern is secondary anyway.
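A hedged sketch (illustrative only, not the HiveShim code) of the same merge logic, written to sidestep the `+:`/`:+` ambiguity by concatenating sequences explicitly:

```scala
// Merge an optional existing comma-separated config value with new columns.
// `existing` models Option(conf.get(key, null)) from the diff above.
def mergeReadColumnNames(existing: Option[String], cols: Seq[String]): String =
  (existing.toSeq ++ cols).mkString(",")

val merged = mergeReadColumnNames(Some("a,b"), Seq("c", "d"))
// merged == "a,b,c,d" -- the old value stays in front
val fresh = mergeReadColumnNames(None, Seq("c", "d"))
// fresh == "c,d"      -- no separate empty-config branch needed
```

`existing.toSeq` is empty for `None`, so the empty-config case falls out without a dedicated branch.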


---




[GitHub] spark issue #23268: [Hive][Minor] Refactor on HiveShim and Add Unit Tests

2018-12-09 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/23268
  
Yup, the test looks okay in that way. Let's file a JIRA and only leave the 
test case.


---




[GitHub] spark issue #23269: [SPARK-26316] partial revert 21052 because of the perfor...

2018-12-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23269
  
**[Test build #99896 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99896/testReport)** for PR 23269 at commit [`03cfe2b`](https://github.com/apache/spark/commit/03cfe2b7506f5c5421aaf2858f3f31f2153db8fb).


---




[GitHub] spark issue #23269: [SPARK-26316] partial revert 21052 because of the perfor...

2018-12-09 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/23269
  
ok to test


---




[GitHub] spark pull request #23208: [SPARK-25530][SQL] data source v2 API refactor (b...

2018-12-09 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request:

https://github.com/apache/spark/pull/23208#discussion_r240101369
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala ---
@@ -17,52 +17,49 @@
 
 package org.apache.spark.sql.execution.datasources.v2
 
-import java.util.UUID
-
-import scala.collection.JavaConverters._
+import java.util.{Optional, UUID}
 
 import org.apache.spark.sql.{AnalysisException, SaveMode}
 import org.apache.spark.sql.catalyst.analysis.{MultiInstanceRelation, NamedRelation}
 import org.apache.spark.sql.catalyst.expressions.{AttributeReference, Expression}
 import org.apache.spark.sql.catalyst.plans.logical.{LeafNode, LogicalPlan, Statistics}
 import org.apache.spark.sql.catalyst.util.truncatedString
-import org.apache.spark.sql.sources.DataSourceRegister
 import org.apache.spark.sql.sources.v2._
 import org.apache.spark.sql.sources.v2.reader._
 import org.apache.spark.sql.sources.v2.writer.BatchWriteSupport
 import org.apache.spark.sql.types.StructType
 
 /**
- * A logical plan representing a data source v2 scan.
+ * A logical plan representing a data source v2 table.
  *
- * @param source An instance of a [[DataSourceV2]] implementation.
- * @param options The options for this scan. Used to create fresh [[BatchWriteSupport]].
- * @param userSpecifiedSchema The user-specified schema for this scan.
+ * @param table The table that this relation represents.
+ * @param options The options for this table operation. It's used to create fresh [[ScanBuilder]]
+ *                and [[BatchWriteSupport]].
  */
 case class DataSourceV2Relation(
-    // TODO: remove `source` when we finish API refactor for write.
-    source: TableProvider,
-    table: SupportsBatchRead,
+    table: Table,
     output: Seq[AttributeReference],
-    options: Map[String, String],
-    userSpecifiedSchema: Option[StructType] = None)
+    // TODO: use a simple case insensitive map instead.
+    options: DataSourceOptions)
   extends LeafNode with MultiInstanceRelation with NamedRelation {
 
-  import DataSourceV2Relation._
-
   override def name: String = table.name()
 
   override def simpleString: String = {
     s"RelationV2${truncatedString(output, "[", ", ", "]")} $name"
   }
 
-  def newWriteSupport(): BatchWriteSupport = source.createWriteSupport(options, schema)
-
-  def newScanBuilder(): ScanBuilder = {
-    val dsOptions = new DataSourceOptions(options.asJava)
+    table.newScanBuilder(dsOptions)
+  def newWriteSupport(inputSchema: StructType, mode: SaveMode): Optional[BatchWriteSupport] = {
--- End diff --

Nit: add a comment for the method, especially about when it returns `None`, even though it is explained in `SupportsBatchWrite.createBatchWriteSupport`.


---




[GitHub] spark issue #23204: Revert "[SPARK-21052][SQL] Add hash map metrics to join"

2018-12-09 Thread JkSelf
Github user JkSelf commented on the issue:

https://github.com/apache/spark/pull/23204
  
@cloud-fan the new PR is [here](https://github.com/apache/spark/pull/23269). I will close this one.


---




[GitHub] spark issue #23268: [Hive][Minor] Refactor on HiveShim and Add Unit Tests

2018-12-09 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/23268
  
The existing way is too Java-style.


---




[GitHub] spark issue #23204: Revert "[SPARK-21052][SQL] Add hash map metrics to join"

2018-12-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23204
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #23204: Revert "[SPARK-21052][SQL] Add hash map metrics to join"

2018-12-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23204
  
**[Test build #99894 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99894/testReport)** for PR 23204 at commit [`9baf05e`](https://github.com/apache/spark/commit/9baf05e00f2839a1dfa5d3a23cf265e1fe21d2a6).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #23204: Revert "[SPARK-21052][SQL] Add hash map metrics to join"

2018-12-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23204
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99894/
Test FAILed.


---




[GitHub] spark issue #23268: [Hive][Minor] Refactor on HiveShim and Add Unit Tests

2018-12-09 Thread sadhen
Github user sadhen commented on the issue:

https://github.com/apache/spark/pull/23268
  
Never mind, I just don't like the coding style, personally.


---




[GitHub] spark issue #23269: partial revert 21052 because of the performance degradat...

2018-12-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23269
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #23269: partial revert 21052 because of the performance degradat...

2018-12-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23269
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #23269: partial revert 21052 because of the performance degradat...

2018-12-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23269
  
Can one of the admins verify this patch?


---




[GitHub] spark pull request #23269: partial revert 21052 because of the performance d...

2018-12-09 Thread JkSelf
GitHub user JkSelf opened a pull request:

https://github.com/apache/spark/pull/23269

partial revert 21052 because of the performance degradation in TPC-DS

## What changes were proposed in this pull request?
We tested TPC-DS on Spark 2.3 with and without [L486](https://github.com/apache/spark/blob/1d3dd58d21400b5652b75af7e7e53aad85a31528/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala#L486) and [L487](https://github.com/apache/spark/blob/1d3dd58d21400b5652b75af7e7e53aad85a31528/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala#L487) in the following cluster configuration. The [TPC-DS result](https://docs.google.com/spreadsheets/d/18a5BdOlmm8euTaRodyeWum9yu92mbWWu6JbhGXtr7yE/edit#gid=0) shows a performance degradation, so we partially revert 21052.
**Cluster info:**

  | Master Node | Worker Nodes
-- | -- | --
Node | 1x | 4x
Processor | Intel(R) Xeon(R) Platinum 8170 CPU @ 2.10GHz | Intel(R) Xeon(R) Platinum 8180 CPU @ 2.50GHz
Memory | 192 GB | 384 GB
Storage Main | 8 x 960G SSD | 8 x 960G SSD
Network | 10Gbe |  
Role | CM Management, NameNode, Secondary NameNode, Resource Manager, Hive Metastore Server | DataNode, NodeManager
OS Version | CentOS 7.2 | CentOS 7.2
Hadoop | Apache Hadoop 2.7.5 | Apache Hadoop 2.7.5
Hive | Apache Hive 2.2.0 |  
Spark | Apache Spark 2.1.0 & Apache Spark 2.3.0 |  
JDK version | 1.8.0_112 | 1.8.0_112

**Related parameters setting:**

Component | Parameter | Value
-- | -- | --
Yarn Resource Manager | yarn.scheduler.maximum-allocation-mb | 120GB
  | yarn.scheduler.minimum-allocation-mb | 1GB
  | yarn.scheduler.maximum-allocation-vcores | 121
  | yarn.resourcemanager.scheduler.class | Fair Scheduler
Yarn Node Manager | yarn.nodemanager.resource.memory-mb | 120GB
  | yarn.nodemanager.resource.cpu-vcores | 121
Spark | spark.executor.memory | 110GB
  | spark.executor.cores | 50





## How was this patch tested?
N/A

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/JkSelf/spark partial-revert-21052

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/23269.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #23269


commit 03cfe2b7506f5c5421aaf2858f3f31f2153db8fb
Author: jiake 
Date:   2018-12-10T06:50:32Z

partial revert 21052 because of the performance degradation in tpc-ds




---




[GitHub] spark pull request #23255: [SPARK-26307] [SQL] Fix CTAS when INSERT a partit...

2018-12-09 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/23255


---




[GitHub] spark issue #23255: [SPARK-26307] [SQL] Fix CTAS when INSERT a partitioned t...

2018-12-09 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/23255
  
thanks, merging to master/2.4/2.3!


---




[GitHub] spark pull request #23268: [Hive][Minor] Refactor on HiveShim and Add Unit T...

2018-12-09 Thread sadhen
Github user sadhen commented on a diff in the pull request:

https://github.com/apache/spark/pull/23268#discussion_r240098806
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveShim.scala 
---
@@ -53,19 +53,12 @@ private[hive] object HiveShim {
* This function in hive-0.13 become private, but we have to do this to 
work around hive bug
*/
   private def appendReadColumnNames(conf: Configuration, cols: 
Seq[String]) {
-val old: String = 
conf.get(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, "")
-val result: StringBuilder = new StringBuilder(old)
-var first: Boolean = old.isEmpty
-
-for (col <- cols) {
-  if (first) {
-first = false
-  } else {
-result.append(',')
-  }
-  result.append(col)
-}
-conf.set(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, 
result.toString)
+val key = ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR
+val value = Option(conf.get(key, null))
+  .map(old => cols.+:(old))
+  .getOrElse(cols)
+  .mkString(",")
--- End diff --

A for-comprehension is just syntactic sugar and is not performant at all. 


---




[GitHub] spark pull request #23268: [Hive][Minor] Refactor on HiveShim and Add Unit T...

2018-12-09 Thread sadhen
Github user sadhen commented on a diff in the pull request:

https://github.com/apache/spark/pull/23268#discussion_r240098620
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveShim.scala 
---
@@ -53,19 +53,12 @@ private[hive] object HiveShim {
* This function in hive-0.13 become private, but we have to do this to 
work around hive bug
*/
   private def appendReadColumnNames(conf: Configuration, cols: 
Seq[String]) {
-val old: String = 
conf.get(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, "")
-val result: StringBuilder = new StringBuilder(old)
-var first: Boolean = old.isEmpty
-
-for (col <- cols) {
-  if (first) {
-first = false
-  } else {
-result.append(',')
-  }
-  result.append(col)
-}
-conf.set(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, 
result.toString)
+val key = ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR
+val value = Option(conf.get(key, null))
+  .map(old => cols.+:(old))
+  .getOrElse(cols)
+  .mkString(",")
--- End diff --

As for performance:

Current Version:
```
[info] HiveShimSuite:
[info] - appendReadColumns (549 milliseconds)
```

Previous Version:
```
[info] HiveShimSuite:
[info] - appendReadColumns (949 milliseconds)
```
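
For reference, the two variants being compared can be reproduced outside of Hive with plain collections. This is a self-contained sketch, not the actual `HiveShim` code: a plain `Option[String]` stands in for the Hadoop `Configuration` lookup.

```scala
// Old variant: StringBuilder seeded with the previous value.
def appendOld(old: String, cols: Seq[String]): String = {
  val result = new StringBuilder(old)
  var first = old.isEmpty
  for (col <- cols) {
    if (first) first = false else result.append(',')
    result.append(col)
  }
  result.toString
}

// New variant: prepend the previous value (if any) and mkString.
def appendNew(old: Option[String], cols: Seq[String]): String =
  old.map(o => cols.+:(o)).getOrElse(cols).mkString(",")
```

Both keep the previously configured columns first: `appendOld("a,b", Seq("c", "d"))` and `appendNew(Some("a,b"), Seq("c", "d"))` each return `"a,b,c,d"`.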


---




[GitHub] spark issue #23268: [Hive][Minor] Refactor on HiveShim and Add Unit Tests

2018-12-09 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/23268
  
@sadhen What is the motivation of this PR?


---




[GitHub] spark issue #23268: [Hive][Minor] Refactor on HiveShim and Add Unit Tests

2018-12-09 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/23268
  
Let's close this one.


---




[GitHub] spark pull request #23268: [Hive][Minor] Refactor on HiveShim and Add Unit T...

2018-12-09 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/23268#discussion_r240097931
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveShim.scala 
---
@@ -53,19 +53,12 @@ private[hive] object HiveShim {
* This function in hive-0.13 become private, but we have to do this to 
work around hive bug
*/
   private def appendReadColumnNames(conf: Configuration, cols: 
Seq[String]) {
-val old: String = 
conf.get(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, "")
-val result: StringBuilder = new StringBuilder(old)
-var first: Boolean = old.isEmpty
-
-for (col <- cols) {
-  if (first) {
-first = false
-  } else {
-result.append(',')
-  }
-  result.append(col)
-}
-conf.set(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, 
result.toString)
+val key = ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR
+val value = Option(conf.get(key, null))
+  .map(old => cols.+:(old))
--- End diff --

Also, it looks like the output would be different after this change.
It looks like it was `old + col`, but the current change looks like it does `col + old`.
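
Note that `Seq#+:` prepends its argument, so `cols.+:(old)` is the same as `old +: cols` and the old value ends up first. A quick standalone check:

```scala
val cols = Seq("c1", "c2")
// +: prepends, so the previously configured value comes first.
val merged = cols.+:("old").mkString(",")
// merged == "old,c1,c2"
```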


---




[GitHub] spark issue #23211: [SPARK-19712][SQL] Move PullupCorrelatedPredicates and R...

2018-12-09 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/23211
  
to make the PR smaller, can we add an individual rule 
`PushdownLeftSemiOrAntiJoin` first?


---




[GitHub] spark pull request #23225: [SPARK-26287][CORE]Don't need to create an empty ...

2018-12-09 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/23225


---




[GitHub] spark pull request #23211: [SPARK-19712][SQL] Move PullupCorrelatedPredicate...

2018-12-09 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/23211#discussion_r240097479
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ---
@@ -984,6 +1002,28 @@ object PushDownPredicate extends Rule[LogicalPlan] 
with PredicateHelper {
 
   project.copy(child = Filter(replaceAlias(condition, aliasMap), 
grandChild))
 
+// Similar to the above Filter over Project
+// LeftSemi/LeftAnti over Project
+case join @ Join(p @ Project(pList, grandChild), rightOp, 
LeftSemiOrAnti(joinType), joinCond)
--- End diff --

Shall we create a new rule `PushdownLeftSemiOrAntiJoin`?


---




[GitHub] spark issue #23268: [Hive][Minor] Refactor on HiveShim and Add Unit Tests

2018-12-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23268
  
**[Test build #99895 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99895/testReport)**
 for PR 23268 at commit 
[`b68e7b1`](https://github.com/apache/spark/commit/b68e7b1421f977c7256573564b12b4cc07d31f4a).


---




[GitHub] spark pull request #23211: [SPARK-19712][SQL] Move PullupCorrelatedPredicate...

2018-12-09 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/23211#discussion_r240097255
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ---
@@ -649,13 +664,16 @@ object CollapseProject extends Rule[LogicalPlan] {
 
   def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
 case p1 @ Project(_, p2: Project) =>
-  if (haveCommonNonDeterministicOutput(p1.projectList, 
p2.projectList)) {
+  if (haveCommonNonDeterministicOutput(p1.projectList, p2.projectList) 
||
+ScalarSubquery.hasScalarSubquery(p1.projectList) ||
+ScalarSubquery.hasScalarSubquery(p2.projectList)) {
--- End diff --

why did we allow it before?


---




[GitHub] spark issue #23225: [SPARK-26287][CORE]Don't need to create an empty spill f...

2018-12-09 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/23225
  
Thank you, @wangjiaochun .


---




[GitHub] spark issue #23268: [Hive][Minor] Refactor on HiveShim and Add Unit Tests

2018-12-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23268
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #23268: [Hive][Minor] Refactor on HiveShim and Add Unit Tests

2018-12-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23268
  
Can one of the admins verify this patch?


---




[GitHub] spark pull request #23268: [Hive][Minor] Refactor on HiveShim and Add Unit T...

2018-12-09 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/23268#discussion_r240097105
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveShim.scala 
---
@@ -53,19 +53,12 @@ private[hive] object HiveShim {
* This function in hive-0.13 become private, but we have to do this to 
work around hive bug
*/
   private def appendReadColumnNames(conf: Configuration, cols: 
Seq[String]) {
-val old: String = 
conf.get(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, "")
-val result: StringBuilder = new StringBuilder(old)
-var first: Boolean = old.isEmpty
-
-for (col <- cols) {
-  if (first) {
-first = false
-  } else {
-result.append(',')
-  }
-  result.append(col)
-}
-conf.set(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, 
result.toString)
+val key = ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR
+val value = Option(conf.get(key, null))
+  .map(old => cols.+:(old))
+  .getOrElse(cols)
+  .mkString(",")
--- End diff --

I don't think this is more readable. Also, the previous code is more 
performant. I wouldn't change this.


---




[GitHub] spark issue #23255: [SPARK-26307] [SQL] Fix CTAS when INSERT a partitioned t...

2018-12-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23255
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #23255: [SPARK-26307] [SQL] Fix CTAS when INSERT a partitioned t...

2018-12-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23255
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99891/
Test PASSed.


---




[GitHub] spark issue #23255: [SPARK-26307] [SQL] Fix CTAS when INSERT a partitioned t...

2018-12-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23255
  
**[Test build #99891 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99891/testReport)**
 for PR 23255 at commit 
[`68c076a`](https://github.com/apache/spark/commit/68c076a34a1406057effb6953e2cedfadabd1324).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #23204: Revert "[SPARK-21052][SQL] Add hash map metrics to join"

2018-12-09 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/23204
  
can we follow 
https://github.com/apache/spark/pull/23204#issuecomment-445510026 and create a 
new ticket?


---




[GitHub] spark pull request #23268: [Hive][Minor] Refactor on HiveShim and Add Unit T...

2018-12-09 Thread sadhen
GitHub user sadhen opened a pull request:

https://github.com/apache/spark/pull/23268

[Hive][Minor] Refactor on HiveShim and Add Unit Tests

## What changes were proposed in this pull request?

Refactor HiveShim and add unit tests.

## How was this patch tested?
```
$ build/sbt
> project hive
> testOnly *HiveShimSuite
```


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sadhen/spark refactor/hiveshim

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/23268.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #23268


commit 987fa18a3b50e117fc839201fcc29c166732486e
Author: Darcy Shen 
Date:   2018-12-10T06:32:35Z

Add unit tests for HiveShim appendReadColumns

commit b68e7b1421f977c7256573564b12b4cc07d31f4a
Author: Darcy Shen 
Date:   2018-12-10T06:38:19Z

Refactor




---




[GitHub] spark issue #22290: [SPARK-25285][CORE] Add executor task metrics, successfu...

2018-12-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22290
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #23204: Revert "[SPARK-21052][SQL] Add hash map metrics to join"

2018-12-09 Thread LuciferYang
Github user LuciferYang commented on the issue:

https://github.com/apache/spark/pull/23204
  
ok~ already closed #23214


---




[GitHub] spark pull request #23214: [SPARK-26155] Optimizing the performance of LongT...

2018-12-09 Thread LuciferYang
Github user LuciferYang closed the pull request at:

https://github.com/apache/spark/pull/23214


---




[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...

2018-12-09 Thread LuciferYang
Github user LuciferYang commented on the issue:

https://github.com/apache/spark/pull/23214
  
As @cloud-fan said, `the hash join metrics is wrongly implemented`, so we will partially revert SPARK-21052 and no longer need this patch; closing it ~


---




[GitHub] spark issue #23225: [SPARK-26287][CORE]Don't need to create an empty spill f...

2018-12-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23225
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #23225: [SPARK-26287][CORE]Don't need to create an empty spill f...

2018-12-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23225
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99890/
Test PASSed.


---




[GitHub] spark issue #23204: Revert "[SPARK-21052][SQL] Add hash map metrics to join"

2018-12-09 Thread JkSelf
Github user JkSelf commented on the issue:

https://github.com/apache/spark/pull/23204
  
@cloud-fan @dongjoon-hyun  I updated the patch; please help review it if you have 
time. Thanks.


---




[GitHub] spark issue #23225: [SPARK-26287][CORE]Don't need to create an empty spill f...

2018-12-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23225
  
**[Test build #99890 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99890/testReport)**
 for PR 23225 at commit 
[`bac1991`](https://github.com/apache/spark/commit/bac1991a5c047fd546785179d19144f9c076aa58).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #23204: Revert "[SPARK-21052][SQL] Add hash map metrics to join"

2018-12-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23204
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5909/
Test PASSed.


---




[GitHub] spark issue #23204: Revert "[SPARK-21052][SQL] Add hash map metrics to join"

2018-12-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23204
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #23204: Revert "[SPARK-21052][SQL] Add hash map metrics to join"

2018-12-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23204
  
**[Test build #99894 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99894/testReport)**
 for PR 23204 at commit 
[`9baf05e`](https://github.com/apache/spark/commit/9baf05e00f2839a1dfa5d3a23cf265e1fe21d2a6).


---




[GitHub] spark pull request #23211: [SPARK-19712][SQL] Move PullupCorrelatedPredicate...

2018-12-09 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/23211#discussion_r240092936
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/subquery.scala
 ---
@@ -267,6 +267,17 @@ object ScalarSubquery {
   case _ => false
 }.isDefined
   }
+
+  def hasScalarSubquery(e: Expression): Boolean = {
+e.find {
+  case s: ScalarSubquery => true
+  case _ => false
+}.isDefined
+  }
+
+  def hasScalarSubquery(e: Seq[Expression]): Boolean = {
+e.find(hasScalarSubquery(_)).isDefined
--- End diff --

`e.exists(hasScalarSubquery)`
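
`exists` short-circuits the same way as `find(...).isDefined` but skips building the intermediate `Option`. A standalone check of the equivalence:

```scala
val xs = Seq(1, 2, 3)
// The suggested rewrite is behaviorally identical to find(...).isDefined,
// including on the empty sequence.
assert(xs.find(_ > 2).isDefined == xs.exists(_ > 2))
assert(Seq.empty[Int].find(_ > 2).isDefined == Seq.empty[Int].exists(_ > 2))
```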


---




[GitHub] spark pull request #22305: [SPARK-24561][SQL][Python] User-defined window ag...

2018-12-09 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22305#discussion_r240092271
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/python/WindowInPandasExec.scala
 ---
@@ -144,24 +282,107 @@ case class WindowInPandasExec(
 queue.close()
   }
 
-  val inputProj = UnsafeProjection.create(allInputs, child.output)
-  val pythonInput = grouped.map { case (_, rows) =>
-rows.map { row =>
-  queue.add(row.asInstanceOf[UnsafeRow])
-  inputProj(row)
+  val stream = iter.map { row =>
+queue.add(row.asInstanceOf[UnsafeRow])
+row
+  }
+
+  val pythonInput = new Iterator[Iterator[UnsafeRow]] {
+
+// Manage the stream and the grouping.
+var nextRow: UnsafeRow = null
+var nextGroup: UnsafeRow = null
+var nextRowAvailable: Boolean = false
+private[this] def fetchNextRow() {
+  nextRowAvailable = stream.hasNext
+  if (nextRowAvailable) {
+nextRow = stream.next().asInstanceOf[UnsafeRow]
+nextGroup = grouping(nextRow)
+  } else {
+nextRow = null
+nextGroup = null
+  }
+}
+fetchNextRow()
+
+// Manage the current partition.
+val buffer: ExternalAppendOnlyUnsafeRowArray =
+  new ExternalAppendOnlyUnsafeRowArray(inMemoryThreshold, 
spillThreshold)
+var bufferIterator: Iterator[UnsafeRow] = _
+
+val indexRow = new 
SpecificInternalRow(Array.fill(numBoundIndices)(IntegerType))
+
+val frames = factories.map(_(indexRow))
+
+private[this] def fetchNextPartition() {
+  // Collect all the rows in the current partition.
+  // Before we start to fetch new input rows, make a copy of 
nextGroup.
+  val currentGroup = nextGroup.copy()
+
+  // clear last partition
+  buffer.clear()
+
+  while (nextRowAvailable && nextGroup == currentGroup) {
--- End diff --

Thanks! Could you submit the PR to fix `WindowExec` please?
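
The buffering logic in the diff above walks a row iterator that is clustered by grouping key and materializes one partition at a time. Stripped of the Spark types, the idea can be sketched as follows (a hypothetical simplification, not the actual `WindowInPandasExec` code):

```scala
// Group a key-clustered iterator into one Vector per partition,
// fetching lazily in the style of fetchNextRow()/fetchNextPartition().
def groupSorted[R, K](rows: Iterator[R])(key: R => K): Iterator[Vector[R]] =
  new Iterator[Vector[R]] {
    private val it = rows.buffered
    def hasNext: Boolean = it.hasNext
    def next(): Vector[R] = {
      val currentGroup = key(it.head) // snapshot of the group key
      val buf = Vector.newBuilder[R]
      while (it.hasNext && key(it.head) == currentGroup) buf += it.next()
      buf.result()
    }
  }
```

For example, `groupSorted(Iterator(1, 1, 2, 3, 3))(identity).toList` yields `List(Vector(1, 1), Vector(2), Vector(3, 3))`.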


---




[GitHub] spark issue #21877: [SPARK-24923][SQL][WIP] Add unpartitioned CTAS and RTAS ...

2018-12-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21877
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99893/
Test FAILed.


---




[GitHub] spark issue #21877: [SPARK-24923][SQL][WIP] Add unpartitioned CTAS and RTAS ...

2018-12-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21877
  
Build finished. Test FAILed.


---




[GitHub] spark issue #21877: [SPARK-24923][SQL][WIP] Add unpartitioned CTAS and RTAS ...

2018-12-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21877
  
**[Test build #99893 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99893/testReport)**
 for PR 21877 at commit 
[`e50d94b`](https://github.com/apache/spark/commit/e50d94bbc205369969e0c1707bda057ee99f0007).
 * This patch **fails to build**.
 * This patch **does not merge cleanly**.
 * This patch adds the following public classes _(experimental)_:
  * `case class CreateTableAsSelect(`
  * `case class ReplaceTableAsSelect(`
  * `case class TableV2Relation(`
  * `case class AppendDataExec(`
  * `case class CreateTableAsSelectExec(`
  * `case class ReplaceTableAsSelectExec(`
  * `case class WriteToDataSourceV2Exec(`
  * `abstract class V2TableWriteExec(`
  * `  implicit class CatalogHelper(catalog: CatalogProvider) `
  * `  implicit class TableHelper(table: Table) `
  * `  implicit class SourceHelper(source: DataSourceV2) `
  * `  implicit class OptionsHelper(options: Map[String, String]) `


---




[GitHub] spark issue #23248: [SPARK-26293][SQL] Cast exception when having python udf...

2018-12-09 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/23248
  
If it's fine for 2.4, I think it's also fine for master as a temporary fix? 
We can create another ticket to clean up the subquery optimization hack. IIUC 
https://github.com/apache/spark/pull/23211 may help with it.


---




[GitHub] spark issue #23228: [MINOR][DOC] Update the condition description of seriali...

2018-12-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23228
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5908/
Test PASSed.


---




[GitHub] spark issue #21877: [SPARK-24923][SQL][WIP] Add unpartitioned CTAS and RTAS ...

2018-12-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21877
  
**[Test build #99893 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99893/testReport)**
 for PR 21877 at commit 
[`e50d94b`](https://github.com/apache/spark/commit/e50d94bbc205369969e0c1707bda057ee99f0007).


---




[GitHub] spark issue #23228: [MINOR][DOC] Update the condition description of seriali...

2018-12-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23228
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #23258: [SPARK-23375][SQL][FOLLOWUP][TEST] Test Sort metr...

2018-12-09 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/23258#discussion_r240090371
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala
 ---
@@ -182,10 +182,13 @@ class SQLMetricsSuite extends SparkFunSuite with 
SQLMetricsTestUtils with Shared
   }
 
   test("Sort metrics") {
-// Assume the execution plan is
-// WholeStageCodegen(nodeId = 0, Range(nodeId = 2) -> Sort(nodeId = 1))
-val ds = spark.range(10).sort('id)
-testSparkPlanMetrics(ds.toDF(), 2, Map.empty)
+// Assume the execution plan with node id is
+// Sort(nodeId = 0)
+//   Exchange(nodeId = 1)
+// LocalTableScan(nodeId = 2)
+val df = Seq(1, 3, 2).toDF("id").sort('id)
+testSparkPlanMetrics(df, 2, Map.empty)
--- End diff --

can we just check something like `sortTime  > 0`?
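
A sketch of what such a check could look like; this assumes the suite's `SparkSession` and `spark.implicits._` are in scope, and relies on `SortExec` registering its timing metric under the key `"sortTime"`:

```scala
import org.apache.spark.sql.execution.SortExec

val df = Seq(1, 3, 2).toDF("id").sort('id)
df.collect() // run the query so the metrics are populated

// Assert the Sort node's timing metric exists and is sane,
// rather than matching an exact value.
val sortNode = df.queryExecution.executedPlan
  .collect { case s: SortExec => s }.head
assert(sortNode.metrics("sortTime").value >= 0)
```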


---




[GitHub] spark pull request #23201: [SPARK-26246][SQL] Infer date and timestamp types...

2018-12-09 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/23201#discussion_r240090192
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonInferSchema.scala
 ---
@@ -121,7 +122,26 @@ private[sql] class JsonInferSchema(options: 
JSONOptions) extends Serializable {
 DecimalType(bigDecimal.precision, bigDecimal.scale)
 }
 decimalTry.getOrElse(StringType)
-  case VALUE_STRING => StringType
+  case VALUE_STRING =>
+val stringValue = parser.getText
--- End diff --

the partition feature is shared between all the file-based sources; I think 
it's overkill to make it behave differently across data sources.

The simplest solution to me is asking all text sources to follow the 
behavior of partition value type inference.
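
The inference question above (when should a string value become a date?) can be illustrated with a minimal try-parse fallback. This is a hypothetical sketch; the real `JsonInferSchema` change works with the configured date/timestamp formats, not hard-coded ISO parsing:

```scala
import java.time.LocalDate
import java.time.format.DateTimeParseException

// Try to read a string field as an ISO-8601 date; fall back to string.
def inferStringField(s: String): String =
  try { LocalDate.parse(s); "date" }
  catch { case _: DateTimeParseException => "string" }

// inferStringField("2018-12-09") == "date"
// inferStringField("hello")      == "string"
```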


---




[GitHub] spark issue #23228: [MINOR][DOC] Update the condition description of seriali...

2018-12-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23228
  
**[Test build #99892 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99892/testReport)**
 for PR 23228 at commit 
[`d5dadbf`](https://github.com/apache/spark/commit/d5dadbf30d5429c36ec3d5c2845a71c2717fd6f3).


---




[GitHub] spark pull request #23265: [2.4][SPARK-26021][SQL][FOLLOWUP] only deal with ...

2018-12-09 Thread cloud-fan
Github user cloud-fan closed the pull request at:

https://github.com/apache/spark/pull/23265


---




[GitHub] spark issue #23228: [MINOR][DOC] Update the condition description of seriali...

2018-12-09 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/23228
  
retest this please


---




[GitHub] spark issue #23204: Revert "[SPARK-21052][SQL] Add hash map metrics to join"

2018-12-09 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/23204
  
If we can quickly finish #23214 (within several days), let's go for it. But 
if we can't, I'd suggest we do the partial revert first to fix the perf 
regression, and add back the metrics later.


---




[GitHub] spark issue #23204: Revert "[SPARK-21052][SQL] Add hash map metrics to join"

2018-12-09 Thread LuciferYang
Github user LuciferYang commented on the issue:

https://github.com/apache/spark/pull/23204
  
@cloud-fan  If we decide to partially revert SPARK-21052 and no longer need 
#23214, I will close it.



---




[GitHub] spark issue #23255: [SPARK-26307] [SQL] Fix CTAS when INSERT a partitioned t...

2018-12-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23255
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #23255: [SPARK-26307] [SQL] Fix CTAS when INSERT a partitioned t...

2018-12-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23255
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5907/
Test PASSed.


---




[GitHub] spark issue #23255: [SPARK-26307] [SQL] Fix CTAS when INSERT a partitioned t...

2018-12-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23255
  
**[Test build #99891 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99891/testReport)**
 for PR 23255 at commit 
[`68c076a`](https://github.com/apache/spark/commit/68c076a34a1406057effb6953e2cedfadabd1324).


---




[GitHub] spark issue #23223: [SPARK-26269][YARN]Yarnallocator should have same blackl...

2018-12-09 Thread Ngone51
Github user Ngone51 commented on the issue:

https://github.com/apache/spark/pull/23223
  
Hi @tgravescs, I tried it, but found it difficult to reproduce the 
KILLED_BY_RESOURCEMANAGER exit status. I followed 
[YARN-73](https://issues.apache.org/jira/browse/YARN-73) and 
[YARN-495](https://issues.apache.org/jira/browse/YARN-495), but things didn't 
go as I expected.


---




[GitHub] spark issue #23248: [SPARK-26293][SQL] Cast exception when having python udf...

2018-12-09 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/23248
  
LGTM for the surgical fix for backporting. 

We need to fix this rule along with the other rules to avoid relying on such a 
strong, hidden assumption. 


---




[GitHub] spark pull request #23248: [SPARK-26293][SQL] Cast exception when having pyt...

2018-12-09 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/23248#discussion_r240079120
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/python/ExtractPythonUDFs.scala
 ---
@@ -131,8 +131,20 @@ object ExtractPythonUDFs extends Rule[LogicalPlan] 
with PredicateHelper {
 expressions.flatMap(collectEvaluableUDFs)
   }
 
-  def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
-case plan: LogicalPlan => extract(plan)
+  def apply(plan: LogicalPlan): LogicalPlan = plan match {
+// SPARK-26293: A subquery will be rewritten into join later, and will 
go through this rule
+// eventually. Here we skip subquery, as Python UDF only needs to be 
extracted once.
+case _: Subquery => plan
--- End diff --

Basically, we want to ensure this rule runs once and only once. In 
the future, if another rule/function calls 
Optimizer.this.execute(plan), this rule will need to be fixed again... The 
implementation carries a very strong hidden assumption, which looks risky in 
the long term.

The current fix is fine for backporting to 2.4. 
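The run-once concern above can be illustrated outside Spark. Below is a minimal Python sketch (hypothetical names, not Spark's actual classes) of a rewrite that is only safe when applied exactly once, guarded by skipping an already-marked root, analogous to skipping `Subquery` in `ExtractPythonUDFs`:

```python
# Hypothetical sketch of the "run once and only once" concern discussed above.
# A rule that wraps nodes is not idempotent: applying it twice double-wraps.
# Skipping already-processed roots (like skipping Subquery) restores safety,
# but only as long as every re-entry point performs the skip.

class Node:
    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)

    def __repr__(self):
        if not self.children:
            return self.label
        return f"{self.label}({', '.join(map(repr, self.children))})"

def extract_udf(node):
    """Non-idempotent rewrite: wraps every 'udf' node in an 'eval' node."""
    children = [extract_udf(c) for c in node.children]
    new = Node(node.label, children)
    if node.label == "udf":
        return Node("eval", [new])
    return new

def apply_rule(plan):
    """Safe entry point: skip plans already marked as processed."""
    if plan.label == "processed":
        return plan
    return Node("processed", [extract_udf(plan)])

plan = Node("project", [Node("udf")])
once = apply_rule(plan)
twice = apply_rule(once)  # second application is a no-op thanks to the guard
```

Without the `processed` guard, a second pass would produce `eval(eval(udf))`, which is exactly the double-extraction the Subquery skip prevents.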


---




[GitHub] spark pull request #22707: [SPARK-25717][SQL] Insert overwrite a recreated e...

2018-12-09 Thread fjh100456
Github user fjh100456 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22707#discussion_r240077883
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/InsertSuite.scala ---
@@ -774,4 +774,23 @@ class InsertSuite extends QueryTest with 
TestHiveSingleton with BeforeAndAfter
   }
 }
   }
+
+  test("SPARK-25717: Insert overwrite a recreated external and partitioned 
table "
--- End diff --

Ok, thank you.


---




[GitHub] spark pull request #23266: [SPARK-26313][SQL] move read related methods from...

2018-12-09 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/23266#discussion_r240074711
  
--- Diff: 
sql/core/src/main/java/org/apache/spark/sql/sources/v2/SupportsBatchRead.java 
---
@@ -20,14 +20,27 @@
 import org.apache.spark.annotation.Evolving;
 import org.apache.spark.sql.sources.v2.reader.Scan;
 import org.apache.spark.sql.sources.v2.reader.ScanBuilder;
+import org.apache.spark.sql.types.StructType;
 
 /**
- * An empty mix-in interface for {@link Table}, to indicate this table 
supports batch scan.
- * 
- * If a {@link Table} implements this interface, its {@link 
Table#newScanBuilder(DataSourceOptions)}
- * must return a {@link ScanBuilder} that builds {@link Scan} with {@link 
Scan#toBatch()}
- * implemented.
- * 
+ * A mix-in interface for {@link Table} to provide data reading ability of 
batch processing.
  */
 @Evolving
-public interface SupportsBatchRead extends Table { }
+public interface SupportsBatchRead extends Table {
+
+  /**
+   * Returns the schema of this table.
+   */
+  StructType schema();
--- End diff --

To me, +1 for the current change.


---




[GitHub] spark pull request #23266: [SPARK-26313][SQL] move read related methods from...

2018-12-09 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/23266#discussion_r240073831
  
--- Diff: 
sql/core/src/main/java/org/apache/spark/sql/sources/v2/SupportsBatchRead.java 
---
@@ -20,14 +20,27 @@
 import org.apache.spark.annotation.Evolving;
 import org.apache.spark.sql.sources.v2.reader.Scan;
 import org.apache.spark.sql.sources.v2.reader.ScanBuilder;
+import org.apache.spark.sql.types.StructType;
 
 /**
- * An empty mix-in interface for {@link Table}, to indicate this table 
supports batch scan.
- * 
- * If a {@link Table} implements this interface, its {@link 
Table#newScanBuilder(DataSourceOptions)}
- * must return a {@link ScanBuilder} that builds {@link Scan} with {@link 
Scan#toBatch()}
- * implemented.
- * 
+ * A mix-in interface for {@link Table} to provide data reading ability of 
batch processing.
  */
 @Evolving
-public interface SupportsBatchRead extends Table { }
+public interface SupportsBatchRead extends Table {
+
+  /**
+   * Returns the schema of this table.
+   */
+  StructType schema();
+
+  /**
+   * Returns a {@link ScanBuilder} which can be used to build a {@link 
Scan} later. The built
+   * {@link Scan} must implement {@link Scan#toBatch()}. Spark will call 
this method for each data
+   * scanning query.
+   * 
+   * The builder can take some query specific information to do operators 
pushdown, and keep these
+   * information in the created {@link Scan}.
+   * 
+   */
+  ScanBuilder newScanBuilder(DataSourceOptions options);
--- End diff --

+1


---
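The `newScanBuilder`/`Scan` split discussed in this thread is a plain builder pattern: a mutable per-query builder accumulates pushdown information, then freezes it into an immutable scan description. A hedged Python sketch (hypothetical names and filter syntax, not the actual DataSourceV2 API):

```python
# Hypothetical sketch of the builder pattern behind newScanBuilder()/Scan:
# the builder mutably accumulates query-specific pushdown info, and
# build() produces an immutable Scan carrying that info.

class Scan:
    """Immutable description of one data-scanning query."""
    def __init__(self, schema, pushed_filters):
        self.schema = tuple(schema)
        self.pushed_filters = tuple(pushed_filters)

class ScanBuilder:
    def __init__(self, table_schema):
        self._schema = list(table_schema)
        self._filters = []

    def push_filters(self, filters):
        # Accept only filters this toy source can evaluate ("col = value");
        # return the residual ones for the engine to evaluate itself.
        accepted = [f for f in filters if "=" in f]
        self._filters.extend(accepted)
        return [f for f in filters if f not in accepted]

    def prune_columns(self, required):
        self._schema = [c for c in self._schema if c in required]
        return self

    def build(self):
        return Scan(self._schema, self._filters)

builder = ScanBuilder(["id", "name", "ts"])
residual = builder.push_filters(["id = 1", "name LIKE 'a%'"])
scan = builder.prune_columns({"id", "name"}).build()
```

The key design point mirrored here is that each data-scanning query gets a fresh builder, so pushdown state never leaks between queries.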




[GitHub] spark pull request #22707: [SPARK-25717][SQL] Insert overwrite a recreated e...

2018-12-09 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/22707#discussion_r240073151
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/InsertSuite.scala ---
@@ -774,4 +774,23 @@ class InsertSuite extends QueryTest with 
TestHiveSingleton with BeforeAndAfter
   }
 }
   }
+
+  test("SPARK-25717: Insert overwrite a recreated external and partitioned 
table "
--- End diff --

How about `SPARK-25717: Insert overwrites remove old partition dirs 
correctly`?


---




[GitHub] spark issue #23072: [SPARK-19827][R]spark.ml R API for PIC

2018-12-09 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/23072
  
It looks enough to me, @srowen .


---




[GitHub] spark issue #16812: [SPARK-19465][SQL] Added options for custom boolean valu...

2018-12-09 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16812
  
I still don't think we need this since the workaround is easy. If other 
committers find it worthwhile, I won't object.
If there is no interest in this PR afterwards, I will just close it.
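The easy workaround referred to here — read the column as strings and map custom truthy/falsy tokens yourself — can be sketched without Spark. In Spark this would be a `when`/`otherwise` expression or a UDF over a string column; the plain-Python sketch below (hypothetical token sets) shows just the mapping:

```python
# Hypothetical sketch of the workaround for custom boolean values:
# load the column as strings, then map user-defined tokens to booleans
# in a post-processing step.

TRUE_TOKENS = {"yes", "y", "1", "true"}
FALSE_TOKENS = {"no", "n", "0", "false"}

def to_bool(raw):
    """Map a raw string token to a boolean, or None if unrecognized."""
    if raw is None:
        return None
    token = raw.strip().lower()
    if token in TRUE_TOKENS:
        return True
    if token in FALSE_TOKENS:
        return False
    return None

column = ["Yes", "no", "1", "maybe", None]
parsed = [to_bool(v) for v in column]
```

Unrecognized tokens map to `None` rather than raising, matching the permissive behavior a CSV reader typically needs.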


---




[GitHub] spark pull request #22707: [SPARK-25717][SQL] Insert overwrite a recreated e...

2018-12-09 Thread fjh100456
Github user fjh100456 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22707#discussion_r240070378
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala
 ---
@@ -227,18 +227,22 @@ case class InsertIntoHiveTable(
   // Newer Hive largely improves insert overwrite performance. As 
Spark uses older Hive
   // version and we may not want to catch up new Hive version 
every time. We delete the
   // Hive partition first and then load data file into the Hive 
partition.
-  if (oldPart.nonEmpty && overwrite) {
-oldPart.get.storage.locationUri.foreach { uri =>
-  val partitionPath = new Path(uri)
-  val fs = partitionPath.getFileSystem(hadoopConf)
-  if (fs.exists(partitionPath)) {
-if (!fs.delete(partitionPath, true)) {
-  throw new RuntimeException(
-"Cannot remove partition directory '" + 
partitionPath.toString)
-}
-// Don't let Hive do overwrite operation since it is 
slower.
-doHiveOverwrite = false
+  if (overwrite) {
+val oldPartitionPath = 
oldPart.flatMap(_.storage.locationUri.map(new Path(_)))
+  .getOrElse {
+ExternalCatalogUtils.generatePartitionPath(
+  partitionSpec,
+  partitionColumnNames,
+  new Path(table.location))
--- End diff --

Oops, it seems that was a mistake. The `oldPart` is empty.

Thank you very much, I'll change the code.
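The fallback discussed in this diff — deriving the partition directory from the partition spec when the old partition metadata carries no location — boils down to Hive-style path generation. A hedged Python sketch (hypothetical helper, not Spark's actual `ExternalCatalogUtils.generatePartitionPath`):

```python
# Hypothetical sketch of Hive-style partition path generation, the fallback
# used above when oldPart carries no location: join "col=value" segments in
# declared partition-column order under the table root. Real implementations
# also escape special characters in values; that is omitted here.

def generate_partition_path(partition_spec, partition_column_names, table_root):
    """Build <root>/<c1>=<v1>/<c2>=<v2>/... in declared column order."""
    segments = [
        f"{col}={partition_spec[col]}"
        for col in partition_column_names
        if col in partition_spec
    ]
    return "/".join([table_root.rstrip("/")] + segments)

path = generate_partition_path(
    {"dt": "2018-12-09", "hour": "07"},
    ["dt", "hour"],
    "/warehouse/db.db/events/",
)
```

Ordering by `partition_column_names` rather than by the spec's own key order is what keeps the directory layout stable regardless of how the caller built the spec map.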


---




[GitHub] spark issue #23228: [MINOR][DOC] Update the condition description of seriali...

2018-12-09 Thread 10110346
Github user 10110346 commented on the issue:

https://github.com/apache/spark/pull/23228
  
I have updated it, thanks all.


---




[GitHub] spark issue #23267: [SPARK-25401] [SQL] Reorder join predicates to match chi...

2018-12-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23267
  
Merged build finished. Test PASSed.


---



