[GitHub] spark pull request #23161: [SPARK-26189][R]Fix unionAll doc in SparkR

2018-11-28 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/23161#discussion_r236973169
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -2732,14 +2732,24 @@ setMethod("union",
 dataFrame(unioned)
   })
 
-#' Return a new SparkDataFrame containing the union of rows
-#'
-#' This is an alias for `union`.
+#' Return a new SparkDataFrame containing the union of rows.
+#' This is an alias for \code{union}.
 #'
-#' @rdname union
-#' @name unionAll
+#' @param x a SparkDataFrame.
+#' @param y a SparkDataFrame.
+#' @return A SparkDataFrame containing the result of the unionAll 
operation.
+#' @family SparkDataFrame functions
 #' @aliases unionAll,SparkDataFrame,SparkDataFrame-method
-#' @note unionAll since 1.4.0
+#' @rdname unionAll
+#' @name unionAll
+#' @examples
+#'\dontrun{
+#' sparkR.session()
+#' df1 <- read.json(path)
+#' df2 <- read.json(path2)
+#' unionAllDF <- unionAll(df1, df2)
+#' }
+#' @note unionAll since 3.0.0
--- End diff --

this should be `1.4.0` as above


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23161: [SPARK-26189][R]Fix unionAll doc in SparkR

2018-11-28 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/23161#discussion_r236972877
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -2732,14 +2732,24 @@ setMethod("union",
 dataFrame(unioned)
   })
 
-#' Return a new SparkDataFrame containing the union of rows
-#'
-#' This is an alias for `union`.
+#' Return a new SparkDataFrame containing the union of rows.
+#' This is an alias for \code{union}.
 #'
-#' @rdname union
-#' @name unionAll
+#' @param x a SparkDataFrame.
+#' @param y a SparkDataFrame.
+#' @return A SparkDataFrame containing the result of the unionAll 
operation.
+#' @family SparkDataFrame functions
 #' @aliases unionAll,SparkDataFrame,SparkDataFrame-method
-#' @note unionAll since 1.4.0
+#' @rdname unionAll
+#' @name unionAll
--- End diff --

I'd add a `@seealso` to union


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23128: [SPARK-26142][SQL] Implement shuffle read metrics in SQL

2018-11-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23128
  
**[Test build #99355 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99355/testReport)**
 for PR 23128 at commit 
[`d12ea31`](https://github.com/apache/spark/commit/d12ea311e58e7925f21d343e5de13bfec6737549).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23162: [MINOR][DOC] Correct some document description errors

2018-11-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23162
  
**[Test build #99352 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99352/testReport)**
 for PR 23162 at commit 
[`e9aba19`](https://github.com/apache/spark/commit/e9aba19b526610f3f31fa6a5b56140f6be8dc1c1).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23083: [SPARK-26114][CORE] ExternalSorter's readingIterator fie...

2018-11-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23083
  
**[Test build #99351 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99351/testReport)**
 for PR 23083 at commit 
[`1723819`](https://github.com/apache/spark/commit/17238196719de1e68cbcb1eb930cb3176308e437).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23052: [SPARK-26081][SQL] Prevent empty files for empty partiti...

2018-11-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23052
  
**[Test build #99354 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99354/testReport)**
 for PR 23052 at commit 
[`76e1466`](https://github.com/apache/spark/commit/76e1466a39aa2a40d999791bb9d3b09628921e85).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22590: [SPARK-25574][SQL]Add an option `keepQuotes` for parsing...

2018-11-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22590
  
**[Test build #99353 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99353/testReport)**
 for PR 22590 at commit 
[`9e3c4bd`](https://github.com/apache/spark/commit/9e3c4bda06011cf6b4d21321d8e7336495839325).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23124: [SPARK-25829][SQL] remove duplicated map keys with last ...

2018-11-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23124
  
**[Test build #99357 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99357/testReport)**
 for PR 23124 at commit 
[`6dff654`](https://github.com/apache/spark/commit/6dff6545f272e0d5117ac17fdc27b686573c5626).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23124: [SPARK-25829][SQL] remove duplicated map keys with last ...

2018-11-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23124
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23153: [SPARK-26147][SQL] only pull out unevaluable python udf ...

2018-11-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23153
  
**[Test build #99356 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99356/testReport)**
 for PR 23153 at commit 
[`7b985d8`](https://github.com/apache/spark/commit/7b985d84cb0fd853d40610b2380313389791298e).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23124: [SPARK-25829][SQL] remove duplicated map keys with last ...

2018-11-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23124
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99357/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23153: [SPARK-26147][SQL] only pull out unevaluable python udf ...

2018-11-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23153
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23162: [MINOR][DOC] Correct some document description errors

2018-11-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23162
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99352/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23162: [MINOR][DOC] Correct some document description errors

2018-11-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23162
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23052: [SPARK-26081][SQL] Prevent empty files for empty partiti...

2018-11-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23052
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99354/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22590: [SPARK-25574][SQL]Add an option `keepQuotes` for parsing...

2018-11-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22590
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99353/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23083: [SPARK-26114][CORE] ExternalSorter's readingIterator fie...

2018-11-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23083
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23052: [SPARK-26081][SQL] Prevent empty files for empty partiti...

2018-11-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23052
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22590: [SPARK-25574][SQL]Add an option `keepQuotes` for parsing...

2018-11-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22590
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23153: [SPARK-26147][SQL] only pull out unevaluable python udf ...

2018-11-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23153
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99356/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23128: [SPARK-26142][SQL] Implement shuffle read metrics in SQL

2018-11-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23128
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23128: [SPARK-26142][SQL] Implement shuffle read metrics in SQL

2018-11-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23128
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99355/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23083: [SPARK-26114][CORE] ExternalSorter's readingIterator fie...

2018-11-28 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/23083
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23083: [SPARK-26114][CORE] ExternalSorter's readingIterator fie...

2018-11-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23083
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99351/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23052: [SPARK-26081][SQL] Prevent empty files for empty partiti...

2018-11-28 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/23052
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23153: [SPARK-26147][SQL] only pull out unevaluable python udf ...

2018-11-28 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/23153
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23128: [SPARK-26142][SQL] Implement shuffle read metrics in SQL

2018-11-28 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/23128
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23153: [SPARK-26147][SQL] only pull out unevaluable python udf ...

2018-11-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23153
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5435/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23153: [SPARK-26147][SQL] only pull out unevaluable python udf ...

2018-11-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23153
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23052: [SPARK-26081][SQL] Prevent empty files for empty partiti...

2018-11-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23052
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23052: [SPARK-26081][SQL] Prevent empty files for empty partiti...

2018-11-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23052
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5438/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23128: [SPARK-26142][SQL] Implement shuffle read metrics in SQL

2018-11-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23128
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23083: [SPARK-26114][CORE] ExternalSorter's readingIterator fie...

2018-11-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23083
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5437/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23128: [SPARK-26142][SQL] Implement shuffle read metrics...

2018-11-28 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/23128#discussion_r236976020
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/ShuffledRowRDD.scala ---
@@ -154,7 +156,10 @@ class ShuffledRowRDD(
 
   override def compute(split: Partition, context: TaskContext): 
Iterator[InternalRow] = {
 val shuffledRowPartition = split.asInstanceOf[ShuffledRowRDDPartition]
-val metrics = context.taskMetrics().createTempShuffleReadMetrics()
+val tempMetrics = context.taskMetrics().createTempShuffleReadMetrics()
+// Wrap the tempMetrics with SQLShuffleMetricsReporter here to support
+// shuffle read metrics in SQL.
--- End diff --

```
// `SQLShuffleMetricsReporter` will update its own metrics for SQL exchange 
operator,
// as well as the `tempMetrics` for basic shuffle metrics.
```


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23153: [SPARK-26147][SQL] only pull out unevaluable python udf ...

2018-11-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23153
  
**[Test build #99358 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99358/testReport)**
 for PR 23153 at commit 
[`7b985d8`](https://github.com/apache/spark/commit/7b985d84cb0fd853d40610b2380313389791298e).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23083: [SPARK-26114][CORE] ExternalSorter's readingIterator fie...

2018-11-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23083
  
**[Test build #99360 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99360/testReport)**
 for PR 23083 at commit 
[`1723819`](https://github.com/apache/spark/commit/17238196719de1e68cbcb1eb930cb3176308e437).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23083: [SPARK-26114][CORE] ExternalSorter's readingIterator fie...

2018-11-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23083
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23052: [SPARK-26081][SQL] Prevent empty files for empty partiti...

2018-11-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23052
  
**[Test build #99361 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99361/testReport)**
 for PR 23052 at commit 
[`76e1466`](https://github.com/apache/spark/commit/76e1466a39aa2a40d999791bb9d3b09628921e85).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23128: [SPARK-26142][SQL] Implement shuffle read metrics in SQL

2018-11-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23128
  
**[Test build #99359 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99359/testReport)**
 for PR 23128 at commit 
[`d12ea31`](https://github.com/apache/spark/commit/d12ea311e58e7925f21d343e5de13bfec6737549).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23128: [SPARK-26142][SQL] Implement shuffle read metrics in SQL

2018-11-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23128
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5436/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23124: [SPARK-25829][SQL] remove duplicated map keys with last ...

2018-11-28 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/23124
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23124: [SPARK-25829][SQL] remove duplicated map keys wit...

2018-11-28 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/23124#discussion_r236973308
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ArrayBasedMapData.scala
 ---
@@ -19,6 +19,12 @@ package org.apache.spark.sql.catalyst.util
 
 import java.util.{Map => JavaMap}
 
+/**
+ * A simple `MapData` implementation which is backed by 2 arrays.
+ *
+ * Note that, user is responsible to guarantee that the key array does not 
have duplicated
+ * elements, otherwise the behavior is undefined.
--- End diff --

nit: we might need to add the same note to the 3rd and 4th 
`ArrayBasedMapData.apply()` method.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23124: [SPARK-25829][SQL] remove duplicated map keys with last ...

2018-11-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23124
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5439/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23124: [SPARK-25829][SQL] remove duplicated map keys with last ...

2018-11-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23124
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23141: [SPARK-26021][SQL][followup] add test for special floati...

2018-11-28 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/23141
  
thanks, merging to master!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23124: [SPARK-25829][SQL] remove duplicated map keys with last ...

2018-11-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23124
  
**[Test build #99362 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99362/testReport)**
 for PR 23124 at commit 
[`6dff654`](https://github.com/apache/spark/commit/6dff6545f272e0d5117ac17fdc27b686573c5626).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23152: [SPARK-26181][SQL] the `hasMinMaxStats` method of `Colum...

2018-11-28 Thread adrian-wang
Github user adrian-wang commented on the issue:

https://github.com/apache/spark/pull/23152
  
@srowen 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23141: [SPARK-26021][SQL][followup] add test for special...

2018-11-28 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/23141


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23128: [SPARK-26142][SQL] Implement shuffle read metrics...

2018-11-28 Thread xuanyuanking
Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/23128#discussion_r236982643
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/ShuffledRowRDD.scala ---
@@ -154,7 +156,10 @@ class ShuffledRowRDD(
 
   override def compute(split: Partition, context: TaskContext): 
Iterator[InternalRow] = {
 val shuffledRowPartition = split.asInstanceOf[ShuffledRowRDDPartition]
-val metrics = context.taskMetrics().createTempShuffleReadMetrics()
+val tempMetrics = context.taskMetrics().createTempShuffleReadMetrics()
+// Wrap the tempMetrics with SQLShuffleMetricsReporter here to support
+// shuffle read metrics in SQL.
--- End diff --

Thanks Wenchen, done in 8e84c5b.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23128: [SPARK-26142][SQL] Implement shuffle read metrics in SQL

2018-11-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23128
  
**[Test build #99363 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99363/testReport)**
 for PR 23128 at commit 
[`8e84c5b`](https://github.com/apache/spark/commit/8e84c5bbfc4b9151310bce84c1506c6aad449011).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23128: [SPARK-26142][SQL] Implement shuffle read metrics in SQL

2018-11-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23128
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5440/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23128: [SPARK-26142][SQL] Implement shuffle read metrics in SQL

2018-11-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23128
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23136: [SPARK-25515][K8s] Adds a config option to keep executor...

2018-11-28 Thread skonto
Github user skonto commented on the issue:

https://github.com/apache/spark/pull/23136
  
@mccheah gentle ping.



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23155: [MINOR][K8S] add missing docs for podTemplateCont...

2018-11-28 Thread aditanase
Github user aditanase commented on a diff in the pull request:

https://github.com/apache/spark/pull/23155#discussion_r236990958
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -932,15 +935,31 @@ specific to Spark on Kubernetes.
   (none)
   
Specify the local file that contains the driver [pod 
template](#pod-template). For example
-   
spark.kubernetes.driver.podTemplateFile=/path/to/driver-pod-template.yaml`
+   
spark.kubernetes.driver.podTemplateFile=/path/to/driver-pod-template.yaml
   
 
 
   spark.kubernetes.executor.podTemplateFile
   (none)
   
Specify the local file that contains the executor [pod 
template](#pod-template). For example
-   
spark.kubernetes.executor.podTemplateFile=/path/to/executor-pod-template.yaml`
+   
spark.kubernetes.executor.podTemplateFile=/path/to/executor-pod-template.yaml
+  
+
+
+  spark.kubernetes.driver.podTemplateContainerName
--- End diff --

Thanks, done. I'll squash the commits before merging.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23155: [MINOR][K8S] add missing docs for podTemplateContainerNa...

2018-11-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23155
  
**[Test build #99364 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99364/testReport)**
 for PR 23155 at commit 
[`290b764`](https://github.com/apache/spark/commit/290b76443c379e99713be360a176fe771858eade).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22168: [SPARK-24985][SQL][WIP] Fix OOM in Full Outer Joi...

2018-11-28 Thread sujithjay
Github user sujithjay commented on a diff in the pull request:

https://github.com/apache/spark/pull/22168#discussion_r236992453
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala
 ---
@@ -1058,31 +1064,37 @@ private class SortMergeFullOuterJoinScanner(
* @return true if a valid match is found, false otherwise.
*/
   private def scanNextInBuffered(): Boolean = {
-while (leftIndex < leftMatches.size) {
-  while (rightIndex < rightMatches.size) {
-joinedRow(leftMatches(leftIndex), rightMatches(rightIndex))
-if (boundCondition(joinedRow)) {
-  leftMatched.set(leftIndex)
-  rightMatched.set(rightIndex)
+val leftMatchesIterator = leftMatches.generateIterator(leftIndex)
+
+while (leftMatchesIterator.hasNext) {
+  val leftCurRow = leftMatchesIterator.next()
+  val rightMatchesIterator = rightMatches.generateIterator(rightIndex)
--- End diff --

Hi @viirya ,
After some deliberation, I figure it would not be possible to avoid the 
reinitialisation of the right iterator. Please share your thought on this. 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23162: [MINOR][DOC] Correct some document description errors

2018-11-28 Thread 10110346
Github user 10110346 commented on the issue:

https://github.com/apache/spark/pull/23162
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23136: [SPARK-25515][K8s] Adds a config option to keep executor...

2018-11-28 Thread skonto
Github user skonto commented on the issue:

https://github.com/apache/spark/pull/23136
  
tests failed but not due to this PR. 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23162: [MINOR][DOC] Correct some document description errors

2018-11-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23162
  
**[Test build #99365 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99365/testReport)**
 for PR 23162 at commit 
[`e9aba19`](https://github.com/apache/spark/commit/e9aba19b526610f3f31fa6a5b56140f6be8dc1c1).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23155: [MINOR][K8S] add missing docs for podTemplateContainerNa...

2018-11-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23155
  
**[Test build #99364 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99364/testReport)**
 for PR 23155 at commit 
[`290b764`](https://github.com/apache/spark/commit/290b76443c379e99713be360a176fe771858eade).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23155: [MINOR][K8S] add missing docs for podTemplateContainerNa...

2018-11-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23155
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23155: [MINOR][K8S] add missing docs for podTemplateContainerNa...

2018-11-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23155
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99364/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21004: [SPARK-23896][SQL]Improve PartitioningAwareFileIn...

2018-11-28 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21004#discussion_r236998030
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningAwareFileIndex.scala
 ---
@@ -126,35 +126,32 @@ abstract class PartitioningAwareFileIndex(
 val caseInsensitiveOptions = CaseInsensitiveMap(parameters)
 val timeZoneId = 
caseInsensitiveOptions.get(DateTimeUtils.TIMEZONE_OPTION)
   .getOrElse(sparkSession.sessionState.conf.sessionLocalTimeZone)
-
-userPartitionSchema match {
+val inferredPartitionSpec = PartitioningUtils.parsePartitions(
+  leafDirs,
+  typeInference = 
sparkSession.sessionState.conf.partitionColumnTypeInferenceEnabled,
--- End diff --

this is causing a behavior change in Spark 2.4.0 reported in SPARK-26188. 
Why did we need this change?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23155: [MINOR][K8S] add missing docs for podTemplateContainerNa...

2018-11-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23155
  
Kubernetes integration test starting
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/5441/



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23162: [MINOR][DOC] Correct some document description errors

2018-11-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23162
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23162: [MINOR][DOC] Correct some document description errors

2018-11-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23162
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5442/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23124: [SPARK-25829][SQL] remove duplicated map keys with last ...

2018-11-28 Thread mgaido91
Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/23124
  
LGTM too


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22764: [SPARK-25765][ML] Add training cost to BisectingKMeans s...

2018-11-28 Thread mgaido91
Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/22764
  
@dbtsai any luck with this? Thanks.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23155: [MINOR][K8S] add missing docs for podTemplateContainerNa...

2018-11-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23155
  
Kubernetes integration test status success
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/5441/



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23155: [MINOR][K8S] add missing docs for podTemplateContainerNa...

2018-11-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23155
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23155: [MINOR][K8S] add missing docs for podTemplateContainerNa...

2018-11-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23155
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5441/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23057: [SPARK-26078][SQL] Dedup self-join attributes on IN subq...

2018-11-28 Thread mgaido91
Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/23057
  
cc @gatorsmile too


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

2018-11-28 Thread mgaido91
Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/22957
  
cc @cloud-fan @gatorsmile 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22947: [SPARK-24913][SQL] Make AssertNotNull and AssertTrue non...

2018-11-28 Thread mgaido91
Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/22947
  
any more comments on this?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23100: [SPARK-26133][ML] Remove deprecated OneHotEncoder and re...

2018-11-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23100
  
**[Test build #99366 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99366/testReport)**
 for PR 23100 at commit 
[`9e3dab7`](https://github.com/apache/spark/commit/9e3dab7388a6de18b7cf8ddcc8cc2c73a6efea67).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23100: [SPARK-26133][ML] Remove deprecated OneHotEncoder and re...

2018-11-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23100
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5443/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23100: [SPARK-26133][ML] Remove deprecated OneHotEncoder and re...

2018-11-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23100
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23100: [SPARK-26133][ML] Remove deprecated OneHotEncoder...

2018-11-28 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/23100#discussion_r237010801
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/feature/OneHotEncoder.scala ---
@@ -17,126 +17,512 @@
 
 package org.apache.spark.ml.feature
 
+import org.apache.hadoop.fs.Path
+
+import org.apache.spark.SparkException
 import org.apache.spark.annotation.Since
-import org.apache.spark.ml.Transformer
+import org.apache.spark.ml.{Estimator, Model}
 import org.apache.spark.ml.attribute._
 import org.apache.spark.ml.linalg.Vectors
 import org.apache.spark.ml.param._
-import org.apache.spark.ml.param.shared.{HasInputCol, HasOutputCol}
+import org.apache.spark.ml.param.shared.{HasHandleInvalid, HasInputCols, 
HasOutputCols}
 import org.apache.spark.ml.util._
 import org.apache.spark.sql.{DataFrame, Dataset}
-import org.apache.spark.sql.functions.{col, udf}
-import org.apache.spark.sql.types.{DoubleType, NumericType, StructType}
+import org.apache.spark.sql.expressions.UserDefinedFunction
+import org.apache.spark.sql.functions.{col, lit, udf}
+import org.apache.spark.sql.types.{DoubleType, StructField, StructType}
+
+/** Private trait for params and common methods for OneHotEncoder and 
OneHotEncoderModel */
+private[ml] trait OneHotEncoderBase extends Params with HasHandleInvalid
+with HasInputCols with HasOutputCols {
+
+  /**
+   * Param for how to handle invalid data during transform().
+   * Options are 'keep' (invalid data presented as an extra categorical 
feature) or
+   * 'error' (throw an error).
+   * Note that this Param is only used during transform; during fitting, 
invalid data
+   * will result in an error.
+   * Default: "error"
+   * @group param
+   */
+  @Since("2.3.0")
--- End diff --

I changed since tag of renamed `OneHotEncoder` to `3.0.0`.

Because this `OneHotEncoderBase` is not renamed, I didn't change its since 
tag.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23100: [SPARK-26133][ML] Remove deprecated OneHotEncoder and re...

2018-11-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23100
  
**[Test build #99367 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99367/testReport)**
 for PR 23100 at commit 
[`a451756`](https://github.com/apache/spark/commit/a451756f393c107963b893ba4a74c1d6ade33dd0).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23120: [SPARK-26151][SQL] Return partial results for bad CSV re...

2018-11-28 Thread MaxGekk
Github user MaxGekk commented on the issue:

https://github.com/apache/spark/pull/23120
  
@cloud-fan May I ask you to take a look at the PR.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23100: [SPARK-26133][ML] Remove deprecated OneHotEncoder and re...

2018-11-28 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/23100
  
Except for changing `Since` tag, is there any other comments?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23163: [SPARK-26164][SQL] Allow FileFormatWriter to writ...

2018-11-28 Thread c21
GitHub user c21 opened a pull request:

https://github.com/apache/spark/pull/23163

[SPARK-26164][SQL] Allow FileFormatWriter to write multiple 
partitions/buckets without sort

## What changes were proposed in this pull request?

Currently spark always requires a local sort before writing to output table 
on partition/bucket columns (see `write.requiredOrdering` in 
`FileFormatWriter.scala`), which is unnecessary, and can be avoided by keeping 
multiple output writers concurrently in `FileFormatDataWriter.scala`.

This pr is first doing hash-based write, then falling back to sort-based 
write (current implementation) when number of opened writer exceeding a 
threshold (controlled by a config). Specifically:

1. (hash-based write) Maintain mapping between file path and output writer, 
and re-use writer for writing input row. In case of the number of opened output 
writers exceeding a threshold (can be changed by a config), we go to 2.

2. (sort-based write) Sort the rest of input rows (use the same sorter in 
SortExec). Then writing the rest of sorted rows, and we can close the writer on 
the fly, in case no more rows for current file path.

## How was this patch tested?

Added unit test in `DataFrameReaderWriterSuite.scala`. Existing test like 
`SQLMetricsSuite.scala` would already exercise the code path of executor write 
metrics.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/c21/spark more-writers

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/23163.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #23163


commit c2e81eb2f9cdc1b290c098d228d477f325a24101
Author: Cheng Su 
Date:   2018-11-28T09:46:35Z

Allow FileFormatWriter to write multiple partitions/buckets without sort




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23163: [SPARK-26164][SQL] Allow FileFormatWriter to write multi...

2018-11-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23163
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23100: [SPARK-26133][ML] Remove deprecated OneHotEncoder and re...

2018-11-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23100
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5444/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23100: [SPARK-26133][ML] Remove deprecated OneHotEncoder and re...

2018-11-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23100
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23163: [SPARK-26164][SQL] Allow FileFormatWriter to write multi...

2018-11-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23163
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23163: [SPARK-26164][SQL] Allow FileFormatWriter to write multi...

2018-11-28 Thread c21
Github user c21 commented on the issue:

https://github.com/apache/spark/pull/23163
  
cc people who have most context for review - @cloud-fan, @tejasapatil and 
@sameeragarwal. Thanks!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23163: [SPARK-26164][SQL] Allow FileFormatWriter to write multi...

2018-11-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23163
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23055: [SPARK-26080][PYTHON] Disable 'spark.executor.pys...

2018-11-28 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/23055#discussion_r237021410
  
--- Diff: python/pyspark/worker.py ---
@@ -22,7 +22,12 @@
 import os
 import sys
 import time
-import resource
+# 'resource' is a Unix specific module.
+has_resource_module = True
+try:
+import resource
+except ImportError:
+has_resource_module = False
--- End diff --

@vanzin, I'm going to remove this flag. Does it then look okay to you?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23052: [SPARK-26081][SQL] Prevent empty files for empty partiti...

2018-11-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23052
  
**[Test build #99361 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99361/testReport)**
 for PR 23052 at commit 
[`76e1466`](https://github.com/apache/spark/commit/76e1466a39aa2a40d999791bb9d3b09628921e85).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23052: [SPARK-26081][SQL] Prevent empty files for empty partiti...

2018-11-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23052
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23052: [SPARK-26081][SQL] Prevent empty files for empty partiti...

2018-11-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23052
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99361/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23052: [SPARK-26081][SQL] Prevent empty files for empty partiti...

2018-11-28 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/23052
  
seems like a real failure


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23052: [SPARK-26081][SQL] Prevent empty files for empty partiti...

2018-11-28 Thread MaxGekk
Github user MaxGekk commented on the issue:

https://github.com/apache/spark/pull/23052
  
> seems like a real failure

I am looking at it. It seems the test is not deterministic. 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21004: [SPARK-23896][SQL]Improve PartitioningAwareFileIn...

2018-11-28 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21004#discussion_r237040743
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningAwareFileIndex.scala
 ---
@@ -126,35 +126,32 @@ abstract class PartitioningAwareFileIndex(
 val caseInsensitiveOptions = CaseInsensitiveMap(parameters)
 val timeZoneId = 
caseInsensitiveOptions.get(DateTimeUtils.TIMEZONE_OPTION)
   .getOrElse(sparkSession.sessionState.conf.sessionLocalTimeZone)
-
-userPartitionSchema match {
+val inferredPartitionSpec = PartitioningUtils.parsePartitions(
+  leafDirs,
+  typeInference = 
sparkSession.sessionState.conf.partitionColumnTypeInferenceEnabled,
--- End diff --

Before this patch, there was a subtle difference between with and without a 
user-provided partition schema:
1. with user-provided partition schema, we should not infer data types. We 
should infer as string and cast to user-provided type
2. without user-provided partition schema, we should infer the data 
type(with a config)

So it was wrong to unify these 2 code paths. @gengliangwang can you change 
it back?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23124: [SPARK-25829][SQL] remove duplicated map keys with last ...

2018-11-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23124
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5445/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23124: [SPARK-25829][SQL] remove duplicated map keys with last ...

2018-11-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23124
  
**[Test build #99368 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99368/testReport)**
 for PR 23124 at commit 
[`72c771e`](https://github.com/apache/spark/commit/72c771e691ae1071742bde5a612593d38f147ff5).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23124: [SPARK-25829][SQL] remove duplicated map keys with last ...

2018-11-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23124
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23130: [SPARK-26161][SQL] Ignore empty files in load

2018-11-28 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/23130#discussion_r237045706
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/sources/SaveLoadSuite.scala ---
@@ -142,4 +144,15 @@ class SaveLoadSuite extends DataSourceTest with 
SharedSQLContext with BeforeAndA
   assert(e.contains(s"Partition column `$unknown` not found in schema 
$schemaCatalog"))
 }
   }
+
+  test("skip empty files in non bucketed read") {
+withTempDir { dir =>
+  val path = dir.getCanonicalPath
+  Files.write(Paths.get(path, "empty"), Array.empty[Byte])
+  Files.write(Paths.get(path, "notEmpty"), 
"a".getBytes(StandardCharsets.UTF_8))
+  val readback = spark.read.option("wholetext", true).text(path)
+
+  assert(readback.rdd.getNumPartitions === 1)
--- End diff --

does this test fail without your change? IIUC one partition can read 
multiple files. Is JSON the only data source that may return a row for empty 
file?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23120: [SPARK-26151][SQL] Return partial results for bad...

2018-11-28 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/23120#discussion_r237046251
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityParser.scala
 ---
@@ -243,21 +243,27 @@ class UnivocityParser(
 () => getPartialResult(),
 new RuntimeException("Malformed CSV record"))
 } else {
-  try {
-// When the length of the returned tokens is identical to the 
length of the parsed schema,
-// we just need to convert the tokens that correspond to the 
required columns.
-var i = 0
-while (i < requiredSchema.length) {
+  // When the length of the returned tokens is identical to the length 
of the parsed schema,
+  // we just need to convert the tokens that correspond to the 
required columns.
+  var badRecordException: Option[Throwable] = None
+  var i = 0
+  while (i < requiredSchema.length) {
--- End diff --

shall we stop parsing when we hit the first exception?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



  1   2   3   4   5   6   7   8   >