date:20181103

[GitHub] spark issue #22934: [INFRA] Close stale PRs

2018-11-03 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22934
  
Merged to master.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22626: [SPARK-25638][SQL] Adding new function - to_csv()

2018-11-03 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22626
  
Merged to master.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22626: [SPARK-25638][SQL] Adding new function - to_csv()

2018-11-03 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22626#discussion_r230577431
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/csvExpressions.scala
 ---
@@ -174,3 +176,68 @@ case class SchemaOfCsv(
 
   override def prettyName: String = "schema_of_csv"
 }
+
+/**
+ * Converts a [[StructType]] to a CSV output string.
+ */
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = "_FUNC_(expr[, options]) - Returns a CSV string with a given 
struct value",
+  examples = """
+Examples:
+  > SELECT _FUNC_(named_struct('a', 1, 'b', 2));
+   1,2
+  > SELECT _FUNC_(named_struct('time', to_timestamp('2015-08-26', 
'-MM-dd')), map('timestampFormat', 'dd/MM/'));
+   "26/08/2015"
+  """,
+  since = "3.0.0")
+// scalastyle:on line.size.limit
+case class StructsToCsv(
+ options: Map[String, String],
+ child: Expression,
+ timeZoneId: Option[String] = None)
+  extends UnaryExpression with TimeZoneAwareExpression with 
CodegenFallback with ExpectsInputTypes {
+  override def nullable: Boolean = true
+
+  def this(options: Map[String, String], child: Expression) = 
this(options, child, None)
+
+  // Used in `FunctionRegistry`
+  def this(child: Expression) = this(Map.empty, child, None)
+
+  def this(child: Expression, options: Expression) =
+this(
+  options = ExprUtils.convertToMapData(options),
+  child = child,
+  timeZoneId = None)
+
+  @transient
+  lazy val writer = new CharArrayWriter()
+
+  @transient
+  lazy val inputSchema: StructType = child.dataType match {
+case st: StructType => st
+case other =>
+  throw new IllegalArgumentException(s"Unsupported input type 
${other.catalogString}")
+  }
+
+  @transient
+  lazy val gen = new UnivocityGenerator(
+inputSchema, writer, new CSVOptions(options, columnPruning = true, 
timeZoneId.get))
--- End diff --

nit: We wouldn't need `lazy val writer` then but just `new 
CharArrayWriter()` here.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22919: [SPARK-25906][SHELL] Documents '-I' option (from Scala R...

2018-11-03 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22919
  
Let me get this in to master and branch-2.4 in few days if there are no 
more comments.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22754: [SPARK-25776][CORE]The disk write buffer size must be gr...

2018-11-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22754
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98434/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22754: [SPARK-25776][CORE]The disk write buffer size must be gr...

2018-11-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22754
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22754: [SPARK-25776][CORE]The disk write buffer size must be gr...

2018-11-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22754
  
**[Test build #98434 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98434/testReport)**
 for PR 22754 at commit 
[`c883f4b`](https://github.com/apache/spark/commit/c883f4b1b26139e3cb2053b8efa0b0bb0f902be0).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22889: [SPARK-25882][SQL] Added a function to join two d...

2018-11-03 Thread arman1371

Github user arman1371 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22889#discussion_r230575855
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -883,6 +883,31 @@ class Dataset[T] private[sql](
 join(right, Seq(usingColumn))
   }
 
+  /**
+* Equi-join with another `DataFrame` using the given column.
+*
+* Different from other join functions, the join column will only 
appear once in the output,
+* i.e. similar to SQL's `JOIN USING` syntax.
+*
+* {{{
+*   // Left join of df1 and df2 using the column "user_id"
+*   df1.join(df2, "user_id", "left")
+* }}}
+*
+* @param right Right side of the join operation.
+* @param usingColumn Name of the column to join on. This column must 
exist on both sides.
+* @param joinType Type of join to perform. Default `inner`. Must be 
one of:
+* `inner`, `cross`, `outer`, `full`, `full_outer`, 
`left`, `left_outer`,
+* `right`, `right_outer`, `left_semi`, `left_anti`.
+* @note If you perform a self-join using this function without 
aliasing the input
+* `DataFrame`s, you will NOT be able to reference any columns after 
the join, since
+* there is no way to disambiguate which side of the join you would 
like to reference.
+* @group untypedrel
+*/
+  def join(right: Dataset[_], usingColumn: String, joinType: String): 
DataFrame = {
--- End diff --

Yes, for scala api it's just 5 characters but in java api it's very hard to 
change `String` to `Seq[String]`


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22867: [SPARK-25778] WriteAheadLogBackedBlockRDD in YARN Cluste...

2018-11-03 Thread gss2002

Github user gss2002 commented on the issue:

https://github.com/apache/spark/pull/22867
  
@vanzin can you please review latest patch thanks!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22914: [SPARK-25900][WEBUI]When the page number is more than th...

2018-11-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22914
  
**[Test build #98436 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98436/testReport)**
 for PR 22914 at commit 
[`64aa7a7`](https://github.com/apache/spark/commit/64aa7a7ab2f4d2653aa4310d30884b26eb712ed1).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22914: [SPARK-25900][WEBUI]When the page number is more than th...

2018-11-03 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22914
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22818: [SPARK-25904][CORE] Allocate arrays smaller than Int.Max...

2018-11-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22818
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22818: [SPARK-25904][CORE] Allocate arrays smaller than Int.Max...

2018-11-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22818
  
**[Test build #98435 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98435/testReport)**
 for PR 22818 at commit 
[`ca3efd8`](https://github.com/apache/spark/commit/ca3efd8f636706abf8c994cb75c14432f4e4939a).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22818: [SPARK-25904][CORE] Allocate arrays smaller than Int.Max...

2018-11-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22818
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4744/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22818: [SPARK-25904][CORE] Allocate arrays smaller than Int.Max...

2018-11-03 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/22818
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22754: [SPARK-25776][CORE]The disk write buffer size must be gr...

2018-11-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22754
  
**[Test build #98434 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98434/testReport)**
 for PR 22754 at commit 
[`c883f4b`](https://github.com/apache/spark/commit/c883f4b1b26139e3cb2053b8efa0b0bb0f902be0).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22754: [SPARK-25776][CORE]The disk write buffer size must be gr...

2018-11-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22754
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4743/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22754: [SPARK-25776][CORE]The disk write buffer size must be gr...

2018-11-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22754
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22754: [SPARK-25776][CORE]The disk write buffer size must be gr...

2018-11-03 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/22754
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22754: [SPARK-25776][CORE]The disk write buffer size must be gr...

2018-11-03 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/22754
  
LGTM


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22921: [SPARK-25908][CORE][SQL] Remove old deprecated items in ...

2018-11-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22921
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22921: [SPARK-25908][CORE][SQL] Remove old deprecated items in ...

2018-11-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22921
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98433/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22921: [SPARK-25908][CORE][SQL] Remove old deprecated items in ...

2018-11-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22921
  
**[Test build #98433 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98433/testReport)**
 for PR 22921 at commit 
[`df92f0f`](https://github.com/apache/spark/commit/df92f0fd48883adfd1c4881c03d86665b93d0831).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22683: [SPARK-25696] The storage memory displayed on spark Appl...

2018-11-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22683
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98432/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22683: [SPARK-25696] The storage memory displayed on spark Appl...

2018-11-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22683
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22683: [SPARK-25696] The storage memory displayed on spark Appl...

2018-11-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22683
  
**[Test build #98432 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98432/testReport)**
 for PR 22683 at commit 
[`9e45697`](https://github.com/apache/spark/commit/9e45697296039e55e85dd204788e287c9c60fceb).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22889: [SPARK-25882][SQL] Added a function to join two d...

2018-11-03 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22889#discussion_r230571447
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -883,6 +883,31 @@ class Dataset[T] private[sql](
 join(right, Seq(usingColumn))
   }
 
+  /**
+* Equi-join with another `DataFrame` using the given column.
--- End diff --

Please fix the indentation here.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22889: [SPARK-25882][SQL] Added a function to join two d...

2018-11-03 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22889#discussion_r230571437
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -883,6 +883,31 @@ class Dataset[T] private[sql](
 join(right, Seq(usingColumn))
   }
 
+  /**
+* Equi-join with another `DataFrame` using the given column.
+*
+* Different from other join functions, the join column will only 
appear once in the output,
+* i.e. similar to SQL's `JOIN USING` syntax.
+*
+* {{{
+*   // Left join of df1 and df2 using the column "user_id"
+*   df1.join(df2, "user_id", "left")
+* }}}
+*
+* @param right Right side of the join operation.
+* @param usingColumn Name of the column to join on. This column must 
exist on both sides.
+* @param joinType Type of join to perform. Default `inner`. Must be 
one of:
+* `inner`, `cross`, `outer`, `full`, `full_outer`, 
`left`, `left_outer`,
+* `right`, `right_outer`, `left_semi`, `left_anti`.
+* @note If you perform a self-join using this function without 
aliasing the input
+* `DataFrame`s, you will NOT be able to reference any columns after 
the join, since
+* there is no way to disambiguate which side of the join you would 
like to reference.
+* @group untypedrel
+*/
+  def join(right: Dataset[_], usingColumn: String, joinType: String): 
DataFrame = {
--- End diff --

@arman1371 . We understand that this PR is trying to add a syntactic sugar. 
But you need only 5 characters, `Seq(` and `)`, to use the existing general 
API. Personally, I agree with @wangyum . I prefer not to add this.

Historically, 
1. Spark 1.4 adds `Seq[String]` version was added later to support PySpark 
(SPARK-7990)
2. Spark 1.6 adds `join` type to `Seq[String]` version (SPARK-10446)

It's a long time ago. Given that, I guess Apache Spark community 
intentionally didn't add the `String` version for this in order to keep 
`Dataset` simple in terms of the number of APIs. Anyway, since you need an 
answer, let's ask the general opinion again to make a decision.

Hi, @rxin, @cloud-fan , @gatorsmile  . Did we explicitly decide not to add 
this API ? It seems that @arman1371 wants to add this for feature parity with 
PySpark at Spark 3.0.0.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22889: [SPARK-25882][SQL] Added a function to join two d...

2018-11-03 Thread wangyum

Github user wangyum commented on a diff in the pull request:

https://github.com/apache/spark/pull/22889#discussion_r230570164
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -883,6 +883,31 @@ class Dataset[T] private[sql](
 join(right, Seq(usingColumn))
   }
 
+  /**
+* Equi-join with another `DataFrame` using the given column.
+*
+* Different from other join functions, the join column will only 
appear once in the output,
+* i.e. similar to SQL's `JOIN USING` syntax.
+*
+* {{{
+*   // Left join of df1 and df2 using the column "user_id"
+*   df1.join(df2, "user_id", "left")
+* }}}
+*
+* @param right Right side of the join operation.
+* @param usingColumn Name of the column to join on. This column must 
exist on both sides.
+* @param joinType Type of join to perform. Default `inner`. Must be 
one of:
+* `inner`, `cross`, `outer`, `full`, `full_outer`, 
`left`, `left_outer`,
+* `right`, `right_outer`, `left_semi`, `left_anti`.
+* @note If you perform a self-join using this function without 
aliasing the input
+* `DataFrame`s, you will NOT be able to reference any columns after 
the join, since
+* there is no way to disambiguate which side of the join you would 
like to reference.
+* @group untypedrel
+*/
+  def join(right: Dataset[_], usingColumn: String, joinType: String): 
DataFrame = {
--- End diff --

cc @dongjoon-hyun


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22914: [SPARK-25900][WEBUI]When the page number is more than th...

2018-11-03 Thread shahidki31

Github user shahidki31 commented on the issue:

https://github.com/apache/spark/pull/22914
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22921: [SPARK-25908][CORE][SQL] Remove old deprecated items in ...

2018-11-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22921
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22921: [SPARK-25908][CORE][SQL] Remove old deprecated items in ...

2018-11-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22921
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4742/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22921: [SPARK-25908][CORE][SQL] Remove old deprecated items in ...

2018-11-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22921
  
**[Test build #98433 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98433/testReport)**
 for PR 22921 at commit 
[`df92f0f`](https://github.com/apache/spark/commit/df92f0fd48883adfd1c4881c03d86665b93d0831).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22921: [SPARK-25908][CORE][SQL] Remove old deprecated items in ...

2018-11-03 Thread srowen

Github user srowen commented on the issue:

https://github.com/apache/spark/pull/22921
  
Yeah it's a good point that these weren't deprecated, but I assume they 
should have been. Same change, same time, same logic. given that it's a 
reasonably niche method, I thought it would be best to go ahead and be 
consistent here?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22921: [SPARK-25908][CORE][SQL] Remove old deprecated it...

2018-11-03 Thread srowen

Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/22921#discussion_r230568378
  
--- Diff: R/pkg/R/functions.R ---
@@ -319,23 +319,23 @@ setMethod("acos",
   })
 
 #' @details
-#' \code{approxCountDistinct}: Returns the approximate number of distinct 
items in a group.
+#' \code{approx_count_distinct}: Returns the approximate number of 
distinct items in a group.
 #'
 #' @rdname column_aggregate_functions
-#' @aliases approxCountDistinct approxCountDistinct,Column-method
+#' @aliases approx_count_distinct approx_count_distinct,Column-method
 #' @examples
 #'
 #' \dontrun{
-#' head(select(df, approxCountDistinct(df$gear)))
-#' head(select(df, approxCountDistinct(df$gear, 0.02)))
+#' head(select(df, approx_count_distinct(df$gear)))
+#' head(select(df, approx_count_distinct(df$gear, 0.02)))
 #' head(select(df, countDistinct(df$gear, df$cyl)))
 #' head(select(df, n_distinct(df$gear)))
 #' head(distinct(select(df, "gear")))}
-#' @note approxCountDistinct(Column) since 1.4.0
-setMethod("approxCountDistinct",
+#' @note approx_count_distinct(Column) since 2.0.0
--- End diff --

Right, will fix that one too if I missed it, per 
https://github.com/apache/spark/pull/22921#discussion_r230449173


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22683: [SPARK-25696] The storage memory displayed on spark Appl...

2018-11-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22683
  
**[Test build #98432 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98432/testReport)**
 for PR 22683 at commit 
[`9e45697`](https://github.com/apache/spark/commit/9e45697296039e55e85dd204788e287c9c60fceb).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22683: [SPARK-25696] The storage memory displayed on spark Appl...

2018-11-03 Thread felixcheung

Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/22683
  
Jenkins, ok to test


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22921: [SPARK-25908][CORE][SQL] Remove old deprecated it...

2018-11-03 Thread felixcheung

Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22921#discussion_r230568079
  
--- Diff: R/pkg/R/functions.R ---
@@ -1641,30 +1641,30 @@ setMethod("tanh",
   })
 
 #' @details
-#' \code{toDegrees}: Converts an angle measured in radians to an 
approximately equivalent angle
+#' \code{degrees}: Converts an angle measured in radians to an 
approximately equivalent angle
 #' measured in degrees.
 #'
 #' @rdname column_math_functions
-#' @aliases toDegrees toDegrees,Column-method
-#' @note toDegrees since 1.4.0
-setMethod("toDegrees",
+#' @aliases degrees degrees,Column-method
+#' @note degrees since 3.0.0
+setMethod("degrees",
--- End diff --

`degrees` and `radians` will need to be added to NAMESPACE file for export


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22921: [SPARK-25908][CORE][SQL] Remove old deprecated it...

2018-11-03 Thread felixcheung

Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22921#discussion_r230568058
  
--- Diff: R/pkg/R/generics.R ---
@@ -748,7 +748,7 @@ setGeneric("add_months", function(y, x) { 
standardGeneric("add_months") })
 
 #' @rdname column_aggregate_functions
 #' @name NULL
-setGeneric("approxCountDistinct", function(x, ...) { 
standardGeneric("approxCountDistinct") })
+setGeneric("approx_count_distinct", function(x, ...) { 
standardGeneric("approx_count_distinct") })
--- End diff --

my concern is that these are breaking changes in a version without having 
them deprecated first...
could we leave the old one to redirect and add .Deprecate?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22921: [SPARK-25908][CORE][SQL] Remove old deprecated it...

2018-11-03 Thread felixcheung

Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22921#discussion_r230568088
  
--- Diff: R/pkg/R/functions.R ---
@@ -319,23 +319,23 @@ setMethod("acos",
   })
 
 #' @details
-#' \code{approxCountDistinct}: Returns the approximate number of distinct 
items in a group.
+#' \code{approx_count_distinct}: Returns the approximate number of 
distinct items in a group.
 #'
 #' @rdname column_aggregate_functions
-#' @aliases approxCountDistinct approxCountDistinct,Column-method
+#' @aliases approx_count_distinct approx_count_distinct,Column-method
 #' @examples
 #'
 #' \dontrun{
-#' head(select(df, approxCountDistinct(df$gear)))
-#' head(select(df, approxCountDistinct(df$gear, 0.02)))
+#' head(select(df, approx_count_distinct(df$gear)))
+#' head(select(df, approx_count_distinct(df$gear, 0.02)))
 #' head(select(df, countDistinct(df$gear, df$cyl)))
 #' head(select(df, n_distinct(df$gear)))
 #' head(distinct(select(df, "gear")))}
-#' @note approxCountDistinct(Column) since 1.4.0
-setMethod("approxCountDistinct",
+#' @note approx_count_distinct(Column) since 2.0.0
--- End diff --

it's actually new in R for 3.0.0 then


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22626: [SPARK-25638][SQL] Adding new function - to_csv()

2018-11-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22626
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98431/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22626: [SPARK-25638][SQL] Adding new function - to_csv()

2018-11-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22626
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22626: [SPARK-25638][SQL] Adding new function - to_csv()

2018-11-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22626
  
**[Test build #98431 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98431/testReport)**
 for PR 22626 at commit 
[`1895cdc`](https://github.com/apache/spark/commit/1895cdc3540f67ad562e10488ac7ffe7012d9ccc).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22937: [SPARK-25934] [Mesos] Don't propagate SPARK_CONF_DIR fro...

2018-11-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22937
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22937: [SPARK-25934] [Mesos] Don't propagate SPARK_CONF_DIR fro...

2018-11-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22937
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22937: [SPARK-25934] [Mesos] Don't propagate SPARK_CONF_DIR fro...

2018-11-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22937
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22937: [SPARK-25934] [Mesos] Don't propagate SPARK_CONF_...

2018-11-03 Thread mpmolek

GitHub user mpmolek opened a pull request:

https://github.com/apache/spark/pull/22937

[SPARK-25934] [Mesos] Don't propagate SPARK_CONF_DIR from spark submit

## What changes were proposed in this pull request?

Don't propagate SPARK_CONF_DIR to the driver in mesos cluster mode. 

## How was this patch tested?

I built the 2.3.2 tag with this patch added and deployed a test job to a 
mesos cluster to confirm that the incorrect SPARK_CONF_DIR was no longer passed 
from the submit command.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mpmolek/spark fix-conf-dir

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22937.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22937


commit 10da1120d52f61e98bcd929d7ad59220a93d59f7
Author: Matt Molek 
Date:   2018-11-03T19:33:02Z

[SPARK-25934] Don't propagate SPARK_CONF_DIR from spark submit




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...

2018-11-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22932
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...

2018-11-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22932
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98430/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...

2018-11-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22932
  
**[Test build #98430 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98430/testReport)**
 for PR 22932 at commit 
[`f5d35b4`](https://github.com/apache/spark/commit/f5d35b42092a7af2b545b4145daf9172ea6a8e32).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22626: [SPARK-25638][SQL] Adding new function - to_csv()

2018-11-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22626
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22626: [SPARK-25638][SQL] Adding new function - to_csv()

2018-11-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22626
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98428/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22626: [SPARK-25638][SQL] Adding new function - to_csv()

2018-11-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22626
  
**[Test build #98428 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98428/testReport)**
 for PR 22626 at commit 
[`6969b49`](https://github.com/apache/spark/commit/6969b49812acd2664bca724378a3739cb7846a6a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...

2018-11-03 Thread dongjoon-hyun

Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/22932
  
The last commit will pass the test. The previous one fails due to `spaces 
at the end`.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...

2018-11-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22932
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98429/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...

2018-11-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22932
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...

2018-11-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22932
  
**[Test build #98429 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98429/testReport)**
 for PR 22932 at commit 
[`1ed6368`](https://github.com/apache/spark/commit/1ed63683f2f8f7361a83892d38c84f40e2464590).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22932: [SPARK-25102][SQL] Write Spark version to ORC/Par...

2018-11-03 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22932#discussion_r230564513
  
--- Diff: 
sql/core/src/test/resources/sql-tests/results/describe-part-after-analyze.sql.out
 ---
@@ -93,7 +93,7 @@ Partition Values  [ds=2017-08-01, hr=10]
 Location [not included in 
comparison]sql/core/spark-warehouse/t/ds=2017-08-01/hr=10   
 
 Created Time [not included in comparison]
 Last Access [not included in comparison]
-Partition Statistics   1121 bytes, 3 rows  
+Partition Statistics   1229 bytes, 3 rows  
--- End diff --

Right, @gatorsmile .


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22932: [SPARK-25102][SQL] Write Spark version to ORC/Par...

2018-11-03 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/22932#discussion_r230563752
  
--- Diff: 
sql/core/src/test/resources/sql-tests/results/describe-part-after-analyze.sql.out
 ---
@@ -93,7 +93,7 @@ Partition Values  [ds=2017-08-01, hr=10]
 Location [not included in 
comparison]sql/core/spark-warehouse/t/ds=2017-08-01/hr=10   
 
 Created Time [not included in comparison]
 Last Access [not included in comparison]
-Partition Statistics   1121 bytes, 3 rows  
+Partition Statistics   1229 bytes, 3 rows  
--- End diff --

This is caused by adding `org.apache.spark.sql.create.version = 
3.0.0-SNAPSHOT`?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22934: [BUILD] Close stale PRs

2018-11-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22934
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98427/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22934: [BUILD] Close stale PRs

2018-11-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22934
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22934: [BUILD] Close stale PRs

2018-11-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22934
  
**[Test build #98427 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98427/testReport)**
 for PR 22934 at commit 
[`322e21c`](https://github.com/apache/spark/commit/322e21c29919cb7dcfc2e088cd5d605e1f4bb5a7).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22914: [SPARK-25900][WEBUI]When the page number is more than th...

2018-11-03 Thread shahidki31

Github user shahidki31 commented on the issue:

https://github.com/apache/spark/pull/22914
  
@srowen @gengliangwang There is one more place where the WEBUI can throw an 
exception.

https://github.com/apache/spark/blob/1a7abf3f453f7d6012d7e842cf05f29f3afbb3bc/core/src/main/scala/org/apache/spark/ui/PagedTable.scala#L36-L38

I will update the PR.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22933: [SPARK-25933][DOCUMENTATION] Fix pstats.Stats() r...

2018-11-03 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/22933


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22933: [SPARK-25933][DOCUMENTATION] Fix pstats.Stats() referenc...

2018-11-03 Thread srowen

Github user srowen commented on the issue:

https://github.com/apache/spark/pull/22933
  
Merged to master/2.4/2.3


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22626: [SPARK-25638][SQL] Adding new function - to_csv()

2018-11-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22626
  
**[Test build #98431 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98431/testReport)**
 for PR 22626 at commit 
[`1895cdc`](https://github.com/apache/spark/commit/1895cdc3540f67ad562e10488ac7ffe7012d9ccc).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22914: [SPARK-25900][WEBUI]When the page number is more than th...

2018-11-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22914
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22914: [SPARK-25900][WEBUI]When the page number is more than th...

2018-11-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22914
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98426/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22914: [SPARK-25900][WEBUI]When the page number is more than th...

2018-11-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22914
  
**[Test build #98426 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98426/testReport)**
 for PR 22914 at commit 
[`2e39c4a`](https://github.com/apache/spark/commit/2e39c4a2cbf1db82b37795b2b568985fda2ff903).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22920: [SPARK-25931][SQL] Benchmarking creation of Jackson pars...

2018-11-03 Thread MaxGekk

Github user MaxGekk commented on the issue:

https://github.com/apache/spark/pull/22920
  
@dongjoon-hyun Thank you for re-running the benchmarks on EC2, and 
@HyukjinKwon for review. 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22919: [SPARK-25906][SHELL] Documents '-I' option (from Scala R...

2018-11-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22919
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98425/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22919: [SPARK-25906][SHELL] Documents '-I' option (from Scala R...

2018-11-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22919
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22919: [SPARK-25906][SHELL] Documents '-I' option (from Scala R...

2018-11-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22919
  
**[Test build #98425 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98425/testReport)**
 for PR 22919 at commit 
[`5f3cb87`](https://github.com/apache/spark/commit/5f3cb87c8798e72cc6852e71c02ffc2077c748d7).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22920: [SPARK-25931][SQL] Benchmarking creation of Jackson pars...

2018-11-03 Thread dongjoon-hyun

Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/22920
  
Thank you, @MaxGekk and @HyukjinKwon !


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...

2018-11-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22932
  
**[Test build #98430 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98430/testReport)**
 for PR 22932 at commit 
[`f5d35b4`](https://github.com/apache/spark/commit/f5d35b42092a7af2b545b4145daf9172ea6a8e32).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...

2018-11-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22932
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4741/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...

2018-11-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22932
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22889: [SPARK-25882][SQL] Added a function to join two d...

2018-11-03 Thread arman1371

Github user arman1371 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22889#discussion_r230560687
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -883,6 +883,31 @@ class Dataset[T] private[sql](
 join(right, Seq(usingColumn))
   }
 
+  /**
+* Equi-join with another `DataFrame` using the given column.
+*
+* Different from other join functions, the join column will only 
appear once in the output,
+* i.e. similar to SQL's `JOIN USING` syntax.
+*
+* {{{
+*   // Left join of df1 and df2 using the column "user_id"
+*   df1.join(df2, "user_id", "left")
+* }}}
+*
+* @param right Right side of the join operation.
+* @param usingColumn Name of the column to join on. This column must 
exist on both sides.
+* @param joinType Type of join to perform. Default `inner`. Must be 
one of:
+* `inner`, `cross`, `outer`, `full`, `full_outer`, 
`left`, `left_outer`,
+* `right`, `right_outer`, `left_semi`, `left_anti`.
+* @note If you perform a self-join using this function without 
aliasing the input
+* `DataFrame`s, you will NOT be able to reference any columns after 
the join, since
+* there is no way to disambiguate which side of the join you would 
like to reference.
+* @group untypedrel
+*/
+  def join(right: Dataset[_], usingColumn: String, joinType: String): 
DataFrame = {
--- End diff --

no, it's a simpler implementation. as i said we have both of `def 
join(right: Dataset[_], usingColumn: String)` and `def join(right: Dataset[_], 
usingColumns: Seq[String])`. based on your opinion the first function should be 
removed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22934: [BUILD] Close stale PRs

2018-11-03 Thread dongjoon-hyun

Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/22934
  
Thank you for taking care of this, @wangyum .

nit. We are using `[BUILD]` or `[INFRA]` tag for this kind of work. Maybe, 
can we use `[INFRA]` consistently?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...

2018-11-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22932
  
**[Test build #98429 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98429/testReport)**
 for PR 22932 at commit 
[`1ed6368`](https://github.com/apache/spark/commit/1ed63683f2f8f7361a83892d38c84f40e2464590).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...

2018-11-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22932
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...

2018-11-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22932
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4740/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22930: [SPARK-24869][SQL] Fix SaveIntoDataSourceCommand's input...

2018-11-03 Thread wangyum

Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/22930
  
cc @gatorsmile @gengliangwang @maropu


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22920: [SPARK-25931][SQL] Benchmarking creation of Jacks...

2018-11-03 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/22920


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22936: Support WITH clause (CTE) in subqueries

2018-11-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22936
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22936: Support WITH clause (CTE) in subqueries

2018-11-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22936
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22936: Support WITH clause (CTE) in subqueries

2018-11-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22936
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22936: Support WITH clause (CTE) in subqueries

2018-11-03 Thread gbloisi

GitHub user gbloisi opened a pull request:

https://github.com/apache/spark/pull/22936

Support WITH clause (CTE) in subqueries

Because of SPARK-17590 support of WITH clause (CTE) in subqueries requires 
only grammar support.

Test for augmented syntax is provided.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gbloisi/spark SPARK-19799

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22936.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22936


commit 66cd5379a17e05707ae162bb20e9c64812737d78
Author: Giambattista Bloisi 
Date:   2018-11-03T16:04:09Z

Because of SPARK-17590 support of WITH clause in subqueries requires only 
grammar support.




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22889: [SPARK-25882][SQL] Added a function to join two d...

2018-11-03 Thread wangyum

Github user wangyum commented on a diff in the pull request:

https://github.com/apache/spark/pull/22889#discussion_r230559647
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -883,6 +883,31 @@ class Dataset[T] private[sql](
 join(right, Seq(usingColumn))
   }
 
+  /**
+* Equi-join with another `DataFrame` using the given column.
+*
+* Different from other join functions, the join column will only 
appear once in the output,
+* i.e. similar to SQL's `JOIN USING` syntax.
+*
+* {{{
+*   // Left join of df1 and df2 using the column "user_id"
+*   df1.join(df2, "user_id", "left")
+* }}}
+*
+* @param right Right side of the join operation.
+* @param usingColumn Name of the column to join on. This column must 
exist on both sides.
+* @param joinType Type of join to perform. Default `inner`. Must be 
one of:
+* `inner`, `cross`, `outer`, `full`, `full_outer`, 
`left`, `left_outer`,
+* `right`, `right_outer`, `left_semi`, `left_anti`.
+* @note If you perform a self-join using this function without 
aliasing the input
+* `DataFrame`s, you will NOT be able to reference any columns after 
the join, since
+* there is no way to disambiguate which side of the join you would 
like to reference.
+* @group untypedrel
+*/
+  def join(right: Dataset[_], usingColumn: String, joinType: String): 
DataFrame = {
--- End diff --

Cloud we close this?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22889: [SPARK-25882][SQL] Added a function to join two d...

2018-11-03 Thread arman1371

Github user arman1371 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22889#discussion_r230559472
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -883,6 +883,31 @@ class Dataset[T] private[sql](
 join(right, Seq(usingColumn))
   }
 
+  /**
+* Equi-join with another `DataFrame` using the given column.
+*
+* Different from other join functions, the join column will only 
appear once in the output,
+* i.e. similar to SQL's `JOIN USING` syntax.
+*
+* {{{
+*   // Left join of df1 and df2 using the column "user_id"
+*   df1.join(df2, "user_id", "left")
+* }}}
+*
+* @param right Right side of the join operation.
+* @param usingColumn Name of the column to join on. This column must 
exist on both sides.
+* @param joinType Type of join to perform. Default `inner`. Must be 
one of:
+* `inner`, `cross`, `outer`, `full`, `full_outer`, 
`left`, `left_outer`,
+* `right`, `right_outer`, `left_semi`, `left_anti`.
+* @note If you perform a self-join using this function without 
aliasing the input
+* `DataFrame`s, you will NOT be able to reference any columns after 
the join, since
+* there is no way to disambiguate which side of the join you would 
like to reference.
+* @group untypedrel
+*/
+  def join(right: Dataset[_], usingColumn: String, joinType: String): 
DataFrame = {
--- End diff --

The answer of both questions are yes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22626: [SPARK-25638][SQL] Adding new function - to_csv()

2018-11-03 Thread MaxGekk

Github user MaxGekk commented on a diff in the pull request:

https://github.com/apache/spark/pull/22626#discussion_r230559020
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityGenerator.scala
 ---
@@ -15,18 +15,17 @@
  * limitations under the License.
  */
 
-package org.apache.spark.sql.execution.datasources.csv
+package org.apache.spark.sql.catalyst.csv
 
 import java.io.Writer
 
 import com.univocity.parsers.csv.CsvWriter
 
 import org.apache.spark.sql.catalyst.InternalRow
-import org.apache.spark.sql.catalyst.csv.CSVOptions
 import org.apache.spark.sql.catalyst.util.DateTimeUtils
 import org.apache.spark.sql.types._
 
-private[csv] class UnivocityGenerator(
+private[sql] class UnivocityGenerator(
--- End diff --

removed


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22626: [SPARK-25638][SQL] Adding new function - to_csv()

2018-11-03 Thread MaxGekk

Github user MaxGekk commented on a diff in the pull request:

https://github.com/apache/spark/pull/22626#discussion_r230559006
  
--- Diff: sql/core/src/test/resources/sql-tests/inputs/csv-functions.sql ---
@@ -15,3 +15,10 @@ CREATE TEMPORARY VIEW csvTable(csvField, a) AS SELECT * 
FROM VALUES ('1,abc', 'a
 SELECT schema_of_csv(csvField) FROM csvTable;
 -- Clean up
 DROP VIEW IF EXISTS csvTable;
+-- to_csv
+select to_csv(named_struct('a', 1, 'b', 2));
+select to_csv(named_struct('time', to_timestamp('2015-08-26', 
'-MM-dd')), map('timestampFormat', 'dd/MM/'));
+-- Check if errors handled
+select to_csv(named_struct('a', 1, 'b', 2), named_struct('mode', 
'PERMISSIVE'));
+select to_csv(named_struct('a', 1, 'b', 2), map('mode', 1));
--- End diff --

I removed `select to_csv()`


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22626: [SPARK-25638][SQL] Adding new function - to_csv()

2018-11-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22626
  
**[Test build #98428 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98428/testReport)**
 for PR 22626 at commit 
[`6969b49`](https://github.com/apache/spark/commit/6969b49812acd2664bca724378a3739cb7846a6a).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22893: [SPARK-25868][MLlib] One part of Spark MLlib Kmean Logic...

2018-11-03 Thread KyleLi1985

Github user KyleLi1985 commented on the issue:

https://github.com/apache/spark/pull/22893
  
> OK, the Spark part doesn't seem relevant. The input might be more 
realistic here, yes. I was commenting that your test code doesn't show what 
you're testing, though I understand you manually modified it. Because the test 
is so central here I think it's important to understand exactly what you're 
measuring and exactly what you're running.
> 
> This doesn't show an improvement, right?

TEST, I agree with you

No influence for sparse case


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22935: Branch 2.2

2018-11-03 Thread litao1223

Github user litao1223 closed the pull request at:

https://github.com/apache/spark/pull/22935


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22889: [SPARK-25882][SQL] Added a function to join two d...

2018-11-03 Thread wangyum

Github user wangyum commented on a diff in the pull request:

https://github.com/apache/spark/pull/22889#discussion_r230557937
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -883,6 +883,31 @@ class Dataset[T] private[sql](
 join(right, Seq(usingColumn))
   }
 
+  /**
+* Equi-join with another `DataFrame` using the given column.
+*
+* Different from other join functions, the join column will only 
appear once in the output,
+* i.e. similar to SQL's `JOIN USING` syntax.
+*
+* {{{
+*   // Left join of df1 and df2 using the column "user_id"
+*   df1.join(df2, "user_id", "left")
+* }}}
+*
+* @param right Right side of the join operation.
+* @param usingColumn Name of the column to join on. This column must 
exist on both sides.
+* @param joinType Type of join to perform. Default `inner`. Must be 
one of:
+* `inner`, `cross`, `outer`, `full`, `full_outer`, 
`left`, `left_outer`,
+* `right`, `right_outer`, `left_semi`, `left_anti`.
+* @note If you perform a self-join using this function without 
aliasing the input
+* `DataFrame`s, you will NOT be able to reference any columns after 
the join, since
+* there is no way to disambiguate which side of the join you would 
like to reference.
+* @group untypedrel
+*/
+  def join(right: Dataset[_], usingColumn: String, joinType: String): 
DataFrame = {
--- End diff --

@arman1371 What do you think? ```def join(right: Dataset[_], usingColumn: 
String, joinType: String)``` only support one column. right?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22935: Branch 2.2

2018-11-03 Thread wangyum

Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/22935
  
@litao1223 Please close this.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22935: Branch 2.2

2018-11-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22935
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22935: Branch 2.2

2018-11-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22935
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22935: Branch 2.2

2018-11-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22935
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

1 2 >

1 - 100 of 183 matches

Mail list logo