[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...

2018-10-10 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22309#discussion_r224318955
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/ScalaReflectionSuite.scala
 ---
@@ -108,6 +108,16 @@ object TestingUDT {
   }
 }
 
+object TestingValueClass {
+  case class IntWrapper(i: Int) extends AnyVal
+  case class StrWrapper(s: String) extends AnyVal
+
+  case class ValueClassData(
+intField: Int,
+wrappedInt: IntWrapper,
+strField: String,
+wrappedStr: StrWrapper)
--- End diff --

We might need a comment to describe what this class looks like in Java. 
It seems to have two int fields, `intField` and `wrappedInt`, and two string 
fields, `strField` and `wrappedStr`. I'm not sure it is the same in Scala 2.12, 
though.
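For reference, the erasure the comment describes can be checked with plain reflection. This is a hedged sketch (the object name `ValueClassErasureDemo` is illustrative, mirroring the classes in the diff above); on the Scala versions in use here the value-class wrappers are expected to erase to their underlying types, though version-dependence is exactly what the comment questions:

```scala
object ValueClassErasureDemo {
  case class IntWrapper(i: Int) extends AnyVal
  case class StrWrapper(s: String) extends AnyVal

  case class ValueClassData(
      intField: Int,
      wrappedInt: IntWrapper,
      strField: String,
      wrappedStr: StrWrapper)

  def main(args: Array[String]): Unit = {
    // Reflection sees the post-erasure field types, i.e. what Java interop
    // observes: the wrappers should appear as their underlying primitives.
    classOf[ValueClassData].getDeclaredFields
      .foreach(f => println(s"${f.getName}: ${f.getType.getSimpleName}"))
  }
}
```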


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22309: [SPARK-20384][SQL] Support value class in schema of Data...

2018-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22309
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #22309: [SPARK-20384][SQL] Support value class in schema of Data...

2018-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22309
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97232/
Test FAILed.


---




[GitHub] spark issue #22309: [SPARK-20384][SQL] Support value class in schema of Data...

2018-10-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22309
  
**[Test build #97232 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97232/testReport)**
 for PR 22309 at commit 
[`5613217`](https://github.com/apache/spark/commit/5613217771b1929b9f66106468fd2da2c3ea7dec).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #22675: [SPARK-25347][ML][DOC] Spark datasource for image...

2018-10-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22675#discussion_r224322470
  
--- Diff: docs/ml-datasource.md ---
@@ -0,0 +1,49 @@
+---
+layout: global
+title: Data sources
+displayTitle: Data sources
+---
+
+In this section, we introduce how to use data source in ML to load data.
+Beside some general data sources like Parquet, CSV, JSON, JDBC, we also 
provide some specific data source for ML.
+
+**Table of Contents**
+
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+## Image data source
+
+This image data source is used to load image files from a directory.
+The loaded DataFrame has one StructType column: "image". containing image 
data stored as image schema.
+
+
+

+[`ImageDataSource`](api/scala/index.html#org.apache.spark.ml.source.image.ImageDataSource)
+implements a Spark SQL data source API for loading image data as a 
DataFrame.
+
+{% highlight scala %}
+scala> spark.read.format("image").load("data/mllib/images/origin")
+res1: org.apache.spark.sql.DataFrame = [image: struct]
+{% endhighlight %}
+
+
+

+[`ImageDataSource`](api/java/org/apache/spark/ml/source/image/ImageDataSource.html)
+implements Spark SQL data source API for loading image data as DataFrame.
+
+{% highlight java %}
+Dataset imagesDF = 
spark.read().format("image").load("data/mllib/images/origin");
--- End diff --

Can we add a simple transformation to show how the image datasource can be 
utilized?
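A minimal sketch of such a transformation, assuming the image schema fields exposed by Spark's `ImageSchema` (`origin`, `height`, `width`, `nChannels`, `mode`, `data`); the path follows the example in the diff above and the width threshold is purely illustrative:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().appName("image-demo").getOrCreate()

// Load images, then project a few fields out of the nested "image" struct
// and filter on a decoded property.
val images = spark.read.format("image").load("data/mllib/images/origin")
images
  .select(col("image.origin"), col("image.width"), col("image.height"))
  .filter(col("image.width") > 100)
  .show(truncate = false)
```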


---




[GitHub] spark pull request #22675: [SPARK-25347][ML][DOC] Spark datasource for image...

2018-10-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22675#discussion_r224322298
  
--- Diff: docs/ml-datasource.md ---
@@ -0,0 +1,49 @@
+---
+layout: global
+title: Data sources
+displayTitle: Data sources
+---
+
+In this section, we introduce how to use data source in ML to load data.
+Beside some general data sources like Parquet, CSV, JSON, JDBC, we also 
provide some specific data source for ML.
+
+**Table of Contents**
+
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+## Image data source
+
+This image data source is used to load image files from a directory.
+The loaded DataFrame has one StructType column: "image". containing image 
data stored as image schema.
+
+
+

+[`ImageDataSource`](api/scala/index.html#org.apache.spark.ml.source.image.ImageDataSource)
+implements a Spark SQL data source API for loading image data as a 
DataFrame.
+
+{% highlight scala %}
+scala> spark.read.format("image").load("data/mllib/images/origin")
+res1: org.apache.spark.sql.DataFrame = [image: struct]
+{% endhighlight %}
+
+
+

+[`ImageDataSource`](api/java/org/apache/spark/ml/source/image/ImageDataSource.html)
+implements Spark SQL data source API for loading image data as DataFrame.
+
+{% highlight java %}
+Dataset imagesDF = 
spark.read().format("image").load("data/mllib/images/origin");
+{% endhighlight %}
+
+
+
--- End diff --

How about SQL syntax? I think we can use `CREATE TABLE tableA USING 
LOCATION 'data/image.png'`.


---




[GitHub] spark pull request #22675: [SPARK-25347][ML][DOC] Spark datasource for image...

2018-10-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22675#discussion_r224321873
  
--- Diff: docs/ml-datasource.md ---
@@ -0,0 +1,49 @@
+---
+layout: global
+title: Data sources
+displayTitle: Data sources
+---
+
+In this section, we introduce how to use data source in ML to load data.
+Beside some general data sources like Parquet, CSV, JSON, JDBC, we also 
provide some specific data source for ML.
+
+**Table of Contents**
+
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+## Image data source
+
+This image data source is used to load image files from a directory.
+The loaded DataFrame has one StructType column: "image". containing image 
data stored as image schema.
--- End diff --

Shall we describe which images we can load? For instance, I think this 
delegates to ImageIO in Java, which allows reading compressed formats like PNG or 
JPG into a raw image representation like BMP so that OpenCV can handle them.


---




[GitHub] spark pull request #22675: [SPARK-25347][ML][DOC] Spark datasource for image...

2018-10-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22675#discussion_r224321949
  
--- Diff: docs/ml-datasource.md ---
@@ -0,0 +1,49 @@
+---
+layout: global
+title: Data sources
+displayTitle: Data sources
+---
+
+In this section, we introduce how to use data source in ML to load data.
+Beside some general data sources like Parquet, CSV, JSON, JDBC, we also 
provide some specific data source for ML.
+
+**Table of Contents**
+
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+## Image data source
+
+This image data source is used to load image files from a directory.
+The loaded DataFrame has one StructType column: "image". containing image 
data stored as image schema.
--- End diff --

I would also describe the schema structure and what each field means.


---




[GitHub] spark pull request #22675: [SPARK-25347][ML][DOC] Spark datasource for image...

2018-10-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22675#discussion_r224321446
  
--- Diff: docs/ml-datasource.md ---
@@ -0,0 +1,49 @@
+---
+layout: global
+title: Data sources
+displayTitle: Data sources
+---
+
+In this section, we introduce how to use data source in ML to load data.
+Beside some general data sources like Parquet, CSV, JSON, JDBC, we also 
provide some specific data source for ML.
--- End diff --

`JSON, JDBC` -> `JSON and JDBC`


---




[GitHub] spark pull request #22668: [SPARK-25675] [Spark Job History] Job UI page doe...

2018-10-10 Thread shivusondur
Github user shivusondur commented on a diff in the pull request:

https://github.com/apache/spark/pull/22668#discussion_r224318421
  
--- Diff: core/src/main/scala/org/apache/spark/ui/PagedTable.scala ---
@@ -123,10 +123,9 @@ private[ui] trait PagedTable[T] {
   /**
* Return a page navigation.
* 
-   *   If the totalPages is 1, the page navigation will be empty
*   
-   * If the totalPages is more than 1, it will create a page 
navigation including a group of
-   * page numbers and a form to submit the page number.
+   * It will create a page navigation including a group of page 
numbers and a form
--- End diff --

@gengliangwang @felixcheung 
I have updated it according to your suggestion. Please check.


---




[GitHub] spark pull request #22685: [SQL][MINOR][Refactor] Refactor on sql/core

2018-10-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22685#discussion_r224317853
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
 ---
@@ -96,7 +95,7 @@ case class DataSource(
   private val caseInsensitiveOptions = CaseInsensitiveMap(options)
   private val equality = sparkSession.sessionState.conf.resolver
 
-  bucketSpec.map { bucket =>
+  bucketSpec.foreach { bucket =>
--- End diff --

Yea, this is a legitimate change.
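For context, the change is only about intent: `Option.map` builds a new `Option` that is immediately discarded when the body is run purely for its side effect, while `foreach` returns `Unit`. A plain-Scala sketch (the `run` helper and log messages are illustrative, not from the PR):

```scala
object MapVsForeachDemo {
  // Runs the same side effect via map and via foreach; returns the log.
  def run(): Seq[String] = {
    val bucketSpec: Option[Int] = Some(4)
    val log = scala.collection.mutable.Buffer.empty[String]

    // map runs the side effect but also allocates a Some(()) that is
    // immediately discarded.
    val discarded: Option[Unit] = bucketSpec.map { b => log += s"validated $b"; () }

    // foreach expresses the intent directly: side effect, no result value.
    bucketSpec.foreach(b => log += s"validated $b")

    log.toSeq
  }

  def main(args: Array[String]): Unit = println(run())
}
```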


---




[GitHub] spark pull request #22419: [SPARK-23906][SQL] Add built-in UDF TRUNCATE(numb...

2018-10-10 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/22419#discussion_r224318028
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala
 ---
@@ -1245,3 +1245,27 @@ case class BRound(child: Expression, scale: 
Expression)
 with Serializable with ImplicitCastInputTypes {
   def this(child: Expression) = this(child, Literal(0))
 }
+
+/**
+ * The number truncated to scale decimal places.
+ */
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = "_FUNC_(number, scale) - Returns number truncated to scale 
decimal places. " +
+"If scale is omitted, then number is truncated to 0 places. " +
+"scale can be negative to truncate (make zero) scale digits left of 
the decimal point.",
+  examples = """
+Examples:
+  > SELECT _FUNC_(1234567891.1234567891, 4);
+   1234567891.1234
+  > SELECT _FUNC_(1234567891.1234567891, -4);
+   123456
+  > SELECT _FUNC_(1234567891.1234567891);
+   1234567891
+  """)
+// scalastyle:on line.size.limit
+case class Truncate(child: Expression, scale: Expression)
--- End diff --

In that case, it's OK to handle the string as a date. How about only accepting 
float, double, and decimal for number truncation?
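The intended truncate-toward-zero semantics for the numeric case can be sketched with `java.math.BigDecimal`; this is only an illustration of the behavior described in the usage string above, not the PR's implementation:

```scala
import java.math.{BigDecimal => JBigDecimal, RoundingMode}

object TruncateDemo {
  // Truncate toward zero at `scale` decimal places; a negative scale
  // zeroes out digits left of the decimal point.
  def truncate(n: JBigDecimal, scale: Int): JBigDecimal =
    n.setScale(scale, RoundingMode.DOWN)

  def main(args: Array[String]): Unit = {
    val n = new JBigDecimal("1234567891.1234567891")
    println(truncate(n, 4).toPlainString)   // four decimal places kept
    println(truncate(n, 0).toPlainString)   // integer part only
    println(truncate(n, -4).toPlainString)  // last four integer digits zeroed
  }
}
```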


---




[GitHub] spark issue #22676: [SPARK-25684][SQL] Organize header related codes in CSV ...

2018-10-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22676
  
**[Test build #97235 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97235/testReport)**
 for PR 22676 at commit 
[`c504356`](https://github.com/apache/spark/commit/c504356b847e183f571a09ce5f808d4a7f229255).


---




[GitHub] spark issue #22676: [SPARK-25684][SQL] Organize header related codes in CSV ...

2018-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22676
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3871/
Test PASSed.


---




[GitHub] spark issue #22676: [SPARK-25684][SQL] Organize header related codes in CSV ...

2018-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22676
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22676: [SPARK-25684][SQL] Organize header related codes in CSV ...

2018-10-10 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22676
  
retest this please


---




[GitHub] spark issue #22594: [SPARK-25674][SQL] If the records are incremented by mor...

2018-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22594
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97230/
Test PASSed.


---




[GitHub] spark issue #22594: [SPARK-25674][SQL] If the records are incremented by mor...

2018-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22594
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22594: [SPARK-25674][SQL] If the records are incremented by mor...

2018-10-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22594
  
**[Test build #97230 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97230/testReport)**
 for PR 22594 at commit 
[`04eba30`](https://github.com/apache/spark/commit/04eba3019fa8e05b73823c91db48a50c544e8350).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22688: [SPARK-25700][SQL] Creates ReadSupport in only Append Mo...

2018-10-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22688
  
**[Test build #97234 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97234/testReport)**
 for PR 22688 at commit 
[`ded852c`](https://github.com/apache/spark/commit/ded852c3f99d9fe904a6b54691ac6c170da9a298).


---




[GitHub] spark issue #22688: [SPARK-25700][SQL] Creates ReadSupport in only Append Mo...

2018-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22688
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22688: [SPARK-25700][SQL] Creates ReadSupport in only Append Mo...

2018-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22688
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3870/
Test PASSed.


---




[GitHub] spark pull request #22688: [SPARK-25700][SQL] Creates ReadSupport in only Ap...

2018-10-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22688#discussion_r224316297
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/sources/v2/DataSourceV2Suite.scala 
---
@@ -351,6 +351,21 @@ class DataSourceV2Suite extends QueryTest with 
SharedSQLContext {
   }
 }
   }
+
+  test("SPARK-25700: do not read schema when writing in other modes except 
append mode") {
+withTempPath { file =>
+  val cls = classOf[SimpleWriteOnlyDataSource]
+  val path = file.getCanonicalPath
+  val df = spark.range(5).select('id as 'i, -'id as 'j)
--- End diff --

The write path looks like it requires two columns:


https://github.com/apache/spark/blob/e06da95cd9423f55cdb154a2778b0bddf7be984c/sql/core/src/test/scala/org/apache/spark/sql/sources/v2/SimpleWritableDataSource.scala#L214



---




[GitHub] spark pull request #22688: [SPARK-25700][SQL] Creates ReadSupport in only Ap...

2018-10-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22688#discussion_r224316130
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/sources/v2/DataSourceV2Suite.scala 
---
@@ -351,6 +351,21 @@ class DataSourceV2Suite extends QueryTest with 
SharedSQLContext {
   }
 }
   }
+
+  test("SPARK-25700: do not read schema when writing in other modes except 
append mode") {
+withTempPath { file =>
+  val cls = classOf[SimpleWriteOnlyDataSource]
+  val path = file.getCanonicalPath
+  val df = spark.range(5).select($"id", $"id")
--- End diff --

The write path looks like it requires two columns:


https://github.com/apache/spark/blob/e06da95cd9423f55cdb154a2778b0bddf7be984c/sql/core/src/test/scala/org/apache/spark/sql/sources/v2/SimpleWritableDataSource.scala#L214



---




[GitHub] spark pull request #22668: [SPARK-25675] [Spark Job History] Job UI page doe...

2018-10-10 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22668#discussion_r224316034
  
--- Diff: core/src/main/scala/org/apache/spark/ui/PagedTable.scala ---
@@ -123,10 +123,9 @@ private[ui] trait PagedTable[T] {
   /**
* Return a page navigation.
* 
-   *   If the totalPages is 1, the page navigation will be empty
*   
-   * If the totalPages is more than 1, it will create a page 
navigation including a group of
-   * page numbers and a form to submit the page number.
+   * It will create a page navigation including a group of page 
numbers and a form
--- End diff --

true.


---




[GitHub] spark issue #22688: [SPARK-25700][SQL] Creates ReadSupport in only Append Mo...

2018-10-10 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22688
  
I have no idea why it passes locally. I fixed the test.


---




[GitHub] spark issue #22689: [SPARK-25697][CORE]When zstd compression enabled, InProg...

2018-10-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22689
  
**[Test build #97233 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97233/testReport)**
 for PR 22689 at commit 
[`c309f34`](https://github.com/apache/spark/commit/c309f3464522341f286fd4791d7989dcde988cac).


---




[GitHub] spark issue #22688: [SPARK-25700][SQL] Creates ReadSupport in only Append Mo...

2018-10-10 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22688
  
Hm, yea, this passed locally so I expected it was flaky, but it seems 
I should fix it.


---




[GitHub] spark issue #22689: [SPARK-25697][CORE]When zstd compression enabled, InProg...

2018-10-10 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/22689
  
ok to test


---




[GitHub] spark pull request #22681: [SPARK-25682][k8s] Package example jars in same t...

2018-10-10 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22681#discussion_r224314585
  
--- Diff: 
resource-managers/kubernetes/docker/src/main/dockerfiles/spark/Dockerfile ---
@@ -18,6 +18,7 @@
 FROM openjdk:8-alpine
 
 ARG spark_jars=jars
+ARG example_jars=examples/jars
--- End diff --

Could we make this optional, in case someone wants to build a smaller image 
without the examples?


---




[GitHub] spark issue #22688: [SPARK-25700][SQL] Creates ReadSupport in only Append Mo...

2018-10-10 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/22688
  
Seems the same test failed?


---




[GitHub] spark issue #22466: [SPARK-25464][SQL] Create Database to the location,only ...

2018-10-10 Thread sandeep-katta
Github user sandeep-katta commented on the issue:

https://github.com/apache/spark/pull/22466
  
> The major comments are in the test cases. Could you help clean up the 
existing test cases?

All the comments are addressed and the test cases are corrected.


---




[GitHub] spark issue #22688: [SPARK-25700][SQL] Creates ReadSupport in only Append Mo...

2018-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22688
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #22688: [SPARK-25700][SQL] Creates ReadSupport in only Append Mo...

2018-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22688
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97229/
Test FAILed.


---




[GitHub] spark issue #22688: [SPARK-25700][SQL] Creates ReadSupport in only Append Mo...

2018-10-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22688
  
**[Test build #97229 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97229/testReport)**
 for PR 22688 at commit 
[`9377bc3`](https://github.com/apache/spark/commit/9377bc35050408512c28f47ca0535b66c4dfcaf8).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class SchemaReadAttemptException(m: String) extends 
RuntimeException(m)`


---




[GitHub] spark pull request #22678: [SPARK-25685][BUILD] Allow running tests in Jenki...

2018-10-10 Thread LantaoJin
Github user LantaoJin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22678#discussion_r224309582
  
--- Diff: dev/run-tests-jenkins.py ---
@@ -39,7 +39,8 @@ def print_err(msg):
 def post_message_to_github(msg, ghprb_pull_id):
 print("Attempting to post to Github...")
 
-url = "https://api.github.com/repos/apache/spark/issues/" + 
ghprb_pull_id + "/comments"
+api_url = os.getenv("GITHUB_SERVER_API_URL", 
"https://api.github.com/repos/apache/spark")
--- End diff --

Sure. @kiszk 


---




[GitHub] spark issue #22690: [SPARK-19287][CORE][STREAMING] JavaPairRDD flatMapValues...

2018-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22690
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22690: [SPARK-19287][CORE][STREAMING] JavaPairRDD flatMapValues...

2018-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22690
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97226/
Test PASSed.


---




[GitHub] spark issue #22690: [SPARK-19287][CORE][STREAMING] JavaPairRDD flatMapValues...

2018-10-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22690
  
**[Test build #97226 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97226/testReport)**
 for PR 22690 at commit 
[`a35b54f`](https://github.com/apache/spark/commit/a35b54fbb000665a87998c14ed940316d45d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22612: [SPARK-24958] Add executors' process tree total memory i...

2018-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22612
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97228/
Test FAILed.


---




[GitHub] spark issue #22612: [SPARK-24958] Add executors' process tree total memory i...

2018-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22612
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #22612: [SPARK-24958] Add executors' process tree total memory i...

2018-10-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22612
  
**[Test build #97228 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97228/testReport)**
 for PR 22612 at commit 
[`067b81d`](https://github.com/apache/spark/commit/067b81d24de7999afe5b9660e89d9a2e41de6d21).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22678: [SPARK-25685][BUILD] Allow running tests in Jenkins in e...

2018-10-10 Thread LantaoJin
Github user LantaoJin commented on the issue:

https://github.com/apache/spark/pull/22678
  
Sorry for mistakenly closing the conversation, @dongjoon-hyun. I will 
update the documentation soon.


---




[GitHub] spark issue #22309: [SPARK-20384][SQL] Support value class in schema of Data...

2018-10-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22309
  
**[Test build #97232 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97232/testReport)**
 for PR 22309 at commit 
[`5613217`](https://github.com/apache/spark/commit/5613217771b1929b9f66106468fd2da2c3ea7dec).


---




[GitHub] spark issue #22309: [SPARK-20384][SQL] Support value class in schema of Data...

2018-10-10 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22309
  
ok to test


---




[GitHub] spark issue #22674: [SPARK-25680][SQL] SQL execution listener shouldn't happ...

2018-10-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22674
  
**[Test build #97231 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97231/testReport)**
 for PR 22674 at commit 
[`3ffa536`](https://github.com/apache/spark/commit/3ffa536f3c29f6655843a4d45c215393f51e23c9).


---




[GitHub] spark issue #22674: [SPARK-25680][SQL] SQL execution listener shouldn't happ...

2018-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22674
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22674: [SPARK-25680][SQL] SQL execution listener shouldn't happ...

2018-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22674
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3869/
Test PASSed.


---




[GitHub] spark issue #22309: [SPARK-20384][SQL] Support value class in schema of Data...

2018-10-10 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/22309
  
somehow I lost track of this PR.

ok to test


---




[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...

2018-10-10 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22309#discussion_r224300113
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/ScalaReflectionSuite.scala
 ---
@@ -108,6 +108,16 @@ object TestingUDT {
   }
 }
 
+object TestingValueClass {
+  case class IntWrapper(i: Int) extends AnyVal
--- End diff --

Does a value class have to be a case class?
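For reference (not part of the PR): a Scala value class only needs a single public `val` parameter and must extend `AnyVal` — being a case class is not required. A minimal sketch with illustrative names:

```scala
object ValueClassSketch {
  // Plain (non-case) value class: legal as long as it has exactly one
  // public val parameter and extends AnyVal.
  class PlainWrapper(val i: Int) extends AnyVal

  // Case class version, as used in the test suite: additionally gets
  // apply/equals/hashCode/toString for free.
  case class CaseWrapper(i: Int) extends AnyVal

  def main(args: Array[String]): Unit = {
    val p = new PlainWrapper(40)
    val c = CaseWrapper(2)
    // At runtime both typically erase to a plain Int (no allocation).
    println(p.i + c.i)  // prints 42
  }
}
```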


---




[GitHub] spark pull request #22661: [SPARK-25664][SQL][TEST] Refactor JoinBenchmark t...

2018-10-10 Thread wangyum
Github user wangyum commented on a diff in the pull request:

https://github.com/apache/spark/pull/22661#discussion_r224300031
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/JoinBenchmark.scala
 ---
@@ -19,229 +19,165 @@ package org.apache.spark.sql.execution.benchmark
 
 import org.apache.spark.sql.execution.joins._
 import org.apache.spark.sql.functions._
+import org.apache.spark.sql.internal.SQLConf
 import org.apache.spark.sql.types.IntegerType
 
 /**
  * Benchmark to measure performance for aggregate primitives.
- * To run this:
- *  build/sbt "sql/test-only *benchmark.JoinBenchmark"
- *
- * Benchmarks in this file are skipped in normal builds.
+ * To run this benchmark:
+ * {{{
+ *   1. without sbt:
+ *  bin/spark-submit --class  --jars  

+ *   2. build/sbt "sql/test:runMain "
+ *   3. generate result:
+ *  SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain "
+ *  Results will be written to "benchmarks/JoinBenchmark-results.txt".
+ * }}}
  */
-class JoinBenchmark extends BenchmarkWithCodegen {
+object JoinBenchmark extends SqlBasedBenchmark {
 
-  ignore("broadcast hash join, long key") {
+  def broadcastHashJoinLongKey(): Unit = {
 val N = 20 << 20
 val M = 1 << 16
 
-val dim = broadcast(sparkSession.range(M).selectExpr("id as k", 
"cast(id as string) as v"))
-runBenchmark("Join w long", N) {
-  val df = sparkSession.range(N).join(dim, (col("id") % M) === 
col("k"))
+val dim = broadcast(spark.range(M).selectExpr("id as k", "cast(id as 
string) as v"))
+codegenBenchmark("Join w long", N) {
+  val df = spark.range(N).join(dim, (col("id") % M) === col("k"))
   
assert(df.queryExecution.sparkPlan.find(_.isInstanceOf[BroadcastHashJoinExec]).isDefined)
   df.count()
 }
-
-/*
-Java HotSpot(TM) 64-Bit Server VM 1.7.0_60-b19 on Mac OS X 10.9.5
-Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
-Join w long:Best/Avg Time(ms)Rate(M/s)   
Per Row(ns)   Relative
-
---
-Join w long codegen=false3002 / 3262  7.0  
   143.2   1.0X
-Join w long codegen=true  321 /  371 65.3  
15.3   9.3X
-*/
   }
 
-  ignore("broadcast hash join, long key with duplicates") {
+
+  def broadcastHashJoinLongKeyWithDuplicates(): Unit = {
 val N = 20 << 20
 val M = 1 << 16
 
-val dim = broadcast(sparkSession.range(M).selectExpr("id as k", 
"cast(id as string) as v"))
--- End diff --

Yes


---




[GitHub] spark issue #22674: [SPARK-25680][SQL] SQL execution listener shouldn't happ...

2018-10-10 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/22674
  
retest this please


---




[GitHub] spark issue #22692: [SPARK-25598][STREAMING][BUILD] Remove flume connector i...

2018-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22692
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97221/
Test PASSed.


---




[GitHub] spark issue #22692: [SPARK-25598][STREAMING][BUILD] Remove flume connector i...

2018-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22692
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22692: [SPARK-25598][STREAMING][BUILD] Remove flume connector i...

2018-10-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22692
  
**[Test build #97221 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97221/testReport)**
 for PR 22692 at commit 
[`4b39ac3`](https://github.com/apache/spark/commit/4b39ac3500d1ee6f8b3d93f4822c6e5f36e30e3b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19330: [SPARK-18134][SQL] Orderable MapType

2018-10-10 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/19330
  
Thanks!


---




[GitHub] spark issue #19330: [SPARK-18134][SQL] Orderable MapType

2018-10-10 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/19330
  
@maropu 
Thanks, and yes, I'm still here and can keep going if there is still interest in this PR. I will update it this weekend.


---




[GitHub] spark issue #22692: [SPARK-25598][STREAMING][BUILD] Remove flume connector i...

2018-10-10 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/22692
  
sounds reasonable, also cc @tdas @zsxwing @jose-torres 


---




[GitHub] spark pull request #22259: [SPARK-25044][SQL] (take 2) Address translation o...

2018-10-10 Thread maryannxue
Github user maryannxue commented on a diff in the pull request:

https://github.com/apache/spark/pull/22259#discussion_r224295469
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala
 ---
@@ -47,7 +48,8 @@ case class ScalaUDF(
 inputTypes: Seq[DataType] = Nil,
 udfName: Option[String] = None,
 nullable: Boolean = true,
-udfDeterministic: Boolean = true)
+udfDeterministic: Boolean = true,
+nullableTypes: Seq[Boolean] = Nil)
--- End diff --

Yes, the test should not pass after removing the `isInstanceOf[KnownNotNull]` condition from the `needsNullCheck` test (https://github.com/apache/spark/pull/22259/files#diff-57b3d87be744b7d79a9beacf8e5e5eb2L2160). The idea was to add a `KnownNotNull` node on top of the original node to mark it as null-checked, so the rule won't add redundant null checks even if it is accidentally applied again. I'm not sure of the exact reason you removed the `isInstanceOf[KnownNotNull]` condition in this PR, but I think it should be kept alongside the new nullable-type check.
After adding the `nullableTypes` parameter in the test, the issue can be reproduced:
```
test("SPARK-24891 Fix HandleNullInputsForUDF rule") {
  val a = testRelation.output(0)
  val func = (x: Int, y: Int) => x + y
  val udf1 = ScalaUDF(func, IntegerType, a :: a :: Nil, nullableTypes = false :: false :: Nil)
  val udf2 = ScalaUDF(func, IntegerType, a :: udf1 :: Nil, nullableTypes = false :: false :: Nil)
  val plan = Project(Alias(udf2, "")() :: Nil, testRelation)
  comparePlans(plan.analyze, plan.analyze.analyze)
}
```
BTW, I'm just curious: it looks like `nullableTypes` indicates the opposite of "nullable" as used in a schema. I would assume that when `nullableTypes` is `Seq(false)` the type is not nullable and we need not add the null check, and vice versa. Did I miss something here?


---




[GitHub] spark issue #22594: [SPARK-25674][SQL] If the records are incremented by mor...

2018-10-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22594
  
**[Test build #97230 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97230/testReport)**
 for PR 22594 at commit 
[`04eba30`](https://github.com/apache/spark/commit/04eba3019fa8e05b73823c91db48a50c544e8350).


---




[GitHub] spark issue #22594: [SPARK-25674][SQL] If the records are incremented by mor...

2018-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22594
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3868/
Test PASSed.


---




[GitHub] spark issue #22594: [SPARK-25674][SQL] If the records are incremented by mor...

2018-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22594
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21669: [SPARK-23257][K8S] Kerberos Support for Spark on K8S

2018-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21669
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21669: [SPARK-23257][K8S] Kerberos Support for Spark on K8S

2018-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21669
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97220/
Test PASSed.


---




[GitHub] spark issue #21669: [SPARK-23257][K8S] Kerberos Support for Spark on K8S

2018-10-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21669
  
**[Test build #97220 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97220/testReport)**
 for PR 21669 at commit 
[`dd95fca`](https://github.com/apache/spark/commit/dd95fcab754e71e9465f4e46818c3cef09e86c8b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22691: [SPARK-24109][CORE] Remove class SnappyOutputStreamWrapp...

2018-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22691
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97222/
Test FAILed.


---




[GitHub] spark issue #22691: [SPARK-24109][CORE] Remove class SnappyOutputStreamWrapp...

2018-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22691
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #22691: [SPARK-24109][CORE] Remove class SnappyOutputStreamWrapp...

2018-10-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22691
  
**[Test build #97222 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97222/testReport)**
 for PR 22691 at commit 
[`8850c7a`](https://github.com/apache/spark/commit/8850c7a7d563cf6bc46a84b7480b4d338d58b80f).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22688: [SPARK-25700][SQL] Creates ReadSupport in only Append Mo...

2018-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22688
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22688: [SPARK-25700][SQL] Creates ReadSupport in only Append Mo...

2018-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22688
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3867/
Test PASSed.


---




[GitHub] spark issue #22688: [SPARK-25700][SQL] Creates ReadSupport in only Append Mo...

2018-10-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22688
  
**[Test build #97229 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97229/testReport)**
 for PR 22688 at commit 
[`9377bc3`](https://github.com/apache/spark/commit/9377bc35050408512c28f47ca0535b66c4dfcaf8).


---




[GitHub] spark issue #22688: [SPARK-25700][SQL] Creates ReadSupport in only Append Mo...

2018-10-10 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22688
  
retest this please


---




[GitHub] spark issue #22664: [SPARK-25662][TEST] Refactor DataSourceReadBenchmark to ...

2018-10-10 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/22664
  
Hi, @peter-toth.
Could you review and merge https://github.com/peter-toth/spark/pull/1, which contains the results on an EC2 r3.xlarge?


---




[GitHub] spark issue #22688: [SPARK-25700][SQL] Creates ReadSupport in only Append Mo...

2018-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22688
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #22688: [SPARK-25700][SQL] Creates ReadSupport in only Append Mo...

2018-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22688
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97224/
Test FAILed.


---




[GitHub] spark issue #22688: [SPARK-25700][SQL] Creates ReadSupport in only Append Mo...

2018-10-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22688
  
**[Test build #97224 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97224/testReport)**
 for PR 22688 at commit 
[`9377bc3`](https://github.com/apache/spark/commit/9377bc35050408512c28f47ca0535b66c4dfcaf8).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class SchemaReadAttemptException(m: String) extends 
RuntimeException(m)`


---




[GitHub] spark issue #22689: [SPARK-25697][CORE]When zstd compression enabled, InProg...

2018-10-10 Thread shahidki31
Github user shahidki31 commented on the issue:

https://github.com/apache/spark/pull/22689
  
@srowen. Yes, we should read only from the finished zstd frames. When the listener tries to read from an unfinished frame, the zstd input reader throws an exception (unless we set continuous to true).

Currently it reads the finished frames, but then tries to read from the unfinished frame and throws an exception while loading the web UI. So the fix should be to not parse the unfinished frame and to load the UI from the finished frames only.

@vanzin has a good idea about the history server. Hi @vanzin, could you please give your input? Thanks
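For context, a hedged sketch of the workaround described above, assuming the zstd-jni `ZstdInputStream.setContinuous` API mentioned in this thread (this is NOT the actual Spark fix, just an illustration of reading the available complete frames from an in-progress log):

```scala
import java.io.{ByteArrayOutputStream, InputStream}
import com.github.luben.zstd.ZstdInputStream

object InProgressZstdReader {
  // Read whatever complete frame data an in-progress zstd stream
  // contains, without failing on the truncated final frame.
  def readAvailable(raw: InputStream): Array[Byte] = {
    val in = new ZstdInputStream(raw)
    in.setContinuous(true) // don't throw on an unfinished last frame
    val out = new ByteArrayOutputStream()
    val buf = new Array[Byte](8192)
    try {
      var n = in.read(buf)
      while (n > 0) {
        out.write(buf, 0, n) // bytes from finished frames only
        n = in.read(buf)
      }
    } finally in.close()
    out.toByteArray
  }
}
```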


---




[GitHub] spark pull request #22594: [SPARK-25674][SQL] If the records are incremented...

2018-10-10 Thread 10110346
Github user 10110346 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22594#discussion_r224286853
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileScanRDD.scala
 ---
@@ -70,6 +70,8 @@ class FileScanRDD(
 
   private val ignoreCorruptFiles = 
sparkSession.sessionState.conf.ignoreCorruptFiles
   private val ignoreMissingFiles = 
sparkSession.sessionState.conf.ignoreMissingFiles
+  // only for test
+  private val inputMetricsTest = 
sparkSession.sessionState.conf.contains("spark.inputmetrics.test")
--- End diff --

If this place is controlled by `spark.testing`, other unit tests may fail.
Yeah, I agree with you: this is a simple change, so it is better to drop this.
Thanks @srowen


---




[GitHub] spark issue #22612: [SPARK-24958] Add executors' process tree total memory i...

2018-10-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22612
  
**[Test build #97228 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97228/testReport)**
 for PR 22612 at commit 
[`067b81d`](https://github.com/apache/spark/commit/067b81d24de7999afe5b9660e89d9a2e41de6d21).


---




[GitHub] spark issue #22612: [SPARK-24958] Add executors' process tree total memory i...

2018-10-10 Thread rezasafi
Github user rezasafi commented on the issue:

https://github.com/apache/spark/pull/22612
  
Looking at the logs, the failure doesn't seem to be related to this change; it is in HiveVersionSuite.


---




[GitHub] spark issue #22664: [SPARK-25662][TEST] Refactor DataSourceReadBenchmark to ...

2018-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22664
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97227/
Test FAILed.


---




[GitHub] spark issue #22664: [SPARK-25662][TEST] Refactor DataSourceReadBenchmark to ...

2018-10-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22664
  
**[Test build #97227 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97227/testReport)**
 for PR 22664 at commit 
[`cf61f1c`](https://github.com/apache/spark/commit/cf61f1c4df40b681f2db8cf233b8fbc0df88598b).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22664: [SPARK-25662][TEST] Refactor DataSourceReadBenchmark to ...

2018-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22664
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #22664: [SPARK-25662][TEST] Refactor DataSourceReadBenchmark to ...

2018-10-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22664
  
**[Test build #97227 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97227/testReport)**
 for PR 22664 at commit 
[`cf61f1c`](https://github.com/apache/spark/commit/cf61f1c4df40b681f2db8cf233b8fbc0df88598b).


---




[GitHub] spark issue #22664: [SPARK-25662][TEST] Refactor DataSourceReadBenchmark to ...

2018-10-10 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/22664
  
ok to test


---




[GitHub] spark pull request #22664: [SPARK-25662][TEST] Refactor DataSourceReadBenchm...

2018-10-10 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22664#discussion_r224272542
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/DataSourceReadBenchmark.scala
 ---
@@ -34,10 +34,15 @@ import org.apache.spark.sql.vectorized.ColumnVector
 
 /**
  * Benchmark to measure data source read performance.
- * To run this:
- *  spark-submit --class  
+ * To run this benchmark:
+ * {{{
+ *   1. without sbt: bin/spark-submit --class  --jars , 
--- End diff --

Could you run `dev/scalastyle` and fix this in your branch?


---




[GitHub] spark issue #22690: [SPARK-19287][CORE][STREAMING] JavaPairRDD flatMapValues...

2018-10-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22690
  
**[Test build #97226 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97226/testReport)**
 for PR 22690 at commit 
[`a35b54f`](https://github.com/apache/spark/commit/a35b54fbb000665a87998c14ed940316d45d).


---




[GitHub] spark issue #22690: [SPARK-19287][CORE][STREAMING] JavaPairRDD flatMapValues...

2018-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22690
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22690: [SPARK-19287][CORE][STREAMING] JavaPairRDD flatMapValues...

2018-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22690
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3866/
Test PASSed.


---




[GitHub] spark pull request #20761: [SPARK-20327][CORE][YARN] Add CLI support for YAR...

2018-10-10 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20761#discussion_r224272133
  
--- Diff: 
resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/ResourceRequestTestHelper.scala
 ---
@@ -0,0 +1,86 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.yarn
+
+import scala.collection.JavaConverters._
+import scala.collection.mutable.ListBuffer
+
+import org.apache.hadoop.yarn.api.records.Resource
+
+import org.apache.spark.util.Utils
+
+object ResourceRequestTestHelper {
+  def initializeResourceTypes(resourceTypes: List[String]): Unit = {
+if (!ResourceRequestHelper.isYarnResourceTypesAvailable()) {
+  throw new IllegalStateException("This method should not be invoked " 
+
+"since YARN resource types is not available because of old Hadoop 
version!" )
+}
+
+val allResourceTypes = new ListBuffer[AnyRef]
+val defaultResourceTypes = List(
+  createResourceTypeInfo("memory-mb"),
+  createResourceTypeInfo("vcores"))
+val customResourceTypes = resourceTypes.map(rt => 
createResourceTypeInfo(rt))
--- End diff --

`.map { rt => ... }`, or `.map(createResourceTypeInfo)`
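The suggested forms are equivalent ways of avoiding the redundant lambda; a tiny illustrative sketch (the `wrap` helper is hypothetical, standing in for `createResourceTypeInfo`):

```scala
object MapStyleSketch {
  // Hypothetical stand-in for createResourceTypeInfo.
  def wrap(rt: String): String = s"type($rt)"

  def main(args: Array[String]): Unit = {
    val types = List("gpu", "fpga")
    // Three equivalent spellings; the last two are preferred over the
    // redundant lambda `.map(rt => wrap(rt))`:
    val a = types.map(rt => wrap(rt))
    val b = types.map { rt => wrap(rt) }
    val c = types.map(wrap) // eta-expansion of the method
    println(a == b && b == c) // prints true
  }
}
```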


---




[GitHub] spark pull request #20761: [SPARK-20327][CORE][YARN] Add CLI support for YAR...

2018-10-10 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20761#discussion_r224271908
  
--- Diff: 
resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/ResourceRequestHelperSuite.scala
 ---
@@ -0,0 +1,200 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.yarn
+
+import org.apache.hadoop.yarn.api.records.Resource
+import org.apache.hadoop.yarn.util.Records
+import org.scalatest.Matchers
+
+import org.apache.spark.{SparkConf, SparkException, SparkFunSuite}
+import 
org.apache.spark.deploy.yarn.ResourceRequestTestHelper.ResourceInformation
+import org.apache.spark.deploy.yarn.config._
+import org.apache.spark.internal.config.{DRIVER_MEMORY, EXECUTOR_MEMORY}
+
+class ResourceRequestHelperSuite extends SparkFunSuite with Matchers {
+
+  private val CUSTOM_RES_1 = "custom-resource-type-1"
+  private val CUSTOM_RES_2 = "custom-resource-type-2"
+  private val MEMORY = "memory"
+  private val CORES = "cores"
+  private val NEW_CONFIG_EXECUTOR_MEMORY = YARN_EXECUTOR_RESOURCE_TYPES_PREFIX + MEMORY
+  private val NEW_CONFIG_EXECUTOR_CORES = YARN_EXECUTOR_RESOURCE_TYPES_PREFIX + CORES
+  private val NEW_CONFIG_AM_MEMORY = YARN_AM_RESOURCE_TYPES_PREFIX + MEMORY
+  private val NEW_CONFIG_AM_CORES = YARN_AM_RESOURCE_TYPES_PREFIX + CORES
+  private val NEW_CONFIG_DRIVER_MEMORY = YARN_DRIVER_RESOURCE_TYPES_PREFIX + MEMORY
+  private val NEW_CONFIG_DRIVER_CORES = YARN_DRIVER_RESOURCE_TYPES_PREFIX + CORES
+
+  test("resource request value does not match pattern") {
+    verifySetResourceRequestsException(List(CUSTOM_RES_1),
+      Map(CUSTOM_RES_1 -> "**@#"), CUSTOM_RES_1)
+  }
+
+  test("resource request just unit defined") {
+    verifySetResourceRequestsException(List(), Map(CUSTOM_RES_1 -> "m"), CUSTOM_RES_1)
+  }
+
+  test("resource request with null value should not be allowed") {
+    verifySetResourceRequestsException(List(), null, Map(CUSTOM_RES_1 -> "123"),
+      "requirement failed: Resource parameter should not be null!")
+  }
+
+  test("resource request with valid value and invalid unit") {
+    verifySetResourceRequestsException(List(CUSTOM_RES_1), createResource,
+      Map(CUSTOM_RES_1 -> "123ppp"), "")
+  }
+
+  test("resource request with valid value and without unit") {
+    verifySetResourceRequestsSuccessful(List(CUSTOM_RES_1), Map(CUSTOM_RES_1 -> "123"),
+      Map(CUSTOM_RES_1 -> ResourceInformation(CUSTOM_RES_1, 123, "")))
+  }
+
+  test("resource request with valid value and unit") {
+    verifySetResourceRequestsSuccessful(List(CUSTOM_RES_1), Map(CUSTOM_RES_1 -> "2g"),
+      Map(CUSTOM_RES_1 -> ResourceInformation(CUSTOM_RES_1, 2, "G")))
+  }
+
+  test("two resource requests with valid values and units") {
+    verifySetResourceRequestsSuccessful(List(CUSTOM_RES_1, CUSTOM_RES_2),
+      Map(CUSTOM_RES_1 -> "123m", CUSTOM_RES_2 -> "10G"),
+      Map(CUSTOM_RES_1 -> ResourceInformation(CUSTOM_RES_1, 123, "m"),
+        CUSTOM_RES_2 -> ResourceInformation(CUSTOM_RES_2, 10, "G")))
+  }
+
+  test("empty SparkConf should be valid") {
+    val sparkConf = new SparkConf()
+    ResourceRequestHelper.validateResources(sparkConf)
+  }
+
+  test("just normal resources are defined") {
+    val sparkConf = new SparkConf()
+    sparkConf.set(DRIVER_MEMORY.key, "3G")
+    sparkConf.set(DRIVER_CORES.key, "4")
+    sparkConf.set(EXECUTOR_MEMORY.key, "4G")
+    sparkConf.set(EXECUTOR_CORES.key, "2")
+    ResourceRequestHelper.validateResources(sparkConf)
+  }
+
+  test("memory defined with new config for executor") {
+    val sparkConf = new SparkConf()
+    sparkConf.set(NEW_CONFIG_EXECUTOR_MEMORY, "30G")
+    verifyValidateResourcesException(sparkConf, NEW_CONFIG_EXECUTOR_MEMORY)
+  }
+
+  test("cores defined with new config for executor") {
+    val sparkConf = 

[GitHub] spark pull request #20761: [SPARK-20327][CORE][YARN] Add CLI support for YAR...

2018-10-10 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20761#discussion_r224270997
  
--- Diff: 
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ResourceRequestHelper.scala
 ---
@@ -0,0 +1,140 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.yarn
+
+import java.lang.{Long => JLong}
+import java.lang.reflect.InvocationTargetException
+
+import scala.collection.mutable
+import scala.util.Try
+
+import org.apache.hadoop.yarn.api.records.Resource
+
+import org.apache.spark.{SparkConf, SparkException}
+import org.apache.spark.deploy.yarn.config._
+import org.apache.spark.internal.Logging
+import org.apache.spark.internal.config._
+import org.apache.spark.util.Utils
+
+/**
+ * This helper class uses some Hadoop 3 methods from the YARN API,
+ * so we need to use reflection to avoid compile errors when building against Hadoop 2.x.
+ */
+private object ResourceRequestHelper extends Logging {
+  private val AMOUNT_AND_UNIT_REGEX = "([0-9]+)([A-Za-z]*)".r
+  private val RESOURCE_INFO_CLASS = "org.apache.hadoop.yarn.api.records.ResourceInformation"
+
+  /**
+   * Validates sparkConf and throws a SparkException if any of the standard resources
+   * (memory or cores) is defined with the property spark.yarn.x.resource.y
+   */
+  def validateResources(sparkConf: SparkConf): Unit = {
+    val resourceDefinitions = Seq[(String, String)](
+      (AM_MEMORY.key, YARN_AM_RESOURCE_TYPES_PREFIX + "memory"),
+      (AM_CORES.key, YARN_AM_RESOURCE_TYPES_PREFIX + "cores"),
+      (DRIVER_MEMORY.key, YARN_DRIVER_RESOURCE_TYPES_PREFIX + "memory"),
+      (DRIVER_CORES.key, YARN_DRIVER_RESOURCE_TYPES_PREFIX + "cores"),
+      (EXECUTOR_MEMORY.key, YARN_EXECUTOR_RESOURCE_TYPES_PREFIX + "memory"),
+      (EXECUTOR_CORES.key, YARN_EXECUTOR_RESOURCE_TYPES_PREFIX + "cores"))
+    val errorMessage = new mutable.StringBuilder()
+
+    resourceDefinitions.foreach { case (sparkName, resourceRequest) =>
+      if (sparkConf.contains(resourceRequest)) {
+        errorMessage.append(s"Error: Do not use $resourceRequest, " +
+          s"please use $sparkName instead!\n")
+      }
+    }
+
+    if (errorMessage.nonEmpty) {
+      throw new SparkException(errorMessage.toString())
+    }
+  }
+
+  /**
+   * Sets resource amount with the corresponding unit to the passed resource object.
+   * @param resources resource values to set
+   * @param resource resource object to update
+   */
+  def setResourceRequests(
+      resources: Map[String, String],
+      resource: Resource): Unit = {
+    require(resource != null, "Resource parameter should not be null!")
+
+    logDebug(s"Custom resources requested: $resources")
+    if (!isYarnResourceTypesAvailable()) {
+      if (resources.nonEmpty) {
+        logWarning("Ignoring custom resource requests because " +
+          "the version of YARN does not support it!")
+      }
+      return
+    }
+
+    val resInfoClass = Utils.classForName(RESOURCE_INFO_CLASS)
+    val setResourceInformationMethod =
+      resource.getClass.getMethod("setResourceInformation", classOf[String], resInfoClass)
+    resources.foreach { case (name, rawAmount) =>
+      try {
+        val AMOUNT_AND_UNIT_REGEX(amountPart, unitPart) = rawAmount
+        val amount = amountPart.toLong
+        val unit = unitPart match {
+          case "g" => "G"
+          case "t" => "T"
+          case "p" => "P"
+          case _ => unitPart
+        }
+        logDebug(s"Registering resource with name: $name, amount: $amount, unit: $unit")
+        val resourceInformation = createResourceInformation(
+          name, amount, unit, resInfoClass)
--- End diff --

Fits in previous line.


---
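The amount-and-unit parsing quoted above can be exercised in isolation. Below is a minimal, self-contained sketch of just that step, assuming nothing beyond the Scala standard library; the object name and error message are illustrative, not from the PR:

```scala
object ResourceValueParser {
  // Same pattern as the quoted AMOUNT_AND_UNIT_REGEX: digits, then an optional unit.
  private val AmountAndUnit = "([0-9]+)([A-Za-z]*)".r

  // Splits a raw value like "2g" into (2, "G"), mirroring the
  // lower-case g/t/p normalization in the quoted diff.
  def parse(rawAmount: String): (Long, String) = rawAmount match {
    case AmountAndUnit(amountPart, unitPart) =>
      val unit = unitPart match {
        case "g" => "G"
        case "t" => "T"
        case "p" => "P"
        case _ => unitPart
      }
      (amountPart.toLong, unit)
    case _ =>
      // Values like "m" (unit only) or "**@#" fail the pattern match.
      throw new IllegalArgumentException(s"'$rawAmount' does not match the resource pattern")
  }
}
```

Note that a value like `123ppp` still parses at this stage, consistent with the test suite above: rejecting unknown units is left to YARN's `ResourceInformation`.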


[GitHub] spark pull request #20761: [SPARK-20327][CORE][YARN] Add CLI support for YAR...

2018-10-10 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20761#discussion_r224271778
  
--- Diff: 
resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/ResourceRequestHelperSuite.scala
 ---

[GitHub] spark pull request #20761: [SPARK-20327][CORE][YARN] Add CLI support for YAR...

2018-10-10 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20761#discussion_r224270816
  
--- Diff: 
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ResourceRequestHelper.scala
 ---
@@ -0,0 +1,140 @@
+
+package org.apache.spark.deploy.yarn
+
+import java.lang.{Long => JLong}
+import java.lang.reflect.InvocationTargetException
+
+import scala.collection.mutable
+import scala.util.Try
+
+import org.apache.hadoop.yarn.api.records.Resource
+
+import org.apache.spark.{SparkConf, SparkException}
+import org.apache.spark.deploy.yarn.config._
+import org.apache.spark.internal.Logging
+import org.apache.spark.internal.config._
+import org.apache.spark.util.Utils
+
+/**
+ * This helper class uses some Hadoop 3 methods from the YARN API,
+ * so we need to use reflection to avoid compile errors when building against Hadoop 2.x.
+ */
+private object ResourceRequestHelper extends Logging {
+  private val AMOUNT_AND_UNIT_REGEX = "([0-9]+)([A-Za-z]*)".r
+  private val RESOURCE_INFO_CLASS = "org.apache.hadoop.yarn.api.records.ResourceInformation"
+
+  /**
+   * Validates sparkConf and throws a SparkException if any of the standard resources
+   * (memory or cores) is defined with the property spark.yarn.x.resource.y
+   */
+  def validateResources(sparkConf: SparkConf): Unit = {
+    val resourceDefinitions = Seq[(String, String)](
+      (AM_MEMORY.key, YARN_AM_RESOURCE_TYPES_PREFIX + "memory"),
--- End diff --

I went and looked at the documentation because I remember this being 
confusing. The documentation mentions both `memory` and `memory-mb` as being 
valid, with the latter being preferred. So it sounds to me like you can use 
either, and that this code should disallow both.

You even initialize `memory-mb` in your tests, instead of `memory`.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
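The point about `memory` vs. `memory-mb` suggests the disallow-list should cover every documented alias of a standard resource, not just one spelling. A hedged sketch of that idea follows; the prefix strings and the alias list are hard-coded for illustration (the real code would build them from the config constants such as `YARN_EXECUTOR_RESOURCE_TYPES_PREFIX`):

```scala
object StandardResourceCheckSketch {
  // Hypothetical: hard-coded stand-ins for the spark.yarn.*.resource. prefixes.
  private val prefixes = Seq(
    "spark.yarn.am.resource.",
    "spark.yarn.driver.resource.",
    "spark.yarn.executor.resource.")

  // Both documented spellings of each standard resource are disallowed,
  // per the review comment ("memory-mb" being the preferred YARN name).
  private val standardResources = Seq("memory", "memory-mb", "cores", "vcores")

  // Returns every user-supplied key that names a standard resource.
  def disallowedKeys(conf: Map[String, String]): Seq[String] =
    for {
      prefix <- prefixes
      res <- standardResources
      key = prefix + res
      if conf.contains(key)
    } yield key
}
```

With this shape, setting `spark.yarn.executor.resource.memory-mb` would be flagged just like `spark.yarn.executor.resource.memory`, while a genuinely custom resource passes through.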



[GitHub] spark pull request #20761: [SPARK-20327][CORE][YARN] Add CLI support for YAR...

2018-10-10 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20761#discussion_r224271821
  
--- Diff: 
resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/ResourceRequestHelperSuite.scala
 ---

[GitHub] spark pull request #20761: [SPARK-20327][CORE][YARN] Add CLI support for YAR...

2018-10-10 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20761#discussion_r224271845
  
--- Diff: 
resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/ResourceRequestHelperSuite.scala
 ---

[GitHub] spark issue #22690: [SPARK-19287][CORE][STREAMING] JavaPairRDD flatMapValues...

2018-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22690
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22690: [SPARK-19287][CORE][STREAMING] JavaPairRDD flatMapValues...

2018-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22690
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97225/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22690: [SPARK-19287][CORE][STREAMING] JavaPairRDD flatMapValues...

2018-10-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22690
  
**[Test build #97225 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97225/testReport)** for PR 22690 at commit [`6c6c1a3`](https://github.com/apache/spark/commit/6c6c1a3ab141353d1867a7d8ab9145da77048980).
 * This patch **fails MiMa tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22661: [SPARK-25664][SQL][TEST] Refactor JoinBenchmark t...

2018-10-10 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22661#discussion_r224270755
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/JoinBenchmark.scala
 ---
@@ -19,229 +19,165 @@ package org.apache.spark.sql.execution.benchmark
 
 import org.apache.spark.sql.execution.joins._
 import org.apache.spark.sql.functions._
+import org.apache.spark.sql.internal.SQLConf
 import org.apache.spark.sql.types.IntegerType
 
 /**
  * Benchmark to measure performance for aggregate primitives.
- * To run this:
- *  build/sbt "sql/test-only *benchmark.JoinBenchmark"
- *
- * Benchmarks in this file are skipped in normal builds.
+ * To run this benchmark:
+ * {{{
+ *   1. without sbt:
+ *      bin/spark-submit --class <this class> --jars <spark core test jar>
+ *   2. build/sbt "sql/test:runMain <this class>"
+ *   3. generate result:
+ *      SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain <this class>"
+ *      Results will be written to "benchmarks/JoinBenchmark-results.txt".
+ * }}}
  */
-class JoinBenchmark extends BenchmarkWithCodegen {
+object JoinBenchmark extends SqlBasedBenchmark {
 
-  ignore("broadcast hash join, long key") {
+  def broadcastHashJoinLongKey(): Unit = {
     val N = 20 << 20
     val M = 1 << 16
 
-    val dim = broadcast(sparkSession.range(M).selectExpr("id as k", "cast(id as string) as v"))
-    runBenchmark("Join w long", N) {
-      val df = sparkSession.range(N).join(dim, (col("id") % M) === col("k"))
+    val dim = broadcast(spark.range(M).selectExpr("id as k", "cast(id as string) as v"))
+    codegenBenchmark("Join w long", N) {
+      val df = spark.range(N).join(dim, (col("id") % M) === col("k"))
       assert(df.queryExecution.sparkPlan.find(_.isInstanceOf[BroadcastHashJoinExec]).isDefined)
       df.count()
     }
-
-    /*
-    Java HotSpot(TM) 64-Bit Server VM 1.7.0_60-b19 on Mac OS X 10.9.5
-    Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
-    Join w long:                 Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
-    ------------------------------------------------------------------------------------
-    Join w long codegen=false          3002 / 3262          7.0         143.2       1.0X
-    Join w long codegen=true            321 /  371         65.3          15.3       9.3X
-    */
   }
 
-  ignore("broadcast hash join, long key with duplicates") {
+
+  def broadcastHashJoinLongKeyWithDuplicates(): Unit = {
     val N = 20 << 20
     val M = 1 << 16
 
-    val dim = broadcast(sparkSession.range(M).selectExpr("id as k", "cast(id as string) as v"))
-    runBenchmark("Join w long duplicated", N) {
-      val dim = broadcast(sparkSession.range(M).selectExpr("cast(id/10 as long) as k"))
-      val df = sparkSession.range(N).join(dim, (col("id") % M) === col("k"))
+    codegenBenchmark("Join w long duplicated", N) {
+      val dim = broadcast(spark.range(M).selectExpr("cast(id/10 as long) as k"))
--- End diff --

According to another benchmark case in this file, `broadcast` seems to be put outside of `codegenBenchmark`. What do you think about this?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
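The question about hoisting `broadcast` outside `codegenBenchmark` is the usual concern of keeping expensive setup out of the timed region. A tiny stand-alone illustration of the principle; the `timed` helper is invented for this sketch and is not part of Spark's benchmark API:

```scala
object TimingSketch {
  // Measures only the by-name body; anything built before the call
  // (like a broadcast dimension table) is excluded from the timing.
  def timed[T](body: => T): (T, Long) = {
    val start = System.nanoTime()
    val result = body
    (result, System.nanoTime() - start)
  }
}
```

Used like `val (hits, nanos) = TimingSketch.timed { probe(lookup) }`, with `lookup` constructed beforehand, the measurement reflects only the probe work, which is the effect of moving `broadcast` out of the benchmarked closure.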


