[GitHub] spark issue #22666: [SPARK-25672][SQL] schema_of_csv() - schema inference fr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22666 **[Test build #97594 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97594/testReport)** for PR 22666 at commit [`1e90261`](https://github.com/apache/spark/commit/1e90261f964129efc605ed77433477715078745c). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22761: [MINOR][DOC] Spacing items in migration guide for readab...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22761 Merged to master and branch-2.4.
[GitHub] spark issue #22666: [SPARK-25672][SQL] schema_of_csv() - schema inference fr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22666 **[Test build #97593 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97593/testReport)** for PR 22666 at commit [`aead783`](https://github.com/apache/spark/commit/aead783d895069b1b6781928eb0afda740085a21).
[GitHub] spark issue #22766: [SPARK-25768][SQL] fix constant argument expecting UDAFs
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22766 **[Test build #97592 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97592/testReport)** for PR 22766 at commit [`6e6eca4`](https://github.com/apache/spark/commit/6e6eca400ba4040d5001e26b20d7815ed2a0c2f4).
[GitHub] spark issue #22773: [MINOR][SQL] Add prettyNames for from_json, to_json, fro...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22773 **[Test build #97591 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97591/testReport)** for PR 22773 at commit [`3447e73`](https://github.com/apache/spark/commit/3447e73989e39d0c052cf69e5a3e80d1ebb221dc).
[GitHub] spark issue #22773: [MINOR][SQL] Add prettyNames for from_json, to_json, fro...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22773 Merged build finished. Test PASSed.
[GitHub] spark issue #22773: [MINOR][SQL] Add prettyNames for from_json, to_json, fro...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22773 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4117/ Test PASSed.
[GitHub] spark pull request #22773: [MINOR][SQL] Add prettyNames for from_json, to_js...
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/22773

[MINOR][SQL] Add prettyNames for from_json, to_json, from_csv, and schema_of_json

## What changes were proposed in this pull request?

This PR adds `prettyNames` for `from_json`, `to_json`, `from_csv`, and `schema_of_json` so that appropriate names are used.

## How was this patch tested?

Unit tests

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark minor-prettyNames

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22773.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #22773

commit 3447e73989e39d0c052cf69e5a3e80d1ebb221dc
Author: hyukjinkwon
Date: 2018-10-19T05:28:55Z

    Add prettyNames for from_json, to_json, from_csv, and schema_of_json
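[Editorial note] The change above is small enough to sketch outside Spark. The following is a hedged, minimal illustration (a hypothetical simplified `Expression` trait, not Spark's actual class hierarchy) of why a `prettyName` override matters: by default an expression's user-facing name is derived from its class name, so the class implementing `to_json` (named `StructsToJson` in Spark) would otherwise surface under its class name in schemas and error messages.

```scala
// Minimal sketch, assuming a simplified Expression trait (hypothetical,
// not Spark's real API): the default user-facing name comes from the
// implementing class name.
trait Expression {
  def prettyName: String = getClass.getSimpleName.toLowerCase
}

// to_json is implemented by a class named StructsToJson; overriding
// prettyName makes the SQL function name appear instead of the class name.
class StructsToJson extends Expression {
  override def prettyName: String = "to_json"
}

println(new StructsToJson().prettyName)  // prints "to_json"
```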
[GitHub] spark issue #22263: [SPARK-25269][SQL] SQL interface support specify Storage...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22263 **[Test build #97590 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97590/testReport)** for PR 22263 at commit [`e2b5dcf`](https://github.com/apache/spark/commit/e2b5dcfc853b6f5608b27c914c397689d09cb267).
[GitHub] spark issue #22263: [SPARK-25269][SQL] SQL interface support specify Storage...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22263 Merged build finished. Test PASSed.
[GitHub] spark issue #22263: [SPARK-25269][SQL] SQL interface support specify Storage...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22263 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4116/ Test PASSed.
[GitHub] spark pull request #22466: [SPARK-25464][SQL] Create Database to the locatio...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22466#discussion_r226536773

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala ---
@@ -840,12 +840,19 @@ abstract class DDLSuite extends QueryTest with SQLTestUtils {
   }

   test("create table in default db") {
-    val catalog = spark.sessionState.catalog
-    val tableIdent1 = TableIdentifier("tab1", None)
-    createTable(catalog, tableIdent1)
-    val expectedTableIdent = tableIdent1.copy(database = Some("default"))
-    val expectedTable = generateTable(catalog, expectedTableIdent)
-    checkCatalogTables(expectedTable, catalog.getTableMetadata(tableIdent1))
+    var tablePath: Option[URI] = None
--- End diff --

`var tablePath: URI = null`
[GitHub] spark pull request #22466: [SPARK-25464][SQL] Create Database to the locatio...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22466#discussion_r226536735

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala ---
@@ -207,6 +207,14 @@ class SessionCatalog(
         "you cannot create a database with this name.")
     }
     validateName(dbName)
+    // SPARK-25464 fail if DB location exists and is not empty
+    val dbPath = new Path(dbDefinition.locationUri)
+    val fs = dbPath.getFileSystem(hadoopConf)
+    if (!externalCatalog.databaseExists(dbName) && fs.exists(dbPath)
+      && fs.listStatus(dbPath).nonEmpty) {
--- End diff --

Should we necessarily list up files? it's potentially expensive.
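[Editorial note] The cost concern raised above can be sketched self-containedly with the JDK's file API (used here instead of Hadoop's `FileSystem`, purely for illustration): deciding whether a directory is non-empty does not require materializing the full listing, since a lazy stream can stop after the first entry. Hadoop's `FileSystem` exposes iterator-based listings (e.g. `listStatusIterator`) that could serve a similar purpose.

```scala
import java.nio.file.{Files, Path}

// Sketch, assuming plain local files: Files.list is lazy, so a
// non-emptiness check touches at most one directory entry instead of
// building the whole listing (the concern with listStatus above).
def isNonEmptyDir(dir: Path): Boolean = {
  val stream = Files.list(dir)
  try stream.findFirst().isPresent  // stops after the first entry
  finally stream.close()            // always release the directory handle
}

val dbLocation = Files.createTempDirectory("db_location")
println(isNonEmptyDir(dbLocation))                   // prints "false"
Files.createFile(dbLocation.resolve("part-00000"))
println(isNonEmptyDir(dbLocation))                   // prints "true"
```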
[GitHub] spark pull request #22466: [SPARK-25464][SQL] Create Database to the locatio...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/22466#discussion_r226536626

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala ---
@@ -840,12 +840,19 @@ abstract class DDLSuite extends QueryTest with SQLTestUtils {
   }

   test("create table in default db") {
-    val catalog = spark.sessionState.catalog
-    val tableIdent1 = TableIdentifier("tab1", None)
-    createTable(catalog, tableIdent1)
-    val expectedTableIdent = tableIdent1.copy(database = Some("default"))
-    val expectedTable = generateTable(catalog, expectedTableIdent)
-    checkCatalogTables(expectedTable, catalog.getTableMetadata(tableIdent1))
+    var tablePath: Option[URI] = None
+    try {
+      val catalog = spark.sessionState.catalog
+      val tableIdent1 = TableIdentifier("tab1", None)
+      createTable(catalog, tableIdent1)
+      val expectedTableIdent = tableIdent1.copy(database = Some("default"))
+      val expectedTable = generateTable(catalog, expectedTableIdent)
+      tablePath = Some(expectedTable.location)
+      checkCatalogTables(expectedTable, catalog.getTableMetadata(tableIdent1))
+    } finally {
+      // This is external table,so it is required to deleted the path
--- End diff --

@HyukjinKwon The first one is `e,s` -> `e, s` ?
[GitHub] spark pull request #22466: [SPARK-25464][SQL] Create Database to the locatio...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22466#discussion_r226536585

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala ---
@@ -2370,4 +2370,17 @@ class HiveDDLSuite
     ))
   }
 }
+
+  test("SPARK-25464 create Database with non empty location") {
+    val dbName = "dbwithcustomlocation"
+    withTempDir { tmpDir =>
+      val parentDir = tmpDir.getParent
+      val expectedMsg = s"Cannot create database at location $parentDir because the path is not " +
+        s"empty."
--- End diff --

leading `s` can be removed.
[GitHub] spark pull request #22466: [SPARK-25464][SQL] Create Database to the locatio...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22466#discussion_r226536555

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala ---
@@ -2370,4 +2370,17 @@ class HiveDDLSuite
     ))
   }
 }
+
+  test("SPARK-25464 create Database with non empty location") {
--- End diff --

`create a database with a non-empty location`
[GitHub] spark issue #22466: [SPARK-25464][SQL] Create Database to the location,only ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22466 **[Test build #97589 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97589/testReport)** for PR 22466 at commit [`d862591`](https://github.com/apache/spark/commit/d862591c20ef1d1536d069dcc4f3220ae232c702).
[GitHub] spark pull request #22466: [SPARK-25464][SQL] Create Database to the locatio...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22466#discussion_r226536442

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala ---
@@ -840,12 +840,19 @@ abstract class DDLSuite extends QueryTest with SQLTestUtils {
   }

   test("create table in default db") {
-    val catalog = spark.sessionState.catalog
-    val tableIdent1 = TableIdentifier("tab1", None)
-    createTable(catalog, tableIdent1)
-    val expectedTableIdent = tableIdent1.copy(database = Some("default"))
-    val expectedTable = generateTable(catalog, expectedTableIdent)
-    checkCatalogTables(expectedTable, catalog.getTableMetadata(tableIdent1))
+    var tablePath: Option[URI] = None
+    try {
+      val catalog = spark.sessionState.catalog
+      val tableIdent1 = TableIdentifier("tab1", None)
+      createTable(catalog, tableIdent1)
+      val expectedTableIdent = tableIdent1.copy(database = Some("default"))
+      val expectedTable = generateTable(catalog, expectedTableIdent)
+      tablePath = Some(expectedTable.location)
+      checkCatalogTables(expectedTable, catalog.getTableMetadata(tableIdent1))
+    } finally {
+      // This is external table,so it is required to deleted the path
--- End diff --

`this is an external table`
[GitHub] spark pull request #22466: [SPARK-25464][SQL] Create Database to the locatio...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22466#discussion_r226536456

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala ---
@@ -840,12 +840,19 @@ abstract class DDLSuite extends QueryTest with SQLTestUtils {
   }

   test("create table in default db") {
-    val catalog = spark.sessionState.catalog
-    val tableIdent1 = TableIdentifier("tab1", None)
-    createTable(catalog, tableIdent1)
-    val expectedTableIdent = tableIdent1.copy(database = Some("default"))
-    val expectedTable = generateTable(catalog, expectedTableIdent)
-    checkCatalogTables(expectedTable, catalog.getTableMetadata(tableIdent1))
+    var tablePath: Option[URI] = None
+    try {
+      val catalog = spark.sessionState.catalog
+      val tableIdent1 = TableIdentifier("tab1", None)
+      createTable(catalog, tableIdent1)
+      val expectedTableIdent = tableIdent1.copy(database = Some("default"))
+      val expectedTable = generateTable(catalog, expectedTableIdent)
+      tablePath = Some(expectedTable.location)
+      checkCatalogTables(expectedTable, catalog.getTableMetadata(tableIdent1))
+    } finally {
+      // This is external table,so it is required to deleted the path
--- End diff --

`it is required to delete`
[GitHub] spark pull request #22466: [SPARK-25464][SQL] Create Database to the locatio...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22466#discussion_r226536304

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala ---
@@ -840,12 +840,19 @@ abstract class DDLSuite extends QueryTest with SQLTestUtils {
   }

   test("create table in default db") {
-    val catalog = spark.sessionState.catalog
-    val tableIdent1 = TableIdentifier("tab1", None)
-    createTable(catalog, tableIdent1)
-    val expectedTableIdent = tableIdent1.copy(database = Some("default"))
-    val expectedTable = generateTable(catalog, expectedTableIdent)
-    checkCatalogTables(expectedTable, catalog.getTableMetadata(tableIdent1))
+    var tablePath: Option[URI] = None
+    try {
+      val catalog = spark.sessionState.catalog
+      val tableIdent1 = TableIdentifier("tab1", None)
+      createTable(catalog, tableIdent1)
+      val expectedTableIdent = tableIdent1.copy(database = Some("default"))
+      val expectedTable = generateTable(catalog, expectedTableIdent)
+      tablePath = Some(expectedTable.location)
+      checkCatalogTables(expectedTable, catalog.getTableMetadata(tableIdent1))
+    } finally {
+      // This is external table,so it is required to deleted the path
--- End diff --

tiny nit: `e,` -> `e ,`
[GitHub] spark issue #22466: [SPARK-25464][SQL] Create Database to the location,only ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22466 retest this please
[GitHub] spark issue #22666: [SPARK-25672][SQL] schema_of_csv() - schema inference fr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22666 **[Test build #97588 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97588/testReport)** for PR 22666 at commit [`1b86834`](https://github.com/apache/spark/commit/1b86834c1265992e3b46aaf079e1e17ea7c389c4).
[GitHub] spark pull request #22263: [SPARK-25269][SQL] SQL interface support specify ...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/22263#discussion_r226534798

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala ---
@@ -325,6 +325,21 @@ class CachedTableSuite extends QueryTest with SQLTestUtils with SharedSQLContext
     assert(isExpectStorageLevel(rddId, Memory))
   }

+  test("SQL interface support storageLevel(Invalid StorageLevel)") {
+    val message = intercept[IllegalArgumentException] {
+      sql("CACHE TABLE testData OPTIONS('storageLevel' 'invalid_storage_level')")
+    }.getMessage
+    assert(message.contains("Invalid StorageLevel: INVALID_STORAGE_LEVEL"))
+  }
+
+  test("SQL interface support storageLevel(with LAZY)") {
+    sql("CACHE LAZY TABLE testData OPTIONS('storageLevel' 'disk_only')")
+    assertCached(spark.table("testData"))
+    val rddId = rddIdOf("testData")
+    sql("SELECT COUNT(*) FROM testData").collect()
+    assert(isExpectStorageLevel(rddId, Disk))
--- End diff --

Do you think the previously existing `lazy`-related test cases protect this new SQL syntax contribution from future regressions?
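[Editorial note] For context on the first test quoted above, the asserted behaviour can be sketched in plain Scala (a hypothetical `storageLevelFromString` with a reduced set of level names, not Spark's actual `StorageLevel` object): the option value is upper-cased, and an unrecognized name is rejected with an `IllegalArgumentException` carrying the message the test matches on.

```scala
// Sketch, assuming a reduced set of level names (hypothetical, not
// Spark's real StorageLevel): how 'invalid_storage_level' becomes the
// "Invalid StorageLevel: INVALID_STORAGE_LEVEL" message.
val knownLevels = Set("NONE", "DISK_ONLY", "MEMORY_ONLY", "MEMORY_AND_DISK", "OFF_HEAP")

def storageLevelFromString(name: String): String = {
  val normalized = name.toUpperCase
  if (knownLevels.contains(normalized)) normalized
  else throw new IllegalArgumentException(s"Invalid StorageLevel: $normalized")
}

println(storageLevelFromString("disk_only"))  // prints "DISK_ONLY"
```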
[GitHub] spark issue #22732: [SPARK-25044][FOLLOW-UP] Change ScalaUDF constructor sig...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22732 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4115/ Test PASSed.
[GitHub] spark issue #22732: [SPARK-25044][FOLLOW-UP] Change ScalaUDF constructor sig...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22732 Merged build finished. Test PASSed.
[GitHub] spark issue #22732: [SPARK-25044][FOLLOW-UP] Change ScalaUDF constructor sig...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22732 **[Test build #97587 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97587/testReport)** for PR 22732 at commit [`cb7e97a`](https://github.com/apache/spark/commit/cb7e97a18c0a8d5d806c70771b2c85a01e5f0df5).
[GitHub] spark issue #22466: [SPARK-25464][SQL] Create Database to the location,only ...
Github user sandeep-katta commented on the issue: https://github.com/apache/spark/pull/22466 retest this please
[GitHub] spark issue #22732: [SPARK-25044][FOLLOW-UP] Change ScalaUDF constructor sig...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22732 Merged build finished. Test FAILed.
[GitHub] spark issue #22732: [SPARK-25044][FOLLOW-UP] Change ScalaUDF constructor sig...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22732 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97578/ Test FAILed.
[GitHub] spark issue #22732: [SPARK-25044][FOLLOW-UP] Change ScalaUDF constructor sig...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22732 **[Test build #97578 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97578/testReport)** for PR 22732 at commit [`84cb456`](https://github.com/apache/spark/commit/84cb456c18d02f8abb21934191508fca5e58e6e2).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #22504: [SPARK-25118][Submit] Persist Driver Logs in Client mode...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22504 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97565/ Test FAILed.
[GitHub] spark issue #22504: [SPARK-25118][Submit] Persist Driver Logs in Client mode...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22504 Build finished. Test FAILed.
[GitHub] spark issue #22504: [SPARK-25118][Submit] Persist Driver Logs in Client mode...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22504 **[Test build #97565 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97565/testReport)** for PR 22504 at commit [`a0a85b3`](https://github.com/apache/spark/commit/a0a85b3eff0f115b983ce5ba3214e09f8ee90dd2).

* This patch **fails PySpark unit tests**.
* This patch **does not merge cleanly**.
* This patch adds no public classes.
[GitHub] spark issue #22666: [SPARK-25672][SQL] schema_of_csv() - schema inference fr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22666 **[Test build #97586 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97586/testReport)** for PR 22666 at commit [`6cbc7fb`](https://github.com/apache/spark/commit/6cbc7fb45478882c15c6694fff964da043d2445c).
[GitHub] spark issue #22666: [SPARK-25672][SQL] schema_of_csv() - schema inference fr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22666 Merged build finished. Test FAILed.
[GitHub] spark issue #22666: [SPARK-25672][SQL] schema_of_csv() - schema inference fr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22666 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97585/ Test FAILed.
[GitHub] spark issue #22666: [SPARK-25672][SQL] schema_of_csv() - schema inference fr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22666 **[Test build #97585 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97585/testReport)** for PR 22666 at commit [`c9df3ab`](https://github.com/apache/spark/commit/c9df3ab40f5130cb1c3f7207e1371ddd5fb922fc).

* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class UnivocityParserSuite extends SparkFunSuite `
[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22749 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97570/ Test FAILed.
[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22749 Merged build finished. Test FAILed.
[GitHub] spark issue #22666: [SPARK-25672][SQL] schema_of_csv() - schema inference fr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22666 **[Test build #97585 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97585/testReport)** for PR 22666 at commit [`c9df3ab`](https://github.com/apache/spark/commit/c9df3ab40f5130cb1c3f7207e1371ddd5fb922fc).
[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22749 **[Test build #97570 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97570/testReport)** for PR 22749 at commit [`0c78b73`](https://github.com/apache/spark/commit/0c78b73e5abce2a51763c860e43aab214c8634d9).

* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #22772: [SPARK-24499][SQL][DOC][Followup] Fix some broken links
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22772 It's okay. The doc fix was huge, and there are likely some mistakes. I will read it closely this weekend, too.
[GitHub] spark issue #22666: [SPARK-25672][SQL] schema_of_csv() - schema inference fr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22666 Merged build finished. Test FAILed.
[GitHub] spark issue #22666: [SPARK-25672][SQL] schema_of_csv() - schema inference fr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22666 **[Test build #97584 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97584/testReport)** for PR 22666 at commit [`4869b76`](https://github.com/apache/spark/commit/4869b76e4f35b094793ff1f69cce3edbeb922ef1).

* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #22666: [SPARK-25672][SQL] schema_of_csv() - schema inference fr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22666 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97584/ Test FAILed.
[GitHub] spark issue #22666: [SPARK-25672][SQL] schema_of_csv() - schema inference fr...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22666 This is a WIP.
[GitHub] spark issue #22666: [SPARK-25672][SQL] schema_of_csv() - schema inference fr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22666 **[Test build #97584 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97584/testReport)** for PR 22666 at commit [`4869b76`](https://github.com/apache/spark/commit/4869b76e4f35b094793ff1f69cce3edbeb922ef1).
[GitHub] spark issue #22666: [SPARK-25672][SQL] schema_of_csv() - schema inference fr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22666 Merged build finished. Test FAILed.
[GitHub] spark issue #22746: [SPARK-24499][SQL][DOC] Split the page of sql-programmin...
Github user xuanyuanking commented on the issue: https://github.com/apache/spark/pull/22746 Thanks, all reviewers! Sorry there are still some mistakes in the new doc; I'll keep checking on this.
[GitHub] spark issue #22666: [SPARK-25672][SQL] schema_of_csv() - schema inference fr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22666 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97583/ Test FAILed.
[GitHub] spark issue #22666: [SPARK-25672][SQL] schema_of_csv() - schema inference fr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22666 **[Test build #97583 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97583/testReport)** for PR 22666 at commit [`80d6759`](https://github.com/apache/spark/commit/80d67596e8a0d2c5040816d090c6ff912b76c02c).

* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #22666: [SPARK-25672][SQL] schema_of_csv() - schema inference fr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22666 **[Test build #97583 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97583/testReport)** for PR 22666 at commit [`80d6759`](https://github.com/apache/spark/commit/80d67596e8a0d2c5040816d090c6ff912b76c02c).
[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22749 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97569/ Test FAILed.
[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22749 Merged build finished. Test FAILed.
[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22749 **[Test build #97569 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97569/testReport)** for PR 22749 at commit [`b211ed0`](https://github.com/apache/spark/commit/b211ed069dceb33c45cf6caf12c19527334d4ad8). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22666: [SPARK-25672][SQL] schema_of_csv() - schema inference fr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22666 Merged build finished. Test FAILed.
[GitHub] spark issue #22666: [SPARK-25672][SQL] schema_of_csv() - schema inference fr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22666 **[Test build #97582 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97582/testReport)** for PR 22666 at commit [`cd7e2ab`](https://github.com/apache/spark/commit/cd7e2abf4cea8744f0316fcbc7dafac4918079c7). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22666: [SPARK-25672][SQL] schema_of_csv() - schema inference fr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22666 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97582/ Test FAILed.
[GitHub] spark pull request #22732: [SPARK-25044][FOLLOW-UP] Change ScalaUDF construc...
Github user maryannxue commented on a diff in the pull request: https://github.com/apache/spark/pull/22732#discussion_r226527439 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala --- @@ -932,6 +935,23 @@ trait ScalaReflection { tpe.dealias.erasure.typeSymbol.asClass.fullName } + /** + * Returns the nullability of the input parameter types of the scala function object. + * + * Note that this only works with Scala 2.11, and the information returned may be inaccurate if + * used with a different Scala version. --- End diff -- The argument here is that it's not necessarily wrong with Scala 2.12: if all inputs are of boxed types, the result can still be correct. I think it's enough to say "we don't support it; switch to the new interface, otherwise we can't guarantee correctness."
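A hedged illustration of the point being debated (the names below are hypothetical sketches, not Spark's actual `ScalaReflection` API): a primitive parameter type can never hold null, while a boxed or reference type can, which is why a nullability hint derived from reflected parameter classes can still be correct when all inputs are boxed.

```scala
// Minimal sketch, not Spark's actual code: derive a nullability flag per
// parameter from its runtime class. Primitive parameters (Int, Long, ...)
// can never be null; boxed or reference parameters (java.lang.Integer,
// String, ...) can.
object NullabilitySketch {
  def parameterNullability(paramClasses: Seq[Class[_]]): Seq[Boolean] =
    paramClasses.map(c => !c.isPrimitive)

  def main(args: Array[String]): Unit = {
    // classOf[Int] is Java's int.class, so it reports isPrimitive = true.
    println(parameterNullability(Seq(classOf[Int], classOf[String])))
  }
}
```

The caveat in the quoted doc comment is exactly that under Scala 2.12's lambda encoding the primitive parameter classes may no longer be visible to reflection, so this derivation can become inaccurate.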
[GitHub] spark issue #22666: [SPARK-25672][SQL] schema_of_csv() - schema inference fr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22666 **[Test build #97582 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97582/testReport)** for PR 22666 at commit [`cd7e2ab`](https://github.com/apache/spark/commit/cd7e2abf4cea8744f0316fcbc7dafac4918079c7).
[GitHub] spark issue #22732: [SPARK-25044][FOLLOW-UP] Change ScalaUDF constructor sig...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22732 **[Test build #97581 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97581/testReport)** for PR 22732 at commit [`e848ec7`](https://github.com/apache/spark/commit/e848ec7a2d420c28764ac7319f801666c40684c2).
[GitHub] spark issue #22732: [SPARK-25044][FOLLOW-UP] Change ScalaUDF constructor sig...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22732 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4114/ Test PASSed.
[GitHub] spark issue #22732: [SPARK-25044][FOLLOW-UP] Change ScalaUDF constructor sig...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22732 Merged build finished. Test PASSed.
[GitHub] spark issue #22504: [SPARK-25118][Submit] Persist Driver Logs in Client mode...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22504 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97560/ Test FAILed.
[GitHub] spark issue #22504: [SPARK-25118][Submit] Persist Driver Logs in Client mode...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22504 Merged build finished. Test FAILed.
[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22295 Merged build finished. Test PASSed.
[GitHub] spark issue #22504: [SPARK-25118][Submit] Persist Driver Logs in Client mode...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22504 **[Test build #97560 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97560/testReport)** for PR 22504 at commit [`07d2df8`](https://github.com/apache/spark/commit/07d2df87fddd540637b054b643eb5484c5e58eaf). * This patch **fails from timeout after a configured wait of `400m`**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22295 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97577/ Test PASSed.
[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22295 **[Test build #97577 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97577/testReport)** for PR 22295 at commit [`94e3db0`](https://github.com/apache/spark/commit/94e3db0c0c9873daaca688c2a63f01420882692e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #22612: [SPARK-24958] Add executors' process tree total m...
Github user rezasafi commented on a diff in the pull request: https://github.com/apache/spark/pull/22612#discussion_r226526015 --- Diff: core/src/main/scala/org/apache/spark/executor/ProcfsBasedSystems.scala --- @@ -0,0 +1,226 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.executor + +import java.io._ +import java.nio.charset.Charset +import java.nio.file.{Files, Paths} +import java.util.Locale + +import scala.collection.mutable +import scala.collection.mutable.ArrayBuffer + +import org.apache.spark.{SparkEnv, SparkException} +import org.apache.spark.internal.{config, Logging} +import org.apache.spark.util.Utils + +private[spark] case class ProcfsBasedSystemsMetrics( +jvmVmemTotal: Long, +jvmRSSTotal: Long, +pythonVmemTotal: Long, +pythonRSSTotal: Long, +otherVmemTotal: Long, +otherRSSTotal: Long) + +// Some of the ideas here are taken from the ProcfsBasedProcessTree class in hadoop +// project. 
+private[spark] class ProcfsBasedSystems(val procfsDir: String = "/proc/") extends Logging { + val procfsStatFile = "stat" + val testing = sys.env.contains("SPARK_TESTING") || sys.props.contains("spark.testing") + var pageSize = computePageSize() + var isAvailable: Boolean = isProcfsAvailable + private val pid = computePid() + private val ptree = mutable.Map[ Int, Set[Int]]() + + var allMetrics: ProcfsBasedSystemsMetrics = ProcfsBasedSystemsMetrics(0, 0, 0, 0, 0, 0) + + computeProcessTree() + + private def isProcfsAvailable: Boolean = { +if (testing) { + return true +} +try { + if (!Files.exists(Paths.get(procfsDir))) { +return false + } +} +catch { + case f: FileNotFoundException => return false +} +val shouldLogStageExecutorMetrics = + SparkEnv.get.conf.get(config.EVENT_LOG_STAGE_EXECUTOR_METRICS) +val shouldLogStageExecutorProcessTreeMetrics = + SparkEnv.get.conf.get(config.EVENT_LOG_PROCESS_TREE_METRICS) +shouldLogStageExecutorProcessTreeMetrics && shouldLogStageExecutorMetrics + } + + private def computePid(): Int = { +if (!isAvailable || testing) { + return -1; +} +try { + // This can be simplified in java9: + // https://docs.oracle.com/javase/9/docs/api/java/lang/ProcessHandle.html + val cmd = Array("bash", "-c", "echo $PPID") + val length = 10 + val out2 = Utils.executeAndGetOutput(cmd) + val pid = Integer.parseInt(out2.split("\n")(0)) + return pid; +} +catch { + case e: SparkException => logDebug("IO Exception when trying to compute process tree." 
+ +" As a result reporting of ProcessTree metrics is stopped", e) +isAvailable = false +return -1 +} + } + + private def computePageSize(): Long = { +if (testing) { + return 0; +} +val cmd = Array("getconf", "PAGESIZE") +val out2 = Utils.executeAndGetOutput(cmd) +return Integer.parseInt(out2.split("\n")(0)) + } + + private def computeProcessTree(): Unit = { +if (!isAvailable || testing) { + return +} +val queue = mutable.Queue.empty[Int] +queue += pid +while( !queue.isEmpty ) { + val p = queue.dequeue() + val c = getChildPids(p) + if(!c.isEmpty) { +queue ++= c +ptree += (p -> c.toSet) + } + else { +ptree += (p -> Set[Int]()) + } +} + } + + private def getChildPids(pid: Int): ArrayBuffer[Int] = { +try { + val cmd = Array("pgrep", "-P", pid.toString) + val builder = new ProcessBuilder("pgrep", "-P", pid.toString) + val process = builder.start() + val output = new StringBuilder() + val threadName = "read stdout for
[GitHub] spark pull request #22612: [SPARK-24958] Add executors' process tree total m...
Github user rezasafi commented on a diff in the pull request: https://github.com/apache/spark/pull/22612#discussion_r226525661 --- Diff: core/src/main/scala/org/apache/spark/executor/ProcfsBasedSystems.scala --- @@ -0,0 +1,226 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.executor + +import java.io._ +import java.nio.charset.Charset +import java.nio.file.{Files, Paths} +import java.util.Locale + +import scala.collection.mutable +import scala.collection.mutable.ArrayBuffer + +import org.apache.spark.{SparkEnv, SparkException} +import org.apache.spark.internal.{config, Logging} +import org.apache.spark.util.Utils + +private[spark] case class ProcfsBasedSystemsMetrics( +jvmVmemTotal: Long, +jvmRSSTotal: Long, +pythonVmemTotal: Long, +pythonRSSTotal: Long, +otherVmemTotal: Long, +otherRSSTotal: Long) + +// Some of the ideas here are taken from the ProcfsBasedProcessTree class in hadoop +// project. 
+private[spark] class ProcfsBasedSystems(val procfsDir: String = "/proc/") extends Logging { + val procfsStatFile = "stat" + val testing = sys.env.contains("SPARK_TESTING") || sys.props.contains("spark.testing") + var pageSize = computePageSize() + var isAvailable: Boolean = isProcfsAvailable + private val pid = computePid() + private val ptree = mutable.Map[ Int, Set[Int]]() + + var allMetrics: ProcfsBasedSystemsMetrics = ProcfsBasedSystemsMetrics(0, 0, 0, 0, 0, 0) + + computeProcessTree() + + private def isProcfsAvailable: Boolean = { +if (testing) { + return true +} +try { + if (!Files.exists(Paths.get(procfsDir))) { +return false + } +} +catch { + case f: FileNotFoundException => return false +} +val shouldLogStageExecutorMetrics = + SparkEnv.get.conf.get(config.EVENT_LOG_STAGE_EXECUTOR_METRICS) +val shouldLogStageExecutorProcessTreeMetrics = + SparkEnv.get.conf.get(config.EVENT_LOG_PROCESS_TREE_METRICS) +shouldLogStageExecutorProcessTreeMetrics && shouldLogStageExecutorMetrics + } + + private def computePid(): Int = { +if (!isAvailable || testing) { + return -1; +} +try { + // This can be simplified in java9: + // https://docs.oracle.com/javase/9/docs/api/java/lang/ProcessHandle.html + val cmd = Array("bash", "-c", "echo $PPID") + val length = 10 + val out2 = Utils.executeAndGetOutput(cmd) + val pid = Integer.parseInt(out2.split("\n")(0)) + return pid; +} +catch { + case e: SparkException => logDebug("IO Exception when trying to compute process tree." + --- End diff -- oh it seems there wasn't a mistake here and I jut forgot the reason here. I caught SparkException since executeAndGetOutput may throw such an exception. I will remove the IOException --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
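A hedged sketch of the exception-handling point discussed above (`CommandException` and `runCommand` are hypothetical stand-ins, not Spark's actual API): Spark's `Utils.executeAndGetOutput` wraps process failures in `SparkException`, so a caller that only catches `java.io.IOException` would miss them, which is why the PR catches the library's own exception type and degrades gracefully.

```scala
// Hypothetical wrapper illustrating the pattern: low-level I/O failures are
// re-thrown as a single library-specific exception, so callers catch that
// one type instead of java.io.IOException.
class CommandException(msg: String, cause: Throwable) extends Exception(msg, cause)

object ProcessSketch {
  def runCommand(cmd: Seq[String]): String =
    try {
      val p = new ProcessBuilder(cmd: _*).start()
      val out = scala.io.Source.fromInputStream(p.getInputStream).mkString
      if (p.waitFor() != 0) throw new CommandException(s"exit code ${p.exitValue}", null)
      out
    } catch {
      case e: java.io.IOException => throw new CommandException("process failed", e)
    }

  // Degrade gracefully when the command cannot run, as computePid does with -1.
  def parentPidOrFallback(): Int =
    try runCommand(Seq("bash", "-c", "echo $PPID")).trim.toInt
    catch { case _: CommandException => -1 }
}
```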
[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22749 Merged build finished. Test FAILed.
[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22749 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97568/ Test FAILed.
[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22429 **[Test build #97580 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97580/testReport)** for PR 22429 at commit [`9f1d11d`](https://github.com/apache/spark/commit/9f1d11df99ce37959229b2830b89e2a943d638f0).
[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22749 **[Test build #97568 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97568/testReport)** for PR 22749 at commit [`35700f4`](https://github.com/apache/spark/commit/35700f4a0f36fb397ac028a68011a2753c5c2c75). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22429 retest this please
[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22429 I can address his comments while he is on vacation. Please keep reviewing this.
[GitHub] spark issue #22758: [SPARK-25332][SQL] Instead of broadcast hash join ,Sort ...
Github user sujith71955 commented on the issue: https://github.com/apache/spark/pull/22758 > In order to make this flow consistent, either > a) we need to record the Hive stats in the insert command flow and always consider these stats while computing, > OR > b) as mentioned in the snapshot above, we always estimate the data size from the files for convertible relations. Just a suggestion; let me know your thoughts. :) Thanks all for your valuable time.
[GitHub] spark issue #22721: [SPARK-25403][SQL] Refreshes the table after inserting t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22721 Merged build finished. Test FAILed.
[GitHub] spark issue #22721: [SPARK-25403][SQL] Refreshes the table after inserting t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22721 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4113/ Test FAILed.
[GitHub] spark issue #22758: [SPARK-25332][SQL] Instead of broadcast hash join ,Sort ...
Github user sujith71955 commented on the issue: https://github.com/apache/spark/pull/22758 In order to make this flow consistent, either a) we need to record the Hive stats in the insert command flow and always consider these stats while computing, OR b) as mentioned in the snapshot above, we always estimate the data size from the files for convertible relations.
[GitHub] spark pull request #22503: [SPARK-25493][SQL] Use auto-detection for CRLF in...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22503
[GitHub] spark issue #22503: [SPARK-25493][SQL] Use auto-detection for CRLF in CSV da...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22503 Merged to master.
[GitHub] spark issue #22758: [SPARK-25332][SQL] Instead of broadcast hash join ,Sort ...
Github user sujith71955 commented on the issue: https://github.com/apache/spark/pull/22758 > I think the cost of getting the stats from `HadoopFileSystem` may be quite high. Then we would always depend on the Hive stats to get the statistics, which already happens now, but only partially, and I think this PR solves that problem. But what I described is based on cloud-fan's expectation: ![image](https://user-images.githubusercontent.com/12999161/47195764-f3874700-d37a-11e8-9b93-e3c1cb228c54.png)
[GitHub] spark issue #22721: [SPARK-25403][SQL] Refreshes the table after inserting t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22721 **[Test build #97579 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97579/testReport)** for PR 22721 at commit [`6c8a73f`](https://github.com/apache/spark/commit/6c8a73f0fe74f618b429dee23869a00e706b125d).
[GitHub] spark pull request #22612: [SPARK-24958] Add executors' process tree total m...
Github user rezasafi commented on a diff in the pull request: https://github.com/apache/spark/pull/22612#discussion_r226524522 --- Diff: core/src/main/scala/org/apache/spark/executor/ProcfsBasedSystems.scala --- @@ -0,0 +1,226 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.executor + +import java.io._ +import java.nio.charset.Charset +import java.nio.file.{Files, Paths} +import java.util.Locale + +import scala.collection.mutable +import scala.collection.mutable.ArrayBuffer + +import org.apache.spark.{SparkEnv, SparkException} +import org.apache.spark.internal.{config, Logging} +import org.apache.spark.util.Utils + +private[spark] case class ProcfsBasedSystemsMetrics( +jvmVmemTotal: Long, +jvmRSSTotal: Long, +pythonVmemTotal: Long, +pythonRSSTotal: Long, +otherVmemTotal: Long, +otherRSSTotal: Long) + +// Some of the ideas here are taken from the ProcfsBasedProcessTree class in hadoop +// project. 
+private[spark] class ProcfsBasedSystems(val procfsDir: String = "/proc/") extends Logging { + val procfsStatFile = "stat" + val testing = sys.env.contains("SPARK_TESTING") || sys.props.contains("spark.testing") + var pageSize = computePageSize() + var isAvailable: Boolean = isProcfsAvailable + private val pid = computePid() + private val ptree = mutable.Map[ Int, Set[Int]]() + + var allMetrics: ProcfsBasedSystemsMetrics = ProcfsBasedSystemsMetrics(0, 0, 0, 0, 0, 0) + + computeProcessTree() + + private def isProcfsAvailable: Boolean = { +if (testing) { + return true +} +try { + if (!Files.exists(Paths.get(procfsDir))) { +return false + } +} +catch { + case f: FileNotFoundException => return false +} +val shouldLogStageExecutorMetrics = + SparkEnv.get.conf.get(config.EVENT_LOG_STAGE_EXECUTOR_METRICS) +val shouldLogStageExecutorProcessTreeMetrics = + SparkEnv.get.conf.get(config.EVENT_LOG_PROCESS_TREE_METRICS) +shouldLogStageExecutorProcessTreeMetrics && shouldLogStageExecutorMetrics + } + + private def computePid(): Int = { +if (!isAvailable || testing) { + return -1; +} +try { + // This can be simplified in java9: + // https://docs.oracle.com/javase/9/docs/api/java/lang/ProcessHandle.html + val cmd = Array("bash", "-c", "echo $PPID") + val length = 10 + val out2 = Utils.executeAndGetOutput(cmd) + val pid = Integer.parseInt(out2.split("\n")(0)) + return pid; +} +catch { + case e: SparkException => logDebug("IO Exception when trying to compute process tree." 
+ +" As a result reporting of ProcessTree metrics is stopped", e) +isAvailable = false +return -1 +} + } + + private def computePageSize(): Long = { +if (testing) { + return 0; +} +val cmd = Array("getconf", "PAGESIZE") +val out2 = Utils.executeAndGetOutput(cmd) +return Integer.parseInt(out2.split("\n")(0)) + } + + private def computeProcessTree(): Unit = { +if (!isAvailable || testing) { + return +} +val queue = mutable.Queue.empty[Int] +queue += pid +while( !queue.isEmpty ) { + val p = queue.dequeue() + val c = getChildPids(p) + if(!c.isEmpty) { +queue ++= c +ptree += (p -> c.toSet) + } + else { +ptree += (p -> Set[Int]()) + } +} + } + + private def getChildPids(pid: Int): ArrayBuffer[Int] = { +try { + val cmd = Array("pgrep", "-P", pid.toString) + val builder = new ProcessBuilder("pgrep", "-P", pid.toString) + val process = builder.start() + val output = new StringBuilder() + val threadName = "read stdout for
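The queue-based walk in `computeProcessTree` quoted above can be sketched in isolation. This is a hedged, minimal version: the child lookup is passed in as a function (the real code shells out to `pgrep -P <pid>`), so the sketch runs without `/proc` or `pgrep`.

```scala
import scala.collection.mutable

// Breadth-first walk over a process tree: starting from one root pid,
// repeatedly ask for child pids and record each parent -> children edge set.
object ProcessTreeSketch {
  def buildTree(root: Int, children: Int => Seq[Int]): Map[Int, Set[Int]] = {
    val tree = mutable.Map.empty[Int, Set[Int]]
    val queue = mutable.Queue(root)
    while (queue.nonEmpty) {
      val p = queue.dequeue()
      val c = children(p)
      queue ++= c             // visit each child's own children later
      tree += (p -> c.toSet)  // leaves end up mapped to an empty set
    }
    tree.toMap
  }
}
```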
[GitHub] spark pull request #22503: [SPARK-25493][SQL] Use auto-detection for CRLF in...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22503#discussion_r226524439 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala --- @@ -220,6 +221,17 @@ class CSVSuite extends QueryTest with SharedSQLContext with SQLTestUtils with Te } } + test("crlf line separators in multiline mode") { --- End diff -- nit: -> `SPARK-25493: crlf line separators in multiline mode`. When a PR fixes a specific problem, let's add the JIRA prefix to the test name next time.
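The naming convention requested in the review nit above can be captured as a tiny checker (this helper is hypothetical, purely for illustration — Spark's test suites do not enforce it programmatically):

```scala
// A regression test name should start with the JIRA id it fixes, e.g.
// "SPARK-25493: crlf line separators in multiline mode", so a test failure
// points straight at the issue it guards against.
object TestNameSketch {
  private val JiraPrefixed = """SPARK-\d+: .+""".r

  def hasJiraPrefix(name: String): Boolean =
    JiraPrefixed.pattern.matcher(name).matches
}
```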
[GitHub] spark pull request #22612: [SPARK-24958] Add executors' process tree total m...
Github user rezasafi commented on a diff in the pull request: https://github.com/apache/spark/pull/22612#discussion_r226524183 --- Diff: core/src/main/scala/org/apache/spark/executor/ProcfsBasedSystems.scala --- @@ -0,0 +1,226 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.executor + +import java.io._ +import java.nio.charset.Charset +import java.nio.file.{Files, Paths} +import java.util.Locale + +import scala.collection.mutable +import scala.collection.mutable.ArrayBuffer + +import org.apache.spark.{SparkEnv, SparkException} +import org.apache.spark.internal.{config, Logging} +import org.apache.spark.util.Utils + +private[spark] case class ProcfsBasedSystemsMetrics( +jvmVmemTotal: Long, +jvmRSSTotal: Long, +pythonVmemTotal: Long, +pythonRSSTotal: Long, +otherVmemTotal: Long, +otherRSSTotal: Long) + +// Some of the ideas here are taken from the ProcfsBasedProcessTree class in hadoop +// project. 
+private[spark] class ProcfsBasedSystems(val procfsDir: String = "/proc/") extends Logging { + val procfsStatFile = "stat" + val testing = sys.env.contains("SPARK_TESTING") || sys.props.contains("spark.testing") + var pageSize = computePageSize() + var isAvailable: Boolean = isProcfsAvailable + private val pid = computePid() + private val ptree = mutable.Map[ Int, Set[Int]]() + + var allMetrics: ProcfsBasedSystemsMetrics = ProcfsBasedSystemsMetrics(0, 0, 0, 0, 0, 0) + + computeProcessTree() + + private def isProcfsAvailable: Boolean = { +if (testing) { + return true +} +try { + if (!Files.exists(Paths.get(procfsDir))) { +return false + } +} +catch { + case f: FileNotFoundException => return false +} +val shouldLogStageExecutorMetrics = + SparkEnv.get.conf.get(config.EVENT_LOG_STAGE_EXECUTOR_METRICS) +val shouldLogStageExecutorProcessTreeMetrics = + SparkEnv.get.conf.get(config.EVENT_LOG_PROCESS_TREE_METRICS) +shouldLogStageExecutorProcessTreeMetrics && shouldLogStageExecutorMetrics + } + + private def computePid(): Int = { +if (!isAvailable || testing) { + return -1; +} +try { + // This can be simplified in java9: + // https://docs.oracle.com/javase/9/docs/api/java/lang/ProcessHandle.html + val cmd = Array("bash", "-c", "echo $PPID") + val length = 10 + val out2 = Utils.executeAndGetOutput(cmd) + val pid = Integer.parseInt(out2.split("\n")(0)) + return pid; +} +catch { + case e: SparkException => logDebug("IO Exception when trying to compute process tree." + +" As a result reporting of ProcessTree metrics is stopped", e) +isAvailable = false +return -1 +} + } + + private def computePageSize(): Long = { +if (testing) { + return 0; +} +val cmd = Array("getconf", "PAGESIZE") +val out2 = Utils.executeAndGetOutput(cmd) +return Integer.parseInt(out2.split("\n")(0)) --- End diff -- yes, will fix it. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22612: [SPARK-24958] Add executors' process tree total m...
Github user rezasafi commented on a diff in the pull request: https://github.com/apache/spark/pull/22612#discussion_r226524080

--- Diff: core/src/main/scala/org/apache/spark/executor/ProcfsBasedSystems.scala ---
@@ -0,0 +1,226 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.executor
+
+import java.io._
+import java.nio.charset.Charset
+import java.nio.file.{Files, Paths}
+import java.util.Locale
+
+import scala.collection.mutable
+import scala.collection.mutable.ArrayBuffer
+
+import org.apache.spark.{SparkEnv, SparkException}
+import org.apache.spark.internal.{config, Logging}
+import org.apache.spark.util.Utils
+
+private[spark] case class ProcfsBasedSystemsMetrics(
+    jvmVmemTotal: Long,
+    jvmRSSTotal: Long,
+    pythonVmemTotal: Long,
+    pythonRSSTotal: Long,
+    otherVmemTotal: Long,
+    otherRSSTotal: Long)
+
+// Some of the ideas here are taken from the ProcfsBasedProcessTree class in the hadoop
+// project.
+private[spark] class ProcfsBasedSystems(val procfsDir: String = "/proc/") extends Logging {
+  val procfsStatFile = "stat"
+  val testing = sys.env.contains("SPARK_TESTING") || sys.props.contains("spark.testing")
+  var pageSize = computePageSize()
+  var isAvailable: Boolean = isProcfsAvailable
+  private val pid = computePid()
+  private val ptree = mutable.Map[Int, Set[Int]]()
+
+  var allMetrics: ProcfsBasedSystemsMetrics = ProcfsBasedSystemsMetrics(0, 0, 0, 0, 0, 0)
+
+  computeProcessTree()
+
+  private def isProcfsAvailable: Boolean = {
+    if (testing) {
+      return true
+    }
+    try {
+      if (!Files.exists(Paths.get(procfsDir))) {
+        return false
+      }
+    }
+    catch {
+      case f: FileNotFoundException => return false
+    }
+    val shouldLogStageExecutorMetrics =
+      SparkEnv.get.conf.get(config.EVENT_LOG_STAGE_EXECUTOR_METRICS)
+    val shouldLogStageExecutorProcessTreeMetrics =
+      SparkEnv.get.conf.get(config.EVENT_LOG_PROCESS_TREE_METRICS)
+    shouldLogStageExecutorProcessTreeMetrics && shouldLogStageExecutorMetrics
+  }
+
+  private def computePid(): Int = {
+    if (!isAvailable || testing) {
+      return -1;
+    }
+    try {
+      // This can be simplified in java9:
+      // https://docs.oracle.com/javase/9/docs/api/java/lang/ProcessHandle.html
+      val cmd = Array("bash", "-c", "echo $PPID")
+      val length = 10
+      val out2 = Utils.executeAndGetOutput(cmd)
+      val pid = Integer.parseInt(out2.split("\n")(0))
+      return pid;
+    }
+    catch {
+      case e: SparkException => logDebug("IO Exception when trying to compute process tree." +

--- End diff --

Let me double check. I thought there was a comment before saying that I should just catch SparkException, but you are right, it doesn't make sense. Probably a mistake on my side. I was only caring about IOException here.
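The mismatch under review (a `SparkException` catch paired with an "IO Exception" log message) can be sidestepped with a wrapper that degrades gracefully on any failure mode of the external command. This is a hedged sketch with a made-up helper name, not Spark's `Utils.executeAndGetOutput`:

```scala
import scala.sys.process._
import scala.util.{Failure, Success, Try}

// Run an external command and return its trimmed stdout, or None on any
// failure (missing binary raises IOException, a non-zero exit raises
// RuntimeException from `!!`; both land in the Failure branch).
def tryCommand(cmd: Seq[String]): Option[String] =
  Try(cmd.!!.trim) match {
    case Success(out) => Some(out)
    case Failure(_)   => None
  }
```

With this shape, the caller can log once and flip an `isAvailable` flag when `None` comes back, instead of relying on catching one specific exception type.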
[GitHub] spark pull request #22612: [SPARK-24958] Add executors' process tree total m...
Github user rezasafi commented on a diff in the pull request: https://github.com/apache/spark/pull/22612#discussion_r226523830

--- Diff: core/src/main/scala/org/apache/spark/executor/ProcfsBasedSystems.scala ---
@@ -0,0 +1,266 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.executor
+
+import java.io._
+import java.nio.charset.Charset
+import java.nio.file.{Files, Paths}
+import java.util.Locale
+
+import scala.collection.mutable
+import scala.collection.mutable.ArrayBuffer
+import scala.collection.mutable.Queue
+
+import org.apache.spark.SparkEnv
+import org.apache.spark.internal.{config, Logging}
+
+private[spark] case class ProcfsBasedSystemsMetrics(
+    jvmVmemTotal: Long,
+    jvmRSSTotal: Long,
+    pythonVmemTotal: Long,
+    pythonRSSTotal: Long,
+    otherVmemTotal: Long,
+    otherRSSTotal: Long)
+
+// Some of the ideas here are taken from the ProcfsBasedProcessTree class in the hadoop
+// project.
+private[spark] class ProcfsBasedSystems(val procfsDir: String = "/proc/") extends Logging {
+  val procfsStatFile = "stat"
+  var pageSize = computePageSize()
+  var isAvailable: Boolean = isProcfsAvailable
+  private val pid = computePid()
+  private val ptree = mutable.Map[Int, Set[Int]]()
+
+  var allMetrics: ProcfsBasedSystemsMetrics = ProcfsBasedSystemsMetrics(0, 0, 0, 0, 0, 0)
+  private var latestJVMVmemTotal = 0L
+  private var latestJVMRSSTotal = 0L
+  private var latestPythonVmemTotal = 0L
+  private var latestPythonRSSTotal = 0L
+  private var latestOtherVmemTotal = 0L
+  private var latestOtherRSSTotal = 0L
+
+  computeProcessTree()
+
+  private def isProcfsAvailable: Boolean = {

--- End diff --

I had a test case where this can change depending on the health of the node. I think it shouldn't be a val, to be cautious.
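The val-vs-def trade-off in this thread can be shown with a tiny sketch (the class name is made up for illustration): a `def` re-evaluates the procfs check on every call, so a node whose /proc mount becomes unreadable mid-run is noticed, whereas a `val` would freeze the first answer for the lifetime of the object.

```scala
import java.nio.file.{Files, Paths}

// Illustrative probe: isProcfsAvailable is a def, so each call re-checks the
// filesystem; declaring it a val would cache the result at construction time.
class ProcfsProbe(procfsDir: String = "/proc/") {
  def isProcfsAvailable: Boolean = Files.exists(Paths.get(procfsDir))
}
```

The cost is one `Files.exists` per heartbeat, which is cheap relative to reading the stat files themselves.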
[GitHub] spark issue #22758: [SPARK-25332][SQL] Instead of broadcast hash join ,Sort ...
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/22758 I think the cost of getting the stats from `HadoopFileSystem` may be quite high.
[GitHub] spark pull request #22612: [SPARK-24958] Add executors' process tree total m...
Github user rezasafi commented on a diff in the pull request: https://github.com/apache/spark/pull/22612#discussion_r226523734

--- Diff: core/src/main/scala/org/apache/spark/Heartbeater.scala ---
@@ -59,12 +60,13 @@ private[spark] class Heartbeater(
     heartbeater.awaitTermination(10, TimeUnit.SECONDS)
   }

-  /**
-   * Get the current executor level metrics. These are returned as an array, with the index
-   * determined by MetricGetter.values
-   */
+  /** Get the current executor level metrics. These are returned as a Map */
   def getCurrentMetrics(): ExecutorMetrics = {
-    val metrics = ExecutorMetricType.values.map(_.getMetricValue(memoryManager)).toArray
+    // figure out how to append all the metrics

--- End diff --

No, the commit should have been removed. Will fix it. Thanks
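The line being removed in this hunk shows the pattern at stake: every metric type exposes a getter, and the heartbeat snapshots all of them positionally into an array whose index is fixed by the order of the enumeration. A stripped-down sketch with made-up metric values (names are illustrative, not Spark's actual `ExecutorMetricType` API):

```scala
// Minimal sketch of positional metric snapshotting; values are illustrative.
trait MetricGetter {
  def getMetricValue: Long
}

object MetricGetter {
  // The order of this sequence fixes the meaning of each array index.
  val values: IndexedSeq[MetricGetter] = IndexedSeq(
    new MetricGetter { def getMetricValue = 42L }, // e.g. JVM heap used
    new MetricGetter { def getMetricValue = 7L }   // e.g. off-heap used
  )
}

// Snapshot every metric; index i of the result corresponds to values(i).
def getCurrentMetrics(): Array[Long] =
  MetricGetter.values.map(_.getMetricValue).toArray
```

The positional encoding keeps heartbeat messages compact, at the price of readers and writers having to agree on the enumeration order.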
[GitHub] spark issue #22504: [SPARK-25118][Submit] Persist Driver Logs in Client mode...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22504 Merged build finished. Test FAILed.
[GitHub] spark issue #22504: [SPARK-25118][Submit] Persist Driver Logs in Client mode...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22504 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97566/ Test FAILed.
[GitHub] spark issue #22732: [SPARK-25044][FOLLOW-UP] Change ScalaUDF constructor sig...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22732 **[Test build #97578 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97578/testReport)** for PR 22732 at commit [`84cb456`](https://github.com/apache/spark/commit/84cb456c18d02f8abb21934191508fca5e58e6e2).
[GitHub] spark issue #22732: [SPARK-25044][FOLLOW-UP] Change ScalaUDF constructor sig...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22732 Merged build finished. Test PASSed.
[GitHub] spark issue #22732: [SPARK-25044][FOLLOW-UP] Change ScalaUDF constructor sig...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22732 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4112/ Test PASSed.
[GitHub] spark issue #22504: [SPARK-25118][Submit] Persist Driver Logs in Client mode...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22504 **[Test build #97566 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97566/testReport)** for PR 22504 at commit [`65c0800`](https://github.com/apache/spark/commit/65c080032bbea82024f1ab14cb43c771f6157fc4).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #22758: [SPARK-25332][SQL] Instead of broadcast hash join ,Sort ...
Github user sujith71955 commented on the issue: https://github.com/apache/spark/pull/22758 @cloud-fan One solution I can think of: in the DetermineStats flow we can add one more condition so that the stats are not updated for convertible relations, since we always get the stats from HadoopFileSystem for convertible relations. This should solve all the problems we are facing. Please let me know your suggestions and I will update this PR with this logic.
[GitHub] spark pull request #22764: [SPARK-25765][ML] Add training cost to BisectingK...
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/22764#discussion_r226520698

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/BisectingKMeansModel.scala ---
@@ -225,13 +227,14 @@ object BisectingKMeansModel extends Loader[BisectingKMeansModel] {
       assert(formatVersion == thisFormatVersion)
       val rootId = (metadata \ "rootId").extract[Int]
       val distanceMeasure = (metadata \ "distanceMeasure").extract[String]
+      val trainingCost = (metadata \ "trainingCost").extract[Double]

--- End diff --

- Could you avoid modifying the model-loading code in the "mllib" package, and instead modify the code in the "ml" package, i.e., the class `ml.clustering.BisectingKMeansModel.BisectingKMeansModelReader`? You can reference the `KMeans` code: `ml.clustering.KMeansModel.KMeansModelReader`.
- And, +1 with what @viirya mentioned: we should keep model-loading compatibility, so add a version check (when >= 2.4) before we load "training cost". Note that this should be added in `ml.clustering.BisectingKMeansModel.BisectingKMeansModelReader`.
- And, could you also add a version check (when >= 2.4) before loading "training cost" in `ml.clustering.KMeansModel.KMeansModelReader`?
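A hedged sketch of the compatibility check being requested (the helper name is made up, and a plain Map stands in for the json4s metadata used by the real readers): only attempt to read `trainingCost` when the model was saved by Spark 2.4 or later, so models saved by older versions still load.

```scala
// Illustrative version-gated metadata read; `metadata` is a simplified
// stand-in for the parsed JSON metadata of a saved model.
def loadTrainingCost(sparkVersion: String, metadata: Map[String, Double]): Option[Double] = {
  val parts = sparkVersion.split("\\.")
  val (major, minor) = (parts(0).toInt, parts(1).toInt)
  // Only models written by Spark >= 2.4 carry a "trainingCost" entry.
  if (major > 2 || (major == 2 && minor >= 4)) metadata.get("trainingCost")
  else None
}
```

Returning `Option` (rather than failing on a missing key) lets the caller substitute a default for pre-2.4 models instead of breaking load compatibility.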