[GitHub] spark issue #22666: [SPARK-25672][SQL] schema_of_csv() - schema inference fr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22666 **[Test build #97594 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97594/testReport)** for PR 22666 at commit [`1e90261`](https://github.com/apache/spark/commit/1e90261f964129efc605ed77433477715078745c). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22761: [MINOR][DOC] Spacing items in migration guide for readab...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22761 Merged to master and branch-2.4.
[GitHub] spark issue #22666: [SPARK-25672][SQL] schema_of_csv() - schema inference fr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22666 **[Test build #97593 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97593/testReport)** for PR 22666 at commit [`aead783`](https://github.com/apache/spark/commit/aead783d895069b1b6781928eb0afda740085a21).
[GitHub] spark issue #22766: [SPARK-25768][SQL] fix constant argument expecting UDAFs
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22766 **[Test build #97592 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97592/testReport)** for PR 22766 at commit [`6e6eca4`](https://github.com/apache/spark/commit/6e6eca400ba4040d5001e26b20d7815ed2a0c2f4).
[GitHub] spark issue #22773: [MINOR][SQL] Add prettyNames for from_json, to_json, fro...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22773 **[Test build #97591 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97591/testReport)** for PR 22773 at commit [`3447e73`](https://github.com/apache/spark/commit/3447e73989e39d0c052cf69e5a3e80d1ebb221dc).
[GitHub] spark issue #22773: [MINOR][SQL] Add prettyNames for from_json, to_json, fro...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22773 Merged build finished. Test PASSed.
[GitHub] spark issue #22773: [MINOR][SQL] Add prettyNames for from_json, to_json, fro...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22773 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4117/ Test PASSed.
[GitHub] spark pull request #22773: [MINOR][SQL] Add prettyNames for from_json, to_js...
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/22773

[MINOR][SQL] Add prettyNames for from_json, to_json, from_csv, and schema_of_json

## What changes were proposed in this pull request?

This PR adds `prettyNames` for `from_json`, `to_json`, `from_csv`, and `schema_of_json` so that appropriate names are used.

## How was this patch tested?

Unit tests

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark minor-prettyNames

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22773.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #22773

commit 3447e73989e39d0c052cf69e5a3e80d1ebb221dc
Author: hyukjinkwon
Date: 2018-10-19T05:28:55Z

    Add prettyNames for from_json, to_json, from_csv, and schema_of_json
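[Editorial note] The change above is small enough to sketch outside Spark. The following is a hedged, minimal illustration (a hypothetical simplified `Expression` trait, not Spark's actual class hierarchy) of why a `prettyName` override matters: by default an expression's user-facing name is derived from its class name, so the class implementing `to_json` (named `StructsToJson` in Spark) would otherwise surface under its class name in schemas and error messages.

```scala
// Minimal sketch, assuming a simplified Expression trait (hypothetical,
// not Spark's real API): the default user-facing name comes from the
// implementing class name.
trait Expression {
  def prettyName: String = getClass.getSimpleName.toLowerCase
}

// to_json is implemented by a class named StructsToJson; overriding
// prettyName makes the SQL function name appear instead of the class name.
class StructsToJson extends Expression {
  override def prettyName: String = "to_json"
}

println(new StructsToJson().prettyName)  // prints "to_json"
```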
[GitHub] spark issue #22263: [SPARK-25269][SQL] SQL interface support specify Storage...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22263 **[Test build #97590 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97590/testReport)** for PR 22263 at commit [`e2b5dcf`](https://github.com/apache/spark/commit/e2b5dcfc853b6f5608b27c914c397689d09cb267).
[GitHub] spark issue #22263: [SPARK-25269][SQL] SQL interface support specify Storage...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22263 Merged build finished. Test PASSed.
[GitHub] spark issue #22263: [SPARK-25269][SQL] SQL interface support specify Storage...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22263 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4116/ Test PASSed.
[GitHub] spark pull request #22466: [SPARK-25464][SQL] Create Database to the locatio...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22466#discussion_r226536773

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala ---
@@ -840,12 +840,19 @@ abstract class DDLSuite extends QueryTest with SQLTestUtils {
   }

   test("create table in default db") {
-    val catalog = spark.sessionState.catalog
-    val tableIdent1 = TableIdentifier("tab1", None)
-    createTable(catalog, tableIdent1)
-    val expectedTableIdent = tableIdent1.copy(database = Some("default"))
-    val expectedTable = generateTable(catalog, expectedTableIdent)
-    checkCatalogTables(expectedTable, catalog.getTableMetadata(tableIdent1))
+    var tablePath: Option[URI] = None
--- End diff --

`var tablePath: URI = null`
[GitHub] spark pull request #22466: [SPARK-25464][SQL] Create Database to the locatio...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22466#discussion_r226536735

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala ---
@@ -207,6 +207,14 @@ class SessionCatalog(
         "you cannot create a database with this name.")
     }
     validateName(dbName)
+    // SPARK-25464 fail if DB location exists and is not empty
+    val dbPath = new Path(dbDefinition.locationUri)
+    val fs = dbPath.getFileSystem(hadoopConf)
+    if (!externalCatalog.databaseExists(dbName) && fs.exists(dbPath)
+      && fs.listStatus(dbPath).nonEmpty) {
--- End diff --

Should we necessarily list up files? it's potentially expensive.
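[Editorial note] The cost concern raised above can be sketched self-containedly with the JDK's file API (used here instead of Hadoop's `FileSystem`, purely for illustration): deciding whether a directory is non-empty does not require materializing the full listing, since a lazy stream can stop after the first entry. Hadoop's `FileSystem` exposes iterator-based listings (e.g. `listStatusIterator`) that could serve a similar purpose.

```scala
import java.nio.file.{Files, Path}

// Sketch, assuming plain local files: Files.list is lazy, so a
// non-emptiness check touches at most one directory entry instead of
// building the whole listing (the concern with listStatus above).
def isNonEmptyDir(dir: Path): Boolean = {
  val stream = Files.list(dir)
  try stream.findFirst().isPresent  // stops after the first entry
  finally stream.close()            // always release the directory handle
}

val dbLocation = Files.createTempDirectory("db_location")
println(isNonEmptyDir(dbLocation))                   // prints "false"
Files.createFile(dbLocation.resolve("part-00000"))
println(isNonEmptyDir(dbLocation))                   // prints "true"
```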
[GitHub] spark pull request #22466: [SPARK-25464][SQL] Create Database to the locatio...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/22466#discussion_r226536626

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala ---
@@ -840,12 +840,19 @@ abstract class DDLSuite extends QueryTest with SQLTestUtils {
   }

   test("create table in default db") {
-    val catalog = spark.sessionState.catalog
-    val tableIdent1 = TableIdentifier("tab1", None)
-    createTable(catalog, tableIdent1)
-    val expectedTableIdent = tableIdent1.copy(database = Some("default"))
-    val expectedTable = generateTable(catalog, expectedTableIdent)
-    checkCatalogTables(expectedTable, catalog.getTableMetadata(tableIdent1))
+    var tablePath: Option[URI] = None
+    try {
+      val catalog = spark.sessionState.catalog
+      val tableIdent1 = TableIdentifier("tab1", None)
+      createTable(catalog, tableIdent1)
+      val expectedTableIdent = tableIdent1.copy(database = Some("default"))
+      val expectedTable = generateTable(catalog, expectedTableIdent)
+      tablePath = Some(expectedTable.location)
+      checkCatalogTables(expectedTable, catalog.getTableMetadata(tableIdent1))
+    } finally {
+      // This is external table,so it is required to deleted the path
--- End diff --

@HyukjinKwon The first one is `e,s` -> `e, s` ?
[GitHub] spark pull request #22466: [SPARK-25464][SQL] Create Database to the locatio...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22466#discussion_r226536585

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala ---
@@ -2370,4 +2370,17 @@ class HiveDDLSuite
     ))
   }
 }
+
+  test("SPARK-25464 create Database with non empty location") {
+    val dbName = "dbwithcustomlocation"
+    withTempDir { tmpDir =>
+      val parentDir = tmpDir.getParent
+      val expectedMsg = s"Cannot create database at location $parentDir because the path is not " +
+        s"empty."
--- End diff --

leading `s` can be removed.
[GitHub] spark pull request #22466: [SPARK-25464][SQL] Create Database to the locatio...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22466#discussion_r226536555

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala ---
@@ -2370,4 +2370,17 @@ class HiveDDLSuite
     ))
   }
 }
+
+  test("SPARK-25464 create Database with non empty location") {
--- End diff --

`create a database with a non-empty location`
[GitHub] spark issue #22466: [SPARK-25464][SQL] Create Database to the location,only ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22466 **[Test build #97589 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97589/testReport)** for PR 22466 at commit [`d862591`](https://github.com/apache/spark/commit/d862591c20ef1d1536d069dcc4f3220ae232c702).
[GitHub] spark pull request #22466: [SPARK-25464][SQL] Create Database to the locatio...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22466#discussion_r226536442

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala ---
@@ -840,12 +840,19 @@ abstract class DDLSuite extends QueryTest with SQLTestUtils {
   }

   test("create table in default db") {
-    val catalog = spark.sessionState.catalog
-    val tableIdent1 = TableIdentifier("tab1", None)
-    createTable(catalog, tableIdent1)
-    val expectedTableIdent = tableIdent1.copy(database = Some("default"))
-    val expectedTable = generateTable(catalog, expectedTableIdent)
-    checkCatalogTables(expectedTable, catalog.getTableMetadata(tableIdent1))
+    var tablePath: Option[URI] = None
+    try {
+      val catalog = spark.sessionState.catalog
+      val tableIdent1 = TableIdentifier("tab1", None)
+      createTable(catalog, tableIdent1)
+      val expectedTableIdent = tableIdent1.copy(database = Some("default"))
+      val expectedTable = generateTable(catalog, expectedTableIdent)
+      tablePath = Some(expectedTable.location)
+      checkCatalogTables(expectedTable, catalog.getTableMetadata(tableIdent1))
+    } finally {
+      // This is external table,so it is required to deleted the path
--- End diff --

`this is an external table`
[GitHub] spark pull request #22466: [SPARK-25464][SQL] Create Database to the locatio...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22466#discussion_r226536456

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala ---
@@ -840,12 +840,19 @@ abstract class DDLSuite extends QueryTest with SQLTestUtils {
   }

   test("create table in default db") {
-    val catalog = spark.sessionState.catalog
-    val tableIdent1 = TableIdentifier("tab1", None)
-    createTable(catalog, tableIdent1)
-    val expectedTableIdent = tableIdent1.copy(database = Some("default"))
-    val expectedTable = generateTable(catalog, expectedTableIdent)
-    checkCatalogTables(expectedTable, catalog.getTableMetadata(tableIdent1))
+    var tablePath: Option[URI] = None
+    try {
+      val catalog = spark.sessionState.catalog
+      val tableIdent1 = TableIdentifier("tab1", None)
+      createTable(catalog, tableIdent1)
+      val expectedTableIdent = tableIdent1.copy(database = Some("default"))
+      val expectedTable = generateTable(catalog, expectedTableIdent)
+      tablePath = Some(expectedTable.location)
+      checkCatalogTables(expectedTable, catalog.getTableMetadata(tableIdent1))
+    } finally {
+      // This is external table,so it is required to deleted the path
--- End diff --

`it is required to delete`
[GitHub] spark pull request #22466: [SPARK-25464][SQL] Create Database to the locatio...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22466#discussion_r226536304

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala ---
@@ -840,12 +840,19 @@ abstract class DDLSuite extends QueryTest with SQLTestUtils {
   }

   test("create table in default db") {
-    val catalog = spark.sessionState.catalog
-    val tableIdent1 = TableIdentifier("tab1", None)
-    createTable(catalog, tableIdent1)
-    val expectedTableIdent = tableIdent1.copy(database = Some("default"))
-    val expectedTable = generateTable(catalog, expectedTableIdent)
-    checkCatalogTables(expectedTable, catalog.getTableMetadata(tableIdent1))
+    var tablePath: Option[URI] = None
+    try {
+      val catalog = spark.sessionState.catalog
+      val tableIdent1 = TableIdentifier("tab1", None)
+      createTable(catalog, tableIdent1)
+      val expectedTableIdent = tableIdent1.copy(database = Some("default"))
+      val expectedTable = generateTable(catalog, expectedTableIdent)
+      tablePath = Some(expectedTable.location)
+      checkCatalogTables(expectedTable, catalog.getTableMetadata(tableIdent1))
+    } finally {
+      // This is external table,so it is required to deleted the path
--- End diff --

tiny nit: `e,` -> `e ,`
[GitHub] spark issue #22466: [SPARK-25464][SQL] Create Database to the location,only ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22466 retest this please
[GitHub] spark issue #22666: [SPARK-25672][SQL] schema_of_csv() - schema inference fr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22666 **[Test build #97588 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97588/testReport)** for PR 22666 at commit [`1b86834`](https://github.com/apache/spark/commit/1b86834c1265992e3b46aaf079e1e17ea7c389c4).
[GitHub] spark pull request #22263: [SPARK-25269][SQL] SQL interface support specify ...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/22263#discussion_r226534798

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala ---
@@ -325,6 +325,21 @@ class CachedTableSuite extends QueryTest with SQLTestUtils with SharedSQLContext
     assert(isExpectStorageLevel(rddId, Memory))
   }

+  test("SQL interface support storageLevel(Invalid StorageLevel)") {
+    val message = intercept[IllegalArgumentException] {
+      sql("CACHE TABLE testData OPTIONS('storageLevel' 'invalid_storage_level')")
+    }.getMessage
+    assert(message.contains("Invalid StorageLevel: INVALID_STORAGE_LEVEL"))
+  }
+
+  test("SQL interface support storageLevel(with LAZY)") {
+    sql("CACHE LAZY TABLE testData OPTIONS('storageLevel' 'disk_only')")
+    assertCached(spark.table("testData"))
+    val rddId = rddIdOf("testData")
+    sql("SELECT COUNT(*) FROM testData").collect()
+    assert(isExpectStorageLevel(rddId, Disk))
--- End diff --

Do you think the previously existing `lazy`-related test cases protect this new SQL syntax contribution from future regressions?
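[Editorial note] For context on the first test quoted above, the asserted behaviour can be sketched in plain Scala (a hypothetical `storageLevelFromString` with a reduced set of level names, not Spark's actual `StorageLevel` object): the option value is upper-cased, and an unrecognized name is rejected with an `IllegalArgumentException` carrying the message the test matches on.

```scala
// Sketch, assuming a reduced set of level names (hypothetical, not
// Spark's real StorageLevel): how 'invalid_storage_level' becomes the
// "Invalid StorageLevel: INVALID_STORAGE_LEVEL" message.
val knownLevels = Set("NONE", "DISK_ONLY", "MEMORY_ONLY", "MEMORY_AND_DISK", "OFF_HEAP")

def storageLevelFromString(name: String): String = {
  val normalized = name.toUpperCase
  if (knownLevels.contains(normalized)) normalized
  else throw new IllegalArgumentException(s"Invalid StorageLevel: $normalized")
}

println(storageLevelFromString("disk_only"))  // prints "DISK_ONLY"
```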
[GitHub] spark issue #22732: [SPARK-25044][FOLLOW-UP] Change ScalaUDF constructor sig...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22732 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4115/ Test PASSed.
[GitHub] spark issue #22732: [SPARK-25044][FOLLOW-UP] Change ScalaUDF constructor sig...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22732 Merged build finished. Test PASSed.
[GitHub] spark issue #22732: [SPARK-25044][FOLLOW-UP] Change ScalaUDF constructor sig...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22732 **[Test build #97587 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97587/testReport)** for PR 22732 at commit [`cb7e97a`](https://github.com/apache/spark/commit/cb7e97a18c0a8d5d806c70771b2c85a01e5f0df5).
[GitHub] spark issue #22466: [SPARK-25464][SQL] Create Database to the location,only ...
Github user sandeep-katta commented on the issue: https://github.com/apache/spark/pull/22466 retest this please
[GitHub] spark issue #22732: [SPARK-25044][FOLLOW-UP] Change ScalaUDF constructor sig...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22732 Merged build finished. Test FAILed.
[GitHub] spark issue #22732: [SPARK-25044][FOLLOW-UP] Change ScalaUDF constructor sig...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22732 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97578/ Test FAILed.
[GitHub] spark issue #22732: [SPARK-25044][FOLLOW-UP] Change ScalaUDF constructor sig...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22732 **[Test build #97578 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97578/testReport)** for PR 22732 at commit [`84cb456`](https://github.com/apache/spark/commit/84cb456c18d02f8abb21934191508fca5e58e6e2).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #22504: [SPARK-25118][Submit] Persist Driver Logs in Client mode...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22504 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97565/ Test FAILed.
[GitHub] spark issue #22504: [SPARK-25118][Submit] Persist Driver Logs in Client mode...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22504 Build finished. Test FAILed.
[GitHub] spark issue #22504: [SPARK-25118][Submit] Persist Driver Logs in Client mode...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22504 **[Test build #97565 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97565/testReport)** for PR 22504 at commit [`a0a85b3`](https://github.com/apache/spark/commit/a0a85b3eff0f115b983ce5ba3214e09f8ee90dd2).

* This patch **fails PySpark unit tests**.
* This patch **does not merge cleanly**.
* This patch adds no public classes.
[GitHub] spark issue #22666: [SPARK-25672][SQL] schema_of_csv() - schema inference fr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22666 **[Test build #97586 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97586/testReport)** for PR 22666 at commit [`6cbc7fb`](https://github.com/apache/spark/commit/6cbc7fb45478882c15c6694fff964da043d2445c).
[GitHub] spark issue #22666: [SPARK-25672][SQL] schema_of_csv() - schema inference fr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22666 Merged build finished. Test FAILed.
[GitHub] spark issue #22666: [SPARK-25672][SQL] schema_of_csv() - schema inference fr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22666 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97585/ Test FAILed.
[GitHub] spark issue #22666: [SPARK-25672][SQL] schema_of_csv() - schema inference fr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22666 **[Test build #97585 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97585/testReport)** for PR 22666 at commit [`c9df3ab`](https://github.com/apache/spark/commit/c9df3ab40f5130cb1c3f7207e1371ddd5fb922fc).

* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class UnivocityParserSuite extends SparkFunSuite `
[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22749 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97570/ Test FAILed.
[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22749 Merged build finished. Test FAILed.
[GitHub] spark issue #22666: [SPARK-25672][SQL] schema_of_csv() - schema inference fr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22666 **[Test build #97585 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97585/testReport)** for PR 22666 at commit [`c9df3ab`](https://github.com/apache/spark/commit/c9df3ab40f5130cb1c3f7207e1371ddd5fb922fc).
[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22749 **[Test build #97570 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97570/testReport)** for PR 22749 at commit [`0c78b73`](https://github.com/apache/spark/commit/0c78b73e5abce2a51763c860e43aab214c8634d9).

* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #22772: [SPARK-24499][SQL][DOC][Followup] Fix some broken links
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22772 It's okay. The doc fix was huge, and there are likely some mistakes. I will read it closely this weekend, too.
[GitHub] spark issue #22666: [SPARK-25672][SQL] schema_of_csv() - schema inference fr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22666 Merged build finished. Test FAILed.
[GitHub] spark issue #22666: [SPARK-25672][SQL] schema_of_csv() - schema inference fr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22666 **[Test build #97584 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97584/testReport)** for PR 22666 at commit [`4869b76`](https://github.com/apache/spark/commit/4869b76e4f35b094793ff1f69cce3edbeb922ef1).

* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #22666: [SPARK-25672][SQL] schema_of_csv() - schema inference fr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22666 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97584/ Test FAILed.
[GitHub] spark issue #22666: [SPARK-25672][SQL] schema_of_csv() - schema inference fr...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22666 This is a WIP.
[GitHub] spark issue #22666: [SPARK-25672][SQL] schema_of_csv() - schema inference fr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22666 **[Test build #97584 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97584/testReport)** for PR 22666 at commit [`4869b76`](https://github.com/apache/spark/commit/4869b76e4f35b094793ff1f69cce3edbeb922ef1).
[GitHub] spark issue #22666: [SPARK-25672][SQL] schema_of_csv() - schema inference fr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22666 Merged build finished. Test FAILed.
[GitHub] spark issue #22746: [SPARK-24499][SQL][DOC] Split the page of sql-programmin...
Github user xuanyuanking commented on the issue: https://github.com/apache/spark/pull/22746 Thanks, all reviewers! Sorry there are still some mistakes in the new doc; I'll keep checking on this.
[GitHub] spark issue #22666: [SPARK-25672][SQL] schema_of_csv() - schema inference fr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22666 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97583/ Test FAILed.
[GitHub] spark issue #22666: [SPARK-25672][SQL] schema_of_csv() - schema inference fr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22666 **[Test build #97583 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97583/testReport)** for PR 22666 at commit [`80d6759`](https://github.com/apache/spark/commit/80d67596e8a0d2c5040816d090c6ff912b76c02c).

* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #22666: [SPARK-25672][SQL] schema_of_csv() - schema inference fr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22666 **[Test build #97583 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97583/testReport)** for PR 22666 at commit [`80d6759`](https://github.com/apache/spark/commit/80d67596e8a0d2c5040816d090c6ff912b76c02c).
[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22749 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97569/ Test FAILed.
[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22749 Merged build finished. Test FAILed.
[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22749 **[Test build #97569 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97569/testReport)** for PR 22749 at commit [`b211ed0`](https://github.com/apache/spark/commit/b211ed069dceb33c45cf6caf12c19527334d4ad8). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22666: [SPARK-25672][SQL] schema_of_csv() - schema inference fr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22666 Merged build finished. Test FAILed.
[GitHub] spark issue #22666: [SPARK-25672][SQL] schema_of_csv() - schema inference fr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22666 **[Test build #97582 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97582/testReport)** for PR 22666 at commit [`cd7e2ab`](https://github.com/apache/spark/commit/cd7e2abf4cea8744f0316fcbc7dafac4918079c7). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22666: [SPARK-25672][SQL] schema_of_csv() - schema inference fr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22666 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97582/ Test FAILed.
[GitHub] spark pull request #22732: [SPARK-25044][FOLLOW-UP] Change ScalaUDF construc...
Github user maryannxue commented on a diff in the pull request: https://github.com/apache/spark/pull/22732#discussion_r226527439 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala --- @@ -932,6 +935,23 @@ trait ScalaReflection { tpe.dealias.erasure.typeSymbol.asClass.fullName } + /** + * Returns the nullability of the input parameter types of the scala function object. + * + * Note that this only works with Scala 2.11, and the information returned may be inaccurate if + * used with a different Scala version. --- End diff -- The argument here is that it's not necessarily wrong with Scala 2.12: if all inputs are of boxed types, the result can still be correct. I think it's enough to say "we don't support it; switch to the new interface, otherwise we can't guarantee correctness."
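A hedged illustration of the point being debated (the names below are hypothetical sketches, not Spark's actual `ScalaReflection` API): a primitive parameter type can never hold null, while a boxed or reference type can, which is why a nullability hint derived from reflected parameter classes can still be correct when all inputs are boxed.

```scala
// Minimal sketch, not Spark's actual code: derive a nullability flag per
// parameter from its runtime class. Primitive parameters (Int, Long, ...)
// can never be null; boxed or reference parameters (java.lang.Integer,
// String, ...) can.
object NullabilitySketch {
  def parameterNullability(paramClasses: Seq[Class[_]]): Seq[Boolean] =
    paramClasses.map(c => !c.isPrimitive)

  def main(args: Array[String]): Unit = {
    // classOf[Int] is Java's int.class, so it reports isPrimitive = true.
    println(parameterNullability(Seq(classOf[Int], classOf[String])))
  }
}
```

The caveat in the quoted doc comment is exactly that under Scala 2.12's lambda encoding the primitive parameter classes may no longer be visible to reflection, so this derivation can become inaccurate.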
[GitHub] spark issue #22666: [SPARK-25672][SQL] schema_of_csv() - schema inference fr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22666 **[Test build #97582 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97582/testReport)** for PR 22666 at commit [`cd7e2ab`](https://github.com/apache/spark/commit/cd7e2abf4cea8744f0316fcbc7dafac4918079c7).
[GitHub] spark issue #22732: [SPARK-25044][FOLLOW-UP] Change ScalaUDF constructor sig...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22732 **[Test build #97581 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97581/testReport)** for PR 22732 at commit [`e848ec7`](https://github.com/apache/spark/commit/e848ec7a2d420c28764ac7319f801666c40684c2).
[GitHub] spark issue #22732: [SPARK-25044][FOLLOW-UP] Change ScalaUDF constructor sig...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22732 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4114/ Test PASSed.
[GitHub] spark issue #22732: [SPARK-25044][FOLLOW-UP] Change ScalaUDF constructor sig...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22732 Merged build finished. Test PASSed.
[GitHub] spark issue #22504: [SPARK-25118][Submit] Persist Driver Logs in Client mode...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22504 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97560/ Test FAILed.
[GitHub] spark issue #22504: [SPARK-25118][Submit] Persist Driver Logs in Client mode...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22504 Merged build finished. Test FAILed.
[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22295 Merged build finished. Test PASSed.
[GitHub] spark issue #22504: [SPARK-25118][Submit] Persist Driver Logs in Client mode...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22504 **[Test build #97560 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97560/testReport)** for PR 22504 at commit [`07d2df8`](https://github.com/apache/spark/commit/07d2df87fddd540637b054b643eb5484c5e58eaf). * This patch **fails from timeout after a configured wait of `400m`**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22295 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97577/ Test PASSed.
[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22295 **[Test build #97577 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97577/testReport)** for PR 22295 at commit [`94e3db0`](https://github.com/apache/spark/commit/94e3db0c0c9873daaca688c2a63f01420882692e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #22612: [SPARK-24958] Add executors' process tree total m...
Github user rezasafi commented on a diff in the pull request: https://github.com/apache/spark/pull/22612#discussion_r226526015 --- Diff: core/src/main/scala/org/apache/spark/executor/ProcfsBasedSystems.scala --- @@ -0,0 +1,226 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.executor + +import java.io._ +import java.nio.charset.Charset +import java.nio.file.{Files, Paths} +import java.util.Locale + +import scala.collection.mutable +import scala.collection.mutable.ArrayBuffer + +import org.apache.spark.{SparkEnv, SparkException} +import org.apache.spark.internal.{config, Logging} +import org.apache.spark.util.Utils + +private[spark] case class ProcfsBasedSystemsMetrics( +jvmVmemTotal: Long, +jvmRSSTotal: Long, +pythonVmemTotal: Long, +pythonRSSTotal: Long, +otherVmemTotal: Long, +otherRSSTotal: Long) + +// Some of the ideas here are taken from the ProcfsBasedProcessTree class in hadoop +// project. 
+private[spark] class ProcfsBasedSystems(val procfsDir: String = "/proc/") extends Logging { + val procfsStatFile = "stat" + val testing = sys.env.contains("SPARK_TESTING") || sys.props.contains("spark.testing") + var pageSize = computePageSize() + var isAvailable: Boolean = isProcfsAvailable + private val pid = computePid() + private val ptree = mutable.Map[ Int, Set[Int]]() + + var allMetrics: ProcfsBasedSystemsMetrics = ProcfsBasedSystemsMetrics(0, 0, 0, 0, 0, 0) + + computeProcessTree() + + private def isProcfsAvailable: Boolean = { +if (testing) { + return true +} +try { + if (!Files.exists(Paths.get(procfsDir))) { +return false + } +} +catch { + case f: FileNotFoundException => return false +} +val shouldLogStageExecutorMetrics = + SparkEnv.get.conf.get(config.EVENT_LOG_STAGE_EXECUTOR_METRICS) +val shouldLogStageExecutorProcessTreeMetrics = + SparkEnv.get.conf.get(config.EVENT_LOG_PROCESS_TREE_METRICS) +shouldLogStageExecutorProcessTreeMetrics && shouldLogStageExecutorMetrics + } + + private def computePid(): Int = { +if (!isAvailable || testing) { + return -1; +} +try { + // This can be simplified in java9: + // https://docs.oracle.com/javase/9/docs/api/java/lang/ProcessHandle.html + val cmd = Array("bash", "-c", "echo $PPID") + val length = 10 + val out2 = Utils.executeAndGetOutput(cmd) + val pid = Integer.parseInt(out2.split("\n")(0)) + return pid; +} +catch { + case e: SparkException => logDebug("IO Exception when trying to compute process tree." 
+ +" As a result reporting of ProcessTree metrics is stopped", e) +isAvailable = false +return -1 +} + } + + private def computePageSize(): Long = { +if (testing) { + return 0; +} +val cmd = Array("getconf", "PAGESIZE") +val out2 = Utils.executeAndGetOutput(cmd) +return Integer.parseInt(out2.split("\n")(0)) + } + + private def computeProcessTree(): Unit = { +if (!isAvailable || testing) { + return +} +val queue = mutable.Queue.empty[Int] +queue += pid +while( !queue.isEmpty ) { + val p = queue.dequeue() + val c = getChildPids(p) + if(!c.isEmpty) { +queue ++= c +ptree += (p -> c.toSet) + } + else { +ptree += (p -> Set[Int]()) + } +} + } + + private def getChildPids(pid: Int): ArrayBuffer[Int] = { +try { + val cmd = Array("pgrep", "-P", pid.toString) + val builder = new ProcessBuilder("pgrep", "-P", pid.toString) + val process = builder.start() + val output = new StringBuilder() + val threadName = "read stdout for
[GitHub] spark pull request #22612: [SPARK-24958] Add executors' process tree total m...
Github user rezasafi commented on a diff in the pull request: https://github.com/apache/spark/pull/22612#discussion_r226525661 --- Diff: core/src/main/scala/org/apache/spark/executor/ProcfsBasedSystems.scala --- @@ -0,0 +1,226 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.executor + +import java.io._ +import java.nio.charset.Charset +import java.nio.file.{Files, Paths} +import java.util.Locale + +import scala.collection.mutable +import scala.collection.mutable.ArrayBuffer + +import org.apache.spark.{SparkEnv, SparkException} +import org.apache.spark.internal.{config, Logging} +import org.apache.spark.util.Utils + +private[spark] case class ProcfsBasedSystemsMetrics( +jvmVmemTotal: Long, +jvmRSSTotal: Long, +pythonVmemTotal: Long, +pythonRSSTotal: Long, +otherVmemTotal: Long, +otherRSSTotal: Long) + +// Some of the ideas here are taken from the ProcfsBasedProcessTree class in hadoop +// project. 
+private[spark] class ProcfsBasedSystems(val procfsDir: String = "/proc/") extends Logging { + val procfsStatFile = "stat" + val testing = sys.env.contains("SPARK_TESTING") || sys.props.contains("spark.testing") + var pageSize = computePageSize() + var isAvailable: Boolean = isProcfsAvailable + private val pid = computePid() + private val ptree = mutable.Map[ Int, Set[Int]]() + + var allMetrics: ProcfsBasedSystemsMetrics = ProcfsBasedSystemsMetrics(0, 0, 0, 0, 0, 0) + + computeProcessTree() + + private def isProcfsAvailable: Boolean = { +if (testing) { + return true +} +try { + if (!Files.exists(Paths.get(procfsDir))) { +return false + } +} +catch { + case f: FileNotFoundException => return false +} +val shouldLogStageExecutorMetrics = + SparkEnv.get.conf.get(config.EVENT_LOG_STAGE_EXECUTOR_METRICS) +val shouldLogStageExecutorProcessTreeMetrics = + SparkEnv.get.conf.get(config.EVENT_LOG_PROCESS_TREE_METRICS) +shouldLogStageExecutorProcessTreeMetrics && shouldLogStageExecutorMetrics + } + + private def computePid(): Int = { +if (!isAvailable || testing) { + return -1; +} +try { + // This can be simplified in java9: + // https://docs.oracle.com/javase/9/docs/api/java/lang/ProcessHandle.html + val cmd = Array("bash", "-c", "echo $PPID") + val length = 10 + val out2 = Utils.executeAndGetOutput(cmd) + val pid = Integer.parseInt(out2.split("\n")(0)) + return pid; +} +catch { + case e: SparkException => logDebug("IO Exception when trying to compute process tree." + --- End diff -- oh it seems there wasn't a mistake here and I jut forgot the reason here. I caught SparkException since executeAndGetOutput may throw such an exception. I will remove the IOException --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
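A hedged sketch of the exception-handling point discussed above (`CommandException` and `runCommand` are hypothetical stand-ins, not Spark's actual API): Spark's `Utils.executeAndGetOutput` wraps process failures in `SparkException`, so a caller that only catches `java.io.IOException` would miss them, which is why the PR catches the library's own exception type and degrades gracefully.

```scala
// Hypothetical wrapper illustrating the pattern: low-level I/O failures are
// re-thrown as a single library-specific exception, so callers catch that
// one type instead of java.io.IOException.
class CommandException(msg: String, cause: Throwable) extends Exception(msg, cause)

object ProcessSketch {
  def runCommand(cmd: Seq[String]): String =
    try {
      val p = new ProcessBuilder(cmd: _*).start()
      val out = scala.io.Source.fromInputStream(p.getInputStream).mkString
      if (p.waitFor() != 0) throw new CommandException(s"exit code ${p.exitValue}", null)
      out
    } catch {
      case e: java.io.IOException => throw new CommandException("process failed", e)
    }

  // Degrade gracefully when the command cannot run, as computePid does with -1.
  def parentPidOrFallback(): Int =
    try runCommand(Seq("bash", "-c", "echo $PPID")).trim.toInt
    catch { case _: CommandException => -1 }
}
```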
[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22749 Merged build finished. Test FAILed.
[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22749 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97568/ Test FAILed.
[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22429 **[Test build #97580 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97580/testReport)** for PR 22429 at commit [`9f1d11d`](https://github.com/apache/spark/commit/9f1d11df99ce37959229b2830b89e2a943d638f0).
[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22749 **[Test build #97568 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97568/testReport)** for PR 22749 at commit [`35700f4`](https://github.com/apache/spark/commit/35700f4a0f36fb397ac028a68011a2753c5c2c75). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22429 retest this please
[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22429 I can address his comments while he is on vacation. Please keep reviewing this.
[GitHub] spark issue #22758: [SPARK-25332][SQL] Instead of broadcast hash join ,Sort ...
Github user sujith71955 commented on the issue: https://github.com/apache/spark/pull/22758 > In order to make this flow consistent, either > a) we need to record the Hive stats in the insert command flow and always consider these stats while computing, > OR > b) as mentioned in the snapshot above, we always estimate the data size from the files for convertible relations. Just a suggestion; let me know your thoughts. :) Thanks all for your valuable time.
[GitHub] spark issue #22721: [SPARK-25403][SQL] Refreshes the table after inserting t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22721 Merged build finished. Test FAILed.
[GitHub] spark issue #22721: [SPARK-25403][SQL] Refreshes the table after inserting t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22721 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4113/ Test FAILed.
[GitHub] spark issue #22758: [SPARK-25332][SQL] Instead of broadcast hash join ,Sort ...
Github user sujith71955 commented on the issue: https://github.com/apache/spark/pull/22758 In order to make this flow consistent, either a) we need to record the Hive stats in the insert command flow and always consider these stats while computing, OR b) as mentioned in the snapshot above, we always estimate the data size from the files for convertible relations.
[GitHub] spark pull request #22503: [SPARK-25493][SQL] Use auto-detection for CRLF in...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22503
[GitHub] spark issue #22503: [SPARK-25493][SQL] Use auto-detection for CRLF in CSV da...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22503 Merged to master.
[GitHub] spark issue #22758: [SPARK-25332][SQL] Instead of broadcast hash join ,Sort ...
Github user sujith71955 commented on the issue: https://github.com/apache/spark/pull/22758 > I think the cost of getting the stats from `HadoopFileSystem` may be quite high. Then we would always depend on the Hive stats to get the statistics, which already happens now, but only partially, and I think this PR solves that problem. But what I described is based on cloud-fan's expectation: ![image](https://user-images.githubusercontent.com/12999161/47195764-f3874700-d37a-11e8-9b93-e3c1cb228c54.png)
[GitHub] spark issue #22721: [SPARK-25403][SQL] Refreshes the table after inserting t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22721 **[Test build #97579 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97579/testReport)** for PR 22721 at commit [`6c8a73f`](https://github.com/apache/spark/commit/6c8a73f0fe74f618b429dee23869a00e706b125d).
[GitHub] spark pull request #22612: [SPARK-24958] Add executors' process tree total m...
Github user rezasafi commented on a diff in the pull request: https://github.com/apache/spark/pull/22612#discussion_r226524522 --- Diff: core/src/main/scala/org/apache/spark/executor/ProcfsBasedSystems.scala --- @@ -0,0 +1,226 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.executor + +import java.io._ +import java.nio.charset.Charset +import java.nio.file.{Files, Paths} +import java.util.Locale + +import scala.collection.mutable +import scala.collection.mutable.ArrayBuffer + +import org.apache.spark.{SparkEnv, SparkException} +import org.apache.spark.internal.{config, Logging} +import org.apache.spark.util.Utils + +private[spark] case class ProcfsBasedSystemsMetrics( +jvmVmemTotal: Long, +jvmRSSTotal: Long, +pythonVmemTotal: Long, +pythonRSSTotal: Long, +otherVmemTotal: Long, +otherRSSTotal: Long) + +// Some of the ideas here are taken from the ProcfsBasedProcessTree class in hadoop +// project. 
+private[spark] class ProcfsBasedSystems(val procfsDir: String = "/proc/") extends Logging { + val procfsStatFile = "stat" + val testing = sys.env.contains("SPARK_TESTING") || sys.props.contains("spark.testing") + var pageSize = computePageSize() + var isAvailable: Boolean = isProcfsAvailable + private val pid = computePid() + private val ptree = mutable.Map[ Int, Set[Int]]() + + var allMetrics: ProcfsBasedSystemsMetrics = ProcfsBasedSystemsMetrics(0, 0, 0, 0, 0, 0) + + computeProcessTree() + + private def isProcfsAvailable: Boolean = { +if (testing) { + return true +} +try { + if (!Files.exists(Paths.get(procfsDir))) { +return false + } +} +catch { + case f: FileNotFoundException => return false +} +val shouldLogStageExecutorMetrics = + SparkEnv.get.conf.get(config.EVENT_LOG_STAGE_EXECUTOR_METRICS) +val shouldLogStageExecutorProcessTreeMetrics = + SparkEnv.get.conf.get(config.EVENT_LOG_PROCESS_TREE_METRICS) +shouldLogStageExecutorProcessTreeMetrics && shouldLogStageExecutorMetrics + } + + private def computePid(): Int = { +if (!isAvailable || testing) { + return -1; +} +try { + // This can be simplified in java9: + // https://docs.oracle.com/javase/9/docs/api/java/lang/ProcessHandle.html + val cmd = Array("bash", "-c", "echo $PPID") + val length = 10 + val out2 = Utils.executeAndGetOutput(cmd) + val pid = Integer.parseInt(out2.split("\n")(0)) + return pid; +} +catch { + case e: SparkException => logDebug("IO Exception when trying to compute process tree." 
+ +" As a result reporting of ProcessTree metrics is stopped", e) +isAvailable = false +return -1 +} + } + + private def computePageSize(): Long = { +if (testing) { + return 0; +} +val cmd = Array("getconf", "PAGESIZE") +val out2 = Utils.executeAndGetOutput(cmd) +return Integer.parseInt(out2.split("\n")(0)) + } + + private def computeProcessTree(): Unit = { +if (!isAvailable || testing) { + return +} +val queue = mutable.Queue.empty[Int] +queue += pid +while( !queue.isEmpty ) { + val p = queue.dequeue() + val c = getChildPids(p) + if(!c.isEmpty) { +queue ++= c +ptree += (p -> c.toSet) + } + else { +ptree += (p -> Set[Int]()) + } +} + } + + private def getChildPids(pid: Int): ArrayBuffer[Int] = { +try { + val cmd = Array("pgrep", "-P", pid.toString) + val builder = new ProcessBuilder("pgrep", "-P", pid.toString) + val process = builder.start() + val output = new StringBuilder() + val threadName = "read stdout for
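The queue-based walk in `computeProcessTree` quoted above can be sketched in isolation. This is a hedged, minimal version: the child lookup is passed in as a function (the real code shells out to `pgrep -P <pid>`), so the sketch runs without `/proc` or `pgrep`.

```scala
import scala.collection.mutable

// Breadth-first walk over a process tree: starting from one root pid,
// repeatedly ask for child pids and record each parent -> children edge set.
object ProcessTreeSketch {
  def buildTree(root: Int, children: Int => Seq[Int]): Map[Int, Set[Int]] = {
    val tree = mutable.Map.empty[Int, Set[Int]]
    val queue = mutable.Queue(root)
    while (queue.nonEmpty) {
      val p = queue.dequeue()
      val c = children(p)
      queue ++= c             // visit each child's own children later
      tree += (p -> c.toSet)  // leaves end up mapped to an empty set
    }
    tree.toMap
  }
}
```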
[GitHub] spark pull request #22503: [SPARK-25493][SQL] Use auto-detection for CRLF in...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22503#discussion_r226524439 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala --- @@ -220,6 +221,17 @@ class CSVSuite extends QueryTest with SharedSQLContext with SQLTestUtils with Te } } + test("crlf line separators in multiline mode") { --- End diff -- nit: -> `SPARK-25493: crlf line separators in multiline mode`. When a PR fixes a specific problem, let's add the JIRA prefix to the test name next time.
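The naming convention requested in the review nit above can be captured as a tiny checker (this helper is hypothetical, purely for illustration — Spark's test suites do not enforce it programmatically):

```scala
// A regression test name should start with the JIRA id it fixes, e.g.
// "SPARK-25493: crlf line separators in multiline mode", so a test failure
// points straight at the issue it guards against.
object TestNameSketch {
  private val JiraPrefixed = """SPARK-\d+: .+""".r

  def hasJiraPrefix(name: String): Boolean =
    JiraPrefixed.pattern.matcher(name).matches
}
```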
[GitHub] spark pull request #22612: [SPARK-24958] Add executors' process tree total m...
Github user rezasafi commented on a diff in the pull request: https://github.com/apache/spark/pull/22612#discussion_r226524183 --- Diff: core/src/main/scala/org/apache/spark/executor/ProcfsBasedSystems.scala --- @@ -0,0 +1,226 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.executor + +import java.io._ +import java.nio.charset.Charset +import java.nio.file.{Files, Paths} +import java.util.Locale + +import scala.collection.mutable +import scala.collection.mutable.ArrayBuffer + +import org.apache.spark.{SparkEnv, SparkException} +import org.apache.spark.internal.{config, Logging} +import org.apache.spark.util.Utils + +private[spark] case class ProcfsBasedSystemsMetrics( +jvmVmemTotal: Long, +jvmRSSTotal: Long, +pythonVmemTotal: Long, +pythonRSSTotal: Long, +otherVmemTotal: Long, +otherRSSTotal: Long) + +// Some of the ideas here are taken from the ProcfsBasedProcessTree class in hadoop +// project. 
+private[spark] class ProcfsBasedSystems(val procfsDir: String = "/proc/") extends Logging { + val procfsStatFile = "stat" + val testing = sys.env.contains("SPARK_TESTING") || sys.props.contains("spark.testing") + var pageSize = computePageSize() + var isAvailable: Boolean = isProcfsAvailable + private val pid = computePid() + private val ptree = mutable.Map[ Int, Set[Int]]() + + var allMetrics: ProcfsBasedSystemsMetrics = ProcfsBasedSystemsMetrics(0, 0, 0, 0, 0, 0) + + computeProcessTree() + + private def isProcfsAvailable: Boolean = { +if (testing) { + return true +} +try { + if (!Files.exists(Paths.get(procfsDir))) { +return false + } +} +catch { + case f: FileNotFoundException => return false +} +val shouldLogStageExecutorMetrics = + SparkEnv.get.conf.get(config.EVENT_LOG_STAGE_EXECUTOR_METRICS) +val shouldLogStageExecutorProcessTreeMetrics = + SparkEnv.get.conf.get(config.EVENT_LOG_PROCESS_TREE_METRICS) +shouldLogStageExecutorProcessTreeMetrics && shouldLogStageExecutorMetrics + } + + private def computePid(): Int = { +if (!isAvailable || testing) { + return -1; +} +try { + // This can be simplified in java9: + // https://docs.oracle.com/javase/9/docs/api/java/lang/ProcessHandle.html + val cmd = Array("bash", "-c", "echo $PPID") + val length = 10 + val out2 = Utils.executeAndGetOutput(cmd) + val pid = Integer.parseInt(out2.split("\n")(0)) + return pid; +} +catch { + case e: SparkException => logDebug("IO Exception when trying to compute process tree." + +" As a result reporting of ProcessTree metrics is stopped", e) +isAvailable = false +return -1 +} + } + + private def computePageSize(): Long = { +if (testing) { + return 0; +} +val cmd = Array("getconf", "PAGESIZE") +val out2 = Utils.executeAndGetOutput(cmd) +return Integer.parseInt(out2.split("\n")(0)) --- End diff -- yes, will fix it. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22612: [SPARK-24958] Add executors' process tree total m...
Github user rezasafi commented on a diff in the pull request: https://github.com/apache/spark/pull/22612#discussion_r226524080

--- Diff: core/src/main/scala/org/apache/spark/executor/ProcfsBasedSystems.scala ---
@@ -0,0 +1,226 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.executor
+
+import java.io._
+import java.nio.charset.Charset
+import java.nio.file.{Files, Paths}
+import java.util.Locale
+
+import scala.collection.mutable
+import scala.collection.mutable.ArrayBuffer
+
+import org.apache.spark.{SparkEnv, SparkException}
+import org.apache.spark.internal.{config, Logging}
+import org.apache.spark.util.Utils
+
+private[spark] case class ProcfsBasedSystemsMetrics(
+    jvmVmemTotal: Long,
+    jvmRSSTotal: Long,
+    pythonVmemTotal: Long,
+    pythonRSSTotal: Long,
+    otherVmemTotal: Long,
+    otherRSSTotal: Long)
+
+// Some of the ideas here are taken from the ProcfsBasedProcessTree class in the hadoop
+// project.
+private[spark] class ProcfsBasedSystems(val procfsDir: String = "/proc/") extends Logging {
+  val procfsStatFile = "stat"
+  val testing = sys.env.contains("SPARK_TESTING") || sys.props.contains("spark.testing")
+  var pageSize = computePageSize()
+  var isAvailable: Boolean = isProcfsAvailable
+  private val pid = computePid()
+  private val ptree = mutable.Map[Int, Set[Int]]()
+
+  var allMetrics: ProcfsBasedSystemsMetrics = ProcfsBasedSystemsMetrics(0, 0, 0, 0, 0, 0)
+
+  computeProcessTree()
+
+  private def isProcfsAvailable: Boolean = {
+    if (testing) {
+      return true
+    }
+    try {
+      if (!Files.exists(Paths.get(procfsDir))) {
+        return false
+      }
+    }
+    catch {
+      case f: FileNotFoundException => return false
+    }
+    val shouldLogStageExecutorMetrics =
+      SparkEnv.get.conf.get(config.EVENT_LOG_STAGE_EXECUTOR_METRICS)
+    val shouldLogStageExecutorProcessTreeMetrics =
+      SparkEnv.get.conf.get(config.EVENT_LOG_PROCESS_TREE_METRICS)
+    shouldLogStageExecutorProcessTreeMetrics && shouldLogStageExecutorMetrics
+  }
+
+  private def computePid(): Int = {
+    if (!isAvailable || testing) {
+      return -1;
+    }
+    try {
+      // This can be simplified in java9:
+      // https://docs.oracle.com/javase/9/docs/api/java/lang/ProcessHandle.html
+      val cmd = Array("bash", "-c", "echo $PPID")
+      val length = 10
+      val out2 = Utils.executeAndGetOutput(cmd)
+      val pid = Integer.parseInt(out2.split("\n")(0))
+      return pid;
+    }
+    catch {
+      case e: SparkException => logDebug("IO Exception when trying to compute process tree." +

--- End diff --

Let me double check. I thought there was a comment before saying that I should just catch SparkException, but you are right, it doesn't make sense. Probably a mistake on my side. I was only caring about IOException here.
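The mismatch under review (a `SparkException` catch paired with an "IO Exception" log message) can be sidestepped with a wrapper that degrades gracefully on any failure mode of the external command. This is a hedged sketch with a made-up helper name, not Spark's `Utils.executeAndGetOutput`:

```scala
import scala.sys.process._
import scala.util.{Failure, Success, Try}

// Run an external command and return its trimmed stdout, or None on any
// failure (missing binary raises IOException, a non-zero exit raises
// RuntimeException from `!!`; both land in the Failure branch).
def tryCommand(cmd: Seq[String]): Option[String] =
  Try(cmd.!!.trim) match {
    case Success(out) => Some(out)
    case Failure(_)   => None
  }
```

With this shape, the caller can log once and flip an `isAvailable` flag when `None` comes back, instead of relying on catching one specific exception type.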
[GitHub] spark pull request #22612: [SPARK-24958] Add executors' process tree total m...
Github user rezasafi commented on a diff in the pull request: https://github.com/apache/spark/pull/22612#discussion_r226523830

--- Diff: core/src/main/scala/org/apache/spark/executor/ProcfsBasedSystems.scala ---
@@ -0,0 +1,266 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.executor
+
+import java.io._
+import java.nio.charset.Charset
+import java.nio.file.{Files, Paths}
+import java.util.Locale
+
+import scala.collection.mutable
+import scala.collection.mutable.ArrayBuffer
+import scala.collection.mutable.Queue
+
+import org.apache.spark.SparkEnv
+import org.apache.spark.internal.{config, Logging}
+
+private[spark] case class ProcfsBasedSystemsMetrics(
+    jvmVmemTotal: Long,
+    jvmRSSTotal: Long,
+    pythonVmemTotal: Long,
+    pythonRSSTotal: Long,
+    otherVmemTotal: Long,
+    otherRSSTotal: Long)
+
+// Some of the ideas here are taken from the ProcfsBasedProcessTree class in the hadoop
+// project.
+private[spark] class ProcfsBasedSystems(val procfsDir: String = "/proc/") extends Logging {
+  val procfsStatFile = "stat"
+  var pageSize = computePageSize()
+  var isAvailable: Boolean = isProcfsAvailable
+  private val pid = computePid()
+  private val ptree = mutable.Map[Int, Set[Int]]()
+
+  var allMetrics: ProcfsBasedSystemsMetrics = ProcfsBasedSystemsMetrics(0, 0, 0, 0, 0, 0)
+  private var latestJVMVmemTotal = 0L
+  private var latestJVMRSSTotal = 0L
+  private var latestPythonVmemTotal = 0L
+  private var latestPythonRSSTotal = 0L
+  private var latestOtherVmemTotal = 0L
+  private var latestOtherRSSTotal = 0L
+
+  computeProcessTree()
+
+  private def isProcfsAvailable: Boolean = {

--- End diff --

I had a test case where this can change depending on the health of the node. I think it shouldn't be a val, to be cautious.
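The val-vs-def trade-off in this thread can be shown with a tiny sketch (the class name is made up for illustration): a `def` re-evaluates the procfs check on every call, so a node whose /proc mount becomes unreadable mid-run is noticed, whereas a `val` would freeze the first answer for the lifetime of the object.

```scala
import java.nio.file.{Files, Paths}

// Illustrative probe: isProcfsAvailable is a def, so each call re-checks the
// filesystem; declaring it a val would cache the result at construction time.
class ProcfsProbe(procfsDir: String = "/proc/") {
  def isProcfsAvailable: Boolean = Files.exists(Paths.get(procfsDir))
}
```

The cost is one `Files.exists` per heartbeat, which is cheap relative to reading the stat files themselves.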
[GitHub] spark issue #22758: [SPARK-25332][SQL] Instead of broadcast hash join ,Sort ...
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/22758 I think the cost of getting the stats from `HadoopFileSystem` may be quite high.
[GitHub] spark pull request #22612: [SPARK-24958] Add executors' process tree total m...
Github user rezasafi commented on a diff in the pull request: https://github.com/apache/spark/pull/22612#discussion_r226523734

--- Diff: core/src/main/scala/org/apache/spark/Heartbeater.scala ---
@@ -59,12 +60,13 @@ private[spark] class Heartbeater(
     heartbeater.awaitTermination(10, TimeUnit.SECONDS)
   }

-  /**
-   * Get the current executor level metrics. These are returned as an array, with the index
-   * determined by MetricGetter.values
-   */
+  /** Get the current executor level metrics. These are returned as a Map */
   def getCurrentMetrics(): ExecutorMetrics = {
-    val metrics = ExecutorMetricType.values.map(_.getMetricValue(memoryManager)).toArray
+    // figure out how to append all the metrics

--- End diff --

No, the commit should have been removed. Will fix it. Thanks
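The line being removed in this hunk shows the pattern at stake: every metric type exposes a getter, and the heartbeat snapshots all of them positionally into an array whose index is fixed by the order of the enumeration. A stripped-down sketch with made-up metric values (names are illustrative, not Spark's actual `ExecutorMetricType` API):

```scala
// Minimal sketch of positional metric snapshotting; values are illustrative.
trait MetricGetter {
  def getMetricValue: Long
}

object MetricGetter {
  // The order of this sequence fixes the meaning of each array index.
  val values: IndexedSeq[MetricGetter] = IndexedSeq(
    new MetricGetter { def getMetricValue = 42L }, // e.g. JVM heap used
    new MetricGetter { def getMetricValue = 7L }   // e.g. off-heap used
  )
}

// Snapshot every metric; index i of the result corresponds to values(i).
def getCurrentMetrics(): Array[Long] =
  MetricGetter.values.map(_.getMetricValue).toArray
```

The positional encoding keeps heartbeat messages compact, at the price of readers and writers having to agree on the enumeration order.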
[GitHub] spark issue #22504: [SPARK-25118][Submit] Persist Driver Logs in Client mode...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22504 Merged build finished. Test FAILed.
[GitHub] spark issue #22504: [SPARK-25118][Submit] Persist Driver Logs in Client mode...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22504 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97566/ Test FAILed.
[GitHub] spark issue #22732: [SPARK-25044][FOLLOW-UP] Change ScalaUDF constructor sig...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22732 **[Test build #97578 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97578/testReport)** for PR 22732 at commit [`84cb456`](https://github.com/apache/spark/commit/84cb456c18d02f8abb21934191508fca5e58e6e2).
[GitHub] spark issue #22732: [SPARK-25044][FOLLOW-UP] Change ScalaUDF constructor sig...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22732 Merged build finished. Test PASSed.
[GitHub] spark issue #22732: [SPARK-25044][FOLLOW-UP] Change ScalaUDF constructor sig...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22732 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4112/ Test PASSed.
[GitHub] spark issue #22504: [SPARK-25118][Submit] Persist Driver Logs in Client mode...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22504 **[Test build #97566 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97566/testReport)** for PR 22504 at commit [`65c0800`](https://github.com/apache/spark/commit/65c080032bbea82024f1ab14cb43c771f6157fc4).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #22758: [SPARK-25332][SQL] Instead of broadcast hash join ,Sort ...
Github user sujith71955 commented on the issue: https://github.com/apache/spark/pull/22758 @cloud-fan One solution I can think of: in the DetermineStats flow we can add one more condition so that the stats are not updated for convertible relations, since we always get the stats from HadoopFileSystem for convertible relations. This should solve all the problems we are facing. Please let me know your suggestions and I will update this PR with this logic.
[GitHub] spark pull request #22764: [SPARK-25765][ML] Add training cost to BisectingK...
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/22764#discussion_r226520698

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/BisectingKMeansModel.scala ---
@@ -225,13 +227,14 @@ object BisectingKMeansModel extends Loader[BisectingKMeansModel] {
       assert(formatVersion == thisFormatVersion)
       val rootId = (metadata \ "rootId").extract[Int]
       val distanceMeasure = (metadata \ "distanceMeasure").extract[String]
+      val trainingCost = (metadata \ "trainingCost").extract[Double]

--- End diff --

- Could you avoid modifying the model-loading code in the "mllib" package, and instead modify the code in the "ml" package, i.e., the class `ml.clustering.BisectingKMeansModel.BisectingKMeansModelReader`? You can reference the `KMeans` code: `ml.clustering.KMeansModel.KMeansModelReader`.
- And, +1 with what @viirya mentioned: we should keep model-loading compatibility, so add a version check (when >= 2.4) before we load "training cost". Note that this should be added in `ml.clustering.BisectingKMeansModel.BisectingKMeansModelReader`.
- And, could you also add a version check (when >= 2.4) before loading "training cost" in `ml.clustering.KMeansModel.KMeansModelReader`?
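A hedged sketch of the compatibility check being requested (the helper name is made up, and a plain Map stands in for the json4s metadata used by the real readers): only attempt to read `trainingCost` when the model was saved by Spark 2.4 or later, so models saved by older versions still load.

```scala
// Illustrative version-gated metadata read; `metadata` is a simplified
// stand-in for the parsed JSON metadata of a saved model.
def loadTrainingCost(sparkVersion: String, metadata: Map[String, Double]): Option[Double] = {
  val parts = sparkVersion.split("\\.")
  val (major, minor) = (parts(0).toInt, parts(1).toInt)
  // Only models written by Spark >= 2.4 carry a "trainingCost" entry.
  if (major > 2 || (major == 2 && minor >= 4)) metadata.get("trainingCost")
  else None
}
```

Returning `Option` (rather than failing on a missing key) lets the caller substitute a default for pre-2.4 models instead of breaking load compatibility.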