[GitHub] spark pull request #22466: [SPARK-25464][SQL] Create Database to the locatio...
Github user sandeep-katta commented on a diff in the pull request:

https://github.com/apache/spark/pull/22466#discussion_r226548140

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala ---
@@ -207,6 +207,14 @@ class SessionCatalog(
           "you cannot create a database with this name.")
     }
     validateName(dbName)
+    // SPARK-25464 fail if DB location exists and is not empty
+    val dbPath = new Path(dbDefinition.locationUri)
+    val fs = dbPath.getFileSystem(hadoopConf)
+    if (!externalCatalog.databaseExists(dbName) && fs.exists(dbPath)
+      && fs.listStatus(dbPath).nonEmpty) {
--- End diff --

Listing files is expensive, but the create database command is not run frequently. The same point is mentioned here: https://github.com/apache/spark/pull/22466#discussion_r220795973

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22666: [SPARK-25672][SQL] schema_of_csv() - schema inference fr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22666 Merged build finished. Test FAILed.
[GitHub] spark issue #22666: [SPARK-25672][SQL] schema_of_csv() - schema inference fr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22666 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97586/ Test FAILed.
[GitHub] spark issue #22666: [SPARK-25672][SQL] schema_of_csv() - schema inference fr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22666

**[Test build #97586 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97586/testReport)** for PR 22666 at commit [`6cbc7fb`](https://github.com/apache/spark/commit/6cbc7fb45478882c15c6694fff964da043d2445c).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #22666: [SPARK-25672][SQL] schema_of_csv() - schema inference fr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22666 **[Test build #97595 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97595/testReport)** for PR 22666 at commit [`8763494`](https://github.com/apache/spark/commit/876349476f2a36e66fa94bb3d4e19b7acd2882a7).
[GitHub] spark issue #22732: [SPARK-25044][FOLLOW-UP] Change ScalaUDF constructor sig...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/22732

For Scala 2.11, we should not introduce any behavior change, and we should also keep binary and source compatibility.

```
scala> import org.apache.spark.sql.types.DataTypes
import org.apache.spark.sql.types.DataTypes

scala> val f2 = udf({(x: Int) => x}, DataTypes.IntegerType)
f2: org.apache.spark.sql.expressions.UserDefinedFunction = UserDefinedFunction(,IntegerType,None)

scala> spark.range(3).select(f2('id + null)).show()
+----------------+
|UDF((id + null))|
+----------------+
|            null|
|            null|
|            null|
+----------------+
```

For Scala 2.12, since we are unable to know the type nullability in a few APIs, we issue a warning message in these cases. Below is an example that generates a different answer:

```
scala> import org.apache.spark.sql.types.DataTypes
import org.apache.spark.sql.types.DataTypes

scala> val f2 = udf({(x: Int) => x}, DataTypes.IntegerType)
f2: org.apache.spark.sql.expressions.UserDefinedFunction = UserDefinedFunction($Lambda$2801/26868055@5eb35a26,IntegerType,None)

scala> spark.range(3).select(f2('id + null)).show()
18/10/18 23:07:02 WARN ScalaReflection: Scala version 2.12.7 cannot get type nullability correctly via reflection, thus Spark cannot add proper input null check for UDF. To avoid this problem, use the typed UDF interfaces instead.
+----------------+
|UDF((id + null))|
+----------------+
|               0|
|               0|
|               0|
+----------------+
```
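The `0` in the Scala 2.12 output is consistent with how Scala unboxes a null reference into a primitive `Int`. The following is a minimal sketch that does not use Spark at all; the names `NullUnboxSketch`, `identityFn`, and `erased` are hypothetical, and this is not Spark's actual UDF invocation code. It only assumes that a function over `Int` is called through its untyped `Any => Any` interface with no input null check in front of it.

```scala
object NullUnboxSketch {
  // Casting null to a primitive type yields that type's default value.
  val zero: Int = null.asInstanceOf[Int]

  // Same shape as the UDF body in the comment above: a function over a primitive Int.
  val identityFn: Int => Int = x => x

  // Hypothetical stand-in for a caller that only sees an untyped function
  // and so cannot tell that the argument type is a non-nullable primitive.
  val erased: Any => Any = identityFn.asInstanceOf[Any => Any]

  def main(args: Array[String]): Unit = {
    println(zero)
    // With no null check, the null argument is unboxed to the Int default
    // before the function body runs.
    println(erased(null))
  }
}
```

If the caller knows the parameter is a primitive, it can short-circuit to `null` before calling the function, which matches the Scala 2.11 result shown above.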
[GitHub] spark issue #22576: [SPARK-25560][SQL] Allow FunctionInjection in SparkExten...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22576 Merged build finished. Test PASSed.
[GitHub] spark issue #22576: [SPARK-25560][SQL] Allow FunctionInjection in SparkExten...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22576 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97576/ Test PASSed.
[GitHub] spark issue #22576: [SPARK-25560][SQL] Allow FunctionInjection in SparkExten...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22576

**[Test build #97576 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97576/testReport)** for PR 22576 at commit [`32e0a78`](https://github.com/apache/spark/commit/32e0a78c712094b16d9af7fcacb0b045186d3550).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #22732: [SPARK-25044][FOLLOW-UP] Change ScalaUDF constructor sig...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22732 Build finished. Test FAILed.
[GitHub] spark issue #22732: [SPARK-25044][FOLLOW-UP] Change ScalaUDF constructor sig...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22732 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97573/ Test FAILed.
[GitHub] spark issue #22732: [SPARK-25044][FOLLOW-UP] Change ScalaUDF constructor sig...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22732

**[Test build #97573 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97573/testReport)** for PR 22732 at commit [`cf8c457`](https://github.com/apache/spark/commit/cf8c4573b0c69bf4ac5e83dd54c3ce129ff2b329).
 * This patch **fails Spark unit tests**.
 * This patch **does not merge cleanly**.
 * This patch adds the following public classes _(experimental)_:
   * `trait ScalaReflection extends Logging `
   * `// TODO: make sure this class is only instantiated through `SparkUserDefinedFunction.create()``
[GitHub] spark issue #22732: [SPARK-25044][FOLLOW-UP] Change ScalaUDF constructor sig...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22732 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97574/ Test PASSed.
[GitHub] spark issue #22732: [SPARK-25044][FOLLOW-UP] Change ScalaUDF constructor sig...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22732 Merged build finished. Test PASSed.
[GitHub] spark issue #22732: [SPARK-25044][FOLLOW-UP] Change ScalaUDF constructor sig...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22732

**[Test build #97574 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97574/testReport)** for PR 22732 at commit [`7a6b2e1`](https://github.com/apache/spark/commit/7a6b2e16a5f45da399da75ee15eb7d81fd8443cf).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request #22761: [MINOR][DOC] Spacing items in migration guide for...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22761
[GitHub] spark issue #22666: [SPARK-25672][SQL] schema_of_csv() - schema inference fr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22666 **[Test build #97594 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97594/testReport)** for PR 22666 at commit [`1e90261`](https://github.com/apache/spark/commit/1e90261f964129efc605ed77433477715078745c).
[GitHub] spark issue #22761: [MINOR][DOC] Spacing items in migration guide for readab...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22761 Merged to master and branch-2.4.
[GitHub] spark issue #22666: [SPARK-25672][SQL] schema_of_csv() - schema inference fr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22666 **[Test build #97593 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97593/testReport)** for PR 22666 at commit [`aead783`](https://github.com/apache/spark/commit/aead783d895069b1b6781928eb0afda740085a21).
[GitHub] spark issue #22766: [SPARK-25768][SQL] fix constant argument expecting UDAFs
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22766 **[Test build #97592 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97592/testReport)** for PR 22766 at commit [`6e6eca4`](https://github.com/apache/spark/commit/6e6eca400ba4040d5001e26b20d7815ed2a0c2f4).
[GitHub] spark issue #22773: [MINOR][SQL] Add prettyNames for from_json, to_json, fro...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22773 **[Test build #97591 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97591/testReport)** for PR 22773 at commit [`3447e73`](https://github.com/apache/spark/commit/3447e73989e39d0c052cf69e5a3e80d1ebb221dc).
[GitHub] spark issue #22773: [MINOR][SQL] Add prettyNames for from_json, to_json, fro...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22773 Merged build finished. Test PASSed.
[GitHub] spark issue #22773: [MINOR][SQL] Add prettyNames for from_json, to_json, fro...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22773 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4117/ Test PASSed.
[GitHub] spark pull request #22773: [MINOR][SQL] Add prettyNames for from_json, to_js...
GitHub user HyukjinKwon opened a pull request:

https://github.com/apache/spark/pull/22773

[MINOR][SQL] Add prettyNames for from_json, to_json, from_csv, and schema_of_json

## What changes were proposed in this pull request?

This PR adds `prettyNames` for `from_json`, `to_json`, `from_csv`, and `schema_of_json` so that appropriate names are used.

## How was this patch tested?

Unit tests

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark minor-prettyNames

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22773.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #22773

commit 3447e73989e39d0c052cf69e5a3e80d1ebb221dc
Author: hyukjinkwon
Date: 2018-10-19T05:28:55Z

    Add prettyNames for from_json, to_json, from_csv, and schema_of_json
[GitHub] spark issue #22263: [SPARK-25269][SQL] SQL interface support specify Storage...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22263 **[Test build #97590 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97590/testReport)** for PR 22263 at commit [`e2b5dcf`](https://github.com/apache/spark/commit/e2b5dcfc853b6f5608b27c914c397689d09cb267).
[GitHub] spark issue #22263: [SPARK-25269][SQL] SQL interface support specify Storage...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22263 Merged build finished. Test PASSed.
[GitHub] spark issue #22263: [SPARK-25269][SQL] SQL interface support specify Storage...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22263 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4116/ Test PASSed.
[GitHub] spark pull request #22466: [SPARK-25464][SQL] Create Database to the locatio...
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22466#discussion_r226536773

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala ---
@@ -840,12 +840,19 @@ abstract class DDLSuite extends QueryTest with SQLTestUtils {
   }
   test("create table in default db") {
-    val catalog = spark.sessionState.catalog
-    val tableIdent1 = TableIdentifier("tab1", None)
-    createTable(catalog, tableIdent1)
-    val expectedTableIdent = tableIdent1.copy(database = Some("default"))
-    val expectedTable = generateTable(catalog, expectedTableIdent)
-    checkCatalogTables(expectedTable, catalog.getTableMetadata(tableIdent1))
+    var tablePath: Option[URI] = None
--- End diff --

`var tablePath: URI = null`
[GitHub] spark pull request #22466: [SPARK-25464][SQL] Create Database to the locatio...
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22466#discussion_r226536735

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala ---
@@ -207,6 +207,14 @@ class SessionCatalog(
           "you cannot create a database with this name.")
     }
     validateName(dbName)
+    // SPARK-25464 fail if DB location exists and is not empty
+    val dbPath = new Path(dbDefinition.locationUri)
+    val fs = dbPath.getFileSystem(hadoopConf)
+    if (!externalCatalog.databaseExists(dbName) && fs.exists(dbPath)
+      && fs.listStatus(dbPath).nonEmpty) {
--- End diff --

Do we necessarily need to list the files? It's potentially expensive.
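One way to picture the cost concern is to compare a full listing with a check that stops at the first entry. The sketch below is only an analogy on the local filesystem using `java.nio.file`; it is not the Hadoop `FileSystem` API used in the diff, and the names `NonEmptyDirSketch` and `isNonEmptyDir` are hypothetical. Whether an equivalent approach is cheaper on a given Hadoop filesystem implementation is an open assumption.

```scala
import java.nio.file.{Files, Path}

object NonEmptyDirSketch {
  // Returns true if `dir` is an existing directory with at least one entry.
  // Only the first entry is requested from the stream, so the full
  // listing is never collected into memory.
  def isNonEmptyDir(dir: Path): Boolean = {
    if (!Files.isDirectory(dir)) {
      false
    } else {
      val stream = Files.newDirectoryStream(dir)
      try stream.iterator().hasNext
      finally stream.close()
    }
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical scratch directory standing in for a database location.
    val dir = Files.createTempDirectory("db-location-sketch")
    println(isNonEmptyDir(dir))
    val file = Files.createFile(dir.resolve("part-00000"))
    println(isNonEmptyDir(dir))
    Files.delete(file)
    Files.delete(dir)
  }
}
```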
[GitHub] spark pull request #22466: [SPARK-25464][SQL] Create Database to the locatio...
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22466#discussion_r226536626

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala ---
@@ -840,12 +840,19 @@ abstract class DDLSuite extends QueryTest with SQLTestUtils {
   }
   test("create table in default db") {
-    val catalog = spark.sessionState.catalog
-    val tableIdent1 = TableIdentifier("tab1", None)
-    createTable(catalog, tableIdent1)
-    val expectedTableIdent = tableIdent1.copy(database = Some("default"))
-    val expectedTable = generateTable(catalog, expectedTableIdent)
-    checkCatalogTables(expectedTable, catalog.getTableMetadata(tableIdent1))
+    var tablePath: Option[URI] = None
+    try {
+      val catalog = spark.sessionState.catalog
+      val tableIdent1 = TableIdentifier("tab1", None)
+      createTable(catalog, tableIdent1)
+      val expectedTableIdent = tableIdent1.copy(database = Some("default"))
+      val expectedTable = generateTable(catalog, expectedTableIdent)
+      tablePath = Some(expectedTable.location)
+      checkCatalogTables(expectedTable, catalog.getTableMetadata(tableIdent1))
+    } finally {
+      // This is external table,so it is required to deleted the path
--- End diff --

@HyukjinKwon The first one is `e,s` -> `e, s` ?
[GitHub] spark pull request #22466: [SPARK-25464][SQL] Create Database to the locatio...
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22466#discussion_r226536585

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala ---
@@ -2370,4 +2370,17 @@ class HiveDDLSuite
       ))
     }
   }
+
+  test("SPARK-25464 create Database with non empty location") {
+    val dbName = "dbwithcustomlocation"
+    withTempDir { tmpDir =>
+      val parentDir = tmpDir.getParent
+      val expectedMsg = s"Cannot create database at location $parentDir because the path is not " +
+        s"empty."
--- End diff --

leading `s` can be removed.
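Presumably the comment refers to the `s` prefix on the second literal, `s"empty."`, which contains no `$` placeholder. A minimal sketch of why the prefix makes no difference there; the object and method names are hypothetical, and the path passed in `main` is made up for illustration:

```scala
object InterpolatorSketch {
  // Second literal keeps the `s` prefix even though it has no placeholder.
  def withPrefix(parentDir: String): String =
    s"Cannot create database at location $parentDir because the path is not " +
      s"empty."

  // Second literal is a plain string; only the first needs interpolation.
  def withoutPrefix(parentDir: String): String =
    s"Cannot create database at location $parentDir because the path is not " +
      "empty."

  def main(args: Array[String]): Unit = {
    // Hypothetical path, used only to compare the two messages.
    println(withPrefix("/tmp/example") == withoutPrefix("/tmp/example"))
  }
}
```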
[GitHub] spark issue #22466: [SPARK-25464][SQL] Create Database to the location,only ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22466 **[Test build #97589 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97589/testReport)** for PR 22466 at commit [`d862591`](https://github.com/apache/spark/commit/d862591c20ef1d1536d069dcc4f3220ae232c702).
[GitHub] spark pull request #22466: [SPARK-25464][SQL] Create Database to the locatio...
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22466#discussion_r226536555

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala ---
@@ -2370,4 +2370,17 @@ class HiveDDLSuite
       ))
     }
   }
+
+  test("SPARK-25464 create Database with non empty location") {
--- End diff --

`create a database with a non-empty location`
[GitHub] spark pull request #22466: [SPARK-25464][SQL] Create Database to the locatio...
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22466#discussion_r226536442

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala ---
@@ -840,12 +840,19 @@ abstract class DDLSuite extends QueryTest with SQLTestUtils {
   }
   test("create table in default db") {
-    val catalog = spark.sessionState.catalog
-    val tableIdent1 = TableIdentifier("tab1", None)
-    createTable(catalog, tableIdent1)
-    val expectedTableIdent = tableIdent1.copy(database = Some("default"))
-    val expectedTable = generateTable(catalog, expectedTableIdent)
-    checkCatalogTables(expectedTable, catalog.getTableMetadata(tableIdent1))
+    var tablePath: Option[URI] = None
+    try {
+      val catalog = spark.sessionState.catalog
+      val tableIdent1 = TableIdentifier("tab1", None)
+      createTable(catalog, tableIdent1)
+      val expectedTableIdent = tableIdent1.copy(database = Some("default"))
+      val expectedTable = generateTable(catalog, expectedTableIdent)
+      tablePath = Some(expectedTable.location)
+      checkCatalogTables(expectedTable, catalog.getTableMetadata(tableIdent1))
+    } finally {
+      // This is external table,so it is required to deleted the path
--- End diff --

`this is an external table`
[GitHub] spark pull request #22466: [SPARK-25464][SQL] Create Database to the locatio...
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22466#discussion_r226536456

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala ---
@@ -840,12 +840,19 @@ abstract class DDLSuite extends QueryTest with SQLTestUtils {
   }
   test("create table in default db") {
-    val catalog = spark.sessionState.catalog
-    val tableIdent1 = TableIdentifier("tab1", None)
-    createTable(catalog, tableIdent1)
-    val expectedTableIdent = tableIdent1.copy(database = Some("default"))
-    val expectedTable = generateTable(catalog, expectedTableIdent)
-    checkCatalogTables(expectedTable, catalog.getTableMetadata(tableIdent1))
+    var tablePath: Option[URI] = None
+    try {
+      val catalog = spark.sessionState.catalog
+      val tableIdent1 = TableIdentifier("tab1", None)
+      createTable(catalog, tableIdent1)
+      val expectedTableIdent = tableIdent1.copy(database = Some("default"))
+      val expectedTable = generateTable(catalog, expectedTableIdent)
+      tablePath = Some(expectedTable.location)
+      checkCatalogTables(expectedTable, catalog.getTableMetadata(tableIdent1))
+    } finally {
+      // This is external table,so it is required to deleted the path
--- End diff --

`it is required to delete`
[GitHub] spark pull request #22466: [SPARK-25464][SQL] Create Database to the locatio...
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22466#discussion_r226536304

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala ---
@@ -840,12 +840,19 @@ abstract class DDLSuite extends QueryTest with SQLTestUtils {
   }
   test("create table in default db") {
-    val catalog = spark.sessionState.catalog
-    val tableIdent1 = TableIdentifier("tab1", None)
-    createTable(catalog, tableIdent1)
-    val expectedTableIdent = tableIdent1.copy(database = Some("default"))
-    val expectedTable = generateTable(catalog, expectedTableIdent)
-    checkCatalogTables(expectedTable, catalog.getTableMetadata(tableIdent1))
+    var tablePath: Option[URI] = None
+    try {
+      val catalog = spark.sessionState.catalog
+      val tableIdent1 = TableIdentifier("tab1", None)
+      createTable(catalog, tableIdent1)
+      val expectedTableIdent = tableIdent1.copy(database = Some("default"))
+      val expectedTable = generateTable(catalog, expectedTableIdent)
+      tablePath = Some(expectedTable.location)
+      checkCatalogTables(expectedTable, catalog.getTableMetadata(tableIdent1))
+    } finally {
+      // This is external table,so it is required to deleted the path
--- End diff --

tiny nit: `e,` -> `e ,`
[GitHub] spark issue #22466: [SPARK-25464][SQL] Create Database to the location,only ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22466 retest this please
[GitHub] spark issue #22666: [SPARK-25672][SQL] schema_of_csv() - schema inference fr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22666 **[Test build #97588 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97588/testReport)** for PR 22666 at commit [`1b86834`](https://github.com/apache/spark/commit/1b86834c1265992e3b46aaf079e1e17ea7c389c4).
[GitHub] spark pull request #22263: [SPARK-25269][SQL] SQL interface support specify ...
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22263#discussion_r226534798

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala ---
@@ -325,6 +325,21 @@ class CachedTableSuite extends QueryTest with SQLTestUtils with SharedSQLContext
     assert(isExpectStorageLevel(rddId, Memory))
   }
+  test("SQL interface support storageLevel(Invalid StorageLevel)") {
+    val message = intercept[IllegalArgumentException] {
+      sql("CACHE TABLE testData OPTIONS('storageLevel' 'invalid_storage_level')")
+    }.getMessage
+    assert(message.contains("Invalid StorageLevel: INVALID_STORAGE_LEVEL"))
+  }
+
+  test("SQL interface support storageLevel(with LAZY)") {
+    sql("CACHE LAZY TABLE testData OPTIONS('storageLevel' 'disk_only')")
+    assertCached(spark.table("testData"))
+    val rddId = rddIdOf("testData")
+    sql("SELECT COUNT(*) FROM testData").collect()
+    assert(isExpectStorageLevel(rddId, Disk))
--- End diff --

Do you think the previously existing `lazy`-related test cases protect this new SQL syntax contribution from future regressions?
[GitHub] spark issue #22732: [SPARK-25044][FOLLOW-UP] Change ScalaUDF constructor sig...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22732 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4115/ Test PASSed.
[GitHub] spark issue #22732: [SPARK-25044][FOLLOW-UP] Change ScalaUDF constructor sig...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22732 Merged build finished. Test PASSed.
[GitHub] spark issue #22732: [SPARK-25044][FOLLOW-UP] Change ScalaUDF constructor sig...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22732 **[Test build #97587 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97587/testReport)** for PR 22732 at commit [`cb7e97a`](https://github.com/apache/spark/commit/cb7e97a18c0a8d5d806c70771b2c85a01e5f0df5).
[GitHub] spark issue #22466: [SPARK-25464][SQL] Create Database to the location,only ...
Github user sandeep-katta commented on the issue: https://github.com/apache/spark/pull/22466 retest this please
[GitHub] spark issue #22732: [SPARK-25044][FOLLOW-UP] Change ScalaUDF constructor sig...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22732 Merged build finished. Test FAILed.
[GitHub] spark issue #22732: [SPARK-25044][FOLLOW-UP] Change ScalaUDF constructor sig...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22732 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97578/ Test FAILed.
[GitHub] spark issue #22732: [SPARK-25044][FOLLOW-UP] Change ScalaUDF constructor sig...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22732

**[Test build #97578 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97578/testReport)** for PR 22732 at commit [`84cb456`](https://github.com/apache/spark/commit/84cb456c18d02f8abb21934191508fca5e58e6e2).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #22504: [SPARK-25118][Submit] Persist Driver Logs in Client mode...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22504 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97565/ Test FAILed.
[GitHub] spark issue #22504: [SPARK-25118][Submit] Persist Driver Logs in Client mode...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22504 Build finished. Test FAILed.
[GitHub] spark issue #22504: [SPARK-25118][Submit] Persist Driver Logs in Client mode...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22504

**[Test build #97565 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97565/testReport)** for PR 22504 at commit [`a0a85b3`](https://github.com/apache/spark/commit/a0a85b3eff0f115b983ce5ba3214e09f8ee90dd2).
 * This patch **fails PySpark unit tests**.
 * This patch **does not merge cleanly**.
 * This patch adds no public classes.
[GitHub] spark issue #22666: [SPARK-25672][SQL] schema_of_csv() - schema inference fr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22666 **[Test build #97586 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97586/testReport)** for PR 22666 at commit [`6cbc7fb`](https://github.com/apache/spark/commit/6cbc7fb45478882c15c6694fff964da043d2445c). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22666: [SPARK-25672][SQL] schema_of_csv() - schema inference fr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22666 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22666: [SPARK-25672][SQL] schema_of_csv() - schema inference fr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22666 **[Test build #97585 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97585/testReport)** for PR 22666 at commit [`c9df3ab`](https://github.com/apache/spark/commit/c9df3ab40f5130cb1c3f7207e1371ddd5fb922fc). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class UnivocityParserSuite extends SparkFunSuite ` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22666: [SPARK-25672][SQL] schema_of_csv() - schema inference fr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22666 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97585/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22749 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97570/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22749 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22666: [SPARK-25672][SQL] schema_of_csv() - schema inference fr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22666 **[Test build #97585 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97585/testReport)** for PR 22666 at commit [`c9df3ab`](https://github.com/apache/spark/commit/c9df3ab40f5130cb1c3f7207e1371ddd5fb922fc). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22749 **[Test build #97570 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97570/testReport)** for PR 22749 at commit [`0c78b73`](https://github.com/apache/spark/commit/0c78b73e5abce2a51763c860e43aab214c8634d9). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22772: [SPARK-24499][SQL][DOC][Followup] Fix some broken links
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22772 It's okay. The doc fix was huge, so there are likely to be some mistakes. I will read it closely too this weekend. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22666: [SPARK-25672][SQL] schema_of_csv() - schema inference fr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22666 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22666: [SPARK-25672][SQL] schema_of_csv() - schema inference fr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22666 **[Test build #97584 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97584/testReport)** for PR 22666 at commit [`4869b76`](https://github.com/apache/spark/commit/4869b76e4f35b094793ff1f69cce3edbeb922ef1). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22666: [SPARK-25672][SQL] schema_of_csv() - schema inference fr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22666 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97584/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22666: [SPARK-25672][SQL] schema_of_csv() - schema inference fr...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22666 This is a WIP. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22666: [SPARK-25672][SQL] schema_of_csv() - schema inference fr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22666 **[Test build #97584 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97584/testReport)** for PR 22666 at commit [`4869b76`](https://github.com/apache/spark/commit/4869b76e4f35b094793ff1f69cce3edbeb922ef1). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22666: [SPARK-25672][SQL] schema_of_csv() - schema inference fr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22666 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22746: [SPARK-24499][SQL][DOC] Split the page of sql-programmin...
Github user xuanyuanking commented on the issue: https://github.com/apache/spark/pull/22746 Thanks to all the reviewers! Sorry that the new doc still has some mistakes; I'll keep checking on this. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22666: [SPARK-25672][SQL] schema_of_csv() - schema inference fr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22666 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97583/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22666: [SPARK-25672][SQL] schema_of_csv() - schema inference fr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22666 **[Test build #97583 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97583/testReport)** for PR 22666 at commit [`80d6759`](https://github.com/apache/spark/commit/80d67596e8a0d2c5040816d090c6ff912b76c02c). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22666: [SPARK-25672][SQL] schema_of_csv() - schema inference fr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22666 **[Test build #97583 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97583/testReport)** for PR 22666 at commit [`80d6759`](https://github.com/apache/spark/commit/80d67596e8a0d2c5040816d090c6ff912b76c02c). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22749 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97569/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22749 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22666: [SPARK-25672][SQL] schema_of_csv() - schema inference fr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22666 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22749 **[Test build #97569 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97569/testReport)** for PR 22749 at commit [`b211ed0`](https://github.com/apache/spark/commit/b211ed069dceb33c45cf6caf12c19527334d4ad8). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22666: [SPARK-25672][SQL] schema_of_csv() - schema inference fr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22666 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97582/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22666: [SPARK-25672][SQL] schema_of_csv() - schema inference fr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22666 **[Test build #97582 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97582/testReport)** for PR 22666 at commit [`cd7e2ab`](https://github.com/apache/spark/commit/cd7e2abf4cea8744f0316fcbc7dafac4918079c7). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22732: [SPARK-25044][FOLLOW-UP] Change ScalaUDF construc...
Github user maryannxue commented on a diff in the pull request: https://github.com/apache/spark/pull/22732#discussion_r226527439 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala --- @@ -932,6 +935,23 @@ trait ScalaReflection { tpe.dealias.erasure.typeSymbol.asClass.fullName } + /** + * Returns the nullability of the input parameter types of the scala function object. + * + * Note that this only works with Scala 2.11, and the information returned may be inaccurate if + * used with a different Scala version. --- End diff -- The argument here is that it's not necessarily wrong when using Scala 2.12: if all inputs are of boxed types, the result can still be correct. I think it's enough to just say "we don't support it; switch to the new interface, otherwise we can't guarantee correctness." --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
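The boxed-versus-primitive distinction in the comment above can be illustrated with a minimal sketch. It is not the `ScalaReflection` code from the PR; `NullabilitySketch` and `acceptsNull` are hypothetical names, and the sketch only assumes the JVM rule that a primitive parameter can never hold `null` while a boxed or reference parameter can.

```scala
object NullabilitySketch {
  // Hypothetical helper: a parameter may hold null only when its
  // runtime class is not a JVM primitive (int, long, double, ...).
  def acceptsNull(cls: Class[_]): Boolean = !cls.isPrimitive

  // Nullability of each parameter of a function, given its parameter classes.
  def inputNullability(paramClasses: Seq[Class[_]]): Seq[Boolean] =
    paramClasses.map(acceptsNull)
}
```

For example, `NullabilitySketch.inputNullability(Seq(classOf[Int], classOf[java.lang.Integer]))` marks only the second (boxed) parameter as nullable, which is why a function whose inputs are all boxed is unaffected by how the primitive case is detected.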
[GitHub] spark issue #22666: [SPARK-25672][SQL] schema_of_csv() - schema inference fr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22666 **[Test build #97582 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97582/testReport)** for PR 22666 at commit [`cd7e2ab`](https://github.com/apache/spark/commit/cd7e2abf4cea8744f0316fcbc7dafac4918079c7). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22732: [SPARK-25044][FOLLOW-UP] Change ScalaUDF constructor sig...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22732 **[Test build #97581 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97581/testReport)** for PR 22732 at commit [`e848ec7`](https://github.com/apache/spark/commit/e848ec7a2d420c28764ac7319f801666c40684c2). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22732: [SPARK-25044][FOLLOW-UP] Change ScalaUDF constructor sig...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22732 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4114/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22732: [SPARK-25044][FOLLOW-UP] Change ScalaUDF constructor sig...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22732 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22504: [SPARK-25118][Submit] Persist Driver Logs in Client mode...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22504 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97560/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22504: [SPARK-25118][Submit] Persist Driver Logs in Client mode...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22504 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22295 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22504: [SPARK-25118][Submit] Persist Driver Logs in Client mode...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22504 **[Test build #97560 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97560/testReport)** for PR 22504 at commit [`07d2df8`](https://github.com/apache/spark/commit/07d2df87fddd540637b054b643eb5484c5e58eaf). * This patch **fails from timeout after a configured wait of `400m`**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22295 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97577/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22295 **[Test build #97577 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97577/testReport)** for PR 22295 at commit [`94e3db0`](https://github.com/apache/spark/commit/94e3db0c0c9873daaca688c2a63f01420882692e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22612: [SPARK-24958] Add executors' process tree total m...
Github user rezasafi commented on a diff in the pull request: https://github.com/apache/spark/pull/22612#discussion_r226526015 --- Diff: core/src/main/scala/org/apache/spark/executor/ProcfsBasedSystems.scala --- @@ -0,0 +1,226 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.executor + +import java.io._ +import java.nio.charset.Charset +import java.nio.file.{Files, Paths} +import java.util.Locale + +import scala.collection.mutable +import scala.collection.mutable.ArrayBuffer + +import org.apache.spark.{SparkEnv, SparkException} +import org.apache.spark.internal.{config, Logging} +import org.apache.spark.util.Utils + +private[spark] case class ProcfsBasedSystemsMetrics( +jvmVmemTotal: Long, +jvmRSSTotal: Long, +pythonVmemTotal: Long, +pythonRSSTotal: Long, +otherVmemTotal: Long, +otherRSSTotal: Long) + +// Some of the ideas here are taken from the ProcfsBasedProcessTree class in hadoop +// project. 
+private[spark] class ProcfsBasedSystems(val procfsDir: String = "/proc/") extends Logging { + val procfsStatFile = "stat" + val testing = sys.env.contains("SPARK_TESTING") || sys.props.contains("spark.testing") + var pageSize = computePageSize() + var isAvailable: Boolean = isProcfsAvailable + private val pid = computePid() + private val ptree = mutable.Map[ Int, Set[Int]]() + + var allMetrics: ProcfsBasedSystemsMetrics = ProcfsBasedSystemsMetrics(0, 0, 0, 0, 0, 0) + + computeProcessTree() + + private def isProcfsAvailable: Boolean = { +if (testing) { + return true +} +try { + if (!Files.exists(Paths.get(procfsDir))) { +return false + } +} +catch { + case f: FileNotFoundException => return false +} +val shouldLogStageExecutorMetrics = + SparkEnv.get.conf.get(config.EVENT_LOG_STAGE_EXECUTOR_METRICS) +val shouldLogStageExecutorProcessTreeMetrics = + SparkEnv.get.conf.get(config.EVENT_LOG_PROCESS_TREE_METRICS) +shouldLogStageExecutorProcessTreeMetrics && shouldLogStageExecutorMetrics + } + + private def computePid(): Int = { +if (!isAvailable || testing) { + return -1; +} +try { + // This can be simplified in java9: + // https://docs.oracle.com/javase/9/docs/api/java/lang/ProcessHandle.html + val cmd = Array("bash", "-c", "echo $PPID") + val length = 10 + val out2 = Utils.executeAndGetOutput(cmd) + val pid = Integer.parseInt(out2.split("\n")(0)) + return pid; +} +catch { + case e: SparkException => logDebug("IO Exception when trying to compute process tree." 
+ +" As a result reporting of ProcessTree metrics is stopped", e) +isAvailable = false +return -1 +} + } + + private def computePageSize(): Long = { +if (testing) { + return 0; +} +val cmd = Array("getconf", "PAGESIZE") +val out2 = Utils.executeAndGetOutput(cmd) +return Integer.parseInt(out2.split("\n")(0)) + } + + private def computeProcessTree(): Unit = { +if (!isAvailable || testing) { + return +} +val queue = mutable.Queue.empty[Int] +queue += pid +while( !queue.isEmpty ) { + val p = queue.dequeue() + val c = getChildPids(p) + if(!c.isEmpty) { +queue ++= c +ptree += (p -> c.toSet) + } + else { +ptree += (p -> Set[Int]()) + } +} + } + + private def getChildPids(pid: Int): ArrayBuffer[Int] = { +try { + val cmd = Array("pgrep", "-P", pid.toString) + val builder = new ProcessBuilder("pgrep", "-P", pid.toString) + val process = builder.start() + val output = new StringBuilder() + val threadName = "read stdout for "
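The breadth-first walk in `computeProcessTree()` quoted above can be sketched in isolation. This is a simplified illustration, not the PR's code: `ProcessTreeSketch`, `buildTree`, and the `childrenOf` parameter are hypothetical, and the child lookup is passed in as a function so the sketch does not depend on `pgrep` or `/proc`.

```scala
import scala.collection.mutable

object ProcessTreeSketch {
  // Breadth-first traversal from `root`; `childrenOf` is an assumed
  // lookup that returns the direct child pids of a given pid.
  def buildTree(root: Int, childrenOf: Int => Seq[Int]): Map[Int, Set[Int]] = {
    val tree = mutable.Map[Int, Set[Int]]()
    val queue = mutable.Queue[Int](root)
    while (queue.nonEmpty) {
      val p = queue.dequeue()
      val children = childrenOf(p)
      queue ++= children            // visit the children later
      tree += (p -> children.toSet) // leaves map to an empty set
    }
    tree.toMap
  }
}
```

Because an empty child list already produces an empty set, the `if`/`else` on `c.isEmpty` in the quoted diff collapses into a single statement here.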
[GitHub] spark pull request #22612: [SPARK-24958] Add executors' process tree total m...
Github user rezasafi commented on a diff in the pull request: https://github.com/apache/spark/pull/22612#discussion_r226525661 --- Diff: core/src/main/scala/org/apache/spark/executor/ProcfsBasedSystems.scala --- @@ -0,0 +1,226 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.executor + +import java.io._ +import java.nio.charset.Charset +import java.nio.file.{Files, Paths} +import java.util.Locale + +import scala.collection.mutable +import scala.collection.mutable.ArrayBuffer + +import org.apache.spark.{SparkEnv, SparkException} +import org.apache.spark.internal.{config, Logging} +import org.apache.spark.util.Utils + +private[spark] case class ProcfsBasedSystemsMetrics( +jvmVmemTotal: Long, +jvmRSSTotal: Long, +pythonVmemTotal: Long, +pythonRSSTotal: Long, +otherVmemTotal: Long, +otherRSSTotal: Long) + +// Some of the ideas here are taken from the ProcfsBasedProcessTree class in hadoop +// project. 
+private[spark] class ProcfsBasedSystems(val procfsDir: String = "/proc/") extends Logging { + val procfsStatFile = "stat" + val testing = sys.env.contains("SPARK_TESTING") || sys.props.contains("spark.testing") + var pageSize = computePageSize() + var isAvailable: Boolean = isProcfsAvailable + private val pid = computePid() + private val ptree = mutable.Map[ Int, Set[Int]]() + + var allMetrics: ProcfsBasedSystemsMetrics = ProcfsBasedSystemsMetrics(0, 0, 0, 0, 0, 0) + + computeProcessTree() + + private def isProcfsAvailable: Boolean = { +if (testing) { + return true +} +try { + if (!Files.exists(Paths.get(procfsDir))) { +return false + } +} +catch { + case f: FileNotFoundException => return false +} +val shouldLogStageExecutorMetrics = + SparkEnv.get.conf.get(config.EVENT_LOG_STAGE_EXECUTOR_METRICS) +val shouldLogStageExecutorProcessTreeMetrics = + SparkEnv.get.conf.get(config.EVENT_LOG_PROCESS_TREE_METRICS) +shouldLogStageExecutorProcessTreeMetrics && shouldLogStageExecutorMetrics + } + + private def computePid(): Int = { +if (!isAvailable || testing) { + return -1; +} +try { + // This can be simplified in java9: + // https://docs.oracle.com/javase/9/docs/api/java/lang/ProcessHandle.html + val cmd = Array("bash", "-c", "echo $PPID") + val length = 10 + val out2 = Utils.executeAndGetOutput(cmd) + val pid = Integer.parseInt(out2.split("\n")(0)) + return pid; +} +catch { + case e: SparkException => logDebug("IO Exception when trying to compute process tree." + --- End diff -- Oh, it seems there wasn't a mistake here; I just forgot the reason. I caught SparkException since executeAndGetOutput may throw such an exception. I will remove the IOException. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
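The code comment in the diff above notes that `computePid()` could be simplified on Java 9 via `ProcessHandle`. A minimal sketch of what that might look like follows, assuming a JDK 9 or later runtime; `PidSketch` and its method names are hypothetical and are not part of the PR, which shells out to `bash` instead.

```scala
object PidSketch {
  // Pid of the running JVM, without spawning `bash -c "echo $PPID"`.
  // Requires JDK 9+ (java.lang.ProcessHandle).
  def currentPid(): Long = ProcessHandle.current().pid()

  // Pids of all live descendants of the running JVM, collected with a
  // plain Java iterator so no Scala/Java collection converters are needed.
  def descendantPids(): List[Long] = {
    val it = ProcessHandle.current().descendants().iterator()
    val pids = List.newBuilder[Long]
    while (it.hasNext) pids += it.next().pid()
    pids.result()
  }
}
```

With this approach no external command is launched, so the `SparkException` that `executeAndGetOutput` may throw would not arise on this path.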
[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22749 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22749 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97568/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22429 **[Test build #97580 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97580/testReport)** for PR 22429 at commit [`9f1d11d`](https://github.com/apache/spark/commit/9f1d11df99ce37959229b2830b89e2a943d638f0). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22749 **[Test build #97568 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97568/testReport)** for PR 22749 at commit [`35700f4`](https://github.com/apache/spark/commit/35700f4a0f36fb397ac028a68011a2753c5c2c75). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22429 I can address his comments while he is on vacation. Please keep reviewing this. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22429 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22758: [SPARK-25332][SQL] Instead of broadcast hash join ,Sort ...
Github user sujith71955 commented on the issue: https://github.com/apache/spark/pull/22758 > In order to make this flow consistent, either > a) we need to record HiveStats for the insert command flow and always consider these stats while computing > OR > b) as mentioned in the snapshot above, we always estimate the data size from the files for convertible relations. Just a suggestion; let me know if you have any thoughts ;) Thanks, all, for your valuable time. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22721: [SPARK-25403][SQL] Refreshes the table after inserting t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22721 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22721: [SPARK-25403][SQL] Refreshes the table after inserting t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22721 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4113/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22758: [SPARK-25332][SQL] Instead of broadcast hash join ,Sort ...
Github user sujith71955 commented on the issue: https://github.com/apache/spark/pull/22758 In order to make this flow consistent, either a) we need to record HiveStats for the insert command flow and always consider these stats while computing OR b) as mentioned in the snapshot above, we always estimate the data size from the files for convertible relations. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
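Why the source of the size estimate matters for this PR can be shown with a simplified sketch of size-based join selection. It is an illustration under assumptions, not Spark's planner code: `JoinChoiceSketch` and `chooseJoin` are hypothetical, and the only Spark fact relied on is that `spark.sql.autoBroadcastJoinThreshold` defaults to 10 MB and that a negative value disables broadcasting.

```scala
object JoinChoiceSketch {
  // Default of spark.sql.autoBroadcastJoinThreshold (10 MB).
  val DefaultThreshold: Long = 10L * 1024 * 1024

  // Hypothetical, simplified rule: broadcast only when the estimated
  // size is known (non-negative) and fits under a non-negative threshold.
  def chooseJoin(estimatedSizeInBytes: BigInt, threshold: Long = DefaultThreshold): String = {
    val fits = threshold >= 0 &&
      estimatedSizeInBytes >= BigInt(0) &&
      estimatedSizeInBytes <= BigInt(threshold)
    if (fits) "BroadcastHashJoin" else "SortMergeJoin"
  }
}
```

Under this rule, a small table whose recorded statistics are missing or overestimated falls through to the sort merge branch, which is the inconsistency that options a) and b) above aim to remove.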
[GitHub] spark pull request #22503: [SPARK-25493][SQL] Use auto-detection for CRLF in...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22503 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22758: [SPARK-25332][SQL] Instead of broadcast hash join ,Sort ...
Github user sujith71955 commented on the issue: https://github.com/apache/spark/pull/22758 > I think the cost of getting the stats from `HadoopFileSystem` may be quite high. Then we should always depend on HiveStats to get the statistics, which already happens now, but only partially. I think this PR solves that problem, but what I said is based on cloudFan's expectation ![image](https://user-images.githubusercontent.com/12999161/47195764-f3874700-d37a-11e8-9b93-e3c1cb228c54.png) --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22503: [SPARK-25493][SQL] Use auto-detection for CRLF in CSV da...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22503 Merged to master. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org