[GitHub] spark issue #16910: [SPARK-19575][SQL]Reading from or writing to a hive serd...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16910 **[Test build #72808 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72808/testReport)** for PR 16910 at commit [`cb98375`](https://github.com/apache/spark/commit/cb983756f7fb270c545f90a98d03e0db3ccc0bd9). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16870: [SPARK-19496][SQL]to_date udf to return null when input ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16870 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72799/ Test PASSed.
[GitHub] spark issue #16870: [SPARK-19496][SQL]to_date udf to return null when input ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16870 Merged build finished. Test PASSed.
[GitHub] spark pull request #16910: [SPARK-19575][SQL]Reading from or writing to a hi...
GitHub user windpiger opened a pull request: https://github.com/apache/spark/pull/16910 [SPARK-19575][SQL] Reading from or writing to a hive serde table with a non pre-existing location should succeed

## What changes were proposed in this pull request?

This PR is follow-up work from [SPARK-19329](https://issues.apache.org/jira/browse/SPARK-19329), which unified the behavior when reading from or writing to a datasource table with a non pre-existing location; here we should also unify the behavior for hive serde tables. Currently, selecting from a hive serde table whose location does not exist throws an exception:

```
Input path does not exist: file:/tmp/spark-37caa4e6-5a6a-4361-a905-06cc56afb274
org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/tmp/spark-37caa4e6-5a6a-4361-a905-06cc56afb274
  at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:285)
  at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228)
  at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:313)
  at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:194)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
  at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
  at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
  at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
  at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
  at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:2080)
  at org.apache.spark.rdd.RDD.count(RDD.scala:1157)
  at org.apache.spark.sql.QueryTest$.checkAnswer(QueryTest.scala:258)
```

## How was this patch tested?

Unit tests added.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/windpiger/spark selectHiveFromNotExistLocation

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16910.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16910

commit cb983756f7fb270c545f90a98d03e0db3ccc0bd9
Author: windpiger
Date: 2017-02-13T07:50:55Z

[SPARK-19575][SQL] Reading from or writing to a hive serde table with a non pre-existing location should succeed
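The contract this PR aims for is that a missing table location behaves like an empty table instead of failing the query. As a rough, hypothetical illustration of that contract (sketched in plain Java for simplicity; Spark's actual change lives in the hive serde read path, not in code like this):

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NonExistingLocationSketch {

    // If the table location is missing, treat it as an empty input rather
    // than letting the file listing throw InvalidInputException (the
    // pre-fix behavior shown in the stack trace above).
    public static List<String> listInputFiles(String location) {
        File dir = new File(location);
        if (!dir.exists()) {
            return new ArrayList<>(); // empty scan; the query still succeeds
        }
        String[] names = dir.list();
        return names == null ? new ArrayList<>() : Arrays.asList(names);
    }

    public static void main(String[] args) {
        // A path that almost certainly does not exist yields an empty result.
        List<String> files = listInputFiles("/tmp/sketch-missing-" + System.nanoTime());
        System.out.println(files.isEmpty());
    }
}
```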
[GitHub] spark issue #16870: [SPARK-19496][SQL]to_date udf to return null when input ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16870 **[Test build #72799 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72799/testReport)** for PR 16870 at commit [`3b1cfd4`](https://github.com/apache/spark/commit/3b1cfd41ba6171633a85f42482391c1c7d25182e).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16891: [SPARK-19318][SQL] Fix to treat JDBC connection properti...
Github user sureshthalamati commented on the issue: https://github.com/apache/spark/pull/16891 Thank you for reviewing the PR @cloud-fan. Addressed the review comments; please let me know if it requires any further changes.
[GitHub] spark issue #16868: [SPARK-19115] [SQL] Supporting Create External Table Lik...
Github user ouyangxiaochen commented on the issue: https://github.com/apache/spark/pull/16868 Very thoughtful consideration. Thanks for your explanation and suggestion! @tejasapatil what do you think? @gatorsmile @cloud-fan
[GitHub] spark issue #16870: [SPARK-19496][SQL]to_date udf to return null when input ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16870 Merged build finished. Test PASSed.
[GitHub] spark issue #16870: [SPARK-19496][SQL]to_date udf to return null when input ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16870 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72798/ Test PASSed.
[GitHub] spark issue #16870: [SPARK-19496][SQL]to_date udf to return null when input ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16870 **[Test build #72798 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72798/testReport)** for PR 16870 at commit [`7238e94`](https://github.com/apache/spark/commit/7238e94ac762f03eca3f67d50acf090bb2cc9cf9).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #16891: [SPARK-19318][SQL] Fix to treat JDBC connection p...
Github user sureshthalamati commented on a diff in the pull request: https://github.com/apache/spark/pull/16891#discussion_r100737587

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCWriteSuite.scala ---

```
@@ -75,7 +75,7 @@ class JDBCWriteSuite extends SharedSQLContext with BeforeAndAfter {
       s"""
         |CREATE OR REPLACE TEMPORARY VIEW PEOPLE1
         |USING org.apache.spark.sql.jdbc
-        |OPTIONS (url '$url1', dbtable 'TEST.PEOPLE1', user 'testUser', password 'testPass')
+        |OPTIONS (url '$url1', dbTable 'TEST.PEOPLE1', user 'testUser', password 'testPass')
```

--- End diff --

Yes, they should be case-insensitive. This is just an additional case-sensitivity test case. While testing my fix I did not find a test in the write suite that checks case-insensitivity of data source options during insert, so I flipped it to `dbTable` to make sure case-insensitivity is not broken in this case.
[GitHub] spark issue #16891: [SPARK-19318][SQL] Fix to treat JDBC connection properti...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16891 **[Test build #72807 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72807/testReport)** for PR 16891 at commit [`a156074`](https://github.com/apache/spark/commit/a1560742f2196ba04c14ad50e955bdcc839c4ad8).
[GitHub] spark pull request #16891: [SPARK-19318][SQL] Fix to treat JDBC connection p...
Github user sureshthalamati commented on a diff in the pull request: https://github.com/apache/spark/pull/16891#discussion_r100737505

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/CaseInsensitiveMap.scala ---

```
@@ -23,16 +23,30 @@ package org.apache.spark.sql.catalyst.util

 class CaseInsensitiveMap(map: Map[String, String]) extends Map[String, String]
```

--- End diff --

Good question. For some reason I was hung up on making only the case-sensitive key available to the caller. Changed the code to expose the original map; it made the code simpler. Thank you very much for the suggestion.
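The design point discussed above, case-insensitive lookup while still exposing the caller's original map with its original key casing, can be sketched as follows. This is a hypothetical illustration in plain Java, not Spark's actual `CaseInsensitiveMap` implementation:

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class CaseInsensitiveMapSketch {
    private final Map<String, String> original;  // caller's map, keys untouched
    private final Map<String, String> lowerCased = new HashMap<>();

    public CaseInsensitiveMapSketch(Map<String, String> original) {
        this.original = original;
        // Index every entry under a lower-cased key for lookups.
        for (Map.Entry<String, String> e : original.entrySet()) {
            lowerCased.put(e.getKey().toLowerCase(Locale.ROOT), e.getValue());
        }
    }

    // Lookup ignores key case: "dbTable" and "dbtable" hit the same entry.
    public String get(String key) {
        return lowerCased.get(key.toLowerCase(Locale.ROOT));
    }

    // Expose the original map so consumers that are case-sensitive, such as
    // JDBC connection properties (the subject of SPARK-19318), see the keys
    // exactly as the user wrote them.
    public Map<String, String> originalMap() {
        return original;
    }
}
```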
[GitHub] spark pull request #16891: [SPARK-19318][SQL] Fix to treat JDBC connection p...
Github user sureshthalamati commented on a diff in the pull request: https://github.com/apache/spark/pull/16891#discussion_r100737377

--- Diff: external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/OracleIntegrationSuite.scala ---

```
@@ -149,4 +155,29 @@ class OracleIntegrationSuite extends DockerJDBCIntegrationSuite with SharedSQLCo
     assert(values.getDate(9).equals(dateVal))
     assert(values.getTimestamp(10).equals(timestampVal))
   }
+
+  test("SPARK-19318: connection property keys should be case-sensitive") {
+    sql(
+      s"""
+         |CREATE TEMPORARY TABLE datetime
```

--- End diff --

done.
[GitHub] spark pull request #16891: [SPARK-19318][SQL] Fix to treat JDBC connection p...
Github user sureshthalamati commented on a diff in the pull request: https://github.com/apache/spark/pull/16891#discussion_r100737351

--- Diff: external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/OracleIntegrationSuite.scala ---

```
@@ -62,6 +62,12 @@ class OracleIntegrationSuite extends DockerJDBCIntegrationSuite with SharedSQLCo
   }

   override def dataPreparation(conn: Connection): Unit = {
+    conn.prepareStatement("CREATE TABLE datetime (id NUMBER(10), d DATE, t TIMESTAMP)")
+      .executeUpdate()
+    conn.prepareStatement("INSERT INTO datetime VALUES (" +
+      "1, {d '1991-11-09'}, {ts '1996-01-01 01:23:45'})").executeUpdate()
+    conn.prepareStatement("CREATE TABLE datetime1 (id NUMBER(10), d DATE, t TIMESTAMP)")
```

--- End diff --

Thank you for reviewing the patch. I think cleanup is not required; these tables are not persistent across test runs and are cleaned up when the docker container is removed at the end of the test. I did not notice any existing setup in `afterAll()` to do cleanup after the test. I also moved creation of the temporary views up to the same place, to keep them together, and possibly any future tests can use these tables as well.
[GitHub] spark pull request #16891: [SPARK-19318][SQL] Fix to treat JDBC connection p...
Github user sureshthalamati commented on a diff in the pull request: https://github.com/apache/spark/pull/16891#discussion_r100737390

--- Diff: external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/OracleIntegrationSuite.scala ---

```
@@ -149,4 +155,29 @@ class OracleIntegrationSuite extends DockerJDBCIntegrationSuite with SharedSQLCo
     assert(values.getDate(9).equals(dateVal))
     assert(values.getTimestamp(10).equals(timestampVal))
   }
+
+  test("SPARK-19318: connection property keys should be case-sensitive") {
+    sql(
+      s"""
+         |CREATE TEMPORARY TABLE datetime
+         |USING org.apache.spark.sql.jdbc
+         |OPTIONS (url '$jdbcUrl', dbTable 'datetime', oracle.jdbc.mapDateToTimestamp 'false')
+       """.stripMargin.replaceAll("\n", " "))
+    val row = sql("SELECT * FROM datetime where id = 1").collect()(0)
```

--- End diff --

done.
[GitHub] spark issue #16908: [SPARK-19574][ML][Documentation] Fix Liquid Exception: S...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16908 Merged build finished. Test PASSed.
[GitHub] spark issue #16908: [SPARK-19574][ML][Documentation] Fix Liquid Exception: S...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16908 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72805/ Test PASSed.
[GitHub] spark issue #16908: [SPARK-19574][ML][Documentation] Fix Liquid Exception: S...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16908 **[Test build #72805 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72805/testReport)** for PR 16908 at commit [`b97b49b`](https://github.com/apache/spark/commit/b97b49b11f3c6113b5b9491e5469ca7a011beac6).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16868: [SPARK-19115] [SQL] Supporting Create External Table Lik...
Github user ouyangxiaochen commented on the issue: https://github.com/apache/spark/pull/16868 Very careful consideration. Thanks for your explanation and suggestion! What do you think? @gatorsmile @cloud-fan
[GitHub] spark issue #16672: [SPARK-19329][SQL]Reading from or writing to a datasourc...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16672 Could you move the test cases to `DDLSuite.scala`? This is not Hive-specific. Thanks!
[GitHub] spark issue #16689: [SPARK-19342][SPARKR] bug fixed in collect method for co...
Github user titicaca commented on the issue: https://github.com/apache/spark/pull/16689 Yes. The JIRA id is SPARK-19342. Thank you for the help and advice :)
[GitHub] spark issue #16909: [SPARK-13450] Introduce ExternalAppendOnlyUnsafeRowArray...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16909 **[Test build #72806 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72806/testReport)** for PR 16909 at commit [`e9cdd30`](https://github.com/apache/spark/commit/e9cdd30252bce12d34f52cc31f95adb271ef2209).
[GitHub] spark pull request #16672: [SPARK-19329][SQL]Reading from or writing to a da...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16672#discussion_r100735110

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala ---

```
@@ -1431,4 +1432,133 @@ class HiveDDLSuite
+
+  test("insert data to a data source table which has a not existed location should succeed") {
+    withTable("t") {
+      withTempDir { dir =>
+        spark.sql(
+          s"""CREATE TABLE t(a string, b int)
+             |USING parquet
+             |OPTIONS(path "file:${dir.getCanonicalPath}")
+           """.stripMargin)
+        var table = spark.sessionState.catalog.getTableMetadata(TableIdentifier("t"))
```

--- End diff --

Another general comment. Please avoid using `var`, if possible.
[GitHub] spark issue #16909: [SPARK-13450] Introduce UnsafeRowExternalArray. Change S...
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/16909 @rxin : can you please recommend someone who could review this PR?
[GitHub] spark pull request #16672: [SPARK-19329][SQL]Reading from or writing to a da...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16672#discussion_r100735007

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala ---

```
@@ -1431,4 +1432,133 @@ class HiveDDLSuite
+
+  test("insert data to a data source table which has a not existed location should succeed") {
+    withTable("t") {
+      withTempDir { dir =>
+        spark.sql(
+          s"""CREATE TABLE t(a string, b int)
+             |USING parquet
+             |OPTIONS(path "file:${dir.getCanonicalPath}")
+           """.stripMargin)
```

--- End diff --

A general comment about the style. We prefer the following indentation style:

```Scala
sql(
  """
    |SELECT '1' AS part, key, value FROM VALUES
    |(1, "one"), (2, "two"), (3, null) AS data(key, value)
  """.stripMargin)
```
[GitHub] spark issue #16909: [SPARK-13450] Introduce UnsafeRowExternalArray. Change S...
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/16909 ok to test
[GitHub] spark pull request #16909: [SPARK-13450] Introduce UnsafeRowExternalArray. C...
GitHub user tejasapatil opened a pull request: https://github.com/apache/spark/pull/16909

[SPARK-13450] Introduce UnsafeRowExternalArray. Change SortMergeJoin and WindowExec to use it

## What issue does this PR address?

Jira: https://issues.apache.org/jira/browse/SPARK-13450

In `SortMergeJoinExec`, rows of the right relation having the same value for a join key are buffered in-memory. In case of skew, this causes OOMs (see comments in SPARK-13450 for more details). A heap dump from a failed job confirms this: https://issues.apache.org/jira/secure/attachment/12846382/heap-dump-analysis.png. While it's possible to increase the heap size as a workaround, Spark should be resilient to such issues since skew can happen arbitrarily.

## Change proposed in this pull request

- Introduces `ExternalAppendOnlyUnsafeRowArray`
  - It holds `UnsafeRow`s in-memory up to a certain threshold.
  - After the threshold is hit, it switches to `UnsafeExternalSorter`, which enables spilling of the rows to disk. It does NOT sort the data.
  - Allows iterating the array multiple times. However, any alteration to the array (using `add` or `clear`) will invalidate the existing iterator(s).
- `WindowExec` was already using `UnsafeExternalSorter` to support spilling. Changed it to use the new array.
- Changed `SortMergeJoinExec` to use the new array implementation.
  - NOTE: I have not changed FULL OUTER JOIN to use this new array implementation. Changing that will need more surgery and I would rather put up a separate PR for that once this gets in.

Note for reviewers: the diff can be divided into 3 (or more) parts. My motive behind having all the changes in a single PR was to demonstrate that the API is sane and supports 2 use cases. If reviewing the whole thing as 3 separate PRs would help, I am happy to make the split.

## How was this patch tested?

Unit testing
- Added unit tests for `ExternalAppendOnlyUnsafeRowArray` to validate all its APIs and access patterns.
- Added unit tests for `SortMergeExec` — with and without spill, for inner join, left outer join, and right outer join — to confirm that the spill threshold config behaves as expected and the output is correct.
- This PR touches the scanning logic in `SortMergeExec` for _all_ joins (except FULL OUTER JOIN). However, I expect existing test cases to cover that there is no regression in correctness.
- Added unit tests for `WindowExec` to check the behavior of spilling and correctness of results.

Stress testing
- Confirmed that the OOM is gone by running against a production job which used to OOM.
- Since I cannot share details about the prod workload externally, I created synthetic data to mimic the issue, and ran it before and after the fix to demonstrate the issue and query success with this PR.

Generating the synthetic data:

```
./bin/spark-shell --driver-memory=6G

import org.apache.spark.sql._
val hc = SparkSession.builder.master("local").getOrCreate()

hc.sql("DROP TABLE IF EXISTS spark_13450_large_table").collect
hc.sql("DROP TABLE IF EXISTS spark_13450_one_row_table").collect

val df1 = (0 until 1).map(i => ("10", "100", i.toString, (i * 2).toString)).toDF("i", "j", "str1", "str2")
df1.write.format("org.apache.spark.sql.hive.orc.OrcFileFormat").bucketBy(100, "i", "j").sortBy("i", "j").saveAsTable("spark_13450_one_row_table")

val df2 = (0 until 300).map(i => ("10", "100", i.toString, (i * 2).toString)).toDF("i", "j", "str1", "str2")
df2.write.format("org.apache.spark.sql.hive.orc.OrcFileFormat").bucketBy(100, "i", "j").sortBy("i", "j").saveAsTable("spark_13450_large_table")
```

Ran this against trunk vs. a local build with this PR. The OOM repros with trunk, and with the fix this query runs fine:

```
./bin/spark-shell --driver-java-options="-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/spark.driver.heapdump.hprof"

import org.apache.spark.sql._
val hc = SparkSession.builder.master("local").getOrCreate()

hc.sql("SET spark.sql.autoBroadcastJoinThreshold=1")
hc.sql("SET spark.sql.sortMergeJoinExec.buffer.spill.threshold=1")

hc.sql("DROP TABLE IF EXISTS spark_13450_result").collect
hc.sql("""
  CREATE TABLE spark_13450_result
  AS
  SELECT
    a.i AS a_i, a.j AS a_j, a.str1 AS a_str1, a.str2 AS a_str2,
    b.i AS b_i, b.j AS b_j, b.str1 AS b_str1, b.str2 AS b_str2
  FROM spark_13450_one_row_table a
  JOIN spark_13450_large_table b ON a.i=b.i AND a.j=b.j
""")
```

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tejasapatil/spark SPARK-13450_smb_buffer_oom

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16909.patch
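The buffering contract described above — in-memory up to a threshold, multiple iterations allowed, iterators invalidated on mutation — can be sketched as follows. This is a hypothetical simplification: the real `ExternalAppendOnlyUnsafeRowArray` stores `UnsafeRow`s and spills through `UnsafeExternalSorter` once the threshold is hit, which is elided here.

```scala
// Minimal sketch of the proposed array's contract (not the real implementation).
class AppendOnlyRowArray[T](spillThreshold: Int) {
  private val inMemory = scala.collection.mutable.ArrayBuffer.empty[T]
  private var modCount = 0L // bumped on add/clear to invalidate outstanding iterators

  def add(row: T): Unit = {
    // Real impl: once inMemory.length exceeds spillThreshold, move rows
    // into an UnsafeExternalSorter so they can spill to disk (unsorted).
    inMemory += row
    modCount += 1
  }

  def clear(): Unit = { inMemory.clear(); modCount += 1 }

  // Can be called repeatedly; each iterator is tied to the current modCount.
  def generateIterator(): Iterator[T] = {
    val expected = modCount
    new Iterator[T] {
      private var i = 0
      def hasNext: Boolean = {
        require(modCount == expected, "array was modified; iterator is invalid")
        i < inMemory.length
      }
      def next(): T = { val r = inMemory(i); i += 1; r }
    }
  }
}
```

A usage consequence worth noting: because `add` invalidates live iterators, callers such as `SortMergeJoinExec` must finish (or regenerate) their scan of the buffered right-side rows before buffering the next group.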
[GitHub] spark pull request #16672: [SPARK-19329][SQL]Reading from or writing to a da...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16672#discussion_r100734735

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala ---

```
@@ -1431,4 +1432,133 @@ class HiveDDLSuite
     }
   }
 }
+
+  test("insert data to a data source table which has a not existed location should succeed") {
+    withTable("t") {
+      withTempDir { dir =>
+        spark.sql(
+          s"""CREATE TABLE t(a string, b int)
+             |USING parquet
+             |OPTIONS(path "file:${dir.getCanonicalPath}")
+           """.stripMargin)
+        var table = spark.sessionState.catalog.getTableMetadata(TableIdentifier("t"))
+        val expectedPath = s"file:${dir.getAbsolutePath.stripSuffix("/")}"
+        assert(table.location.stripSuffix("/") == expectedPath)
+
+        dir.delete
+        assert(!new File(table.location).exists())
+        spark.sql("INSERT INTO TABLE t SELECT 'c', 1")
+        checkAnswer(spark.table("t"), Row("c", 1) :: Nil)
+
+        Utils.deleteRecursively(dir)
+        assert(!new File(table.location).exists())
+        spark.sql("INSERT OVERWRITE TABLE t SELECT 'c', 1")
+        checkAnswer(spark.table("t"), Row("c", 1) :: Nil)
+
+        var newDir = dir.getAbsolutePath.stripSuffix("/") + "/x"
+        spark.sql(s"ALTER TABLE t SET LOCATION '$newDir'")
+        spark.sessionState.catalog.refreshTable(TableIdentifier("t"))
+
+        table = spark.sessionState.catalog.getTableMetadata(TableIdentifier("t"))
+        assert(table.location == newDir)
+        assert(!new File(newDir).exists())
+
+        spark.sql("INSERT INTO TABLE t SELECT 'c', 1")
+        checkAnswer(spark.table("t"), Row("c", 1) :: Nil)
+      }
+    }
+  }
+
+  test("insert into a data source table with no existed partition location should succeed") {
+    withTable("t") {
+      withTempDir { dir =>
+        spark.sql(
+          s"""CREATE TABLE t(a int, b int, c int, d int)
+             |USING parquet
+             |PARTITIONED BY(a, b)
+             |LOCATION "file:${dir.getCanonicalPath}"
+           """.stripMargin)
+        var table = spark.sessionState.catalog.getTableMetadata(TableIdentifier("t"))
+        val expectedPath = s"file:${dir.getAbsolutePath.stripSuffix("/")}"
+        assert(table.location.stripSuffix("/") == expectedPath)
+
+        spark.sql("INSERT INTO TABLE t PARTITION(a=1, b=2) SELECT 3, 4")
+        checkAnswer(spark.table("t"), Row(3, 4, 1, 2) :: Nil)
+
+        val partLoc = new File(s"${dir.getAbsolutePath}/a=1")
```

--- End diff --

A general comment about the test cases. Can you please check whether the directory exists after the insert? It can help others confirm the path is correct

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16908: [SPARK-19574][ML][Documentation] Fix Liquid Exception: S...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16908 **[Test build #72805 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72805/testReport)** for PR 16908 at commit [`b97b49b`](https://github.com/apache/spark/commit/b97b49b11f3c6113b5b9491e5469ca7a011beac6).
[GitHub] spark issue #16908: [SPARK-19574][ML][Documentation] Fix Liquid Exception: S...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16908 cc @srowen @anshbansal
[GitHub] spark pull request #16908: [SPARK-19574][ML][Documentation] Fix Liquid Excep...
GitHub user gatorsmile opened a pull request: https://github.com/apache/spark/pull/16908

[SPARK-19574][ML][Documentation] Fix Liquid Exception: Start indices amount is not equal to end indices amount

### What changes were proposed in this pull request?

```
Liquid Exception: Start indices amount is not equal to end indices amount, see /Users/xiao/IdeaProjects/sparkDelivery/docs/../examples/src/main/java/org/apache/spark/examples/ml/JavaTokenizerExample.java. in ml-features.md
```

So far, the build is broken after merging https://github.com/apache/spark/pull/16789. This PR is to fix it.

## How was this patch tested?

Manual

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gatorsmile/spark docMLFix

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16908.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16908

commit b97b49b11f3c6113b5b9491e5469ca7a011beac6
Author: Xiao Li
Date: 2017-02-13T07:00:05Z

    fix.
[GitHub] spark pull request #16902: [SPARK-19564][SPARK-19559][SS][KAFKA] KafkaOffset...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16902
[GitHub] spark issue #16902: [SPARK-19564][SPARK-19559][SS][KAFKA] KafkaOffsetReader'...
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/16902 Good catch. LGTM. Thanks! Merging to master and 2.1.
[GitHub] spark pull request #16789: [SPARK-19444][ML][Documentation] Fix imports not ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16789#discussion_r100733678

--- Diff: examples/src/main/java/org/apache/spark/examples/ml/JavaTokenizerExample.java ---

```
@@ -35,13 +35,11 @@
 import org.apache.spark.sql.types.Metadata;
 import org.apache.spark.sql.types.StructField;
 import org.apache.spark.sql.types.StructType;
-// $example off$
-// $example on:untyped_ops$
 // col("...") is preferable to df.col("...")
 import static org.apache.spark.sql.functions.callUDF;
 import static org.apache.spark.sql.functions.col;
-// $example off:untyped_ops$
+// $example off
```

--- End diff --

It misses `$`
[GitHub] spark issue #16672: [SPARK-19329][SQL]Reading from or writing to a datasourc...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16672 **[Test build #72804 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72804/testReport)** for PR 16672 at commit [`334e89f`](https://github.com/apache/spark/commit/334e89fe7258ab6a6773d534bee469cda7cd6d0c).
[GitHub] spark pull request #16777: [SPARK-19435][SQL] Type coercion between ArrayTyp...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16777#discussion_r100730262

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala ---

```
@@ -116,48 +114,66 @@ object TypeCoercion {
    * i.e. the main difference with [[findTightestCommonType]] is that here we allow some
    * loss of precision when widening decimal and double, and promotion to string.
    */
-  private def findWiderTypeForTwo(t1: DataType, t2: DataType): Option[DataType] = (t1, t2) match {
-    case (t1: DecimalType, t2: DecimalType) =>
-      Some(DecimalPrecision.widerDecimalType(t1, t2))
-    case (t: IntegralType, d: DecimalType) =>
-      Some(DecimalPrecision.widerDecimalType(DecimalType.forType(t), d))
-    case (d: DecimalType, t: IntegralType) =>
-      Some(DecimalPrecision.widerDecimalType(DecimalType.forType(t), d))
-    case (_: FractionalType, _: DecimalType) | (_: DecimalType, _: FractionalType) =>
-      Some(DoubleType)
-    case _ =>
-      findTightestCommonTypeToString(t1, t2)
+  def findWiderTypeForTwo(t1: DataType, t2: DataType): Option[DataType] = {
+    findTightestCommonType(t1, t2)
```

--- End diff --

I see. Thank you for catching it.
[GitHub] spark issue #16750: [SPARK-18937][SQL] Timezone support in CSV/JSON parsing
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16750 **[Test build #72803 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72803/testReport)** for PR 16750 at commit [`a455f4f`](https://github.com/apache/spark/commit/a455f4f900939aa961f9cc1e652c60c9d8d5c523).
[GitHub] spark issue #16776: [SPARK-19436][SQL] Add missing tests for approxQuantile
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/16776 https://issues.apache.org/jira/browse/SPARK-19573 has been created to track the issue of inconsistent NA-dropping.
[GitHub] spark pull request #16750: [SPARK-18937][SQL] Timezone support in CSV/JSON p...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/16750#discussion_r100729875

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala ---

```
@@ -859,6 +859,48 @@ class CSVSuite extends QueryTest with SharedSQLContext with SQLTestUtils {
     }
   }

+  test("Write timestamps correctly with timestampFormat option and timeZone option") {
+    withTempDir { dir =>
+      // With dateFormat option and timeZone option.
+      val timestampsWithFormatPath = s"${dir.getCanonicalPath}/timestampsWithFormat.csv"
+      val timestampsWithFormat = spark.read
+        .format("csv")
+        .option("header", "true")
+        .option("inferSchema", "true")
+        .option("timestampFormat", "dd/MM/yyyy HH:mm")
+        .load(testFile(datesFile))
+      timestampsWithFormat.write
+        .format("csv")
+        .option("header", "true")
+        .option("timestampFormat", "yyyy/MM/dd HH:mm")
+        .option("timeZone", "GMT")
+        .save(timestampsWithFormatPath)
+
+      // This will load back the timestamps as string.
+      val stringTimestampsWithFormat = spark.read
+        .format("csv")
+        .option("header", "true")
+        .option("inferSchema", "false")
```

--- End diff --

The schema will be `StringType` for all columns. ([CSVInferSchema.scala#L68](https://github.com/ueshin/apache-spark/blob/ffc4912e17cc900fc9d7ceefd0f66461109728e9/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala#L68))
[GitHub] spark pull request #16750: [SPARK-18937][SQL] Timezone support in CSV/JSON p...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/16750#discussion_r100729866

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala ---

```
@@ -58,13 +59,15 @@ private[sql] class JSONOptions(
   private val parseMode = parameters.getOrElse("mode", "PERMISSIVE")
   val columnNameOfCorruptRecord = parameters.get("columnNameOfCorruptRecord")

+  val timeZone: TimeZone = TimeZone.getTimeZone(parameters.getOrElse("timeZone", defaultTimeZoneId))
+
   // Uses `FastDateFormat` which can be direct replacement for `SimpleDateFormat` and thread-safe.
   val dateFormat: FastDateFormat =
     FastDateFormat.getInstance(parameters.getOrElse("dateFormat", "yyyy-MM-dd"), Locale.US)
```

--- End diff --

That is a combination of the `dateFormat` and `DateTimeUtils.millisToDays()` (see [JacksonParser.scala#L251](https://github.com/ueshin/apache-spark/blob/ffc4912e17cc900fc9d7ceefd0f66461109728e9/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala#L251) or [UnivocityParser.scala#L137](https://github.com/ueshin/apache-spark/blob/ffc4912e17cc900fc9d7ceefd0f66461109728e9/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParser.scala#L137)). If both timezones of the `dateFormat` and `DateTimeUtils.millisToDays()` are the same, the days will be calculated correctly. Here the `dateFormat` uses the default timezone to parse, and `DateTimeUtils.millisToDays()` also uses the default timezone to calculate days.
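To see why the two timezones must agree, here is a minimal sketch of the millis-to-days conversion. It is an assumption that this mirrors `DateTimeUtils.millisToDays` (shift epoch millis by the zone offset, then count whole days); the point it illustrates is that the same instant can land on different calendar days in different zones.

```scala
import java.util.TimeZone

// Sketch: shift epoch millis into the given zone's local time,
// then count whole days since 1970-01-01.
def millisToDays(millis: Long, tz: TimeZone): Int =
  Math.floor((millis + tz.getOffset(millis)) / 86400000d).toInt

// 2017-02-13 00:00:00 UTC
val millis = 1486944000000L
val utcDays = millisToDays(millis, TimeZone.getTimeZone("GMT"))   // 17210
val pstDays = millisToDays(millis, TimeZone.getTimeZone("GMT-8")) // 17209
// pstDays == utcDays - 1: if the parser's zone and the day-counting zone
// differ, a parsed date can shift by a day, hence ueshin's point that
// both sides must use the same (default) timezone.
```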
[GitHub] spark issue #16777: [SPARK-19435][SQL] Type coercion between ArrayTypes
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16777

@gatorsmile, Can we make this merged and then add test cases for them separately? It seems the results are the same. I ran two tests as below:

```scala
val integralTypes = IndexedSeq(ByteType, ShortType, IntegerType, LongType)
val decimals = (-38 to 38).flatMap { p =>
  (-38 to 38).flatMap(s => allCatch opt DecimalType(p, s))
}
assert(decimals.nonEmpty)

integralTypes.foreach { it =>
  test(s"$it test") {
    decimals.foreach { d =>
      // From TypeCoercion.findWiderTypeForTwo
      val maybeType1 = (d, it) match {
        case (d: DecimalType, t: IntegralType) =>
          Some(DecimalPrecision.widerDecimalType(DecimalType.forType(t), d))
        case _ => None
      }

      // From TypeCoercion.findTightestCommonType
      val maybeType2 = (d, it) match {
        case (t1: DecimalType, t2: IntegralType) if t1.isWiderThan(t2) => Some(t1)
        case _ => None
      }

      if (maybeType2.isDefined) {
        val t1 = maybeType1.get
        val t2 = maybeType2.get
        assert(t1 == t2)
      }
    }
  }
}
```

```scala
val integralTypes = IndexedSeq(ByteType, ShortType, IntegerType, LongType)
val decimals = (-38 to 38).flatMap { p =>
  (-38 to 38).flatMap(s => allCatch opt DecimalType(p, s))
}
assert(decimals.nonEmpty)

integralTypes.foreach { it =>
  test(s"$it test") {
    val widenDecimals = decimals.flatMap { d =>
      // From TypeCoercion.findWiderTypeForTwo
      (d, it) match {
        case (d: DecimalType, t: IntegralType) =>
          Some(DecimalPrecision.widerDecimalType(DecimalType.forType(t), d))
        case _ => None
      }
    }.toSet

    val tightDecimals = decimals.flatMap { d =>
      // From TypeCoercion.findTightestCommonType
      (d, it) match {
        case (t1: DecimalType, t2: IntegralType) if t1.isWiderThan(t2) => Some(t1)
        case _ => None
      }
    }.toSet

    assert(widenDecimals.nonEmpty)
    assert(tightDecimals.nonEmpty)
    assert(tightDecimals.subsetOf(widenDecimals))
  }
}
```
[GitHub] spark issue #16776: [SPARK-19436][SQL] Add missing tests for approxQuantile
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16776 **[Test build #72802 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72802/testReport)** for PR 16776 at commit [`4b7ad19`](https://github.com/apache/spark/commit/4b7ad193729d3829d3222d4cb44c6aea9c557d77).
[GitHub] spark issue #16907: [SPARK-19582][SPARKR] Allow to disable hive in sparkR sh...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16907 **[Test build #72801 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72801/testReport)** for PR 16907 at commit [`8329be6`](https://github.com/apache/spark/commit/8329be6dce176022d08bb3109dc994434bf7c84a).
[GitHub] spark pull request #16776: [SPARK-19436][SQL] Add missing tests for approxQu...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16776#discussion_r100727960

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala ---

```
@@ -58,49 +58,54 @@ final class DataFrameStatFunctions private[sql](df: DataFrame) {
    * @param probabilities a list of quantile probabilities
    *   Each number must belong to [0, 1].
    *   For example 0 is the minimum, 0.5 is the median, 1 is the maximum.
-   * @param relativeError The relative target precision to achieve (greater or equal to 0).
+   * @param relativeError The relative target precision to achieve (greater than or equal to 0).
    *   If set to zero, the exact quantiles are computed, which could be very expensive.
    *   Note that values greater than 1 are accepted but give the same result as 1.
    * @return the approximate quantiles at the given probabilities
    *
-   * @note NaN values will be removed from the numerical column before calculation
+   * @note null and NaN values will be removed from the numerical column before calculation. If
+   *   the dataframe is empty or all rows contain null or NaN, null is returned.
    *
    * @since 2.0.0
    */
   def approxQuantile(
       col: String,
       probabilities: Array[Double],
       relativeError: Double): Array[Double] = {
-    StatFunctions.multipleApproxQuantiles(df.select(col).na.drop(),
-      Seq(col), probabilities, relativeError).head.toArray
+    val res = approxQuantile(Array(col), probabilities, relativeError)
+    Option(res).map(_.head).orNull
  }

  /**
   * Calculates the approximate quantiles of numerical columns of a DataFrame.
-   * @see [[DataFrameStatsFunctions.approxQuantile(col:Str* approxQuantile]] for
-   *   detailed description.
+   * @see `[[DataFrameStatsFunctions.approxQuantile(col:Str* approxQuantile]]` for detailed
```

--- End diff --

`DataFrameStatsFunctions.approxQuantile(col:Str* approxQuantile` -> `DataFrameStatsFunctions.approxQuantile(col:Str* approxQuantile)`
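For context, a hedged usage sketch of the API under discussion. It assumes a running `SparkSession` named `spark` and a hypothetical column `x`; the semantics follow the doc comment in the diff (null/NaN rows dropped first, `relativeError = 0` means exact but expensive).

```scala
// Sketch of DataFrameStatFunctions.approxQuantile usage (Spark 2.x API).
import org.apache.spark.sql.functions.col

val df = spark.range(0, 1000).select(col("id").cast("double").as("x"))

// Minimum, median, and maximum, within 1% relative error.
val quantiles: Array[Double] = df.stat.approxQuantile("x", Array(0.0, 0.5, 1.0), 0.01)

// Per the @note being added in this diff: if the dataframe is empty,
// or every row is null/NaN, the call returns null rather than an array.
```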
[GitHub] spark pull request #16907: [SPARK-19582][SPARKR] Allow to disable hive in sp...
GitHub user zjffdu opened a pull request: https://github.com/apache/spark/pull/16907

[SPARK-19582][SPARKR] Allow to disable hive in sparkR shell

## What changes were proposed in this pull request?

SPARK-15236 did this for the scala shell; this ticket is for the sparkR shell. This is not only for sparkR itself, but can also benefit downstream projects like livy which use shell.R for their interactive sessions. For now, livy has no control over whether hive is enabled or not.

## How was this patch tested?

Tested it manually: run `bin/sparkR --master local --conf spark.sql.catalogImplementation=in-memory` and verify hive is not enabled.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zjffdu/spark SPARK-19572

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16907.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16907

commit 8329be6dce176022d08bb3109dc994434bf7c84a
Author: Jeff Zhang
Date: 2017-02-13T05:52:22Z

    [SPARK-19582][SPARKR] Allow to disable hive in sparkR shell
[GitHub] spark pull request #16777: [SPARK-19435][SQL] Type coercion between ArrayTyp...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16777#discussion_r100727682

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala ---

```
@@ -116,48 +114,66 @@ object TypeCoercion {
    * i.e. the main difference with [[findTightestCommonType]] is that here we allow some
    * loss of precision when widening decimal and double, and promotion to string.
    */
-  private def findWiderTypeForTwo(t1: DataType, t2: DataType): Option[DataType] = (t1, t2) match {
-    case (t1: DecimalType, t2: DecimalType) =>
-      Some(DecimalPrecision.widerDecimalType(t1, t2))
-    case (t: IntegralType, d: DecimalType) =>
-      Some(DecimalPrecision.widerDecimalType(DecimalType.forType(t), d))
-    case (d: DecimalType, t: IntegralType) =>
-      Some(DecimalPrecision.widerDecimalType(DecimalType.forType(t), d))
-    case (_: FractionalType, _: DecimalType) | (_: DecimalType, _: FractionalType) =>
-      Some(DoubleType)
-    case _ =>
-      findTightestCommonTypeToString(t1, t2)
+  def findWiderTypeForTwo(t1: DataType, t2: DataType): Option[DataType] = {
+    findTightestCommonType(t1, t2)
```

--- End diff --

We should make them consistent. That is why I think it is right to make the change, even if it causes behavior changes.
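The widening these branches perform can be illustrated with a standalone sketch of the precision/scale arithmetic. This is an assumption-laden simplification of `DecimalPrecision.widerDecimalType` (the 38-digit cap mirrors Spark's maximum decimal precision); `Dec` is a hypothetical stand-in for `DecimalType`.

```scala
// Hypothetical stand-in for DecimalType: just precision and scale.
case class Dec(precision: Int, scale: Int)

// The wider type keeps the larger scale (fractional digits) and the larger
// integer range (precision - scale), capped at 38 digits total.
def widerDec(d1: Dec, d2: Dec): Dec = {
  val scale = math.max(d1.scale, d2.scale)
  val range = math.max(d1.precision - d1.scale, d2.precision - d2.scale)
  Dec(math.min(range + scale, 38), scale)
}

// IntegerType corresponds to Dec(10, 0) via DecimalType.forType, so widening
// it against Dec(10, 2) needs 10 integer digits plus 2 fractional ones:
// widerDec(Dec(10, 0), Dec(10, 2)) == Dec(12, 2)
```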
[GitHub] spark issue #16868: [SPARK-19115] [SQL] Supporting Create External Table Lik...
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/16868

>> we don't need to do check whether the targetTable.storage.locationUri is the same with sourceTable.storage.locationUri

We should not do that check for external tables. But continue doing that for other types of tables.
[GitHub] spark issue #16878: [SPARK-19539][SQL] Block duplicate temp table during cre...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16878 **[Test build #72800 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72800/testReport)** for PR 16878 at commit [`f7253c5`](https://github.com/apache/spark/commit/f7253c578d0a7c712bd1e42d46362ab377d93923).
[GitHub] spark pull request #16672: [SPARK-19329][SQL]Reading from or writing to a ta...
Github user windpiger commented on a diff in the pull request: https://github.com/apache/spark/pull/16672#discussion_r100725345

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---

```
@@ -754,6 +754,8 @@ case class AlterTableSetLocationCommand(
       // No partition spec is specified, so we set the location for the table itself
       catalog.alterTable(table.withNewStorage(locationUri = Some(location)))
     }
+
+    catalog.refreshTable(table.identifier)
```

--- End diff --

sorry, the test case hit the bug, so I fixed it here. I will work around the bug by clearing the cache.
[GitHub] spark issue #16870: [SPARK-19496][SQL]to_date udf to return null when input ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16870 **[Test build #72799 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72799/testReport)** for PR 16870 at commit [`3b1cfd4`](https://github.com/apache/spark/commit/3b1cfd41ba6171633a85f42482391c1c7d25182e).
[GitHub] spark issue #16672: [SPARK-19329][SQL]Reading from or writing to a table wit...
Github user windpiger commented on the issue: https://github.com/apache/spark/pull/16672 ok, let me create a new pr for hive serde tables, and continue to finish this pr~
[GitHub] spark issue #16870: [SPARK-19496][SQL]to_date udf to return null when input ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16870 **[Test build #72798 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72798/testReport)** for PR 16870 at commit [`7238e94`](https://github.com/apache/spark/commit/7238e94ac762f03eca3f67d50acf090bb2cc9cf9).
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16620 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72797/ Test PASSed.
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16620 Merged build finished. Test PASSed.
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16620 **[Test build #72797 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72797/testReport)** for PR 16620 at commit [`46ef5a3`](https://github.com/apache/spark/commit/46ef5a369902ce2ca8c0dfde64b973647f5fffeb). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #16870: [SPARK-19496][SQL]to_date udf to return null when...
Github user windpiger commented on a diff in the pull request: https://github.com/apache/spark/pull/16870#discussion_r100723941 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DateFunctionsSuite.scala --- @@ -500,6 +527,23 @@ class DateFunctionsSuite extends QueryTest with SharedSQLContext { Row(date1.getTime / 1000L), Row(date2.getTime / 1000L))) checkAnswer(df.selectExpr(s"to_unix_timestamp(s, '$fmt')"), Seq( Row(ts1.getTime / 1000L), Row(ts2.getTime / 1000L))) + +val x1 = "2015-07-24 10:00:00" +val x2 = "2015-25-07 02:02:02" +val x3 = "2015-07-24 25:02:02" +val x4 = "2015-24-07 26:02:02" +val ts3 = Timestamp.valueOf("2015-07-24 02:25:02") +val ts4 = Timestamp.valueOf("2015-07-24 00:10:00") + +val df1 = Seq(x1, x2, x3, x4).toDF("x") +checkAnswer(df1.selectExpr("to_unix_timestamp(x)"), Seq( + Row(ts1.getTime / 1000L), Row(null), Row(null), Row(null))) --- End diff -- the same as above.
[GitHub] spark pull request #16870: [SPARK-19496][SQL]to_date udf to return null when...
Github user windpiger commented on a diff in the pull request: https://github.com/apache/spark/pull/16870#discussion_r100723919 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DateFunctionsSuite.scala --- @@ -477,6 +483,27 @@ class DateFunctionsSuite extends QueryTest with SharedSQLContext { checkAnswer(df.selectExpr(s"unix_timestamp(s, '$fmt')"), Seq( Row(ts1.getTime / 1000L), Row(ts2.getTime / 1000L))) +val x1 = "2015-07-24 10:00:00" +val x2 = "2015-25-07 02:02:02" +val x3 = "2015-07-24 25:02:02" +val x4 = "2015-24-07 26:02:02" +val ts3 = Timestamp.valueOf("2015-07-24 02:25:02") +val ts4 = Timestamp.valueOf("2015-07-24 00:10:00") + +val df1 = Seq(x1, x2, x3, x4).toDF("x") +checkAnswer(df1.select(unix_timestamp(col("x"))), Seq( + Row(ts1.getTime / 1000L), Row(null), Row(null), Row(null))) --- End diff -- Yes, it is ts1; the timestamp of `x1` is `ts1`.
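The behavior the test above exercises — returning null instead of a nonsense value when the input string has out-of-range fields like month 25 or hour 26 — can be sketched outside Spark with a strict parser. This is an illustrative standalone helper, not Spark's implementation; the name `toUnixTimestampOpt` is made up here, and `None` plays the role of SQL `null`:

```scala
import java.time.{LocalDateTime, ZoneOffset}
import java.time.format.{DateTimeFormatter, DateTimeParseException}

object StrictTimestamp {
  private val fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")

  // Parse with java.time, which range-checks fields (month 1-12, hour 0-23),
  // and map any parse failure to None rather than propagating an exception.
  def toUnixTimestampOpt(s: String): Option[Long] =
    try Some(LocalDateTime.parse(s, fmt).toEpochSecond(ZoneOffset.UTC))
    catch { case _: DateTimeParseException => None }
}
```

With this sketch, `"2015-07-24 10:00:00"` parses to a defined epoch value, while inputs such as `"2015-25-07 02:02:02"` (month 25) or `"2015-07-24 25:02:02"` (hour 25) fail the field range checks and come back as `None`.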
[GitHub] spark issue #16777: [SPARK-19435][SQL] Type coercion between ArrayTypes
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16777 I see what you mean. The code paths are now different. Let me try to investigate it and split them.
[GitHub] spark pull request #16777: [SPARK-19435][SQL] Type coercion between ArrayTyp...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16777#discussion_r100723132 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala --- @@ -116,48 +114,66 @@ object TypeCoercion { * i.e. the main difference with [[findTightestCommonType]] is that here we allow some * loss of precision when widening decimal and double, and promotion to string. */ - private def findWiderTypeForTwo(t1: DataType, t2: DataType): Option[DataType] = (t1, t2) match { -case (t1: DecimalType, t2: DecimalType) => - Some(DecimalPrecision.widerDecimalType(t1, t2)) -case (t: IntegralType, d: DecimalType) => - Some(DecimalPrecision.widerDecimalType(DecimalType.forType(t), d)) -case (d: DecimalType, t: IntegralType) => - Some(DecimalPrecision.widerDecimalType(DecimalType.forType(t), d)) -case (_: FractionalType, _: DecimalType) | (_: DecimalType, _: FractionalType) => - Some(DoubleType) -case _ => - findTightestCommonTypeToString(t1, t2) + def findWiderTypeForTwo(t1: DataType, t2: DataType): Option[DataType] = { +findTightestCommonType(t1, t2) --- End diff -- Aha, thank you for correcting me. I overlooked this, but the result should still be the same, shouldn't it? - `DecimalType.isWiderThan` ``` (p1 - s1) >= (p2 - s2) && s1 >= s2 ``` - `DecimalPrecision.widerDecimalType` ``` max(s1, s2) + max(p1-s1, p2-s2), max(s1, s2) ``` If the two disagree, then we were already applying different type coercion rules between `findWiderTypeWithoutStringPromotion` and `findWiderTypeForTwo`; I guess we should make them consistent, given https://github.com/apache/spark/pull/14439?
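The two decimal rules quoted in that comment can be checked with a small standalone sketch. The toy `Dec` class below is not Spark's `DecimalType` (it omits the `MAX_PRECISION` cap, among other things); it only mirrors the two formulas under discussion:

```scala
// Toy model of a decimal type: `precision` total digits, `scale` of them
// after the decimal point (so `precision - scale` integer digits).
case class Dec(precision: Int, scale: Int) {
  // Mirrors DecimalType.isWiderThan: this type can hold every value of `o`
  // exactly iff it has at least as many integer digits and fractional digits.
  def isWiderThan(o: Dec): Boolean =
    (precision - scale) >= (o.precision - o.scale) && scale >= o.scale
}

// Mirrors DecimalPrecision.widerDecimalType: take the larger scale and the
// larger integer-digit range, so neither operand loses digits (no MAX_PRECISION
// clamping here, unlike Spark).
def widerDecimalType(d1: Dec, d2: Dec): Dec = {
  val scale = math.max(d1.scale, d2.scale)
  val range = math.max(d1.precision - d1.scale, d2.precision - d2.scale)
  Dec(range + scale, scale)
}
```

For example, widening `Dec(10, 2)` and `Dec(5, 4)` gives `Dec(12, 4)`: 8 integer digits from the first operand plus 4 fractional digits from the second, and the result `isWiderThan` both inputs even though neither input is wider than the other.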
[GitHub] spark issue #16776: [SPARK-19436][SQL] Add missing tests for approxQuantile
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16776 Merged build finished. Test PASSed.
[GitHub] spark issue #16776: [SPARK-19436][SQL] Add missing tests for approxQuantile
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16776 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72795/ Test PASSed.
[GitHub] spark issue #16776: [SPARK-19436][SQL] Add missing tests for approxQuantile
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16776 **[Test build #72795 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72795/testReport)** for PR 16776 at commit [`c77755d`](https://github.com/apache/spark/commit/c77755d0a0ec386d76500eee8fbdb1156382de21). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16776: [SPARK-19436][SQL] Add missing tests for approxQuantile
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16776 Merged build finished. Test PASSed.
[GitHub] spark issue #16776: [SPARK-19436][SQL] Add missing tests for approxQuantile
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16776 **[Test build #72794 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72794/testReport)** for PR 16776 at commit [`a3171e4`](https://github.com/apache/spark/commit/a3171e4065afb26e95f1136f823e59a017a72b19). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16776: [SPARK-19436][SQL] Add missing tests for approxQuantile
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16776 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72794/ Test PASSed.
[GitHub] spark issue #16906: [SPARK-19570][PYSPARK] Allow to disable hive in pyspark ...
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/16906 Let me take a look tomorrow.
[GitHub] spark issue #16906: [SPARK-19570][PYSPARK] Allow to disable hive in pyspark ...
Github user zjffdu commented on the issue: https://github.com/apache/spark/pull/16906 @holdenk Please help review
[GitHub] spark issue #16868: [SPARK-19115] [SQL] Supporting Create External Table Lik...
Github user ouyangxiaochen commented on the issue: https://github.com/apache/spark/pull/16868 Do you mean that we don't need to check whether `targetTable.storage.locationUri` is the same as `sourceTable.storage.locationUri`? @tejasapatil
[GitHub] spark issue #16733: [SPARK-19392][SQL] Fix the bug that throws an exception ...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/16733 Okay, I'll close this and the JIRA. Thanks!
[GitHub] spark pull request #16733: [SPARK-19392][SQL] Fix the bug that throws an exc...
Github user maropu closed the pull request at: https://github.com/apache/spark/pull/16733
[GitHub] spark pull request #16777: [SPARK-19435][SQL] Type coercion between ArrayTyp...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16777#discussion_r100719964 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala --- @@ -116,48 +114,66 @@ object TypeCoercion { * i.e. the main difference with [[findTightestCommonType]] is that here we allow some * loss of precision when widening decimal and double, and promotion to string. */ - private def findWiderTypeForTwo(t1: DataType, t2: DataType): Option[DataType] = (t1, t2) match { -case (t1: DecimalType, t2: DecimalType) => - Some(DecimalPrecision.widerDecimalType(t1, t2)) -case (t: IntegralType, d: DecimalType) => - Some(DecimalPrecision.widerDecimalType(DecimalType.forType(t), d)) -case (d: DecimalType, t: IntegralType) => - Some(DecimalPrecision.widerDecimalType(DecimalType.forType(t), d)) -case (_: FractionalType, _: DecimalType) | (_: DecimalType, _: FractionalType) => - Some(DoubleType) -case _ => - findTightestCommonTypeToString(t1, t2) + def findWiderTypeForTwo(t1: DataType, t2: DataType): Option[DataType] = { +findTightestCommonType(t1, t2) --- End diff -- The original `findTightestCommonTypeToString` does not handle `DecimalType`. However, this PR calls `findTightestCommonType` first.
[GitHub] spark pull request #16777: [SPARK-19435][SQL] Type coercion between ArrayTyp...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16777#discussion_r100719832 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala --- @@ -116,48 +114,66 @@ object TypeCoercion { * i.e. the main difference with [[findTightestCommonType]] is that here we allow some * loss of precision when widening decimal and double, and promotion to string. */ - private def findWiderTypeForTwo(t1: DataType, t2: DataType): Option[DataType] = (t1, t2) match { -case (t1: DecimalType, t2: DecimalType) => - Some(DecimalPrecision.widerDecimalType(t1, t2)) -case (t: IntegralType, d: DecimalType) => - Some(DecimalPrecision.widerDecimalType(DecimalType.forType(t), d)) -case (d: DecimalType, t: IntegralType) => - Some(DecimalPrecision.widerDecimalType(DecimalType.forType(t), d)) -case (_: FractionalType, _: DecimalType) | (_: DecimalType, _: FractionalType) => - Some(DoubleType) -case _ => - findTightestCommonTypeToString(t1, t2) + def findWiderTypeForTwo(t1: DataType, t2: DataType): Option[DataType] = { +findTightestCommonType(t1, t2) --- End diff -- Is the result the same? See these cases in `findTightestCommonType`: https://github.com/HyukjinKwon/spark/blob/510a0eee43030abbf37ef922684e6165d6f1e1c8/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala#L87-L90
[GitHub] spark issue #16777: [SPARK-19435][SQL] Type coercion between ArrayTypes
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16777 Yeah, the first PR is for refactoring and cleaning up `findWiderTypeForTwo`. We need to add test cases for the behavior changes, and we might also need to document this in the release notes, because it changes the output types. The second one is for type coercion between `ArrayType`s.
[GitHub] spark issue #16733: [SPARK-19392][SQL] Fix the bug that throws an exception ...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16733 I prefer closing it now. If users hit this again, we can revisit it. Thanks!
[GitHub] spark issue #16906: [SPARK-19570][PYSPARK] Allow to disable hive in pyspark ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16906 Merged build finished. Test PASSed.
[GitHub] spark issue #16868: [SPARK-19115] [SQL] Supporting Create External Table Lik...
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/16868 There are two main uses of EXTERNAL tables I am aware of: 1. Ingest data from non-Hive locations into Hive tables. This can be covered by adding a test case that reads from an external table created using the command this PR enables. 2. Create a logical "pointer" to an existing Hive table / partition (without creating multiple copies of the underlying data). Testing whether the destination table can have the same location as the source table will cover this. I don't think Spark's interpretation of external tables is different from Hive's, so it's OK to support both. BTW: if you are supporting the 1st use case, one can mimic the behavior of the 2nd use case by creating an external table with a fake location and later issuing an `ALTER TABLE SET LOCATION` command to make it point to an existing table's location. There is really no mechanism in Spark to guard against EXTERNAL tables pointing at an existing table / partition, so both use cases were already possible in Spark.
[GitHub] spark issue #16906: [SPARK-19570][PYSPARK] Allow to disable hive in pyspark ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16906 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72796/ Test PASSed.
[GitHub] spark issue #16906: [SPARK-19570][PYSPARK] Allow to disable hive in pyspark ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16906 **[Test build #72796 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72796/testReport)** for PR 16906 at commit [`431bcf8`](https://github.com/apache/spark/commit/431bcf8d332afe9d971b1f44a51e5dd2ca32ff81). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #16777: [SPARK-19435][SQL] Type coercion between ArrayTyp...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16777#discussion_r100716751 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala --- @@ -116,48 +114,66 @@ object TypeCoercion { * i.e. the main difference with [[findTightestCommonType]] is that here we allow some * loss of precision when widening decimal and double, and promotion to string. */ - private def findWiderTypeForTwo(t1: DataType, t2: DataType): Option[DataType] = (t1, t2) match { -case (t1: DecimalType, t2: DecimalType) => - Some(DecimalPrecision.widerDecimalType(t1, t2)) -case (t: IntegralType, d: DecimalType) => - Some(DecimalPrecision.widerDecimalType(DecimalType.forType(t), d)) -case (d: DecimalType, t: IntegralType) => - Some(DecimalPrecision.widerDecimalType(DecimalType.forType(t), d)) -case (_: FractionalType, _: DecimalType) | (_: DecimalType, _: FractionalType) => - Some(DoubleType) -case _ => - findTightestCommonTypeToString(t1, t2) + def findWiderTypeForTwo(t1: DataType, t2: DataType): Option[DataType] = { +findTightestCommonType(t1, t2) --- End diff -- @cloud-fan refactored this logic recently, and I believe he didn't miss this part.
[GitHub] spark issue #16777: [SPARK-19435][SQL] Type coercion between ArrayTypes
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16777 Do you mean two PRs: one for cleaning up the logic here, and one for the support of array type coercion?
[GitHub] spark issue #16868: [SPARK-19115] [SQL] Supporting Create External Table Lik...
Github user ouyangxiaochen commented on the issue: https://github.com/apache/spark/pull/16868 Regarding @tejasapatil's comment: do we need to behave exactly the same as Hive? @gatorsmile
[GitHub] spark pull request #16777: [SPARK-19435][SQL] Type coercion between ArrayTyp...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16777#discussion_r100716252 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala --- @@ -116,48 +114,66 @@ object TypeCoercion { * i.e. the main difference with [[findTightestCommonType]] is that here we allow some * loss of precision when widening decimal and double, and promotion to string. */ - private def findWiderTypeForTwo(t1: DataType, t2: DataType): Option[DataType] = (t1, t2) match { -case (t1: DecimalType, t2: DecimalType) => - Some(DecimalPrecision.widerDecimalType(t1, t2)) -case (t: IntegralType, d: DecimalType) => - Some(DecimalPrecision.widerDecimalType(DecimalType.forType(t), d)) -case (d: DecimalType, t: IntegralType) => - Some(DecimalPrecision.widerDecimalType(DecimalType.forType(t), d)) -case (_: FractionalType, _: DecimalType) | (_: DecimalType, _: FractionalType) => - Some(DoubleType) -case _ => - findTightestCommonTypeToString(t1, t2) + def findWiderTypeForTwo(t1: DataType, t2: DataType): Option[DataType] = { +findTightestCommonType(t1, t2) --- End diff -- Yes, it is true that the type dispatch order was changed, but `findTightestCommonType` does not take care of `DecimalType`, so the results would be the same.
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16620 **[Test build #72797 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72797/testReport)** for PR 16620 at commit [`46ef5a3`](https://github.com/apache/spark/commit/46ef5a369902ce2ca8c0dfde64b973647f5fffeb).
[GitHub] spark issue #16777: [SPARK-19435][SQL] Type coercion between ArrayTypes
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16777 I think we need to separate these changes from the support of `Type coercion between ArrayTypes`. Could you submit a separate PR first? We might need extra test cases for this change.
[GitHub] spark issue #16733: [SPARK-19392][SQL] Fix the bug that throws an exception ...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/16733 Yea, I think we do not need to handle this. Either way, it'd be better just to add a check for the exception in tests:
```
intercept[NoSuchElementException] {
  assert(oracleDialect.getCatalystType(java.sql.Types.NUMERIC, "numeric", 0, metadata1) ==
    Some(DecimalType(DecimalType.MAX_PRECISION, 10)))
}
```
Anyway, I follow the committer's decision.
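The `intercept` pattern suggested above (run some code and assert that a specific exception type is thrown) can be mirrored outside ScalaTest. A minimal, hypothetical Python sketch — the `get_catalyst_type` function and its failure mode are illustrative stand-ins for the Oracle dialect lookup, not the real Spark API:

```python
def assert_raises(exc_type, fn):
    """Minimal analogue of ScalaTest's intercept: run fn, verify it raises
    exc_type, and return the caught exception for further inspection."""
    try:
        fn()
    except exc_type as e:
        return e
    raise AssertionError("expected %s to be raised" % exc_type.__name__)

def get_catalyst_type(sql_type, name, size, metadata):
    # Illustrative stand-in: a dialect with no mapping for (NUMERIC, size 0),
    # so the lookup fails, akin to a NoSuchElementException in Scala.
    mapping = {}
    return mapping[(sql_type, size)]  # raises KeyError

err = assert_raises(KeyError, lambda: get_catalyst_type(2, "numeric", 0, {}))
```

Like `intercept`, `assert_raises` fails the test both when the wrong exception is raised and when no exception is raised at all.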
[GitHub] spark pull request #16800: [SPARK-19456][SparkR]:Add LinearSVC R API
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/16800#discussion_r100715216 --- Diff: R/pkg/R/mllib_classification.R --- @@ -39,6 +46,116 @@ setClass("MultilayerPerceptronClassificationModel", representation(jobj = "jobj" #' @note NaiveBayesModel since 2.0.0 setClass("NaiveBayesModel", representation(jobj = "jobj")) +#' linear SVM Model +#' +#' Fits a linear SVM model against a SparkDataFrame. It is a binary classifier, similar to svm in the glmnet package. +#' Users can print, make predictions on the produced model and save the model to the input path. +#' +#' @param data SparkDataFrame for training. +#' @param formula A symbolic description of the model to be fitted. Currently only a few formula +#'operators are supported, including '~', '.', ':', '+', and '-'. +#' @param regParam The regularization parameter. +#' @param maxIter Maximum iteration number. +#' @param tol Convergence tolerance of iterations. +#' @param standardization Whether to standardize the training features before fitting the model. The coefficients +#'of models will always be returned on the original scale, so it will be transparent for +#'users. Note that with/without standardization, the models should always converge +#'to the same solution when no regularization is applied. Default is TRUE, same as glmnet. +#' @param threshold The threshold in binary classification, in range [0, 1]. +#' @param weightCol The weight column name. +#' @param ... additional arguments passed to the method. --- End diff -- I don't think that would hurt. We have expert params in tree models.
[GitHub] spark pull request #16777: [SPARK-19435][SQL] Type coercion between ArrayTyp...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16777#discussion_r100715180 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala --- @@ -116,48 +114,66 @@ object TypeCoercion { * i.e. the main difference with [[findTightestCommonType]] is that here we allow some * loss of precision when widening decimal and double, and promotion to string. */ - private def findWiderTypeForTwo(t1: DataType, t2: DataType): Option[DataType] = (t1, t2) match { -case (t1: DecimalType, t2: DecimalType) => - Some(DecimalPrecision.widerDecimalType(t1, t2)) -case (t: IntegralType, d: DecimalType) => - Some(DecimalPrecision.widerDecimalType(DecimalType.forType(t), d)) -case (d: DecimalType, t: IntegralType) => - Some(DecimalPrecision.widerDecimalType(DecimalType.forType(t), d)) -case (_: FractionalType, _: DecimalType) | (_: DecimalType, _: FractionalType) => - Some(DoubleType) -case _ => - findTightestCommonTypeToString(t1, t2) + def findWiderTypeForTwo(t1: DataType, t2: DataType): Option[DataType] = { +findTightestCommonType(t1, t2) --- End diff -- Previously, `findWiderTypeForDecimal` was applied before `findTightestCommonTypeToString`. Thus, the results could be different. cc @cloud-fan You changed the order. I am not sure whether this should be documented in the release note.
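The point HyukjinKwon makes in reply (see the earlier comment on this thread) is that reordering the dispatch is benign because the "tightest" rule knows nothing about decimals. A toy Python model of the two orderings — the type names and rules here are deliberately simplified stand-ins, not Spark's actual type lattice:

```python
def tightest(t1, t2):
    """Toy 'tightest common type': promotes within the integral family only;
    deliberately has no decimal handling, like findTightestCommonType."""
    order = ["int", "long"]
    if t1 in order and t2 in order:
        return order[max(order.index(t1), order.index(t2))]
    return None

def wider_decimal(t1, t2):
    """Toy decimal-widening rule: applies only when a decimal is involved."""
    if "decimal" in (t1, t2):
        return "decimal"
    return None

def wider_old(t1, t2):   # old order: decimal rules first, then tightest
    return wider_decimal(t1, t2) or tightest(t1, t2)

def wider_new(t1, t2):   # new order: tightest first, then decimal rules
    return tightest(t1, t2) or wider_decimal(t1, t2)

pairs = [("int", "long"), ("decimal", "int"), ("decimal", "decimal"), ("int", "int")]
assert all(wider_old(a, b) == wider_new(a, b) for a, b in pairs)
```

Because `tightest` returns `None` whenever a decimal is involved, the fallback always reaches the decimal rules, so the two orderings agree on every pair in this model.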
[GitHub] spark issue #16476: [SPARK-19084][SQL] Implement expression field
Github user gczsjdy commented on the issue: https://github.com/apache/spark/pull/16476 @gatorsmile Hi, this patch has passed all tests; is there any code I still need to modify? Thank you for working on this.
[GitHub] spark issue #16906: [SPARK-19570][PYSPARK] Allow to disable hive in pyspark ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16906 **[Test build #72796 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72796/testReport)** for PR 16906 at commit [`431bcf8`](https://github.com/apache/spark/commit/431bcf8d332afe9d971b1f44a51e5dd2ca32ff81).
[GitHub] spark pull request #16906: [SPARK-19570][PYSPARK] Allow to disable hive in p...
GitHub user zjffdu opened a pull request: https://github.com/apache/spark/pull/16906 [SPARK-19570][PYSPARK] Allow to disable hive in pyspark shell ## What changes were proposed in this pull request? SPARK-15236 did this for the Scala shell; this ticket is for the pyspark shell. This is not only for pyspark itself, but can also benefit downstream projects like Livy, which use shell.py for their interactive sessions. For now, Livy has no control over whether Hive is enabled or not. ## How was this patch tested? I didn't find a way to add a test for it, so I tested it manually: run `bin/pyspark --master local --conf spark.sql.catalogImplementation=in-memory` and verify that Hive is not enabled. You can merge this pull request into a Git repository by running: $ git pull https://github.com/zjffdu/spark SPARK-19570 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16906.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16906 commit 431bcf8d332afe9d971b1f44a51e5dd2ca32ff81 Author: Jeff Zhang Date: 2017-02-13T02:03:40Z [SPARK-19570][PYSPARK] Allow to disable hive in pyspark shell
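The decision the PR description revolves around can be sketched as a small conf lookup. This is an illustrative sketch only — the function name and the default are assumptions about how a shell startup script might branch, not the actual shell.py code:

```python
def catalog_implementation(conf):
    """Pick the catalog for the session from spark.sql.catalogImplementation.
    In a Hive-enabled build the default is assumed to be "hive"; users can
    override it with --conf, e.g. to "in-memory" to disable Hive."""
    return conf.get("spark.sql.catalogImplementation", "hive")

# Mirrors the manual test in the PR description:
#   bin/pyspark --conf spark.sql.catalogImplementation=in-memory
conf = {"spark.sql.catalogImplementation": "in-memory"}
assert catalog_implementation(conf) == "in-memory"
assert catalog_implementation({}) == "hive"
```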
[GitHub] spark issue #16776: [SPARK-19436][SQL] Add missing tests for approxQuantile
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16776 **[Test build #72795 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72795/testReport)** for PR 16776 at commit [`c77755d`](https://github.com/apache/spark/commit/c77755d0a0ec386d76500eee8fbdb1156382de21).
[GitHub] spark issue #16868: [SPARK-19115] [SQL] Supporting Create External Table Lik...
Github user ouyangxiaochen commented on the issue: https://github.com/apache/spark/pull/16868 I think @tejasapatil's suggestion is reasonable. Because the location is specified by users, sourceTable.storage.locationUri and targetTable.storage.locationUri can be the same or different. Do we need to match Hive's behavior exactly?
[GitHub] spark issue #16776: [SPARK-19436][SQL] Add missing tests for approxQuantile
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16776 **[Test build #72794 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72794/testReport)** for PR 16776 at commit [`a3171e4`](https://github.com/apache/spark/commit/a3171e4065afb26e95f1136f823e59a017a72b19).
[GitHub] spark issue #16733: [SPARK-19392][SQL] Fix the bug that throws an exception ...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16733 I think we can close this PR, @maropu?
[GitHub] spark issue #16868: [SPARK-19115] [SQL] Supporting Create External Table Lik...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16868 Please add a test case based on what @tejasapatil suggested. Thanks!
[GitHub] spark pull request #16776: [SPARK-19436][SQL] Add missing tests for approxQu...
Github user zhengruifeng commented on a diff in the pull request: https://github.com/apache/spark/pull/16776#discussion_r100713371 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala --- @@ -63,44 +63,49 @@ final class DataFrameStatFunctions private[sql](df: DataFrame) { * Note that values greater than 1 are accepted but give the same result as 1. * @return the approximate quantiles at the given probabilities * - * @note NaN values will be removed from the numerical column before calculation + * @note null and NaN values will be removed from the numerical column before calculation * * @since 2.0.0 */ def approxQuantile( col: String, probabilities: Array[Double], relativeError: Double): Array[Double] = { -StatFunctions.multipleApproxQuantiles(df.select(col).na.drop(), - Seq(col), probabilities, relativeError).head.toArray +val res = approxQuantile(Array(col), probabilities, relativeError) +if (res != null) { + res.head +} else { + null +} } /** * Calculates the approximate quantiles of numerical columns of a DataFrame. - * @see [[DataFrameStatsFunctions.approxQuantile(col:Str* approxQuantile]] for - * detailed description. + * @see `DataFrameStatsFunctions.approxQuantile` for detailed description. --- End diff -- @jkbradley Do you mean `@see [[DataFrameStatsFunctions.approxQuantile(col:Str* approxQuantile]])`? I am not sure whether it works for Java docs. @HyukjinKwon Could you help review this? 
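The behavior under review — drop null and NaN values first, return null when nothing is left — can be illustrated with a small pure-Python stand-in. This is an exact-quantile sketch of the semantics only; Spark's `approxQuantile` actually uses an approximate sketch governed by `relativeError`:

```python
import math

def approx_quantile(values, probabilities):
    """Illustrative stand-in for DataFrameStatFunctions.approxQuantile:
    drop None/NaN, then take exact quantiles from the sorted remainder.
    Returns None when no valid values remain, mirroring the null result."""
    clean = sorted(v for v in values if v is not None and not math.isnan(v))
    if not clean:
        return None
    # Index each probability into the sorted values, clamped to the last element.
    return [clean[min(int(p * len(clean)), len(clean) - 1)] for p in probabilities]

vals = [1.0, None, float("nan"), 2.0, 3.0, 4.0]
assert approx_quantile(vals, [0.5]) == [3.0]
assert approx_quantile([None, float("nan")], [0.5]) is None
```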
[GitHub] spark issue #16870: [SPARK-19496][SQL]to_date udf to return null when input ...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16870 Could you also add one more case for verifying `to_date` on "2016-02-29" and "2017-02-29"?
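The two suggested inputs probe leap-year handling: 2016-02-29 is a real date, 2017-02-29 is not, so the patched `to_date` should return the date for the first and null for the second. A minimal Python illustration of that contract (an analogue of the semantics, not Spark's implementation):

```python
from datetime import datetime

def to_date(s, fmt="%Y-%m-%d"):
    """Illustrative analogue of the patched to_date: return None (null)
    instead of failing when the input is not a valid date."""
    try:
        return datetime.strptime(s, fmt).date()
    except ValueError:
        return None

assert to_date("2016-02-29") is not None  # 2016 is a leap year
assert to_date("2017-02-29") is None      # invalid date -> null
```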
[GitHub] spark issue #16868: [SPARK-19115] [SQL] Supporting Create External Table Lik...
Github user ouyangxiaochen commented on the issue: https://github.com/apache/spark/pull/16868 I think there is no need to do this validation, because the location is specified by users, so targetTable.storage.locationUri and sourceTable.storage.locationUri can be the same or different. @tejasapatil
[GitHub] spark pull request #16870: [SPARK-19496][SQL]to_date udf to return null when...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16870#discussion_r100713312 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DateFunctionsSuite.scala --- @@ -500,6 +527,23 @@ class DateFunctionsSuite extends QueryTest with SharedSQLContext { Row(date1.getTime / 1000L), Row(date2.getTime / 1000L))) checkAnswer(df.selectExpr(s"to_unix_timestamp(s, '$fmt')"), Seq( Row(ts1.getTime / 1000L), Row(ts2.getTime / 1000L))) + +val x1 = "2015-07-24 10:00:00" +val x2 = "2015-25-07 02:02:02" +val x3 = "2015-07-24 25:02:02" +val x4 = "2015-24-07 26:02:02" +val ts3 = Timestamp.valueOf("2015-07-24 02:25:02") +val ts4 = Timestamp.valueOf("2015-07-24 00:10:00") + +val df1 = Seq(x1, x2, x3, x4).toDF("x") +checkAnswer(df1.selectExpr("to_unix_timestamp(x)"), Seq( + Row(ts1.getTime / 1000L), Row(null), Row(null), Row(null))) --- End diff -- The same issue here