[GitHub] spark pull request #14746: [SPARK-17180] [SQL] Fix View Resolution Order in ...
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14746#discussion_r76366559

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala ---
    @@ -105,7 +105,13 @@ case class CreateViewCommand(
         }
         val sessionState = sparkSession.sessionState
    -    if (isTemporary) {
    +    // 1) CREATE VIEW: create a temp view when users explicitly specify the keyword TEMPORARY;
    +    //    otherwise, create a permanent view no matter whether the temporary view
    +    //    with the same name exists or not.
    +    // 2) ALTER VIEW: alter the temporary view if the temp view exists; otherwise, try to alter
    --- End diff --

    Yeah! The only way is to pass a flag.

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
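The rule described in the diff comment, together with the "pass a flag" remark, can be sketched roughly as follows. This is only an illustration in plain Scala, not the code from the pull request: `ViewTarget`, `ViewResolution.resolve`, the `isAlter` flag and the `tempViews` set are all hypothetical names, and the fallback to a permanent view in the ALTER case is an assumption, since the quoted comment is cut off at that point.

```scala
// Hypothetical model of the resolution rule quoted above; not Spark's actual code.
sealed trait ViewTarget
case object TempView extends ViewTarget
case object PermanentView extends ViewTarget

object ViewResolution {
  // isAlter is the (hypothetical) flag that tells CREATE VIEW apart from ALTER VIEW.
  def resolve(
      isAlter: Boolean,
      isTemporary: Boolean,
      name: String,
      tempViews: Set[String]): ViewTarget =
    if (isAlter) {
      // ALTER VIEW: prefer an existing temp view with this name.
      // Falling back to the permanent view is an assumption (the source comment is truncated).
      if (tempViews.contains(name)) TempView else PermanentView
    } else {
      // CREATE VIEW: only the explicit TEMPORARY keyword selects a temp view,
      // regardless of whether a temp view with the same name already exists.
      if (isTemporary) TempView else PermanentView
    }
}
```

With this model, a plain CREATE VIEW for a name that already exists as a temp view still resolves to a permanent view, while ALTER VIEW on the same name resolves to the temp view.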
[GitHub] spark issue #14572: [SPARK-17192] [SQL] Issue Exception when Users Specify t...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14572 ping @yhuai : )
[GitHub] spark issue #8880: [SPARK-5682][Core] Add encrypted shuffle in spark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/8880 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64451/ Test PASSed.
[GitHub] spark issue #8880: [SPARK-5682][Core] Add encrypted shuffle in spark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/8880 Merged build finished. Test PASSed.
[GitHub] spark issue #8880: [SPARK-5682][Core] Add encrypted shuffle in spark
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/8880 **[Test build #64451 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64451/consoleFull)** for PR 8880 at commit [`a9a05c5`](https://github.com/apache/spark/commit/a9a05c5168eede0db26135c0f8f330b451c840ad). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14572: [SPARK-17192] [SQL] Issue Exception when Users Specify t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14572 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64452/ Test PASSed.
[GitHub] spark issue #14572: [SPARK-17192] [SQL] Issue Exception when Users Specify t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14572 Merged build finished. Test PASSed.
[GitHub] spark issue #14572: [SPARK-17192] [SQL] Issue Exception when Users Specify t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14572 **[Test build #64452 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64452/consoleFull)** for PR 14572 at commit [`d3a79c8`](https://github.com/apache/spark/commit/d3a79c847b24b5eb3dce0818099d99dd25869b87). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14801: [SPARK-17234] [SQL] Table Existence Checking when Index ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14801 **[Test build #64456 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64456/consoleFull)** for PR 14801 at commit [`8bcd946`](https://github.com/apache/spark/commit/8bcd946ac37a36726a8059f2d074551357d2ed2b).
[GitHub] spark issue #14801: [SPARK-17234] [SQL] Table Existence Checking when Index ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14801 **[Test build #64455 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64455/consoleFull)** for PR 14801 at commit [`3f75605`](https://github.com/apache/spark/commit/3f7560517955fae5b47d093567690e30988a1925).
[GitHub] spark issue #14808: [SPARK-17156][ML][EXAMPLE] Add multiclass logistic regre...
Github user sethah commented on the issue: https://github.com/apache/spark/pull/14808 This is going to have to be changed after [SPARK-17163](https://issues.apache.org/jira/browse/SPARK-17163). Sorry about the confusion! We'll still want to make an example with multiclass, though, so maybe we can reuse some of this :)
[GitHub] spark issue #14818: [SPARK-17157][SPARKR][WIP]: Add multiclass logistic regr...
Github user sethah commented on the issue: https://github.com/apache/spark/pull/14818 In fact, we are actually just eliminating the `MultinomialLogisticRegression` interface and merging into the existing `LogisticRegression` estimator. So, maybe we won't need a change after all? I'm not very familiar with the R side, but basically the existing logistic regression will now support multiclass. My apologies for the confusion, I hadn't seen this Jira until just now.
[GitHub] spark issue #13231: [SPARK-15453] [SQL] Sort Merge Join to use bucketing met...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/13231 @tejasapatil any chance to update it soon? If not, I am interested in implementing it.
[GitHub] spark issue #14821: [SPARK-17250] [SQL] Remove HiveClient and setCurrentData...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/14821 LGTM, cc @yhuai to confirm.
[GitHub] spark issue #14617: [SPARK-17019][Core] Expose on-heap and off-heap memory u...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14617 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64450/ Test PASSed.
[GitHub] spark issue #14617: [SPARK-17019][Core] Expose on-heap and off-heap memory u...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14617 Merged build finished. Test PASSed.
[GitHub] spark issue #14617: [SPARK-17019][Core] Expose on-heap and off-heap memory u...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14617 **[Test build #64450 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64450/consoleFull)** for PR 14617 at commit [`838840d`](https://github.com/apache/spark/commit/838840dc3e40b8b10a111d343329f735e76fad36). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14728: [SPARK-17165][SQL] FileStreamSource should not track the...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14728 **[Test build #64454 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64454/consoleFull)** for PR 14728 at commit [`9a5ed19`](https://github.com/apache/spark/commit/9a5ed19f3b397b991794a6852aebb2b14c83d635).
[GitHub] spark issue #14802: [SPARK-17235][SQL] Support purging of old logs in Metada...
Github user petermaxlee commented on the issue: https://github.com/apache/spark/pull/14802 @zsxwing yup I plan to consolidate them.
[GitHub] spark pull request #14814: [SPARK-17242][Document]Update links of external d...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/14814
[GitHub] spark issue #14819: [SPARK-17246][SQL] Add BigDecimal literal
Github user rxin commented on the issue: https://github.com/apache/spark/pull/14819 Do other databases do this?
[GitHub] spark issue #14814: [SPARK-17242][Document]Update links of external dstream ...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/14814 Merging in master/2.0.
[GitHub] spark issue #14618: [SPARK-17030] [SQL] Remove/Cleanup HiveMetastoreCatalog....
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14618 **[Test build #64453 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64453/consoleFull)** for PR 14618 at commit [`ebdfad1`](https://github.com/apache/spark/commit/ebdfad1b575650dd5bedc3ab97c5cf1e97fa3072).
[GitHub] spark issue #14818: [SPARK-17157][SPARKR][WIP]: Add multiclass logistic regr...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/14818 cool, thanks for the heads up @sethah - please loop us in for the R side changes.
[GitHub] spark issue #14572: [SPARK-17192] [SQL] Issue Exception when Users Specify t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14572 **[Test build #64452 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64452/consoleFull)** for PR 14572 at commit [`d3a79c8`](https://github.com/apache/spark/commit/d3a79c847b24b5eb3dce0818099d99dd25869b87).
[GitHub] spark issue #14572: [SPARK-17192] [SQL] Issue Exception when Users Specify t...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14572 retest this please
[GitHub] spark issue #14818: [SPARK-17157][SPARKR][WIP]: Add multiclass logistic regr...
Github user sethah commented on the issue: https://github.com/apache/spark/pull/14818 This is going to have to wait. We are changing the interface completely. See https://issues.apache.org/jira/browse/SPARK-17163.
[GitHub] spark issue #8880: [SPARK-5682][Core] Add encrypted shuffle in spark
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/8880 **[Test build #64451 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64451/consoleFull)** for PR 8880 at commit [`a9a05c5`](https://github.com/apache/spark/commit/a9a05c5168eede0db26135c0f8f330b451c840ad).
[GitHub] spark issue #14821: [SPARK-17250] [SQL] Remove HiveClient and setCurrentData...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14821 cc @cloud-fan @yhuai
[GitHub] spark issue #14809: [SPARK-17238][SQL] simplify the logic for converting dat...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14809 Yeah, that is a bug. We did not get an exception when we read it, but we can get the error when trying to write it. The error message is confusing:
```
Can only write data to relations with a single path.;
org.apache.spark.sql.AnalysisException: Can only write data to relations with a single path.;
  at org.apache.spark.sql.execution.datasources.DataSourceAnalysis$$anonfun$apply$1.applyOrElse(DataSourceStrategy.scala:167)
```
[GitHub] spark issue #13452: [SPARK-15718][SQL] better error message for writing buck...
Github user Downchuck commented on the issue: https://github.com/apache/spark/pull/13452 Regarding the reason for disallowing bucket writes: "we have no idea [on read] if the data is bucketed or not, so it doesn't make sense to use save to write bucketed data". It's easy enough to pass that information to the reader; it doesn't need to be automatic or rely on a metastore or other discovery methods. Something as simple as `read.sortedBy(cols...).bucketedBy(func or cols...)` would do.
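As a rough illustration of the suggestion above, caller-supplied layout hints could be carried by a small immutable builder. This is a plain-Scala sketch only: `ReadHints`, `sortedBy` and `bucketedBy` are hypothetical names taken from the comment and do not exist on Spark's `DataFrameReader`; the function-based bucketing variant mentioned in the comment is not modelled.

```scala
// Hypothetical holder for layout hints that a caller passes explicitly at read time.
// Not part of Spark; it only models the idea proposed in the comment above.
final case class ReadHints(
    sortColumns: Seq[String] = Nil,
    bucketColumns: Seq[String] = Nil) {

  // Declare that the data on disk is sorted by these columns.
  def sortedBy(cols: String*): ReadHints = copy(sortColumns = cols.toList)

  // Declare that the data on disk is bucketed by these columns.
  def bucketedBy(cols: String*): ReadHints = copy(bucketColumns = cols.toList)

  // True when the caller has asserted a bucketing layout.
  def isBucketed: Boolean = bucketColumns.nonEmpty
}
```

A call such as `ReadHints().sortedBy("ts").bucketedBy("user_id")` would then record the layout without consulting a metastore; whether a reader should trust such hints is exactly the design question under discussion.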
[GitHub] spark issue #14537: [SPARK-16948][SQL] Support empty orc table when converti...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14537 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64449/ Test PASSed.
[GitHub] spark issue #14537: [SPARK-16948][SQL] Support empty orc table when converti...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14537 Merged build finished. Test PASSed.
[GitHub] spark issue #14537: [SPARK-16948][SQL] Support empty orc table when converti...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14537 **[Test build #64449 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64449/consoleFull)** for PR 14537 at commit [`9ecb2ed`](https://github.com/apache/spark/commit/9ecb2ed01db1daa19dfe837745d5468cc4990703). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14821: [SPARK-17250] [SQL] Remove HiveClient and setCurrentData...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14821 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64448/ Test PASSed.
[GitHub] spark issue #14821: [SPARK-17250] [SQL] Remove HiveClient and setCurrentData...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14821 Merged build finished. Test PASSed.
[GitHub] spark issue #14821: [SPARK-17250] [SQL] Remove HiveClient and setCurrentData...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14821 **[Test build #64448 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64448/consoleFull)** for PR 14821 at commit [`1c9a1e3`](https://github.com/apache/spark/commit/1c9a1e3c608be72ca7c4203ecc0e80f15080eb80). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #14712: [SPARK-17072] [SQL] support table-level statistic...
Github user wzhfy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14712#discussion_r76356157

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/AnalyzeTableCommand.scala ---
    @@ -88,14 +89,70 @@ case class AnalyzeTableCommand(tableName: String) extends RunnableCommand {
                 }
               }.getOrElse(0L)
    -        // Update the Hive metastore if the total size of the table is different than the size
    -        // recorded in the Hive metastore.
    -        // This logic is based on org.apache.hadoop.hive.ql.exec.StatsTask.aggregateStats().
    -        if (newTotalSize > 0 && newTotalSize != oldTotalSize) {
    +        var needUpdate = false
    +        val totalSize = if (newTotalSize > 0 && newTotalSize != oldTotalSize) {
    +          needUpdate = true
    +          newTotalSize
    +        } else {
    +          oldTotalSize
    +        }
    +        var numRows: Option[BigInt] = None
    +        if (!noscan) {
    +          val oldRowCount: Long = if (catalogTable.catalogStats.isDefined) {
    +            catalogTable.catalogStats.get.rowCount.map(_.toLong).getOrElse(-1L)
    +          } else {
    +            -1L
    +          }
    +          val newRowCount = sparkSession.table(tableName).count()
    +          if (newRowCount >= 0 && newRowCount != oldRowCount) {
    --- End diff --

    If we delete the statistics, we can't tell whether stats were never collected or the table is empty.
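The point about deleted statistics can be made concrete with a small model. The sketch below is plain Scala and only illustrative: `TableStats` and `StatsMeaning.describe` are hypothetical stand-ins, not the catalog classes touched by the pull request. It shows why an absent row count and a recorded row count of zero need to stay distinguishable.

```scala
// Hypothetical stand-in for table-level statistics; not Spark's catalog classes.
final case class TableStats(sizeInBytes: BigInt, rowCount: Option[BigInt])

object StatsMeaning {
  // Interprets what the stored statistics say about the table.
  def describe(stats: Option[TableStats]): String = stats match {
    // Nothing stored at all: ANALYZE was never run (or the stats were deleted).
    case None => "no statistics collected"
    // Size recorded without a scan, so the row count is unknown.
    case Some(TableStats(_, None)) => "row count not collected"
    // A recorded zero is real information: the table was scanned and is empty.
    case Some(TableStats(_, Some(n))) if n.signum == 0 => "table is empty"
    case Some(TableStats(_, Some(n))) => s"$n rows"
  }
}
```

If the stored entry were removed whenever the count is zero, the first and third cases would collapse into one, which is the ambiguity the comment warns about.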
[GitHub] spark issue #14809: [SPARK-17238][SQL] simplify the logic for converting dat...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/14809 Ah, I see. BTW, in your example, when will we throw the exception? When we read it? A file-based external table without a path is invalid.
[GitHub] spark issue #14712: [SPARK-17072] [SQL] support table-level statistics gener...
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/14712 @cloud-fan Can you please launch the tests for this PR? Thanks!
[GitHub] spark pull request #14819: [SPARK-17246][SQL] Add BigDecimal literal
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/14819#discussion_r76355514
--- Diff: sql/core/src/test/resources/sql-tests/inputs/literals.sql ---
@@ -27,6 +27,12 @@ select 9223372036854775807L, -9223372036854775808L;
 -- out of range long
 select 9223372036854775808L;

+-- big decimal parsing
--- End diff --
nit: if we move the two new queries to the end of this file, the resulting diff of `literals.sql.out` will be smaller.
[GitHub] spark pull request #14537: [SPARK-16948][SQL] Support empty orc table when c...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14537#discussion_r76355262
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala ---
@@ -237,21 +237,27 @@ private[hive] class HiveMetastoreCatalog(sparkSession: SparkSession) extends Log
               new Path(metastoreRelation.catalogTable.storage.locationUri.get),
               partitionSpec)

-          val inferredSchema = if (fileType.equals("parquet")) {
-            val inferredSchema =
-              defaultSource.inferSchema(sparkSession, options, fileCatalog.allFiles())
-            inferredSchema.map { inferred =>
-              ParquetFileFormat.mergeMetastoreParquetSchema(metastoreSchema, inferred)
-            }.getOrElse(metastoreSchema)
-          } else {
-            defaultSource.inferSchema(sparkSession, options, fileCatalog.allFiles()).get
+          val schema = fileType match {
+            case "parquet" =>
+              val inferredSchema =
+                defaultSource.inferSchema(sparkSession, options, fileCatalog.allFiles())
+
+              // For Parquet, get correct schema by merging Metastore schema data types
--- End diff --
To follow the decision we made in https://github.com/apache/spark/pull/14207, I think we should always use the metastore schema and not infer it again. For branch 2.0, we should open another PR to fix `OrcFileFormat.inferSchema` so that it does not throw `FileNotFoundException` for an empty table.
[GitHub] spark issue #14617: [SPARK-17019][Core] Expose on-heap and off-heap memory u...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14617 **[Test build #64450 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64450/consoleFull)** for PR 14617 at commit [`838840d`](https://github.com/apache/spark/commit/838840dc3e40b8b10a111d343329f735e76fad36).
[GitHub] spark pull request #14712: [SPARK-17072] [SQL] support table-level statistic...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/14712#discussion_r76354939
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/AnalyzeTableCommand.scala ---
@@ -88,14 +89,70 @@ case class AnalyzeTableCommand(tableName: String) extends RunnableCommand {
           }
         }.getOrElse(0L)

-        // Update the Hive metastore if the total size of the table is different than the size
-        // recorded in the Hive metastore.
-        // This logic is based on org.apache.hadoop.hive.ql.exec.StatsTask.aggregateStats().
-        if (newTotalSize > 0 && newTotalSize != oldTotalSize) {
+        var needUpdate = false
+        val totalSize = if (newTotalSize > 0 && newTotalSize != oldTotalSize) {
+          needUpdate = true
+          newTotalSize
+        } else {
+          oldTotalSize
+        }
+        var numRows: Option[BigInt] = None
+        if (!noscan) {
+          val oldRowCount: Long = if (catalogTable.catalogStats.isDefined) {
+            catalogTable.catalogStats.get.rowCount.map(_.toLong).getOrElse(-1L)
+          } else {
+            -1L
+          }
+          val newRowCount = sparkSession.table(tableName).count()
+          if (newRowCount >= 0 && newRowCount != oldRowCount) {
+            numRows = Some(BigInt(newRowCount))
+            needUpdate = true
+          }
+        }
+        // Update the metastore if the above statistics of the table are different from those
+        // recorded in the metastore.
+        if (needUpdate) {
+          sessionState.catalog.alterTable(
+            catalogTable.copy(
+              catalogStats = Some(Statistics(
+                sizeInBytes = totalSize, rowCount = numRows))),
+            fromAnalyze = true)
+
+          // Refresh the cache of the table in the catalog.
+          sessionState.catalog.refreshTable(tableIdent)
+        }
+
+      // data source tables have been converted into LogicalRelations
+      case logicalRel: LogicalRelation if logicalRel.metastoreTableIdentifier.isDefined =>
--- End diff --
You can run the test case I added, "test table-level statistics for data source table created in HiveExternalCatalog".
[GitHub] spark issue #14537: [SPARK-16948][SQL] Support empty orc table when converti...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/14537 BTW, @rajeshbalamohan, since you now use the metastore schema directly, the PR description is no longer correct. Can you update it as well? Thanks.
[GitHub] spark issue #14819: [SPARK-17246][SQL] Add BigDecimal literal
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14819 Merged build finished. Test PASSed.
[GitHub] spark issue #14819: [SPARK-17246][SQL] Add BigDecimal literal
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14819 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64445/ Test PASSed.
[GitHub] spark issue #14617: [SPARK-17019][Core] Expose on-heap and off-heap memory u...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/14617 Jenkins, retest this please.
[GitHub] spark issue #14819: [SPARK-17246][SQL] Add BigDecimal literal
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14819 **[Test build #64445 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64445/consoleFull)** for PR 14819 at commit [`fda100f`](https://github.com/apache/spark/commit/fda100f3c42bf82c9d0accafc7230c906e0b8317). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #14712: [SPARK-17072] [SQL] support table-level statistic...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/14712#discussion_r76354802
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/AnalyzeTableCommand.scala ---
@@ -88,14 +89,70 @@ case class AnalyzeTableCommand(tableName: String) extends RunnableCommand {
           }
         }.getOrElse(0L)

-        // Update the Hive metastore if the total size of the table is different than the size
-        // recorded in the Hive metastore.
-        // This logic is based on org.apache.hadoop.hive.ql.exec.StatsTask.aggregateStats().
-        if (newTotalSize > 0 && newTotalSize != oldTotalSize) {
+        var needUpdate = false
+        val totalSize = if (newTotalSize > 0 && newTotalSize != oldTotalSize) {
+          needUpdate = true
+          newTotalSize
+        } else {
+          oldTotalSize
+        }
+        var numRows: Option[BigInt] = None
+        if (!noscan) {
+          val oldRowCount: Long = if (catalogTable.catalogStats.isDefined) {
--- End diff --
Thanks, this looks more concise!
[GitHub] spark issue #14537: [SPARK-16948][SQL] Support empty orc table when converti...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/14537 @gatorsmile Thanks for cc'ing me. Since `spark.sql.hive.convertMetastoreOrc` is `false` by default, this change looks fine. However, if the config is set to `true` and the schema in the metastore is inconsistent with the schema in the Orc files, I remember that reading the files will fail. I've implemented two approaches to this issue: #14282 simply disables Orc conversion when that happens, and #14365 does a more complicated schema mapping. Once this is merged, I think we should fix the schema inconsistency soon.
[GitHub] spark issue #14239: [SPARK-16593] [CORE] [WIP] Provide a pre-fetch mechanism...
Github user f7753 commented on the issue: https://github.com/apache/spark/pull/14239
@tgravescs To make this more readable, here are answers to the questions above.

**1. Are you saying that you are loading all the data for all the maps from disk into memory and caching it waiting for the reducer to fetch it?**
**2. does it conditionally do this or always do it?**
I use the parameter `spark.shuffle.prepare.open` to switch this mechanism on or off, and `spark.shuffle.prepare.count` to control the number of blocks to cache. This lets users control the memory used for pre-fetched blocks based on the conditions of their machines.

**3. How exactly does the timing work on this, aren't you going to send the prepare immediately before sending the fetch? does the fetch block on waiting on the prepare to cache the data?**
I changed the logic of the shuffle message transfer process. Each time I send a FetchRequest, I also send the next one, so the server side knows exactly which blockIds the next fetch loop will request and can cache them. In the FetchRequest success callback, the cache is released, since all of those blocks have been sent to the map side and are no longer used. When the `PrepareRequest` arrives, the server takes a thread from the thread pool to perform the read (in fact, I use a `FutureTask` for this). If the `FetchRequest` arrives before the data has been fully cached, the request blocks as before, but it is still more efficient than before because the data started loading into memory before the request actually arrived.

**4. what testing have you done with this and what size of data? What type of load was on the nodes when testing, etc?**
I have implemented and tested this on branch 1.4 and 1.6, using Intel HiBench 4.0 terasort with a 1 TB data size. I got about a 30% performance improvement on a 5-node cluster in which each node has 96 GB of memory, a Xeon E5 v3 CPU, and 7200 RPM disks. Note, however, that a benchmark like terasort shuffles all the data it reads, so in other cases the mechanism may not work as well.
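The prepare-then-fetch flow described in answer 3 might look roughly like the sketch below. This is not the code from the PR: `PrefetchCache`, `prepare`, `fetch`, and the `readBlock` callback are hypothetical names, block ids are plain strings, and the cap set by `spark.shuffle.prepare.count` is left out for brevity. It only shows a `FutureTask` being started when the prepare message arrives and a later fetch blocking on that task when the load has not finished.

```scala
import java.util.concurrent.{Callable, ConcurrentHashMap, ExecutorService, FutureTask}

object PrefetchSketch {
  /**
   * Hypothetical server-side cache. readBlock stands in for the disk read;
   * pool is the thread pool that performs background loads.
   */
  final class PrefetchCache(readBlock: String => Array[Byte], pool: ExecutorService) {
    private val pending = new ConcurrentHashMap[String, FutureTask[Array[Byte]]]()

    /** Handles a "prepare" message: start loading the announced blocks. */
    def prepare(blockIds: Seq[String]): Unit = blockIds.foreach { id =>
      val task = new FutureTask[Array[Byte]](new Callable[Array[Byte]] {
        override def call(): Array[Byte] = readBlock(id)
      })
      // Submit only if no load for this block is already registered.
      if (pending.putIfAbsent(id, task) == null) pool.execute(task)
    }

    /**
     * Handles a fetch: wait on the pre-load if one exists, otherwise read
     * directly. The cache entry is dropped as soon as it has been served.
     */
    def fetch(id: String): Array[Byte] = {
      val task = pending.remove(id)
      if (task != null) task.get() else readBlock(id)
    }

    /** Number of blocks that are prepared but not yet fetched. */
    def cachedCount: Int = pending.size()
  }
}
```

A fetch that arrives early still blocks in `task.get()`, but only for the remainder of a read that has already started, which matches the benefit claimed in the comment.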
[GitHub] spark pull request #14712: [SPARK-17072] [SQL] support table-level statistic...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/14712#discussion_r76354055
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/AnalyzeTableCommand.scala ---
@@ -88,14 +89,70 @@ case class AnalyzeTableCommand(tableName: String) extends RunnableCommand {
           }
         }.getOrElse(0L)

-        // Update the Hive metastore if the total size of the table is different than the size
-        // recorded in the Hive metastore.
-        // This logic is based on org.apache.hadoop.hive.ql.exec.StatsTask.aggregateStats().
-        if (newTotalSize > 0 && newTotalSize != oldTotalSize) {
+        var needUpdate = false
+        val totalSize = if (newTotalSize > 0 && newTotalSize != oldTotalSize) {
+          needUpdate = true
+          newTotalSize
+        } else {
+          oldTotalSize
+        }
+        var numRows: Option[BigInt] = None
+        if (!noscan) {
+          val oldRowCount: Long = if (catalogTable.catalogStats.isDefined) {
+            catalogTable.catalogStats.get.rowCount.map(_.toLong).getOrElse(-1L)
+          } else {
+            -1L
+          }
+          val newRowCount = sparkSession.table(tableName).count()
+          if (newRowCount >= 0 && newRowCount != oldRowCount) {
+            numRows = Some(BigInt(newRowCount))
+            needUpdate = true
+          }
+        }
+        // Update the metastore if the above statistics of the table are different from those
+        // recorded in the metastore.
+        if (needUpdate) {
+          sessionState.catalog.alterTable(
+            catalogTable.copy(
+              catalogStats = Some(Statistics(
+                sizeInBytes = totalSize, rowCount = numRows))),
+            fromAnalyze = true)
+
+          // Refresh the cache of the table in the catalog.
+          sessionState.catalog.refreshTable(tableIdent)
+        }
+
+      // data source tables have been converted into LogicalRelations
+      case logicalRel: LogicalRelation if logicalRel.metastoreTableIdentifier.isDefined =>
--- End diff --
We reach this branch when analyzing a data source table with Hive. The table takes the form of a LogicalRelation maintained in "cachedDataSourceTables" in HiveMetastoreCatalog.
[GitHub] spark issue #14710: [SPARK-16533][CORE] resolve deadlocking in driver when e...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14710 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64442/ Test PASSed.
[GitHub] spark issue #14710: [SPARK-16533][CORE] resolve deadlocking in driver when e...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14710 Merged build finished. Test PASSed.
[GitHub] spark issue #14710: [SPARK-16533][CORE] resolve deadlocking in driver when e...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14710 **[Test build #64442 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64442/consoleFull)** for PR 14710 at commit [`380291b`](https://github.com/apache/spark/commit/380291b7122aaf1fab461a07d72f0c285696c967). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14537: [SPARK-16948][SQL] Support empty orc table when converti...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14537 Merged build finished. Test PASSed.
[GitHub] spark issue #14537: [SPARK-16948][SQL] Support empty orc table when converti...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14537 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64446/ Test PASSed.
[GitHub] spark issue #14537: [SPARK-16948][SQL] Support empty orc table when converti...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14537 **[Test build #64446 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64446/consoleFull)** for PR 14537 at commit [`fc14e2d`](https://github.com/apache/spark/commit/fc14e2d95cb95becf90a38e91e7725e483bae835). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14809: [SPARK-17238][SQL] simplify the logic for converting dat...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14809
If we do not specify the schema, it will behave as you said. For example,
```Scala
sparkSession.catalog.createExternalTable(
  "createdParquetTable",
  "parquet",
  Map.empty[String, String])
```
```
Unable to infer schema for ParquetFormat at . It must be specified manually;
org.apache.spark.sql.AnalysisException: Unable to infer schema for ParquetFormat at . It must be specified manually;
```
However, if we specify the schema, we will not get an error.
[GitHub] spark issue #14710: [SPARK-16533][CORE] resolve deadlocking in driver when e...
Github user angolon commented on the issue: https://github.com/apache/spark/pull/14710 Thanks for the feedback, @vanzin - all good points. I'll fix them up.
[GitHub] spark issue #14537: [SPARK-16948][SQL] Querying empty partitioned orc tables...
Github user rajeshbalamohan commented on the issue: https://github.com/apache/spark/pull/14537 Thanks @gatorsmile . Removed the changes related to OrcFileFormat
[GitHub] spark issue #14537: [SPARK-16948][SQL] Querying empty partitioned orc tables...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14537 **[Test build #64449 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64449/consoleFull)** for PR 14537 at commit [`9ecb2ed`](https://github.com/apache/spark/commit/9ecb2ed01db1daa19dfe837745d5468cc4990703).
[GitHub] spark pull request #14815: [SPARK-17244] Catalyst should not pushdown non-de...
Github user sameeragarwal commented on a diff in the pull request: https://github.com/apache/spark/pull/14815#discussion_r76351420
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -1386,15 +1386,17 @@ object EliminateOuterJoin extends Rule[LogicalPlan] with PredicateHelper {
 object PushPredicateThroughJoin extends Rule[LogicalPlan] with PredicateHelper {
   /**
    * Splits join condition expressions into three categories based on the attributes required
-   * to evaluate them.
+   * to evaluate them. Note that we explicitly exclude non-deterministic (i.e., stateful) condition
+   * expressions in canEvaluateInLeft or canEvaluateInRight to prevent pushing these predicates on
+   * either side of the join.
    *
    * @return (canEvaluateInLeft, canEvaluateInRight, haveToEvaluateInBoth)
    */
   private def split(condition: Seq[Expression], left: LogicalPlan, right: LogicalPlan) = {
     val (leftEvaluateCondition, rest) =
-      condition.partition(_.references subsetOf left.outputSet)
+      condition.partition(expr => expr.references.subsetOf(left.outputSet) && expr.deterministic)
--- End diff --
Good catch! I didn't realize that the relative ordering of these expressions could become an issue.
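To make the three-way split in the diff concrete, here is a minimal sketch that runs without Spark. `Pred` is a hypothetical, simplified stand-in for a Catalyst `Expression` (a name, the attributes it references, and a determinism flag), and output sets are plain `Set[String]`. It mirrors only the `partition` call shown in the diff, extended to the right side by assumption, and does not address the ordering concern raised in the review.

```scala
object SplitSketch {
  // Hypothetical, simplified stand-in for a Catalyst expression.
  final case class Pred(
      name: String,
      references: Set[String],
      deterministic: Boolean = true)

  /** Returns (canEvaluateInLeft, canEvaluateInRight, haveToEvaluateInBoth). */
  def split(
      condition: Seq[Pred],
      leftOutput: Set[String],
      rightOutput: Set[String]): (Seq[Pred], Seq[Pred], Seq[Pred]) = {
    // A predicate may be pushed to one side only if it is deterministic
    // and references nothing outside that side's output.
    val (leftOnly, rest) =
      condition.partition(p => p.references.subsetOf(leftOutput) && p.deterministic)
    val (rightOnly, both) =
      rest.partition(p => p.references.subsetOf(rightOutput) && p.deterministic)
    (leftOnly, rightOnly, both)
  }
}
```

A non-deterministic predicate with no references, such as a `rand()` filter, would otherwise qualify for the left side because the empty set is a subset of any output set. The determinism check is what keeps it in the third group.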
[GitHub] spark issue #14821: [SPARK-17250] [SQL] Remove HiveClient and setCurrentData...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14821 **[Test build #64448 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64448/consoleFull)** for PR 14821 at commit [`1c9a1e3`](https://github.com/apache/spark/commit/1c9a1e3c608be72ca7c4203ecc0e80f15080eb80).
[GitHub] spark issue #14820: [SparkR][Minor] Fix example of spark.naiveBayes
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14820 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64447/ Test PASSed.
[GitHub] spark issue #14811: [SPARK-17231][CORE] Avoid building debug or trace log me...
Github user mallman commented on the issue: https://github.com/apache/spark/pull/14811 Done
[GitHub] spark pull request #14811: [SPARK-17231][CORE] Avoid building debug or trace...
Github user mallman closed the pull request at: https://github.com/apache/spark/pull/14811
[GitHub] spark issue #14820: [SparkR][Minor] Fix example of spark.naiveBayes
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14820 Merged build finished. Test PASSed.
[GitHub] spark issue #14820: [SparkR][Minor] Fix example of spark.naiveBayes
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14820 **[Test build #64447 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64447/consoleFull)** for PR 14820 at commit [`607f117`](https://github.com/apache/spark/commit/607f1177cd7800c29ae29edc9548f820f589495d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14816: [SPARK-17245] [SQL] [BRANCH-1.6] Do not rely on Hive's s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14816 Merged build finished. Test PASSed.
[GitHub] spark issue #14816: [SPARK-17245] [SQL] [BRANCH-1.6] Do not rely on Hive's s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14816 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64443/ Test PASSed.
[GitHub] spark issue #14809: [SPARK-17238][SQL] simplify the logic for converting dat...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/14809 @gatorsmile can you explain more about this example? I think we will throw an exception in `CreateDataSourceTableCommand` when we create a `DataSource` and call its `resolveRelation`.
[GitHub] spark issue #14816: [SPARK-17245] [SQL] [BRANCH-1.6] Do not rely on Hive's s...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14816 **[Test build #64443 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64443/consoleFull)** for PR 14816 at commit [`8b57886`](https://github.com/apache/spark/commit/8b57886c0489c759f0308a7b104f5b058204cdcd). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #14821: [SPARK-17250] [SQL] Remove HiveClient and setCurr...
GitHub user gatorsmile opened a pull request: https://github.com/apache/spark/pull/14821

[SPARK-17250] [SQL] Remove HiveClient and setCurrentDatabase from HiveSessionCatalog

### What changes were proposed in this pull request?
This is the first step to remove `HiveClient` from `HiveSessionState`. When interacting with the metastore, we always use fully qualified names when accessing or operating on a table. That is, we always specify the database. Thus, it is not necessary to use `HiveClient` to change the active database in the Hive metastore.

In `HiveSessionCatalog`, `setCurrentDatabase` is the only function that uses `HiveClient`. Thus, we can remove `HiveClient` from it after removing `setCurrentDatabase`.

### How was this patch tested?
The existing test cases.

You can merge this pull request into a Git repository by running:
$ git pull https://github.com/gatorsmile/spark setCurrentDB
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14821.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14821

commit 1c9a1e3c608be72ca7c4203ecc0e80f15080eb80
Author: gatorsmile
Date: 2016-08-26T00:51:57Z
fix
[GitHub] spark pull request #14753: [SPARK-17187][SQL] Supports using arbitrary Java ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/14753
[GitHub] spark pull request #14786: [SPARK-17212][SQL] TypeCoercion supports widening...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/14786
[GitHub] spark issue #14786: [SPARK-17212][SQL] TypeCoercion supports widening conver...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/14786 thanks, merging to master!
[GitHub] spark issue #14537: [SPARK-16948][SQL] Querying empty partitioned orc tables...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14537 You might have forgotten this comment: https://github.com/apache/spark/pull/14537#discussion_r76189474
[GitHub] spark issue #14807: [Deploy, Windows]Check before adding double quotes in sp...
Github user qualiu commented on the issue: https://github.com/apache/spark/pull/14807 @srowen @tsudukim @tritab @andrewor14 : Hello, I've updated this PR with a more conservative fix; please review it, thanks! I didn't push [my former fix](https://github.com/qualiu/spark/tree/submit-cmd-all), which is now pushed, simply because the two currently have the same effect, though the former involves more line changes.
[GitHub] spark issue #14820: [SparkR][Minor] Fix example of spark.naiveBayes
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14820 **[Test build #64447 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64447/consoleFull)** for PR 14820 at commit [`607f117`](https://github.com/apache/spark/commit/607f1177cd7800c29ae29edc9548f820f589495d).
[GitHub] spark pull request #14820: [SparkR][Minor] Fix example of spark.naiveBayes
GitHub user junyangq opened a pull request: https://github.com/apache/spark/pull/14820

[SparkR][Minor] Fix example of spark.naiveBayes

## What changes were proposed in this pull request?
The original example doesn't work because the features are not categorical. This PR fixes this by changing to another dataset.

## How was this patch tested?
Manual test.

You can merge this pull request into a Git repository by running:
$ git pull https://github.com/junyangq/spark SPARK-FixNaiveBayes
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14820.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14820

commit 607f1177cd7800c29ae29edc9548f820f589495d
Author: Junyang Qian
Date: 2016-08-26T00:17:15Z
Fix example of naiveBayes.
[GitHub] spark issue #14710: [SPARK-16533][CORE] resolve deadlocking in driver when e...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/14710 Looks OK; I left a couple of minor suggestions that, from my understanding, should work now. I guess this is the next best thing without making all of these APIs properly asynchronous. Pinging @zsxwing as well in case he wants to take a look.
[GitHub] spark pull request #14710: [SPARK-16533][CORE] resolve deadlocking in driver...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/14710#discussion_r76348848
--- Diff: yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnSchedulerBackend.scala ---
@@ -269,20 +258,22 @@ private[spark] abstract class YarnSchedulerBackend(
       case AddWebUIFilter(filterName, filterParams, proxyBase) =>
         addWebUIFilter(filterName, filterParams, proxyBase)
-      case RemoveExecutor(executorId, reason) =>
+      case r @ RemoveExecutor(executorId, reason) =>
         logWarning(reason.toString)
-        removeExecutor(executorId, reason)
+        driverEndpoint.ask[Boolean](r).onFailure {
+          case e =>
+            logError(s"Error requesting driver to remove executor $executorId for reason $reason")
+        }
     }
     override def receiveAndReply(context: RpcCallContext): PartialFunction[Any, Unit] = {
       case r: RequestExecutors =>
         amEndpoint match {
           case Some(am) =>
-            Future {
-              context.reply(am.askWithRetry[Boolean](r))
-            } onFailure {
-              case NonFatal(e) =>
+            am.ask[Boolean](r).andThen {
--- End diff --
Similarly here, could you replace `askAmExecutor` with `ThreadUtils.sameThreadExecutionContext` and get rid of another thread pool?
[GitHub] spark issue #14638: [SPARK-11374][SQL] Support `skip.header.line.count` opti...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14638 Merged build finished. Test PASSed.
[GitHub] spark issue #14638: [SPARK-11374][SQL] Support `skip.header.line.count` opti...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14638 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64440/ Test PASSed.
[GitHub] spark issue #14638: [SPARK-11374][SQL] Support `skip.header.line.count` opti...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14638 **[Test build #64440 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64440/consoleFull)** for PR 14638 at commit [`3c9adb3`](https://github.com/apache/spark/commit/3c9adb37f77165d78a3cdd159c554621ddb1985d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14802: [SPARK-17235][SQL] Support purging of old logs in Metada...
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/14802 It would be great if we could reuse the code in `FileStreamSinkLog` for both `FileStreamSource` and `FileStreamSink`.
[GitHub] spark pull request #14710: [SPARK-16533][CORE] resolve deadlocking in driver...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/14710#discussion_r76347979
--- Diff: core/src/main/scala/org/apache/spark/deploy/client/StandaloneAppClient.scala ---
@@ -220,19 +225,13 @@ private[spark] class StandaloneAppClient(
         endpointRef: RpcEndpointRef,
         context: RpcCallContext,
         msg: T): Unit = {
-      // Create a thread to ask a message and reply with the result. Allow thread to be
+      // Ask a message and create a thread to reply with the result. Allow thread to be
       // interrupted during shutdown, otherwise context must be notified of NonFatal errors.
-      askAndReplyThreadPool.execute(new Runnable {
-        override def run(): Unit = {
-          try {
-            context.reply(endpointRef.askWithRetry[Boolean](msg))
-          } catch {
-            case ie: InterruptedException => // Cancelled
-            case NonFatal(t) =>
-              context.sendFailure(t)
-          }
-        }
-      })
+      endpointRef.ask[Boolean](msg).andThen {
+        case Success(b) => context.reply(b)
+        case Failure(ie: InterruptedException) => // Cancelled
+        case Failure(NonFatal(t)) => context.sendFailure(t)
+      }(askAndReplyExecutionContext)
--- End diff --
Do you need `askAndReplyExecutionContext` anymore? It seems now all the heavy lifting is being done in the RPC thread pool, and the `andThen` code could just use `ThreadUtils.sameThreadExecutionContext` since it doesn't do much.
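For readers unfamiliar with the pattern under review, here is a minimal sketch of forwarding a `Future` result through `andThen` on a same-thread execution context, using only the Scala standard library. The names `sameThread`, `RecordingContext`, and `forward` are hypothetical stand-ins for `ThreadUtils.sameThreadExecutionContext`, `RpcCallContext`, and the method in the diff; this is not Spark's implementation.

```scala
import java.util.concurrent.Executor
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration
import scala.util.{Failure, Success}
import scala.util.control.NonFatal

object AndThenSketch {
  // Hypothetical stand-in for a same-thread context: each callback runs
  // directly on whichever thread submits it, so no extra pool is needed.
  val sameThread: ExecutionContext = ExecutionContext.fromExecutor(new Executor {
    override def execute(r: Runnable): Unit = r.run()
  })

  // Hypothetical reply sink that only records what it was told.
  final class RecordingContext {
    @volatile var replied: Option[Boolean] = None
    @volatile var failed: Option[Throwable] = None
    def reply(b: Boolean): Unit = { replied = Some(b) }
    def sendFailure(t: Throwable): Unit = { failed = Some(t) }
  }

  // Forward the outcome of an asynchronous ask to the context.
  // andThen returns a future that completes after the callback has run.
  def forward(result: Future[Boolean], context: RecordingContext): Future[Boolean] =
    result.andThen {
      case Success(b) => context.reply(b)
      case Failure(_: InterruptedException) => // Cancelled: nothing to report
      case Failure(NonFatal(t)) => context.sendFailure(t)
    }(sameThread)

  def main(args: Array[String]): Unit = {
    val ctx = new RecordingContext
    Await.ready(forward(Future.successful(true), ctx), Duration(5, "seconds"))
    println(s"replied=${ctx.replied}, failed=${ctx.failed}")
  }
}
```

Because the callback only hands a value to the context, running it on the completing thread is cheap; a dedicated pool matters mainly when the callback itself blocks or does real work.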
[GitHub] spark issue #14728: [SPARK-17165][SQL] FileStreamSource should not track the...
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/14728 Looks pretty good. Just one comment about `Serializable`.
[GitHub] spark issue #14749: [SPARK-17182][SQL] Mark Collect as non-deterministic
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/14749 @rxin It doesn't fail any tests. I found this issue while working on a related code path.
[GitHub] spark issue #14537: [SPARK-16948][SQL] Querying empty partitioned orc tables...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14537 **[Test build #64446 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64446/consoleFull)** for PR 14537 at commit [`fc14e2d`](https://github.com/apache/spark/commit/fc14e2d95cb95becf90a38e91e7725e483bae835).
[GitHub] spark issue #14537: [SPARK-16948][SQL] Querying empty partitioned orc tables...
Github user rajeshbalamohan commented on the issue: https://github.com/apache/spark/pull/14537 Fixed the test case name. I haven't changed the parquet code path, as I wasn't sure whether it would break backward compatibility.
[GitHub] spark issue #14814: [SPARK-17242][Document]Update links of external dstream ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14814 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/6/ Test PASSed.
[GitHub] spark issue #14814: [SPARK-17242][Document]Update links of external dstream ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14814 Merged build finished. Test PASSed.
[GitHub] spark issue #14814: [SPARK-17242][Document]Update links of external dstream ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14814 **[Test build #6 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/6/consoleFull)** for PR 14814 at commit [`46bf9ab`](https://github.com/apache/spark/commit/46bf9ab1acf8c1f3e18afe95d971f0cb66ed0c41). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14819: [SPARK-17246][SQL] Add BigDecimal literal
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14819 **[Test build #64445 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64445/consoleFull)** for PR 14819 at commit [`fda100f`](https://github.com/apache/spark/commit/fda100f3c42bf82c9d0accafc7230c906e0b8317).
[GitHub] spark issue #14819: [SPARK-17246][SQL] Add BigDecimal literal
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/14819 cc @JoshRosen
[GitHub] spark pull request #14819: [SPARK-17246][SQL] Add BigDecimal literal
GitHub user hvanhovell opened a pull request: https://github.com/apache/spark/pull/14819

[SPARK-17246][SQL] Add BigDecimal literal

## What changes were proposed in this pull request?
This PR adds parser support for `BigDecimal` literals. If you append the suffix `BD` to a valid number, it will be interpreted as a `BigDecimal`. For example, `12.0E10BD` will be interpreted as a BigDecimal with scale -9 and precision 3. This is useful in situations where you need exact values.

## How was this patch tested?
Added tests to `ExpressionParserSuite`, `ExpressionSQLBuilderSuite` and `SQLQueryTestSuite`.

You can merge this pull request into a Git repository by running:
$ git pull https://github.com/hvanhovell/spark SPARK-17246
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14819.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14819

commit fda100f3c42bf82c9d0accafc7230c906e0b8317
Author: Herman van Hovell
Date: 2016-08-25T23:31:47Z
Add BigDecimal literal to parser.
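The scale and precision quoted for `12.0E10BD` can be checked against `java.math.BigDecimal` directly. The sketch below assumes the numeric part of the literal is parsed the way `java.math.BigDecimal`'s string constructor parses it; the `parse` helper is hypothetical and is not the parser code from the PR.

```scala
import java.math.{BigDecimal => JBigDecimal}

object BigDecimalLiteralSketch {
  // Hypothetical helper: drop an optional "BD" suffix and parse the rest exactly.
  def parse(literal: String): JBigDecimal =
    new JBigDecimal(literal.stripSuffix("BD"))

  def main(args: Array[String]): Unit = {
    val d = parse("12.0E10BD")
    // The digits "120" form the unscaled value; one fractional digit minus
    // the exponent 10 gives scale 1 - 10 = -9, and there are 3 digits.
    println(s"unscaled=${d.unscaledValue}, scale=${d.scale}, precision=${d.precision}")

    // Exactness is the point: decimal fractions add without rounding error,
    // unlike the same sum computed with binary doubles.
    val exact = new JBigDecimal("0.1").add(new JBigDecimal("0.2"))
    println(exact == new JBigDecimal("0.3")) // true
    println(0.1 + 0.2 == 0.3)                // false
  }
}
```

The value itself is 120 × 10^9, so a negative scale simply records that the three significant digits sit above the units place.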
[GitHub] spark issue #14809: [SPARK-17238][SQL] simplify the logic for converting dat...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14809 Condition 2 is not always true when condition 1 is `true`. I found an exception.
```Scala
val schema = StructType(StructField("b", StringType, true) :: Nil)
sparkSession.catalog.createExternalTable(
  "createdParquetTable",
  "parquet",
  schema,
  Map.empty[String, String])
```
I think this is a bug. Do you want to fix it in this PR, or should I fix it in another PR?