[GitHub] spark pull request: [SPARK-14998][SQL]fix ArrayIndexOutOfBoundsExc...
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/12772#issuecomment-215624817 Maybe I think the title is incomplete. It would be nicer if the title includes where (in.. where). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14997]Files in subdirectories are incor...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12774#issuecomment-215624123 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-14997]Files in subdirectories are incor...
GitHub user sbcd90 opened a pull request: https://github.com/apache/spark/pull/12774

[SPARK-14997]Files in subdirectories are incorrectly considered in sqlContext.read.json()

## What changes were proposed in this pull request?

This PR fixes the issue of "Files in subdirectories are incorrectly considered in sqlContext.read.json()".

An example,

```
xyz/file0.json
xyz/subdir1/file1.json
xyz/subdir2/file2.json
xyz/subdir1/subsubdir1/file3.json

sqlContext.read.json("xyz") should read only file0.json according to behavior in Spark 1.6.1. However in current master, all the 4 files are read.
```

## How was this patch tested?

unit tests

(If this patch involves UI changes, please attach a screenshot; otherwise, remove this)

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sbcd90/spark jsonReadIssue

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/12774.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #12774

commit a69329790648fc53d4cf8cc5be659f6ae1989046
Author: Subhobrata Dey
Date: 2016-04-29T04:25:53Z

[SPARK-14997]Files in subdirectories are incorrectly considered in sqlContext.read.json()
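As a rough illustration of the Spark 1.6.1 behavior the description refers to (only files directly under the given path are picked up), the sketch below uses plain JDK file APIs rather than Spark's own file listing code. `topLevelJsonFiles` and `buildExampleTree` are hypothetical helpers introduced here for illustration; they are not part of the patch or of Spark.

```scala
import java.io.File
import java.nio.file.Files

object TopLevelJsonListing {
  // Hypothetical helper: names of the JSON files that sit directly in `dir`.
  // Anything inside a subdirectory is ignored (no recursion).
  def topLevelJsonFiles(dir: File): Seq[String] = {
    // listFiles() returns null when `dir` is not a readable directory.
    val children = Option(dir.listFiles()).getOrElse(Array.empty[File])
    children
      .filter(f => f.isFile && f.getName.endsWith(".json"))
      .map(_.getName)
      .sorted
      .toSeq
  }

  // Recreates the layout from the PR description under a temporary directory.
  def buildExampleTree(): File = {
    val root = Files.createTempDirectory("xyz").toFile
    new File(root, "subdir1/subsubdir1").mkdirs()
    new File(root, "subdir2").mkdirs()
    Seq(
      "file0.json",
      "subdir1/file1.json",
      "subdir2/file2.json",
      "subdir1/subsubdir1/file3.json"
    ).foreach(p => new File(root, p).createNewFile())
    root
  }

  def main(args: Array[String]): Unit = {
    val root = buildExampleTree()
    // Only the file at the top level of the directory is listed.
    println(topLevelJsonFiles(root))
  }
}
```

Under this non-recursive listing, only `file0.json` is selected from the example layout, which matches the expected behavior stated in the description.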
[GitHub] spark pull request: [Minor][DOC] Minor typo fixes
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12755#issuecomment-215623845 **[Test build #2930 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2930/consoleFull)** for PR 12755 at commit [`ef6c1fb`](https://github.com/apache/spark/commit/ef6c1fbd69afe1cf8113727b323ed5275649d2bd).
[GitHub] spark pull request: [Minor][DOC] Minor typo fixes
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/12755#issuecomment-215623807 LGTM pending jenkins.
[GitHub] spark pull request: [HOTFIX][CORE] fix a concurrence issue in NewA...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12773#issuecomment-215622662 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57299/ Test FAILed.
[GitHub] spark pull request: [HOTFIX][CORE] fix a concurrence issue in NewA...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12773#issuecomment-215622661 Merged build finished. Test FAILed.
[GitHub] spark pull request: [HOTFIX][CORE] fix a concurrence issue in NewA...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12773#issuecomment-215622600 **[Test build #57299 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57299/consoleFull)** for PR 12773 at commit [`ea2ba20`](https://github.com/apache/spark/commit/ea2ba20a1141f1a5ac89a780906b0ef90b40fc80).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [Spark-14976][Streaming] make StreamingContext...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12752#issuecomment-215622100 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57301/ Test PASSed.
[GitHub] spark pull request: [Spark-14976][Streaming] make StreamingContext...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12752#issuecomment-215622098 Merged build finished. Test PASSed.
[GitHub] spark pull request: [Spark-14976][Streaming] make StreamingContext...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12752#issuecomment-215621978 **[Test build #57301 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57301/consoleFull)** for PR 12752 at commit [`f1d14bd`](https://github.com/apache/spark/commit/f1d14bd0f9d1b1f572b5c850f67a51e094c9f331).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14996][SQL] Add TPCDS Benchmark Queries...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12771#issuecomment-215621546 **[Test build #57307 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57307/consoleFull)** for PR 12771 at commit [`461ab81`](https://github.com/apache/spark/commit/461ab81adbc76f2d04ab5aed46b7ebb24cf5c7af).
[GitHub] spark pull request: [SPARK-14996][SQL] Add TPCDS Benchmark Queries...
Github user sameeragarwal commented on the pull request: https://github.com/apache/spark/pull/12771#issuecomment-215621056 test this please
[GitHub] spark pull request: [SPARK-14988][PYTHON] SparkSession catalog and...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12765#issuecomment-215620953 **[Test build #57306 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57306/consoleFull)** for PR 12765 at commit [`923b92a`](https://github.com/apache/spark/commit/923b92aee7220ec2f2960080853ce8af6d8f51a2).
[GitHub] spark pull request: [SPARK-14939][SQL] Improve EliminateSorts opti...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/12719#issuecomment-215620961 Right. Thank you so much for enriching ideas! I'll update this PR with `FoldablePropagation`.
[GitHub] spark pull request: [SPARK-14988][PYTHON] SparkSession catalog and...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/12765#issuecomment-215620780 retest this please
[GitHub] spark pull request: [SPARK-14939][SQL] Improve EliminateSorts opti...
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/12719#issuecomment-215620606 The scope of `NullPropagation` is one operator, but we need a `FoldablePropagation` whose scope is the whole plan tree. Think about `Sort(a, Filter(true, Project(1 AS a)))`: we should be able to propagate the foldable information up.
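The idea in this comment can be sketched on a toy plan tree. The code below is only an illustration under stated assumptions: the `Expr` and `Plan` case classes, `foldables`, `substitute`, `propagate` and `eliminateSorts` are hypothetical stand-ins written for this example, not Catalyst's classes and not the rule that was eventually added to the PR. It shows a constant bound in a `Project` being carried up through a `Filter` into a `Sort`, after which a sort on a constant can be dropped.

```scala
object FoldablePropagationSketch {
  // Toy expression and plan types; illustrative stand-ins, not Catalyst classes.
  sealed trait Expr
  case class Literal(value: Any) extends Expr
  case class Attr(name: String) extends Expr
  case class Alias(child: Expr, name: String) extends Expr

  sealed trait Plan
  case class Relation(name: String) extends Plan
  case class Project(exprs: Seq[Expr], child: Plan) extends Plan
  case class Filter(cond: Expr, child: Plan) extends Plan
  case class Sort(order: Seq[Expr], child: Plan) extends Plan

  // Output names of `plan` that are known to be bound to a constant.
  def foldables(plan: Plan): Map[String, Literal] = plan match {
    case Project(exprs, _) => exprs.collect { case Alias(l: Literal, n) => n -> l }.toMap
    case Filter(_, child)  => foldables(child)
    case Sort(_, child)    => foldables(child)
    case Relation(_)       => Map.empty
  }

  // Replace attribute references with the constant they are bound to.
  def substitute(e: Expr, env: Map[String, Literal]): Expr = e match {
    case Attr(n) if env.contains(n) => env(n)
    case Alias(c, n)                => Alias(substitute(c, env), n)
    case other                      => other
  }

  // Bottom-up pass over the whole tree, not just one operator.
  def propagate(plan: Plan): Plan = plan match {
    case Sort(order, child) =>
      val c = propagate(child)
      Sort(order.map(substitute(_, foldables(c))), c)
    case Filter(cond, child) =>
      val c = propagate(child)
      Filter(substitute(cond, foldables(c)), c)
    case Project(exprs, child) =>
      val c = propagate(child)
      val env = foldables(c)
      Project(exprs.map {
        // Keep the alias so the output column keeps its name.
        case Attr(n) if env.contains(n) => Alias(env(n), n)
        case other                      => substitute(other, env)
      }, c)
    case r: Relation => r
  }

  // Drop sort keys that are constants; drop the Sort itself if none remain.
  def eliminateSorts(plan: Plan): Plan = plan match {
    case Sort(order, child) =>
      val kept = order.filterNot(_.isInstanceOf[Literal])
      val c = eliminateSorts(child)
      if (kept.isEmpty) c else Sort(kept, c)
    case Filter(cond, child)   => Filter(cond, eliminateSorts(child))
    case Project(exprs, child) => Project(exprs, eliminateSorts(child))
    case r: Relation           => r
  }

  def main(args: Array[String]): Unit = {
    // Sort(a, Filter(true, Project(1 AS a))) from the comment above.
    val plan = Sort(Seq(Attr("a")),
      Filter(Literal(true), Project(Seq(Alias(Literal(1), "a")), Relation("tbl"))))
    println(eliminateSorts(propagate(plan)))
  }
}
```

Splitting the work into a propagation pass and a separate sort-elimination pass mirrors the suggestion in the thread: once the sort key has become a literal, an existing "remove foldable sort keys" case is enough to remove it.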
[GitHub] spark pull request: [SPARK-14939][SQL] Improve EliminateSorts opti...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/12719#issuecomment-215620522 Oh, I got it. Thanks. I will try to generalize.
* Sort(_, Project(_))
* Project(_, Project(...))

And so on.
[GitHub] spark pull request: [SPARK-14939][SQL] Improve EliminateSorts opti...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/12719#issuecomment-215620369 If that's just about how to handle `Sort(_, Project(_,_))` expressions in `EliminateSorts`, I can easily modify this PR according to your advice. After moving up the foldables, the existing `case` statement removes them eventually.
[GitHub] spark pull request: [SPARK-14996][SQL] Add TPCDS Benchmark Queries...
Github user sameeragarwal commented on a diff in the pull request: https://github.com/apache/spark/pull/12771#discussion_r61532083

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/TPCDSBenchmark.scala ---
@@ -0,0 +1,1225 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.parquet
+
+import org.apache.spark.{SparkConf, SparkContext}
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.SQLContext
+import org.apache.spark.sql.catalyst.TableIdentifier
+import org.apache.spark.sql.catalyst.analysis.UnresolvedRelation
+import org.apache.spark.util.Benchmark
+
+/**
+ * Benchmark to measure TPCDS query performance.
+ * To run this:
+ * spark-submit --class --jars
+ */
+object TPCDSBenchmark {
+  val conf = new SparkConf()
+  conf.set("spark.sql.parquet.compression.codec", "snappy")
+  conf.set("spark.sql.shuffle.partitions", "4")
+  conf.set("spark.driver.memory", "3g")
+  conf.set("spark.executor.memory", "3g")
+  conf.set("spark.sql.autoBroadcastJoinThreshold", (20 * 1024 * 1024).toString)
+
+  val sc = new SparkContext("local[1]", "test-sql-context", conf)
+  val sqlContext = new SQLContext(sc)
+
+  // These queries a subset of the TPCDS benchmark queries and are taken from
+  // https://github.com/databricks/spark-sql-perf/blob/master/src/main/scala/com/databricks/spark/
+  // sql/perf/tpcds/ImpalaKitQueries.scala
+  val tpcds = Seq(
+    ("q19", """
+      |select
+      |  i_brand_id,
+      |  i_brand,
+      |  i_manufact_id,
+      |  i_manufact,
+      |  sum(ss_ext_sales_price) ext_price
+      |from
+      |  store_sales
+      |  join item on (store_sales.ss_item_sk = item.i_item_sk)
+      |  join store on (store_sales.ss_store_sk = store.s_store_sk)
+      |  join date_dim on (store_sales.ss_sold_date_sk = date_dim.d_date_sk)
+      |  join customer on (store_sales.ss_customer_sk = customer.c_customer_sk)
+      |  join customer_address on
+      |    (customer.c_current_addr_sk = customer_address.ca_address_sk)
+      |where
+      |  ss_sold_date_sk between 2451484 and 2451513
+      |  and d_moy = 11
+      |  and d_year = 1999
+      |  and i_manager_id = 7
+      |  and substr(ca_zip, 1, 5) <> substr(s_zip, 1, 5)
+      |group by
+      |  i_brand,
+      |  i_brand_id,
+      |  i_manufact_id,
+      |  i_manufact
+      |order by
+      |  ext_price desc,
+      |  i_brand,
+      |  i_brand_id,
+      |  i_manufact_id,
+      |  i_manufact
+      |limit 100
+    """.stripMargin),
+
+/*
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_73-b02 on Mac OS X 10.11.4
+Intel(R) Core(TM) i7-4960HQ CPU @ 2.60GHz
+
+TPCDS Snappy (scale = 5):    Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+-----------------------------------------------------------------------------------
+q19                                1710 / 1858          8.7         114.5       1.0X
+ */
+
+    ("q27", """
+      |select
+      |  i_item_id,
+      |  s_state,
+      |  avg(ss_quantity) agg1,
+      |  avg(ss_list_price) agg2,
+      |  avg(ss_coupon_amt) agg3,
+      |  avg(ss_sales_price) agg4
+      |from
+      |  store_sales
+      |  join store on (store_sales.ss_store_sk = store.s_store_sk)
+      |  join customer_demographics on
+      |    (store_sales.ss_cdemo_sk = customer_demographics.cd_demo_sk)
+
[GitHub] spark pull request: [SPARK-3767] [CORE] Support wildcard in Spark ...
Github user devaraj-kavali commented on the pull request: https://github.com/apache/spark/pull/12753#issuecomment-215619844 Thanks @rxin for checking this. I don't think @ is used anywhere. Here again, we replace only within the 'spark.executor.extraJavaOptions' value, and only where @execid@ occurs; any other @ symbols are left as they are, so I don't think this causes any problem.
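The substitution described here can be pictured as a literal token replacement. The following is a minimal sketch, assuming the placeholder is exactly the string `@execid@`; `substituteExecId` and the sample option string are hypothetical and are not taken from the patch.

```scala
object ExecIdSubstitution {
  // Placeholder token named in the comment above.
  val Placeholder = "@execid@"

  // Hypothetical helper: replaces every literal occurrence of the placeholder
  // with the executor id. String.replace does a plain (non-regex) replacement,
  // so any other '@' characters in the options are left untouched.
  def substituteExecId(javaOpts: String, executorId: String): String =
    javaOpts.replace(Placeholder, executorId)

  def main(args: Array[String]): Unit = {
    // Sample value for illustration only.
    val opts = "-Xloggc:/tmp/gc-@execid@.log -Downer=me@example.com"
    println(substituteExecId(opts, "7"))
  }
}
```

Because only the full `@execid@` token is matched, a value such as `me@example.com` in the same option string is not affected, which is the point made in the comment.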
[GitHub] spark pull request: [SPARK-12919][SPARKR] Implement dapply() on Da...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12493#issuecomment-215619600 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57296/ Test FAILed.
[GitHub] spark pull request: [SPARK-12919][SPARKR] Implement dapply() on Da...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12493#issuecomment-215619599 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-12919][SPARKR] Implement dapply() on Da...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12493#issuecomment-215619544 **[Test build #57296 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57296/consoleFull)** for PR 12493 at commit [`3efe9f5`](https://github.com/apache/spark/commit/3efe9f5f067bf66d35c1c8243d00f2f1fdb4e6f9).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14939][SQL] Improve EliminateSorts opti...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/12719#issuecomment-215619413 Actually, `Sort` is a dead end; we cannot propagate up any further. So, in that case, removing looks more efficient. Do you mean a more generalized `FoldablePropagation`, like `NullPropagation`, that covers 'not only Sort'?
[GitHub] spark pull request: [SPARK-14987] [SQL] inline hive-service (cli) ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12764#issuecomment-215619043 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-14987] [SQL] inline hive-service (cli) ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12764#issuecomment-215619044 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57305/ Test FAILed.
[GitHub] spark pull request: [SPARK-14987] [SQL] inline hive-service (cli) ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12764#issuecomment-215619040 **[Test build #57305 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57305/consoleFull)** for PR 12764 at commit [`dc010bc`](https://github.com/apache/spark/commit/dc010bcb8520e16a6d174ec04df4b6e0f3a3589d).
* This patch **fails build dependency tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14987] [SQL] inline hive-service (cli) ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12764#issuecomment-215618895 **[Test build #57305 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57305/consoleFull)** for PR 12764 at commit [`dc010bc`](https://github.com/apache/spark/commit/dc010bcb8520e16a6d174ec04df4b6e0f3a3589d).
[GitHub] spark pull request: [SPARK-14939][SQL] Improve EliminateSorts opti...
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/12719#issuecomment-215618718 `select 1 as a from tbl order by a` is equal to `select 1 as a from tbl order by 1`. When the child operator is `Project` and has foldable output, if the parent operator references the foldable output, we should replace the attribute with the real foldable expression in `Project`. (and keep the alias to preserve the naming info)
[GitHub] spark pull request: [SPARK-14987] [SQL] inline hive-service (cli) ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12764#issuecomment-215618646 **[Test build #57304 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57304/consoleFull)** for PR 12764 at commit [`769aaa0`](https://github.com/apache/spark/commit/769aaa0daecdc00b7e23cf9a215a65f10d9f9dbe).
* This patch **fails build dependency tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14987] [SQL] inline hive-service (cli) ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12764#issuecomment-215618649 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57304/ Test FAILed.
[GitHub] spark pull request: [SPARK-14987] [SQL] inline hive-service (cli) ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12764#issuecomment-215618648 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-14987] [SQL] inline hive-service (cli) ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12764#issuecomment-215618497 **[Test build #57304 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57304/consoleFull)** for PR 12764 at commit [`769aaa0`](https://github.com/apache/spark/commit/769aaa0daecdc00b7e23cf9a215a65f10d9f9dbe).
[GitHub] spark pull request: [SPARK-14991][SQL] Remove HiveNativeCommand
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12769#issuecomment-215618498 **[Test build #57303 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57303/consoleFull)** for PR 12769 at commit [`308a896`](https://github.com/apache/spark/commit/308a89682624967a0c65985585adb951cbad5d4c).
[GitHub] spark pull request: [SPARK-14987] [SQL] inline hive-service (cli) ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12764#issuecomment-215618220 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-14987] [SQL] inline hive-service (cli) ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12764#issuecomment-215618223 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57302/ Test FAILed.
[GitHub] spark pull request: [SPARK-14987] [SQL] inline hive-service (cli) ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12764#issuecomment-215618215 **[Test build #57302 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57302/consoleFull)** for PR 12764 at commit [`daeddde`](https://github.com/apache/spark/commit/daeddde9dde084ebcb79e60ce95bc600f3d23164). * This patch **fails build dependency tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14991][SQL] Remove HiveNativeCommand
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/12769#issuecomment-215618149 test this please
[GitHub] spark pull request: [SPARK-14987] [SQL] inline hive-service (cli) ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12764#issuecomment-215618028 **[Test build #57302 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57302/consoleFull)** for PR 12764 at commit [`daeddde`](https://github.com/apache/spark/commit/daeddde9dde084ebcb79e60ce95bc600f3d23164).
[GitHub] spark pull request: [SPARK-14991][SQL] Remove HiveNativeCommand
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12769#issuecomment-215617795 **[Test build #2924 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2924/consoleFull)** for PR 12769 at commit [`308a896`](https://github.com/apache/spark/commit/308a89682624967a0c65985585adb951cbad5d4c). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [Spark-14976][Streaming] make StreamingContext...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12752#issuecomment-215617596 **[Test build #57301 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57301/consoleFull)** for PR 12752 at commit [`f1d14bd`](https://github.com/apache/spark/commit/f1d14bd0f9d1b1f572b5c850f67a51e094c9f331).
[GitHub] spark pull request: [Minor][DOC] Minor typo fixes
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12755#issuecomment-215617450 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57297/ Test FAILed.
[GitHub] spark pull request: [Minor][DOC] Minor typo fixes
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12755#issuecomment-215617449 Merged build finished. Test FAILed.
[GitHub] spark pull request: [Minor][DOC] Minor typo fixes
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12755#issuecomment-215617401 **[Test build #57297 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57297/consoleFull)** for PR 12755 at commit [`ef6c1fb`](https://github.com/apache/spark/commit/ef6c1fbd69afe1cf8113727b323ed5275649d2bd). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [Spark-14976][Streaming] make StreamingContext...
Github user mwws commented on the pull request: https://github.com/apache/spark/pull/12752#issuecomment-215617286 Jenkins, retest this please
[GitHub] spark pull request: [Spark-14976][Streaming] make StreamingContext...
Github user mwws commented on the pull request: https://github.com/apache/spark/pull/12752#issuecomment-215617172 The failed test is not related to my change (I think PR #12416 broke the Spark CI).
[GitHub] spark pull request: [HOTFIX][CORE] fix a concurrence issue in NewA...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12773#issuecomment-215616890 **[Test build #2928 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2928/consoleFull)** for PR 12773 at commit [`ea2ba20`](https://github.com/apache/spark/commit/ea2ba20a1141f1a5ac89a780906b0ef90b40fc80).
[GitHub] spark pull request: [SPARK-14716][SQL] Added support for partition...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12409#issuecomment-215616915 **[Test build #2922 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2922/consoleFull)** for PR 12409 at commit [`acb8e92`](https://github.com/apache/spark/commit/acb8e92c6e4952ef6ea02d10ece79f32ba41c249). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14939][SQL] Improve EliminateSorts opti...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/12719#issuecomment-215616875 Thank you for the review, @cloud-fan! Do you mean removing aliases by replacing them with the base expression (?) using `transformUp`? Perhaps all except the topmost aliases?
[GitHub] spark pull request: [HOTFIX][CORE] fix a concurrence issue in NewA...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12773#issuecomment-215616903 **[Test build #2929 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2929/consoleFull)** for PR 12773 at commit [`ea2ba20`](https://github.com/apache/spark/commit/ea2ba20a1141f1a5ac89a780906b0ef90b40fc80).
[GitHub] spark pull request: [HOTFIX][CORE] fix a concurrence issue in NewA...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12773#issuecomment-215616837 **[Test build #2925 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2925/consoleFull)** for PR 12773 at commit [`ea2ba20`](https://github.com/apache/spark/commit/ea2ba20a1141f1a5ac89a780906b0ef90b40fc80).
[GitHub] spark pull request: [HOTFIX][CORE] fix a concurrence issue in NewA...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12773#issuecomment-215616866 **[Test build #2927 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2927/consoleFull)** for PR 12773 at commit [`ea2ba20`](https://github.com/apache/spark/commit/ea2ba20a1141f1a5ac89a780906b0ef90b40fc80).
[GitHub] spark pull request: [HOTFIX][CORE] fix a concurrence issue in NewA...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12773#issuecomment-215616838 **[Test build #57300 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57300/consoleFull)** for PR 12773 at commit [`ea2ba20`](https://github.com/apache/spark/commit/ea2ba20a1141f1a5ac89a780906b0ef90b40fc80).
[GitHub] spark pull request: [HOTFIX][CORE] fix a concurrence issue in NewA...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12773#issuecomment-215616855 **[Test build #2926 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2926/consoleFull)** for PR 12773 at commit [`ea2ba20`](https://github.com/apache/spark/commit/ea2ba20a1141f1a5ac89a780906b0ef90b40fc80).
[GitHub] spark pull request: [HOTFIX][CORE] fix a concurrence issue in NewA...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/12773#issuecomment-215616619 test this please
[GitHub] spark pull request: [SPARK-14996][SQL] Add TPCDS Benchmark Queries...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12771#issuecomment-215616450 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57295/ Test FAILed.
[GitHub] spark pull request: [SPARK-14996][SQL] Add TPCDS Benchmark Queries...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12771#issuecomment-215616449 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-14996][SQL] Add TPCDS Benchmark Queries...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12771#issuecomment-215616380 **[Test build #57295 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57295/consoleFull)** for PR 12771 at commit [`461ab81`](https://github.com/apache/spark/commit/461ab81adbc76f2d04ab5aed46b7ebb24cf5c7af). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * ` | and i_class in('personal', 'portable', 'reference', 'self-help')` * ` | and i_class in('accessories', 'classical', 'fragrances', 'pants')`
[GitHub] spark pull request: [SPARK-14996][SQL] Add TPCDS Benchmark Queries...
Github user davies commented on a diff in the pull request:
https://github.com/apache/spark/pull/12771#discussion_r61530661
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/TPCDSBenchmark.scala ---
@@ -0,0 +1,1225 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.parquet
+
+import org.apache.spark.{SparkConf, SparkContext}
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.SQLContext
+import org.apache.spark.sql.catalyst.TableIdentifier
+import org.apache.spark.sql.catalyst.analysis.UnresolvedRelation
+import org.apache.spark.util.Benchmark
+
+/**
+ * Benchmark to measure TPCDS query performance.
+ * To run this:
+ * spark-submit --class --jars
+ */
+object TPCDSBenchmark {
+  val conf = new SparkConf()
+  conf.set("spark.sql.parquet.compression.codec", "snappy")
+  conf.set("spark.sql.shuffle.partitions", "4")
+  conf.set("spark.driver.memory", "3g")
+  conf.set("spark.executor.memory", "3g")
+  conf.set("spark.sql.autoBroadcastJoinThreshold", (20 * 1024 * 1024).toString)
+
+  val sc = new SparkContext("local[1]", "test-sql-context", conf)
+  val sqlContext = new SQLContext(sc)
+
+  // These queries a subset of the TPCDS benchmark queries and are taken from
+  // https://github.com/databricks/spark-sql-perf/blob/master/src/main/scala/com/databricks/spark/
+  // sql/perf/tpcds/ImpalaKitQueries.scala
+  val tpcds = Seq(
+    ("q19", """
+      |select
+      |  i_brand_id,
+      |  i_brand,
+      |  i_manufact_id,
+      |  i_manufact,
+      |  sum(ss_ext_sales_price) ext_price
+      |from
+      |  store_sales
+      |  join item on (store_sales.ss_item_sk = item.i_item_sk)
+      |  join store on (store_sales.ss_store_sk = store.s_store_sk)
+      |  join date_dim on (store_sales.ss_sold_date_sk = date_dim.d_date_sk)
+      |  join customer on (store_sales.ss_customer_sk = customer.c_customer_sk)
+      |  join customer_address on
+      |    (customer.c_current_addr_sk = customer_address.ca_address_sk)
+      |where
+      |  ss_sold_date_sk between 2451484 and 2451513
+      |  and d_moy = 11
+      |  and d_year = 1999
+      |  and i_manager_id = 7
+      |  and substr(ca_zip, 1, 5) <> substr(s_zip, 1, 5)
+      |group by
+      |  i_brand,
+      |  i_brand_id,
+      |  i_manufact_id,
+      |  i_manufact
+      |order by
+      |  ext_price desc,
+      |  i_brand,
+      |  i_brand_id,
+      |  i_manufact_id,
+      |  i_manufact
+      |limit 100
+    """.stripMargin),
+
+/*
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_73-b02 on Mac OS X 10.11.4
+Intel(R) Core(TM) i7-4960HQ CPU @ 2.60GHz
+
+TPCDS Snappy (scale = 5):    Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+-------------------------------------------------------------------------------
+q19                               1710 / 1858          8.7         114.5       1.0X
+ */
+
+    ("q27", """
+      |select
+      |  i_item_id,
+      |  s_state,
+      |  avg(ss_quantity) agg1,
+      |  avg(ss_list_price) agg2,
+      |  avg(ss_coupon_amt) agg3,
+      |  avg(ss_sales_price) agg4
+      |from
+      |  store_sales
+      |  join store on (store_sales.ss_store_sk = store.s_store_sk)
+      |  join customer_demographics on
+      |    (store_sales.ss_cdemo_sk = customer_demographics.cd_demo_sk)
+      |  join
[GitHub] spark pull request: [SPARK-14613][ML] Add @Since into the matrix a...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/12416#issuecomment-215615904 I have reverted this commit.
[GitHub] spark pull request: [SPARK-13568] [ML] Create feature transformer ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11601#issuecomment-215615452 **[Test build #57298 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57298/consoleFull)** for PR 11601 at commit [`aef094b`](https://github.com/apache/spark/commit/aef094bc7b7a00c0ded1b2998b7f98d2bc42c666). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13568] [ML] Create feature transformer ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11601#issuecomment-215615467 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57298/ Test FAILed.
[GitHub] spark pull request: [SPARK-13568] [ML] Create feature transformer ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11601#issuecomment-215615465 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-14613][ML] Add @Since into the matrix a...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/12416#issuecomment-215615493 Sorry, I am going to revert it. I believe it breaks the build. It seems those build changes are not related to adding the `@Since` tag.
[GitHub] spark pull request: [SPARK-14613][ML] Add @Since into the matrix a...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/12416#issuecomment-215615096 Looks like this one breaks the PR builder? https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57289/testReport/org.apache.spark.network/ChunkFetchIntegrationSuite/ChunkFetchIntegrationSuite/
[GitHub] spark pull request: [SPARK-14991][SQL] Remove HiveNativeCommand
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12769#issuecomment-215615011 **[Test build #2924 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2924/consoleFull)** for PR 12769 at commit [`308a896`](https://github.com/apache/spark/commit/308a89682624967a0c65985585adb951cbad5d4c).
[GitHub] spark pull request: [SPARK-14939][SQL] Improve EliminateSorts opti...
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/12719#issuecomment-215614987 Instead of doing this, can we propagate foldable aliases bottom-up, so that not only `Sort` but all operators can benefit from it?
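For readers following the thread, the bottom-up idea in the comment above might be sketched roughly as follows. This is a toy model only: `Expr`, `Plan`, `Project`, `Sort`, `Scan` and `FoldableAliasSketch` are hypothetical stand-ins written for this illustration, not Catalyst classes, and the sketch does not claim to match what the PR eventually implemented.

```scala
// Toy expression tree (illustrative stand-ins, NOT Spark Catalyst classes).
sealed trait Expr
case class Lit(value: Int) extends Expr
case class Attr(name: String) extends Expr
case class Add(left: Expr, right: Expr) extends Expr

// Toy logical plan: a Project defines aliases (name -> expression) over its child.
sealed trait Plan
case class Scan(columns: Seq[String]) extends Plan
case class Project(aliases: Seq[(String, Expr)], child: Plan) extends Plan
case class Sort(order: Seq[Expr], child: Plan) extends Plan

object FoldableAliasSketch {
  // An expression is foldable here when it contains no attribute references.
  def foldable(e: Expr): Boolean = e match {
    case Lit(_)    => true
    case Attr(_)   => false
    case Add(l, r) => foldable(l) && foldable(r)
  }

  // Replace attribute references that name a known foldable alias.
  def substitute(e: Expr, env: Map[String, Expr]): Expr = e match {
    case Attr(n)   => env.getOrElse(n, e)
    case Add(l, r) => Add(substitute(l, env), substitute(r, env))
    case lit       => lit
  }

  // Visit children first (bottom-up). Returns the rewritten plan together with
  // the foldable aliases that are visible in that plan's output.
  def propagate(plan: Plan): (Plan, Map[String, Expr]) = plan match {
    case s: Scan => (s, Map.empty)
    case Project(aliases, child) =>
      val (newChild, env) = propagate(child)
      val rewritten = aliases.map { case (n, e) => n -> substitute(e, env) }
      val out = rewritten.collect { case (n, e) if foldable(e) => n -> e }.toMap
      (Project(rewritten, newChild), out)
    case Sort(order, child) =>
      val (newChild, env) = propagate(child)
      (Sort(order.map(substitute(_, env)), newChild), env)
  }
}
```

In this toy model, a `Sort` on an alias defined as a literal ends up sorting on the literal itself, which a later rule could then drop; any other operator added to the model would get the same substitution from the shared environment.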
[GitHub] spark pull request: [SPARK-14850][ML] convert primitive array from...
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/12640#discussion_r61529874
--- Diff: mllib/src/test/scala/org/apache/spark/mllib/linalg/UDTSerializationBenchmark.scala ---
@@ -0,0 +1,70 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.linalg
+
+import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder
+import org.apache.spark.util.Benchmark
+
+/**
+ * Serialization benchmark for VectorUDT.
+ */
+object UDTSerializationBenchmark {
+
+  def main(args: Array[String]): Unit = {
+    val iters = 1e2.toInt
+    val numRows = 1e3.toInt
+
+    val encoder = ExpressionEncoder[Vector].defaultBinding
+
+    val vectors = (1 to numRows).map { i =>
+      Vectors.dense(Array.fill(1e5.toInt)(1.0 * i))
+    }.toArray
+    val rows = vectors.map(encoder.toRow)
+
+    val benchmark = new Benchmark("VectorUDT de/serialization", numRows, iters)
+
+    benchmark.addCase("serialize") { _ =>
+      var sum = 0
+      var i = 0
+      while (i < numRows) {
+        sum += encoder.toRow(vectors(i)).numFields
--- End diff --
It's different: `VectorUDT.serialize` only turns the user object into catalyst data, but the real serialization should also include converting the catalyst data into unsafe format.
[GitHub] spark pull request: [HOTFIX][CORE] fix a concurrence issue in NewA...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12773#issuecomment-215614070 **[Test build #57299 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57299/consoleFull)** for PR 12773 at commit [`ea2ba20`](https://github.com/apache/spark/commit/ea2ba20a1141f1a5ac89a780906b0ef90b40fc80).
[GitHub] spark pull request: [HOTFIX][CORE] fix a concurrence issue in NewA...
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/12773#issuecomment-215613977 cc @rxin @davies @yhuai
[GitHub] spark pull request: [HOTFIX][CORE] fix a concurrence issue in NewA...
GitHub user cloud-fan opened a pull request: https://github.com/apache/spark/pull/12773 [HOTFIX][CORE] fix a concurrence issue in NewAccumulator ## What changes were proposed in this pull request? `AccumulatorContext` is not thread-safe, which is why all of its methods are synchronized. However, there is one exception: `AccumulatorContext.originals`. `NewAccumulator` uses it to check whether it is registered, which is wrong because that access is not synchronized. This PR marks `AccumulatorContext.originals` as `private`, so all access to `AccumulatorContext` is now synchronized. ## How was this patch tested? I verified it locally. To be safe, we can let Jenkins test it many times to make sure the problem is gone. You can merge this pull request into a Git repository by running: $ git pull https://github.com/cloud-fan/spark debug Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/12773.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #12773 commit bf04ffd3734c9a64a8eb4a5d4e0b24b2d199c204 Author: Wenchen Fan Date: 2016-04-29T02:25:00Z fix a concurrence bug
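The idea in the description above (keep the shared map private and route every read and write through the same lock) can be sketched in a few lines. This is only an illustration in Python, not Spark's Scala code: `AccumulatorRegistry`, `register`, `is_registered`, and `size` are hypothetical names chosen for the example.

```python
import threading


class AccumulatorRegistry:
    """Hypothetical stand-in for a registry such as AccumulatorContext (not Spark's code)."""

    def __init__(self):
        self._lock = threading.Lock()
        # Private by convention: callers never touch this dict directly,
        # so no access can bypass the lock.
        self._originals = {}

    def register(self, acc_id, acc):
        with self._lock:
            self._originals[acc_id] = acc

    def is_registered(self, acc_id):
        # The membership check takes the same lock as the writers.
        with self._lock:
            return acc_id in self._originals

    def size(self):
        with self._lock:
            return len(self._originals)


def register_many(registry, start, count):
    # Each worker registers a disjoint range of ids.
    for acc_id in range(start, start + count):
        registry.register(acc_id, object())


registry = AccumulatorRegistry()
threads = [
    threading.Thread(target=register_many, args=(registry, n * 100, 100))
    for n in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Exposing only `is_registered` instead of the underlying dict means a caller cannot perform the unsynchronized check that the PR description calls out.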
[GitHub] spark pull request: [SPARK-14987] [SQL] inline hive-service (cli) ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12764#issuecomment-215613711 **[Test build #2921 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2921/consoleFull)** for PR 12764 at commit [`a07440b`](https://github.com/apache/spark/commit/a07440bd312eaa6c618e303f96420f3e5c09bd8c). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13568] [ML] Create feature transformer ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11601#issuecomment-215613222 **[Test build #57298 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57298/consoleFull)** for PR 11601 at commit [`aef094b`](https://github.com/apache/spark/commit/aef094bc7b7a00c0ded1b2998b7f98d2bc42c666).
[GitHub] spark pull request: [SPARK-14988][PYTHON] SparkSession catalog and...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/12765#discussion_r61529274 --- Diff: python/pyspark/sql/catalog.py --- @@ -0,0 +1,426 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +from collections import namedtuple + +from pyspark import since +from pyspark.rdd import ignore_unicode_prefix +from pyspark.sql.dataframe import DataFrame +from pyspark.sql.functions import UserDefinedFunction +from pyspark.sql.types import IntegerType, StringType, StructType + + +Database = namedtuple("Database", "name description locationUri") +Table = namedtuple("Table", "name database description tableType isTemporary") +Column = namedtuple("Column", "name description dataType nullable isPartition isBucket") +Function = namedtuple("Function", "name description className isTemporary") + + +class Catalog(object): +"""User-facing catalog API, accessible through `SparkSession.catalog`. + +This is a thin wrapper around its Scala implementation org.apache.spark.sql.catalog.Catalog. 
+""" + +def __init__(self, sparkSession): +"""Create a new Catalog that wraps the underlying JVM object.""" +self._sparkSession = sparkSession +self._jsparkSession = sparkSession._jsparkSession +self._jcatalog = sparkSession._jsparkSession.catalog() + +@ignore_unicode_prefix +@since(2.0) +def currentDatabase(self): +"""Returns the current default database in this session. + +>>> spark.catalog._reset() +>>> spark.catalog.currentDatabase() +u'default' +""" +return self._jcatalog.currentDatabase() + +@ignore_unicode_prefix +@since(2.0) +def setCurrentDatabase(self, dbName): +"""Sets the current default database in this session. + +>>> spark.catalog._reset() +>>> spark.sql("CREATE DATABASE some_db") +DataFrame[] +>>> spark.catalog.setCurrentDatabase("some_db") +>>> spark.catalog.currentDatabase() +u'some_db' +>>> spark.catalog.setCurrentDatabase("does_not_exist") # doctest: +IGNORE_EXCEPTION_DETAIL +Traceback (most recent call last): +... +AnalysisException: ... +""" +return self._jcatalog.setCurrentDatabase(dbName) + +@ignore_unicode_prefix +@since(2.0) +def listDatabases(self): +"""Returns a list of databases available across all sessions. + +>>> spark.catalog._reset() +>>> [db.name for db in spark.catalog.listDatabases()] +[u'default'] +>>> spark.sql("CREATE DATABASE some_db") +DataFrame[] +>>> [db.name for db in spark.catalog.listDatabases()] +[u'default', u'some_db'] +""" +iter = self._jcatalog.listDatabases().toLocalIterator() +databases = [] +while iter.hasNext(): +jdb = iter.next() +databases.append(Database( +name=jdb.name(), +description=jdb.description(), +locationUri=jdb.locationUri())) +return databases + +@ignore_unicode_prefix +@since(2.0) +def listTables(self, dbName=None): +"""Returns a list of tables in the specified database. + +If no database is specified, the current database is used. +This includes all temporary tables. 
+ +>>> spark.catalog._reset() +>>> spark.sql("CREATE DATABASE some_db") +DataFrame[] +>>> spark.catalog.listTables() +[] +>>> spark.catalog.listTables("some_db") +[] +>>> spark.createDataFrame([(1, 1)]).registerTempTable("my_temp_tab") +>>> spark.sql("CREATE TABLE my_tab1 (name STRING, age INT)") +DataFrame[] +>>> spark.s
[GitHub] spark pull request: [SPARK-14988][PYTHON] SparkSession catalog and...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/12765#discussion_r61529260 --- Diff: python/pyspark/sql/session.py --- @@ -121,6 +121,19 @@ def newSession(self): """ return self.__class__(self._sc, self._jsparkSession.newSession()) +@property +@since(2.0) +def conf(self): +"""Runtime configuration interface for Spark. + +This is the interface through which the user can get and set all Spark and Hadoop +configurations that are relevant to Spark SQL. When getting the value of a config, +this defaults to the value set in the underlying :class:`SparkContext`, if any. +""" +if not hasattr(self, "_conf"): +self._conf = RuntimeConfig(self._jsparkSession.conf()) +return self._conf + @since(2.0) def setConf(self, key, value): --- End diff -- This one is also in Scala, so I kept it.
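The `conf` property in the diff above builds its wrapper on first access and reuses it afterwards. A minimal, self-contained sketch of that lazy-caching pattern follows; `ConfigStub` and `Session` are hypothetical classes invented for the illustration and are not part of PySpark.

```python
class ConfigStub:
    """Hypothetical stand-in for a runtime configuration wrapper."""

    instances_created = 0  # counts constructions so the caching is observable

    def __init__(self, settings):
        ConfigStub.instances_created += 1
        self._settings = dict(settings)

    def get(self, key, default=None):
        return self._settings.get(key, default)


class Session:
    """Hypothetical session object that exposes a lazily created `conf`."""

    def __init__(self, settings):
        self._settings = settings

    @property
    def conf(self):
        # Build the wrapper only on first access, then reuse the cached instance.
        if not hasattr(self, "_conf"):
            self._conf = ConfigStub(self._settings)
        return self._conf


session = Session({"example.key": "example-value"})
first = session.conf
second = session.conf
```

Both accesses return the same object, so the wrapper is constructed once per session no matter how often `conf` is read.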
[GitHub] spark pull request: [SPARK-14716][SQL] Added support for partition...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12409#issuecomment-215612403 **[Test build #2923 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2923/consoleFull)** for PR 12409 at commit [`acb8e92`](https://github.com/apache/spark/commit/acb8e92c6e4952ef6ea02d10ece79f32ba41c249). * This patch **fails MiMa tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [Minor][DOC] Minor typo fixes
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12755#issuecomment-215612379 **[Test build #57297 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57297/consoleFull)** for PR 12755 at commit [`ef6c1fb`](https://github.com/apache/spark/commit/ef6c1fbd69afe1cf8113727b323ed5275649d2bd).
[GitHub] spark pull request: [Spark-14976][Streaming] make StreamingContext...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12752#issuecomment-215612058 Merged build finished. Test FAILed.
[GitHub] spark pull request: [Spark-14976][Streaming] make StreamingContext...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12752#issuecomment-215612059 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57294/ Test FAILed.
[GitHub] spark pull request: [Spark-14976][Streaming] make StreamingContext...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12752#issuecomment-215612048 **[Test build #57294 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57294/consoleFull)** for PR 12752 at commit [`f1d14bd`](https://github.com/apache/spark/commit/f1d14bd0f9d1b1f572b5c850f67a51e094c9f331). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14998][SQL]fix ArrayIndexOutOfBoundsExc...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12772#issuecomment-215611950 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-14716][SQL] Added support for partition...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12409#issuecomment-215611801 **[Test build #2923 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2923/consoleFull)** for PR 12409 at commit [`acb8e92`](https://github.com/apache/spark/commit/acb8e92c6e4952ef6ea02d10ece79f32ba41c249).
[GitHub] spark pull request: [SPARK-14716][SQL] Added support for partition...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12409#issuecomment-215611786 **[Test build #2922 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2922/consoleFull)** for PR 12409 at commit [`acb8e92`](https://github.com/apache/spark/commit/acb8e92c6e4952ef6ea02d10ece79f32ba41c249).
[GitHub] spark pull request: [SPARK-14998][SQL]fix ArrayIndexOutOfBoundsExc...
GitHub user liyuance opened a pull request: https://github.com/apache/spark/pull/12772 [SPARK-14998][SQL]fix ArrayIndexOutOfBoundsException in When I use a transformation in Spark SQL, it may throw java.lang.ArrayIndexOutOfBoundsException, because some output lines of the transformation script end with consecutive TOK_TABLEROWFORMATFIELD delimiters, for example: A\tB\tC\t\t You can merge this pull request into a Git repository by running: $ git pull https://github.com/liyuance/spark transformation-bug Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/12772.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #12772 commit de1b66f2721c58345b8e89606cbff01b3bee387c Author: liyuance Date: 2016-04-29T01:54:01Z fix transformation bug
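The PR description does not state the root cause, so the following is only one plausible reading, offered as an assumption: a splitter that drops trailing empty fields (as Java's `String.split(regex)` does with its default limit) returns fewer fields than the schema expects for a line such as `A\tB\tC\t\t`, and indexing past the end then fails. The sketch below imitates only that trailing-field dropping in Python and shows one way to pad the result; both function names are hypothetical and the code is not the PR's fix.

```python
def split_dropping_trailing_empties(line, sep="\t"):
    """Split on `sep`, then discard empty strings at the end of the result.

    Simplified imitation of a splitter that ignores trailing empty fields.
    """
    fields = line.split(sep)
    while fields and fields[-1] == "":
        fields.pop()
    return fields


def split_padded(line, num_fields, sep="\t"):
    """Split as above, then truncate or pad with None to exactly `num_fields`."""
    fields = split_dropping_trailing_empties(line, sep)[:num_fields]
    return fields + [None] * (num_fields - len(fields))


line = "A\tB\tC\t\t"  # five columns, the last two empty

short = split_dropping_trailing_empties(line)
try:
    short[4]  # the schema expects a fifth column
    failed = False
except IndexError:
    failed = True  # Python's counterpart of an out-of-bounds array access

padded = split_padded(line, 5)
```

Here `short` holds only three fields, so reading the fifth column fails, while `padded` always has exactly five entries and represents the missing trailing columns as `None`.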
[GitHub] spark pull request: [SPARK-12919][SPARKR] Implement dapply() on Da...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12493#issuecomment-215611550 **[Test build #57296 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57296/consoleFull)** for PR 12493 at commit [`3efe9f5`](https://github.com/apache/spark/commit/3efe9f5f067bf66d35c1c8243d00f2f1fdb4e6f9).
[GitHub] spark pull request: [SPARK-14996][SQL] Add TPCDS Benchmark Queries...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12771#issuecomment-215611065 **[Test build #57295 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57295/consoleFull)** for PR 12771 at commit [`461ab81`](https://github.com/apache/spark/commit/461ab81adbc76f2d04ab5aed46b7ebb24cf5c7af).
[GitHub] spark pull request: [SPARK-14996] Add TPCDS Benchmark Queries for ...
Github user sameeragarwal commented on the pull request: https://github.com/apache/spark/pull/12771#issuecomment-215610732 cc @davies
[GitHub] spark pull request: [SPARK-14996] Add TPCDS Benchmark Queries for ...
GitHub user sameeragarwal opened a pull request: https://github.com/apache/spark/pull/12771 [SPARK-14996] Add TPCDS Benchmark Queries for SparkSQL ## What changes were proposed in this pull request? This PR adds support for easily running and benchmarking a set of common TPCDS queries locally in SparkSQL. ## How was this patch tested? N/A You can merge this pull request into a Git repository by running: $ git pull https://github.com/sameeragarwal/spark tpcds-2 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/12771.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #12771 commit b4e20e4a106bc487352a9e5c711c89dcaebf8b0e Author: Sameer Agarwal Date: 2016-04-24T06:57:11Z TPCDSBenchmark commit 461ab81adbc76f2d04ab5aed46b7ebb24cf5c7af Author: Sameer Agarwal Date: 2016-04-28T21:04:42Z all queries work
[GitHub] spark pull request: [Minor][DOC] Minor typo fixes
Github user zhengruifeng commented on the pull request: https://github.com/apache/spark/pull/12755#issuecomment-215610516 @rxin The PR was first created to fix errors in the Dataset.scala doc. What about updating this PR by removing the changes in other files?
[GitHub] spark pull request: [SPARK-14987] [SQL] inline hive-service (cli) ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12764#issuecomment-215610395 **[Test build #2920 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2920/consoleFull)** for PR 12764 at commit [`a07440b`](https://github.com/apache/spark/commit/a07440bd312eaa6c618e303f96420f3e5c09bd8c). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14938][ML] replace RDD.map with Dataset...
Github user zhengruifeng commented on a diff in the pull request: https://github.com/apache/spark/pull/12718#discussion_r61527726 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/ChiSqSelector.scala --- @@ -79,11 +79,12 @@ final class ChiSqSelector(override val uid: String) @Since("2.0.0") override def fit(dataset: Dataset[_]): ChiSqSelectorModel = { +val sqlContext = dataset.sqlContext +import sqlContext.implicits._ + transformSchema(dataset.schema, logging = true) -val input = dataset.select($(labelCol), $(featuresCol)).rdd.map { - case Row(label: Double, features: Vector) => -LabeledPoint(label, features) -} +val input = dataset.select(col($(labelCol)).cast(DoubleType).as("label"), --- End diff -- Do you mean changing from `col($(labelCol)).cast(DoubleType).as("label")` to `col($(labelCol)).cast(DoubleType).as(getDefault(labelCol).get)`?
[GitHub] spark pull request: [Spark-14976][Streaming] make StreamingContext...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12752#issuecomment-215609668 **[Test build #57294 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57294/consoleFull)** for PR 12752 at commit [`f1d14bd`](https://github.com/apache/spark/commit/f1d14bd0f9d1b1f572b5c850f67a51e094c9f331).
[GitHub] spark pull request: [SPARK-14988][PYTHON] SparkSession catalog and...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12765#issuecomment-215609430 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57289/ Test FAILed.
[GitHub] spark pull request: [SPARK-14988][PYTHON] SparkSession catalog and...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12765#issuecomment-215609428 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-14988][PYTHON] SparkSession catalog and...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12765#issuecomment-215609352 **[Test build #57289 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57289/consoleFull)** for PR 12765 at commit [`923b92a`](https://github.com/apache/spark/commit/923b92aee7220ec2f2960080853ce8af6d8f51a2). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14988][PYTHON] SparkSession catalog and...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12765#issuecomment-215609337 **[Test build #2916 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2916/consoleFull)** for PR 12765 at commit [`923b92a`](https://github.com/apache/spark/commit/923b92aee7220ec2f2960080853ce8af6d8f51a2). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14969][MLLib] Remove duplicate implemen...
Github user dding3 commented on the pull request: https://github.com/apache/spark/pull/12747#issuecomment-215609146 @srowen Thanks for your review. I have removed it in ANNGradient. Besides, I checked all subclasses of Gradient, and it looks like there is no duplicate implementation now.
[GitHub] spark pull request: [SPARK-11395][SPARKR] Support over and window ...
Github user felixcheung commented on the pull request: https://github.com/apache/spark/pull/10094#issuecomment-215609110 I think someone also added a `window` function? https://github.com/apache/spark/blob/a55fbe2a16aa0866ff8aca25bf9f772e6eb516a1/R/pkg/R/functions.R#L2154
[GitHub] spark pull request: [SPARK-14988][PYTHON] SparkSession catalog and...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12765#issuecomment-215608783 **[Test build #2918 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2918/consoleFull)** for PR 12765 at commit [`923b92a`](https://github.com/apache/spark/commit/923b92aee7220ec2f2960080853ce8af6d8f51a2). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.