[GitHub] spark issue #14452: [SPARK-16849][SQL] Improve subquery execution by dedupli...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14452 **[Test build #63108 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63108/consoleFull)** for PR 14452 at commit [`55a44c8`](https://github.com/apache/spark/commit/55a44c85ebfb9a065902995662c2353fdc562224). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14452: [SPARK-16849][SQL] Improve subquery execution by dedupli...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14452 **[Test build #63107 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63107/consoleFull)** for PR 14452 at commit [`00b29ed`](https://github.com/apache/spark/commit/00b29ede65b84e0fc99ab9e0ebd33f6092077bbc).
[GitHub] spark pull request #14452: [SPARK-16849][SQL] Improve subquery execution by ...
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/14452

[SPARK-16849][SQL] Improve subquery execution by deduplicating the subqueries with the same results

## What changes were proposed in this pull request?

Subqueries in Spark SQL are each executed even when they have the same physical plan and produce the same results. We should be able to deduplicate such subqueries when they are referenced many times in a query.

## How was this patch tested?

Jenkins tests.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/viirya/spark-1 single-exec-subquery

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14452.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14452

commit 00b29ede65b84e0fc99ab9e0ebd33f6092077bbc
Author: Liang-Chi Hsieh
Date: 2016-08-01T03:41:34Z

Dedup common subqueries.
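The idea in this description, running a subquery once when several references share an equal plan, can be illustrated with a minimal sketch in plain Scala. This is not the code from the pull request: `PlanSketch`, `SubqueryDedupSketch`, and `resultOf` are hypothetical names, and the "execution" is a stand-in that only counts how often it runs.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a physical plan; not a Spark class.
// Case-class equality plays the role of "same physical plan".
final case class PlanSketch(description: String)

object SubqueryDedupSketch {
  // Counts how many times a subquery body was actually executed.
  var executions: Int = 0

  // Pretend execution: a real engine would run the subquery here.
  private def execute(plan: PlanSketch): Int = {
    executions += 1
    plan.description.length
  }

  // Cache keyed by the plan itself, so equal plans share one result.
  private val cache = mutable.Map.empty[PlanSketch, Int]

  // Returns the cached result if an equal plan has already been executed.
  def resultOf(plan: PlanSketch): Int =
    cache.getOrElseUpdate(plan, execute(plan))
}
```

Calling `resultOf` twice with two equal `PlanSketch` values runs `execute` only once in this sketch; a plan with a different description triggers a second execution.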
[GitHub] spark issue #14420: [SPARK-14204] [SQL] register driverClass rather than use...
Github user zzcclp commented on the issue: https://github.com/apache/spark/pull/14420 @JoshRosen can you have a look at this pr?
[GitHub] spark issue #14451: [SPARK-16848][SQL] Make jdbc() and read.format("jdbc") c...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14451 Yeah, keep it open. That PR just tries to cover all the possible holes (corner cases). You know, I do not care which PR is merged, but, in my opinion, we need to cover all the cases. That is for the Read API. Originally, I thought we should do the same for the Write API. Later, it seemed the effort was not worth it, so I did not continue.
[GitHub] spark issue #14451: [SPARK-16848][SQL] Make jdbc() and read.format("jdbc") c...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/14451 Oh, you took a look already. Yes, it seems your PR includes this change. Do you mind if I leave this open? This bit seems like it could arguably get merged quickly. I don't mind if the credit goes to him (for other reviewers).
[GitHub] spark issue #14451: [SPARK-16848][SQL] Make jdbc() and read.format("jdbc") c...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14451 Is this related to: https://github.com/apache/spark/pull/13770?
[GitHub] spark issue #14451: [SPARK-16848][SQL] Make jdbc() and read.format("jdbc") c...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14451 **[Test build #63106 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63106/consoleFull)** for PR 14451 at commit [`3def251`](https://github.com/apache/spark/commit/3def251fb9bd213e0d343cd404f9896a576c0d74).
[GitHub] spark pull request #14451: Make jdbc() and read.format("jdbc") consistently ...
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/14451

Make jdbc() and read.format("jdbc") consistently throwing exception for user-specified schema

## What changes were proposed in this pull request?

Currently,

```scala
spark.read.schema(StructType(Seq())).jdbc(...).show()
```

does not throw an exception, whereas

```scala
spark.read.schema(StructType(Seq())).option(...).format("jdbc").load().show()
```

does, as below:

```
jdbc does not allow user-specified schemas.;
org.apache.spark.sql.AnalysisException: jdbc does not allow user-specified schemas.;
  at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:320)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:149)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:122)
  at org.apache.spark.sql.jdbc.JDBCSuite$$anonfun$17.apply$mcV$sp(JDBCSuite.scala:351)
```

It would make sense to throw the exception identically in both cases when the user specifies a schema. This PR makes the behaviour consistent for both jdbc APIs.

## How was this patch tested?

Unit test in `JDBCSuite`.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HyukjinKwon/spark SPARK-16848

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14451.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14451

commit 3def251fb9bd213e0d343cd404f9896a576c0d74
Author: hyukjinkwon
Date: 2016-08-02T05:07:57Z

Make jdbc() and read.format("jdbc") consistently throwing exception for user-specified schema
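One way to picture the consistency goal described in this pull request is to route both entry points through a single shared check. The following is a rough model in plain Scala, not Spark's `DataFrameReader`: `ReaderSketch` and its methods are hypothetical, and `IllegalArgumentException` stands in for Spark's `AnalysisException`.

```scala
// Hypothetical model of a reader; names are illustrative, not Spark's API.
final case class ReaderSketch(userSchema: Option[Seq[String]] = None) {
  // Records a user-specified schema, as spark.read.schema(...) would.
  def schema(columns: Seq[String]): ReaderSketch =
    copy(userSchema = Some(columns))

  // Single shared check so both entry points behave the same.
  // IllegalArgumentException is a stand-in for AnalysisException.
  private def rejectUserSchema(): Unit =
    if (userSchema.isDefined)
      throw new IllegalArgumentException(
        "jdbc does not allow user-specified schemas.")

  // Shortcut entry point, analogous to jdbc(...).
  def jdbc(url: String, table: String): String = {
    rejectUserSchema()
    s"$url/$table"
  }

  // Generic entry point, analogous to format("jdbc").load().
  def load(format: String, url: String, table: String): String = {
    if (format == "jdbc") rejectUserSchema()
    s"$url/$table"
  }
}
```

With this shape, `ReaderSketch().schema(Seq("id")).jdbc(...)` and `ReaderSketch().schema(Seq("id")).load("jdbc", ...)` fail with the same message, while a reader without a user schema succeeds on both paths.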
[GitHub] spark issue #14449: [SPARK-16843][MLLIB] add the percentage ChiSquareSelecto...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14449 Can one of the admins verify this patch?
[GitHub] spark issue #14298: [SPARK-16283][SQL] Implement `percentile_approx` SQL fun...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14298 **[Test build #63105 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63105/consoleFull)** for PR 14298 at commit [`c0acf16`](https://github.com/apache/spark/commit/c0acf1697ad369302068aeaecad59e812038d14a).
[GitHub] spark issue #14450: [SPARK-16847][SQL] Prevent to potentially read corrupt s...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/14450 This should be a legitimate change because it replaces usage of a deprecated constructor. Let me cc @liancheng and @srowen as well, since it is also partly about the build.
[GitHub] spark issue #14450: [SPARK-16847][SQL] Prevent to potentially read corrupt s...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14450 **[Test build #63104 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63104/consoleFull)** for PR 14450 at commit [`3c46111`](https://github.com/apache/spark/commit/3c461117852c86eae631b06cacfd72773653083c).
[GitHub] spark pull request #14450: [SPARK-16847][SQL] Prevent to potentially read co...
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/14450

[SPARK-16847][SQL] Prevent to potentially read corrupt statstics on binary in Parquet a VectorizedReader

## What changes were proposed in this pull request?

It is still possible to read corrupt Parquet statistics. This problem was found in [PARQUET-251](https://issues.apache.org/jira/browse/PARQUET-251), and we previously disabled filter pushdown on binary columns in Spark because of it. We re-enabled pushdown after upgrading Parquet, but there appears to be a potential incompatibility with Parquet files written by older Spark versions.

Currently, this does not affect Parquet's standard API. However, Spark implements a vectorized reader separately from Parquet's standard API. The standard API handles this case, but the vectorized reader does not.

This is okay in Spark 2.0 because the vectorized reader does not use the statistics there (see https://github.com/apache/spark/pull/13701). However, if we add that support, we will hit this potential incompatibility.

It is okay to just pass `FileMetaData`. This is being handled in parquet-mr (see https://github.com/apache/parquet-mr/commit/e3b95020f777eb5e0651977f654c1662e3ea1f29).

## How was this patch tested?

N/A

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HyukjinKwon/spark SPARK-16847

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14450.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14450

commit 3c461117852c86eae631b06cacfd72773653083c
Author: hyukjinkwon
Date: 2016-08-02T04:31:04Z

Prevent to potentially read corrupt statstics on binary in Parquet via VectorizedReader
[GitHub] spark pull request #14449: [SPARK-16843][MLLIB] add the percentage ChiSquare...
GitHub user mpjlu opened a pull request: https://github.com/apache/spark/pull/14449

[SPARK-16843][MLLIB] add the percentage ChiSquareSelector feature

## What changes were proposed in this pull request?

add the percentage ChiSquareSelector feature

## How was this patch tested?

add scala ut

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mpjlu/spark chisquare2

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14449.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14449

commit fb3c9a93e4b1b20f6738a3b56d8fb0604fbbb59e
Author: Peng, Meng
Date: 2016-08-01T05:00:16Z

add the percentage ChiSquareSelector feature
[GitHub] spark issue #14446: [SPARK-16841][SQL] Improves the row level metrics perfor...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14446 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63101/ Test PASSed.
[GitHub] spark issue #14446: [SPARK-16841][SQL] Improves the row level metrics perfor...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14446 Merged build finished. Test PASSed.
[GitHub] spark issue #14446: [SPARK-16841][SQL] Improves the row level metrics perfor...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14446 **[Test build #63101 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63101/consoleFull)** for PR 14446 at commit [`1054b74`](https://github.com/apache/spark/commit/1054b74f18193378942b7fde26df36e06bff765e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #11411: [SPARK-13385][MLlib] Enable AssociationRules to generate...
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/11411 Test this please
[GitHub] spark issue #12135: [SPARK-14352][SQL] approxQuantile should support multi c...
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/12135 Test this please
[GitHub] spark issue #12983: [SPARK-15213][PySpark] Unify 'range' usages
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/12983 @srowen In Python2, `xrange` is more efficient than `range`. This PR adds `range = xrange` in files like `python/pyspark/accumulators.py`, `python/pyspark/heapq3.py`, etc., so those files may run faster in Python2.
[GitHub] spark pull request #13657: [SPARK-15939][ML][PySpark] Clarify ml.linalg usag...
Github user zhengruifeng closed the pull request at: https://github.com/apache/spark/pull/13657
[GitHub] spark issue #14368: [SPARK-16734][EXAMPLES][SQL] Revise examples of all lang...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14368 **[Test build #63103 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63103/consoleFull)** for PR 14368 at commit [`e1a521f`](https://github.com/apache/spark/commit/e1a521fce6221629634b7f85335dc0ae568dd4c0). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14368: [SPARK-16734][EXAMPLES][SQL] Revise examples of all lang...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14368 Merged build finished. Test PASSed.
[GitHub] spark issue #14368: [SPARK-16734][EXAMPLES][SQL] Revise examples of all lang...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14368 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63103/ Test PASSed.
[GitHub] spark issue #14448: [Spark-16579][SparkR] Add install.spark function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14448 Merged build finished. Test PASSed.
[GitHub] spark issue #14448: [Spark-16579][SparkR] Add install.spark function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14448 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63102/ Test PASSed.
[GitHub] spark issue #14448: [Spark-16579][SparkR] Add install.spark function
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14448 **[Test build #63102 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63102/consoleFull)** for PR 14448 at commit [`370cc5d`](https://github.com/apache/spark/commit/370cc5d567ab7d1568d64b2d3b3b63af5f22725f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14368: [SPARK-16734][EXAMPLES][SQL] Revise examples of all lang...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14368 **[Test build #63103 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63103/consoleFull)** for PR 14368 at commit [`e1a521f`](https://github.com/apache/spark/commit/e1a521fce6221629634b7f85335dc0ae568dd4c0).
[GitHub] spark issue #13893: [SPARK-14172][SQL] Hive table partition predicate not pa...
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/13893 ping @cloud-fan
[GitHub] spark issue #14368: [SPARK-16734][EXAMPLES][SQL] Revise examples of all lang...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/14368 retest this please
[GitHub] spark issue #14368: [SPARK-16734][EXAMPLES][SQL] Revise examples of all lang...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/14368 LGTM
[GitHub] spark issue #14401: [SPARK-16793][SQL]Set the temporary warehouse path to sc...
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/14401 @rxin As @yhuai previously addressed, this change helps in the following cases:

1. Right now, we first set the warehouse path to the default one, and then we override the setting in `TestHiveSharedState` when we create `metadataHive`. This flow is not easy to follow and can cause confusion when debugging.
2. Removing the `warehousePath` field will be the first step in removing `TestHiveSessionState` and `TestHiveSharedState`, so that we can really test the reflection logic based on the setting of `CATALOG_IMPLEMENTATION`.
[GitHub] spark issue #14427: [SPARK-16818] Exchange reuse incorrectly reuses scans ov...
Github user ericl commented on the issue: https://github.com/apache/spark/pull/14427 Done
[GitHub] spark pull request #14427: [SPARK-16818] Exchange reuse incorrectly reuses s...
Github user ericl closed the pull request at: https://github.com/apache/spark/pull/14427
[GitHub] spark issue #14427: [SPARK-16818] Exchange reuse incorrectly reuses scans ov...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/14427 @ericl can you close this?
[GitHub] spark issue #14427: [SPARK-16818] Exchange reuse incorrectly reuses scans ov...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/14427 Merging in branch-2.0.
[GitHub] spark issue #14448: [Spark-16579][SparkR] Add install.spark function
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14448 **[Test build #63102 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63102/consoleFull)** for PR 14448 at commit [`370cc5d`](https://github.com/apache/spark/commit/370cc5d567ab7d1568d64b2d3b3b63af5f22725f).
[GitHub] spark pull request #14448: [Spark-16579][SparkR] Add install.spark function
GitHub user junyangq opened a pull request: https://github.com/apache/spark/pull/14448

[Spark-16579][SparkR] Add install.spark function

## What changes were proposed in this pull request?

Add an `install.spark` function to the SparkR package. Users can run `install.spark()` from within R to install Spark to a local directory if no existing installation is found. It searches for installation files in three places, in the following order:

1. user-provided mirror site in `mirrorUrl`
2. mirror site suggested by the Apache website
3. hardcoded backup option

## How was this patch tested?

Manual tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/junyangq/spark SPARK-16579-2.0

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/14448.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #14448

commit 0b676314a13a8a796ee45baf99f4bc6d936d01d5
Author: Junyang Qian
Date: 2016-07-29T22:24:07Z

    Add install.spark function to SparkR Users can download and install Spark package inside R console
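The three-step lookup order described in this pull request can be sketched as a simple "first usable candidate wins" loop. This is only an illustration in Python, not the SparkR implementation: the function name `resolve_download_url`, the reachability check, and the example URLs are all hypothetical.

```python
from typing import Callable, Iterable, Optional, Tuple


def resolve_download_url(
    candidates: Iterable[Tuple[str, Optional[str]]],
    is_reachable: Callable[[str], bool],
) -> Optional[Tuple[str, str]]:
    """Return (source_name, url) for the first candidate that is set and reachable."""
    for name, url in candidates:
        if url is None:
            continue  # e.g. the user did not supply a mirror URL
        if is_reachable(url):
            return name, url
    return None  # nothing usable was found


# Candidates in the order given in the PR description (URLs are made up).
candidates = [
    ("user-provided mirrorUrl", None),
    ("mirror suggested by Apache", "https://mirror.example.org/spark"),
    ("hardcoded backup", "https://backup.example.org/spark"),
]

# Assumption for the demo: only the backup site responds.
reachable = {"https://backup.example.org/spark"}
choice = resolve_download_url(candidates, lambda u: u in reachable)
print(choice)
```

Passing the reachability check in as a function keeps the ordering logic separate from any network access, which makes the fallback order easy to test in isolation.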
[GitHub] spark issue #14446: [SPARK-16841][SQL] Improves the row level metrics perfor...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14446 **[Test build #63101 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63101/consoleFull)** for PR 14446 at commit [`1054b74`](https://github.com/apache/spark/commit/1054b74f18193378942b7fde26df36e06bff765e).
[GitHub] spark issue #14446: [SPARK-16841][SQL] Improves the row level metrics perfor...
Github user clockfly commented on the issue: https://github.com/apache/spark/pull/14446 retest this please.
[GitHub] spark issue #11157: [SPARK-11714][Mesos] Make Spark on Mesos honor port rest...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/11157 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63099/ Test PASSed.
[GitHub] spark issue #11157: [SPARK-11714][Mesos] Make Spark on Mesos honor port rest...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/11157 Merged build finished. Test PASSed.
[GitHub] spark issue #11157: [SPARK-11714][Mesos] Make Spark on Mesos honor port rest...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/11157 **[Test build #63099 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63099/consoleFull)** for PR 11157 at commit [`efc1d18`](https://github.com/apache/spark/commit/efc1d183c2f04bd1bd71f2b5425432a588b68caa). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14446: [SPARK-16841][SQL] Improves the row level metrics perfor...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14446 Merged build finished. Test FAILed.
[GitHub] spark issue #14446: [SPARK-16841][SQL] Improves the row level metrics perfor...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14446 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63097/ Test FAILed.
[GitHub] spark issue #14440: [SPARK-16835][ML] add training data unpersist handling w...
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/14440 sounds reasonable...
[GitHub] spark issue #14396: [SPARK-16787] SparkContext.addFile() should not throw if...
Github user JoshRosen commented on the issue: https://github.com/apache/spark/pull/14396

@zsxwing, I meant to describe what happens on executors in the following scenario:

- `addFile(foo)` is called for the first time at `timestamp = 1`
- A task runs on an executor and downloads the copy of the file added at `timestamp = 1`.
- By default, the [file fetch cache is enabled](https://github.com/apache/spark/blob/2eedc00b04ef8ca771ff64c4f834c25f835f5f44/core/src/main/scala/org/apache/spark/util/Utils.scala#L432) and filenames in that cache incorporate timestamps. Thus, this file will be downloaded to a file named `$timestamp_cache`.
- `addFile(foo)` is called a second time at `timestamp = 2` and the same file is passed to it.
- A task runs on an executor and discovers that the added file's timestamp (2) is newer than the timestamp of the file that it has already downloaded (1), so it tries to fetch files again:
    - Because the file with the newer timestamp is not present in the fetch file cache, a new copy of the file will be downloaded. **<--- this is the second download I was referring to**

If the fetch file cache is disabled, on the other hand, then we directly call [`doFetchFile`](https://github.com/apache/spark/blob/2eedc00b04ef8ca771ff64c4f834c25f835f5f44/core/src/main/scala/org/apache/spark/util/Utils.scala#L617) which, in turn, will call `downloadFile()`, which [downloads the file to a temporary file](https://github.com/apache/spark/blob/2eedc00b04ef8ca771ff64c4f834c25f835f5f44/core/src/main/scala/org/apache/spark/util/Utils.scala#L499) before considering whether to overwrite an existing file.

In either case, it looks like re-adding a file with a new timestamp will trigger downloads on the executors and those downloads will be unnecessary if the file's contents are unchanged.
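The scenario in the comment above can be modelled with a few lines of Python. This is a toy model under stated assumptions, not Spark's `Utils.fetchFile` logic: the class `ExecutorFetchModel`, its method names, and the cache key format are hypothetical and only mirror the behaviour described in the comment (cache entries keyed by timestamp, re-fetch when a newer timestamp is seen).

```python
class ExecutorFetchModel:
    """Toy model of an executor that re-fetches a file when its timestamp is newer."""

    def __init__(self):
        self.current = {}    # file name -> timestamp of the copy currently in use
        self.cache = set()   # fetch-cache keys; each key incorporates the timestamp
        self.downloads = 0   # number of (simulated) downloads performed

    def update_dependency(self, name, timestamp):
        # Skip if the executor already holds this version or a newer one.
        if self.current.get(name, -1) >= timestamp:
            return
        # Hypothetical key format: the timestamp is part of the cache key,
        # so a new timestamp can never hit an entry cached under an old one.
        key = f"{name}-{timestamp}_cache"
        if key not in self.cache:
            self.downloads += 1  # cache miss: download again
            self.cache.add(key)
        self.current[name] = timestamp


executor = ExecutorFetchModel()
executor.update_dependency("foo", 1)  # first addFile: one download
executor.update_dependency("foo", 1)  # same timestamp: nothing to do
executor.update_dependency("foo", 2)  # re-added with a new timestamp: second download
print(executor.downloads)
```

Because the timestamp is part of the key, the model downloads the file twice even though its contents never changed, which is the redundant download the comment points out.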
[GitHub] spark pull request #14440: [SPARK-16835][ML] add training data unpersist han...
Github user WeichenXu123 closed the pull request at: https://github.com/apache/spark/pull/14440
[GitHub] spark issue #14446: [SPARK-16841][SQL] Improves the row level metrics perfor...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14446 **[Test build #63097 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63097/consoleFull)** for PR 14446 at commit [`1054b74`](https://github.com/apache/spark/commit/1054b74f18193378942b7fde26df36e06bff765e). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #13778: [SPARK-16062][SPARK-15989][SQL] Fix two bugs of Python-o...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/13778 ping @cloud-fan again, this has been waiting for a while. Do you have time to take another look? Thanks.
[GitHub] spark issue #14447: [SPARK-16445][MLlib][SparkR] Multilayer Perceptron Class...
Github user keypointt commented on the issue: https://github.com/apache/spark/pull/14447

hi @mengxr, could you please tell me how to debug the R wrapper? Thanks a lot.

I tried reading the documentation and searching online, but could not figure it out myself. From the SparkR console, the error message is too vague (see below). I tried to `tail -f` the Spark logs but found no error messages, and I also tried to create a `RWrapperSuite`, but the class is private and cannot be accessed.

```
> model <- spark.mlp(irisDF, ~ Sepal_Length + Sepal_Width + Petal_Length + Petal_Width, blockSize=128, initialWeights=seq(1, 9, by = 2), layers=3, solver='LBFGS', seed=1234L, maxIter=100, tol=0.5, stepSize=1)
16/08/01 17:16:38 ERROR RBackendHandler: fit on org.apache.spark.ml.r.MultilayerPerceptronClassifierWrapper failed
Error in invokeJava(isStatic = TRUE, className, methodName, ...) :
In addition: Warning message:
In if (is.na(object)) { :
  the condition has length > 1 and only the first element will be used
```
[GitHub] spark issue #14442: [SPARK-16836][SQL] Add support for CURRENT_DATE/CURRENT_...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/14442 Can you add an end-to-end test for this in SQLQuerySuite? It's not a great place but we will refactor it soon.
[GitHub] spark issue #14447: [SPARK-16445][MLlib][SparkR] Multilayer Perceptron Class...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14447 Merged build finished. Test FAILed.
[GitHub] spark issue #14447: [SPARK-16445][MLlib][SparkR] Multilayer Perceptron Class...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14447 **[Test build #63100 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63100/consoleFull)** for PR 14447 at commit [`04f7fed`](https://github.com/apache/spark/commit/04f7fed0682548068d4bfddebce7bed276432a4d). * This patch **fails R style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14447: [SPARK-16445][MLlib][SparkR] Multilayer Perceptron Class...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14447 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63100/ Test FAILed.
[GitHub] spark issue #14447: [SPARK-16445][MLlib][SparkR] Multilayer Perceptron Class...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14447 **[Test build #63100 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63100/consoleFull)** for PR 14447 at commit [`04f7fed`](https://github.com/apache/spark/commit/04f7fed0682548068d4bfddebce7bed276432a4d).
[GitHub] spark pull request #14434: [SPARK-16828][SQL] remove MaxOf and MinOf
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/14434
[GitHub] spark pull request #14447: [SPARK-16445][MLlib][SparkR] Multilayer Perceptro...
GitHub user keypointt opened a pull request: https://github.com/apache/spark/pull/14447

[SPARK-16445][MLlib][SparkR] Multilayer Perceptron Classifier wrapper in SparkR

https://issues.apache.org/jira/browse/SPARK-16445

## What changes were proposed in this pull request?

Create Multilayer Perceptron Classifier wrapper in SparkR

## How was this patch tested?

Tested manually on local machine

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/keypointt/spark SPARK-16445

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/14447.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #14447

commit 400a5ec398bbae0f50f3220cec2acc20bc8b1d6a
Author: Xin Ren
Date: 2016-07-15T19:20:17Z

    [SPARK-16445] add to r method list

commit 2e3fe27bd7f6109cacbe3e6b8a675c4034cc11e4
Author: Xin Ren
Date: 2016-07-16T17:54:32Z

    Merge branch 'master' into SPARK-16445

commit 4d822b0d8e67adf264ff2019766ddbf3221934e8
Author: Xin Ren
Date: 2016-07-18T06:26:50Z

    Merge branch 'master' into SPARK-16445

commit ce6f74d5020282762adbd96e101ed95ae9abf92c
Author: Xin Ren
Date: 2016-07-19T06:56:29Z

    [SPARK-16445] add R method for monmlp

commit eff4097ffd9d66b7f226b07a8b6e89c4f7c15336
Author: Xin Ren
Date: 2016-07-20T06:26:46Z

    [SPARK-16445] add fit() in r wrapper

commit fb87bd58f1490356d9c0b99d791194e1a18f03e6
Author: Xin Ren
Date: 2016-07-22T05:59:45Z

    [SPARK-16445] model exists already, remove added ones

commit 0ed2280d7ca7d25b86411b3d97fa3e85353b19b1
Author: Xin Ren
Date: 2016-07-22T06:11:04Z

    [SPARK-16445] rename, monmlp, to, mlp

commit 2d1d1400fc168a7628c62df88d8267ca12eceb0a
Author: Xin Ren
Date: 2016-07-22T06:22:56Z

    [SPARK-16445] fix styles

commit bddde5c09bd65a2608c4287c9461b61c3598efab
Author: Xin Ren
Date: 2016-07-22T06:42:07Z

    [SPARK-16445] r style fix

commit fc3b9492f6333e1049d2ea483e141f442a152098
Author: Xin Ren
Date: 2016-07-22T06:51:12Z

    [SPARK-16445] missed json4s import

commit f3aa8fd75a67c557e193a9f030b07001781097a1
Author: Xin Ren
Date: 2016-07-26T22:04:16Z

    Merge branch 'master' into SPARK-16445

commit 61c8122a2584dafb581b045bd3cd7c9742022786
Author: Xin Ren
Date: 2016-07-26T22:04:31Z

    Merge branch 'SPARK-16445' of https://github.com/keypointt/spark into SPARK-16445

commit 07638f4f310469109ca766d14916a77960f80987
Author: Xin Ren
Date: 2016-07-27T00:10:04Z

    [SPARK-16445] correct r method name

commit 79675ad567e16494d1d2445b773dc6fd3649bc7c
Author: Xin Ren
Date: 2016-07-27T01:03:34Z

    [SPARK-16445] tmp save

commit 2d66705a4f26e2823c7102032f596d74a278bc68
Author: Xin Ren
Date: 2016-07-29T22:36:39Z

    [SPARK-16445] fix model name

commit b7c4f0cd4870054eb628b333c016fabea37eb957
Author: Xin Ren
Date: 2016-07-30T00:50:55Z

    [SPARK-16445] fix parameters

commit 52c23106d1623a6f54fb7ed2eae842988e8c7bbf
Author: Xin Ren
Date: 2016-08-01T22:19:29Z

    Merge branch 'master' into SPARK-16445

commit 04f7fed0682548068d4bfddebce7bed276432a4d
Author: Xin Ren
Date: 2016-08-02T00:52:27Z

    [SPARK-16445] r test failing
[GitHub] spark issue #14434: [SPARK-16828][SQL] remove MaxOf and MinOf
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/14434 Thanks. Merging to master.
[GitHub] spark pull request #14392: [SPARK-16446] [SparkR] [ML] Gaussian Mixture Mode...
Github user shivaram commented on a diff in the pull request: https://github.com/apache/spark/pull/14392#discussion_r73079016 --- Diff: R/pkg/R/mllib.R --- @@ -632,3 +659,106 @@ setMethod("predict", signature(object = "AFTSurvivalRegressionModel"), function(object, newData) { return(dataFrame(callJMethod(object@jobj, "transform", newData@sdf))) }) + +#' Multivariate Gaussian Mixture Model (GMM) +#' +#' Fits multivariate gaussian mixture model against a Spark DataFrame, similarly to R's +#' mvnormalmixEM(). Users can call \code{summary} to print a summary of the fitted model, +#' \code{predict} to make predictions on new data, and \code{write.ml}/\code{read.ml} +#' to save/load fitted models. +#' +#' @param data SparkDataFrame for training +#' @param formula A symbolic description of the model to be fitted. Currently only a few formula +#'operators are supported, including '~', '.', ':', '+', and '-'. +#'Note that the response variable of formula is empty in spark.mvnormalmixEM. +#' @param k Number of independent Gaussians in the mixture model. 
+#' @param maxIter Maximum iteration number +#' @param tol The convergence tolerance +#' @aliases spark.mvnormalmixEM,SparkDataFrame,formula-method +#' @return \code{spark.mvnormalmixEM} returns a fitted multivariate gaussian mixture model +#' @rdname spark.mvnormalmixEM +#' @name spark.mvnormalmixEM +#' @export +#' @examples +#' \dontrun{ +#' sparkR.session() +#' library(mvtnorm) +#' set.seed(100) +#' a <- rmvnorm(4, c(0, 0)) +#' b <- rmvnorm(6, c(3, 4)) +#' data <- rbind(a, b) +#' df <- createDataFrame(as.data.frame(data)) +#' model <- spark.mvnormalmixEM(df, ~ V1 + V2, k = 2) +#' summary(model) +#' +#' # fitted values on training data +#' fitted <- predict(model, df) +#' head(select(fitted, "V1", "prediction")) +#' +#' # save fitted model to input path +#' path <- "path/to/model" +#' write.ml(model, path) +#' +#' # can also read back the saved model and print +#' savedModel <- read.ml(path) +#' summary(savedModel) +#' } +#' @note spark.mvnormalmixEM since 2.1.0 +#' @seealso mixtools: \url{https://cran.r-project.org/web/packages/mixtools/} +#' @seealso \link{predict}, \link{read.ml}, \link{write.ml} +setMethod("spark.mvnormalmixEM", signature(data = "SparkDataFrame", formula = "formula"), + function(data, formula, k = 2, maxIter = 100, tol = 0.01) { +formula <- paste(deparse(formula), collapse = "") +jobj <- callJStatic("org.apache.spark.ml.r.GaussianMixtureWrapper", "fit", data@sdf, +formula, as.integer(k), as.integer(maxIter), tol) +return(new("GaussianMixtureModel", jobj = jobj)) + }) + +# Get the summary of a multivariate gaussian mixture model + +#' @param object A fitted gaussian mixture model +#' @return \code{summary} returns the model's lambda, mu, sigma and posterior +#' @rdname spark.mvnormalmixEM --- End diff -- You can also run the `check-cran.sh` script in `R/` and see if there are any warnings related to the methods being added in this PR. 
[GitHub] spark issue #13647: [SPARK-15784][ML]:Add Power Iteration Clustering to spar...
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/13647 ping @mengxr @jkbradley @yanboliang Can you give me some comments on this PR? I can start improving it for 2.1+. Thanks!
[GitHub] spark issue #14433: [SPARK-16829][SparkR]:sparkR sc.setLogLevel doesn't work
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/14433 @felixcheung I will try to detect the terminal/shell type before printing the message. I will update the PR if I can find a way to do that. Thanks!
[GitHub] spark issue #14445: [SPARK-16320] [SQL] Fix performance regression for parqu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14445 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63096/ Test PASSed.
[GitHub] spark issue #14445: [SPARK-16320] [SQL] Fix performance regression for parqu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14445 Merged build finished. Test PASSed.
[GitHub] spark issue #14445: [SPARK-16320] [SQL] Fix performance regression for parqu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14445 **[Test build #63096 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63096/consoleFull)** for PR 14445 at commit [`272fb81`](https://github.com/apache/spark/commit/272fb8100f1861d78f78d7bc34e1ff68284b773a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #14368: [SPARK-16734][EXAMPLES][SQL] Revise examples of a...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/14368#discussion_r73076628 --- Diff: examples/src/main/r/RSparkSQLExample.R --- @@ -18,31 +18,43 @@ library(SparkR) # $example on:init_session$ -sparkR.session(appName = "MyApp", sparkConfig = list(spark.executor.memory = "1g")) +sparkR.session(appName = "MyApp", sparkConfig = list(spark.some.config.option = "some-value")) --- End diff -- It's just an example for how to set extra configuration options. It's not read anywhere.
[GitHub] spark issue #11157: [SPARK-11714][Mesos] Make Spark on Mesos honor port rest...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/11157 **[Test build #63099 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63099/consoleFull)** for PR 11157 at commit [`efc1d18`](https://github.com/apache/spark/commit/efc1d183c2f04bd1bd71f2b5425432a588b68caa).
[GitHub] spark issue #11157: [SPARK-11714][Mesos] Make Spark on Mesos honor port rest...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/11157 Merged build finished. Test FAILed.
[GitHub] spark issue #11157: [SPARK-11714][Mesos] Make Spark on Mesos honor port rest...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/11157 **[Test build #63098 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63098/consoleFull)** for PR 11157 at commit [`a8e828f`](https://github.com/apache/spark/commit/a8e828fecb88c254a89ee82e68c9aa548969dfdb). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #11157: [SPARK-11714][Mesos] Make Spark on Mesos honor port rest...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/11157 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63098/ Test FAILed.
[GitHub] spark pull request #11157: [SPARK-11714][Mesos] Make Spark on Mesos honor po...
Github user skonto commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11157#discussion_r73074211

    --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerUtils.scala ---
    @@ -356,4 +374,233 @@ private[mesos] trait MesosSchedulerUtils extends Logging {
         sc.conf.getTimeAsSeconds("spark.mesos.rejectOfferDurationForReachedMaxCores", "120s")
       }
    +
    +  /**
    +   * Checks executor ports if they are within some range of the offered list of ports ranges,
    +   *
    +   * @param sc the Spark Context
    +   * @param ports the list of ports to check
    +   * @return true if ports are within range false otherwise
    +   */
    +  protected def checkPorts(sc: SparkContext, ports: List[(Long, Long)]): Boolean = {
    +
    +    def checkIfInRange(port: Long, ps: List[(Long, Long)]): Boolean = {
    +      ps.exists(r => r._1 <= port & r._2 >= port)
    +    }
    +
    +    val portsToCheck = ManagedPorts.getPortValues(sc.conf)
    +    val nonZeroPorts = portsToCheck.filter(_ != 0)
    +    val withinRange = nonZeroPorts.forall(p => checkIfInRange(p, ports))
    +    // make sure we have enough ports to allocate per offer
    +    ports.map(r => r._2 - r._1 + 1).sum >= portsToCheck.size && withinRange
    +  }
    +
    +  /**
    +   * Partitions port resources.
    +   *
    +   * @param conf the spark config
    +   * @param ports the ports offered
    +   * @return resources left, port resources to be used and the list of assigned ports
    +   */
    +  def partitionPorts(
    +      conf: SparkConf,
    +      ports: List[Resource])
    +    : (List[Resource], List[Resource], List[Long]) = {
    +    val taskPortRanges = getRangeResourceWithRoleInfo(ports.asJava, "ports")
    +    val portsToCheck = ManagedPorts.getPortValues(conf)
    +    val nonZeroPorts = portsToCheck.filter(_ != 0)
    +    // reserve non zero ports first
    +    val nonZeroResources = reservePorts(taskPortRanges, nonZeroPorts)
    +    // reserve actual port numbers for zero ports - not set by the user
    +    val numOfZeroPorts = portsToCheck.count(_ == 0)
    +    val randPorts = pickRandomPortsFromRanges(nonZeroResources._1, numOfZeroPorts)
    +    val zeroResources = reservePorts(nonZeroResources._1, randPorts)
    +    val (portResourcesLeft, portResourcesToBeUsed) =
    +      createResources(nonZeroResources, zeroResources)
    +    (portResourcesLeft, portResourcesToBeUsed, nonZeroPorts ++ randPorts)
    +  }
    +
    +  private object ManagedPorts {
    +    val portNames = List("spark.executor.port", "spark.blockManager.port")
    +
    +    def getPortValues(conf: SparkConf): List[Long] = {
    +      portNames.map(conf.getLong(_, 0))
    +    }
    +  }
    +
    +  private def createResources(
    --- End diff --

    ok
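For readers following the `checkPorts` logic in the diff above, the check can be sketched in plain Scala without any Spark or Mesos types. This is only an illustration under assumptions: the object name `PortRangeCheck` and the methods `inAnyRange` and `offerSatisfies` are hypothetical and not part of the pull request, and ranges are modelled as inclusive `(start, end)` pairs with `0` meaning "port not set by the user".

```scala
object PortRangeCheck {
  // Hypothetical alias: an inclusive (start, end) range of offered ports.
  type PortRange = (Long, Long)

  // True if the port falls inside at least one offered range.
  def inAnyRange(port: Long, ranges: List[PortRange]): Boolean =
    ranges.exists { case (lo, hi) => lo <= port && port <= hi }

  // `requested` holds the configured ports; 0 means "not set, any port will do".
  // The offer is usable when it contains enough ports in total and every
  // explicitly configured port lies inside some offered range.
  def offerSatisfies(requested: List[Long], ranges: List[PortRange]): Boolean = {
    val fixedPorts = requested.filter(_ != 0)
    val capacity = ranges.map { case (lo, hi) => hi - lo + 1 }.sum
    capacity >= requested.size && fixedPorts.forall(inAnyRange(_, ranges))
  }
}
```

The sketch uses the short-circuiting `&&` for the range test; for two already-computed Boolean values the result is the same as with `&`.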
[GitHub] spark issue #11157: [SPARK-11714][Mesos] Make Spark on Mesos honor port rest...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/11157 **[Test build #63098 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63098/consoleFull)** for PR 11157 at commit [`a8e828f`](https://github.com/apache/spark/commit/a8e828fecb88c254a89ee82e68c9aa548969dfdb).
[GitHub] spark issue #14446: [SPARK-16841][SQL] Improves the row level metrics perfor...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14446 **[Test build #63097 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63097/consoleFull)** for PR 14446 at commit [`1054b74`](https://github.com/apache/spark/commit/1054b74f18193378942b7fde26df36e06bff765e).
[GitHub] spark pull request #14434: [SPARK-16828][SQL] remove MaxOf and MinOf
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14434#discussion_r73073652

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -662,10 +662,6 @@ object NullPropagation extends Rule[LogicalPlan] {
           case e @ Substring(_, Literal(null, _), _) => Literal.create(null, e.dataType)
           case e @ Substring(_, _, Literal(null, _)) => Literal.create(null, e.dataType)

    -      // MaxOf and MinOf can't do null propagation
    -      case e: MaxOf => e
    -      case e: MinOf => e
    --- End diff --

    No, we put `MaxOf` and `MinOf` here because they are a special case of `BinaryArithmetic`, but `Greatest` and `Least` are not binary.
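To make the "can't do null propagation" remark in the removed lines concrete, here is a small plain-Scala sketch, not Catalyst code. It assumes that a `MaxOf`-style operator returns the non-null operand when the other one is null; the object `NullHandling` and its methods are hypothetical names, and `Option[Int]` stands in for a nullable SQL integer.

```scala
object NullHandling {
  // A null-propagating binary operator: any NULL input makes the result NULL,
  // so an optimizer may fold `x + NULL` to NULL without evaluating x.
  def add(a: Option[Int], b: Option[Int]): Option[Int] =
    for (x <- a; y <- b) yield x + y

  // Assumed MaxOf-like behaviour: a NULL operand is skipped rather than
  // propagated, so `maxOf(x, NULL)` cannot be folded to NULL.
  def maxOf(a: Option[Int], b: Option[Int]): Option[Int] = (a, b) match {
    case (Some(x), Some(y)) => Some(math.max(x, y))
    case (x, None)          => x
    case (None, y)          => y
  }
}
```

Under that assumption, a rule that rewrites every binary arithmetic expression with a null literal to null would be wrong for the second operator, which is why such expressions needed a special case.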
[GitHub] spark pull request #14446: [SPARK-16841][SQL] Improves the row level metrics...
GitHub user clockfly opened a pull request: https://github.com/apache/spark/pull/14446

[SPARK-16841][SQL] Improves the row level metrics performance when reading Parquet table

## What changes were proposed in this pull request?

When reading from a Parquet table, Spark updates row level metrics like recordsRead and bytesRead. The implementation is not very efficient. It may take 20% of the read time to update these metrics.

Test benchmark:
```
// Generates parquet table with nested columns
spark.range(1).select(struct($"id").as("nc")).write.parquet("/tmp/data4")

// Runs the block (passed by name) and returns the elapsed time in milliseconds
def time[R](block: => R): Long = {
  val t0 = System.nanoTime()
  val result = block // call-by-name
  val t1 = System.nanoTime()
  println("Elapsed time: " + (t1 - t0) / 1000000 + "ms")
  (t1 - t0) / 1000000
}

// Average read time over 20 runs
val x = (0 until 20).toList.map(_ => time(spark.read.parquet("/tmp/data4").filter($"nc.id" < 100).collect())).sum / 20
```

## How was this patch tested?

Existing unit tests.

You can merge this pull request into a Git repository by running: $ git pull https://github.com/clockfly/spark improve_metrics_performance

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14446.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14446

commit 1054b74f18193378942b7fde26df36e06bff765e Author: Sean Zhong Date: 2016-08-01T23:35:30Z improve row level metrics performance
[GitHub] spark issue #14411: [SPARK-16804][SQL] Correlated subqueries containing LIMI...
Github user nsyca commented on the issue: https://github.com/apache/spark/pull/14411

@hvanhovell, First, my apologies for the delayed replies. I am travelling this week and only have intermittent connections.

Thank you for explaining the implementation and the reasons behind that choice. It is very helpful for a beginner like me.

My bad! What I meant in my previous comment about rewriting subqueries to joins is actually the moving of the correlated predicates from their original positions to outside the scopes of the subqueries, specifically the call to the function pullOutCorrelatedPredicates() -- I hope I got it right this time. I see this as one of the root causes of many problems. Bear with me, I don't have a good solution as I am still getting familiar with the code.

Here is an example of the problems, in my opinion. With the rewrite, we cannot distinguish between the EXISTS form and the IN form of the original SQL.

    select * from t1 where exists (select 1 from t2 where t1.c1=t2.c2)
    -and-
    select * from t1 where t1.c1 in (select t2.c2 from t2)

are represented the same way after the Analysis phase. This is not an issue because they are semantically equivalent. However, when we add the NOT,

    select * from t1 where not exists (select 1 from t2 where t1.c1=t2.c2)
    -and-
    select * from t1 where t1.c1 not in (select t2.c2 from t2)

are NOT semantically equivalent when T2.C2 can produce NULL values.

Lastly, your comment on the SAMPLE operator seems right. I will give it a shot and add it to this PR.

Thanks again for your patience.
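The NOT EXISTS versus NOT IN difference described above can be illustrated with a small model of SQL three-valued logic in plain Scala. This is only a sketch under stated assumptions, not Spark code: the object `NullAwareSubquery` and its methods are hypothetical, a column value is an `Option[Int]` with `None` standing for NULL, a comparison result is an `Option[Boolean]` with `None` standing for UNKNOWN, and a WHERE clause keeps a row only when its predicate is TRUE.

```scala
object NullAwareSubquery {
  // SQL equality: UNKNOWN (None) as soon as either side is NULL.
  def sqlEq(a: Option[Int], b: Option[Int]): Option[Boolean] =
    for (x <- a; y <- b) yield x == y

  // a IN (values): TRUE if some comparison is TRUE; otherwise UNKNOWN if some
  // comparison is UNKNOWN; otherwise FALSE.
  def sqlIn(a: Option[Int], values: Seq[Option[Int]]): Option[Boolean] = {
    val comparisons = values.map(sqlEq(a, _))
    if (comparisons.contains(Some(true))) Some(true)
    else if (comparisons.contains(None)) None
    else Some(false)
  }

  // select * from t1 where not exists (select 1 from t2 where t1.c1 = t2.c2)
  // The inner WHERE only keeps rows whose comparison is TRUE.
  def notExists(t1: Seq[Option[Int]], t2: Seq[Option[Int]]): Seq[Option[Int]] =
    t1.filter(c1 => !t2.exists(c2 => sqlEq(c1, c2).contains(true)))

  // select * from t1 where t1.c1 not in (select t2.c2 from t2)
  // NOT UNKNOWN is still UNKNOWN, so such rows are dropped.
  def notIn(t1: Seq[Option[Int]], t2: Seq[Option[Int]]): Seq[Option[Int]] =
    t1.filter(c1 => sqlIn(c1, t2).map(!_).contains(true))
}
```

With `t1.c1` holding 1 and 2 and `t2.c2` holding 1 and NULL, `notExists` keeps the row with value 2, while `notIn` keeps nothing because `2 NOT IN (1, NULL)` evaluates to UNKNOWN.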
[GitHub] spark issue #14445: [SPARK-16320] [SQL] Fix performance regression for parqu...
Github user clockfly commented on the issue: https://github.com/apache/spark/pull/14445 @gatorsmile Thanks! updated.
[GitHub] spark pull request #14176: [SPARK-16525][SQL] Enable Row Based HashMap in Ha...
Github user sameeragarwal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14176#discussion_r73070005

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala ---
    @@ -279,9 +280,15 @@ case class HashAggregateExec(
           .map(_.asInstanceOf[DeclarativeAggregate])
       private val bufferSchema = StructType.fromAttributes(aggregateBufferAttributes)

    -  // The name for Vectorized HashMap
    -  private var vectorizedHashMapTerm: String = _
    -  private var isVectorizedHashMapEnabled: Boolean = _
    +  // The name for Fast HashMap
    +  private var fastHashMapTerm: String = _
    +  // whether vectorized hashmap or row based hashmap is enabled
    +  // we make sure that at most one of the two flags is true
    +  // i.e., assertFalse(isVectorizedHashMapEnabled && isRowBasedHashMapEnabled)
    +  private var isVectorizedHashMapEnabled: Boolean = false
    +  private var isRowBasedHashMapEnabled: Boolean = false
    +  // auxiliary flag, true if any of two above is true
    +  private var isFastHashMapEnabled: Boolean = false
    --- End diff --

    Sure, what I meant was that we can even initialize it with `isVectorizedHashMapEnabled || isRowBasedHashMapEnabled` to make the implied semantics clear.
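One possible reading of this suggestion, sketched in plain Scala outside of `HashAggregateExec`: derive the auxiliary flag from the two primary flags instead of storing it separately, so the three values cannot drift apart. The class `FastHashMapFlags` and its `enable*` methods are hypothetical and only illustrate the idea; the actual patch may structure this differently.

```scala
class FastHashMapFlags {
  private var isVectorizedHashMapEnabled: Boolean = false
  private var isRowBasedHashMapEnabled: Boolean = false

  // At most one of the two hash map implementations may be enabled.
  def enableVectorized(): Unit = {
    require(!isRowBasedHashMapEnabled, "row based hash map is already enabled")
    isVectorizedHashMapEnabled = true
  }

  def enableRowBased(): Unit = {
    require(!isVectorizedHashMapEnabled, "vectorized hash map is already enabled")
    isRowBasedHashMapEnabled = true
  }

  // Derived on every call, so it always reflects the two flags above.
  def isFastHashMapEnabled: Boolean =
    isVectorizedHashMapEnabled || isRowBasedHashMapEnabled
}
```

Using a `def` rather than a `var` trades a negligible recomputation for the guarantee that the auxiliary flag is never stale.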
[GitHub] spark issue #14384: [Spark-16443][SparkR] Alternating Least Squares (ALS) wr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14384 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63095/ Test FAILed.
[GitHub] spark issue #14384: [Spark-16443][SparkR] Alternating Least Squares (ALS) wr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14384 Merged build finished. Test FAILed.
[GitHub] spark issue #14445: [SPARK-16320][SQL] Fix performance regression for parque...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14445 Maybe we can correct the PR description and make it more accurate. This PR avoids the extra memory copy when the vectorized Parquet record reader is not being used for reading a non-partitioned Parquet table. One typical case is a Parquet table with non-atomic types, including null, UDTs, arrays, structs, and maps. Another case is when users set `spark.sql.parquet.enableVectorizedReader` to `false`. Is my understanding correct?
[GitHub] spark issue #14384: [Spark-16443][SparkR] Alternating Least Squares (ALS) wr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14384 **[Test build #63095 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63095/consoleFull)** for PR 14384 at commit [`119e576`](https://github.com/apache/spark/commit/119e57601b6bd0b9aa0ad29ca20624f18f13a362). * This patch **fails SparkR unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14445: [SPARK-16320][SQL] Fix performance regression for parque...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14445 **[Test build #63096 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63096/consoleFull)** for PR 14445 at commit [`272fb81`](https://github.com/apache/spark/commit/272fb8100f1861d78f78d7bc34e1ff68284b773a).
[GitHub] spark issue #14442: [SPARK-16836][SQL] Add support for CURRENT_DATE/CURRENT_...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14442 LGTM. FYI, MySQL and PostgreSQL support NOW as a synonym of CURRENT_TIMESTAMP.
[GitHub] spark issue #14384: [Spark-16443][SparkR] Alternating Least Squares (ALS) wr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14384 Merged build finished. Test FAILed.
[GitHub] spark issue #14384: [Spark-16443][SparkR] Alternating Least Squares (ALS) wr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14384 **[Test build #63094 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63094/consoleFull)** for PR 14384 at commit [`b376dfb`](https://github.com/apache/spark/commit/b376dfb35dc8ebb90804264dd5683514a3166d9e). * This patch **fails SparkR unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14384: [Spark-16443][SparkR] Alternating Least Squares (ALS) wr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14384 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63094/ Test FAILed.
[GitHub] spark pull request #14445: [SPARK-16320][SQL] Fix performance regression for...
GitHub user clockfly opened a pull request: https://github.com/apache/spark/pull/14445

[SPARK-16320][SQL] Fix performance regression for parquet table with nested fields

## What changes were proposed in this pull request?

For a non-partitioned parquet table with nested columns, Spark 2.0 adds an extra, unnecessary memory copy to append partition values for each row. By fixing this bug, we get about a 30% performance gain in a test case like this:
```
// Generates parquet table with nested columns
spark.range(1).select(struct($"id").as("nc")).write.parquet("/tmp/data4")

// Runs the block (passed by name) and returns the elapsed time in milliseconds
def time[R](block: => R): Long = {
  val t0 = System.nanoTime()
  block
  (System.nanoTime() - t0) / 1000000
}

// Average read time over 20 runs
val x = (0 until 20).toList.map(_ => time(spark.read.parquet("/tmp/data4").filter($"nc.id" < 100).collect())).sum / 20
println("Elapsed time: " + x + "ms")
```

## How was this patch tested?

Existing unit tests

You can merge this pull request into a Git repository by running: $ git pull https://github.com/clockfly/spark fix_parquet_regression_2

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14445.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14445

commit 272fb8100f1861d78f78d7bc34e1ff68284b773a Author: Sean Zhong Date: 2016-08-01T04:29:44Z fix parquet_regression
[GitHub] spark pull request #14439: [SPARK-16714][SPARK-16735][SPARK-16646] array, ma...
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14439#discussion_r73064350

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala ---
    @@ -157,6 +145,26 @@ object TypeCoercion {
           })
       }

    +  /**
    +   * Similar to [[findWiderCommonType]], but can't promote to string.
    +   */
    +  private def findWiderTypeWithoutStringPromotion(types: Seq[DataType]): Option[DataType] = {
    --- End diff --

    It is weird that its name is `findWiderTypeWithoutStringPromotion` because `findTightestCommonTypeOfTwo` is used inside. Also, let's add more docs to this method.
[GitHub] spark issue #14439: [SPARK-16714][SPARK-16735][SPARK-16646] array, map, grea...
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/14439 It would be good to summarize the behaviors of other systems in the description. Let's also explain the behavioral change of this PR in the description, so others can understand its implications. Also, I am wondering if we can change the behavior of `DecimalPrecision.widerDecimalType`. Right now, `widerDecimalType` will truncate the integral part, which is not intuitive.
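The concern about truncating the integral part can be illustrated with a small plain-Scala model. This is a sketch under assumptions, not the code of `DecimalPrecision.widerDecimalType`: it assumes the wider type keeps the larger number of integral digits and the larger scale, and then caps the total precision at 38 while leaving the scale untouched. The names `DecimalWidening`, `Dec`, and `wider` are hypothetical.

```scala
object DecimalWidening {
  // Assumed maximum decimal precision.
  val MaxPrecision = 38

  // A decimal type with `precision` total digits, `scale` of them fractional.
  final case class Dec(precision: Int, scale: Int) {
    def integralDigits: Int = precision - scale
  }

  // Assumed widening rule: keep the larger integral part and the larger scale,
  // then bound the precision. When the sum exceeds MaxPrecision, the scale is
  // kept and the integral digits are the ones that get cut.
  def wider(a: Dec, b: Dec): Dec = {
    val scale = math.max(a.scale, b.scale)
    val integral = math.max(a.integralDigits, b.integralDigits)
    Dec(math.min(integral + scale, MaxPrecision), math.min(scale, MaxPrecision))
  }
}
```

Under this model, widening `Dec(38, 0)` with `Dec(38, 10)` gives `Dec(38, 10)`, which leaves only 28 integral digits even though one input needed 38.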
[GitHub] spark issue #14384: [Spark-16443][SparkR] Alternating Least Squares (ALS) wr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14384 **[Test build #63095 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63095/consoleFull)** for PR 14384 at commit [`119e576`](https://github.com/apache/spark/commit/119e57601b6bd0b9aa0ad29ca20624f18f13a362).
[GitHub] spark issue #14212: [SPARK-16558][Examples][MLlib] examples/mllib/LDAExample...
Github user yinxusen commented on the issue: https://github.com/apache/spark/pull/14212 @MLnick They serve different purposes. This one is for users who have built their tools upon it. The `LatentDirichletAllocationExample` is for ML docs.
[GitHub] spark issue #14444: [SPARK-16839] [SQL] redundant aliases after cleanupAlias...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/1 Can one of the admins verify this patch?
[GitHub] spark issue #14444: [SPARK-16839] [SQL] redundant aliases after cleanupAlias...
Github user eyalfa commented on the issue: https://github.com/apache/spark/pull/1 @cloud-fan
[GitHub] spark pull request #14444: [SPARK-16839] [SQL] redundant aliases after clean...
GitHub user eyalfa opened a pull request: https://github.com/apache/spark/pull/1

[SPARK-16839] [SQL] redundant aliases after cleanupAliases

## What changes were proposed in this pull request?

A failing test; a proposed fix will be added soon.

## How was this patch tested?

Ran the analysis suite, making sure the added test fails while existing tests still pass.

You can merge this pull request into a Git repository by running: $ git pull https://github.com/eyalfa/spark SPARK-16839_redundant_aliases_after_cleanupAliases

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/1.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1

commit 313d584532a4435e02e39e574070354cdef240ea Author: eyal farago Date: 2016-08-01T21:51:45Z SPARK-16839_redundant_aliases_after_cleanupAliases: failing test.
[GitHub] spark issue #14384: [Spark-16443][SparkR] Alternating Least Squares (ALS) wr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14384 Merged build finished. Test FAILed.
[GitHub] spark issue #14384: [Spark-16443][SparkR] Alternating Least Squares (ALS) wr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14384 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63093/ Test FAILed.
[GitHub] spark issue #14384: [Spark-16443][SparkR] Alternating Least Squares (ALS) wr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14384 **[Test build #63093 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63093/consoleFull)** for PR 14384 at commit [`b376dfb`](https://github.com/apache/spark/commit/b376dfb35dc8ebb90804264dd5683514a3166d9e). * This patch **fails SparkR unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14441: [SPARK-16837] [SQL] TimeWindow incorrectly drops slideDu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14441 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63090/ Test PASSed.