[GitHub] spark issue #14175: [SPARK-16522][MESOS] Spark application throws exception ...
Github user sun-rui commented on the issue: https://github.com/apache/spark/pull/14175

@mgummelt, regression test case added. Not sure it is the expected one.
[GitHub] spark issue #14175: [SPARK-16522][MESOS] Spark application throws exception ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14175

**[Test build #63312 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63312/consoleFull)** for PR 14175 at commit [`1a8b8e6`](https://github.com/apache/spark/commit/1a8b8e606c051f7f9e3da78d51cd92b69e8f84d9).
[GitHub] spark pull request #14477: [SPARK-16870][docs]Summary:add "spark.sql.broadca...
Github user biglobster commented on a diff in the pull request: https://github.com/apache/spark/pull/14477#discussion_r73782824

--- Diff: docs/sql-programming-guide.md ---
@@ -790,6 +790,15 @@ Configuration of Parquet can be done using the `setConf` method on `SparkSession`
+
--- End diff --

Done. Thanks for your suggestion :)
[GitHub] spark issue #14392: [SPARK-16446] [SparkR] [ML] Gaussian Mixture Model wrapp...
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/14392

@felixcheung @junyangq Any thoughts?
[GitHub] spark issue #14175: [SPARK-16522][MESOS] Spark application throws exception ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14175

**[Test build #63312 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63312/consoleFull)** for PR 14175 at commit [`1a8b8e6`](https://github.com/apache/spark/commit/1a8b8e606c051f7f9e3da78d51cd92b69e8f84d9).

* This patch passes all tests.
* This patch **does not merge cleanly**.
* This patch adds no public classes.
[GitHub] spark issue #14175: [SPARK-16522][MESOS] Spark application throws exception ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14175

Build finished. Test PASSed.
[GitHub] spark issue #14175: [SPARK-16522][MESOS] Spark application throws exception ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14175

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63312/
[GitHub] spark issue #14504: [SPARK-16409] [SQL] regexp_extract with optional groups ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14504

**[Test build #63313 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63313/consoleFull)** for PR 14504 at commit [`b835bd3`](https://github.com/apache/spark/commit/b835bd3d5fe4c736b4c92b3486f2344a83d09438).
[GitHub] spark issue #14504: [SPARK-16409] [SQL] regexp_extract with optional groups ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14504

**[Test build #63313 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63313/consoleFull)** for PR 14504 at commit [`b835bd3`](https://github.com/apache/spark/commit/b835bd3d5fe4c736b4c92b3486f2344a83d09438).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #14504: [SPARK-16409] [SQL] regexp_extract with optional groups ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14504

Merged build finished. Test PASSed.
[GitHub] spark issue #14504: [SPARK-16409] [SQL] regexp_extract with optional groups ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14504

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63313/
[GitHub] spark issue #14109: [SPARK-16404][ML] LeastSquaresAggregators serializes unn...
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/14109

The current fix for destroying the broadcast variables is OK. LGTM. Thanks!
[GitHub] spark pull request #14519: [SPARK-16933] [ML] Fix AFTAggregator in AFTSurviv...
GitHub user yanboliang opened a pull request: https://github.com/apache/spark/pull/14519

[SPARK-16933] [ML] Fix AFTAggregator in AFTSurvivalRegression serializes unnecessary data.

## What changes were proposed in this pull request?

Similar to `LeastSquaresAggregator` in #14109, the `AFTAggregator` used by `AFTSurvivalRegression` ends up serializing the `parameters` and `featuresStd`, which is unnecessary and can cause performance issues for high-dimensional data. This patch removes that serialization. This PR is heavily inspired by #14109.

## How was this patch tested?

Existing tests.

commit d152a3a32ec08743b026ab5bd632b22909c6aa3f
Author: Yanbo Liang
Date: 2016-08-06T13:37:37Z

    Fix AFTAggregator in AFTSurvivalRegression serializes unnecessary data.
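To make the technique concrete, here is a minimal, self-contained sketch of the broadcast-aggregator pattern that this PR and #14109 apply. Everything below (`ScaledSumAggregator`, the data, the driver object) is illustrative rather than the actual Spark ML code: the point is that the aggregator holds only a `Broadcast` handle, so the large read-only array is shipped to each executor once instead of being serialized into every task closure.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.broadcast.Broadcast

// Illustrative aggregator (not the real AFTAggregator): it captures only the
// Broadcast handle. The @transient lazy val is resolved on the executor from
// the broadcast, so the array itself is never part of task serialization.
class ScaledSumAggregator(bcFeaturesStd: Broadcast[Array[Double]]) extends Serializable {
  @transient private lazy val featuresStd: Array[Double] = bcFeaturesStd.value
  private var sum = 0.0

  def add(features: Array[Double]): this.type = {
    var i = 0
    while (i < features.length) {
      if (featuresStd(i) != 0.0) sum += features(i) / featuresStd(i)
      i += 1
    }
    this
  }

  def merge(other: ScaledSumAggregator): this.type = { sum += other.sum; this }
  def value: Double = sum
}

object BroadcastAggregationSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local[2]", "broadcast-aggregation-sketch")
    val data = sc.parallelize(Seq(Array(1.0, 2.0), Array(3.0, 4.0)))
    val bcStd = sc.broadcast(Array(0.5, 2.0))
    val agg = data.treeAggregate(new ScaledSumAggregator(bcStd))(
      seqOp = (acc, v) => acc.add(v),
      combOp = (a, b) => a.merge(b))
    println(agg.value) // (1/0.5 + 2/2) + (3/0.5 + 4/2) = 11.0
    bcStd.destroy()    // release the broadcast blocks once no longer needed
    sc.stop()
  }
}
```

The `@transient lazy val` indirection is what keeps `featuresStd` out of the serialized closure while still giving the executor-side code plain array access.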
[GitHub] spark issue #14519: [SPARK-16933] [ML] Fix AFTAggregator in AFTSurvivalRegre...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14519

**[Test build #63314 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63314/consoleFull)** for PR 14519 at commit [`d152a3a`](https://github.com/apache/spark/commit/d152a3a32ec08743b026ab5bd632b22909c6aa3f).
[GitHub] spark issue #14519: [SPARK-16933] [ML] Fix AFTAggregator in AFTSurvivalRegre...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14519

Merged build finished. Test PASSed.
[GitHub] spark issue #14519: [SPARK-16933] [ML] Fix AFTAggregator in AFTSurvivalRegre...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14519

**[Test build #63314 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63314/consoleFull)** for PR 14519 at commit [`d152a3a`](https://github.com/apache/spark/commit/d152a3a32ec08743b026ab5bd632b22909c6aa3f).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #14519: [SPARK-16933] [ML] Fix AFTAggregator in AFTSurvivalRegre...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14519

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63314/
[GitHub] spark issue #14519: [SPARK-16933] [ML] Fix AFTAggregator in AFTSurvivalRegre...
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/14519

cc @sethah @dbtsai
[GitHub] spark pull request #14520: [SPARK-16934][ML][MLLib] Improve LogisticCostFun ...
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/14520

[SPARK-16934][ML][MLLib] Improve LogisticCostFun to avoid redundant serialization

## What changes were proposed in this pull request?

Improve `LogisticCostFun`: replace the closure variable `localFeaturesStd` with a broadcast variable so it avoids redundant serialization on each call, and make several other modifications to match the patterns in https://github.com/apache/spark/pull/14109.

## How was this patch tested?

Existing tests.

commit 417aa1ea623b10d0d7b9f13b3d3f65fa8ac64ce8
Author: WeichenXu
Date: 2016-08-05T01:28:18Z

    update
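The closure-variable versus broadcast-variable difference described above can be sketched as follows. This is a hedged illustration, not the actual `LogisticCostFun`: the standardization array arrives as a long-lived broadcast created once by the caller, while the parameters are re-broadcast on each `calculate` call and destroyed after that iteration's job, so nothing large rides along in every task closure.

```scala
import breeze.linalg.{DenseVector => BDV}
import breeze.optimize.DiffFunction
import org.apache.spark.broadcast.Broadcast
import org.apache.spark.rdd.RDD

// Hedged sketch of a cost function following the pattern in #14109:
// one persistent broadcast (featuresStd) plus one short-lived broadcast
// per optimizer iteration (the current parameters).
class CostFunSketch(
    data: RDD[Array[Double]],
    bcFeaturesStd: Broadcast[Array[Double]]) extends DiffFunction[BDV[Double]] {

  override def calculate(parameters: BDV[Double]): (Double, BDV[Double]) = {
    val bcParameters = data.context.broadcast(parameters.toArray)

    // Stand-in for the real treeAggregate over a loss/gradient aggregator
    // that reads bcParameters.value and bcFeaturesStd.value on the executors.
    val loss = data.map { features =>
      val params = bcParameters.value
      val std = bcFeaturesStd.value
      features.indices.map(i => features(i) * params(i) / std(i)).sum
    }.sum()

    bcParameters.destroy() // avoid leaking one broadcast per iteration
    (loss, BDV.zeros[Double](parameters.length)) // gradient elided in this sketch
  }
}
```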
[GitHub] spark issue #14520: [SPARK-16934][ML][MLLib] Improve LogisticCostFun to avoi...
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/14520

cc @sethah @yanboliang
[GitHub] spark pull request #14519: [SPARK-16933] [ML] Fix AFTAggregator in AFTSurviv...
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/14519#discussion_r73787877

--- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/AFTSurvivalRegression.scala ---
@@ -583,19 +591,22 @@ private class AFTAggregator(
 private class AFTCostFun(
     data: RDD[AFTPoint],
     fitIntercept: Boolean,
-    featuresStd: Array[Double]) extends DiffFunction[BDV[Double]] {
+    bcFeaturesStd: Broadcast[Array[Double]]) extends DiffFunction[BDV[Double]] {

   override def calculate(parameters: BDV[Double]): (Double, BDV[Double]) = {
+    val bcParameters = data.context.broadcast(parameters)
+
     val aftAggregator = data.treeAggregate(
-      new AFTAggregator(parameters, fitIntercept, featuresStd))(
+      new AFTAggregator(bcParameters, fitIntercept, bcFeaturesStd))(
       seqOp = (c, v) => (c, v) match {
         case (aggregator, instance) => aggregator.add(instance)
       },
       combOp = (c1, c2) => (c1, c2) match {
         case (aggregator1, aggregator2) => aggregator1.merge(aggregator2)
       })
--- End diff --

No need for the `(c, v) match {...}` pattern; directly use `(aggregator, instance) => aggregator.add(instance)`.
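Spelled out, the reviewer's suggestion collapses each pattern match into a plain function literal. This rewrite is inferred from the comment rather than taken from the merged patch:

```scala
val aftAggregator = data.treeAggregate(
  new AFTAggregator(bcParameters, fitIntercept, bcFeaturesStd))(
  seqOp = (aggregator, instance) => aggregator.add(instance),
  combOp = (aggregator1, aggregator2) => aggregator1.merge(aggregator2))
```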
[GitHub] spark issue #14520: [SPARK-16934][ML][MLLib] Improve LogisticCostFun to avoi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14520

**[Test build #63315 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63315/consoleFull)** for PR 14520 at commit [`417aa1e`](https://github.com/apache/spark/commit/417aa1ea623b10d0d7b9f13b3d3f65fa8ac64ce8).
[GitHub] spark issue #14519: [SPARK-16933] [ML] Fix AFTAggregator in AFTSurvivalRegre...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/14519

Let's put this into https://github.com/apache/spark/pull/14109
[GitHub] spark issue #14520: [SPARK-16934][ML][MLLib] Improve LogisticCostFun to avoi...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/14520

Let's put this into https://github.com/apache/spark/pull/14109
[GitHub] spark issue #14520: [SPARK-16934][ML][MLLib] Improve LogisticCostFun to avoi...
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/14520

Oh, it's another algorithm and there are several different details, so to keep things clear I created a separate PR to discuss it. Thanks!
[GitHub] spark pull request #14444: [SPARK-16839] [SQL] redundant aliases after clean...
Github user eyalfa commented on a diff in the pull request: https://github.com/apache/spark/pull/14444#discussion_r73788331

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala ---
@@ -1101,7 +1101,7 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with Logging {
    * Create a [[CreateStruct]] expression.
    */
   override def visitRowConstructor(ctx: RowConstructorContext): Expression = withOrigin(ctx) {
-    CreateStruct(ctx.expression.asScala.map(expression))
+    CreateStruct(ctx.expression.asScala.map(expression)).toCreateNamedStruct
--- End diff --

@cloud-fan, this processes row-constructor expressions of the form `(1,"a") as col1(a,b)`. I could basically leave this as a `CreateStruct`, but then I'd have to do something like a transformDown in `visitInlineTable`, which is basically the reason for this mess.
[GitHub] spark issue #14520: [SPARK-16934][ML][MLLib] Improve LogisticCostFun to avoi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14520

**[Test build #63315 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63315/consoleFull)** for PR 14520 at commit [`417aa1e`](https://github.com/apache/spark/commit/417aa1ea623b10d0d7b9f13b3d3f65fa8ac64ce8).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #14520: [SPARK-16934][ML][MLLib] Improve LogisticCostFun to avoi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14520

Merged build finished. Test PASSed.
[GitHub] spark issue #14520: [SPARK-16934][ML][MLLib] Improve LogisticCostFun to avoi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14520

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63315/
[GitHub] spark pull request #14231: [SPARK-16586] Change the way the exit code of lau...
Github user zasdfgbnm closed the pull request at: https://github.com/apache/spark/pull/14231
[GitHub] spark pull request #14489: [MINOR][SparkR] R API documentation for "coltypes...
Github user shivaram commented on a diff in the pull request: https://github.com/apache/spark/pull/14489#discussion_r73789927

--- Diff: R/pkg/R/DataFrame.R ---
@@ -41,7 +41,7 @@ setOldClass("structType")
 #'\dontrun{
 #' sparkR.session()
 #' df <- createDataFrame(faithful)
-#'}
+#' }
--- End diff --

Yeah, let's not do these in this PR. If we want to fix this, let's do it in a separate PR?
[GitHub] spark pull request #14521: [SPARK-16935] [SQL] Verification of Function-rela...
GitHub user gatorsmile opened a pull request: https://github.com/apache/spark/pull/14521

[SPARK-16935] [SQL] Verification of Function-related ExternalCatalog APIs

### What changes were proposed in this pull request?

Function-related `HiveExternalCatalog` APIs do not have enough verification logic. After this PR, `HiveExternalCatalog` and `InMemoryCatalog` become consistent in their error handling. For example, below is the exception we got when calling `renameFunction`:

```
15:13:40.369 WARN org.apache.hadoop.hive.metastore.ObjectStore: Failed to get database db1, returning NoSuchObjectException
15:13:40.377 WARN org.apache.hadoop.hive.metastore.ObjectStore: Failed to get database db2, returning NoSuchObjectException
15:13:40.739 ERROR DataNucleus.Datastore.Persist: Update of object "org.apache.hadoop.hive.metastore.model.MFunction@205629e9" using statement "UPDATE FUNCS SET FUNC_NAME=? WHERE FUNC_ID=?" failed : org.apache.derby.shared.common.error.DerbySQLIntegrityConstraintViolationException: The statement was aborted because it would have caused a duplicate key value in a unique or primary key constraint or unique index identified by 'UNIQUEFUNCTION' defined on 'FUNCS'.
    at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
    at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown Source)
    at org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown Source)
    at org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown Source)
```

### How was this patch tested?

Improved the existing test cases to check whether the error messages are right.

commit e53809aeccb936ade39abbbaab408fccbe347b7f
Author: gatorsmile
Date: 2016-08-04T22:45:09Z

    fix

commit 58cba4ba3658d2b0c5bb7bf7b0bfe929bec1aafd
Author: gatorsmile
Date: 2016-08-06T15:32:22Z

    Merge remote-tracking branch 'upstream/master' into functionChecking
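To illustrate the kind of verification being added, here is a small self-contained sketch. The class and exception names below are invented for the example (Spark's catalogs use their own typed exceptions); the point is that both catalog implementations should run the same pre-checks and throw the same well-defined errors, rather than letting the metastore backend fail with an opaque error like the Derby constraint violation above.

```scala
import scala.collection.mutable

// Illustrative exceptions; Spark defines its own analysis exceptions for these cases.
class NoSuchFunctionException(db: String, func: String)
  extends Exception(s"Function '$func' not found in database '$db'")
class FunctionAlreadyExistsException(db: String, func: String)
  extends Exception(s"Function '$func' already exists in database '$db'")

// Toy in-memory catalog showing the pre-check pattern.
class TinyFunctionCatalog {
  private val functions = mutable.Map.empty[(String, String), String] // (db, name) -> class name

  def createFunction(db: String, name: String, className: String): Unit = {
    if (functions.contains((db, name))) throw new FunctionAlreadyExistsException(db, name)
    functions((db, name)) = className
  }

  def renameFunction(db: String, oldName: String, newName: String): Unit = {
    // Verify before mutating, so the failure mode is consistent and descriptive.
    if (!functions.contains((db, oldName))) throw new NoSuchFunctionException(db, oldName)
    if (functions.contains((db, newName))) throw new FunctionAlreadyExistsException(db, newName)
    functions((db, newName)) = functions.remove((db, oldName)).get
  }
}
```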
[GitHub] spark issue #14521: [SPARK-16935] [SQL] Verification of Function-related Ext...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14521

**[Test build #63316 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63316/consoleFull)** for PR 14521 at commit [`58cba4b`](https://github.com/apache/spark/commit/58cba4ba3658d2b0c5bb7bf7b0bfe929bec1aafd).
[GitHub] spark issue #14469: [SPARK-16700] [PYSPARK] [SQL] create DataFrame from dict...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14469

**[Test build #3205 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3205/consoleFull)** for PR 14469 at commit [`c0ad866`](https://github.com/apache/spark/commit/c0ad8668ba22e51b07ba08b8e19c312783cd1b87).
[GitHub] spark pull request #14469: [SPARK-16700] [PYSPARK] [SQL] create DataFrame fr...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/14469#discussion_r73793829

--- Diff: python/pyspark/sql/context.py ---
@@ -253,6 +254,8 @@ def createDataFrame(self, data, schema=None, samplingRatio=None):
         If it's not a :class:`pyspark.sql.types.StructType`, it will be wrapped into a
         :class:`pyspark.sql.types.StructType` and each record will also be wrapped into a tuple.
+
+        Added verifySchema.
--- End diff --

+1. I wasn't aware of this, but it looks like it's possible to have multiple `versionchanged` directives in the same docstring.
[GitHub] spark pull request #14469: [SPARK-16700] [PYSPARK] [SQL] create DataFrame fr...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/14469#discussion_r73793841

--- Diff: python/pyspark/sql/session.py ---
@@ -432,14 +430,9 @@ def createDataFrame(self, data, schema=None, samplingRatio=None):
             ``byte`` instead of ``tinyint`` for :class:`pyspark.sql.types.ByteType`. We can also use
             ``int`` as a short name for ``IntegerType``.
         :param samplingRatio: the sample ratio of rows used for inferring
+        :param verifySchema: verify data types of every row against schema.
--- End diff --

+1 on also adding a `versionchanged` directive for this.
[GitHub] spark pull request #14469: [SPARK-16700] [PYSPARK] [SQL] create DataFrame fr...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/14469#discussion_r73793890

--- Diff: python/pyspark/sql/session.py ---
@@ -432,14 +430,9 @@ def createDataFrame(self, data, schema=None, samplingRatio=None):
             ``byte`` instead of ``tinyint`` for :class:`pyspark.sql.types.ByteType`. We can also use
             ``int`` as a short name for ``IntegerType``.
         :param samplingRatio: the sample ratio of rows used for inferring
+        :param verifySchema: verify data types of every row against schema.
         :return: :class:`DataFrame`

-        .. versionchanged:: 2.0
--- End diff --

@davies, I'm also slightly confused by this documentation change since it looks like the new 2.x behavior of wrapping single-field datatypes into structtypes and values into tuples is preserved by this patch. Could you clarify?
[GitHub] spark pull request #14469: [SPARK-16700] [PYSPARK] [SQL] create DataFrame fr...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/14469#discussion_r73793910

--- Diff: python/pyspark/sql/tests.py ---
@@ -411,6 +411,21 @@ def test_infer_schema_to_local(self):
         df3 = self.spark.createDataFrame(rdd, df.schema)
         self.assertEqual(10, df3.count())

+    def test_apply_schema_to_dict_and_rows(self):
--- End diff --

Should we also add a test to exercise the `verifySchema=False` case?
[GitHub] spark pull request #14469: [SPARK-16700] [PYSPARK] [SQL] create DataFrame fr...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/14469#discussion_r73793929

--- Diff: python/pyspark/sql/types.py ---
@@ -582,6 +582,8 @@ def toInternal(self, obj):
         else:
             if isinstance(obj, dict):
                 return tuple(obj.get(n) for n in self.names)
+            elif isinstance(obj, Row) and getattr(obj, "__from_dict__", False):
--- End diff --

Nice.
[GitHub] spark issue #14469: [SPARK-16700] [PYSPARK] [SQL] create DataFrame from dict...
Github user JoshRosen commented on the issue: https://github.com/apache/spark/pull/14469

This looks pretty good to me overall but I have a couple of clarification questions regarding some of the doc changes.
[GitHub] spark issue #14469: [SPARK-16700] [PYSPARK] [SQL] create DataFrame from dict...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14469

**[Test build #3205 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3205/consoleFull)** for PR 14469 at commit [`c0ad866`](https://github.com/apache/spark/commit/c0ad8668ba22e51b07ba08b8e19c312783cd1b87).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #14469: [SPARK-16700] [PYSPARK] [SQL] create DataFrame fr...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/14469#discussion_r73794005

--- Diff: python/pyspark/sql/session.py ---
@@ -384,17 +384,15 @@ def _createFromLocal(self, data, schema):

         if schema is None or isinstance(schema, (list, tuple)):
             struct = self._inferSchemaFromList(data)
+            converter = _create_converter(struct)
--- End diff --

This [`_create_converter` method](https://github.com/davies/spark/blob/c0ad8668ba22e51b07ba08b8e19c312783cd1b87/python/pyspark/sql/types.py#L1054) is confusingly named: what it's actually doing here is converting `data` from a dict to a tuple in case the schema is a StructType and the data is a Python dictionary.
[GitHub] spark issue #12574: [SPARK-13857][ML][WIP] Add "recommend all" functionality...
Github user debasish83 commented on the issue: https://github.com/apache/spark/pull/12574

@MLnick I recently visited IBM STC but unfortunately missed you at the meeting... we discussed the ML/MLlib changes for matrix factorization...
[GitHub] spark issue #12574: [SPARK-13857][ML][WIP] Add "recommend all" functionality...
Github user debasish83 commented on the issue: https://github.com/apache/spark/pull/12574

I will take a pass at the PR as well.
[GitHub] spark issue #14521: [SPARK-16935] [SQL] Verification of Function-related Ext...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14521

**[Test build #63316 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63316/consoleFull)** for PR 14521 at commit [`58cba4b`](https://github.com/apache/spark/commit/58cba4ba3658d2b0c5bb7bf7b0bfe929bec1aafd).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `public class ShuffleIndexInformation`
  * `public class ShuffleIndexRecord`
  * `case class CreateTable(tableDesc: CatalogTable, mode: SaveMode, query: Option[LogicalPlan])`
  * `case class PreprocessDDL(conf: SQLConf) extends Rule[LogicalPlan]`
[GitHub] spark issue #14521: [SPARK-16935] [SQL] Verification of Function-related Ext...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14521

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63316/
[GitHub] spark issue #14521: [SPARK-16935] [SQL] Verification of Function-related Ext...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14521

Merged build finished. Test PASSed.
[GitHub] spark pull request #14522: [Spark-16508][SparkR] Split docs for arrange and ...
GitHub user junyangq opened a pull request: https://github.com/apache/spark/pull/14522

[Spark-16508][SparkR] Split docs for arrange and orderBy methods

## What changes were proposed in this pull request?

This PR splits the arrange and orderBy methods according to their functionality: the former sorts a SparkDataFrame, and the latter applies to a WindowSpec.

## How was this patch tested?

![screen shot 2016-08-06 at 6 39 19 pm](https://cloud.githubusercontent.com/assets/15318264/17459969/51eade28-5c05-11e6-8ca1-8d8a8e344bab.png)
![screen shot 2016-08-06 at 6 39 29 pm](https://cloud.githubusercontent.com/assets/15318264/17459966/51e3c246-5c05-11e6-8d35-3e905ca48676.png)
![screen shot 2016-08-06 at 6 40 02 pm](https://cloud.githubusercontent.com/assets/15318264/17459967/51e650ec-5c05-11e6-8698-0f037f5199ff.png)

commit 0876b7588cee1b2d39ffe8869e6b8320d8e27d1e
Author: Junyang Qian
Date: 2016-08-05T22:41:39Z

    Separate docs for arrange and orderBy methods according to their functionality
[GitHub] spark issue #14522: [Spark-16508][SparkR] Split docs for arrange and orderBy...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14522

**[Test build #63317 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63317/consoleFull)** for PR 14522 at commit [`0876b75`](https://github.com/apache/spark/commit/0876b7588cee1b2d39ffe8869e6b8320d8e27d1e).
[GitHub] spark issue #14510: [SPARK-16925] Master should call schedule() after all ex...
Github user JoshRosen commented on the issue: https://github.com/apache/spark/pull/14510

I'm going to merge this to master, branch-2.0, and branch-1.6. I have a followup patch to add configuration options for controlling the "remove application that has experienced too many back-to-back executor failures" code path, which I'll submit tomorrow.
[GitHub] spark issue #14258: [Spark-16579][SparkR] add install.spark function
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/14258

I think we should go ahead with this and get some usage from the community as early as possible. LGTM - we can see later whether we can improve how we detect running from the shell.
[GitHub] spark pull request #14510: [SPARK-16925] Master should call schedule() after...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/14510
[GitHub] spark pull request #14258: [Spark-16579][SparkR] add install.spark function
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14258#discussion_r73794815

--- Diff: R/pkg/R/sparkR.R ---
@@ -365,6 +365,23 @@ sparkR.session <- function(
     }
     overrideEnvs(sparkConfigMap, paramMap)
   }
+  # do not download if it is run in the sparkR shell
+  if (!grepl(".*shell\\.R$", Sys.getenv("R_PROFILE_USER"), perl = TRUE)) {
+    if (!nzchar(master) || is_master_local(master)) {
--- End diff --

Shouldn't we also fail if master != local but SPARK_HOME is not defined or the Spark jar is not in SPARK_HOME?
[GitHub] spark pull request #14258: [Spark-16579][SparkR] add install.spark function
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14258#discussion_r73794831

--- Diff: R/pkg/R/install.R ---
@@ -0,0 +1,230 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# Functions to install Spark in case the user directly downloads SparkR
+# from CRAN.
+
+#' Download and Install Apache Spark to a Local Directory
+#'
+#' \code{install.spark} downloads and installs Spark to a local directory if
+#' it is not found. The Spark version we use is the same as the SparkR version.
+#' Users can specify a desired Hadoop version, the remote mirror site, and
+#' the directory where the package is installed locally.
+#'
+#' The full url of remote file is inferred from \code{mirrorUrl} and \code{hadoopVersion}.
+#' \code{mirrorUrl} specifies the remote path to a Spark folder. It is followed by a subfolder
+#' named after the Spark version (that corresponds to SparkR), and then the tar filename.
+#' The filename is composed of four parts, i.e. [Spark version]-bin-[Hadoop version].tgz.
+#' For example, the full path for a Spark 2.0.0 package for Hadoop 2.7 from
+#' \code{http://apache.osuosl.org} has path:
+#' \code{http://apache.osuosl.org/spark/spark-2.0.0/spark-2.0.0-bin-hadoop2.7.tgz}.
+#' For \code{hadoopVersion = "without"}, [Hadoop version] in the filename is then
+#' \code{without-hadoop}.
+#'
+#' @param hadoopVersion Version of Hadoop to install. Default is \code{"2.7"}. It can take other
+#'                      version number in the format of "x.y" where x and y are integer.
+#'                      If \code{hadoopVersion = "without"}, "Hadoop free" build is installed.
+#'                      See
+#'                      \href{http://spark.apache.org/docs/latest/hadoop-provided.html}{
+#'                      "Hadoop Free" Build} for more information.
+#'                      Other patched version names can also be used, e.g. \code{"cdh4"}
+#' @param mirrorUrl base URL of the repositories to use. The directory layout should follow
+#'                  \href{http://www.apache.org/dyn/closer.lua/spark/}{Apache mirrors}.
+#' @param localDir a local directory where Spark is installed. The directory contains
+#'                 version-specific folders of Spark packages. Default is path to
+#'                 the cache directory:
+#'                 \itemize{
+#'                   \item Mac OS X: \file{~/Library/Caches/spark}
+#'                   \item Unix: \env{$XDG_CACHE_HOME} if defined, otherwise \file{~/.cache/spark}
+#'                   \item Windows: \file{\%LOCALAPPDATA\%\\spark\\spark\\Cache}. See
+#'                         \href{https://www.microsoft.com/security/portal/mmpc/shared/variables.aspx}{
+#'                         Windows Common Folder Variables} about \%LOCALAPPDATA\%
+#'                 }
+#' @param overwrite If \code{TRUE}, download and overwrite the existing tar file in localDir
+#'                  and force re-install Spark (in case the local directory or file is corrupted)
+#' @return \code{install.spark} returns the local directory where Spark is found or installed
+#' @rdname install.spark
+#' @name install.spark
--- End diff --

add @aliases
[GitHub] spark pull request #14258: [Spark-16579][SparkR] add install.spark function
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14258#discussion_r73794861

--- Diff: R/pkg/R/sparkR.R ---
@@ -365,6 +365,23 @@ sparkR.session <- function(
     }
     overrideEnvs(sparkConfigMap, paramMap)
   }
+  # do not download if it is run in the sparkR shell
+  if (!grepl(".*shell\\.R$", Sys.getenv("R_PROFILE_USER"), perl = TRUE)) {
+    if (!nzchar(master) || is_master_local(master)) {
--- End diff --

To clarify, I mean this check isn't restricted to local only, right?
[GitHub] spark pull request #14258: [Spark-16579][SparkR] add install.spark function
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14258#discussion_r73794892

--- Diff: R/pkg/R/install.R ---
+#' @rdname install.spark
+#' @name install.spark
+#' @export
+#' @examples
+#'\dontrun{
+#' install.spark()
+#'}
+#' @note install.spark since 2.1.0
+#' @seealso See available Hadoop versions:
+#' \href{http://spark.apache.org/downloads.html}{Apache Spark}
+install.spark <- function(hadoopVersion = "2.7", mirrorUrl = NULL,
+                          localDir = NULL, overwrite = FALSE) {
+  version <- paste0("spark-", packageVersion("SparkR"))
+  hadoopVersion <- tolower(hadoopVersion)
+  hadoopVersionName <- hadoop_version_name(hadoopVersion)
+  packageName <- paste(version, "bin", hadoopVersionName, sep = "-")
+  localDir <- ifelse(is.null(localDir), spark_cache_path(),
+                     normalizePath(localDir, mustWork = FALSE))
+
+  if (is.na(file.info(lo
[GitHub] spark pull request #14258: [Spark-16579][SparkR] add install.spark function
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14258#discussion_r73794933

--- Diff: R/pkg/R/install.R ---
@@ -0,0 +1,230 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# Functions to install Spark in case the user directly downloads SparkR
+# from CRAN.
+
+#' Download and Install Apache Spark to a Local Directory
+#'
+#' \code{install.spark} downloads and installs Spark to a local directory if
+#' it is not found. The Spark version we use is the same as the SparkR version.
+#' Users can specify a desired Hadoop version, the remote mirror site, and
+#' the directory where the package is installed locally.
+#'
+#' The full url of remote file is inferred from \code{mirrorUrl} and \code{hadoopVersion}.
+#' \code{mirrorUrl} specifies the remote path to a Spark folder. It is followed by a subfolder
+#' named after the Spark version (that corresponds to SparkR), and then the tar filename.
+#' The filename is composed of four parts, i.e. [Spark version]-bin-[Hadoop version].tgz.
+#' For example, the full path for a Spark 2.0.0 package for Hadoop 2.7 from
+#' \code{http://apache.osuosl.org} has path:
+#' \code{http://apache.osuosl.org/spark/spark-2.0.0/spark-2.0.0-bin-hadoop2.7.tgz}.
+#' For \code{hadoopVersion = "without"}, [Hadoop version] in the filename is then
+#' \code{without-hadoop}.
+#'
+#' @param hadoopVersion Version of Hadoop to install. Default is \code{"2.7"}. It can take other
+#'                      version number in the format of "x.y" where x and y are integer.
+#'                      If \code{hadoopVersion = "without"}, "Hadoop free" build is installed.
+#'                      See
+#'                      \href{http://spark.apache.org/docs/latest/hadoop-provided.html}{
+#'                      "Hadoop Free" Build} for more information.
+#'                      Other patched version names can also be used, e.g. \code{"cdh4"}
+#' @param mirrorUrl base URL of the repositories to use. The directory layout should follow
+#'                  \href{http://www.apache.org/dyn/closer.lua/spark/}{Apache mirrors}.
+#' @param localDir a local directory where Spark is installed. The directory contains
+#'                 version-specific folders of Spark packages. Default is path to
+#'                 the cache directory:
+#'                 \itemize{
+#'                   \item Mac OS X: \file{~/Library/Caches/spark}
+#'                   \item Unix: \env{$XDG_CACHE_HOME} if defined, otherwise \file{~/.cache/spark}
+#'                   \item Windows: \file{\%LOCALAPPDATA\%\\spark\\spark\\Cache}. See
+#'                         \href{https://www.microsoft.com/security/portal/mmpc/shared/variables.aspx}{
+#'                         Windows Common Folder Variables} about \%LOCALAPPDATA\%
+#'                 }
+#' @param overwrite If \code{TRUE}, download and overwrite the existing tar file in localDir
+#'                  and force re-install Spark (in case the local directory or file is corrupted)
+#' @return \code{install.spark} returns the local directory where Spark is found or installed
+#' @rdname install.spark
+#' @name install.spark
+#' @export
+#' @examples
+#'\dontrun{
+#' install.spark()
+#'}
+#' @note install.spark since 2.1.0
+#' @seealso See available Hadoop versions:
+#'          \href{http://spark.apache.org/downloads.html}{Apache Spark}
+install.spark <- function(hadoopVersion = "2.7", mirrorUrl = NULL,
+                          localDir = NULL, overwrite = FALSE) {
+  version <- paste0("spark-", packageVersion("SparkR"))
+  hadoopVersion <- tolower(hadoopVersion)
+  hadoopVersionName <- hadoop_version_name(hadoopVersion)
+  packageName <- paste(version, "bin", hadoopVersionName, sep = "-")
+  localDir <- ifelse(is.null(localDir), spark_cache_path(),
+                     normalizePath(localDir, mustWork = FALSE))
+
+  if (is.na(file.info(lo
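The URL layout described in the quoted roxygen block can be made concrete with a short sketch that rebuilds the documentation's own example path (the mirror, Spark version, and Hadoop version below are the example values from the text):

```r
# [mirrorUrl]/[Spark version]/[Spark version]-bin-[Hadoop version].tgz
mirrorUrl   <- "http://apache.osuosl.org/spark"
version     <- "spark-2.0.0"
packageName <- paste(version, "bin", "hadoop2.7", sep = "-")
remotePath  <- paste0(file.path(mirrorUrl, version, packageName), ".tgz")
remotePath
# [1] "http://apache.osuosl.org/spark/spark-2.0.0/spark-2.0.0-bin-hadoop2.7.tgz"
```

For `hadoopVersion = "without"`, `packageName` would instead end in `-bin-without-hadoop`, per the documentation above.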
[GitHub] spark issue #14522: [Spark-16508][SparkR] Split docs for arrange and orderBy...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14522 **[Test build #63317 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63317/consoleFull)** for PR 14522 at commit [`0876b75`](https://github.com/apache/spark/commit/0876b7588cee1b2d39ffe8869e6b8320d8e27d1e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14522: [Spark-16508][SparkR] Split docs for arrange and orderBy...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14522 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63317/ Test PASSed.
[GitHub] spark issue #14522: [Spark-16508][SparkR] Split docs for arrange and orderBy...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14522 Merged build finished. Test PASSed.
[GitHub] spark issue #14509: [SPARK-16924][SQL] - Support option("inferSchema", true)...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/14509 If my understanding is correct, the JSON one does not have an `inferSchema` option.
[GitHub] spark issue #14509: [SPARK-16924][SQL] - Support option("inferSchema", true)...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/14509 Actually, `inferSchema` in CSV would be a CSV-datasource-specific option, meant to allow reading the headers as column names while avoiding inferring the schema.
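To make that distinction concrete, a minimal SparkR sketch (the CSV path is hypothetical; `header` and `inferSchema` are passed through as CSV data source options):

```r
library(SparkR)
sparkR.session()

# Use the header row for column names, but keep every column as string:
df1 <- read.df("data.csv", source = "csv", header = "true")

# Use the header row AND infer column types from the data:
df2 <- read.df("data.csv", source = "csv", header = "true", inferSchema = "true")
```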
[GitHub] spark issue #13680: [SPARK-15962][SQL] Introduce implementation with a dense...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13680 **[Test build #63318 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63318/consoleFull)** for PR 13680 at commit [`87aca80`](https://github.com/apache/spark/commit/87aca805f0c0270b6b25ade05bc7904fc0b96a06).
[GitHub] spark issue #14431: [SPARK-16258][SparkR] Automatically append the grouping ...
Github user NarineK commented on the issue: https://github.com/apache/spark/pull/14431 It seems that, currently, SparkR's `GroupedData`, which represents Scala's GroupedData object, doesn't have any information about the grouping keys. `RelationalGroupedDataset` has a private attribute `groupingExpr` that contains information about the grouping columns, but it is not accessible from the R side. I was thinking that maybe we could pass the grouping columns to groups.R, like `groupedData(sgd, cols)`. Any thoughts @shivaram? Thanks!
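A minimal sketch of that proposal, assuming today's groups.R constructor simply wraps the JVM-side object (the `cols` slot and the constructor signature below are illustrative, not a final API):

```r
# Current shape (simplified): the S4 class only holds the JVM object.
setClass("GroupedData", representation(sgd = "jobj"))

# Proposed shape: also carry the grouping columns on the R side, so that
# downstream code can append the grouping keys to its output.
setClass("GroupedData", representation(sgd = "jobj", cols = "list"))

groupedData <- function(sgd, cols) {
  new("GroupedData", sgd = sgd, cols = cols)
}
```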
[GitHub] spark pull request #13680: [SPARK-15962][SQL] Introduce implementation with ...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/13680#discussion_r73795810

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala ---
@@ -101,6 +101,8 @@ object ScalaReflection extends ScalaReflection {
     case t if t <:< definitions.ShortTpe => classOf[Array[Short]]
     case t if t <:< definitions.ByteTpe => classOf[Array[Byte]]
     case t if t <:< definitions.BooleanTpe => classOf[Array[Boolean]]
+    case t if t <:< localTypeOf[CalendarInterval] => classOf[Array[CalendarInterval]]
+    case t if t <:< localTypeOf[Decimal] => classOf[Array[Decimal]]
--- End diff --

When I added test cases for `CalendarInterval` and `Decimal`, I got the following cast exception without these changes. What is an appropriate way to fix this?

```java
org.apache.spark.sql.types.CalendarIntervalType$ cannot be cast to org.apache.spark.sql.types.ObjectType
java.lang.ClassCastException: org.apache.spark.sql.types.CalendarIntervalType$ cannot be cast to org.apache.spark.sql.types.ObjectType
	at org.apache.spark.sql.catalyst.ScalaReflection$.org$apache$spark$sql$catalyst$ScalaReflection$$arrayClassFor(ScalaReflection.scala:108)
	at org.apache.spark.sql.catalyst.ScalaReflection$.org$apache$spark$sql$catalyst$ScalaReflection$$dataTypeFor(ScalaReflection.scala:82)
	at org.apache.spark.sql.catalyst.ScalaReflection$.dataTypeFor(ScalaReflection.scala:63)
	at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$.apply(ExpressionEncoder.scala:53)
	at org.apache.spark.sql.catalyst.util.UnsafeArraySuite$$anonfun$1.apply$mcV$sp(UnsafeArraySuite.scala:129)
	at org.apache.spark.sql.catalyst.util.UnsafeArraySuite$$anonfun$1.apply(UnsafeArraySuite.scala:48)
	at org.apache.spark.sql.catalyst.util.UnsafeArraySuite$$anonfun$1.apply(UnsafeArraySuite.scala:48)
	at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
	at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
	at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
	at org.scalatest.Transformer.apply(Transformer.scala:22)
	at org.scalatest.Transformer.apply(Transformer.scala:20)
	at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166)
	at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:57)
	at org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163)
	at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
	at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
	at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
	at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175)
	at org.scalatest.FunSuite.runTest(FunSuite.scala:1555)
	at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
	at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
	at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413)
	at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401)
	at scala.collection.immutable.List.foreach(List.scala:381)
	at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
	at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396)
	at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483)
	at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208)
	at org.scalatest.FunSuite.runTests(FunSuite.scala:1555)
	at org.scalatest.Suite$class.run(Suite.scala:1424)
	at org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555)
	at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
	at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
	at org.scalatest.SuperEngine.runImpl(Engine.scala:545)
	at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212)
	at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:29)
	at org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:257)
	at org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:256)
	at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:29)
	at org.scalatest.tools.SuiteRunner.run(SuiteRunner.scala:55)
	at org.scalatest.tools.Runner$$anonfun$doRunRunRunDaDoRunRun$3.apply(Runner.scala:2563)
	at org.scalatest.tools.Runner$$anonfun$doRunRunRunDaDoRunRun$3.apply(Runner.scala:2557)
	at scala.collection.immutable.List.foreach(List.scala:381)
	at org.scalatest.tools.Runner$.doRunRunRunDaDoRunRun(Runner.scala:2557)
```
[GitHub] spark issue #14491: [SPARK-16886] [EXAMPLES][SQL] structured streaming netwo...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/14491 @ganeshchand Ah, I see, right. `Dataset` is `DataFrame`, so you didn't change it? I skimmed through the list I provided and it seems that is all of them; `structured_network_wordcount.py` loads a `DataFrame` (the Java and Scala ones convert it to a `Dataset` explicitly). But I believe we still need to correct `structured-streaming-programming-guide.md`.
[GitHub] spark pull request #14523: [SPARK-16936] [SQL] Case Sensitivity Support for ...
GitHub user gatorsmile opened a pull request: https://github.com/apache/spark/pull/14523

[SPARK-16936] [SQL] Case Sensitivity Support for Refresh Temp Table

### What changes were proposed in this pull request?

Currently, the `refreshTable` API is always case sensitive. When users use a view name without an exact case match, the API silently ignores the call. Users might expect the command to have completed successfully; however, when they run subsequent SQL commands, they can still hit exceptions like:

```
Job aborted due to stage failure: Task 1 in stage 4.0 failed 1 times, most recent failure: Lost task 1.0 in stage 4.0 (TID 7, localhost): java.io.FileNotFoundException: File file:/private/var/folders/4b/sgmfldk15js406vk7lw5llzwgn/T/spark-bd4b9ea6-9aec-49c5-8f05-01cff426211e/part-r-0-0c84b915-c032-4f2e-abf5-1d48fdbddf38.snappy.parquet does not exist
```

This PR fixes the issue.

### How was this patch tested?

Added a test case.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gatorsmile/spark refreshTempTable

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14523.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14523

commit ade173c2397613b2649d6f61e8fe27c2d659d088
Author: gatorsmile
Date: 2016-08-07T04:27:41Z

    fix

commit f62fb19791f590a8110d6a7be65987b348dc167a
Author: gatorsmile
Date: 2016-08-07T05:16:21Z

    fix2

commit fb0dd0b03640c9456313d8b7a63203607940e683
Author: gatorsmile
Date: 2016-08-07T05:35:55Z

    update the comment
[GitHub] spark issue #14523: [SPARK-16936] [SQL] Case Sensitivity Support for Refresh...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14523 **[Test build #63319 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63319/consoleFull)** for PR 14523 at commit [`fb0dd0b`](https://github.com/apache/spark/commit/fb0dd0b03640c9456313d8b7a63203607940e683).
[GitHub] spark pull request #14489: [MINOR][SparkR] R API documentation for "coltypes...
Github user keypointt commented on a diff in the pull request: https://github.com/apache/spark/pull/14489#discussion_r73796329

--- Diff: R/pkg/R/DataFrame.R ---
@@ -41,7 +41,7 @@ setOldClass("structType")
 #'\dontrun{
 #' sparkR.session()
 #' df <- createDataFrame(faithful)
-#'}
+#' }
--- End diff --

Sure, I'll revert it :)
[GitHub] spark issue #14489: [MINOR][SparkR] R API documentation for "coltypes" is co...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14489 **[Test build #63320 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63320/consoleFull)** for PR 14489 at commit [`022bb69`](https://github.com/apache/spark/commit/022bb69b4cc04564b8521e8129d57d3d59aa05c5).
[GitHub] spark issue #13680: [SPARK-15962][SQL] Introduce implementation with a dense...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13680 **[Test build #63318 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63318/consoleFull)** for PR 13680 at commit [`87aca80`](https://github.com/apache/spark/commit/87aca805f0c0270b6b25ade05bc7904fc0b96a06). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #13680: [SPARK-15962][SQL] Introduce implementation with a dense...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13680 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63318/ Test PASSed.
[GitHub] spark issue #13680: [SPARK-15962][SQL] Introduce implementation with a dense...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13680 Merged build finished. Test PASSed.
[GitHub] spark issue #14489: [MINOR][SparkR] R API documentation for "coltypes" is co...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14489 **[Test build #63320 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63320/consoleFull)** for PR 14489 at commit [`022bb69`](https://github.com/apache/spark/commit/022bb69b4cc04564b8521e8129d57d3d59aa05c5). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14489: [MINOR][SparkR] R API documentation for "coltypes" is co...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14489 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63320/ Test PASSed.
[GitHub] spark issue #14489: [MINOR][SparkR] R API documentation for "coltypes" is co...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14489 Merged build finished. Test PASSed.