[GitHub] spark issue #17712: [SPARK-20416][SQL] Print UDF names in EXPLAIN
Github user maropu commented on the issue: https://github.com/apache/spark/pull/17712 Aha, good! Do we already have a related JIRA ticket for that? I'd like to leave this issue to it.
[GitHub] spark issue #17712: [SPARK-20416][SQL] Print UDF names in EXPLAIN
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/17712 Yes! My PR has not been submitted yet due to family issues. In addition to the name and the deterministic flag, we have two more Scala UDF properties based on the existing Hive UDF types. Instead of adding them one by one, we plan to use a Map.
[GitHub] spark issue #17732: Branch 2.0
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17732 @tangchun it looks like this was opened by mistake. Could you close it, please?
[GitHub] spark issue #17732: Branch 2.0
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17732 Can one of the admins verify this patch?
[GitHub] spark pull request #17732: Branch 2.0
GitHub user tangchun opened a pull request: https://github.com/apache/spark/pull/17732 Branch 2.0

## What changes were proposed in this pull request?
(Please fill in changes proposed in this fix)

## How was this patch tested?
(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests) (If this patch involves UI changes, please attach a screenshot; otherwise, remove this)

Please review http://spark.apache.org/contributing.html before opening a pull request. You can merge this pull request into a Git repository by running: $ git pull https://github.com/apache/spark branch-2.0 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17732.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17732

commit 0cdd7370a61618d042417ee387a3c32ee5c924e6
Author: Bjarne Fruergaard
Date: 2016-09-29T22:39:57Z

[SPARK-17721][MLLIB][ML] Fix for multiplying transposed SparseMatrix with SparseVector

## What changes were proposed in this pull request?
* changes the implementation of gemv with transposed SparseMatrix and SparseVector both in mllib-local and mllib (identical)
* adds a test that was failing before this change, but succeeds with these changes.

The problem in the previous implementation was that it only increments `i`, that is, enumerating the columns of a row in the SparseMatrix, when the row-index of the vector matches the column-index of the SparseMatrix. In cases where a particular row of the SparseMatrix has non-zero values at column-indices lower than corresponding non-zero row-indices of the SparseVector, the non-zero values of the SparseVector are enumerated without ever matching the column-index at index `i`, and the remaining column-indices i+1,...,indEnd-1 are never attempted. The test cases in this PR illustrate this issue.

## How was this patch tested?
I have run the specific `gemv` tests in both mllib-local and mllib. I am currently still running `./dev/run-tests`.

As per instructions, I hereby state that this is my original work and that I license the work to the project (Apache Spark) under the project's open source license. Mentioning dbtsai, viirya and brkyvz whom I can see have worked/authored on these parts before.

Author: Bjarne Fruergaard
Closes #15296 from bwahlgreen/bugfix-spark-17721.
(cherry picked from commit 29396e7d1483d027960b9a1bed47008775c4253e)
Signed-off-by: Joseph K. Bradley

commit a99ea4c9e0e2f91e4b524987788f0acee88e564d
Author: Bryan Cutler
Date: 2016-09-29T23:31:30Z

Updated the following PR with minor changes to allow cherry-pick to branch-2.0

[SPARK-17697][ML] Fixed bug in summary calculations that pattern match against label without casting

In calling LogisticRegression.evaluate and GeneralizedLinearRegression.evaluate using a Dataset where the Label is not of a double type, calculations pattern match against a double and throw a MatchError. This fix casts the Label column to a DoubleType to ensure there is no MatchError. Added unit tests to call evaluate with a dataset that has Label as other numeric types.

Author: Bryan Cutler
Closes #15288 from BryanCutler/binaryLOR-numericCheck-SPARK-17697.
(cherry picked from commit 2f739567080d804a942cfcca0e22f91ab7cbea36)
Signed-off-by: Joseph K. Bradley

commit 744aac8e6ff04d7a3f1e8ccad335605ac8fe2f29
Author: Dongjoon Hyun
Date: 2016-10-01T05:05:59Z

[MINOR][DOC] Add an up-to-date description for default serialization during shuffling

## What changes were proposed in this pull request?
This PR aims to make the doc up-to-date. The documentation is generally correct, but after https://issues.apache.org/jira/browse/SPARK-13926, Spark starts to choose Kryo as the default serialization library during shuffling of simple types, arrays of simple types, or string type.

## How was this patch tested?
This is a documentation update.

Author: Dongjoon Hyun
Closes #15315 from dongjoon-hyun/SPARK-DOC-SERIALIZER.
(cherry picked from commit 15e9bbb49e00b3982c428d39776725d0dea2cdfa)
Signed-off-by: Reynold Xin

commit b57e2acb134d94dafc81686da875c5dd3ea35c74
Author: Jagadeesan
Date: 2016-10-03T09:46:38Z

[SPARK-17736][DOCUMENTATION][SPARKR] Update R README for rmarkdown, …

## What changes were proposed in this pull request?
To build R docs (which are built when R tests are run), users need to install pandoc and rmarkdown. This was done for Jenkins in ~~[SPARK-17420](https://issues.apache.org/jira/browse/SPARK-17420)~~ … pandoc]

Author: Jagadeesan
Closes #15309 fr
[GitHub] spark issue #17730: [SPARK-20439] [SQL] Fix Catalog API listTables and getTa...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/17730 cc @cloud-fan @sameeragarwal
[GitHub] spark issue #17480: [SPARK-20079][Core][yarn] Re registration of AM hangs sp...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17480 **[Test build #76076 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76076/testReport)** for PR 17480 at commit [`17a7757`](https://github.com/apache/spark/commit/17a7757c3ba76f083fa198519580a2146cb6c8af).
[GitHub] spark pull request #17480: [SPARK-20079][Core][yarn] Re registration of AM h...
Github user witgo commented on a diff in the pull request: https://github.com/apache/spark/pull/17480#discussion_r112825043

--- Diff: core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala ---
@@ -249,7 +249,6 @@ private[spark] class ExecutorAllocationManager(
    * yarn-client mode when AM re-registers after a failure.
    */
   def reset(): Unit = synchronized {
-    initializing = true
--- End diff --

@jerryshao @vanzin I think that deleting the `initializing = true` is a good idea.
[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use weighted midpoints for split va...
Github user facaiy commented on the issue: https://github.com/apache/spark/pull/17556

Hi, I have checked R GBM's code and found that R's gbm uses the plain mean $(x + y) / 2$, not the weighted mean $(c_x * x + c_y * y) / (c_x + c_y)$ described in [JIRA SPARK-16957](https://issues.apache.org/jira/browse/SPARK-16957), for the split point.

1. code snippet: [gbm-developers/gbm](https://github.com/gbm-developers/gbm) commit a1defa382a629f8b97bf9f552dcd821ee7ac9dac, src/node_search.cpp:145:
```c++
else if(cCurrentVarClasses == 0)   // variable is continuous
{
    // Evaluate the current split
    dCurrentSplitValue = 0.5*(dLastXValue + dX);
}
```

2. test: To verify it, I created a toy dataset and ran a test in R.
```R
> f = c(0.0, 0.0, 1.0, 1.0, 1.0, 1.0)
> l = c(0, 0, 1, 1, 1, 1)
> df = data.frame(l, f)
> sapply(df, class)
        l         f
"numeric" "numeric"
> mod <- gbm(l~f, data=df, n.trees=1, bag.fraction=1, n.minobsinnode=1, distribution = "bernoulli")
> pretty.gbm.tree(mod)
  SplitVar SplitCodePred  LeftNode RightNode MissingNode ErrorReduction Weight    Prediction
0        0  5.00e-01             1         2           3           1.33      6  1.480297e-19
1       -1 -3.00e-03            -1        -1          -1           0.00      2     -3.00e-03
2       -1  1.50e-03            -1        -1          -1           0.00      4      1.50e-03
3       -1  1.480297e-19        -1        -1          -1           0.00      6  1.480297e-19
```
As expected, the root's split point is 5.00e-01, namely the plain mean `0.5 = (0 + 1) / 2`, not the weighted mean `0.67 ≈ (0 * 2 + 1 * 4) / 6`.

3. conclusion: I prefer using the weighted mean for the split point in this PR, rather than the plain mean used by R's gbm package. How about you? @sethah @srowen
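For a quick check of the two candidate split values in the toy example, here is a minimal sketch in plain R (the values and row counts are taken from the dataset above):

```R
x <- c(0, 1)          # distinct feature values adjacent to the split
n <- c(2, 4)          # number of rows carrying each value
mean(x)               # 0.5    : plain midpoint, what R's gbm uses
sum(n * x) / sum(n)   # ~0.667 : count-weighted mean proposed in SPARK-16957
```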
[GitHub] spark pull request #17649: [SPARK-20380][SQL] Output table comment for DESC ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17649#discussion_r112824647

--- Diff: sql/core/src/test/resources/sql-tests/results/describe-table-after-alter-table.sql.out ---
@@ -0,0 +1,162 @@
+-- Automatically generated by SQLQueryTestSuite
+-- Number of queries: 12
+
+
+-- !query 0
+CREATE TABLE table_with_comment (a STRING, b INT, c STRING, d STRING) USING parquet COMMENT 'table_comment'
+-- !query 0 schema
+struct<>
+-- !query 0 output
+
+
+-- !query 1
+DESC formatted table_with_comment
+-- !query 1 schema
+struct
+-- !query 1 output
+# col_name     data_type    comment
+a              string
+b              int
+c              string
+d              string
+
+# Detailed Table Information
+Database       default
+Table          table_with_comment
+Created        [not included in comparison]
+Last Access    [not included in comparison]
+Type           MANAGED
+Provider       parquet
+Comment        table_comment
+Location       [not included in comparison]sql/core/spark-warehouse/table_with_comment
+
+
+-- !query 2
+ALTER TABLE table_with_comment set tblproperties(comment = "modified comment")
+-- !query 2 schema
+struct<>
+-- !query 2 output
+
+
+-- !query 3
+DESC formatted table_with_comment
+-- !query 3 schema
+struct
+-- !query 3 output
+# col_name     data_type    comment
+a              string
+b              int
+c              string
+d              string
+
+# Detailed Table Information
+Database       default
+Table          table_with_comment
+Created        [not included in comparison]
+Last Access    [not included in comparison]
+Type           MANAGED
+Provider       parquet
+Comment        modified comment
+Properties     [comment=modified comment]
--- End diff --

We should remove `comment` from `Properties`.
[GitHub] spark issue #17649: [SPARK-20380][SQL] Output table comment for DESC FORMATT...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/17649 @wzhfy Could you check the behavior of Hive?
[GitHub] spark pull request #17649: [SPARK-20380][SQL] Output table comment for DESC ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17649#discussion_r112824524

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -267,8 +271,15 @@ case class AlterTableUnsetPropertiesCommand(
       }
     }
+    // if 'comment' key is present in the seq of keys which needs to be unset then reset the table
+    // level comment with none.
+    val tableComment = if (propKeys.contains("comment")) {
+      None
+    } else {
+      table.properties.get("comment")
+    }
--- End diff --

Nit:
```Scala
val comment = if (propKeys.contains("comment")) None else table.properties.get("comment")
```
[GitHub] spark issue #17708: [SPARK-20413] Add new query hint NO_COLLAPSE.
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/17708 It sounds like we should not simply merge two Projects, in order to avoid calling the same UDF multiple times.
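To make the collapse problem concrete, here is a minimal SparkR sketch (not from the PR; plain column arithmetic stands in for an expensive UDF):

```R
df  <- createDataFrame(data.frame(x = 1:3))
df2 <- withColumn(df, "y", df$x + 1)        # stand-in for expensiveUDF(df$x)
df3 <- select(df2, df2$y * 2, df2$y + 10)   # two references to the derived column

# CollapseProject merges the two Projects into one whose outputs are
# ((x + 1) * 2) and ((x + 1) + 10), so the inner expression (a UDF in the
# real case) would be evaluated once per reference.
explain(df3, extended = TRUE)
```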
[GitHub] spark pull request #17469: [SPARK-20132][Docs] Add documentation for column ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/17469
[GitHub] spark issue #17469: [SPARK-20132][Docs] Add documentation for column string ...
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/17469 LGTM, thanks for your work on this @map222 & thanks for your work reviewing this @HyukjinKwon. Merged to master.
[GitHub] spark issue #17688: [MINOR][DOCS][PYTHON] Adding missing boolean type for re...
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/17688 LGTM, thanks @HyukjinKwon for noticing the lack of bool in the scala code.
[GitHub] spark pull request #17688: [MINOR][DOCS][PYTHON] Adding missing boolean type...
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/17688#discussion_r112823550

--- Diff: python/pyspark/sql/dataframe.py ---
@@ -1238,7 +1238,7 @@ def fillna(self, value, subset=None):
         Value to replace null values with. If the value is a dict, then `subset` is ignored
         and `value` must be a mapping from column name (string) to replacement value.
         The replacement value must be
-        an int, long, float, or string.
+        an int, long, float, boolean, or string.
--- End diff --

That makes sense. I'd say the eventual improvement would be offering `fill` for bool for symmetry with the rest of the types, rather than type checking for bool on the input, but it's not necessary here.
[GitHub] spark pull request #17688: [MINOR][DOCS][PYTHON] Adding missing boolean type...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17688#discussion_r112823211

--- Diff: python/pyspark/sql/dataframe.py ---
@@ -1238,7 +1238,7 @@ def fillna(self, value, subset=None):
         Value to replace null values with. If the value is a dict, then `subset` is ignored
         and `value` must be a mapping from column name (string) to replacement value.
         The replacement value must be
-        an int, long, float, or string.
+        an int, long, float, boolean, or string.
--- End diff --

I think this sentence describes the replacement values in the `If the value is a dict` case, whereas the `value` parameter itself can't be a bool, as below:

```python
>>> from pyspark.sql import Row
>>> spark.createDataFrame([Row(a=None), Row(a=True)]).fillna({"a": True}).first()
Row(a=True)
>>> spark.createDataFrame([Row(a=None), Row(a=True)]).fillna(True).first()
Row(a=None)
```

I can't find `def fill(value: Boolean)` in `functions.scala`. Namely, a bare boolean is dispatched as an `int`. So:

```python
>>> spark.createDataFrame([Row(a=None), Row(a=0)]).fillna(True).first()
Row(a=1)
>>> spark.createDataFrame([Row(a=None), Row(a=0)]).fillna(False).first()
Row(a=0)
```

So, the current status looks correct to me. BTW, ideally, we should throw an exception in

```python
if not isinstance(value, (float, int, long, basestring, dict)):
    raise ValueError("value should be a float, int, long, string, or dict")
```

However, in Python a boolean is an int (https://www.python.org/dev/peps/pep-0285/):

> 6) Should bool inherit from int?
>
> => Yes.

```python
>>> isinstance(True, int)
True
```

However, this is just a documentation fix, and I guess there are many instances like it. I think it is fine not to fix that here.
[GitHub] spark issue #17731: [SPARK-20440][SparkR] Allow SparkR session and context t...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/17731

so essentially the first `get` is still being evaluated when the 2nd `get` is hit from the delayed binding (R raises the error as a way to prevent going into an infinite loop, really). what if you have this instead, to break the loop?

```
delayedAssign(".sparkRsession", {
  rm(".sparkRsession", envir = SparkR:::.sparkREnv)
  sparkR.session(..)
}, assign.env = SparkR:::.sparkREnv)
```
[GitHub] spark issue #17731: [SPARK-20440][SparkR] Allow SparkR session and context t...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/17731 so both `sparkSession` and `sparkRjsc` are valid even after the call to `get` failed?
[GitHub] spark issue #17731: [SPARK-20440][SparkR] Allow SparkR session and context t...
Github user vijoshi commented on the issue: https://github.com/apache/spark/pull/17731

> I understand these 2 cases, can you explain how your change connects to these two?

Say I do this:

```
delayedAssign(".sparkRsession", {
  sparkR.session(..)
}, assign.env = SparkR:::.sparkREnv)
```

Now, when user code such as this runs:

```
a <- createDataFrame(iris)
```

this sequence occurs:

```
createDataFrame()
> getSparkSession()
> get(".sparkRsession", envir = .sparkREnv)
> delayed evaluation of sparkR.session(...)
> if (exists(".sparkRsession", envir = .sparkREnv))
>   sparkSession <- get(".sparkRsession", envir = .sparkREnv)  # error occurs here
> Error "Promise already under evaluation"
```

The change is to ignore the "promise already under evaluation" error. At the line where the error occurs, there doesn't seem to be any other possible cause of failure, since the previous line of code has already checked that `.sparkRsession` exists in the environment. So if we take it that this happens only when `.sparkRsession` is bound lazily and ignore it (which is what my change does), the code proceeds with the regular computation of the sparkSession.

Similar is the case with `.sparkRjsc`. The SparkR code inside `spark.sparkContext(..)` does this:

```
if (exists(".sparkRjsc", envir = .sparkREnv)) {
  sparkRjsc <- get(".sparkRjsc", envir = .sparkREnv)  # "promise already under evaluation" error occurs here
}
```

When `.sparkRjsc` is lazily bound, the `exists(..)` condition succeeds, and the "promise already under evaluation" error occurs. If the error is ignored, on the grounds that there can't be any other cause of failure, the lazy initialization works.
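The failure mode above can be reproduced without Spark; a minimal sketch in plain R, where the environment `e` and the symbol name stand in for `SparkR:::.sparkREnv` and `.sparkRsession`:

```R
e <- new.env()

# Bind the symbol lazily; the promise body reads the same symbol again,
# just as getSparkSession() calls get(".sparkRsession", ...) while the
# delayed sparkR.session(...) is still running.
delayedAssign("session", {
  get("session", envir = e)
}, assign.env = e)

tryCatch(
  get("session", envir = e),   # forcing the promise re-enters it
  error = function(err) conditionMessage(err)
)
# [1] "promise already under evaluation: recursive default argument
#      reference or earlier problems?"
```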
[GitHub] spark issue #17469: [SPARK-20132][Docs] Add documentation for column string ...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/17469 @holdenk
[GitHub] spark issue #17469: [SPARK-20132][Docs] Add documentation for column string ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17469 LGTM, if committers are okay with merging a fix for some of the documentation (not all), considering this is his very first contribution.
[GitHub] spark issue #17731: [SPARK-20440][SparkR] Allow SparkR session and context t...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/17731 I understand these 2 cases, can you explain how your change connects to these two? if you delay-bind to `".sparkRjsc", envir = .sparkREnv`, doesn't it just work?
[GitHub] spark issue #17469: [SPARK-20132][Docs] Add documentation for column string ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17469 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76075/ Test PASSed.
[GitHub] spark issue #17469: [SPARK-20132][Docs] Add documentation for column string ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17469 **[Test build #76075 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76075/testReport)** for PR 17469 at commit [`b52765f`](https://github.com/apache/spark/commit/b52765f5ef156862bd3cc4793a0d3fbd4d334449).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #17469: [SPARK-20132][Docs] Add documentation for column string ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17469 Merged build finished. Test PASSed.
[GitHub] spark issue #17731: [SPARK-20440][SparkR] Allow SparkR session and context t...
Github user vijoshi commented on the issue: https://github.com/apache/spark/pull/17731

@felixcheung yes. We need to support these two possibilities:

```
# do not call sparkR.session() - followed by an implicit reference to the sparkSession
a <- createDataFrame(iris)
```

or

```
# do not call sparkR.session() - followed by an implicit reference to the sparkContext
doubled <- spark.lapply(1:10, function(x) { 2 * x })
```

Internal implementations of APIs like `spark.lapply` look for the sparkContext directly, so to account for these, the sparkContext also needs to be friendly to being lazily initialized.
[GitHub] spark issue #17731: [SPARK-20440][SparkR] Allow SparkR session and context t...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/17731 also, what if a user wants to explicitly create a spark session with specific parameters? the delayed-binding model doesn't seem to support that properly?
[GitHub] spark pull request #17728: [SPARK-20437][R] R wrappers for rollup and cube
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/17728#discussion_r112822291

--- Diff: R/pkg/vignettes/sparkr-vignettes.Rmd ---
@@ -308,6 +308,21 @@ numCyl <- summarize(groupBy(carsDF, carsDF$cyl), count = n(carsDF$cyl))
 head(numCyl)
 ```
+`groupBy` can be replaced with `cube` or `rollup` to compute subtotals across multiple dimensions.
--- End diff --

do you think the programming guide could use updates too?
[GitHub] spark pull request #17728: [SPARK-20437][R] R wrappers for rollup and cube
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/17728#discussion_r112821786

--- Diff: R/pkg/R/DataFrame.R ---
@@ -3642,3 +3642,58 @@ setMethod("checkpoint",
             df <- callJMethod(x@sdf, "checkpoint", as.logical(eager))
             dataFrame(df)
           })
+
+
+#' cube
+#'
+#' Create a multi-dimensional cube for the SparkDataFrame using the specified columns.
+#'
+#' @param x a SparkDataFrame.
+#' @param ... variable(s) (character names(s) or Column(s)) to group on.
+#' @return A GroupedData.
+#' @family SparkDataFrame functions
+#' @aliases cube,SparkDataFrame-method
+#' @rdname cube
+#' @name cube
+#' @export
+#' @examples
+#' \dontrun{
+#' df <- createDataFrame(mtcars)
+#' mean(cube(df, "cyl", "gear", "am"), "mpg")
+#' }
+#' @note cube since 2.3.0
+setMethod("cube",
+          signature(x = "SparkDataFrame"),
+          function(x, ...) {
+            cols <- list(...)
+            jcol <- lapply(cols, function(x) if (is.character(x)) column(x)@jc else x@jc)
+            sgd <- callJMethod(x@sdf, "cube", jcol)
+            groupedData(sgd)
+          })
+
+#' rollup
+#'
+#' Create a multi-dimensional rollup for the SparkDataFrame using the specified columns.
+#'
+#' @param x a SparkDataFrame.
+#' @param ... variable(s) (character names(s) or Column(s)) to group on.
+#' @return A GroupedData.
+#' @family SparkDataFrame functions
+#' @aliases rollup,SparkDataFrame-method
+#' @rdname rollup
+#' @name rollup
+#' @export
+#' @examples
+#' \dontrun{
+#' df <- createDataFrame(mtcars)
+#' mean(rollup(df, "cyl", "gear", "am"), "mpg")
+#' }
+#' @note rollup since 2.3.0
+setMethod("rollup",
+          signature(x = "SparkDataFrame"),
+          function(x, ...) {
+            cols <- list(...)
+            jcol <- lapply(cols, function(x) if (is.character(x)) column(x)@jc else x@jc)
+            sgd <- callJMethod(x@sdf, "rollup", jcol)
+            groupedData(sgd)
+          })
--- End diff --

please add an extra newline at the end of the file
[GitHub] spark pull request #17728: [SPARK-20437][R] R wrappers for rollup and cube
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/17728#discussion_r112821792

--- Diff: R/pkg/R/DataFrame.R ---
+#' rollup
+#'
+#' Create a multi-dimensional rollup for the SparkDataFrame using the specified columns.
+#'
+#' @param x a SparkDataFrame.
+#' @param ... variable(s) (character names(s) or Column(s)) to group on.
--- End diff --

`names(s)` -> `name(s)`
[GitHub] spark pull request #17728: [SPARK-20437][R] R wrappers for rollup and cube
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/17728#discussion_r112822250

--- Diff: R/pkg/R/DataFrame.R ---
+setMethod("rollup",
+          signature(x = "SparkDataFrame"),
+          function(x, ...) {
+            cols <- list(...)
--- End diff --

check length of cols
[GitHub] spark pull request #17728: [SPARK-20437][R] R wrappers for rollup and cube
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/17728#discussion_r112822273

--- Diff: R/pkg/R/DataFrame.R ---
+            cols <- list(...)
+            jcol <- lapply(cols, function(x) if (is.character(x)) column(x)@jc else x@jc)
--- End diff --

ditto `if (class(x) == "Column") x@jc else column(x)@jc`
[GitHub] spark pull request #17728: [SPARK-20437][R] R wrappers for rollup and cube
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/17728#discussion_r112822277

--- Diff: R/pkg/R/generics.R ---
@@ -631,6 +635,11 @@ setGeneric("sample",
            standardGeneric("sample")
          })
+
+#' @rdname rollup
+#' @export
+setGeneric("rollup",
+           function(x, ...) { standardGeneric("rollup") })
--- End diff --

could you keep this in one line please
[GitHub] spark pull request #17728: [SPARK-20437][R] R wrappers for rollup and cube
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/17728#discussion_r112821835

--- Diff: R/pkg/R/DataFrame.R ---
+#' cube
+#'
+#' Create a multi-dimensional cube for the SparkDataFrame using the specified columns.
+#'
+#' @param x a SparkDataFrame.
+#' @param ... variable(s) (character names(s) or Column(s)) to group on.
--- End diff --

ditto below
[GitHub] spark pull request #17728: [SPARK-20437][R] R wrappers for rollup and cube
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/17728#discussion_r112821831

--- Diff: R/pkg/R/DataFrame.R ---
+#' rollup
+#'
+#' Create a multi-dimensional rollup for the SparkDataFrame using the specified columns.
+#'
+#' @param x a SparkDataFrame.
+#' @param ... variable(s) (character names(s) or Column(s)) to group on.
--- End diff --

perhaps `variable(s)` is misleading and just `character name(s) or Column(s) to group on.` is sufficient?
[GitHub] spark pull request #17728: [SPARK-20437][R] R wrappers for rollup and cube
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/17728#discussion_r112822286

--- Diff: R/pkg/vignettes/sparkr-vignettes.Rmd ---
@@ -308,6 +308,21 @@ numCyl <- summarize(groupBy(carsDF, carsDF$cyl), count = n(carsDF$cyl))
 head(numCyl)
 ```
+`groupBy` can be replaced with `cube` or `rollup` to compute subtotals across multiple dimensions.
--- End diff --

minor: I wouldn't say "replace" because they are not functionally the same? how about `use cube or rollup to compute subtotals across multiple dimensions.`
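For reference, a sketch of what the vignette addition might look like, using only calls that appear in the PR's own roxygen examples (`mean` applied to the grouped result with a column name):

```R
carsDF <- createDataFrame(mtcars)

# cube: mean mpg for every combination of cyl and gear, including
# per-cyl, per-gear, and grand-total rows
head(mean(cube(carsDF, "cyl", "gear"), "mpg"))

# rollup: hierarchical subtotals only, i.e. (cyl, gear), (cyl), and ()
head(mean(rollup(carsDF, "cyl", "gear"), "mpg"))
```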
[GitHub] spark pull request #17728: [SPARK-20437][R] R wrappers for rollup and cube
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/17728#discussion_r112822246

--- Diff: R/pkg/R/DataFrame.R ---
+setMethod("cube",
+          signature(x = "SparkDataFrame"),
+          function(x, ...) {
+            cols <- list(...)
--- End diff --

check length of cols is > 0?
[GitHub] spark pull request #17728: [SPARK-20437][R] R wrappers for rollup and cube
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/17728#discussion_r112822261

--- Diff: R/pkg/R/DataFrame.R ---
+            cols <- list(...)
+            jcol <- lapply(cols, function(x) if (is.character(x)) column(x)@jc else x@jc)
--- End diff --

nit: I'd flip this since Column is a stronger type, and also this way there is a nicer error message. instead of `if (is.character(x)) column(x)@jc else x@jc` do `if (class(x) == "Column") x@jc else column(x)@jc`
[GitHub] spark issue #17730: [SPARK-20439] [SQL] Fix Catalog API listTables and getTa...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17730 Merged build finished. Test PASSed.
[GitHub] spark issue #17730: [SPARK-20439] [SQL] Fix Catalog API listTables and getTa...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17730 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76072/ Test PASSed.
[GitHub] spark issue #17730: [SPARK-20439] [SQL] Fix Catalog API listTables and getTa...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17730 **[Test build #76072 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76072/testReport)** for PR 17730 at commit [`1a5e24d`](https://github.com/apache/spark/commit/1a5e24dc5d6d538e975200b4eb95583db36d5f9f).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #17688: [MINOR][DOCS][PYTHON] Adding missing boolean type for re...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/17688 good catch - instead of duplicating it, perhaps just say `supported data types` or `supported data types above`
[GitHub] spark issue #17469: [SPARK-20132][Docs] Add documentation for column string ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17469 **[Test build #76075 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76075/testReport)** for PR 17469 at commit [`b52765f`](https://github.com/apache/spark/commit/b52765f5ef156862bd3cc4793a0d3fbd4d334449).
[GitHub] spark issue #17469: [SPARK-20132][Docs] Add documentation for column string ...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/17469 I don't know why Jenkins doesn't pick up the changes automatically...
[GitHub] spark issue #17469: [SPARK-20132][Docs] Add documentation for column string ...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/17469 Jenkins, retest this please
[GitHub] spark pull request #17729: [SPARK-20438][R] SparkR wrappers for split and re...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/17729#discussion_r112822059

--- Diff: R/pkg/inst/tests/testthat/test_sparkSQL.R ---
@@ -1546,6 +1546,40 @@ test_that("string operators", {
   expect_equal(collect(select(df3, substring_index(df3$a, ".", 2)))[1, 1], "a.b")
   expect_equal(collect(select(df3, substring_index(df3$a, ".", -3)))[1, 1], "b.c.d")
   expect_equal(collect(select(df3, translate(df3$a, "bc", "12")))[1, 1], "a.1.2.d")
+
+  l4 <- list(list(a = "a.b@c.d 1\\b"))
+  df4 <- createDataFrame(l4)
+  expect_equal(
+    collect(select(df4, split_string(df4$a, "\\s+")))[1, 1],
+    list(list("a.b@c.d", "1\\b"))
+  )
+  expect_equal(
+    collect(select(df4, split_string(df4$a, "\\.")))[1, 1],
+    list(list("a", "b@c", "d 1\\b"))
+  )
+  expect_equal(
+    collect(select(df4, split_string(df4$a, "@")))[1, 1],
+    list(list("a.b", "c.d 1\\b"))
+  )
+  expect_equal(
+    collect(select(df4, split_string(df4$a, "")))[1, 1],
+    list(list("a.b@c.d 1", "b"))
+  )
+
+  l5 <- list(list(a = "abc"))
+  df5 <- createDataFrame(l5)
+  expect_equal(
+    collect(select(df5, repeat_string(df5$a, 1L)))[1, 1],
+    "abc"
+  )
+  expect_equal(
+    collect(select(df5, repeat_string(df5$a, 3)))[1, 1],
+    "abcabcabc"
+  )
+  expect_equal(
+    collect(select(df5, repeat_string(df5$a, -1)))[1, 1],
--- End diff --

:) ahh, `-1` works?!
[GitHub] spark pull request #17729: [SPARK-20438][R] SparkR wrappers for split and re...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/17729#discussion_r112821860 --- Diff: R/pkg/NAMESPACE --- @@ -300,6 +300,7 @@ exportMethods("%in%", "rank", "regexp_extract", "regexp_replace", + "repeat_string", --- End diff -- good call on these names!
[GitHub] spark pull request #17729: [SPARK-20438][R] SparkR wrappers for split and re...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/17729#discussion_r112822015 --- Diff: R/pkg/R/functions.R --- @@ -3745,3 +3745,55 @@ setMethod("collect_set", jc <- callJStatic("org.apache.spark.sql.functions", "collect_set", x@jc) column(jc) }) + +#' split_string +#' +#' Splits string on regular expression. +#' +#' @param x Column to compute on +#' @param pattern Java regular expression +#' +#' @rdname split_string +#' @family string_funcs +#' @aliases split_string,Column-method +#' @export +#' @examples \dontrun{ +#' df <- read.text("README.md") +#' +#' head(select(df, split_string(df$value, "\\s+"))) +#' } +#' @note split_string 2.3.0 +#' @note equivalent to \code{split} SQL function +setMethod("split_string", + signature(x = "Column", pattern = "character"), + function(x, pattern) { +jc <- callJStatic("org.apache.spark.sql.functions", "split", x@jc, pattern) +column(jc) + }) + +#' repeat_string +#' +#' Repeats string n times. +#' +#' @param x Column to compute on +#' @param n Number of repetitions +#' +#' @rdname repeat_string +#' @family string_funcs +#' @aliases repeat_string,Column-method +#' @export +#' @examples \dontrun{ +#' df <- createDataFrame(data.frame( +#' text = c("foo", "bar") +#' )) +#' +#' head(select(df, repeat_string(df$text, 3))) +#' } +#' @note repeat_string 2.3.0 +#' @note equivalent to \code{repeat} SQL function --- End diff -- ditto above
[GitHub] spark pull request #17729: [SPARK-20438][R] SparkR wrappers for split and re...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/17729#discussion_r112822000 --- Diff: R/pkg/R/functions.R --- @@ -3745,3 +3745,55 @@ setMethod("collect_set", jc <- callJStatic("org.apache.spark.sql.functions", "collect_set", x@jc) column(jc) }) + +#' split_string +#' +#' Splits string on regular expression. +#' +#' @param x Column to compute on +#' @param pattern Java regular expression +#' +#' @rdname split_string +#' @family string_funcs +#' @aliases split_string,Column-method +#' @export +#' @examples \dontrun{ +#' df <- read.text("README.md") +#' +#' head(select(df, split_string(df$value, "\\s+"))) +#' } +#' @note split_string 2.3.0 +#' @note equivalent to \code{split} SQL function --- End diff -- The note section is somewhat hard to discover on the generated doc page; if you want this, you could put it as the 2nd content paragraph like below, and it will show up as the details section, as here: http://spark.apache.org/docs/latest/api/R/read.jdbc.html ``` #' split_string #' #' Splits string on regular expression. #' #' This is equivalent to the \code{split} SQL function ``` (yes, through the magic of roxygen2) Also, instead of `\code{split}` you might want to link to the Spark Scala doc too
[GitHub] spark pull request #17729: [SPARK-20438][R] SparkR wrappers for split and re...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/17729#discussion_r112822029 --- Diff: R/pkg/R/functions.R --- @@ -3745,3 +3745,55 @@ setMethod("collect_set", jc <- callJStatic("org.apache.spark.sql.functions", "collect_set", x@jc) column(jc) }) + +#' split_string +#' +#' Splits string on regular expression. +#' +#' @param x Column to compute on +#' @param pattern Java regular expression +#' +#' @rdname split_string +#' @family string_funcs +#' @aliases split_string,Column-method +#' @export +#' @examples \dontrun{ +#' df <- read.text("README.md") +#' +#' head(select(df, split_string(df$value, "\\s+"))) +#' } +#' @note split_string 2.3.0 +#' @note equivalent to \code{split} SQL function +setMethod("split_string", + signature(x = "Column", pattern = "character"), + function(x, pattern) { +jc <- callJStatic("org.apache.spark.sql.functions", "split", x@jc, pattern) +column(jc) + }) + +#' repeat_string +#' +#' Repeats string n times. +#' +#' @param x Column to compute on +#' @param n Number of repetitions +#' +#' @rdname repeat_string +#' @family string_funcs +#' @aliases repeat_string,Column-method +#' @export +#' @examples \dontrun{ +#' df <- createDataFrame(data.frame( +#' text = c("foo", "bar") +#' )) --- End diff -- I'm OK with this, though would it be better with the read.text example than a fake inline frame like this?
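A sketch of what the suggested revision might look like, reusing the `read.text` example from `split_string` (hypothetical wording, not the author's final version):

```r
#' @examples \dontrun{
#' df <- read.text("README.md")
#'
#' head(select(df, repeat_string(df$value, 3)))
#' }
```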
[GitHub] spark pull request #17729: [SPARK-20438][R] SparkR wrappers for split and re...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/17729#discussion_r112822065 --- Diff: R/pkg/R/functions.R --- @@ -3745,3 +3745,55 @@ setMethod("collect_set", jc <- callJStatic("org.apache.spark.sql.functions", "collect_set", x@jc) column(jc) }) + +#' split_string +#' +#' Splits string on regular expression. +#' +#' @param x Column to compute on +#' @param pattern Java regular expression +#' +#' @rdname split_string +#' @family string_funcs +#' @aliases split_string,Column-method +#' @export +#' @examples \dontrun{ +#' df <- read.text("README.md") +#' +#' head(select(df, split_string(df$value, "\\s+"))) +#' } +#' @note split_string 2.3.0 +#' @note equivalent to \code{split} SQL function +setMethod("split_string", + signature(x = "Column", pattern = "character"), + function(x, pattern) { +jc <- callJStatic("org.apache.spark.sql.functions", "split", x@jc, pattern) +column(jc) + }) + +#' repeat_string +#' +#' Repeats string n times. +#' +#' @param x Column to compute on +#' @param n Number of repetitions +#' +#' @rdname repeat_string +#' @family string_funcs +#' @aliases repeat_string,Column-method +#' @export +#' @examples \dontrun{ +#' df <- createDataFrame(data.frame( +#' text = c("foo", "bar") +#' )) +#' +#' head(select(df, repeat_string(df$text, 3))) +#' } +#' @note repeat_string 2.3.0 +#' @note equivalent to \code{repeat} SQL function +setMethod("repeat_string", + signature(x = "Column", n = "numeric"), + function(x, n) { +jc <- callJStatic("org.apache.spark.sql.functions", "repeat", x@jc, as.integer(n)) --- End diff -- this is good, actually; may I introduce you to `numToInt`, an internal util?
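For illustration, a hypothetical revision of the method body using the `numToInt` helper mentioned in the comment. `numToInt` is an internal SparkR utility; its exact contract (coercing a numeric to an integer, warning on truncation) is assumed here:

```r
setMethod("repeat_string",
          signature(x = "Column", n = "numeric"),
          function(x, n) {
            # numToInt replaces the bare as.integer(n) coercion
            jc <- callJStatic("org.apache.spark.sql.functions", "repeat",
                              x@jc, numToInt(n))
            column(jc)
          })
```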
[GitHub] spark issue #17731: [SPARK-20440][SparkR] Allow SparkR session and context t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17731 Merged build finished. Test PASSed.
[GitHub] spark issue #17731: [SPARK-20440][SparkR] Allow SparkR session and context t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17731 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76074/ Test PASSed.
[GitHub] spark issue #17731: [SPARK-20440][SparkR] Allow SparkR session and context t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17731 **[Test build #76074 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76074/testReport)** for PR 17731 at commit [`c06da49`](https://github.com/apache/spark/commit/c06da49214f3591602cdc3220ac606a6adb24ac8). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17467: [SPARK-20140][DStream] Remove hardcoded kinesis retry wa...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/17467 @brkyvz are you ok with this PR at a high level? If yes, I could help with review and shepherd this
[GitHub] spark issue #17731: [SPARK-20440][SparkR] Allow SparkR session and context t...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/17731 this **might** be reasonable, but `sparkR.sparkContext` is only called when `sparkR.session()` is called, so I'm not sure I follow how this would work if someone does the following in a brand new R session: ``` # do not call sparkR.session() a <- createDataFrame(iris) ``` ...which is what I understand the email exchange on user@ to be about. Could you elaborate on whether this is what you are trying to support?
[GitHub] spark issue #17731: [SPARK-20440][SparkR] Allow SparkR session and context t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17731 **[Test build #76074 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76074/testReport)** for PR 17731 at commit [`c06da49`](https://github.com/apache/spark/commit/c06da49214f3591602cdc3220ac606a6adb24ac8).
[GitHub] spark pull request #15125: [SPARK-5484][GraphX] Periodically do checkpoint i...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/15125#discussion_r112821540 --- Diff: docs/graphx-programming-guide.md --- @@ -708,9 +708,8 @@ messages remaining. > messaging function. These constraints allow additional optimization within GraphX. The following is the type signature of the [Pregel operator][GraphOps.pregel] as well as a *sketch* -of its implementation (note: to avoid stackOverflowError due to long lineage chains, graph and -messages are periodically checkpoint and the checkpoint interval is set by -"spark.graphx.pregel.checkpointInterval", it can be disable by set as -1): +of its implementation (note: to avoid stackOverflowError due to long lineage chains, pregel support periodcally +checkpoint graph and messages by setting "spark.graphx.pregel.checkpointInterval"): --- End diff -- I think we can recommend a good value to set (say 10, which was the earlier default), since it now defaults to off. Also, good point about the checkpoint dir: it would be good to mention that it is required to be set as well, and to link to any doc we have on that.
[GitHub] spark issue #17731: [SPARK-20440][SparkR] Allow SparkR session and context t...
Github user vijoshi commented on the issue: https://github.com/apache/spark/pull/17731 @felixcheung
[GitHub] spark issue #17731: [SPARK-20440][SparkR] Allow SparkR session and context t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17731 Merged build finished. Test FAILed.
[GitHub] spark issue #17731: [SPARK-20440][SparkR] Allow SparkR session and context t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17731 **[Test build #76073 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76073/testReport)** for PR 17731 at commit [`4423f5c`](https://github.com/apache/spark/commit/4423f5cb6f47d01d00064fc4886f0fa0eec2e9ed). * This patch **fails R style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17731: [SPARK-20440][SparkR] Allow SparkR session and context t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17731 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76073/ Test FAILed.
[GitHub] spark issue #17731: [SPARK-20440][SparkR] Allow SparkR session and context t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17731 **[Test build #76073 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76073/testReport)** for PR 17731 at commit [`4423f5c`](https://github.com/apache/spark/commit/4423f5cb6f47d01d00064fc4886f0fa0eec2e9ed).
[GitHub] spark pull request #17731: [SPARK-20440][SparkR] Allow SparkR session and co...
GitHub user vijoshi opened a pull request: https://github.com/apache/spark/pull/17731 [SPARK-20440][SparkR] Allow SparkR session and context to have delayed bindings ## What changes were proposed in this pull request? Allow SparkR to ignore the "promise already under evaluation" error in case the user has created a delayed binding for the `.sparkRsession / .sparkRjsc` names in the `SparkR:::.sparkREnv`. ## How was this patch tested? Ran all unit tests - run-tests.sh You can merge this pull request into a Git repository by running: $ git pull https://github.com/vijoshi/spark lazysparkr_master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17731.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17731 commit 4423f5cb6f47d01d00064fc4886f0fa0eec2e9ed Author: Vinayak Date: 2017-04-21T15:24:13Z Allow SparkR session and context to have delayed/active binding
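To make the scenario concrete, here is a sketch of the kind of host-environment setup this PR aims to tolerate. It is a hypothetical example built on base R's `delayedAssign`; the exact hook an embedding environment (e.g., a notebook kernel) would use is assumed:

```r
# Create a delayed binding so the Spark session is constructed only the
# first time SparkR reads .sparkRsession from its internal environment.
delayedAssign(
  ".sparkRsession",
  SparkR::sparkR.session(master = "local[*]"),
  assign.env = SparkR:::.sparkREnv
)

# A later SparkR call forces the promise, creating the session lazily:
df <- SparkR::createDataFrame(iris)
```

Without the fix, forcing the promise from inside SparkR code can surface the "promise already under evaluation" error the PR description mentions.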
[GitHub] spark issue #17728: [SPARK-20437][R] R wrappers for rollup and cube
Github user zero323 commented on the issue: https://github.com/apache/spark/pull/17728 cc @felixcheung
[GitHub] spark issue #17729: [SPARK-20438][R] SparkR wrappers for split and repeat
Github user zero323 commented on the issue: https://github.com/apache/spark/pull/17729 cc @felixcheung
[GitHub] spark issue #17729: [SPARK-20438][R] SparkR wrappers for split and repeat
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17729 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76071/ Test PASSed.
[GitHub] spark issue #17729: [SPARK-20438][R] SparkR wrappers for split and repeat
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17729 Merged build finished. Test PASSed.
[GitHub] spark issue #17729: [SPARK-20438][R] SparkR wrappers for split and repeat
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17729 **[Test build #76071 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76071/testReport)** for PR 17729 at commit [`255863a`](https://github.com/apache/spark/commit/255863acfbe6f91ae533c7fee6b190a350b2f880). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17730: [SPARK-20439] [SQL] Fix Catalog API listTables and getTa...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17730 **[Test build #76072 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76072/testReport)** for PR 17730 at commit [`1a5e24d`](https://github.com/apache/spark/commit/1a5e24dc5d6d538e975200b4eb95583db36d5f9f).
[GitHub] spark pull request #17730: [SPARK-20439] [SQL] Fix Catalog API listTables an...
GitHub user gatorsmile opened a pull request: https://github.com/apache/spark/pull/17730 [SPARK-20439] [SQL] Fix Catalog API listTables and getTable when failed to fetch table metadata ### What changes were proposed in this pull request? `spark.catalog.listTables` and `spark.catalog.getTable` do not work if we are unable to retrieve table metadata for any reason (e.g., the table serde class is not accessible, or the table type is not accepted by Spark SQL). After this PR, the APIs still return the corresponding Table, just without the description and tableType. ### How was this patch tested? Added a test case You can merge this pull request into a Git repository by running: $ git pull https://github.com/gatorsmile/spark listTables Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17730.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17730 commit fded331a6ddc002e0476056cae0ccee095fc75e5 Author: Xiao Li Date: 2017-04-22T22:21:13Z fix. commit ee2df36d580ed729a38a01b4cd81a41639af6143 Author: Xiao Li Date: 2017-04-22T22:30:15Z clean test case commit 1a5e24dc5d6d538e975200b4eb95583db36d5f9f Author: Xiao Li Date: 2017-04-22T22:31:16Z clean test case
[GitHub] spark issue #17729: [SPARK-20438][R] SparkR wrappers for split and repeat
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17729 **[Test build #76071 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76071/testReport)** for PR 17729 at commit [`255863a`](https://github.com/apache/spark/commit/255863acfbe6f91ae533c7fee6b190a350b2f880).
[GitHub] spark pull request #17729: [SPARK-20438][R] SparkR wrappers for split and re...
GitHub user zero323 opened a pull request: https://github.com/apache/spark/pull/17729 [SPARK-20438][R] SparkR wrappers for split and repeat ## What changes were proposed in this pull request? Add wrappers for `o.a.s.sql.functions`: - `split` as `split_string` - `repeat` as `repeat_string` ## How was this patch tested? Existing tests, additional unit tests, `check-cran.sh` You can merge this pull request into a Git repository by running: $ git pull https://github.com/zero323/spark SPARK-20438 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17729.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17729 commit 255863acfbe6f91ae533c7fee6b190a350b2f880 Author: zero323 Date: 2017-04-22T22:01:22Z Add split_string and repeat_string
[GitHub] spark pull request #17713: [SPARK-20417][SQL] Move subquery error handling t...
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/17713#discussion_r112819365 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala --- @@ -414,4 +352,269 @@ trait CheckAnalysis extends PredicateHelper { plan.foreach(_.setAnalyzed()) } + --- End diff -- note to reviewers: this function basically refactors the validation logic for subquery expressions out of checkAnalysis. It is the entry-point function that performs all the validation for subqueries and is called from checkAnalysis().
[GitHub] spark issue #15125: [SPARK-5484][GraphX] Periodically do checkpoint in Prege...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15125 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76069/ Test PASSed.
[GitHub] spark issue #15125: [SPARK-5484][GraphX] Periodically do checkpoint in Prege...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15125 Merged build finished. Test PASSed.
[GitHub] spark issue #15125: [SPARK-5484][GraphX] Periodically do checkpoint in Prege...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15125 **[Test build #76069 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76069/testReport)** for PR 15125 at commit [`24d4ad6`](https://github.com/apache/spark/commit/24d4ad6fd5b05e1d024a42ee656058e77237ffb9). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17728: [SPARK-20437][R] R wrappers for rollup and cube
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17728 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76070/ Test PASSed.
[GitHub] spark issue #17728: [SPARK-20437][R] R wrappers for rollup and cube
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17728 Merged build finished. Test PASSed.
[GitHub] spark issue #17728: [SPARK-20437][R] R wrappers for rollup and cube
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17728 **[Test build #76070 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76070/testReport)** for PR 17728 at commit [`132099c`](https://github.com/apache/spark/commit/132099cc668baa240a0a417950f78fea4be961ec). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #17467: [SPARK-20140][DStream] Remove hardcoded kinesis r...
Github user budde commented on a diff in the pull request: https://github.com/apache/spark/pull/17467#discussion_r112816898 --- Diff: external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDD.scala --- @@ -295,6 +306,23 @@ class KinesisSequenceRangeIterator( private[streaming] object KinesisSequenceRangeIterator { - val MAX_RETRIES = 3 - val MIN_RETRY_WAIT_TIME_MS = 100 + /** + * The maximum number of attempts to be made to kinesis. Defaults to 3. + */ + val MAX_RETRIES = "3" + + /** + * The interval between consequent kinesis retries. Defaults to 100ms. + */ + val MIN_RETRY_WAIT_TIME_MS = "100ms" + + /** + * Key for configuring the retry wait time for kinesis. The values can be passed to SparkConf. --- End diff -- *nit:* I'd make the following tweaks here: ```scala /** * SparkConf key for configuring the wait time to use before retrying a Kinesis attempt. */ ```
[GitHub] spark pull request #17467: [SPARK-20140][DStream] Remove hardcoded kinesis r...
Github user budde commented on a diff in the pull request: https://github.com/apache/spark/pull/17467#discussion_r112816922 --- Diff: external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDD.scala --- @@ -295,6 +306,23 @@ class KinesisSequenceRangeIterator( private[streaming] object KinesisSequenceRangeIterator { - val MAX_RETRIES = 3 - val MIN_RETRY_WAIT_TIME_MS = 100 + /** + * The maximum number of attempts to be made to kinesis. Defaults to 3. + */ + val MAX_RETRIES = "3" + + /** + * The interval between consequent kinesis retries. Defaults to 100ms. + */ + val MIN_RETRY_WAIT_TIME_MS = "100ms" + + /** + * Key for configuring the retry wait time for kinesis. The values can be passed to SparkConf. + */ + val RETRY_WAIT_TIME_KEY = "spark.streaming.kinesis.retry.waitTime" + + /** + * Key for configuring the number of retries for kinesis. The values can be passed to SparkConf. --- End diff -- *nit:* I'd make the following tweaks here: ```scala /** * SparkConf key for configuring the maximum number of retries used when attempting a Kinesis * request. */ ```
[GitHub] spark pull request #17467: [SPARK-20140][DStream] Remove hardcoded kinesis r...
Github user budde commented on a diff in the pull request: https://github.com/apache/spark/pull/17467#discussion_r112817123 --- Diff: docs/streaming-kinesis-integration.md --- @@ -216,3 +216,7 @@ de-aggregate records during consumption. - If no Kinesis checkpoint info exists when the input DStream starts, it will start either from the oldest record available (`InitialPositionInStream.TRIM_HORIZON`) or from the latest tip (`InitialPositionInStream.LATEST`). This is configurable. - `InitialPositionInStream.LATEST` could lead to missed records if data is added to the stream while no input DStreams are running (and no checkpoint info is being stored). - `InitialPositionInStream.TRIM_HORIZON` may lead to duplicate processing of records where the impact is dependent on checkpoint frequency and processing idempotency. + +- Kinesis retry configurations --- End diff -- @brkyvz or another Spark committer might have better suggestions here, but I would suggest making this section a new heading (rather than part of **Kinesis Checkpointing**) and adding a brief explanatory sentence, e.g.: ``` Kinesis retry configuration - A Kinesis DStream will retry any failed request to the Kinesis API. The following SparkConf properties can be set in order to customize the behavior of the retry logic: ``` followed by the rest of your changes here. This also reminds me that I owe @brkyvz a change to add docs for the stream builder interface here :)
[GitHub] spark pull request #17467: [SPARK-20140][DStream] Remove hardcoded kinesis r...
Github user budde commented on a diff in the pull request: https://github.com/apache/spark/pull/17467#discussion_r112816822 --- Diff: external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDD.scala --- @@ -295,6 +306,23 @@ class KinesisSequenceRangeIterator( private[streaming] object KinesisSequenceRangeIterator { - val MAX_RETRIES = 3 - val MIN_RETRY_WAIT_TIME_MS = 100 + /** + * The maximum number of attempts to be made to kinesis. Defaults to 3. + */ + val MAX_RETRIES = "3" + + /** + * The interval between consequent kinesis retries. Defaults to 100ms. --- End diff -- *nit:* **K**inesis
[GitHub] spark pull request #17467: [SPARK-20140][DStream] Remove hardcoded kinesis r...
Github user budde commented on a diff in the pull request: https://github.com/apache/spark/pull/17467#discussion_r112816810 --- Diff: external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDD.scala --- @@ -295,6 +306,23 @@ class KinesisSequenceRangeIterator( private[streaming] object KinesisSequenceRangeIterator { - val MAX_RETRIES = 3 - val MIN_RETRY_WAIT_TIME_MS = 100 + /** + * The maximum number of attempts to be made to kinesis. Defaults to 3. --- End diff -- *nit:* **K**inesis
[GitHub] spark issue #17728: [SPARK-20437][R] R wrappers for rollup and cube
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17728 **[Test build #76070 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76070/testReport)** for PR 17728 at commit [`132099c`](https://github.com/apache/spark/commit/132099cc668baa240a0a417950f78fea4be961ec).
[GitHub] spark issue #17467: [SPARK-20140][DStream] Remove hardcoded kinesis retry wa...
Github user budde commented on the issue: https://github.com/apache/spark/pull/17467 @yssharma Fair enough. I'll try to get your update reviewed later today
[GitHub] spark issue #17693: [SPARK-16548][SQL] Inconsistent error handling in JSON p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17693 Merged build finished. Test PASSed.
[GitHub] spark issue #17693: [SPARK-16548][SQL] Inconsistent error handling in JSON p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17693 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76068/ Test PASSed.
[GitHub] spark issue #17693: [SPARK-16548][SQL] Inconsistent error handling in JSON p...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17693 **[Test build #76068 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76068/testReport)** for PR 17693 at commit [`f706ce3`](https://github.com/apache/spark/commit/f706ce3bfbc1f22ed32d2785fd7674fd7d03e874). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17712: [SPARK-20416][SQL] Print UDF names in EXPLAIN
Github user rxin commented on the issue: https://github.com/apache/spark/pull/17712 cc @gatorsmile This is related to the deterministic thing you want to do?
[GitHub] spark issue #15125: [SPARK-5484][GraphX] Periodically do checkpoint in Prege...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15125 **[Test build #76069 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76069/testReport)** for PR 15125 at commit [`24d4ad6`](https://github.com/apache/spark/commit/24d4ad6fd5b05e1d024a42ee656058e77237ffb9).
[GitHub] spark issue #15125: [SPARK-5484][GraphX] Periodically do checkpoint in Prege...
Github user dding3 commented on the issue: https://github.com/apache/spark/pull/15125 OK, agreed. If the user hasn't set a checkpoint directory while we turn on checkpointing in Pregel by default, there may be an exception. I will change the default value of spark.graphx.pregel.checkpointInterval to -1.
[GitHub] spark issue #17693: [SPARK-16548][SQL] Inconsistent error handling in JSON p...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17693 **[Test build #76068 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76068/testReport)** for PR 17693 at commit [`f706ce3`](https://github.com/apache/spark/commit/f706ce3bfbc1f22ed32d2785fd7674fd7d03e874).
[GitHub] spark issue #17688: [MINOR][DOCS][PYTHON] Adding missing boolean type for re...
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/17688 @vundela L1237
[GitHub] spark issue #17693: [SPARK-16548][SQL] Inconsistent error handling in JSON p...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/17693 ok to test
[GitHub] spark issue #17720: [SPARK-20407][TESTS][BACKPORT-2.1] ParquetQuerySuite 'En...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/17720 Thanks! Merging to 2.1. Could you close it?
[GitHub] spark issue #17728: [SPARK-20437][R] R wrappers for rollup and cube
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17728 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76067/ Test PASSed.