[GitHub] spark issue #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/duplicated a...

2016-08-21 Thread shivaram
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/14705 I cherry-picked this PR into `branch-2.0` in https://github.com/apache/spark/commit/0297896119e11f23da4b14f62f50ec72b5fac57f -- The merge was a little awkward and but I think I got it to work

[GitHub] spark issue #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/duplicated a...

2016-08-20 Thread felixcheung
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/14705 Thanks. Reviewing each change, I think we need this PR (14705) and PR #14734 in 2.0.1 - so maybe only a few lines of conflicts. --- If your project is set up for it, you can

[GitHub] spark issue #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/duplicated a...

2016-08-20 Thread shivaram
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/14705 Yeah so we can do a couple of things. One is we try to cherry-pick this PR to branch-2.0 and then fix all the merge conflicts that are thrown. I think that should handle cases where the method

[GitHub] spark issue #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/duplicated a...

2016-08-20 Thread felixcheung
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/14705 I think a subset of this should go to 2.0.1 as well (as requirement to fix warning for CRAN in 2.0.x), but it's a non-trivial port: mllib isoreg are new in 2.1.0 only. What's the best

[GitHub] spark issue #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/duplicated a...

2016-08-19 Thread junyangq
Github user junyangq commented on the issue: https://github.com/apache/spark/pull/14705 Thanks @felixcheung! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes

[GitHub] spark issue #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/duplicated a...

2016-08-19 Thread felixcheung
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/14705 Yes agreed. I will do one more pass tonight and merge if there isn't any blocking issue. Thanks! --- If your project is set up for it, you can reply to this email and have your

[GitHub] spark issue #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/duplicated a...

2016-08-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14705 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/duplicated a...

2016-08-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14705 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64107/ Test PASSed. ---

[GitHub] spark issue #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/duplicated a...

2016-08-19 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14705 **[Test build #64107 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64107/consoleFull)** for PR 14705 at commit

[GitHub] spark issue #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/duplicated a...

2016-08-19 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14705 **[Test build #64107 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64107/consoleFull)** for PR 14705 at commit

[GitHub] spark issue #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/duplicated a...

2016-08-19 Thread mengxr
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/14705 Agree with @felixcheung . Let do the `setGeneric(, ...)` and do not use `...` in function definition if it is not required. Note that having every param documented is not a strict requirement of

[GitHub] spark issue #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/duplicated a...

2016-08-19 Thread junyangq
Github user junyangq commented on the issue: https://github.com/apache/spark/pull/14705 That makes sense. Perhaps this could be done in another PR? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does

[GitHub] spark issue #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/duplicated a...

2016-08-19 Thread felixcheung
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/14705 I think it's ok to put @param ... on top of the function in the order we want it. Or @param na.rm for sd, var etc. Yes it is a bit odd to have param in the documentation block that is not in

[GitHub] spark issue #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/duplicated a...

2016-08-19 Thread junyangq
Github user junyangq commented on the issue: https://github.com/apache/spark/pull/14705 Yeah, I totally agree that in terms of usage this is safer. Then the doc for `...` would be an issue. If we keep to the principle that doc be close to the function, then `...` would be in generic

[GitHub] spark issue #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/duplicated a...

2016-08-19 Thread felixcheung
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/14705 Hmm, I'm beginning to think we could do this: ``` generics.R setGeneric("spark.naiveBayes", function(data, formula, ...) { standardGeneric("spark.naiveBayes") }) ``` ```

[GitHub] spark issue #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/duplicated a...

2016-08-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14705 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64047/ Test PASSed. ---

[GitHub] spark issue #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/duplicated a...

2016-08-19 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14705 **[Test build #64047 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64047/consoleFull)** for PR 14705 at commit

[GitHub] spark issue #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/duplicated a...

2016-08-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14705 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/duplicated a...

2016-08-19 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14705 **[Test build #64047 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64047/consoleFull)** for PR 14705 at commit

[GitHub] spark issue #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/duplicated a...

2016-08-18 Thread felixcheung
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/14705 surely - we should have said `lint-r` as the baseline. There's definitely more we could add though. It would be great if we have bandwidth to write more

[GitHub] spark issue #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/duplicated a...

2016-08-18 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/14705 @felixcheung Thanks for kind explanation. BTW, it'd be great too if it just has a sentence, for example, `"For R code, Apache Spark follows lint-r"` in the wiki just like Python has `"For

[GitHub] spark issue #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/duplicated a...

2016-08-18 Thread felixcheung
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/14705 looking good - looks like we are very close. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this

[GitHub] spark issue #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/duplicated a...

2016-08-18 Thread felixcheung
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/14705 @inheritParams would be the way to go. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this

[GitHub] spark issue #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/duplicated a...

2016-08-18 Thread felixcheung
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/14705 @HyukjinKwon - we don't have a coding style guide for R. We have some style check with lint-r. In addition, the document style you are looking at is a bit different from coding style - this

[GitHub] spark issue #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/duplicated a...

2016-08-18 Thread junyangq
Github user junyangq commented on the issue: https://github.com/apache/spark/pull/14705 @shivaram I found perhaps a neat way to document R'glm if we don't want to remove it is to use `@inheritParams stats::glm`. That will bring in all the parameters from `stats::glm` not listed in

[GitHub] spark issue #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/duplicated a...

2016-08-18 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/14705 @shivaram BTW do you mind if I ask if we have R style guidelines somewhere please? I remember I made a PR for R referencing only the other codes. It seems it is also missing in

[GitHub] spark issue #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/duplicated a...

2016-08-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14705 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64019/ Test PASSed. ---

[GitHub] spark issue #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/duplicated a...

2016-08-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14705 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/duplicated a...

2016-08-18 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14705 **[Test build #64019 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64019/consoleFull)** for PR 14705 at commit

[GitHub] spark issue #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/duplicated a...

2016-08-18 Thread shivaram
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/14705 Yeah I think we will be more careful about adding new algorithms to override existing R methods -- but given that `glm` is already exposed I'd think we can make an exception for just this one.

[GitHub] spark issue #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/duplicated a...

2016-08-18 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14705 **[Test build #64019 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64019/consoleFull)** for PR 14705 at commit

[GitHub] spark issue #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/duplicated a...

2016-08-18 Thread junyangq
Github user junyangq commented on the issue: https://github.com/apache/spark/pull/14705 I don't have a good answer in mind now. I'm not sure how much we gain by making `glm` also applicable to `SparkDataFrame` as well. Would `spark.glm` be enough, like other ML methods? --- If your

[GitHub] spark issue #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/duplicated a...

2016-08-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14705 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64015/ Test PASSed. ---

[GitHub] spark issue #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/duplicated a...

2016-08-18 Thread shivaram
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/14705 Thanks @junyangq -- I just ran the check on `branch-2.0` with this PR and in addition to `glm` there were two warnings for `...` in `first` and `unpersist` that we can fix in this PR.

[GitHub] spark issue #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/duplicated a...

2016-08-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14705 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/duplicated a...

2016-08-18 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14705 **[Test build #64015 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64015/consoleFull)** for PR 14705 at commit

[GitHub] spark issue #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/duplicated a...

2016-08-18 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14705 **[Test build #64015 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64015/consoleFull)** for PR 14705 at commit