Github user shivaram commented on the issue:
https://github.com/apache/spark/pull/14705
I cherry-picked this PR into `branch-2.0` in
https://github.com/apache/spark/commit/0297896119e11f23da4b14f62f50ec72b5fac57f
-- The merge was a little awkward and but I think I got it to work
Github user felixcheung commented on the issue:
https://github.com/apache/spark/pull/14705
Thanks.
Reviewing each change, I think we need this PR (14705) and PR #14734 in
2.0.1 - so maybe only a few lines of conflicts.
---
If your project is set up for it, you can
Github user shivaram commented on the issue:
https://github.com/apache/spark/pull/14705
Yeah so we can do a couple of things. One is we try to cherry-pick this PR
to branch-2.0 and then fix all the merge conflicts that are thrown. I think
that should handle cases where the method
Github user felixcheung commented on the issue:
https://github.com/apache/spark/pull/14705
I think a subset of this should go to 2.0.1 as well (as requirement to fix
warning for CRAN in 2.0.x), but it's a non-trivial port: mllib isoreg are new
in 2.1.0 only.
What's the best
Github user junyangq commented on the issue:
https://github.com/apache/spark/pull/14705
Thanks @felixcheung!
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes
Github user felixcheung commented on the issue:
https://github.com/apache/spark/pull/14705
Yes agreed. I will do one more pass tonight and merge if there isn't any
blocking issue.
Thanks!
---
If your project is set up for it, you can reply to this email and have your
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14705
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14705
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64107/
Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14705
**[Test build #64107 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64107/consoleFull)**
for PR 14705 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14705
**[Test build #64107 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64107/consoleFull)**
for PR 14705 at commit
Github user mengxr commented on the issue:
https://github.com/apache/spark/pull/14705
Agree with @felixcheung . Let do the `setGeneric(, ...)` and do not use
`...` in function definition if it is not required. Note that having every
param documented is not a strict requirement of
Github user junyangq commented on the issue:
https://github.com/apache/spark/pull/14705
That makes sense. Perhaps this could be done in another PR?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does
Github user felixcheung commented on the issue:
https://github.com/apache/spark/pull/14705
I think it's ok to put @param ... on top of the function in the order we
want it. Or @param na.rm for sd, var etc. Yes it is a bit odd to have param in
the documentation block that is not in
Github user junyangq commented on the issue:
https://github.com/apache/spark/pull/14705
Yeah, I totally agree that in terms of usage this is safer. Then the doc
for `...` would be an issue. If we keep to the principle that doc be close to
the function, then `...` would be in generic
Github user felixcheung commented on the issue:
https://github.com/apache/spark/pull/14705
Hmm, I'm beginning to think we could do this:
```
generics.R
setGeneric("spark.naiveBayes", function(data, formula, ...) {
standardGeneric("spark.naiveBayes") })
```
```
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14705
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64047/
Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14705
**[Test build #64047 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64047/consoleFull)**
for PR 14705 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14705
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14705
**[Test build #64047 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64047/consoleFull)**
for PR 14705 at commit
Github user felixcheung commented on the issue:
https://github.com/apache/spark/pull/14705
surely - we should have said `lint-r` as the baseline. There's definitely
more we could add though. It would be great if we have bandwidth to write more
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/14705
@felixcheung Thanks for kind explanation. BTW, it'd be great too if it just
has a sentence, for example, `"For R code, Apache Spark follows lint-r"` in the
wiki just like Python has `"For
Github user felixcheung commented on the issue:
https://github.com/apache/spark/pull/14705
looking good - looks like we are very close.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this
Github user felixcheung commented on the issue:
https://github.com/apache/spark/pull/14705
@inheritParams would be the way to go.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this
Github user felixcheung commented on the issue:
https://github.com/apache/spark/pull/14705
@HyukjinKwon - we don't have a coding style guide for R. We have some style
check with lint-r.
In addition, the document style you are looking at is a bit different from
coding style - this
Github user junyangq commented on the issue:
https://github.com/apache/spark/pull/14705
@shivaram I found perhaps a neat way to document R'glm if we don't want to
remove it is to use `@inheritParams stats::glm`. That will bring in all the
parameters from `stats::glm` not listed in
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/14705
@shivaram BTW do you mind if I ask if we have R style guidelines somewhere
please? I remember I made a PR for R referencing only the other codes. It seems
it is also missing in
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14705
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64019/
Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14705
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14705
**[Test build #64019 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64019/consoleFull)**
for PR 14705 at commit
Github user shivaram commented on the issue:
https://github.com/apache/spark/pull/14705
Yeah I think we will be more careful about adding new algorithms to
override existing R methods -- but given that `glm` is already exposed I'd
think we can make an exception for just this one.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14705
**[Test build #64019 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64019/consoleFull)**
for PR 14705 at commit
Github user junyangq commented on the issue:
https://github.com/apache/spark/pull/14705
I don't have a good answer in mind now. I'm not sure how much we gain by
making `glm` also applicable to `SparkDataFrame` as well. Would `spark.glm` be
enough, like other ML methods?
---
If your
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14705
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64015/
Test PASSed.
---
Github user shivaram commented on the issue:
https://github.com/apache/spark/pull/14705
Thanks @junyangq -- I just ran the check on `branch-2.0` with this PR and
in addition to `glm` there were two warnings for `...` in `first` and
`unpersist` that we can fix in this PR.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14705
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14705
**[Test build #64015 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64015/consoleFull)**
for PR 14705 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14705
**[Test build #64015 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64015/consoleFull)**
for PR 14705 at commit
37 matches
Mail list logo