[jira] [Comment Edited] (SPARK-22281) Handle R method breaking signature changes
[ https://issues.apache.org/jira/browse/SPARK-22281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16214751#comment-16214751 ]

Felix Cheung edited comment on SPARK-22281 at 10/23/17 7:30 AM:
----------------------------------------------------------------

OK, I have a solution for each of them. It turns out the fix for glm is quite a bit different from attach, so I added the error above.

With attach, we need to match the signature of base::attach; since that signature changes, we are going to generate ours at runtime by pulling from base::attach directly.

In short, with glm it's pulling in the function definition (i.e. the "usage") from the stats::glm function. Since this is "compiled in" when we build the source package into the .Rd, when/if it changes at runtime or in a CRAN check it won't match the latest signature.

was (Author: felixcheung):

OK, I have a solution for both. It turns out the fix for glm is quite a bit different from attach, so I added the error above.

With attach, we need to match the signature of base::attach; since that signature changes, we are going to generate ours at runtime by pulling from base::attach directly.

In short, with glm it's pulling in the function definition (i.e. the "usage") from the stats::glm function. Since this is "compiled in" when we build the source package into the .Rd, when/if it changes at runtime or in a CRAN check it won't match the latest signature.

> Handle R method breaking signature changes
> ------------------------------------------
>
>                 Key: SPARK-22281
>                 URL: https://issues.apache.org/jira/browse/SPARK-22281
>             Project: Spark
>          Issue Type: Bug
>          Components: SparkR
>    Affects Versions: 2.2.1, 2.3.0
>            Reporter: Felix Cheung
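A minimal sketch of the attach half of that idea (not the actual SparkR patch; the SparkDataFrame class is stubbed out here so the snippet is self-contained): graft whatever formals the running R's base::attach has onto the implementation before registering the method, so the registered signature always tracks the installed R rather than the one frozen into the .Rd at build time.

{code}
library(methods)

# Placeholder class standing in for SparkR's real SparkDataFrame.
setClass("SparkDataFrame", representation(env = "environment"))

impl <- function() NULL
formals(impl) <- formals(base::attach)  # mirror the running R's signature
body(impl) <- quote({
  # The real method copies the DataFrame's columns into a new
  # environment on the search path; elided in this sketch.
  invisible(NULL)
})

setGeneric("attach")  # implicit generic derived from base::attach
setMethod("attach", signature(what = "SparkDataFrame"), impl)
{code}

Because the method's formals are copied from base::attach at load time, they match the implicit generic by construction on whichever R version is installing the package.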
[jira] [Comment Edited] (SPARK-22281) Handle R method breaking signature changes
[ https://issues.apache.org/jira/browse/SPARK-22281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16214751#comment-16214751 ]

Felix Cheung edited comment on SPARK-22281 at 10/23/17 7:10 AM:
----------------------------------------------------------------

OK, I have a solution for both. It turns out the fix for glm is quite a bit different from attach, so I added the error above.

With attach, we need to match the signature of base::attach; since that signature changes, we are going to generate ours at runtime by pulling from base::attach directly.

In short, with glm it's pulling in the function definition (i.e. the "usage") from the stats::glm function. Since this is "compiled in" when we build the source package into the .Rd, when/if it changes at runtime or in a CRAN check it won't match the latest signature.

was (Author: felixcheung):

OK, I have a solution for both. It turns out the fix for glm is quite a bit different from attach, so I added the error above.

In short, with glm it's pulling in the function definition (i.e. the "usage") from the stats::glm function. Since this is "compiled in" when we build the source package into the .Rd, when/if it changes at runtime or in a CRAN check it won't match the latest signature.

> Handle R method breaking signature changes
> ------------------------------------------
>
>                 Key: SPARK-22281
>                 URL: https://issues.apache.org/jira/browse/SPARK-22281
>             Project: Spark
>          Issue Type: Bug
>          Components: SparkR
>    Affects Versions: 2.2.1, 2.3.0
>            Reporter: Felix Cheung
[jira] [Commented] (SPARK-22281) Handle R method breaking signature changes
[ https://issues.apache.org/jira/browse/SPARK-22281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16214751#comment-16214751 ]

Felix Cheung commented on SPARK-22281:
--------------------------------------

OK, I have a solution for both. It turns out the fix for glm is quite a bit different from attach, so I added the error above.

In short, with glm it's pulling in the function definition (i.e. the "usage") from the stats::glm function. Since this is "compiled in" when we build the source package into the .Rd, when/if it changes at runtime or in a CRAN check it won't match the latest signature.

> Handle R method breaking signature changes
> ------------------------------------------
>
>                 Key: SPARK-22281
>                 URL: https://issues.apache.org/jira/browse/SPARK-22281
>             Project: Spark
>          Issue Type: Bug
>          Components: SparkR
>    Affects Versions: 2.2.1, 2.3.0
>            Reporter: Felix Cheung
[jira] [Updated] (SPARK-22281) Handle R method breaking signature changes
[ https://issues.apache.org/jira/browse/SPARK-22281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Felix Cheung updated SPARK-22281:
---------------------------------
    Description:

As discussed here
http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Spark-2-1-2-RC2-tt22540.html#a22555

this WARNING on R-devel

* checking for code/documentation mismatches ... WARNING
Codoc mismatches from documentation object 'attach':
attach
  Code: function(what, pos = 2L, name = deparse(substitute(what),
                 backtick = FALSE), warn.conflicts = TRUE)
  Docs: function(what, pos = 2L, name = deparse(substitute(what)),
                 warn.conflicts = TRUE)
  Mismatches in argument default values:
    Name: 'name' Code: deparse(substitute(what), backtick = FALSE) Docs: deparse(substitute(what))

Codoc mismatches from documentation object 'glm':
glm
  Code: function(formula, family = gaussian, data, weights, subset,
                 na.action, start = NULL, etastart, mustart, offset,
                 control = list(...), model = TRUE, method = "glm.fit",
                 x = FALSE, y = TRUE, singular.ok = TRUE, contrasts = NULL, ...)
  Docs: function(formula, family = gaussian, data, weights, subset,
                 na.action, start = NULL, etastart, mustart, offset,
                 control = list(...), model = TRUE, method = "glm.fit",
                 x = FALSE, y = TRUE, contrasts = NULL, ...)
  Argument names in code not in docs:
    singular.ok
  Mismatches in argument names:
    Position: 16 Code: singular.ok Docs: contrasts
    Position: 17 Code: contrasts Docs: ...

Checked the latest release R 3.4.1 and the signature change wasn't there. This likely indicated an upcoming change in the next R release that could incur this new warning when we attempt to publish the package.

Not sure what we can do now since we work with multiple versions of R and they will have different signatures then.

  was:

As discussed here
http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Spark-2-1-2-RC2-tt22540.html#a22555

this WARNING on R-devel

* checking for code/documentation mismatches ... WARNING
Codoc mismatches from documentation object 'attach':
attach
  Code: function(what, pos = 2L, name = deparse(substitute(what),
                 backtick = FALSE), warn.conflicts = TRUE)
  Docs: function(what, pos = 2L, name = deparse(substitute(what)),
                 warn.conflicts = TRUE)
  Mismatches in argument default values:
    Name: 'name' Code: deparse(substitute(what), backtick = FALSE) Docs: deparse(substitute(what))

Checked the latest release R 3.4.1 and the signature change wasn't there. This likely indicated an upcoming change in the next R release that could incur this new warning when we attempt to publish the package.

Not sure what we can do now since we work with multiple versions of R and they will have different signatures then.

> Handle R method breaking signature changes
> ------------------------------------------
>
>                 Key: SPARK-22281
>                 URL: https://issues.apache.org/jira/browse/SPARK-22281
>             Project: Spark
>          Issue Type: Bug
>          Components: SparkR
>    Affects Versions: 2.2.1, 2.3.0
>            Reporter: Felix Cheung
[jira] [Commented] (SPARK-22281) Handle R method breaking signature changes
[ https://issues.apache.org/jira/browse/SPARK-22281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16214431#comment-16214431 ]

Felix Cheung commented on SPARK-22281:
--------------------------------------

Tried a few things. If we remove the {code}@param{code} then the CRAN checks fail with

{code}
* checking Rd \usage sections ... WARNING
Undocumented arguments in documentation object 'attach'
  ‘what’ ‘pos’ ‘name’ ‘warn.conflicts’

Functions with \usage entries need to have the appropriate \alias entries,
and all their arguments documented. The \usage entries must correspond to
syntactically valid R code. See chapter ‘Writing R documentation files’ in
the ‘Writing R Extensions’ manual.
{code}

If we change the method signature to

{code}
setMethod("attach", signature(what = "SparkDataFrame"),
          function(what, ...) {
{code}

then it fails to install:

{code}
Error in rematchDefinition(definition, fdef, mnames, fnames, signature) :
  methods can add arguments to the generic ‘attach’ only if '...' is an argument to the generic
Error : unable to load R code in package ‘SparkR’
ERROR: lazy loading failed for package ‘SparkR’
{code}

> Handle R method breaking signature changes
> ------------------------------------------
>
>                 Key: SPARK-22281
>                 URL: https://issues.apache.org/jira/browse/SPARK-22281
>             Project: Spark
>          Issue Type: Bug
>          Components: SparkR
>    Affects Versions: 2.2.1, 2.3.0
>            Reporter: Felix Cheung
[jira] [Updated] (SPARK-22327) R CRAN check fails on non-latest branches
[ https://issues.apache.org/jira/browse/SPARK-22327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Felix Cheung updated SPARK-22327:
---------------------------------
    Description:

with warning

* checking CRAN incoming feasibility ... WARNING
Maintainer: 'Shivaram Venkataraman'

Insufficient package version (submitted: 2.0.3, existing: 2.1.2)

Unknown, possibly mis-spelled, fields in DESCRIPTION:
  'RoxygenNote'

WARNING: There was 1 warning.
NOTE: There were 2 notes.

We have seen this in branch-1.6, branch-2.0, and this would be a problem for branch-2.1 after we ship 2.2.1.

The root cause of the issue is the package version check in the CRAN check: after the SparkR package version 2.1.2 was first published, any older version fails the version check. As far as we know, there is no way to skip this version check.

Also, there was previously a NOTE on new maintainer.

  was:

with warning

* checking CRAN incoming feasibility ... WARNING
Maintainer: 'Shivaram Venkataraman'

Insufficient package version (submitted: 2.0.3, existing: 2.1.2)

Unknown, possibly mis-spelled, fields in DESCRIPTION:
  'RoxygenNote'

We have seen this in branch-1.6, branch-2.0, and this would be a problem for branch-2.1 after we ship 2.2.1.

The root cause of the issue is the package version check in the CRAN check: after the SparkR package version 2.1.2 was first published, any older version fails the version check. As far as we know, there is no way to skip this version check.

Also, there was previously a NOTE on new maintainer.

> R CRAN check fails on non-latest branches
> -----------------------------------------
>
>                 Key: SPARK-22327
>                 URL: https://issues.apache.org/jira/browse/SPARK-22327
>             Project: Spark
>          Issue Type: Bug
>          Components: SparkR
>    Affects Versions: 1.6.4, 2.0.3, 2.1.3, 2.2.1, 2.3.0
>            Reporter: Felix Cheung
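For reference, the comparison behind that warning behaves like utils::compareVersion(); a sketch of the check CRAN's incoming pipeline effectively performs (the real check lives on CRAN's side, so this is illustrative only):

{code}
# -1 means the submitted version sorts below the one already on CRAN,
# which is what surfaces as "Insufficient package version".
utils::compareVersion("2.0.3", "2.1.2")
#> [1] -1
{code}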
[jira] [Comment Edited] (SPARK-22327) R CRAN check fails on non-latest branches
[ https://issues.apache.org/jira/browse/SPARK-22327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16213798#comment-16213798 ]

Felix Cheung edited comment on SPARK-22327 at 10/21/17 7:45 AM:
----------------------------------------------------------------

In contrast, this is from master:

* checking CRAN incoming feasibility ... NOTE
Maintainer: 'Shivaram Venkataraman'

Unknown, possibly mis-spelled, fields in DESCRIPTION:
  'RoxygenNote'

* checking package dependencies ... NOTE
No repository set, so cyclic dependency check skipped

* checking R code for possible problems ... NOTE
Found the following calls to attach():
File 'SparkR/R/DataFrame.R':
  attach(newEnv, pos = pos, name = name, warn.conflicts = warn.conflicts)
See section 'Good practice' in '?attach'.

NOTE: There were 3 notes.

So it should have 3 notes, or (2 notes + 1 warning) as the one note turns into a warning for "Insufficient package version".

was (Author: felixcheung):

In contrast, this is from master:

* checking CRAN incoming feasibility ... NOTE
Maintainer: 'Shivaram Venkataraman'

Unknown, possibly mis-spelled, fields in DESCRIPTION:
  'RoxygenNote'

* checking package dependencies ... NOTE
No repository set, so cyclic dependency check skipped

* checking R code for possible problems ... NOTE
Found the following calls to attach():
File 'SparkR/R/DataFrame.R':
  attach(newEnv, pos = pos, name = name, warn.conflicts = warn.conflicts)
See section 'Good practice' in '?attach'.

NOTE: There were 3 notes.

So it should have 3 notes or (2 notes + 1 warning) as the one note turns into a warning for "Insufficient package version".

> R CRAN check fails on non-latest branches
> -----------------------------------------
>
>                 Key: SPARK-22327
>                 URL: https://issues.apache.org/jira/browse/SPARK-22327
>             Project: Spark
>          Issue Type: Bug
>          Components: SparkR
>    Affects Versions: 1.6.4, 2.0.3, 2.1.3, 2.2.1, 2.3.0
>            Reporter: Felix Cheung
[jira] [Commented] (SPARK-22327) R CRAN check fails on non-latest branches
[ https://issues.apache.org/jira/browse/SPARK-22327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16213798#comment-16213798 ]

Felix Cheung commented on SPARK-22327:
--------------------------------------

In contrast, this is from master:

* checking CRAN incoming feasibility ... NOTE
Maintainer: 'Shivaram Venkataraman'

Unknown, possibly mis-spelled, fields in DESCRIPTION:
  'RoxygenNote'

* checking package dependencies ... NOTE
No repository set, so cyclic dependency check skipped

* checking R code for possible problems ... NOTE
Found the following calls to attach():
File 'SparkR/R/DataFrame.R':
  attach(newEnv, pos = pos, name = name, warn.conflicts = warn.conflicts)
See section 'Good practice' in '?attach'.

NOTE: There were 3 notes.

So it should have 3 notes or (2 notes + 1 warning) as the one note turns into a warning for "Insufficient package version".

> R CRAN check fails on non-latest branches
> -----------------------------------------
>
>                 Key: SPARK-22327
>                 URL: https://issues.apache.org/jira/browse/SPARK-22327
>             Project: Spark
>          Issue Type: Bug
>          Components: SparkR
>    Affects Versions: 1.6.4, 2.0.3, 2.1.3, 2.2.1, 2.3.0
>            Reporter: Felix Cheung
[jira] [Comment Edited] (SPARK-22327) R CRAN check fails on non-latest branches
[ https://issues.apache.org/jira/browse/SPARK-22327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16213798#comment-16213798 ]

Felix Cheung edited comment on SPARK-22327 at 10/21/17 7:45 AM:
----------------------------------------------------------------

In contrast, this is from master:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82919/consoleFull

* checking CRAN incoming feasibility ... NOTE
Maintainer: 'Shivaram Venkataraman'

Unknown, possibly mis-spelled, fields in DESCRIPTION:
  'RoxygenNote'

* checking package dependencies ... NOTE
No repository set, so cyclic dependency check skipped

* checking R code for possible problems ... NOTE
Found the following calls to attach():
File 'SparkR/R/DataFrame.R':
  attach(newEnv, pos = pos, name = name, warn.conflicts = warn.conflicts)
See section 'Good practice' in '?attach'.

NOTE: There were 3 notes.

So it should have 3 notes, or (2 notes + 1 warning) as the one note turns into a warning for "Insufficient package version".

was (Author: felixcheung):

In contrast, this is from master:

* checking CRAN incoming feasibility ... NOTE
Maintainer: 'Shivaram Venkataraman'

Unknown, possibly mis-spelled, fields in DESCRIPTION:
  'RoxygenNote'

* checking package dependencies ... NOTE
No repository set, so cyclic dependency check skipped

* checking R code for possible problems ... NOTE
Found the following calls to attach():
File 'SparkR/R/DataFrame.R':
  attach(newEnv, pos = pos, name = name, warn.conflicts = warn.conflicts)
See section 'Good practice' in '?attach'.

NOTE: There were 3 notes.

So it should have 3 notes, or (2 notes + 1 warning) as the one note turns into a warning for "Insufficient package version".

> R CRAN check fails on non-latest branches
> -----------------------------------------
>
>                 Key: SPARK-22327
>                 URL: https://issues.apache.org/jira/browse/SPARK-22327
>             Project: Spark
>          Issue Type: Bug
>          Components: SparkR
>    Affects Versions: 1.6.4, 2.0.3, 2.1.3, 2.2.1, 2.3.0
>            Reporter: Felix Cheung
[jira] [Updated] (SPARK-22327) R CRAN check fails on non-latest branches
[ https://issues.apache.org/jira/browse/SPARK-22327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Felix Cheung updated SPARK-22327:
---------------------------------
    Description:

with warning

* checking CRAN incoming feasibility ... WARNING
Maintainer: 'Shivaram Venkataraman'

Insufficient package version (submitted: 2.0.3, existing: 2.1.2)

Unknown, possibly mis-spelled, fields in DESCRIPTION:
  'RoxygenNote'

We have seen this in branch-1.6, branch-2.0, and this would be a problem for branch-2.1 after we ship 2.2.1.

The root cause of the issue is the package version check in the CRAN check: after the SparkR package version 2.1.2 was first published, any older version fails the version check. As far as we know, there is no way to skip this version check.

Also, there was previously a NOTE on new maintainer.

  was:

with warning

Insufficient package version (submitted: 2.0.3, existing: 2.1.2)

We have seen this in branch-1.6, branch-2.0, and this would be a problem for branch-2.1 after we ship 2.2.1.

The root cause of the issue is the package version check in the CRAN check: after the SparkR package version 2.1.2 was first published, any older version fails the version check. As far as we know, there is no way to skip this version check.

Also, there was previously a NOTE on new maintainer.

> R CRAN check fails on non-latest branches
> -----------------------------------------
>
>                 Key: SPARK-22327
>                 URL: https://issues.apache.org/jira/browse/SPARK-22327
>             Project: Spark
>          Issue Type: Bug
>          Components: SparkR
>    Affects Versions: 1.6.4, 2.0.3, 2.1.3, 2.2.1, 2.3.0
>            Reporter: Felix Cheung
[jira] [Updated] (SPARK-22327) R CRAN check fails on non-latest branches
[ https://issues.apache.org/jira/browse/SPARK-22327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Felix Cheung updated SPARK-22327:
---------------------------------
    Description:

with warning

Insufficient package version (submitted: 2.0.3, existing: 2.1.2)

We have seen this in branch-1.6, branch-2.0, and this would be a problem for branch-2.1 after we ship 2.2.1.

The root cause of the issue is the package version check in the CRAN check: after the SparkR package version 2.1.2 was first published, any older version fails the version check. As far as we know, there is no way to skip this version check.

Also, there was previously a NOTE on new maintainer.

  was:

with error

Insufficient package version (submitted: 2.0.3, existing: 2.1.2)

We have seen this in branch-1.6, branch-2.0, and this would be a problem for branch-2.1 after we ship 2.2.1

> R CRAN check fails on non-latest branches
> -----------------------------------------
>
>                 Key: SPARK-22327
>                 URL: https://issues.apache.org/jira/browse/SPARK-22327
>             Project: Spark
>          Issue Type: Bug
>          Components: SparkR
>    Affects Versions: 1.6.4, 2.0.3, 2.1.3, 2.2.1, 2.3.0
>            Reporter: Felix Cheung
[jira] [Commented] (SPARK-22327) R CRAN check fails on non-latest branches
[ https://issues.apache.org/jira/browse/SPARK-22327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16213781#comment-16213781 ]

Felix Cheung commented on SPARK-22327:
--------------------------------------

https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3956/consoleFull

> R CRAN check fails on non-latest branches
> -----------------------------------------
>
>                 Key: SPARK-22327
>                 URL: https://issues.apache.org/jira/browse/SPARK-22327
>             Project: Spark
>          Issue Type: Bug
>          Components: SparkR
>    Affects Versions: 1.6.4, 2.0.3, 2.1.3, 2.2.1, 2.3.0
>            Reporter: Felix Cheung
[jira] [Updated] (SPARK-22327) R CRAN check fails on non-latest branches
[ https://issues.apache.org/jira/browse/SPARK-22327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Felix Cheung updated SPARK-22327:
---------------------------------
    Affects Version/s: 2.3.0

> R CRAN check fails on non-latest branches
> -----------------------------------------
>
>                 Key: SPARK-22327
>                 URL: https://issues.apache.org/jira/browse/SPARK-22327
>             Project: Spark
>          Issue Type: Bug
>          Components: SparkR
>    Affects Versions: 1.6.4, 2.0.3, 2.1.3, 2.2.1, 2.3.0
>            Reporter: Felix Cheung
[jira] [Updated] (SPARK-22327) R CRAN check fails on non-latest branches
[ https://issues.apache.org/jira/browse/SPARK-22327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Felix Cheung updated SPARK-22327:
---------------------------------
    Affects Version/s: 2.2.1

> R CRAN check fails on non-latest branches
> -----------------------------------------
>
>                 Key: SPARK-22327
>                 URL: https://issues.apache.org/jira/browse/SPARK-22327
>             Project: Spark
>          Issue Type: Bug
>          Components: SparkR
>    Affects Versions: 1.6.4, 2.0.3, 2.1.3, 2.2.1
>            Reporter: Felix Cheung
[jira] [Created] (SPARK-22327) R CRAN check fails on non-latest branches
Felix Cheung created SPARK-22327:
------------------------------------

             Summary: R CRAN check fails on non-latest branches
                 Key: SPARK-22327
                 URL: https://issues.apache.org/jira/browse/SPARK-22327
             Project: Spark
          Issue Type: Bug
          Components: SparkR
    Affects Versions: 1.6.4, 2.0.3, 2.1.3
            Reporter: Felix Cheung

with error

Insufficient package version (submitted: 2.0.3, existing: 2.1.2)

We have seen this in branch-1.6, branch-2.0, and this would be a problem for branch-2.1 after we ship 2.2.1
[jira] [Commented] (SPARK-17608) Long type has incorrect serialization/deserialization
[ https://issues.apache.org/jira/browse/SPARK-17608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16205379#comment-16205379 ]

Felix Cheung commented on SPARK-17608:
--------------------------------------

Any takers on this?

> Long type has incorrect serialization/deserialization
> -----------------------------------------------------
>
>                 Key: SPARK-17608
>                 URL: https://issues.apache.org/jira/browse/SPARK-17608
>             Project: Spark
>          Issue Type: Bug
>          Components: SparkR
>    Affects Versions: 2.0.0
>            Reporter: Thomas Powell
>
> Am hitting issues when using {{dapply}} on a data frame that contains a {{bigint}} in its schema. When this is converted to a SparkR data frame, a "bigint" gets converted to an R {{numeric}} type:
> https://github.com/apache/spark/blob/master/R/pkg/R/types.R#L25.
> However, the R {{numeric}} type gets converted to {{org.apache.spark.sql.types.DoubleType}}:
> https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/api/r/SQLUtils.scala#L97.
> The two directions therefore aren't compatible. If I use the same schema with dapply (and just an identity function), I will get type collisions because the output type is a double but the schema expects a bigint.
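A sketch of a reproduction, assuming a running Spark installation (sparkR.session, sql, dapply, and schema are the real SparkR entry points; the exact error text is illustrative):

{code}
library(SparkR)
sparkR.session()

# One-row DataFrame with a bigint column.
df <- sql("SELECT CAST(1 AS BIGINT) AS x")
printSchema(df)  # x shows as long (bigint)

# Identity dapply reusing the original schema: the R worker hands the
# column back as numeric, which serializes as DoubleType and collides
# with the declared bigint when the results are read back.
out <- dapply(df, function(d) { d }, schema(df))
head(out)
{code}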
[jira] [Updated] (SPARK-22281) Handle R method breaking signature changes
[ https://issues.apache.org/jira/browse/SPARK-22281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Felix Cheung updated SPARK-22281:
---------------------------------
    Description:

As discussed here
http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Spark-2-1-2-RC2-tt22540.html#a22555

this WARNING on R-devel

* checking for code/documentation mismatches ... WARNING
Codoc mismatches from documentation object 'attach':
attach
  Code: function(what, pos = 2L, name = deparse(substitute(what),
                 backtick = FALSE), warn.conflicts = TRUE)
  Docs: function(what, pos = 2L, name = deparse(substitute(what)),
                 warn.conflicts = TRUE)
  Mismatches in argument default values:
    Name: 'name' Code: deparse(substitute(what), backtick = FALSE) Docs: deparse(substitute(what))

Checked the latest release R 3.4.1 and the signature change wasn't there. This likely indicated an upcoming change in the next R release that could incur this new warning when we attempt to publish the package.

Not sure what we can do now since we work with multiple versions of R and they will have different signatures then.

  was:

As discussed here
http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Spark-2-1-2-RC2-tt22540.html#a22555

this WARNING on R-devel

* checking for code/documentation mismatches ... WARNING
Codoc mismatches from documentation object 'attach':
attach
  Code: function(what, pos = 2L, name = deparse(substitute(what),
                 backtick = FALSE), warn.conflicts = TRUE)
  Docs: function(what, pos = 2L, name = deparse(substitute(what)),
                 warn.conflicts = TRUE)
  Mismatches in argument default values:
    Name: 'name' Code: deparse(substitute(what), backtick = FALSE) Docs: deparse(substitute(what))

Checked the latest release R 3.4.1 and the signature change wasn't there. This likely indicated an upcoming change in the next R release that could insur this new warning when we attempt to publish the package.

Not sure what we can do now since we work with multiple versions of R and they will have different signatures then.

> Handle R method breaking signature changes
> ------------------------------------------
>
>                 Key: SPARK-22281
>                 URL: https://issues.apache.org/jira/browse/SPARK-22281
>             Project: Spark
>          Issue Type: Bug
>          Components: SparkR
>    Affects Versions: 2.2.1, 2.3.0
>            Reporter: Felix Cheung
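For context on the default-value change itself: the backtick argument controls whether deparse() wraps non-syntactic names in backticks; a quick illustration in plain base R (nothing SparkR-specific):

{code}
nm <- as.name("my df")          # a non-syntactic name
deparse(nm, backtick = TRUE)    # "`my df`"
deparse(nm, backtick = FALSE)   # "my df"  <- the explicit default R-devel adds to attach()
{code}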
[jira] [Commented] (SPARK-22281) Handle R method breaking signature changes
[ https://issues.apache.org/jira/browse/SPARK-22281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16205313#comment-16205313 ]

Felix Cheung commented on SPARK-22281:
--------------------------------------

And here for all the r-devel WARNs:
https://cran.r-project.org/web/checks/check_results_SparkR.html

> Handle R method breaking signature changes
> ------------------------------------------
>
>                 Key: SPARK-22281
>                 URL: https://issues.apache.org/jira/browse/SPARK-22281
>             Project: Spark
>          Issue Type: Bug
>          Components: SparkR
>    Affects Versions: 2.3.0, 2.2.1
>            Reporter: Felix Cheung
[jira] [Created] (SPARK-22281) Handle R method breaking signature changes
Felix Cheung created SPARK-22281:
------------------------------------

             Summary: Handle R method breaking signature changes
                 Key: SPARK-22281
                 URL: https://issues.apache.org/jira/browse/SPARK-22281
             Project: Spark
          Issue Type: Bug
          Components: SparkR
    Affects Versions: 2.2.1, 2.3.0
            Reporter: Felix Cheung

As discussed here
http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Spark-2-1-2-RC2-tt22540.html#a22555

this WARNING on R-devel

* checking for code/documentation mismatches ... WARNING
Codoc mismatches from documentation object 'attach':
attach
  Code: function(what, pos = 2L, name = deparse(substitute(what),
                 backtick = FALSE), warn.conflicts = TRUE)
  Docs: function(what, pos = 2L, name = deparse(substitute(what)),
                 warn.conflicts = TRUE)
  Mismatches in argument default values:
    Name: 'name' Code: deparse(substitute(what), backtick = FALSE) Docs: deparse(substitute(what))

Checked the latest release R 3.4.1 and the signature change wasn't there. This likely indicated an upcoming change in the next R release that could incur this new warning when we attempt to publish the package.

Not sure what we can do now since we work with multiple versions of R and they will have different signatures then.
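One way to watch for this kind of drift is to print the live signature in each R under test; a sketch (the R-devel output shown is transcribed from the codoc warning above, not from a run):

{code}
args(base::attach)
# R 3.4.1:
#   function (what, pos = 2L, name = deparse(substitute(what)),
#             warn.conflicts = TRUE)
# R-devel, per the warning above:
#   function (what, pos = 2L, name = deparse(substitute(what),
#             backtick = FALSE), warn.conflicts = TRUE)
{code}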
[jira] [Commented] (SPARK-19700) Design an API for pluggable scheduler implementations
[ https://issues.apache.org/jira/browse/SPARK-19700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16200659#comment-16200659 ]

Felix Cheung commented on SPARK-19700:
--------------------------------------

Not that I'm aware of - I agree it is very important to take feedback from all these different efforts into consideration when we come up with the plan. I was thinking about starting to draft a plan based on the k8s effort. Does anyone have a better suggestion on how we should start on this?

> Design an API for pluggable scheduler implementations
> -----------------------------------------------------
>
>                 Key: SPARK-19700
>                 URL: https://issues.apache.org/jira/browse/SPARK-19700
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.1.0
>            Reporter: Matt Cheah
>
> One point that was brought up in discussing SPARK-18278 was that schedulers cannot easily be added to Spark without forking the whole project. The main reason is that much of the scheduler's behavior fundamentally depends on the CoarseGrainedSchedulerBackend class, which is not part of the public API of Spark and is in fact quite a complex module. As resource management and allocation continues to evolve, Spark will need to be integrated with more cluster managers, but maintaining support for all possible allocators in the Spark project would be untenable. Furthermore, it would be impossible for Spark to support proprietary frameworks that are developed by specific users for their particular use cases.
>
> Therefore, this ticket proposes making scheduler implementations fully pluggable. The idea is that Spark will provide a Java/Scala interface that is to be implemented by a scheduler that is backed by the cluster manager of interest. The user can compile their scheduler's code into a JAR that is placed on the driver's classpath. Finally, as is the case in the current world, the scheduler implementation is selected and dynamically loaded depending on the user's provided master URL.
>
> Determining the correct API is the most challenging problem. The current CoarseGrainedSchedulerBackend handles many responsibilities, some of which will be common across all cluster managers, and some of which will be specific to a particular cluster manager. For example, the particular mechanism for creating the executor processes will differ between YARN and Mesos, but, once these executors have started running, the means to submit tasks to them over the Netty RPC is identical across the board.
>
> We must also consider a plugin model and interface for submitting the application as well, because different cluster managers support different configuration options, and thus the driver must be bootstrapped accordingly. For example, in YARN mode the application and Hadoop configuration must be packaged and shipped to the distributed cache prior to launching the job. A prototype of a Kubernetes implementation starts a Kubernetes pod that runs the driver in cluster mode.
[jira] [Commented] (SPARK-17275) Flaky test: org.apache.spark.deploy.RPackageUtilsSuite.jars that don't exist are skipped and print warning
[ https://issues.apache.org/jira/browse/SPARK-17275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16196380#comment-16196380 ]

Felix Cheung commented on SPARK-17275:
--------------------------------------

Perhaps we should close this? It's been a year...

> Flaky test: org.apache.spark.deploy.RPackageUtilsSuite.jars that don't exist are skipped and print warning
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-17275
>                 URL: https://issues.apache.org/jira/browse/SPARK-17275
>             Project: Spark
>          Issue Type: Bug
>          Components: SparkR
>            Reporter: Yin Huai
>
> https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.4/1623/testReport/junit/org.apache.spark.deploy/RPackageUtilsSuite/jars_that_don_t_exist_are_skipped_and_print_warning/
>
> {code}
> Error Message
> java.io.IOException: Unable to delete directory /home/jenkins/.ivy2/cache/a/mylib.
>
> Stacktrace
> sbt.ForkMain$ForkError: java.io.IOException: Unable to delete directory /home/jenkins/.ivy2/cache/a/mylib.
>     at org.apache.commons.io.FileUtils.deleteDirectory(FileUtils.java:1541)
>     at org.apache.commons.io.FileUtils.forceDelete(FileUtils.java:2270)
>     at org.apache.commons.io.FileUtils.cleanDirectory(FileUtils.java:1653)
>     at org.apache.commons.io.FileUtils.deleteDirectory(FileUtils.java:1535)
>     at org.apache.spark.deploy.IvyTestUtils$.purgeLocalIvyCache(IvyTestUtils.scala:394)
>     at org.apache.spark.deploy.IvyTestUtils$.withRepository(IvyTestUtils.scala:384)
>     at org.apache.spark.deploy.RPackageUtilsSuite$$anonfun$3.apply$mcV$sp(RPackageUtilsSuite.scala:103)
>     at org.apache.spark.deploy.RPackageUtilsSuite$$anonfun$3.apply(RPackageUtilsSuite.scala:100)
>     at org.apache.spark.deploy.RPackageUtilsSuite$$anonfun$3.apply(RPackageUtilsSuite.scala:100)
>     at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
>     at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
>     at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
>     at org.scalatest.Transformer.apply(Transformer.scala:22)
>     at org.scalatest.Transformer.apply(Transformer.scala:20)
>     at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166)
>     at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:57)
>     at org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163)
>     at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
>     at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
>     at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
>     at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175)
>     at org.apache.spark.deploy.RPackageUtilsSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(RPackageUtilsSuite.scala:38)
>     at org.scalatest.BeforeAndAfterEach$class.runTest(BeforeAndAfterEach.scala:255)
>     at org.apache.spark.deploy.RPackageUtilsSuite.runTest(RPackageUtilsSuite.scala:38)
>     at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
>     at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
>     at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413)
>     at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401)
>     at scala.collection.immutable.List.foreach(List.scala:381)
>     at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
>     at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396)
>     at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483)
>     at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208)
>     at org.scalatest.FunSuite.runTests(FunSuite.scala:1555)
>     at org.scalatest.Suite$class.run(Suite.scala:1424)
>     at org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555)
>     at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
>     at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
>     at org.scalatest.SuperEngine.runImpl(Engine.scala:545)
>     at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212)
>     at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:29)
>     at org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:257)
>     at org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:256)
>     at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:29)
>     at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:357)
> {code}
[jira] [Commented] (SPARK-22202) Release tgz content differences for python and R
[ https://issues.apache.org/jira/browse/SPARK-22202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16194141#comment-16194141 ]

Felix Cheung commented on SPARK-22202:
--------------------------------------

[~holden.ka...@gmail.com] actually, I think for R we would go the other way - we would want to include what's in hadoop2.6 only in all other release profiles (i.e. run *this* then create the tgz), so I think the approaches are potentially opposite for R and python.

> Release tgz content differences for python and R
> -------------------------------------------------
>
>                 Key: SPARK-22202
>                 URL: https://issues.apache.org/jira/browse/SPARK-22202
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, SparkR
>    Affects Versions: 2.1.2, 2.2.1, 2.3.0
>            Reporter: Felix Cheung
>            Priority: Minor
[jira] [Updated] (SPARK-22202) Release tgz content differences for python and R
[ https://issues.apache.org/jira/browse/SPARK-22202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Felix Cheung updated SPARK-22202:
---------------------------------
    Priority: Minor  (was: Major)

> Release tgz content differences for python and R
> -------------------------------------------------
>
>                 Key: SPARK-22202
>                 URL: https://issues.apache.org/jira/browse/SPARK-22202
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, SparkR
>    Affects Versions: 2.1.2, 2.2.1, 2.3.0
>            Reporter: Felix Cheung
>            Priority: Minor
[jira] [Commented] (SPARK-22202) Release tgz content differences for python and R
[ https://issues.apache.org/jira/browse/SPARK-22202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16193239#comment-16193239 ]

Felix Cheung commented on SPARK-22202:
--------------------------------------

[~holden.ka...@gmail.com] would you be concerned with the python differences? If not, I'll narrow this to just R.

> Release tgz content differences for python and R
> -------------------------------------------------
>
>                 Key: SPARK-22202
>                 URL: https://issues.apache.org/jira/browse/SPARK-22202
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, SparkR
>    Affects Versions: 2.1.2, 2.2.1, 2.3.0
>            Reporter: Felix Cheung
[jira] [Commented] (SPARK-22202) Release tgz content differences for python and R
[ https://issues.apache.org/jira/browse/SPARK-22202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16193238#comment-16193238 ] Felix Cheung commented on SPARK-22202: -- Yes, exactly. > Release tgz content differences for python and R > > > Key: SPARK-22202 > URL: https://issues.apache.org/jira/browse/SPARK-22202 > Project: Spark > Issue Type: Bug > Components: PySpark, SparkR >Affects Versions: 2.1.2, 2.2.1, 2.3.0 >Reporter: Felix Cheung > > As a follow up to SPARK-22167, currently we are running different > profiles/steps in make-release.sh for hadoop2.7 vs hadoop2.6 (and others), we > should consider if these differences are significant and whether they should > be addressed. > A couple of things: > - R.../doc directory is not in any release jar except hadoop 2.6 > - python/dist, python.egg-info are not in any release jar except hadoop 2.7 > - R DESCRIPTION has a few additions > I've checked to confirm these are the same in 2.1.1 release so this isn't a > regression. > {code} > spark-2.1.2-bin-hadoop2.6/R/lib/SparkR/doc: > sparkr-vignettes.Rmd > sparkr-vignettes.R > sparkr-vignettes.html > index.html > Only in spark-2.1.2-bin-hadoop2.7/python: dist > Only in spark-2.1.2-bin-hadoop2.7/python/pyspark: python > Only in spark-2.1.2-bin-hadoop2.7/python: pyspark.egg-info > diff -r spark-2.1.2-bin-hadoop2.7/R/lib/SparkR/DESCRIPTION > spark-2.1.2-bin-hadoop2.6/R/lib/SparkR/DESCRIPTION > 25a26,27 > > NeedsCompilation: no > > Packaged: 2017-10-03 00:42:30 UTC; holden > 31c33 > < Built: R 3.4.1; ; 2017-10-02 23:18:21 UTC; unix > --- > > Built: R 3.4.1; ; 2017-10-03 00:45:27 UTC; unix > Only in spark-2.1.2-bin-hadoop2.6/R/lib/SparkR: doc > diff -r spark-2.1.2-bin-hadoop2.7/R/lib/SparkR/html/00Index.html > spark-2.1.2-bin-hadoop2.6/R/lib/SparkR/html/00Index.html > 16a17 > > User guides, package vignettes and other > > documentation. > Only in spark-2.1.2-bin-hadoop2.6/R/lib/SparkR/Meta: vignette.rds > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-22202) Release tgz content differences for python and R
[ https://issues.apache.org/jira/browse/SPARK-22202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Felix Cheung updated SPARK-22202: - Description: As a follow up to SPARK-22167, currently we are running different profiles/steps in make-release.sh for hadoop2.7 vs hadoop2.6 (and others), we should consider if these differences are significant and whether they should be addressed. A couple of things: - R.../doc directory is not in any release jar except hadoop 2.6 - python/dist, python.egg-info are not in any release jar except hadoop 2.7 - R DESCRIPTION has a few additions I've checked to confirm these are the same in 2.1.1 release so this isn't a regression. {code} spark-2.1.2-bin-hadoop2.6/R/lib/SparkR/doc: sparkr-vignettes.Rmd sparkr-vignettes.R sparkr-vignettes.html index.html Only in spark-2.1.2-bin-hadoop2.7/python: dist Only in spark-2.1.2-bin-hadoop2.7/python/pyspark: python Only in spark-2.1.2-bin-hadoop2.7/python: pyspark.egg-info diff -r spark-2.1.2-bin-hadoop2.7/R/lib/SparkR/DESCRIPTION spark-2.1.2-bin-hadoop2.6/R/lib/SparkR/DESCRIPTION 25a26,27 > NeedsCompilation: no > Packaged: 2017-10-03 00:42:30 UTC; holden 31c33 < Built: R 3.4.1; ; 2017-10-02 23:18:21 UTC; unix --- > Built: R 3.4.1; ; 2017-10-03 00:45:27 UTC; unix Only in spark-2.1.2-bin-hadoop2.6/R/lib/SparkR: doc diff -r spark-2.1.2-bin-hadoop2.7/R/lib/SparkR/html/00Index.html spark-2.1.2-bin-hadoop2.6/R/lib/SparkR/html/00Index.html 16a17 > User guides, package vignettes and other > documentation. Only in spark-2.1.2-bin-hadoop2.6/R/lib/SparkR/Meta: vignette.rds {code} was: As a follow up to SPARK-22167, currently we are running different profiles/steps in make-release.sh for hadoop2.7 vs hadoop2.6 (and others), we should consider if these differences are significant and whether they should be addressed. [will add more info on this soon] > Release tgz content differences for python and R > > > Key: SPARK-22202 > URL: https://issues.apache.org/jira/browse/SPARK-22202 > Project: Spark > Issue Type: Bug > Components: PySpark, SparkR >Affects Versions: 2.1.2, 2.2.1, 2.3.0 >Reporter: Felix Cheung > > As a follow up to SPARK-22167, currently we are running different > profiles/steps in make-release.sh for hadoop2.7 vs hadoop2.6 (and others), we > should consider if these differences are significant and whether they should > be addressed. > A couple of things: > - R.../doc directory is not in any release jar except hadoop 2.6 > - python/dist, python.egg-info are not in any release jar except hadoop 2.7 > - R DESCRIPTION has a few additions > I've checked to confirm these are the same in 2.1.1 release so this isn't a > regression. > {code} > spark-2.1.2-bin-hadoop2.6/R/lib/SparkR/doc: > sparkr-vignettes.Rmd > sparkr-vignettes.R > sparkr-vignettes.html > index.html > Only in spark-2.1.2-bin-hadoop2.7/python: dist > Only in spark-2.1.2-bin-hadoop2.7/python/pyspark: python > Only in spark-2.1.2-bin-hadoop2.7/python: pyspark.egg-info > diff -r spark-2.1.2-bin-hadoop2.7/R/lib/SparkR/DESCRIPTION > spark-2.1.2-bin-hadoop2.6/R/lib/SparkR/DESCRIPTION > 25a26,27 > > NeedsCompilation: no > > Packaged: 2017-10-03 00:42:30 UTC; holden > 31c33 > < Built: R 3.4.1; ; 2017-10-02 23:18:21 UTC; unix > --- > > Built: R 3.4.1; ; 2017-10-03 00:45:27 UTC; unix > Only in spark-2.1.2-bin-hadoop2.6/R/lib/SparkR: doc > diff -r spark-2.1.2-bin-hadoop2.7/R/lib/SparkR/html/00Index.html > spark-2.1.2-bin-hadoop2.6/R/lib/SparkR/html/00Index.html > 16a17 > > User guides, package vignettes and other > > documentation. 
> Only in spark-2.1.2-bin-hadoop2.6/R/lib/SparkR/Meta: vignette.rds > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-22202) Release tgz content differences for python and R
[ https://issues.apache.org/jira/browse/SPARK-22202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Felix Cheung updated SPARK-22202: - Description: As a follow up to SPARK-22167, currently we are running different profiles/steps in make-release.sh for hadoop2.7 vs hadoop2.6 (and others), we should consider if these differences are significant and whether they should be addressed. [will add more info on this soon] > Release tgz content differences for python and R > > > Key: SPARK-22202 > URL: https://issues.apache.org/jira/browse/SPARK-22202 > Project: Spark > Issue Type: Bug > Components: PySpark, SparkR >Affects Versions: 2.1.2, 2.2.1, 2.3.0 >Reporter: Felix Cheung > > As a follow up to SPARK-22167, currently we are running different > profiles/steps in make-release.sh for hadoop2.7 vs hadoop2.6 (and others), we > should consider if these differences are significant and whether they should > be addressed. > [will add more info on this soon] -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-22202) Release tgz content differences for python and R
Felix Cheung created SPARK-22202: Summary: Release tgz content differences for python and R Key: SPARK-22202 URL: https://issues.apache.org/jira/browse/SPARK-22202 Project: Spark Issue Type: Bug Components: PySpark, SparkR Affects Versions: 2.1.2, 2.2.1, 2.3.0 Reporter: Felix Cheung -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-22167) Spark Packaging w/R distro issues
[ https://issues.apache.org/jira/browse/SPARK-22167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16189954#comment-16189954 ] Felix Cheung commented on SPARK-22167: -- There are likely 2 stages to this. More pressing might be the fact that hadoop-2.6 and hadoop-2.7 release tgz have fairly different content because of how the make-release script is structured. I will open a new JIRA on this. > Spark Packaging w/R distro issues > - > > Key: SPARK-22167 > URL: https://issues.apache.org/jira/browse/SPARK-22167 > Project: Spark > Issue Type: Bug > Components: Build, SparkR >Affects Versions: 2.1.2 >Reporter: holdenk >Assignee: holdenk >Priority: Blocker > Fix For: 2.1.2, 2.2.1, 2.3.0 > > > The Spark packaging for Spark R in 2.1.2 did not work as expected, namely the > R directory was missing from the hadoop-2.7 bin distro. This is the version > we build the PySpark package for so it's possible this is related. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-22063) Upgrade lintr to latest commit sha1 ID
[ https://issues.apache.org/jira/browse/SPARK-22063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16187574#comment-16187574 ] Felix Cheung edited comment on SPARK-22063 at 10/2/17 9:16 AM: --- surely, I think we could even start with something simple with install.package(..., lib =) (or install_github(..., lib=)) and then library(... lib.loc=) was (Author: felixcheung): surely, I think we could even start with something simple with install.package(..., lib =) (or install_github(..., lib=)) and then library(... lib.loc) > Upgrade lintr to latest commit sha1 ID > -- > > Key: SPARK-22063 > URL: https://issues.apache.org/jira/browse/SPARK-22063 > Project: Spark > Issue Type: Improvement > Components: SparkR >Affects Versions: 2.3.0 >Reporter: Hyukjin Kwon >Priority: Minor > > Currently, we set lintr to {{jimhester/lintr@a769c0b}} (see [this > pr|https://github.com/apache/spark/commit/7d1175011c976756efcd4e4e4f70a8fd6f287026]) > and SPARK-14074. > Today, I tried to upgrade the latest, > https://github.com/jimhester/lintr/commit/5431140ffea65071f1327625d4a8de9688fa7e72 > This fixes many bugs and now finds many instances that I have observed and > thought should be caught time to time: > {code} > inst/worker/worker.R:71:10: style: Remove spaces before the left parenthesis > in a function call. > return (output) > ^ > R/column.R:241:1: style: Lines should not be more than 100 characters. > #' > \href{https://spark.apache.org/docs/latest/sparkr.html#data-type-mapping-between-r-and-spark}{ > ^~~~ > R/context.R:332:1: style: Variable and function names should not be longer > than 30 characters. > spark.getSparkFilesRootDirectory <- function() { > ^~~~ > R/DataFrame.R:1912:1: style: Lines should not be more than 100 characters. > #' @param j,select expression for the single Column or a list of columns to > select from the SparkDataFrame. > ^~~ > R/DataFrame.R:1918:1: style: Lines should not be more than 100 characters. > #' @return A new SparkDataFrame containing only the rows that meet the > condition with selected columns. > ^~~ > R/DataFrame.R:2597:22: style: Remove spaces before the left parenthesis in a > function call. > return (joinRes) > ^ > R/DataFrame.R:2652:1: style: Variable and function names should not be longer > than 30 characters. > generateAliasesForIntersectedCols <- function (x, intersectedColNames, > suffix) { > ^ > R/DataFrame.R:2652:47: style: Remove spaces before the left parenthesis in a > function call. > generateAliasesForIntersectedCols <- function (x, intersectedColNames, > suffix) { > ^ > R/DataFrame.R:2660:14: style: Remove spaces before the left parenthesis in a > function call. > stop ("The following column name: ", newJoin, " occurs more than once > in the 'DataFrame'.", > ^ > R/DataFrame.R:3047:1: style: Lines should not be more than 100 characters. > #' @note The statistics provided by \code{summary} were change in 2.3.0 use > \link{describe} for previous defaults. > ^~ > R/DataFrame.R:3754:1: style: Lines should not be more than 100 characters. > #' If grouping expression is missing \code{cube} creates a single global > aggregate and is equivalent to > ^~~ > R/DataFrame.R:3789:1: style: Lines should not be more than 100 characters. > #' If grouping expression is missing \code{rollup} creates a single global > aggregate and is equivalent to > ^ > R/deserialize.R:46:10: style: Remove spaces before the left parenthesis in a > function call. > switch (type, > ^ > R/functions.R:41:1: style: Lines should not be more than 100 characters. 
> #' @param x Column to compute on. In \code{window}, it must be a time Column > of \code{TimestampType}. > ^ > R/functions.R:93:1: style: Lines should not be more than 100 characters. > #' @param x Column to compute on. In \code{shiftLeft}, \code{shiftRight} and > \code{shiftRightUnsigned}, >
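The approach floated in the comment above, installing into a private library with {{lib =}} and then loading from it with {{lib.loc =}}, as a minimal sketch (the library path is an illustrative assumption; per the comment, an install_github of a pinned lintr commit would take the same {{lib}} argument):
{code}
# install into a private library so the system library is untouched
libDir <- file.path(tempdir(), "lint-lib")   # hypothetical location
dir.create(libDir, showWarnings = FALSE)
install.packages("lintr", lib = libDir, repos = "https://cloud.r-project.org")
# load explicitly from that library
library(lintr, lib.loc = libDir)
{code}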
[jira] [Commented] (SPARK-22063) Upgrade lintr to latest commit sha1 ID
[ https://issues.apache.org/jira/browse/SPARK-22063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16187574#comment-16187574 ] Felix Cheung commented on SPARK-22063: -- surely, I think we could even start with something simple with install.package(..., lib =) (or install_github(..., lib=)) and then library(... lib.loc) > Upgrade lintr to latest commit sha1 ID > -- > > Key: SPARK-22063 > URL: https://issues.apache.org/jira/browse/SPARK-22063 > Project: Spark > Issue Type: Improvement > Components: SparkR >Affects Versions: 2.3.0 >Reporter: Hyukjin Kwon >Priority: Minor > > Currently, we set lintr to {{jimhester/lintr@a769c0b}} (see [this > pr|https://github.com/apache/spark/commit/7d1175011c976756efcd4e4e4f70a8fd6f287026]) > and SPARK-14074. > Today, I tried to upgrade the latest, > https://github.com/jimhester/lintr/commit/5431140ffea65071f1327625d4a8de9688fa7e72 > This fixes many bugs and now finds many instances that I have observed and > thought should be caught time to time: > {code} > inst/worker/worker.R:71:10: style: Remove spaces before the left parenthesis > in a function call. > return (output) > ^ > R/column.R:241:1: style: Lines should not be more than 100 characters. > #' > \href{https://spark.apache.org/docs/latest/sparkr.html#data-type-mapping-between-r-and-spark}{ > ^~~~ > R/context.R:332:1: style: Variable and function names should not be longer > than 30 characters. > spark.getSparkFilesRootDirectory <- function() { > ^~~~ > R/DataFrame.R:1912:1: style: Lines should not be more than 100 characters. > #' @param j,select expression for the single Column or a list of columns to > select from the SparkDataFrame. > ^~~ > R/DataFrame.R:1918:1: style: Lines should not be more than 100 characters. > #' @return A new SparkDataFrame containing only the rows that meet the > condition with selected columns. > ^~~ > R/DataFrame.R:2597:22: style: Remove spaces before the left parenthesis in a > function call. > return (joinRes) > ^ > R/DataFrame.R:2652:1: style: Variable and function names should not be longer > than 30 characters. > generateAliasesForIntersectedCols <- function (x, intersectedColNames, > suffix) { > ^ > R/DataFrame.R:2652:47: style: Remove spaces before the left parenthesis in a > function call. > generateAliasesForIntersectedCols <- function (x, intersectedColNames, > suffix) { > ^ > R/DataFrame.R:2660:14: style: Remove spaces before the left parenthesis in a > function call. > stop ("The following column name: ", newJoin, " occurs more than once > in the 'DataFrame'.", > ^ > R/DataFrame.R:3047:1: style: Lines should not be more than 100 characters. > #' @note The statistics provided by \code{summary} were change in 2.3.0 use > \link{describe} for previous defaults. > ^~ > R/DataFrame.R:3754:1: style: Lines should not be more than 100 characters. > #' If grouping expression is missing \code{cube} creates a single global > aggregate and is equivalent to > ^~~ > R/DataFrame.R:3789:1: style: Lines should not be more than 100 characters. > #' If grouping expression is missing \code{rollup} creates a single global > aggregate and is equivalent to > ^ > R/deserialize.R:46:10: style: Remove spaces before the left parenthesis in a > function call. > switch (type, > ^ > R/functions.R:41:1: style: Lines should not be more than 100 characters. > #' @param x Column to compute on. In \code{window}, it must be a time Column > of \code{TimestampType}. > ^ > R/functions.R:93:1: style: Lines should not be more than 100 characters. 
> #' @param x Column to compute on. In \code{shiftLeft}, \code{shiftRight} and > \code{shiftRightUnsigned}, > ^~~ > R/functions.R:483:52: style: Remove spaces before the left parenthesis in a > function call. > jcols <- lapply(list(x, ...), function
[jira] [Commented] (SPARK-22167) Spark Packaging w/R distro issues
[ https://issues.apache.org/jira/browse/SPARK-22167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16187156#comment-16187156 ] Felix Cheung commented on SPARK-22167: -- I think I'd propose changing this part of the release build to depend on (a subset of) the check-cran output instead, but thinking more about this, the -Psparkr profile is more for developers and should be left as-is. The issue is that the output of install-dev is not really a release format, though I guess it has been the de facto release form for a very long time. But this could be a separate follow-up for 2.2.1/2.3. > Spark Packaging w/R distro issues > - > > Key: SPARK-22167 > URL: https://issues.apache.org/jira/browse/SPARK-22167 > Project: Spark > Issue Type: Bug > Components: Build, SparkR >Affects Versions: 2.1.2 >Reporter: holdenk >Assignee: holdenk >Priority: Blocker > > The Spark packaging for Spark R in 2.1.2 did not work as expected, namely the > R directory was missing from the hadoop-2.7 bin distro. This is the version > we build the PySpark package for so it's possible this is related. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-15799) Release SparkR on CRAN
[ https://issues.apache.org/jira/browse/SPARK-15799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16180261#comment-16180261 ] Felix Cheung commented on SPARK-15799: -- I commented on the PR. I don't think there are any code changes pending; we are just waiting for the next RC of the 2.1.2 release at this point. > Release SparkR on CRAN > -- > > Key: SPARK-15799 > URL: https://issues.apache.org/jira/browse/SPARK-15799 > Project: Spark > Issue Type: New Feature > Components: SparkR >Reporter: Xiangrui Meng > > Story: "As an R user, I would like to see SparkR released on CRAN, so I can > use SparkR easily in an existing R environment and have other packages built > on top of SparkR." > I made this JIRA with the following questions in mind: > * Are there known issues that prevent us releasing SparkR on CRAN? > * Do we want to package Spark jars in the SparkR release? > * Are there license issues? > * How does it fit into Spark's release process? -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-18131) Support returning Vector/Dense Vector from backend
[ https://issues.apache.org/jira/browse/SPARK-18131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16169579#comment-16169579 ] Felix Cheung commented on SPARK-18131: -- bump. I think this is a really big problem - results from MLlib are basically unusable for R users: {code} head(predict(model, test))$probability [[1]] Java ref type org.apache.spark.ml.linalg.DenseVector id 130 [[2]] Java ref type org.apache.spark.ml.linalg.DenseVector id 131 [[3]] Java ref type org.apache.spark.ml.linalg.DenseVector id 132 [[4]] Java ref type org.apache.spark.ml.linalg.DenseVector id 133 [[5]] Java ref type org.apache.spark.ml.linalg.DenseVector id 134 [[6]] Java ref type org.apache.spark.ml.linalg.DenseVector id 135 > head(predict(model, test))$feature [[1]] Java ref type org.apache.spark.ml.linalg.SparseVector id 161 [[2]] Java ref type org.apache.spark.ml.linalg.SparseVector id 162 [[3]] Java ref type org.apache.spark.ml.linalg.SparseVector id 163 [[4]] Java ref type org.apache.spark.ml.linalg.SparseVector id 164 [[5]] Java ref type org.apache.spark.ml.linalg.SparseVector id 165 [[6]] Java ref type org.apache.spark.ml.linalg.SparseVector id 166 > head(predict(model, test))$rawPrediction [[1]] Java ref type org.apache.spark.ml.linalg.DenseVector id 210 [[2]] Java ref type org.apache.spark.ml.linalg.DenseVector id 211 [[3]] Java ref type org.apache.spark.ml.linalg.DenseVector id 212 [[4]] Java ref type org.apache.spark.ml.linalg.DenseVector id 213 ... {code} > Support returning Vector/Dense Vector from backend > -- > > Key: SPARK-18131 > URL: https://issues.apache.org/jira/browse/SPARK-18131 > Project: Spark > Issue Type: New Feature > Components: SparkR >Reporter: Miao Wang > > For `spark.logit`, there is a `probabilityCol`, which is a vector in the > backend (Scala side). When we do collect(select(df, "probabilityCol")), the > backend returns the Java object handle (memory address). We need to implement > a method to convert a Vector/DenseVector column to an R vector, which can be > read in SparkR. It is a follow-up JIRA to adding `spark.logit`. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
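Until vectors are deserialized properly, one hypothetical client-side workaround is to unwrap each returned Java reference through SparkR's JVM bridge; this leans on the non-exported {{callJMethod}} API and on {{DenseVector.toArray}}, so treat it as a sketch rather than a supported path:
{code}
preds <- collect(select(predict(model, test), "probability"))
# each element is a jobj wrapping an ml.linalg.DenseVector; toArray()
# returns Array[Double], which deserializes back as an R numeric vector
probs <- lapply(preds$probability, function(v) SparkR:::callJMethod(v, "toArray"))
{code}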
[jira] [Commented] (SPARK-21802) Make sparkR MLP summary() expose probability column
[ https://issues.apache.org/jira/browse/SPARK-21802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16169575#comment-16169575 ] Felix Cheung commented on SPARK-21802: -- Yes, if this is from the prediction (with rawPrediction etc.), it should come from predict, not summary; sorry, I misspoke. > Make sparkR MLP summary() expose probability column > --- > > Key: SPARK-21802 > URL: https://issues.apache.org/jira/browse/SPARK-21802 > Project: Spark > Issue Type: New Feature > Components: SparkR >Affects Versions: 2.2.0 >Reporter: Weichen Xu >Priority: Minor > > Make sparkR MLP summary() expose probability column -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-21802) Make sparkR MLP summary() expose probability column
[ https://issues.apache.org/jira/browse/SPARK-21802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16169487#comment-16169487 ] Felix Cheung commented on SPARK-21802: -- Can you clarify where you see it? I just ran against the latest from master branch with R's spark.mlp and don't see any probability? {code} summary <- summary(model) > > summar Error: object 'summar' not found > summary $numOfInputs [1] 4 $numOfOutputs [1] 3 $layers [1] 4 5 4 3 $weights $weights[[1]] [1] -0.878743 $weights[[2]] [1] 0.2154151 $weights[[3]] [1] -1.16304 $weights[[4]] [1] -0.6583214 $weights[[5]] [1] 1.009825 $weights[[6]] [1] 0.2934758 $weights[[7]] [1] -0.9528391 $weights[[8]] [1] 0.4029029 $weights[[9]] [1] -1.038043 $weights[[10]] [1] 0.05164362 $weights[[11]] [1] 0.9349549 $weights[[12]] [1] -0.4283766 $weights[[13]] [1] -0.5082246 $weights[[14]] [1] -0.09600512 $weights[[15]] [1] -0.7843158 $weights[[16]] [1] -1.199724 $weights[[17]] [1] 0.6001083 $weights[[18]] [1] 0.1102863 $weights[[19]] [1] 0.8259955 $weights[[20]] [1] -0.4428631 $weights[[21]] [1] 0.9691921 $weights[[22]] [1] -0.8472953 $weights[[23]] [1] -0.8521915 $weights[[24]] [1] -0.770886 $weights[[25]] [1] 0.7276595 $weights[[26]] [1] -0.7675585 $weights[[27]] [1] 0.1299603 $weights[[28]] [1] -1.056605 $weights[[29]] [1] 0.4421284 $weights[[30]] [1] -0.3245397 $weights[[31]] [1] -0.904001 $weights[[32]] [1] 0.2793773 $weights[[33]] [1] 1.045579 $weights[[34]] [1] -0.5379433 $weights[[35]] [1] -1.006988 $weights[[36]] [1] -0.9652683 $weights[[37]] [1] 0.8719215 $weights[[38]] [1] -0.917228 $weights[[39]] [1] 1.020896 $weights[[40]] [1] 0.4951883 $weights[[41]] [1] 0.7487854 $weights[[42]] [1] -0.7130144 $weights[[43]] [1] 0.598029 $weights[[44]] [1] 0.8097242 $weights[[45]] [1] -1.056401 $weights[[46]] [1] -0.2041643 $weights[[47]] [1] -0.9605507 $weights[[48]] [1] -0.2151837 $weights[[49]] [1] 0.9075675 $weights[[50]] [1] 0.004306968 $weights[[51]] [1] -0.4778498 $weights[[52]] [1] 0.3312689 $weights[[53]] [1] 0.6160091 $weights[[54]] [1] 0.431806 $weights[[55]] [1] -0.6039096 $weights[[56]] [1] -0.008508999 $weights[[57]] [1] 0.7539017 $weights[[58]] [1] -1.186487 $weights[[59]] [1] -0.8660557 $weights[[60]] [1] 0.4443504 $weights[[61]] [1] 0.5170843 $weights[[62]] [1] 0.08373222 $weights[[63]] [1] -1.039143 $weights[[64]] [1] -0.4787311 {code} this isn't the summary() right, it's the prediction I think > Make sparkR MLP summary() expose probability column > --- > > Key: SPARK-21802 > URL: https://issues.apache.org/jira/browse/SPARK-21802 > Project: Spark > Issue Type: New Feature > Components: SparkR >Affects Versions: 2.2.0 >Reporter: Weichen Xu >Priority: Minor > > Make sparkR MLP summary() expose probability column -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
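For context, a minimal sketch of the call pattern under discussion (dataset, layer sizes, and seed are illustrative assumptions):
{code}
df <- createDataFrame(iris)
model <- spark.mlp(df, Species ~ ., layers = c(4, 5, 4, 3), seed = 1)
summary(model)            # exposes numOfInputs, layers, weights, as shown above
head(predict(model, df))  # per-row output, where a probability column would surface
{code}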
[jira] [Commented] (SPARK-20684) expose createOrReplaceGlobalTempView/createGlobalTempView and dropGlobalTempView in SparkR
[ https://issues.apache.org/jira/browse/SPARK-20684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16160420#comment-16160420 ] Felix Cheung commented on SPARK-20684: -- I'm making this the primary JIRA for tracking this issue and keeping it open. Please see the discussion in the PR. > expose createOrReplaceGlobalTempView/createGlobalTempView and > dropGlobalTempView in SparkR > -- > > Key: SPARK-20684 > URL: https://issues.apache.org/jira/browse/SPARK-20684 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.2.0 >Reporter: Hossein Falaki > > This is a useful API that is not exposed in SparkR. It will help with moving > data between languages on a single Spark application. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-20684) expose createOrReplaceGlobalTempView/createGlobalTempView and dropGlobalTempView in SparkR
[ https://issues.apache.org/jira/browse/SPARK-20684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Felix Cheung updated SPARK-20684: - Summary: expose createOrReplaceGlobalTempView/createGlobalTempView and dropGlobalTempView in SparkR (was: expose createGlobalTempView and dropGlobalTempView in SparkR) > expose createOrReplaceGlobalTempView/createGlobalTempView and > dropGlobalTempView in SparkR > -- > > Key: SPARK-20684 > URL: https://issues.apache.org/jira/browse/SPARK-20684 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.2.0 >Reporter: Hossein Falaki > > This is a useful API that is not exposed in SparkR. It will help with moving > data between languages on a single Spark application. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-20684) expose createGlobalTempView and dropGlobalTempView in SparkR
[ https://issues.apache.org/jira/browse/SPARK-20684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Felix Cheung reopened SPARK-20684: -- > expose createGlobalTempView and dropGlobalTempView in SparkR > > > Key: SPARK-20684 > URL: https://issues.apache.org/jira/browse/SPARK-20684 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.2.0 >Reporter: Hossein Falaki > > This is a useful API that is not exposed in SparkR. It will help with moving > data between languages on a single Spark application. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-21128) Running R tests multiple times failed due to pre-existing "spark-warehouse" / "metastore_db"
[ https://issues.apache.org/jira/browse/SPARK-21128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Felix Cheung updated SPARK-21128: - Target Version/s: 2.2.1, 2.3.0 (was: 2.3.0) Fix Version/s: 2.2.1 > Running R tests multiple times failed due to pre-existing "spark-warehouse" / > "metastore_db" > --- > > Key: SPARK-21128 > URL: https://issues.apache.org/jira/browse/SPARK-21128 > Project: Spark > Issue Type: Test > Components: SparkR >Affects Versions: 2.2.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Minor > Fix For: 2.2.1, 2.3.0 > > > Currently, running R tests multiple times fails due to pre-existing > "spark-warehouse" / "metastore_db" as below: > {code} > SparkSQL functions: Spark package found in SPARK_HOME: .../spark > ...1234... > {code} > {code} > Failed > - > 1. Failure: No extra files are created in SPARK_HOME by starting session and > making calls (@test_sparkSQL.R#3384) > length(list1) not equal to length(list2). > 1/1 mismatches > [1] 25 - 23 == 2 > 2. Failure: No extra files are created in SPARK_HOME by starting session and > making calls (@test_sparkSQL.R#3384) > sort(list1, na.last = TRUE) not equal to sort(list2, na.last = TRUE). > 10/25 mismatches > x[16]: "metastore_db" > y[16]: "pkg" > x[17]: "pkg" > y[17]: "R" > x[18]: "R" > y[18]: "README.md" > x[19]: "README.md" > y[19]: "run-tests.sh" > x[20]: "run-tests.sh" > y[20]: "SparkR_2.2.0.tar.gz" > x[21]: "metastore_db" > y[21]: "pkg" > x[22]: "pkg" > y[22]: "R" > x[23]: "R" > y[23]: "README.md" > x[24]: "README.md" > y[24]: "run-tests.sh" > x[25]: "run-tests.sh" > y[25]: "SparkR_2.2.0.tar.gz" > 3. Failure: No extra files are created in SPARK_HOME by starting session and > making calls (@test_sparkSQL.R#3388) > length(list1) not equal to length(list2). > 1/1 mismatches > [1] 25 - 23 == 2 > 4. Failure: No extra files are created in SPARK_HOME by starting session and > making calls (@test_sparkSQL.R#3388) > sort(list1, na.last = TRUE) not equal to sort(list2, na.last = TRUE). > 10/25 mismatches > x[16]: "metastore_db" > y[16]: "pkg" > x[17]: "pkg" > y[17]: "R" > x[18]: "R" > y[18]: "README.md" > x[19]: "README.md" > y[19]: "run-tests.sh" > x[20]: "run-tests.sh" > y[20]: "SparkR_2.2.0.tar.gz" > x[21]: "metastore_db" > y[21]: "pkg" > x[22]: "pkg" > y[22]: "R" > x[23]: "R" > y[23]: "README.md" > x[24]: "README.md" > y[24]: "run-tests.sh" > x[25]: "run-tests.sh" > y[25]: "SparkR_2.2.0.tar.gz" > DONE > === > {code} > It looks like we should remove both "spark-warehouse" and "metastore_db" _before_ > listing files into {{sparkRFilesBefore}}. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
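A minimal sketch of the cleanup the description suggests, assuming it runs before the baseline file listing is taken and that {{sparkHome}} holds the SPARK_HOME path:
{code}
# remove leftovers from earlier runs so they don't pollute the baseline listing
unlink(file.path(sparkHome, c("spark-warehouse", "metastore_db")), recursive = TRUE)
sparkRFilesBefore <- list.files(path = sparkHome, all.files = TRUE)
{code}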
[jira] [Comment Edited] (SPARK-21727) Operating on an ArrayType in a SparkR DataFrame throws error
[ https://issues.apache.org/jira/browse/SPARK-21727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16152973#comment-16152973 ] Felix Cheung edited comment on SPARK-21727 at 9/4/17 11:08 PM: --- precisely. as far as I can tell, everything should "just work" if we return "array" from `getSerdeType()` for this case when length > 1. was (Author: felixcheung): precisely. as far as I can tell, everything should "just work" if we return `array` from `getSerdeType()` for this case when length > 1. > Operating on an ArrayType in a SparkR DataFrame throws error > > > Key: SPARK-21727 > URL: https://issues.apache.org/jira/browse/SPARK-21727 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.2.0 >Reporter: Neil McQuarrie > > Previously > [posted|https://stackoverflow.com/questions/45056973/sparkr-dataframe-with-r-lists-as-elements] > this as a stack overflow question but it seems to be a bug. > If I have an R data.frame where one of the column data types is an integer > *list* -- i.e., each of the elements in the column embeds an entire R list of > integers -- then it seems I can convert this data.frame to a SparkR DataFrame > just fine... SparkR treats the column as ArrayType(Double). > However, any subsequent operation on this SparkR DataFrame appears to throw > an error. > Create an example R data.frame: > {code} > indices <- 1:4 > myDf <- data.frame(indices) > myDf$data <- list(rep(0, 20))}} > {code} > Examine it to make sure it looks okay: > {code} > > str(myDf) > 'data.frame': 4 obs. of 2 variables: > $ indices: int 1 2 3 4 > $ data :List of 4 >..$ : num 0 0 0 0 0 0 0 0 0 0 ... >..$ : num 0 0 0 0 0 0 0 0 0 0 ... >..$ : num 0 0 0 0 0 0 0 0 0 0 ... >..$ : num 0 0 0 0 0 0 0 0 0 0 ... > > head(myDf) > indices data > 1 1 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 > 2 2 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 > 3 3 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 > 4 4 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 > {code} > Convert it to a SparkR DataFrame: > {code} > library(SparkR, lib.loc=paste0(Sys.getenv("SPARK_HOME"),"/R/lib")) > sparkR.session(master = "local[*]") > mySparkDf <- as.DataFrame(myDf) > {code} > Examine the SparkR DataFrame schema; notice that the list column was > successfully converted to ArrayType: > {code} > > schema(mySparkDf) > StructType > |-name = "indices", type = "IntegerType", nullable = TRUE > |-name = "data", type = "ArrayType(DoubleType,true)", nullable = TRUE > {code} > However, operating on the SparkR DataFrame throws an error: > {code} > > collect(mySparkDf) > 17/07/13 17:23:00 ERROR executor.Executor: Exception in task 0.0 in stage 1.0 > (TID 1) > java.lang.RuntimeException: Error while encoding: java.lang.RuntimeException: > java.lang.Double is not a valid external type for schema of array > if (assertnotnull(input[0, org.apache.spark.sql.Row, true]).isNullAt) null > else validateexternaltype(getexternalrowfield(assertnotnull(input[0, > org.apache.spark.sql.Row, true]), 0, indices), IntegerType) AS indices#0 > ... long stack trace ... > {code} > Using Spark 2.2.0, R 3.4.0, Java 1.8.0_131, Windows 10. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-21727) Operating on an ArrayType in a SparkR DataFrame throws error
[ https://issues.apache.org/jira/browse/SPARK-21727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16152973#comment-16152973 ] Felix Cheung commented on SPARK-21727: -- precisely. as far as I can tell, everything should "just work" if we return `array` from `getSerdeType()` for this case when length > 1. > Operating on an ArrayType in a SparkR DataFrame throws error > > > Key: SPARK-21727 > URL: https://issues.apache.org/jira/browse/SPARK-21727 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.2.0 >Reporter: Neil McQuarrie > > Previously > [posted|https://stackoverflow.com/questions/45056973/sparkr-dataframe-with-r-lists-as-elements] > this as a stack overflow question but it seems to be a bug. > If I have an R data.frame where one of the column data types is an integer > *list* -- i.e., each of the elements in the column embeds an entire R list of > integers -- then it seems I can convert this data.frame to a SparkR DataFrame > just fine... SparkR treats the column as ArrayType(Double). > However, any subsequent operation on this SparkR DataFrame appears to throw > an error. > Create an example R data.frame: > {code} > indices <- 1:4 > myDf <- data.frame(indices) > myDf$data <- list(rep(0, 20))}} > {code} > Examine it to make sure it looks okay: > {code} > > str(myDf) > 'data.frame': 4 obs. of 2 variables: > $ indices: int 1 2 3 4 > $ data :List of 4 >..$ : num 0 0 0 0 0 0 0 0 0 0 ... >..$ : num 0 0 0 0 0 0 0 0 0 0 ... >..$ : num 0 0 0 0 0 0 0 0 0 0 ... >..$ : num 0 0 0 0 0 0 0 0 0 0 ... > > head(myDf) > indices data > 1 1 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 > 2 2 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 > 3 3 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 > 4 4 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 > {code} > Convert it to a SparkR DataFrame: > {code} > library(SparkR, lib.loc=paste0(Sys.getenv("SPARK_HOME"),"/R/lib")) > sparkR.session(master = "local[*]") > mySparkDf <- as.DataFrame(myDf) > {code} > Examine the SparkR DataFrame schema; notice that the list column was > successfully converted to ArrayType: > {code} > > schema(mySparkDf) > StructType > |-name = "indices", type = "IntegerType", nullable = TRUE > |-name = "data", type = "ArrayType(DoubleType,true)", nullable = TRUE > {code} > However, operating on the SparkR DataFrame throws an error: > {code} > > collect(mySparkDf) > 17/07/13 17:23:00 ERROR executor.Executor: Exception in task 0.0 in stage 1.0 > (TID 1) > java.lang.RuntimeException: Error while encoding: java.lang.RuntimeException: > java.lang.Double is not a valid external type for schema of array > if (assertnotnull(input[0, org.apache.spark.sql.Row, true]).isNullAt) null > else validateexternaltype(getexternalrowfield(assertnotnull(input[0, > org.apache.spark.sql.Row, true]), 0, indices), IntegerType) AS indices#0 > ... long stack trace ... > {code} > Using Spark 2.2.0, R 3.4.0, Java 1.8.0_131, Windows 10. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-21727) Operating on an ArrayType in a SparkR DataFrame throws error
[ https://issues.apache.org/jira/browse/SPARK-21727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16151589#comment-16151589 ] Felix Cheung commented on SPARK-21727: -- any taker of this change? > Operating on an ArrayType in a SparkR DataFrame throws error > > > Key: SPARK-21727 > URL: https://issues.apache.org/jira/browse/SPARK-21727 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.2.0 >Reporter: Neil McQuarrie > > Previously > [posted|https://stackoverflow.com/questions/45056973/sparkr-dataframe-with-r-lists-as-elements] > this as a stack overflow question but it seems to be a bug. > If I have an R data.frame where one of the column data types is an integer > *list* -- i.e., each of the elements in the column embeds an entire R list of > integers -- then it seems I can convert this data.frame to a SparkR DataFrame > just fine... SparkR treats the column as ArrayType(Double). > However, any subsequent operation on this SparkR DataFrame appears to throw > an error. > Create an example R data.frame: > {code} > indices <- 1:4 > myDf <- data.frame(indices) > myDf$data <- list(rep(0, 20))}} > {code} > Examine it to make sure it looks okay: > {code} > > str(myDf) > 'data.frame': 4 obs. of 2 variables: > $ indices: int 1 2 3 4 > $ data :List of 4 >..$ : num 0 0 0 0 0 0 0 0 0 0 ... >..$ : num 0 0 0 0 0 0 0 0 0 0 ... >..$ : num 0 0 0 0 0 0 0 0 0 0 ... >..$ : num 0 0 0 0 0 0 0 0 0 0 ... > > head(myDf) > indices data > 1 1 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 > 2 2 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 > 3 3 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 > 4 4 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 > {code} > Convert it to a SparkR DataFrame: > {code} > library(SparkR, lib.loc=paste0(Sys.getenv("SPARK_HOME"),"/R/lib")) > sparkR.session(master = "local[*]") > mySparkDf <- as.DataFrame(myDf) > {code} > Examine the SparkR DataFrame schema; notice that the list column was > successfully converted to ArrayType: > {code} > > schema(mySparkDf) > StructType > |-name = "indices", type = "IntegerType", nullable = TRUE > |-name = "data", type = "ArrayType(DoubleType,true)", nullable = TRUE > {code} > However, operating on the SparkR DataFrame throws an error: > {code} > > collect(mySparkDf) > 17/07/13 17:23:00 ERROR executor.Executor: Exception in task 0.0 in stage 1.0 > (TID 1) > java.lang.RuntimeException: Error while encoding: java.lang.RuntimeException: > java.lang.Double is not a valid external type for schema of array > if (assertnotnull(input[0, org.apache.spark.sql.Row, true]).isNullAt) null > else validateexternaltype(getexternalrowfield(assertnotnull(input[0, > org.apache.spark.sql.Row, true]), 0, indices), IntegerType) AS indices#0 > ... long stack trace ... > {code} > Using Spark 2.2.0, R 3.4.0, Java 1.8.0_131, Windows 10. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-21727) Operating on an ArrayType in a SparkR DataFrame throws error
[ https://issues.apache.org/jira/browse/SPARK-21727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16151588#comment-16151588 ] Felix Cheung commented on SPARK-21727: -- That is true. I think the documentation is unclear in this case - it should say the vector should be converted to list or similar. In fact the code explicitly does not support column with atomic vector values into array column type https://github.com/apache/spark/blob/master/R/pkg/R/serialize.R#L54 But with that said, I think we could and should make a minor change to support that implicitly https://github.com/apache/spark/blob/master/R/pkg/R/serialize.R#L39 > Operating on an ArrayType in a SparkR DataFrame throws error > > > Key: SPARK-21727 > URL: https://issues.apache.org/jira/browse/SPARK-21727 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.2.0 >Reporter: Neil McQuarrie > > Previously > [posted|https://stackoverflow.com/questions/45056973/sparkr-dataframe-with-r-lists-as-elements] > this as a stack overflow question but it seems to be a bug. > If I have an R data.frame where one of the column data types is an integer > *list* -- i.e., each of the elements in the column embeds an entire R list of > integers -- then it seems I can convert this data.frame to a SparkR DataFrame > just fine... SparkR treats the column as ArrayType(Double). > However, any subsequent operation on this SparkR DataFrame appears to throw > an error. > Create an example R data.frame: > {code} > indices <- 1:4 > myDf <- data.frame(indices) > myDf$data <- list(rep(0, 20))}} > {code} > Examine it to make sure it looks okay: > {code} > > str(myDf) > 'data.frame': 4 obs. of 2 variables: > $ indices: int 1 2 3 4 > $ data :List of 4 >..$ : num 0 0 0 0 0 0 0 0 0 0 ... >..$ : num 0 0 0 0 0 0 0 0 0 0 ... >..$ : num 0 0 0 0 0 0 0 0 0 0 ... >..$ : num 0 0 0 0 0 0 0 0 0 0 ... > > head(myDf) > indices data > 1 1 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 > 2 2 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 > 3 3 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 > 4 4 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 > {code} > Convert it to a SparkR DataFrame: > {code} > library(SparkR, lib.loc=paste0(Sys.getenv("SPARK_HOME"),"/R/lib")) > sparkR.session(master = "local[*]") > mySparkDf <- as.DataFrame(myDf) > {code} > Examine the SparkR DataFrame schema; notice that the list column was > successfully converted to ArrayType: > {code} > > schema(mySparkDf) > StructType > |-name = "indices", type = "IntegerType", nullable = TRUE > |-name = "data", type = "ArrayType(DoubleType,true)", nullable = TRUE > {code} > However, operating on the SparkR DataFrame throws an error: > {code} > > collect(mySparkDf) > 17/07/13 17:23:00 ERROR executor.Executor: Exception in task 0.0 in stage 1.0 > (TID 1) > java.lang.RuntimeException: Error while encoding: java.lang.RuntimeException: > java.lang.Double is not a valid external type for schema of array > if (assertnotnull(input[0, org.apache.spark.sql.Row, true]).isNullAt) null > else validateexternaltype(getexternalrowfield(assertnotnull(input[0, > org.apache.spark.sql.Row, true]), 0, indices), IntegerType) AS indices#0 > ... long stack trace ... > {code} > Using Spark 2.2.0, R 3.4.0, Java 1.8.0_131, Windows 10. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
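A hypothetical sketch of the minor change discussed above (not the actual patch): have {{getSerdeType()}} classify an atomic vector of length > 1 as a homogeneous "array", so such columns take the array serialization path instead of the scalar one:
{code}
getSerdeType <- function(object) {
  type <- class(object)[[1]]
  if (is.atomic(object) && !is.raw(object) && length(object) > 1) {
    "array"                      # treat multi-element atomic vectors as arrays
  } else if (type != "list") {
    type
  } else {
    # existing behavior: a list of same-typed elements serializes as an array
    elemType <- unique(sapply(object, getSerdeType))
    if (length(elemType) <= 1) "array" else "list"
  }
}
{code}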
[jira] [Commented] (SPARK-21727) Operating on an ArrayType in a SparkR DataFrame throws error
[ https://issues.apache.org/jira/browse/SPARK-21727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16151437#comment-16151437 ] Felix Cheung commented on SPARK-21727: -- hmm.. I think that's what the error message is saying {code} java.lang.Double is not a valid external type for schema of array {code} it's finding a double and not an array of double > Operating on an ArrayType in a SparkR DataFrame throws error > > > Key: SPARK-21727 > URL: https://issues.apache.org/jira/browse/SPARK-21727 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.2.0 >Reporter: Neil McQuarrie > > Previously > [posted|https://stackoverflow.com/questions/45056973/sparkr-dataframe-with-r-lists-as-elements] > this as a stack overflow question but it seems to be a bug. > If I have an R data.frame where one of the column data types is an integer > *list* -- i.e., each of the elements in the column embeds an entire R list of > integers -- then it seems I can convert this data.frame to a SparkR DataFrame > just fine... SparkR treats the column as ArrayType(Double). > However, any subsequent operation on this SparkR DataFrame appears to throw > an error. > Create an example R data.frame: > {code} > indices <- 1:4 > myDf <- data.frame(indices) > myDf$data <- list(rep(0, 20))}} > {code} > Examine it to make sure it looks okay: > {code} > > str(myDf) > 'data.frame': 4 obs. of 2 variables: > $ indices: int 1 2 3 4 > $ data :List of 4 >..$ : num 0 0 0 0 0 0 0 0 0 0 ... >..$ : num 0 0 0 0 0 0 0 0 0 0 ... >..$ : num 0 0 0 0 0 0 0 0 0 0 ... >..$ : num 0 0 0 0 0 0 0 0 0 0 ... > > head(myDf) > indices data > 1 1 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 > 2 2 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 > 3 3 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 > 4 4 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 > {code} > Convert it to a SparkR DataFrame: > {code} > library(SparkR, lib.loc=paste0(Sys.getenv("SPARK_HOME"),"/R/lib")) > sparkR.session(master = "local[*]") > mySparkDf <- as.DataFrame(myDf) > {code} > Examine the SparkR DataFrame schema; notice that the list column was > successfully converted to ArrayType: > {code} > > schema(mySparkDf) > StructType > |-name = "indices", type = "IntegerType", nullable = TRUE > |-name = "data", type = "ArrayType(DoubleType,true)", nullable = TRUE > {code} > However, operating on the SparkR DataFrame throws an error: > {code} > > collect(mySparkDf) > 17/07/13 17:23:00 ERROR executor.Executor: Exception in task 0.0 in stage 1.0 > (TID 1) > java.lang.RuntimeException: Error while encoding: java.lang.RuntimeException: > java.lang.Double is not a valid external type for schema of array > if (assertnotnull(input[0, org.apache.spark.sql.Row, true]).isNullAt) null > else validateexternaltype(getexternalrowfield(assertnotnull(input[0, > org.apache.spark.sql.Row, true]), 0, indices), IntegerType) AS indices#0 > ... long stack trace ... > {code} > Using Spark 2.2.0, R 3.4.0, Java 1.8.0_131, Windows 10. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12157) Support numpy types as return values of Python UDFs
[ https://issues.apache.org/jira/browse/SPARK-12157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16151336#comment-16151336 ] Felix Cheung commented on SPARK-12157: -- Any more thoughts on this? I think we should at least document this if it ends up resolved as Won't Fix. > Support numpy types as return values of Python UDFs > --- > > Key: SPARK-12157 > URL: https://issues.apache.org/jira/browse/SPARK-12157 > Project: Spark > Issue Type: Improvement > Components: PySpark, SQL >Affects Versions: 1.5.2 >Reporter: Justin Uang > > Currently, if I have a python UDF > {code} > import pyspark.sql.types as T > import pyspark.sql.functions as F > from pyspark.sql import Row > import numpy as np > argmax = F.udf(lambda x: np.argmax(x), T.IntegerType()) > df = sqlContext.createDataFrame([Row(array=[1,2,3])]) > df.select(argmax("array")).count() > {code} > I get an exception that is fairly opaque: > {code} > Caused by: net.razorvine.pickle.PickleException: expected zero arguments for > construction of ClassDict (for numpy.dtype) > at > net.razorvine.pickle.objects.ClassDictConstructor.construct(ClassDictConstructor.java:23) > at net.razorvine.pickle.Unpickler.load_reduce(Unpickler.java:701) > at net.razorvine.pickle.Unpickler.dispatch(Unpickler.java:171) > at net.razorvine.pickle.Unpickler.load(Unpickler.java:85) > at net.razorvine.pickle.Unpickler.loads(Unpickler.java:98) > at > org.apache.spark.sql.execution.BatchPythonEvaluation$$anonfun$doExecute$1$$anonfun$apply$3.apply(python.scala:404) > at > org.apache.spark.sql.execution.BatchPythonEvaluation$$anonfun$doExecute$1$$anonfun$apply$3.apply(python.scala:403) > {code} > Numpy types like np.int and np.float64 should automatically be cast to the > proper dtypes. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-21801) SparkR unit tests randomly fail on trees
[ https://issues.apache.org/jira/browse/SPARK-21801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Felix Cheung reassigned SPARK-21801: Assignee: Felix Cheung > SparkR unit tests randomly fail on trees > --- > > Key: SPARK-21801 > URL: https://issues.apache.org/jira/browse/SPARK-21801 > Project: Spark > Issue Type: Bug > Components: SparkR, Tests >Affects Versions: 2.2.0 >Reporter: Weichen Xu >Assignee: Felix Cheung >Priority: Critical > Fix For: 2.3.0 > > > SparkR unit tests sometimes fail randomly with an error such as: > ``` > 1. Error: spark.randomForest (@test_mllib_tree.R#236) > -- > java.lang.IllegalArgumentException: requirement failed: The input column > stridx_87ea3065aeb2 should have at least two distinct values. > ``` > or > ``` > 1. Error: spark.decisionTree (@test_mllib_tree.R#353) > -- > java.lang.IllegalArgumentException: requirement failed: The input column > stridx_d6a0b492cfa1 should have at least two distinct values. > ``` -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-21801) SparkR unit tests randomly fail on trees
[ https://issues.apache.org/jira/browse/SPARK-21801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Felix Cheung resolved SPARK-21801. -- Resolution: Fixed Fix Version/s: 2.3.0 > SparkR unit tests randomly fail on trees > --- > > Key: SPARK-21801 > URL: https://issues.apache.org/jira/browse/SPARK-21801 > Project: Spark > Issue Type: Bug > Components: SparkR, Tests >Affects Versions: 2.2.0 >Reporter: Weichen Xu >Assignee: Felix Cheung >Priority: Critical > Fix For: 2.3.0 > > > SparkR unit tests sometimes fail randomly with an error such as: > ``` > 1. Error: spark.randomForest (@test_mllib_tree.R#236) > -- > java.lang.IllegalArgumentException: requirement failed: The input column > stridx_87ea3065aeb2 should have at least two distinct values. > ``` > or > ``` > 1. Error: spark.decisionTree (@test_mllib_tree.R#353) > -- > java.lang.IllegalArgumentException: requirement failed: The input column > stridx_d6a0b492cfa1 should have at least two distinct values. > ``` -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-21805) disable R vignettes code on Windows
[ https://issues.apache.org/jira/browse/SPARK-21805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Felix Cheung resolved SPARK-21805. -- Resolution: Fixed Assignee: Felix Cheung Fix Version/s: 2.3.0 2.2.1 Target Version/s: 2.2.1, 2.3.0 > disable R vignettes code on Windows > --- > > Key: SPARK-21805 > URL: https://issues.apache.org/jira/browse/SPARK-21805 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.2.0, 2.3.0 >Reporter: Felix Cheung >Assignee: Felix Cheung > Fix For: 2.2.1, 2.3.0 > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-15799) Release SparkR on CRAN
[ https://issues.apache.org/jira/browse/SPARK-15799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16138099#comment-16138099 ] Felix Cheung commented on SPARK-15799: -- [~shivaram] might have more updates. The CRAN submission process is tied to the Maintainer email. There were a few comments from the submission. SPARK-21805 should fix the main part of it, and I recall now there is another one with the description text - I'll fix that as well. > Release SparkR on CRAN > -- > > Key: SPARK-15799 > URL: https://issues.apache.org/jira/browse/SPARK-15799 > Project: Spark > Issue Type: New Feature > Components: SparkR >Reporter: Xiangrui Meng > > Story: "As an R user, I would like to see SparkR released on CRAN, so I can > use SparkR easily in an existing R environment and have other packages built > on top of SparkR." > I made this JIRA with the following questions in mind: > * Are there known issues that prevent us releasing SparkR on CRAN? > * Do we want to package Spark jars in the SparkR release? > * Are there license issues? > * How does it fit into Spark's release process? -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-21805) disable R vignettes code on Windows
[ https://issues.apache.org/jira/browse/SPARK-21805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16138097#comment-16138097 ] Felix Cheung commented on SPARK-21805: -- https://github.com/apache/spark/pull/19016 > disable R vignettes code on Windows > --- > > Key: SPARK-21805 > URL: https://issues.apache.org/jira/browse/SPARK-21805 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.2.0, 2.3.0 >Reporter: Felix Cheung > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12157) Support numpy types as return values of Python UDFs
[ https://issues.apache.org/jira/browse/SPARK-12157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16137028#comment-16137028 ] Felix Cheung commented on SPARK-12157: -- seems like we have a couple of issues here. I ran into this recently with scalar types - where are we on this? > Support numpy types as return values of Python UDFs > --- > > Key: SPARK-12157 > URL: https://issues.apache.org/jira/browse/SPARK-12157 > Project: Spark > Issue Type: Improvement > Components: PySpark, SQL >Affects Versions: 1.5.2 >Reporter: Justin Uang > > Currently, if I have a python UDF > {code} > import pyspark.sql.types as T > import pyspark.sql.functions as F > from pyspark.sql import Row > import numpy as np > argmax = F.udf(lambda x: np.argmax(x), T.IntegerType()) > df = sqlContext.createDataFrame([Row(array=[1,2,3])]) > df.select(argmax("array")).count() > {code} > I get an exception that is fairly opaque: > {code} > Caused by: net.razorvine.pickle.PickleException: expected zero arguments for > construction of ClassDict (for numpy.dtype) > at > net.razorvine.pickle.objects.ClassDictConstructor.construct(ClassDictConstructor.java:23) > at net.razorvine.pickle.Unpickler.load_reduce(Unpickler.java:701) > at net.razorvine.pickle.Unpickler.dispatch(Unpickler.java:171) > at net.razorvine.pickle.Unpickler.load(Unpickler.java:85) > at net.razorvine.pickle.Unpickler.loads(Unpickler.java:98) > at > org.apache.spark.sql.execution.BatchPythonEvaluation$$anonfun$doExecute$1$$anonfun$apply$3.apply(python.scala:404) > at > org.apache.spark.sql.execution.BatchPythonEvaluation$$anonfun$doExecute$1$$anonfun$apply$3.apply(python.scala:403) > {code} > Numpy types like np.int and np.float64 should automatically be cast to the > proper dtypes. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-21801) SparkR unit test randomly fail on trees
[ https://issues.apache.org/jira/browse/SPARK-21801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16136445#comment-16136445 ] Felix Cheung commented on SPARK-21801: -- https://github.com/apache/spark/pull/19018 > SparkR unit test randomly fail on trees > --- > > Key: SPARK-21801 > URL: https://issues.apache.org/jira/browse/SPARK-21801 > Project: Spark > Issue Type: Bug > Components: SparkR, Tests >Affects Versions: 2.2.0 >Reporter: Weichen Xu >Priority: Critical > > SparkR unit tests sometimes fail randomly with an error such as: > ``` > 1. Error: spark.randomForest (@test_mllib_tree.R#236) > -- > java.lang.IllegalArgumentException: requirement failed: The input column > stridx_87ea3065aeb2 should have at least two distinct values. > ``` > or > ``` > 1. Error: spark.decisionTree (@test_mllib_tree.R#353) > -- > java.lang.IllegalArgumentException: requirement failed: The input column > stridx_d6a0b492cfa1 should have at least two distinct values. > ``` -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-21805) disable R vignettes code on Windows
Felix Cheung created SPARK-21805: Summary: disable R vignettes code on Windows Key: SPARK-21805 URL: https://issues.apache.org/jira/browse/SPARK-21805 Project: Spark Issue Type: Bug Components: SparkR Affects Versions: 2.2.0, 2.3.0 Reporter: Felix Cheung -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-21584) Update R method for summary to call new implementation
[ https://issues.apache.org/jira/browse/SPARK-21584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Felix Cheung resolved SPARK-21584. -- Resolution: Fixed Fix Version/s: 2.3.0 Target Version/s: 2.3.0 > Update R method for summary to call new implementation > -- > > Key: SPARK-21584 > URL: https://issues.apache.org/jira/browse/SPARK-21584 > Project: Spark > Issue Type: Improvement > Components: SparkR, SQL >Affects Versions: 2.3.0 >Reporter: Andrew Ray >Assignee: Andrew Ray > Fix For: 2.3.0 > > > Follow up to SPARK-21100 -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-21801) SparkR unit test randomly fail on trees
[ https://issues.apache.org/jira/browse/SPARK-21801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16136370#comment-16136370 ] Felix Cheung commented on SPARK-21801: -- I've seen it a couple of times. Looking at the test now, I think this is because the tests are not run with a fixed seed. Let me submit a PR to see if it addresses the failure. > SparkR unit test randomly fail on trees > --- > > Key: SPARK-21801 > URL: https://issues.apache.org/jira/browse/SPARK-21801 > Project: Spark > Issue Type: Bug > Components: SparkR, Tests >Affects Versions: 2.2.0 >Reporter: Weichen Xu >Priority: Critical > > SparkR unit tests sometimes fail randomly with an error such as: > ``` > 1. Error: spark.randomForest (@test_mllib_tree.R#236) > -- > java.lang.IllegalArgumentException: requirement failed: The input column > stridx_87ea3065aeb2 should have at least two distinct values. > ``` > or > ``` > 1. Error: spark.decisionTree (@test_mllib_tree.R#353) > -- > java.lang.IllegalArgumentException: requirement failed: The input column > stridx_d6a0b492cfa1 should have at least two distinct values. > ``` -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
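A minimal sketch of the kind of fix being discussed here (a hypothetical test snippet, not the contents of the PR): pinning both the R-side seed and the algorithm's seed parameter makes the generated data deterministic, so the string indexer always sees at least two distinct label values.
{code}
# Hypothetical SparkR test snippet: pin both sources of randomness.
library(SparkR)
set.seed(42)                                   # fixes R-side data generation
df <- suppressWarnings(createDataFrame(iris))  # Species has 3 distinct values
model <- spark.randomForest(df, Species ~ ., "classification",
                            numTrees = 5, seed = 123)  # fixes the MLlib seed
summary(model)
{code}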
[jira] [Commented] (SPARK-21693) AppVeyor tests reach the time limit, 1.5 hours, sometimes in SparkR tests
[ https://issues.apache.org/jira/browse/SPARK-21693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16121866#comment-16121866 ] Felix Cheung commented on SPARK-21693: -- Splitting the test matrix is also possible, though I worry that, since caching is disabled, the Spark jar would be built multiple times. My main concerns are how long the tests will run and whether that will lengthen the queuing of test runs (the queue can get quite long already, and people sometimes ignore pending AppVeyor runs). > AppVeyor tests reach the time limit, 1.5 hours, sometimes in SparkR tests > - > > Key: SPARK-21693 > URL: https://issues.apache.org/jira/browse/SPARK-21693 > Project: Spark > Issue Type: Test > Components: Build, SparkR >Affects Versions: 2.3.0 >Reporter: Hyukjin Kwon > > We finally sometimes reach the time limit, 1.5 hours, > https://ci.appveyor.com/project/ApacheSoftwareFoundation/spark/build/1676-master > I requested to increase this from an hour to 1.5 hours before but it looks like we > should fix this in AppVeyor. I asked this for my account a few times before but > it looks like we can't increase this time limit again and again. > I could identify two things that look like they take quite a bit of time: > 1. Disabled cache feature in pull request builder, which ends up downloading > Maven dependencies (10-20ish mins) > https://www.appveyor.com/docs/build-cache/ > {quote} > Note: Saving cache is disabled in Pull Request builds. > {quote} > and also see > http://help.appveyor.com/discussions/problems/4159-cache-doesnt-seem-to-be-working > This seems difficult to fix within Spark. > 2. "MLlib classification algorithms" tests (30-35ish mins) > The test below looks like it takes 30-35ish mins. > {code} > MLlib classification algorithms, except for tree-based algorithms: Spark > package found in SPARK_HOME: C:\projects\spark\bin\.. > .. > {code} > As a (I think) last resort, we could make a matrix for this test alone, so > that we run the other tests after a build and then run this test after > another build; for example, I run Scala tests with this workaround - > https://ci.appveyor.com/project/spark-test/spark/build/757-20170716 (a matrix > with 7 entries, each doing a build and test). > I am also checking and testing other ways. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-21693) AppVeyor tests reach the time limit, 1.5 hours, sometimes in SparkR tests
[ https://issues.apache.org/jira/browse/SPARK-21693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16121860#comment-16121860 ] Felix Cheung commented on SPARK-21693: -- We could certainly simplify the classification set - but there's a fair number of APIs being tested in there; perhaps we could time them to see which ones are taking the most time. > AppVeyor tests reach the time limit, 1.5 hours, sometimes in SparkR tests > - > > Key: SPARK-21693 > URL: https://issues.apache.org/jira/browse/SPARK-21693 > Project: Spark > Issue Type: Test > Components: Build, SparkR >Affects Versions: 2.3.0 >Reporter: Hyukjin Kwon > > We finally sometimes reach the time limit, 1.5 hours, > https://ci.appveyor.com/project/ApacheSoftwareFoundation/spark/build/1676-master > I requested to increase this from an hour to 1.5 hours before but it looks like we > should fix this in AppVeyor. I asked this for my account a few times before but > it looks like we can't increase this time limit again and again. > I could identify two things that look like they take quite a bit of time: > 1. Disabled cache feature in pull request builder, which ends up downloading > Maven dependencies (10-20ish mins) > https://www.appveyor.com/docs/build-cache/ > {quote} > Note: Saving cache is disabled in Pull Request builds. > {quote} > and also see > http://help.appveyor.com/discussions/problems/4159-cache-doesnt-seem-to-be-working > This seems difficult to fix within Spark. > 2. "MLlib classification algorithms" tests (30-35ish mins) > The test below looks like it takes 30-35ish mins. > {code} > MLlib classification algorithms, except for tree-based algorithms: Spark > package found in SPARK_HOME: C:\projects\spark\bin\.. > .. > {code} > As a (I think) last resort, we could make a matrix for this test alone, so > that we run the other tests after a build and then run this test after > another build; for example, I run Scala tests with this workaround - > https://ci.appveyor.com/project/spark-test/spark/build/757-20170716 (a matrix > with 7 entries, each doing a build and test). > I am also checking and testing other ways. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
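A rough sketch of the timing idea (the loop and the test-file layout are assumptions for illustration, not the actual AppVeyor harness): wrap each MLlib test file in system.time() to see which ones dominate the 30-35 minutes.
{code}
# Hypothetical: time each MLlib test file individually to find the slow ones.
library(testthat)
files <- list.files("R/pkg/tests/fulltests", pattern = "^test_mllib",
                    full.names = TRUE)
for (f in files) {
  t <- system.time(test_file(f))
  cat(f, "elapsed:", t[["elapsed"]], "seconds\n")
}
{code}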
[jira] [Resolved] (SPARK-21622) Support Offset in SparkR
[ https://issues.apache.org/jira/browse/SPARK-21622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Felix Cheung resolved SPARK-21622. -- Resolution: Fixed Assignee: Wayne Zhang Fix Version/s: 2.3.0 Target Version/s: 2.3.0 > Support Offset in SparkR > > > Key: SPARK-21622 > URL: https://issues.apache.org/jira/browse/SPARK-21622 > Project: Spark > Issue Type: Improvement > Components: SparkR >Affects Versions: 2.2.0 >Reporter: Wayne Zhang >Assignee: Wayne Zhang > Fix For: 2.3.0 > > > Support offset in GLM in SparkR. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-21622) Support Offset in SparkR
[ https://issues.apache.org/jira/browse/SPARK-21622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16115951#comment-16115951 ] Felix Cheung commented on SPARK-21622: -- https://github.com/apache/spark/pull/18831 > Support Offset in SparkR > > > Key: SPARK-21622 > URL: https://issues.apache.org/jira/browse/SPARK-21622 > Project: Spark > Issue Type: Improvement > Components: SparkR >Affects Versions: 2.2.0 >Reporter: Wayne Zhang > > Support offset in GLM in SparkR. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-15799) Release SparkR on CRAN
[ https://issues.apache.org/jira/browse/SPARK-15799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113974#comment-16113974 ] Felix Cheung commented on SPARK-15799: -- We submitted the 2.2.0 release to CRAN and got some comments that we hope to resolve (or get an exception for, if we can). > Release SparkR on CRAN > -- > > Key: SPARK-15799 > URL: https://issues.apache.org/jira/browse/SPARK-15799 > Project: Spark > Issue Type: New Feature > Components: SparkR >Reporter: Xiangrui Meng > > Story: "As an R user, I would like to see SparkR released on CRAN, so I can > use SparkR easily in an existing R environment and have other packages built > on top of SparkR." > I made this JIRA with the following questions in mind: > * Are there known issues that prevent us releasing SparkR on CRAN? > * Do we want to package Spark jars in the SparkR release? > * Are there license issues? > * How does it fit into Spark's release process? -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-21367) R older version of Roxygen2 on Jenkins
[ https://issues.apache.org/jira/browse/SPARK-21367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113145#comment-16113145 ] Felix Cheung commented on SPARK-21367: -- Still seeing it: Warning messages: 1: In check_dep_version(pkg, version, compare) : Need roxygen2 >= 5.0.0 but loaded version is 4.1.1 2: In check_dep_version(pkg, version, compare) : Need roxygen2 >= 5.0.0 but loaded version is 4.1.1 * installing *source* package 'SparkR' ... https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80213/console > R older version of Roxygen2 on Jenkins > -- > > Key: SPARK-21367 > URL: https://issues.apache.org/jira/browse/SPARK-21367 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.3.0 >Reporter: Felix Cheung >Assignee: shane knapp > Attachments: R.paks > > Getting this message from a recent build. > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79461/console > Warning messages: > 1: In check_dep_version(pkg, version, compare) : > Need roxygen2 >= 5.0.0 but loaded version is 4.1.1 > 2: In check_dep_version(pkg, version, compare) : > Need roxygen2 >= 5.0.0 but loaded version is 4.1.1 > * installing *source* package 'SparkR' ... > ** R > We have been running with 5.0.1 and haven't changed for a year. > NOTE: Roxygen 6.x has some big changes and IMO we should not move to that yet. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
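A quick way to sanity-check what the build machine actually resolves (a sketch; the 5.0.1 expectation comes from the issue description):
{code}
packageVersion("roxygen2")  # expect '5.0.1' per the issue description
find.package("roxygen2")    # which installed copy wins
.libPaths()                 # library path ordering decides which one loads
{code}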
[jira] [Assigned] (SPARK-21584) Update R method for summary to call new implementation
[ https://issues.apache.org/jira/browse/SPARK-21584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Felix Cheung reassigned SPARK-21584: Assignee: Andrew Ray > Update R method for summary to call new implementation > -- > > Key: SPARK-21584 > URL: https://issues.apache.org/jira/browse/SPARK-21584 > Project: Spark > Issue Type: Improvement > Components: SparkR, SQL >Affects Versions: 2.3.0 >Reporter: Andrew Ray >Assignee: Andrew Ray > > Follow up to SPARK-21100 -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-21616) SparkR 2.3.0 migration guide, release note
[ https://issues.apache.org/jira/browse/SPARK-21616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16111483#comment-16111483 ] Felix Cheung commented on SPARK-21616: -- SPARK-21584 > SparkR 2.3.0 migration guide, release note > -- > > Key: SPARK-21616 > URL: https://issues.apache.org/jira/browse/SPARK-21616 > Project: Spark > Issue Type: Documentation > Components: Documentation, SparkR >Affects Versions: 2.3.0 >Reporter: Felix Cheung >Assignee: Felix Cheung > Fix For: 2.3.0 > > > From looking at changes since 2.2.0, this/these should be documented in the > migration guide / release note for the 2.3.0 release, as it is behavior > changes -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-21584) Update R method for summary to call new implementation
[ https://issues.apache.org/jira/browse/SPARK-21584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16111485#comment-16111485 ] Felix Cheung commented on SPARK-21584: -- https://github.com/apache/spark/pull/18786 > Update R method for summary to call new implementation > -- > > Key: SPARK-21584 > URL: https://issues.apache.org/jira/browse/SPARK-21584 > Project: Spark > Issue Type: Improvement > Components: SparkR, SQL >Affects Versions: 2.3.0 >Reporter: Andrew Ray > > Follow up to SPARK-21100 -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-21616) SparkR 2.3.0 migration guide, release note
[ https://issues.apache.org/jira/browse/SPARK-21616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Felix Cheung updated SPARK-21616: - Fix Version/s: (was: 2.2.0) > SparkR 2.3.0 migration guide, release note > -- > > Key: SPARK-21616 > URL: https://issues.apache.org/jira/browse/SPARK-21616 > Project: Spark > Issue Type: Documentation > Components: Documentation, SparkR >Affects Versions: 2.3.0 >Reporter: Felix Cheung >Assignee: Felix Cheung > Fix For: 2.3.0 > > > From looking at changes since 2.1.0, this/these should be documented in the > migration guide / release note for the 2.2.0 release, as it is behavior > changes > https://github.com/apache/spark/commit/422aa67d1bb84f913b06e6d94615adb6557e2870 > https://github.com/apache/spark/pull/17483 (createExternalTable) -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-21616) SparkR 2.3.0 migration guide, release note
[ https://issues.apache.org/jira/browse/SPARK-21616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Felix Cheung updated SPARK-21616: - Description: From looking at changes since 2.2.0, this/these should be documented in the migration guide / release note for the 2.3.0 release, as it is behavior changes was: From looking at changes since 2.1.0, this/these should be documented in the migration guide / release note for the 2.2.0 release, as it is behavior changes https://github.com/apache/spark/commit/422aa67d1bb84f913b06e6d94615adb6557e2870 https://github.com/apache/spark/pull/17483 (createExternalTable) > SparkR 2.3.0 migration guide, release note > -- > > Key: SPARK-21616 > URL: https://issues.apache.org/jira/browse/SPARK-21616 > Project: Spark > Issue Type: Documentation > Components: Documentation, SparkR >Affects Versions: 2.3.0 >Reporter: Felix Cheung >Assignee: Felix Cheung > Fix For: 2.3.0 > > > From looking at changes since 2.2.0, this/these should be documented in the > migration guide / release note for the 2.3.0 release, as it is behavior > changes -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-21616) SparkR 2.3.0 migration guide, release note
[ https://issues.apache.org/jira/browse/SPARK-21616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Felix Cheung updated SPARK-21616: - Affects Version/s: (was: 2.2.0) 2.3.0 > SparkR 2.3.0 migration guide, release note > -- > > Key: SPARK-21616 > URL: https://issues.apache.org/jira/browse/SPARK-21616 > Project: Spark > Issue Type: Documentation > Components: Documentation, SparkR >Affects Versions: 2.3.0 >Reporter: Felix Cheung >Assignee: Felix Cheung > Fix For: 2.3.0 > > > From looking at changes since 2.1.0, this/these should be documented in the > migration guide / release note for the 2.2.0 release, as it is behavior > changes > https://github.com/apache/spark/commit/422aa67d1bb84f913b06e6d94615adb6557e2870 > https://github.com/apache/spark/pull/17483 (createExternalTable) -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-21616) SparkR 2.3.0 migration guide, release note
[ https://issues.apache.org/jira/browse/SPARK-21616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Felix Cheung updated SPARK-21616: - Summary: SparkR 2.3.0 migration guide, release note (was: CLONE - SparkR 2.2.0 migration guide, release note) > SparkR 2.3.0 migration guide, release note > -- > > Key: SPARK-21616 > URL: https://issues.apache.org/jira/browse/SPARK-21616 > Project: Spark > Issue Type: Documentation > Components: Documentation, SparkR >Affects Versions: 2.2.0 >Reporter: Felix Cheung >Assignee: Felix Cheung > Fix For: 2.2.0, 2.3.0 > > > From looking at changes since 2.1.0, this/these should be documented in the > migration guide / release note for the 2.2.0 release, as it is behavior > changes > https://github.com/apache/spark/commit/422aa67d1bb84f913b06e6d94615adb6557e2870 > https://github.com/apache/spark/pull/17483 (createExternalTable) -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-21616) SparkR 2.3.0 migration guide, release note
[ https://issues.apache.org/jira/browse/SPARK-21616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Felix Cheung updated SPARK-21616: - Target Version/s: 2.3.0 (was: 2.2.0, 2.3.0) > SparkR 2.3.0 migration guide, release note > -- > > Key: SPARK-21616 > URL: https://issues.apache.org/jira/browse/SPARK-21616 > Project: Spark > Issue Type: Documentation > Components: Documentation, SparkR >Affects Versions: 2.2.0 >Reporter: Felix Cheung >Assignee: Felix Cheung > Fix For: 2.2.0, 2.3.0 > > > From looking at changes since 2.1.0, this/these should be documented in the > migration guide / release note for the 2.2.0 release, as it is behavior > changes > https://github.com/apache/spark/commit/422aa67d1bb84f913b06e6d94615adb6557e2870 > https://github.com/apache/spark/pull/17483 (createExternalTable) -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-21616) CLONE - SparkR 2.2.0 migration guide, release note
Felix Cheung created SPARK-21616: Summary: CLONE - SparkR 2.2.0 migration guide, release note Key: SPARK-21616 URL: https://issues.apache.org/jira/browse/SPARK-21616 Project: Spark Issue Type: Documentation Components: Documentation, SparkR Affects Versions: 2.2.0 Reporter: Felix Cheung Assignee: Felix Cheung Fix For: 2.2.0, 2.3.0 From looking at changes since 2.1.0, this/these should be documented in the migration guide / release note for the 2.2.0 release, as it is behavior changes https://github.com/apache/spark/commit/422aa67d1bb84f913b06e6d94615adb6557e2870 https://github.com/apache/spark/pull/17483 (createExternalTable) -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-21381) SparkR: pass on setHandleInvalid for classification algorithms
[ https://issues.apache.org/jira/browse/SPARK-21381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Felix Cheung resolved SPARK-21381. -- Resolution: Fixed Assignee: Miao Wang Fix Version/s: 2.3.0 Target Version/s: 2.3.0 > SparkR: pass on setHandleInvalid for classification algorithms > -- > > Key: SPARK-21381 > URL: https://issues.apache.org/jira/browse/SPARK-21381 > Project: Spark > Issue Type: Improvement > Components: SparkR >Affects Versions: 2.1.1 >Reporter: Miao Wang >Assignee: Miao Wang > Fix For: 2.3.0 > > > SPARK-20307 Added handleInvalid option to RFormula for tree-based > classification algorithms. We should add this parameter for other > classification algorithms in SparkR. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-18226) SparkR displaying vector columns in incorrect way
[ https://issues.apache.org/jira/browse/SPARK-18226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16094277#comment-16094277 ] Felix Cheung commented on SPARK-18226: -- I agree, usability is poor, though I think if you subset the collected data.frame, you should be able to operate on the environment in a specific row and column individually... > SparkR displaying vector columns in incorrect way > - > > Key: SPARK-18226 > URL: https://issues.apache.org/jira/browse/SPARK-18226 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.0.0 >Reporter: Grzegorz Chilkiewicz >Priority: Trivial > > I have encountered a problem with SparkR presenting Spark vectors from > the org.apache.spark.mllib.linalg package > * `head(df)` shows in vector column: "<environment>" > * cast to string does not work as expected, it shows: > "[1,null,null,org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@79f50a91]" > * `showDF(df)` works correctly > To reproduce, start SparkR and paste the following code (example taken from > https://spark.apache.org/docs/latest/sparkr.html#naive-bayes-model) > {code} > # Fit a Bernoulli naive Bayes model with spark.naiveBayes > titanic <- as.data.frame(Titanic) > titanicDF <- createDataFrame(titanic[titanic$Freq > 0, -5]) > nbDF <- titanicDF > nbTestDF <- titanicDF > nbModel <- spark.naiveBayes(nbDF, Survived ~ Class + Sex + Age) > # Model summary > summary(nbModel) > # Prediction > nbPredictions <- predict(nbModel, nbTestDF) > # > # My modification to expose the problem # > nbPredictions$rawPrediction_str <- cast(nbPredictions$rawPrediction, "string") > head(nbPredictions) > showDF(nbPredictions) > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
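A minimal sketch of the workaround being suggested (nbPredictions comes from the repro in the issue description; the [[1]] indexing is an assumption about the collected structure):
{code}
# Collect to a local data.frame, then subset a single row/column cell.
local <- collect(nbPredictions)
cell <- local$rawPrediction[[1]]  # one row's value from the vector column
str(cell)                         # inspect it in native R
{code}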
[jira] [Commented] (SPARK-18226) SparkR displaying vector columns in incorrect way
[ https://issues.apache.org/jira/browse/SPARK-18226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16093466#comment-16093466 ] Felix Cheung commented on SPARK-18226: -- If you collect() what's returned by predict(), you should be able to manipulate it in native R? > SparkR displaying vector columns in incorrect way > - > > Key: SPARK-18226 > URL: https://issues.apache.org/jira/browse/SPARK-18226 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.0.0 >Reporter: Grzegorz Chilkiewicz >Priority: Trivial > > I have encountered a problem with SparkR presenting Spark vectors from > the org.apache.spark.mllib.linalg package > * `head(df)` shows in vector column: "<environment>" > * cast to string does not work as expected, it shows: > "[1,null,null,org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@79f50a91]" > * `showDF(df)` works correctly > To reproduce, start SparkR and paste the following code (example taken from > https://spark.apache.org/docs/latest/sparkr.html#naive-bayes-model) > {code} > # Fit a Bernoulli naive Bayes model with spark.naiveBayes > titanic <- as.data.frame(Titanic) > titanicDF <- createDataFrame(titanic[titanic$Freq > 0, -5]) > nbDF <- titanicDF > nbTestDF <- titanicDF > nbModel <- spark.naiveBayes(nbDF, Survived ~ Class + Sex + Age) > # Model summary > summary(nbModel) > # Prediction > nbPredictions <- predict(nbModel, nbTestDF) > # > # My modification to expose the problem # > nbPredictions$rawPrediction_str <- cast(nbPredictions$rawPrediction, "string") > head(nbPredictions) > showDF(nbPredictions) > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-21450) List of NA is flattened inside a SparkR struct type
[ https://issues.apache.org/jira/browse/SPARK-21450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16091132#comment-16091132 ] Felix Cheung commented on SPARK-21450: -- [~hyukjin.kwon] If you follow the code in test_sparkSQL.R, {code} df <- as.DataFrame(list(list("col" = "{\"date\":\"21/10/2014\"}"))) schema2 <- structType(structField("date", "date")) s <- collect(select(df, from_json(df$col, schema2))) expect_equal(s[[1]][[1]], NA) s <- collect(select(df, from_json(df$col, schema2, dateFormat = "dd/MM/"))) {code} Both lines should be using schema2 - not schema. schema is actually defined as {code} schema <- structType(structField("age", "integer"), structField("height", "double")) {code} which doesn't match the json blob. Is this a copy/paste error in this JIRA? Could you check? In any case, I wonder - didn't get to test it in Scala - if the different result is caused by an unparseable JSON blob because of the schema/format passed in. The logi NA would be a null in Scala. > List of NA is flattened inside a SparkR struct type > --- > > Key: SPARK-21450 > URL: https://issues.apache.org/jira/browse/SPARK-21450 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.2.0 >Reporter: Hossein Falaki > > Consider the following two cases copied from {{test_sparkSQL.R}}: > {code} > df <- as.DataFrame(list(list("col" = "{\"date\":\"21/10/2014\"}"))) > schema <- structType(structField("date", "date")) > s1 <- collect(select(df, from_json(df$col, schema))) > s2 <- collect(select(df, from_json(df$col, schema2, dateFormat = > "dd/MM/"))) > {code} > If you inspect s1 using {{str(s1)}} you will find: > {code} > 'data.frame': 2 obs. of 1 variable: > $ jsontostructs(col):List of 2 > ..$ : logi NA > {code} > But for s2, running {{str(s2)}} results in: > {code} > 'data.frame': 2 obs. of 1 variable: > $ jsontostructs(col):List of 2 > ..$ :List of 1 > .. ..$ date: Date, format: "2014-10-21" > .. ..- attr(*, "class")= chr "struct" > {code} > I assume this is not intentional and is just a subtle bug. Do you think > otherwise? [~shivaram] and [~felixcheung] -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
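A sketch of the distinction being drawn above (the dd/MM/yyyy format string is an assumption - the digest truncates it - and schema2 is defined here for completeness): a schema that does not match the JSON yields a null, which collects as logi NA in R, while a matching schema plus the right dateFormat parses into a struct.
{code}
df <- as.DataFrame(list(list("col" = "{\"date\":\"21/10/2014\"}")))
schema  <- structType(structField("age", "integer"),
                      structField("height", "double"))  # does not match the blob
schema2 <- structType(structField("date", "date"))      # matches the blob
collect(select(df, from_json(df$col, schema)))          # -> logi NA (null)
collect(select(df, from_json(df$col, schema2,
                             dateFormat = "dd/MM/yyyy")))  # -> parsed struct
{code}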
[jira] [Commented] (SPARK-21367) R older version of Roxygen2 on Jenkins
[ https://issues.apache.org/jira/browse/SPARK-21367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16082692#comment-16082692 ] Felix Cheung commented on SPARK-21367: -- *SOMETIMES*? :) > R older version of Roxygen2 on Jenkins > -- > > Key: SPARK-21367 > URL: https://issues.apache.org/jira/browse/SPARK-21367 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.3.0 >Reporter: Felix Cheung >Assignee: shane knapp > > Getting this message from a recent build. > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79461/console > Warning messages: > 1: In check_dep_version(pkg, version, compare) : > Need roxygen2 >= 5.0.0 but loaded version is 4.1.1 > 2: In check_dep_version(pkg, version, compare) : > Need roxygen2 >= 5.0.0 but loaded version is 4.1.1 > * installing *source* package 'SparkR' ... > ** R > We have been running with 5.0.1 and haven't changed for a year. > NOTE: Roxygen 6.x has some big changes and IMO we should not move to that yet. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-21367) R older version of Roxygen2 on Jenkins
[ https://issues.apache.org/jira/browse/SPARK-21367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16081720#comment-16081720 ] Felix Cheung edited comment on SPARK-21367 at 7/11/17 6:28 AM: --- I'm not sure exactly why yet, but comparing the working and non-working build working: {code} First time using roxygen2 4.0. Upgrading automatically... Writing SparkDataFrame.Rd Writing printSchema.Rd Writing schema.Rd Writing explain.Rd ... Warning messages: 1: In check_dep_version(pkg, version, compare) : Need roxygen2 >= 5.0.0 but loaded version is 4.1.1 2: In check_dep_version(pkg, version, compare) : Need roxygen2 >= 5.0.0 but loaded version is 4.1.1 * installing *source* package 'SparkR' ... {code} not working: {code} First time using roxygen2 4.0. Upgrading automatically... There were 50 or more warnings (use warnings() to see the first 50) * installing *source* package 'SparkR' ... {code} Basically, the .Rd files are not getting created (because of warnings that are not captured). That causes the CRAN check to fail with "checking for missing documentation entries ... WARNING Undocumented code objects: '%<=>%' 'add_months' 'agg' 'approxCountDistinc" (which would be expected) was (Author: felixcheung): I'm not sure exactly why yet, but comparing the working and non-working build working: {code} First time using roxygen2 4.0. Upgrading automatically... Writing SparkDataFrame.Rd Writing printSchema.Rd Writing schema.Rd Writing explain.Rd ... Warning messages: 1: In check_dep_version(pkg, version, compare) : Need roxygen2 >= 5.0.0 but loaded version is 4.1.1 2: In check_dep_version(pkg, version, compare) : Need roxygen2 >= 5.0.0 but loaded version is 4.1.1 * installing *source* package 'SparkR' ... {code} not working: {code} First time using roxygen2 4.0. Upgrading automatically... There were 50 or more warnings (use warnings() to see the first 50) * installing *source* package 'SparkR' ... {code} Basically, the .Rd files are not getting created (because of warnings that are not captured). That causes the CRAN check to fail with "checking for missing documentation entries ... WARNING Undocumented code objects: '%<=>%' 'add_months' 'agg' 'approxCountDistinc" (which would be expected) As explained in the description above, I'm pretty sure these were not in the build a while ago " Warning messages: 1: In check_dep_version(pkg, version, compare) : Need roxygen2 >= 5.0.0 but loaded version is 4.1.1 " > R older version of Roxygen2 on Jenkins > -- > > Key: SPARK-21367 > URL: https://issues.apache.org/jira/browse/SPARK-21367 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.3.0 >Reporter: Felix Cheung >Assignee: shane knapp > > Getting this message from a recent build. > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79461/console > Warning messages: > 1: In check_dep_version(pkg, version, compare) : > Need roxygen2 >= 5.0.0 but loaded version is 4.1.1 > 2: In check_dep_version(pkg, version, compare) : > Need roxygen2 >= 5.0.0 but loaded version is 4.1.1 > * installing *source* package 'SparkR' ... > ** R > We have been running with 5.0.1 and haven't changed for a year. > NOTE: Roxygen 6.x has some big changes and IMO we should not move to that yet. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
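For anyone reproducing this locally, a sketch of the Rd generation step (the devtools invocation is an assumption about what the SparkR build runs): when roxygenize fails with warnings, no man/*.Rd files get written, and the CRAN check then reports the undocumented objects seen above.
{code}
# Regenerate the Rd files as the SparkR build does (assumed invocation),
# then surface the swallowed warnings.
library(devtools)
document(pkg = "./pkg", roclets = c("rd"))  # run from the R/ directory
warnings()  # inspect the "50 or more warnings"
{code}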
[jira] [Commented] (SPARK-21367) R older version of Roxygen2 on Jenkins
[ https://issues.apache.org/jira/browse/SPARK-21367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16081744#comment-16081744 ] Felix Cheung commented on SPARK-21367: -- I think I found the first error; it's one build before the build failures listed above, 79470 https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79470/console {code} Updating roxygen version in /home/jenkins/workspace/SparkPullRequestBuilder/R/pkg/DESCRIPTION Deleting AFTSurvivalRegressionModel-class.Rd Deleting ALSModel-class.Rd ... There were 50 or more warnings (use warnings() to see the first 50) {code} > R older version of Roxygen2 on Jenkins > -- > > Key: SPARK-21367 > URL: https://issues.apache.org/jira/browse/SPARK-21367 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.3.0 >Reporter: Felix Cheung >Assignee: shane knapp > > Getting this message from a recent build. > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79461/console > Warning messages: > 1: In check_dep_version(pkg, version, compare) : > Need roxygen2 >= 5.0.0 but loaded version is 4.1.1 > 2: In check_dep_version(pkg, version, compare) : > Need roxygen2 >= 5.0.0 but loaded version is 4.1.1 > * installing *source* package 'SparkR' ... > ** R > We have been running with 5.0.1 and haven't changed for a year. > NOTE: Roxygen 6.x has some big changes and IMO we should not move to that yet. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-21367) R older version of Roxygen2 on Jenkins
[ https://issues.apache.org/jira/browse/SPARK-21367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16081720#comment-16081720 ] Felix Cheung edited comment on SPARK-21367 at 7/11/17 6:33 AM: --- I'm not sure exactly why yet, but comparing the working and non-working build working: {code} First time using roxygen2 4.0. Upgrading automatically... Writing SparkDataFrame.Rd Writing printSchema.Rd Writing schema.Rd Writing explain.Rd ... Warning messages: 1: In check_dep_version(pkg, version, compare) : Need roxygen2 >= 5.0.0 but loaded version is 4.1.1 2: In check_dep_version(pkg, version, compare) : Need roxygen2 >= 5.0.0 but loaded version is 4.1.1 * installing *source* package 'SparkR' ... {code} not working: {code} First time using roxygen2 4.0. Upgrading automatically... There were 50 or more warnings (use warnings() to see the first 50) * installing *source* package 'SparkR' ... {code} Basically, the .Rd files are not getting created (because of warnings that are not captured). That causes the CRAN check to fail with "checking for missing documentation entries ... WARNING Undocumented code objects: '%<=>%' 'add_months' 'agg' 'approxCountDistinc" (which would be completely expected - without Rd files it will not have the documentation, hence the check will fail) was (Author: felixcheung): I'm not sure exactly why yet, but comparing the working and non-working build working: {code} First time using roxygen2 4.0. Upgrading automatically... Writing SparkDataFrame.Rd Writing printSchema.Rd Writing schema.Rd Writing explain.Rd ... Warning messages: 1: In check_dep_version(pkg, version, compare) : Need roxygen2 >= 5.0.0 but loaded version is 4.1.1 2: In check_dep_version(pkg, version, compare) : Need roxygen2 >= 5.0.0 but loaded version is 4.1.1 * installing *source* package 'SparkR' ... {code} not working: {code} First time using roxygen2 4.0. Upgrading automatically... There were 50 or more warnings (use warnings() to see the first 50) * installing *source* package 'SparkR' ... {code} Basically, the .Rd files are not getting created (because of warnings that are not captured). That causes the CRAN check to fail with "checking for missing documentation entries ... WARNING Undocumented code objects: '%<=>%' 'add_months' 'agg' 'approxCountDistinc" (which would be expected) > R older version of Roxygen2 on Jenkins > -- > > Key: SPARK-21367 > URL: https://issues.apache.org/jira/browse/SPARK-21367 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.3.0 >Reporter: Felix Cheung >Assignee: shane knapp > > Getting this message from a recent build. > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79461/console > Warning messages: > 1: In check_dep_version(pkg, version, compare) : > Need roxygen2 >= 5.0.0 but loaded version is 4.1.1 > 2: In check_dep_version(pkg, version, compare) : > Need roxygen2 >= 5.0.0 but loaded version is 4.1.1 > * installing *source* package 'SparkR' ... > ** R > We have been running with 5.0.1 and haven't changed for a year. > NOTE: Roxygen 6.x has some big changes and IMO we should not move to that yet. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-21367) R older version of Roxygen2 on Jenkins
[ https://issues.apache.org/jira/browse/SPARK-21367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16081744#comment-16081744 ] Felix Cheung edited comment on SPARK-21367 at 7/11/17 6:26 AM: --- I think I found the first error; it's one build before the build failures listed above, 79470 https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79470/console {code} Updating roxygen version in /home/jenkins/workspace/SparkPullRequestBuilder/R/pkg/DESCRIPTION Deleting AFTSurvivalRegressionModel-class.Rd Deleting ALSModel-class.Rd ... There were 50 or more warnings (use warnings() to see the first 50) {code} Whereas this build from mid June https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/SparkPullRequestBuilder/78020/console was (Author: felixcheung): I think I found the first error; it's one build before the build failures listed above, 79470 https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79470/console {code} Updating roxygen version in /home/jenkins/workspace/SparkPullRequestBuilder/R/pkg/DESCRIPTION Deleting AFTSurvivalRegressionModel-class.Rd Deleting ALSModel-class.Rd ... There were 50 or more warnings (use warnings() to see the first 50) {code} > R older version of Roxygen2 on Jenkins > -- > > Key: SPARK-21367 > URL: https://issues.apache.org/jira/browse/SPARK-21367 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.3.0 >Reporter: Felix Cheung >Assignee: shane knapp > > Getting this message from a recent build. > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79461/console > Warning messages: > 1: In check_dep_version(pkg, version, compare) : > Need roxygen2 >= 5.0.0 but loaded version is 4.1.1 > 2: In check_dep_version(pkg, version, compare) : > Need roxygen2 >= 5.0.0 but loaded version is 4.1.1 > * installing *source* package 'SparkR' ... > ** R > We have been running with 5.0.1 and haven't changed for a year. > NOTE: Roxygen 6.x has some big changes and IMO we should not move to that yet. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-21367) R older version of Roxygen2 on Jenkins
[ https://issues.apache.org/jira/browse/SPARK-21367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16081744#comment-16081744 ] Felix Cheung edited comment on SPARK-21367 at 7/11/17 6:26 AM: --- I think I found the first error; it's one build before the build failures listed above, 79470 https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79470/console {code} Updating roxygen version in /home/jenkins/workspace/SparkPullRequestBuilder/R/pkg/DESCRIPTION Deleting AFTSurvivalRegressionModel-class.Rd Deleting ALSModel-class.Rd ... There were 50 or more warnings (use warnings() to see the first 50) {code} Whereas this build from mid June https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/SparkPullRequestBuilder/78020/console Does NOT have this "Need roxygen2 >= 5.0.0 but loaded version is 4.1.1" message in the console output was (Author: felixcheung): I think I found the first error; it's one build before the build failures listed above, 79470 https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79470/console {code} Updating roxygen version in /home/jenkins/workspace/SparkPullRequestBuilder/R/pkg/DESCRIPTION Deleting AFTSurvivalRegressionModel-class.Rd Deleting ALSModel-class.Rd ... There were 50 or more warnings (use warnings() to see the first 50) {code} Whereas this build from mid June https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/SparkPullRequestBuilder/78020/console > R older version of Roxygen2 on Jenkins > -- > > Key: SPARK-21367 > URL: https://issues.apache.org/jira/browse/SPARK-21367 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.3.0 >Reporter: Felix Cheung >Assignee: shane knapp > > Getting this message from a recent build. > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79461/console > Warning messages: > 1: In check_dep_version(pkg, version, compare) : > Need roxygen2 >= 5.0.0 but loaded version is 4.1.1 > 2: In check_dep_version(pkg, version, compare) : > Need roxygen2 >= 5.0.0 but loaded version is 4.1.1 > * installing *source* package 'SparkR' ... > ** R > We have been running with 5.0.1 and haven't changed for a year. > NOTE: Roxygen 6.x has some big changes and IMO we should not move to that yet. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Issue Comment Deleted] (SPARK-21367) R older version of Roxygen2 on Jenkins
[ https://issues.apache.org/jira/browse/SPARK-21367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Felix Cheung updated SPARK-21367: - Comment: was deleted (was: it looks like instead of 5.x, the older 4.0 is being loaded? First time using roxygen2 4.0. Upgrading automatically... ) > R older version of Roxygen2 on Jenkins > -- > > Key: SPARK-21367 > URL: https://issues.apache.org/jira/browse/SPARK-21367 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.3.0 >Reporter: Felix Cheung >Assignee: shane knapp > > Getting this message from a recent build. > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79461/console > Warning messages: > 1: In check_dep_version(pkg, version, compare) : > Need roxygen2 >= 5.0.0 but loaded version is 4.1.1 > 2: In check_dep_version(pkg, version, compare) : > Need roxygen2 >= 5.0.0 but loaded version is 4.1.1 > * installing *source* package 'SparkR' ... > ** R > We have been running with 5.0.1 and haven't changed for a year. > NOTE: Roxygen 6.x has some big changes and IMO we should not move to that yet. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-21367) R older version of Roxygen2 on Jenkins
[ https://issues.apache.org/jira/browse/SPARK-21367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16081720#comment-16081720 ] Felix Cheung edited comment on SPARK-21367 at 7/11/17 6:06 AM: --- I'm not sure exactly why yet, but comparing the working and non-working build working: {code} First time using roxygen2 4.0. Upgrading automatically... Writing SparkDataFrame.Rd Writing printSchema.Rd Writing schema.Rd Writing explain.Rd ... Warning messages: 1: In check_dep_version(pkg, version, compare) : Need roxygen2 >= 5.0.0 but loaded version is 4.1.1 2: In check_dep_version(pkg, version, compare) : Need roxygen2 >= 5.0.0 but loaded version is 4.1.1 * installing *source* package 'SparkR' ... {code} not working: {code} First time using roxygen2 4.0. Upgrading automatically... There were 50 or more warnings (use warnings() to see the first 50) * installing *source* package 'SparkR' ... {code} Basically, the .Rd files are not getting created (because of warnings that are not captured). That causes the CRAN check to fail with "checking for missing documentation entries ... WARNING Undocumented code objects: '%<=>%' 'add_months' 'agg' 'approxCountDistinc" (which would be expected) As explained in the description above, I'm pretty sure these were not in the build a while ago " Warning messages: 1: In check_dep_version(pkg, version, compare) : Need roxygen2 >= 5.0.0 but loaded version is 4.1.1 " was (Author: felixcheung): I'm not sure exactly why yet, but comparing the working and non-working build working: {code} First time using roxygen2 4.0. Upgrading automatically... Writing SparkDataFrame.Rd Writing printSchema.Rd Writing schema.Rd Writing explain.Rd ... Warning messages: 1: In check_dep_version(pkg, version, compare) : Need roxygen2 >= 5.0.0 but loaded version is 4.1.1 2: In check_dep_version(pkg, version, compare) : Need roxygen2 >= 5.0.0 but loaded version is 4.1.1 * installing *source* package 'SparkR' ... {code} not working: {code} First time using roxygen2 4.0. Upgrading automatically... There were 50 or more warnings (use warnings() to see the first 50) * installing *source* package 'SparkR' ... {code} Basically, the .Rd files are not getting created (because of warnings that are not captured). That causes the CRAN check to fail with "checking for missing documentation entries ... WARNING Undocumented code objects: '%<=>%' 'add_months' 'agg' 'approxCountDistinc" (which would be expected) As explained in the description above, I'm pretty sure these were not in the build a while ago " Warning messages: 1: In check_dep_version(pkg, version, compare) : Need roxygen2 >= 5.0.0 but loaded version is 4.1.1 " > R older version of Roxygen2 on Jenkins > -- > > Key: SPARK-21367 > URL: https://issues.apache.org/jira/browse/SPARK-21367 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.3.0 >Reporter: Felix Cheung >Assignee: shane knapp > > Getting this message from a recent build. > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79461/console > Warning messages: > 1: In check_dep_version(pkg, version, compare) : > Need roxygen2 >= 5.0.0 but loaded version is 4.1.1 > 2: In check_dep_version(pkg, version, compare) : > Need roxygen2 >= 5.0.0 but loaded version is 4.1.1 > * installing *source* package 'SparkR' ... > ** R > We have been running with 5.0.1 and haven't changed for a year. > NOTE: Roxygen 6.x has some big changes and IMO we should not move to that yet. 
-- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-21367) R older version of Roxygen2 on Jenkins
[ https://issues.apache.org/jira/browse/SPARK-21367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16081723#comment-16081723 ] Felix Cheung commented on SPARK-21367: -- And I'm pretty sure we should build with Roxygen2 5.0.1 https://github.com/apache/spark/blob/master/R/pkg/DESCRIPTION#L60 RoxygenNote: 5.0.1 > R older version of Roxygen2 on Jenkins > -- > > Key: SPARK-21367 > URL: https://issues.apache.org/jira/browse/SPARK-21367 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.3.0 >Reporter: Felix Cheung >Assignee: shane knapp > > Getting this message from a recent build. > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79461/console > Warning messages: > 1: In check_dep_version(pkg, version, compare) : > Need roxygen2 >= 5.0.0 but loaded version is 4.1.1 > 2: In check_dep_version(pkg, version, compare) : > Need roxygen2 >= 5.0.0 but loaded version is 4.1.1 > * installing *source* package 'SparkR' ... > ** R > We have been running with 5.0.1 and haven't changed for a year. > NOTE: Roxygen 6.x has some big changes and IMO we should not move to that yet. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-21367) R older version of Roxygen2 on Jenkins
[ https://issues.apache.org/jira/browse/SPARK-21367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16081720#comment-16081720 ] Felix Cheung edited comment on SPARK-21367 at 7/11/17 6:02 AM: --- I'm not sure exactly why yet, but comparing the working and non-working build working: {code} First time using roxygen2 4.0. Upgrading automatically... Writing SparkDataFrame.Rd Writing printSchema.Rd Writing schema.Rd Writing explain.Rd ... Warning messages: 1: In check_dep_version(pkg, version, compare) : Need roxygen2 >= 5.0.0 but loaded version is 4.1.1 2: In check_dep_version(pkg, version, compare) : Need roxygen2 >= 5.0.0 but loaded version is 4.1.1 * installing *source* package 'SparkR' ... {code} not working: {code} First time using roxygen2 4.0. Upgrading automatically... There were 50 or more warnings (use warnings() to see the first 50) * installing *source* package 'SparkR' ... {code} Basically, the .Rd files are not getting created (because of warnings that are not captured). That causes the CRAN check to fail with "checking for missing documentation entries ... WARNING Undocumented code objects: '%<=>%' 'add_months' 'agg' 'approxCountDistinc" (which would be expected) As explained in the description above, I'm pretty sure these were not in the build a while ago " Warning messages: 1: In check_dep_version(pkg, version, compare) : Need roxygen2 >= 5.0.0 but loaded version is 4.1.1 " was (Author: felixcheung): I'm not sure exactly why yet, but comparing the working and non-working build working: First time using roxygen2 4.0. Upgrading automatically... Writing SparkDataFrame.Rd Writing printSchema.Rd Writing schema.Rd Writing explain.Rd ... Warning messages: 1: In check_dep_version(pkg, version, compare) : Need roxygen2 >= 5.0.0 but loaded version is 4.1.1 2: In check_dep_version(pkg, version, compare) : Need roxygen2 >= 5.0.0 but loaded version is 4.1.1 * installing *source* package 'SparkR' ... not working: First time using roxygen2 4.0. Upgrading automatically... There were 50 or more warnings (use warnings() to see the first 50) * installing *source* package 'SparkR' ... Basically, the .Rd files are not getting created (because of warnings that are not captured). That causes the CRAN check to fail with "checking for missing documentation entries ... WARNING Undocumented code objects: '%<=>%' 'add_months' 'agg' 'approxCountDistinc" (which would be expected) As explained in the description above, I'm pretty sure these were not in the build a while ago " Warning messages: 1: In check_dep_version(pkg, version, compare) : Need roxygen2 >= 5.0.0 but loaded version is 4.1.1 " > R older version of Roxygen2 on Jenkins > -- > > Key: SPARK-21367 > URL: https://issues.apache.org/jira/browse/SPARK-21367 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.3.0 >Reporter: Felix Cheung >Assignee: shane knapp > > Getting this message from a recent build. > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79461/console > Warning messages: > 1: In check_dep_version(pkg, version, compare) : > Need roxygen2 >= 5.0.0 but loaded version is 4.1.1 > 2: In check_dep_version(pkg, version, compare) : > Need roxygen2 >= 5.0.0 but loaded version is 4.1.1 > * installing *source* package 'SparkR' ... > ** R > We have been running with 5.0.1 and haven't changed for a year. > NOTE: Roxygen 6.x has some big changes and IMO we should not move to that yet. 
-- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-21367) R older version of Roxygen2 on Jenkins
[ https://issues.apache.org/jira/browse/SPARK-21367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16081720#comment-16081720 ] Felix Cheung commented on SPARK-21367: -- I'm not sure exactly why yet, but comparing the working and non-working build working: First time using roxygen2 4.0. Upgrading automatically... Writing SparkDataFrame.Rd Writing printSchema.Rd Writing schema.Rd Writing explain.Rd ... Warning messages: 1: In check_dep_version(pkg, version, compare) : Need roxygen2 >= 5.0.0 but loaded version is 4.1.1 2: In check_dep_version(pkg, version, compare) : Need roxygen2 >= 5.0.0 but loaded version is 4.1.1 * installing *source* package 'SparkR' ... not working: First time using roxygen2 4.0. Upgrading automatically... There were 50 or more warnings (use warnings() to see the first 50) * installing *source* package 'SparkR' ... Basically, the .Rd files are not getting created (because of warnings that are not captured). That causes the CRAN check to fail with "checking for missing documentation entries ... WARNING Undocumented code objects: '%<=>%' 'add_months' 'agg' 'approxCountDistinc" (which would be expected) As explained in the description above, I'm pretty sure these were not in the build a while ago " Warning messages: 1: In check_dep_version(pkg, version, compare) : Need roxygen2 >= 5.0.0 but loaded version is 4.1.1 " > R older version of Roxygen2 on Jenkins > -- > > Key: SPARK-21367 > URL: https://issues.apache.org/jira/browse/SPARK-21367 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.3.0 >Reporter: Felix Cheung >Assignee: shane knapp > > Getting this message from a recent build. > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79461/console > Warning messages: > 1: In check_dep_version(pkg, version, compare) : > Need roxygen2 >= 5.0.0 but loaded version is 4.1.1 > 2: In check_dep_version(pkg, version, compare) : > Need roxygen2 >= 5.0.0 but loaded version is 4.1.1 > * installing *source* package 'SparkR' ... > ** R > We have been running with 5.0.1 and haven't changed for a year. > NOTE: Roxygen 6.x has some big changes and IMO we should not move to that yet. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-21367) R older version of Roxygen2 on Jenkins
[ https://issues.apache.org/jira/browse/SPARK-21367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16081712#comment-16081712 ] Felix Cheung commented on SPARK-21367: -- it looks like instead of 5.x, the older 4.0 is being loaded? First time using roxygen2 4.0. Upgrading automatically... > R older version of Roxygen2 on Jenkins > -- > > Key: SPARK-21367 > URL: https://issues.apache.org/jira/browse/SPARK-21367 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.3.0 >Reporter: Felix Cheung >Assignee: shane knapp > > Getting this message from a recent build. > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79461/console > Warning messages: > 1: In check_dep_version(pkg, version, compare) : > Need roxygen2 >= 5.0.0 but loaded version is 4.1.1 > 2: In check_dep_version(pkg, version, compare) : > Need roxygen2 >= 5.0.0 but loaded version is 4.1.1 > * installing *source* package 'SparkR' ... > ** R > We have been running with 5.0.1 and haven't changed for a year. > NOTE: Roxygen 6.x has some big changes and IMO we should not move to that yet. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-21367) R older version of Roxygen2 on Jenkins
[ https://issues.apache.org/jira/browse/SPARK-21367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Felix Cheung updated SPARK-21367: - Description: Getting this message from a recent build. https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79461/console Warning messages: 1: In check_dep_version(pkg, version, compare) : Need roxygen2 >= 5.0.0 but loaded version is 4.1.1 2: In check_dep_version(pkg, version, compare) : Need roxygen2 >= 5.0.0 but loaded version is 4.1.1 * installing *source* package 'SparkR' ... ** R We have been running with 5.0.1 and haven't changed for a year. NOTE: Roxygen 6.x has some big changes and IMO we should not move to that yet. > R older version of Roxygen2 on Jenkins > -- > > Key: SPARK-21367 > URL: https://issues.apache.org/jira/browse/SPARK-21367 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.3.0 >Reporter: Felix Cheung > > Getting this message from a recent build. > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79461/console > Warning messages: > 1: In check_dep_version(pkg, version, compare) : > Need roxygen2 >= 5.0.0 but loaded version is 4.1.1 > 2: In check_dep_version(pkg, version, compare) : > Need roxygen2 >= 5.0.0 but loaded version is 4.1.1 > * installing *source* package 'SparkR' ... > ** R > We have been running with 5.0.1 and haven't changed for a year. > NOTE: Roxygen 6.x has some big changes and IMO we should not move to that yet. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
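Since the workers apparently drifted to roxygen2 4.1.1, one option (a sketch, not the actual Jenkins provisioning step) is to pin the exact release rather than installing whatever CRAN currently serves:
{code}
# Pin roxygen2 to the version the build has been using for the past year.
# devtools::install_version fetches the exact release from the CRAN archive.
devtools::install_version("roxygen2", version = "5.0.1",
                          repos = "https://cloud.r-project.org")
packageVersion("roxygen2")   # should now report 5.0.1
{code}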
[jira] [Comment Edited] (SPARK-21367) R older version of Roxygen2 on Jenkins
[ https://issues.apache.org/jira/browse/SPARK-21367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16080728#comment-16080728 ] Felix Cheung edited comment on SPARK-21367 at 7/10/17 5:45 PM: --- [~shaneknapp] could you check? thanks! was (Author: felixcheung): ~shane knapp could you check? thanks! > R older version of Roxygen2 on Jenkins > -- > > Key: SPARK-21367 > URL: https://issues.apache.org/jira/browse/SPARK-21367 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.3.0 >Reporter: Felix Cheung > > Getting this message from a recent build. > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79461/console > Warning messages: > 1: In check_dep_version(pkg, version, compare) : > Need roxygen2 >= 5.0.0 but loaded version is 4.1.1 > 2: In check_dep_version(pkg, version, compare) : > Need roxygen2 >= 5.0.0 but loaded version is 4.1.1 > * installing *source* package 'SparkR' ... > ** R > We have been running with 5.0.1 and haven't changed for a year. > NOTE: Roxygen 6.x has some big changes and IMO we should not move to that yet. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-21367) R older version of Roxygen2 on Jenkins
[ https://issues.apache.org/jira/browse/SPARK-21367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16080728#comment-16080728 ] Felix Cheung commented on SPARK-21367: -- @shane knapp could you check? thanks! > R older version of Roxygen2 on Jenkins > -- > > Key: SPARK-21367 > URL: https://issues.apache.org/jira/browse/SPARK-21367 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.3.0 >Reporter: Felix Cheung > > Getting this message from a recent build. > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79461/console > Warning messages: > 1: In check_dep_version(pkg, version, compare) : > Need roxygen2 >= 5.0.0 but loaded version is 4.1.1 > 2: In check_dep_version(pkg, version, compare) : > Need roxygen2 >= 5.0.0 but loaded version is 4.1.1 > * installing *source* package 'SparkR' ... > ** R > We have been running with 5.0.1 and haven't changed for a year. > NOTE: Roxygen 6.x has some big changes and IMO we should not move to that yet. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-21367) R older version of Roxygen2 on Jenkins
[ https://issues.apache.org/jira/browse/SPARK-21367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16080728#comment-16080728 ] Felix Cheung edited comment on SPARK-21367 at 7/10/17 5:44 PM: --- ~shane knapp could you check? thanks! was (Author: felixcheung): @shane knapp could you check? thanks! > R older version of Roxygen2 on Jenkins > -- > > Key: SPARK-21367 > URL: https://issues.apache.org/jira/browse/SPARK-21367 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.3.0 >Reporter: Felix Cheung > > Getting this message from a recent build. > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79461/console > Warning messages: > 1: In check_dep_version(pkg, version, compare) : > Need roxygen2 >= 5.0.0 but loaded version is 4.1.1 > 2: In check_dep_version(pkg, version, compare) : > Need roxygen2 >= 5.0.0 but loaded version is 4.1.1 > * installing *source* package 'SparkR' ... > ** R > We have been running with 5.0.1 and haven't changed for a year. > NOTE: Roxygen 6.x has some big changes and IMO we should not move to that yet. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-21367) R older version of Roxygen2 on Jenkins
Felix Cheung created SPARK-21367: Summary: R older version of Roxygen2 on Jenkins Key: SPARK-21367 URL: https://issues.apache.org/jira/browse/SPARK-21367 Project: Spark Issue Type: Bug Components: SparkR Affects Versions: 2.3.0 Reporter: Felix Cheung -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-21266) Support schema a DDL-formatted string in dapply/gapply/from_json
[ https://issues.apache.org/jira/browse/SPARK-21266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Felix Cheung resolved SPARK-21266. -- Resolution: Fixed Assignee: Hyukjin Kwon Fix Version/s: 2.3.0 Target Version/s: 2.3.0 > Support schema a DDL-formatted string in dapply/gapply/from_json > > > Key: SPARK-21266 > URL: https://issues.apache.org/jira/browse/SPARK-21266 > Project: Spark > Issue Type: Improvement > Components: PySpark, SparkR >Affects Versions: 2.2.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Minor > Fix For: 2.3.0 > > > A DDL-formatted string is now supported in the schema API of the dataframe > reader/writer across the other language APIs. > {{from_json}} in R/Python does not appear to support this. > Also, it could be done in other commonly used APIs in R, specifically > {{dapply}}/{{gapply}}. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
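With this resolved for 2.3.0, the schema argument of these SparkR APIs accepts a DDL-formatted string where a structType was previously required. A minimal sketch of the dapply case (the data and column names are illustrative):
{code}
library(SparkR)
sparkR.session()   # assumes a local Spark installation

df <- createDataFrame(data.frame(a = 1:3, b = c(0.1, 0.2, 0.3)))

# Schema given as a DDL-formatted string, instead of
# structType(structField("a", "integer"), structField("b", "double"))
doubled <- dapply(df, function(x) { x$b <- x$b * 2; x }, "a INT, b DOUBLE")
head(collect(doubled))
{code}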
[jira] [Commented] (SPARK-19568) Must include class/method documentation for CRAN check
[ https://issues.apache.org/jira/browse/SPARK-19568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16079325#comment-16079325 ] Felix Cheung commented on SPARK-19568: -- separate from CRAN task > Must include class/method documentation for CRAN check > -- > > Key: SPARK-19568 > URL: https://issues.apache.org/jira/browse/SPARK-19568 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.1.0 >Reporter: Felix Cheung >Assignee: Felix Cheung > > While tests are running, R CMD check --as-cran is still complaining > {code} > * checking for missing documentation entries ... WARNING > Undocumented code objects: > ‘add_months’ ‘agg’ ‘approxCountDistinct’ ‘approxQuantile’ ‘arrange’ > ‘array_contains’ ‘as.DataFrame’ ‘as.data.frame’ ‘asc’ ‘ascii’ ‘avg’ > ‘base64’ ‘between’ ‘bin’ ‘bitwiseNOT’ ‘bround’ ‘cache’ ‘cacheTable’ > ‘cancelJobGroup’ ‘cast’ ‘cbrt’ ‘ceil’ ‘clearCache’ ‘clearJobGroup’ > ‘collect’ ‘colnames’ ‘colnames<-’ ‘coltypes’ ‘coltypes<-’ ‘column’ > ‘columns’ ‘concat’ ‘concat_ws’ ‘contains’ ‘conv’ ‘corr’ ‘count’ > ‘countDistinct’ ‘cov’ ‘covar_pop’ ‘covar_samp’ ‘crc32’ > ‘createDataFrame’ ‘createExternalTable’ ‘createOrReplaceTempView’ > ‘crossJoin’ ‘crosstab’ ‘cume_dist’ ‘dapply’ ‘dapplyCollect’ > ‘date_add’ ‘date_format’ ‘date_sub’ ‘datediff’ ‘dayofmonth’ > ‘dayofyear’ ‘decode’ ‘dense_rank’ ‘desc’ ‘describe’ ‘distinct’ ‘drop’ > ... > {code} > This is because of the lack of .Rd files in a clean environment when running > against the content of the R source package. > I think we need to generate the .Rd files under man/ when building the > release and then include them in the package. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
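The fix described in the issue amounts to running roxygen2 at release-build time so that man/ ships inside the source tarball and a clean check environment finds the documentation entries. A rough sketch of that order of operations, with "pkg" standing in for the SparkR package directory (the path is an assumption):
{code}
# Generate the .Rd files into man/ before building the source package.
roxygen2::roxygenise("pkg")

# Then, from the shell:
#   R CMD build pkg
#   R CMD check --as-cran SparkR_*.tar.gz
{code}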
[jira] [Updated] (SPARK-19568) Must include class/method documentation for CRAN check
[ https://issues.apache.org/jira/browse/SPARK-19568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Felix Cheung updated SPARK-19568: - Issue Type: Bug (was: Sub-task) Parent: (was: SPARK-15799) > Must include class/method documentation for CRAN check > -- > > Key: SPARK-19568 > URL: https://issues.apache.org/jira/browse/SPARK-19568 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.1.0 >Reporter: Felix Cheung >Assignee: Felix Cheung > > While tests are running, R CMD check --as-cran is still complaining > {code} > * checking for missing documentation entries ... WARNING > Undocumented code objects: > ‘add_months’ ‘agg’ ‘approxCountDistinct’ ‘approxQuantile’ ‘arrange’ > ‘array_contains’ ‘as.DataFrame’ ‘as.data.frame’ ‘asc’ ‘ascii’ ‘avg’ > ‘base64’ ‘between’ ‘bin’ ‘bitwiseNOT’ ‘bround’ ‘cache’ ‘cacheTable’ > ‘cancelJobGroup’ ‘cast’ ‘cbrt’ ‘ceil’ ‘clearCache’ ‘clearJobGroup’ > ‘collect’ ‘colnames’ ‘colnames<-’ ‘coltypes’ ‘coltypes<-’ ‘column’ > ‘columns’ ‘concat’ ‘concat_ws’ ‘contains’ ‘conv’ ‘corr’ ‘count’ > ‘countDistinct’ ‘cov’ ‘covar_pop’ ‘covar_samp’ ‘crc32’ > ‘createDataFrame’ ‘createExternalTable’ ‘createOrReplaceTempView’ > ‘crossJoin’ ‘crosstab’ ‘cume_dist’ ‘dapply’ ‘dapplyCollect’ > ‘date_add’ ‘date_format’ ‘date_sub’ ‘datediff’ ‘dayofmonth’ > ‘dayofyear’ ‘decode’ ‘dense_rank’ ‘desc’ ‘describe’ ‘distinct’ ‘drop’ > ... > {code} > This is because of the lack of .Rd files in a clean environment when running > against the content of the R source package. > I think we need to generate the .Rd files under man/ when building the > release and then include them in the package. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-21290) R document Programmatically Specifying the Schema in SQL guide
[ https://issues.apache.org/jira/browse/SPARK-21290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Felix Cheung updated SPARK-21290: - Target Version/s: 2.3.0 > R document Programmatically Specifying the Schema in SQL guide > -- > > Key: SPARK-21290 > URL: https://issues.apache.org/jira/browse/SPARK-21290 > Project: Spark > Issue Type: Documentation > Components: SparkR, SQL >Affects Versions: 2.2.0 >Reporter: Felix Cheung >Priority: Minor > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-21290) R document Programmatically Specifying the Schema in SQL guide
[ https://issues.apache.org/jira/browse/SPARK-21290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16079323#comment-16079323 ] Felix Cheung commented on SPARK-21290: -- with some changes to schema string support on the way, let's target 2.3 > R document Programmatically Specifying the Schema in SQL guide > -- > > Key: SPARK-21290 > URL: https://issues.apache.org/jira/browse/SPARK-21290 > Project: Spark > Issue Type: Documentation > Components: SparkR, SQL >Affects Versions: 2.2.0 >Reporter: Felix Cheung >Priority: Minor > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
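For reference, the SparkR counterpart of the "Programmatically Specifying the Schema" examples in the SQL guide builds a structType by hand; a short sketch (the field names and data are illustrative):
{code}
library(SparkR)
sparkR.session()   # assumes a local Spark installation

# Build the schema programmatically instead of relying on inference.
schema <- structType(structField("name", "string"),
                     structField("age", "integer"))

people <- createDataFrame(list(list("Andy", 30L), list("Justin", 19L)), schema)
printSchema(people)
{code}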
[jira] [Resolved] (SPARK-21093) Multiple gapply execution occasionally failed in SparkR
[ https://issues.apache.org/jira/browse/SPARK-21093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Felix Cheung resolved SPARK-21093. -- Resolution: Fixed Fix Version/s: 2.3.0 > Multiple gapply execution occasionally failed in SparkR > > > Key: SPARK-21093 > URL: https://issues.apache.org/jira/browse/SPARK-21093 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.1.1, 2.2.0 > Environment: CentOS 7.2.1511 / R 3.4.0, CentOS 7.2.1511 / R 3.3.3 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Critical > Fix For: 2.3.0 > > > On CentOS 7.2.1511 with R 3.4.0/3.3.0, multiple executions of {{gapply}} occasionally > fail as below: > {code} > Welcome to > __ >/ __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ > /___/ .__/\_,_/_/ /_/\_\ version 2.3.0-SNAPSHOT > /_/ > SparkSession available as 'spark'. > > df <- createDataFrame(list(list(1L, 1, "1", 0.1)), c("a", "b", "c", "d")) > > collect(gapply(df, "a", function(key, x) { x }, schema(df))) > 17/06/14 18:21:01 WARN Utils: Truncated the string representation of a plan > since it was too large. This behavior can be adjusted by setting > 'spark.debug.maxToStringFields' in SparkEnv.conf. > a b c d > 1 1 1 1 0.1 > > collect(gapply(df, "a", function(key, x) { x }, schema(df))) > a b c d > 1 1 1 1 0.1 > > collect(gapply(df, "a", function(key, x) { x }, schema(df))) > a b c d > 1 1 1 1 0.1 > > collect(gapply(df, "a", function(key, x) { x }, schema(df))) > a b c d > 1 1 1 1 0.1 > > collect(gapply(df, "a", function(key, x) { x }, schema(df))) > a b c d > 1 1 1 1 0.1 > > collect(gapply(df, "a", function(key, x) { x }, schema(df))) > a b c d > 1 1 1 1 0.1 > > collect(gapply(df, "a", function(key, x) { x }, schema(df))) > Error in handleErrors(returnStatus, conn) : > org.apache.spark.SparkException: Job aborted due to stage failure: Task 98 > in stage 14.0 failed 1 times, most recent failure: Lost task 98.0 in stage > 14.0 (TID 1305, localhost, executor driver): org.apache.spark.SparkException: > R computation failed with > at org.apache.spark.api.r.RRunner.compute(RRunner.scala:108) > at > org.apache.spark.sql.execution.FlatMapGroupsInRExec$$anonfun$13.apply(objects.scala:432) > at > org.apache.spark.sql.execution.FlatMapGroupsInRExec$$anonfun$13.apply(objects.scala:414) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.a > ... 
> *** buffer overflow detected ***: /usr/lib64/R/bin/exec/R terminated > === Backtrace: = > /lib64/libc.so.6(__fortify_fail+0x37)[0x7fe699b3f597] > /lib64/libc.so.6(+0x10c750)[0x7fe699b3d750] > /lib64/libc.so.6(+0x10e507)[0x7fe699b3f507] > /usr/lib64/R/modules//internet.so(+0x6015)[0x7fe689bb7015] > /usr/lib64/R/modules//internet.so(+0xe81e)[0x7fe689bbf81e] > /usr/lib64/R/lib/libR.so(+0xbd1b6)[0x7fe69c54a1b6] > /usr/lib64/R/lib/libR.so(+0x1104d0)[0x7fe69c59d4d0] > /usr/lib64/R/lib/libR.so(Rf_eval+0x198)[0x7fe69c5ad138] > /usr/lib64/R/lib/libR.so(+0x1221af)[0x7fe69c5af1af] > /usr/lib64/R/lib/libR.so(Rf_eval+0x354)[0x7fe69c5ad2f4] > /usr/lib64/R/lib/libR.so(+0x123f8e)[0x7fe69c5b0f8e] > /usr/lib64/R/lib/libR.so(Rf_eval+0x589)[0x7fe69c5ad529] > /usr/lib64/R/lib/libR.so(+0x1254ce)[0x7fe69c5b24ce] > /usr/lib64/R/lib/libR.so(+0x1104d0)[0x7fe69c59d4d0] > /usr/lib64/R/lib/libR.so(Rf_eval+0x198)[0x7fe69c5ad138] > /usr/lib64/R/lib/libR.so(+0x1221af)[0x7fe69c5af1af] > /usr/lib64/R/lib/libR.so(+0x119101)[0x7fe69c5a6101] > /usr/lib64/R/lib/libR.so(Rf_eval+0x198)[0x7fe69c5ad138] > /usr/lib64/R/lib/libR.so(+0x120a7e)[0x7fe69c5ada7e] > /usr/lib64/R/lib/libR.so(Rf_eval+0x817)[0x7fe69c5ad7b7] > /usr/lib64/R/lib/libR.so(+0x1256d1)[0x7fe69c5b26d1] > /usr/lib64/R/lib/libR.so(+0x1552e9)[0x7fe69c5e22e9] > /usr/lib64/R/lib/libR.so(+0x11062a)[0x7fe69c59d62a] > /usr/lib64/R/lib/libR.so(Rf_eval+0x198)[0x7fe69c5ad138] > /usr/lib64/R/lib/libR.so(+0x1221af)[0x7fe69c5af1af] > /usr/lib64/R/lib/libR.so(+0x119101)[0x7fe69c5a6101] > /usr/lib64/R/lib/libR.so(Rf_eval+0x198)[0x7fe69c5ad138] > /usr/lib64/R/lib/libR.so(+0x1221af)[0x7fe69c5af1af] > /usr/lib64/R/lib/libR.so(+0x119101)[0x7fe69c5a6101] > /usr/lib64/R/lib/libR.so(Rf_eval+0x198)[0x7fe69c5ad138] > /usr/lib64/R/lib/libR.so(+0x120a7e)[0x7fe69c5ada7e] >
[jira] [Assigned] (SPARK-20456) Add examples for functions collection for pyspark
[ https://issues.apache.org/jira/browse/SPARK-20456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Felix Cheung reassigned SPARK-20456: Assignee: Michael Patterson > Add examples for functions collection for pyspark > - > > Key: SPARK-20456 > URL: https://issues.apache.org/jira/browse/SPARK-20456 > Project: Spark > Issue Type: Documentation > Components: Documentation, PySpark >Affects Versions: 2.1.0 >Reporter: Michael Patterson >Assignee: Michael Patterson >Priority: Minor > Fix For: 2.3.0 > > > Document sql.functions.py: > 1. Add examples for the common string functions (upper, lower, and reverse) > 2. Rename columns in datetime examples to be more informative (e.g. from 'd' > to 'date') > 3. Add examples for unix_timestamp, from_unixtime, rand, randn, collect_list, > collect_set, lit, > 4. Add note to all trigonometry functions that units are radians. > 5. Add links between functions, (e.g. add link to radians from toRadians) -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-20456) Add examples for functions collection for pyspark
[ https://issues.apache.org/jira/browse/SPARK-20456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Felix Cheung resolved SPARK-20456. -- Resolution: Fixed Fix Version/s: 2.3.0 Target Version/s: 2.3.0 > Add examples for functions collection for pyspark > - > > Key: SPARK-20456 > URL: https://issues.apache.org/jira/browse/SPARK-20456 > Project: Spark > Issue Type: Documentation > Components: Documentation, PySpark >Affects Versions: 2.1.0 >Reporter: Michael Patterson >Assignee: Michael Patterson >Priority: Minor > Fix For: 2.3.0 > > > Document sql.functions.py: > 1. Add examples for the common string functions (upper, lower, and reverse) > 2. Rename columns in datetime examples to be more informative (e.g. from 'd' > to 'date') > 3. Add examples for unix_timestamp, from_unixtime, rand, randn, collect_list, > collect_set, lit, > 4. Add note to all trigonometry functions that units are radians. > 5. Add links between functions, (e.g. add link to radians from toRadians) -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org