[jira] [Comment Edited] (SPARK-22281) Handle R method breaking signature changes

2017-10-23 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16214751#comment-16214751
 ] 

Felix Cheung edited comment on SPARK-22281 at 10/23/17 7:30 AM:


ok, I have a solution for both. It turns out the fix for glm is quite a bit 
different from the one for attach, so I added the error above.

With attach, we need to match the signature of base::attach. Since that 
signature changes between R versions, we are going to generate it at runtime by 
pulling from base::attach directly.

In short, with glm the .Rd pulls in the function definition (i.e. the "usage") 
from the stats::glm function. Since this is "compiled in" when we build the 
source package, it won't match the latest signature when/if stats::glm changes 
at runtime or in the CRAN check.
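
A rough sketch of the runtime-generation idea for attach (illustrative only, not 
the actual patch; the column-copying body is stubbed out and the SparkDataFrame 
class is assumed to come from SparkR):

{code}
# Sketch: make the S4 method's formals mirror whatever base::attach looks like
# in the running version of R, so the method and the generic always agree.
attachImpl <- function(what, pos, name, warn.conflicts) {
  newEnv <- new.env()   # stand-in: SparkR would copy the columns of `what` here
  base::attach(newEnv, pos = pos, name = name, warn.conflicts = warn.conflicts)
}
# Adopt base::attach's exact argument list and defaults at runtime.
formals(attachImpl) <- formals(base::attach)

setGeneric("attach")   # implicit generic derived from the current base::attach
setMethod("attach", signature(what = "SparkDataFrame"), attachImpl)
{code}

The .Rd usage section is a separate problem, since it is fixed at the time the 
source package is built.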


was (Author: felixcheung):
ok, I have a solution for both. it turns out the fix for glm is quite a bit 
different from attach, so I added the error above.

With attach, we need to match the signature of base::attach, since it changes 
we are going to generate the signature at runtime by pulling from base::attach 
directly.

in short, with glm it's pulling in the function definition (ie. "usage") from 
the stats::glm function. Since this is "compiled in" when we build the source 
package into the .Rd, when/if it changes at runtime or in CRAN check it won't 
match the latest signature.

> Handle R method breaking signature changes
> --
>
> Key: SPARK-22281
> URL: https://issues.apache.org/jira/browse/SPARK-22281
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.2.1, 2.3.0
>Reporter: Felix Cheung
>
> As discussed here
> http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Spark-2-1-2-RC2-tt22540.html#a22555
> this WARNING on R-devel
> * checking for code/documentation mismatches ... WARNING
> Codoc mismatches from documentation object 'attach':
> attach
>   Code: function(what, pos = 2L, name = deparse(substitute(what),
>  backtick = FALSE), warn.conflicts = TRUE)
>   Docs: function(what, pos = 2L, name = deparse(substitute(what)),
>  warn.conflicts = TRUE)
>   Mismatches in argument default values:
> Name: 'name' Code: deparse(substitute(what), backtick = FALSE) Docs: 
> deparse(substitute(what))
> Codoc mismatches from documentation object 'glm':
> glm
>   Code: function(formula, family = gaussian, data, weights, subset,
>  na.action, start = NULL, etastart, mustart, offset,
>  control = list(...), model = TRUE, method = "glm.fit",
>  x = FALSE, y = TRUE, singular.ok = TRUE, contrasts =
>  NULL, ...)
>   Docs: function(formula, family = gaussian, data, weights, subset,
>  na.action, start = NULL, etastart, mustart, offset,
>  control = list(...), model = TRUE, method = "glm.fit",
>  x = FALSE, y = TRUE, contrasts = NULL, ...)
>   Argument names in code not in docs:
> singular.ok
>   Mismatches in argument names:
> Position: 16 Code: singular.ok Docs: contrasts
> Position: 17 Code: contrasts Docs: ...
> Checked the latest release R 3.4.1 and the signature change wasn't there. 
> This likely indicated an upcoming change in the next R release that could 
> incur this new warning when we attempt to publish the package.
> Not sure what we can do now since we work with multiple versions of R and 
> they will have different signatures then.






[jira] [Comment Edited] (SPARK-22281) Handle R method breaking signature changes

2017-10-23 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16214751#comment-16214751
 ] 

Felix Cheung edited comment on SPARK-22281 at 10/23/17 7:10 AM:


ok, I have a solution for both. It turns out the fix for glm is quite a bit 
different from the one for attach, so I added the error above.

With attach, we need to match the signature of base::attach. Since that 
signature changes between R versions, we are going to generate it at runtime by 
pulling from base::attach directly.

In short, with glm the .Rd pulls in the function definition (i.e. the "usage") 
from the stats::glm function. Since this is "compiled in" when we build the 
source package, it won't match the latest signature when/if stats::glm changes 
at runtime or in the CRAN check.


was (Author: felixcheung):
ok, I have a solution for both. it turns out the fix for glm is quite a bit 
different from attach, so I added the error above.

in short, with glm it's pulling in the function definition (ie. "usage") from 
the stats::glm function. Since this is "compiled in" when we build the source 
package into the .Rd, when/if it changes at runtime or in CRAN check it won't 
match the latest signature.

> Handle R method breaking signature changes
> --
>
> Key: SPARK-22281
> URL: https://issues.apache.org/jira/browse/SPARK-22281
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.2.1, 2.3.0
>Reporter: Felix Cheung
>
> As discussed here
> http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Spark-2-1-2-RC2-tt22540.html#a22555
> this WARNING on R-devel
> * checking for code/documentation mismatches ... WARNING
> Codoc mismatches from documentation object 'attach':
> attach
>   Code: function(what, pos = 2L, name = deparse(substitute(what),
>  backtick = FALSE), warn.conflicts = TRUE)
>   Docs: function(what, pos = 2L, name = deparse(substitute(what)),
>  warn.conflicts = TRUE)
>   Mismatches in argument default values:
> Name: 'name' Code: deparse(substitute(what), backtick = FALSE) Docs: 
> deparse(substitute(what))
> Codoc mismatches from documentation object 'glm':
> glm
>   Code: function(formula, family = gaussian, data, weights, subset,
>  na.action, start = NULL, etastart, mustart, offset,
>  control = list(...), model = TRUE, method = "glm.fit",
>  x = FALSE, y = TRUE, singular.ok = TRUE, contrasts =
>  NULL, ...)
>   Docs: function(formula, family = gaussian, data, weights, subset,
>  na.action, start = NULL, etastart, mustart, offset,
>  control = list(...), model = TRUE, method = "glm.fit",
>  x = FALSE, y = TRUE, contrasts = NULL, ...)
>   Argument names in code not in docs:
> singular.ok
>   Mismatches in argument names:
> Position: 16 Code: singular.ok Docs: contrasts
> Position: 17 Code: contrasts Docs: ...
> Checked the latest release R 3.4.1 and the signature change wasn't there. 
> This likely indicated an upcoming change in the next R release that could 
> incur this new warning when we attempt to publish the package.
> Not sure what we can do now since we work with multiple versions of R and 
> they will have different signatures then.






[jira] [Commented] (SPARK-22281) Handle R method breaking signature changes

2017-10-23 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16214751#comment-16214751
 ] 

Felix Cheung commented on SPARK-22281:
--

ok, I have a solution for both. It turns out the fix for glm is quite a bit 
different from the one for attach, so I added the error above.

In short, with glm the .Rd pulls in the function definition (i.e. the "usage") 
from the stats::glm function. Since this is "compiled in" when we build the 
source package, it won't match the latest signature when/if stats::glm changes 
at runtime or in the CRAN check.

> Handle R method breaking signature changes
> --
>
> Key: SPARK-22281
> URL: https://issues.apache.org/jira/browse/SPARK-22281
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.2.1, 2.3.0
>Reporter: Felix Cheung
>
> As discussed here
> http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Spark-2-1-2-RC2-tt22540.html#a22555
> this WARNING on R-devel
> * checking for code/documentation mismatches ... WARNING
> Codoc mismatches from documentation object 'attach':
> attach
>   Code: function(what, pos = 2L, name = deparse(substitute(what),
>  backtick = FALSE), warn.conflicts = TRUE)
>   Docs: function(what, pos = 2L, name = deparse(substitute(what)),
>  warn.conflicts = TRUE)
>   Mismatches in argument default values:
> Name: 'name' Code: deparse(substitute(what), backtick = FALSE) Docs: 
> deparse(substitute(what))
> Codoc mismatches from documentation object 'glm':
> glm
>   Code: function(formula, family = gaussian, data, weights, subset,
>  na.action, start = NULL, etastart, mustart, offset,
>  control = list(...), model = TRUE, method = "glm.fit",
>  x = FALSE, y = TRUE, singular.ok = TRUE, contrasts =
>  NULL, ...)
>   Docs: function(formula, family = gaussian, data, weights, subset,
>  na.action, start = NULL, etastart, mustart, offset,
>  control = list(...), model = TRUE, method = "glm.fit",
>  x = FALSE, y = TRUE, contrasts = NULL, ...)
>   Argument names in code not in docs:
> singular.ok
>   Mismatches in argument names:
> Position: 16 Code: singular.ok Docs: contrasts
> Position: 17 Code: contrasts Docs: ...
> Checked the latest release R 3.4.1 and the signature change wasn't there. 
> This likely indicated an upcoming change in the next R release that could 
> incur this new warning when we attempt to publish the package.
> Not sure what we can do now since we work with multiple versions of R and 
> they will have different signatures then.






[jira] [Updated] (SPARK-22281) Handle R method breaking signature changes

2017-10-23 Thread Felix Cheung (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung updated SPARK-22281:
-
Description: 
As discussed here
http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Spark-2-1-2-RC2-tt22540.html#a22555

this WARNING on R-devel

* checking for code/documentation mismatches ... WARNING
Codoc mismatches from documentation object 'attach':
attach
  Code: function(what, pos = 2L, name = deparse(substitute(what),
 backtick = FALSE), warn.conflicts = TRUE)
  Docs: function(what, pos = 2L, name = deparse(substitute(what)),
 warn.conflicts = TRUE)
  Mismatches in argument default values:
Name: 'name' Code: deparse(substitute(what), backtick = FALSE) Docs: 
deparse(substitute(what))

Codoc mismatches from documentation object 'glm':
glm
  Code: function(formula, family = gaussian, data, weights, subset,
 na.action, start = NULL, etastart, mustart, offset,
 control = list(...), model = TRUE, method = "glm.fit",
 x = FALSE, y = TRUE, singular.ok = TRUE, contrasts =
 NULL, ...)
  Docs: function(formula, family = gaussian, data, weights, subset,
 na.action, start = NULL, etastart, mustart, offset,
 control = list(...), model = TRUE, method = "glm.fit",
 x = FALSE, y = TRUE, contrasts = NULL, ...)
  Argument names in code not in docs:
singular.ok
  Mismatches in argument names:
Position: 16 Code: singular.ok Docs: contrasts
Position: 17 Code: contrasts Docs: ...

Checked the latest release R 3.4.1 and the signature change wasn't there. This 
likely indicated an upcoming change in the next R release that could incur this 
new warning when we attempt to publish the package.

Not sure what we can do now since we work with multiple versions of R and they 
will have different signatures then.


  was:
As discussed here
http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Spark-2-1-2-RC2-tt22540.html#a22555

this WARNING on R-devel

* checking for code/documentation mismatches ... WARNING
Codoc mismatches from documentation object 'attach':
attach
  Code: function(what, pos = 2L, name = deparse(substitute(what),
 backtick = FALSE), warn.conflicts = TRUE)
  Docs: function(what, pos = 2L, name = deparse(substitute(what)),
 warn.conflicts = TRUE)
  Mismatches in argument default values:
Name: 'name' Code: deparse(substitute(what), backtick = FALSE) Docs: 
deparse(substitute(what))

Checked the latest release R 3.4.1 and the signature change wasn't there. This 
likely indicated an upcoming change in the next R release that could incur this 
new warning when we attempt to publish the package.

Not sure what we can do now since we work with multiple versions of R and they 
will have different signatures then.



> Handle R method breaking signature changes
> --
>
> Key: SPARK-22281
> URL: https://issues.apache.org/jira/browse/SPARK-22281
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.2.1, 2.3.0
>Reporter: Felix Cheung
>
> As discussed here
> http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Spark-2-1-2-RC2-tt22540.html#a22555
> this WARNING on R-devel
> * checking for code/documentation mismatches ... WARNING
> Codoc mismatches from documentation object 'attach':
> attach
>   Code: function(what, pos = 2L, name = deparse(substitute(what),
>  backtick = FALSE), warn.conflicts = TRUE)
>   Docs: function(what, pos = 2L, name = deparse(substitute(what)),
>  warn.conflicts = TRUE)
>   Mismatches in argument default values:
> Name: 'name' Code: deparse(substitute(what), backtick = FALSE) Docs: 
> deparse(substitute(what))
> Codoc mismatches from documentation object 'glm':
> glm
>   Code: function(formula, family = gaussian, data, weights, subset,
>  na.action, start = NULL, etastart, mustart, offset,
>  control = list(...), model = TRUE, method = "glm.fit",
>  x = FALSE, y = TRUE, singular.ok = TRUE, contrasts =
>  NULL, ...)
>   Docs: function(formula, family = gaussian, data, weights, subset,
>  na.action, start = NULL, etastart, mustart, offset,
>  control = list(...), model = TRUE, method = "glm.fit",
>  x = FALSE, y = TRUE, contrasts = NULL, ...)
>   Argument names in code not in docs:
> singular.ok
>   Mismatches in argument names:
> Position: 16 Code: singular.ok Docs: contrasts
> Position: 17 Code: contrasts Docs: ...
> Checked the latest release R 3.4.1 and the signature change wasn't there. 
> This likely indicated an upcoming change in the next R release that could 
> incur this 

[jira] [Commented] (SPARK-22281) Handle R method breaking signature changes

2017-10-22 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16214431#comment-16214431
 ] 

Felix Cheung commented on SPARK-22281:
--

Tried a few things. If we remove the 
{code}
@param
{code}

then the CRAN check fails with
{code}
* checking Rd \usage sections ... WARNING
Undocumented arguments in documentation object 'attach'
  ‘what’ ‘pos’ ‘name’ ‘warn.conflicts’

Functions with \usage entries need to have the appropriate \alias
entries, and all their arguments documented.
The \usage entries must correspond to syntactically valid R code.
See chapter ‘Writing R documentation files’ in the ‘Writing R
Extensions’ manual.
{code}

If we change the method signature to

{code}
setMethod("attach",
  signature(what = "SparkDataFrame"),
  function(what, ...) {
{code}

Then it fails to install
{code}
Error in rematchDefinition(definition, fdef, mnames, fnames, signature) :
  methods can add arguments to the generic ‘attach’ only if '...' is an 
argument to the generic
Error : unable to load R code in package ‘SparkR’
ERROR: lazy loading failed for package ‘SparkR’
{code}
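
For reference, a standalone sketch (using a dummy S4 class rather than SparkR) 
of the rule behind that install error: a method can only add arguments beyond 
the generic's own formals when the generic has '...', and base::attach does not.

{code}
setClass("Dummy", slots = c(df = "data.frame"))
setGeneric("attach")   # implicit generic from base::attach, which has no '...'

# Works: the method keeps exactly the formals of base::attach.
setMethod("attach", signature(what = "Dummy"),
  function(what, pos = 2L, name = deparse(substitute(what)),
           warn.conflicts = TRUE) {
    invisible(NULL)    # real SparkR body elided in this sketch
  })

# Fails with the error quoted above, because the '...' counts as an argument
# added by the method that the generic does not have:
# setMethod("attach", signature(what = "Dummy"),
#   function(what, ...) invisible(NULL))
{code}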


> Handle R method breaking signature changes
> --
>
> Key: SPARK-22281
> URL: https://issues.apache.org/jira/browse/SPARK-22281
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.2.1, 2.3.0
>Reporter: Felix Cheung
>
> As discussed here
> http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Spark-2-1-2-RC2-tt22540.html#a22555
> this WARNING on R-devel
> * checking for code/documentation mismatches ... WARNING
> Codoc mismatches from documentation object 'attach':
> attach
>   Code: function(what, pos = 2L, name = deparse(substitute(what),
>  backtick = FALSE), warn.conflicts = TRUE)
>   Docs: function(what, pos = 2L, name = deparse(substitute(what)),
>  warn.conflicts = TRUE)
>   Mismatches in argument default values:
> Name: 'name' Code: deparse(substitute(what), backtick = FALSE) Docs: 
> deparse(substitute(what))
> Checked the latest release R 3.4.1 and the signature change wasn't there. 
> This likely indicated an upcoming change in the next R release that could 
> incur this new warning when we attempt to publish the package.
> Not sure what we can do now since we work with multiple versions of R and 
> they will have different signatures then.






[jira] [Updated] (SPARK-22327) R CRAN check fails on non-latest branches

2017-10-21 Thread Felix Cheung (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung updated SPARK-22327:
-
Description: 
with warning
* checking CRAN incoming feasibility ... WARNING
Maintainer: 'Shivaram Venkataraman '
Insufficient package version (submitted: 2.0.3, existing: 2.1.2)
Unknown, possibly mis-spelled, fields in DESCRIPTION:
  'RoxygenNote'
WARNING: There was 1 warning.
NOTE: There were 2 notes.

We have seen this in branch-1.6, branch-2.0, and this would be a problem for 
branch-2.1 after we ship 2.2.1.

The root cause is the package version check within the CRAN check: once the 
SparkR package version 2.1.2 is first published, any older version fails the 
version check. As far as we know, there is no way to skip this check.

Also, as previously noted, there is a NOTE about the new maintainer.

  was:
with warning
* checking CRAN incoming feasibility ... WARNING
Maintainer: 'Shivaram Venkataraman '
Insufficient package version (submitted: 2.0.3, existing: 2.1.2)
Unknown, possibly mis-spelled, fields in DESCRIPTION:
  'RoxygenNote'

We have seen this in branch-1.6, branch-2.0, and this would be a problem for 
branch-2.1 after we ship 2.2.1.

The root cause of the issue is in the package version check in CRAN check. 
After the SparkR package version 2.1.2 (is first) published, any older version 
is failing the version check. As far as we know, there is no way to skip this 
version check.

Also, there is previously a NOTE on new maintainer. 


> R CRAN check fails on non-latest branches
> -
>
> Key: SPARK-22327
> URL: https://issues.apache.org/jira/browse/SPARK-22327
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 1.6.4, 2.0.3, 2.1.3, 2.2.1, 2.3.0
>Reporter: Felix Cheung
>
> with warning
> * checking CRAN incoming feasibility ... WARNING
> Maintainer: 'Shivaram Venkataraman '
> Insufficient package version (submitted: 2.0.3, existing: 2.1.2)
> Unknown, possibly mis-spelled, fields in DESCRIPTION:
>   'RoxygenNote'
> WARNING: There was 1 warning.
> NOTE: There were 2 notes.
> We have seen this in branch-1.6, branch-2.0, and this would be a problem for 
> branch-2.1 after we ship 2.2.1.
> The root cause is the package version check within the CRAN check: once the 
> SparkR package version 2.1.2 is first published, any older version fails the 
> version check. As far as we know, there is no way to skip this check.
> Also, as previously noted, there is a NOTE about the new maintainer.






[jira] [Comment Edited] (SPARK-22327) R CRAN check fails on non-latest branches

2017-10-21 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213798#comment-16213798
 ] 

Felix Cheung edited comment on SPARK-22327 at 10/21/17 7:45 AM:


in contrast, this is from master

* checking CRAN incoming feasibility ... NOTE
Maintainer: 'Shivaram Venkataraman '
Unknown, possibly mis-spelled, fields in DESCRIPTION:
  'RoxygenNote'
* checking package dependencies ... NOTE
  No repository set, so cyclic dependency check skipped
* checking R code for possible problems ... NOTE
Found the following calls to attach():
File 'SparkR/R/DataFrame.R':
  attach(newEnv, pos = pos, name = name, warn.conflicts = warn.conflicts)
See section 'Good practice' in '?attach'.
NOTE: There were 3 notes.

So it should have 3 notes, or 2 notes + 1 warning, as the one note turns into a 
warning for the 'Insufficient package version' check.


was (Author: felixcheung):
in contrast, this is from master

* checking CRAN incoming feasibility ... NOTE
Maintainer: 'Shivaram Venkataraman '
Unknown, possibly mis-spelled, fields in DESCRIPTION:
  'RoxygenNote'
* checking package dependencies ... NOTE
  No repository set, so cyclic dependency check skipped
* checking R code for possible problems ... NOTE
Found the following calls to attach():
File 'SparkR/R/DataFrame.R':
  attach(newEnv, pos = pos, name = name, warn.conflicts = warn.conflicts)
See section 'Good practice' in '?attach'.
NOTE: There were 3 notes.

So it should have 3 notes, or 2 notes + 1 warning, as the one note turns into a 
warning for the 'Insufficient package version' check.

> R CRAN check fails on non-latest branches
> -
>
> Key: SPARK-22327
> URL: https://issues.apache.org/jira/browse/SPARK-22327
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 1.6.4, 2.0.3, 2.1.3, 2.2.1, 2.3.0
>Reporter: Felix Cheung
>
> with warning
> * checking CRAN incoming feasibility ... WARNING
> Maintainer: 'Shivaram Venkataraman '
> Insufficient package version (submitted: 2.0.3, existing: 2.1.2)
> Unknown, possibly mis-spelled, fields in DESCRIPTION:
>   'RoxygenNote'
> We have seen this in branch-1.6, branch-2.0, and this would be a problem for 
> branch-2.1 after we ship 2.2.1.
> The root cause is the package version check within the CRAN check: once the 
> SparkR package version 2.1.2 is first published, any older version fails the 
> version check. As far as we know, there is no way to skip this check.
> Also, as previously noted, there is a NOTE about the new maintainer.






[jira] [Commented] (SPARK-22327) R CRAN check fails on non-latest branches

2017-10-21 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213798#comment-16213798
 ] 

Felix Cheung commented on SPARK-22327:
--

in contrast, this is from master

* checking CRAN incoming feasibility ... NOTE
Maintainer: 'Shivaram Venkataraman '
Unknown, possibly mis-spelled, fields in DESCRIPTION:
  'RoxygenNote'
* checking package dependencies ... NOTE
  No repository set, so cyclic dependency check skipped
* checking R code for possible problems ... NOTE
Found the following calls to attach():
File 'SparkR/R/DataFrame.R':
  attach(newEnv, pos = pos, name = name, warn.conflicts = warn.conflicts)
See section 'Good practice' in '?attach'.
NOTE: There were 3 notes.

So it should have 3 notes, or 2 notes + 1 warning, as the one note turns into a 
warning for the 'Insufficient package version' check.

> R CRAN check fails on non-latest branches
> -
>
> Key: SPARK-22327
> URL: https://issues.apache.org/jira/browse/SPARK-22327
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 1.6.4, 2.0.3, 2.1.3, 2.2.1, 2.3.0
>Reporter: Felix Cheung
>
> with warning
> * checking CRAN incoming feasibility ... WARNING
> Maintainer: 'Shivaram Venkataraman '
> Insufficient package version (submitted: 2.0.3, existing: 2.1.2)
> Unknown, possibly mis-spelled, fields in DESCRIPTION:
>   'RoxygenNote'
> We have seen this in branch-1.6, branch-2.0, and this would be a problem for 
> branch-2.1 after we ship 2.2.1.
> The root cause is the package version check within the CRAN check: once the 
> SparkR package version 2.1.2 is first published, any older version fails the 
> version check. As far as we know, there is no way to skip this check.
> Also, as previously noted, there is a NOTE about the new maintainer.






[jira] [Comment Edited] (SPARK-22327) R CRAN check fails on non-latest branches

2017-10-21 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213798#comment-16213798
 ] 

Felix Cheung edited comment on SPARK-22327 at 10/21/17 7:45 AM:


in contrast, this is from master 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82919/consoleFull

* checking CRAN incoming feasibility ... NOTE
Maintainer: 'Shivaram Venkataraman '
Unknown, possibly mis-spelled, fields in DESCRIPTION:
  'RoxygenNote'
* checking package dependencies ... NOTE
  No repository set, so cyclic dependency check skipped
* checking R code for possible problems ... NOTE
Found the following calls to attach():
File 'SparkR/R/DataFrame.R':
  attach(newEnv, pos = pos, name = name, warn.conflicts = warn.conflicts)
See section 'Good practice' in '?attach'.
NOTE: There were 3 notes.

So it should have 3 notes, or 2 notes + 1 warning, as the one note turns into a 
warning for the 'Insufficient package version' check.


was (Author: felixcheung):
in contrast, this is from master

* checking CRAN incoming feasibility ... NOTE
Maintainer: 'Shivaram Venkataraman '
Unknown, possibly mis-spelled, fields in DESCRIPTION:
  'RoxygenNote'
* checking package dependencies ... NOTE
  No repository set, so cyclic dependency check skipped
* checking R code for possible problems ... NOTE
Found the following calls to attach():
File 'SparkR/R/DataFrame.R':
  attach(newEnv, pos = pos, name = name, warn.conflicts = warn.conflicts)
See section 'Good practice' in '?attach'.
NOTE: There were 3 notes.

So it should have 3 notes, or 2 notes + 1 warning, as the one note turns into a 
warning for the 'Insufficient package version' check.

> R CRAN check fails on non-latest branches
> -
>
> Key: SPARK-22327
> URL: https://issues.apache.org/jira/browse/SPARK-22327
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 1.6.4, 2.0.3, 2.1.3, 2.2.1, 2.3.0
>Reporter: Felix Cheung
>
> with warning
> * checking CRAN incoming feasibility ... WARNING
> Maintainer: 'Shivaram Venkataraman '
> Insufficient package version (submitted: 2.0.3, existing: 2.1.2)
> Unknown, possibly mis-spelled, fields in DESCRIPTION:
>   'RoxygenNote'
> We have seen this in branch-1.6, branch-2.0, and this would be a problem for 
> branch-2.1 after we ship 2.2.1.
> The root cause is the package version check within the CRAN check: once the 
> SparkR package version 2.1.2 is first published, any older version fails the 
> version check. As far as we know, there is no way to skip this check.
> Also, as previously noted, there is a NOTE about the new maintainer.






[jira] [Updated] (SPARK-22327) R CRAN check fails on non-latest branches

2017-10-21 Thread Felix Cheung (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung updated SPARK-22327:
-
Description: 
with warning
* checking CRAN incoming feasibility ... WARNING
Maintainer: 'Shivaram Venkataraman '
Insufficient package version (submitted: 2.0.3, existing: 2.1.2)
Unknown, possibly mis-spelled, fields in DESCRIPTION:
  'RoxygenNote'

We have seen this in branch-1.6, branch-2.0, and this would be a problem for 
branch-2.1 after we ship 2.2.1.

The root cause is the package version check within the CRAN check: once the 
SparkR package version 2.1.2 is first published, any older version fails the 
version check. As far as we know, there is no way to skip this check.

Also, as previously noted, there is a NOTE about the new maintainer.

  was:
with warning
Insufficient package version (submitted: 2.0.3, existing: 2.1.2)

We have seen this in branch-1.6, branch-2.0, and this would be a problem for 
branch-2.1 after we ship 2.2.1.

The root cause of the issue is in the package version check in CRAN check. 
After the SparkR package version 2.1.2 (is first) published, any older version 
is failing the version check. As far as we know, there is no way to skip this 
version check.

Also, there is previously a NOTE on new maintainer. 


> R CRAN check fails on non-latest branches
> -
>
> Key: SPARK-22327
> URL: https://issues.apache.org/jira/browse/SPARK-22327
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 1.6.4, 2.0.3, 2.1.3, 2.2.1, 2.3.0
>Reporter: Felix Cheung
>
> with warning
> * checking CRAN incoming feasibility ... WARNING
> Maintainer: 'Shivaram Venkataraman '
> Insufficient package version (submitted: 2.0.3, existing: 2.1.2)
> Unknown, possibly mis-spelled, fields in DESCRIPTION:
>   'RoxygenNote'
> We have seen this in branch-1.6, branch-2.0, and this would be a problem for 
> branch-2.1 after we ship 2.2.1.
> The root cause is the package version check within the CRAN check: once the 
> SparkR package version 2.1.2 is first published, any older version fails the 
> version check. As far as we know, there is no way to skip this check.
> Also, as previously noted, there is a NOTE about the new maintainer.






[jira] [Updated] (SPARK-22327) R CRAN check fails on non-latest branches

2017-10-21 Thread Felix Cheung (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung updated SPARK-22327:
-
Description: 
with warning
Insufficient package version (submitted: 2.0.3, existing: 2.1.2)

We have seen this in branch-1.6, branch-2.0, and this would be a problem for 
branch-2.1 after we ship 2.2.1.

The root cause is the package version check within the CRAN check: once the 
SparkR package version 2.1.2 is first published, any older version fails the 
version check. As far as we know, there is no way to skip this check.

Also, as previously noted, there is a NOTE about the new maintainer.

  was:
with error
Insufficient package version (submitted: 2.0.3, existing: 2.1.2)

We have seen this in branch-1.6, branch-2.0, and this would be a problem for 
branch-2.1 after we ship 2.2.1


> R CRAN check fails on non-latest branches
> -
>
> Key: SPARK-22327
> URL: https://issues.apache.org/jira/browse/SPARK-22327
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 1.6.4, 2.0.3, 2.1.3, 2.2.1, 2.3.0
>Reporter: Felix Cheung
>
> with warning
> Insufficient package version (submitted: 2.0.3, existing: 2.1.2)
> We have seen this in branch-1.6, branch-2.0, and this would be a problem for 
> branch-2.1 after we ship 2.2.1.
> The root cause is the package version check within the CRAN check: once the 
> SparkR package version 2.1.2 is first published, any older version fails the 
> version check. As far as we know, there is no way to skip this check.
> Also, as previously noted, there is a NOTE about the new maintainer.






[jira] [Commented] (SPARK-22327) R CRAN check fails on non-latest branches

2017-10-21 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213781#comment-16213781
 ] 

Felix Cheung commented on SPARK-22327:
--

https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3956/consoleFull

> R CRAN check fails on non-latest branches
> -
>
> Key: SPARK-22327
> URL: https://issues.apache.org/jira/browse/SPARK-22327
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 1.6.4, 2.0.3, 2.1.3, 2.2.1, 2.3.0
>Reporter: Felix Cheung
>
> with error
> Insufficient package version (submitted: 2.0.3, existing: 2.1.2)
> We have seen this in branch-1.6, branch-2.0, and this would be a problem for 
> branch-2.1 after we ship 2.2.1






[jira] [Updated] (SPARK-22327) R CRAN check fails on non-latest branches

2017-10-21 Thread Felix Cheung (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung updated SPARK-22327:
-
Affects Version/s: 2.3.0

> R CRAN check fails on non-latest branches
> -
>
> Key: SPARK-22327
> URL: https://issues.apache.org/jira/browse/SPARK-22327
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 1.6.4, 2.0.3, 2.1.3, 2.2.1, 2.3.0
>Reporter: Felix Cheung
>
> with error
> Insufficient package version (submitted: 2.0.3, existing: 2.1.2)
> We have seen this in branch-1.6, branch-2.0, and this would be a problem for 
> branch-2.1 after we ship 2.2.1






[jira] [Updated] (SPARK-22327) R CRAN check fails on non-latest branches

2017-10-21 Thread Felix Cheung (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung updated SPARK-22327:
-
Affects Version/s: 2.2.1

> R CRAN check fails on non-latest branches
> -
>
> Key: SPARK-22327
> URL: https://issues.apache.org/jira/browse/SPARK-22327
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 1.6.4, 2.0.3, 2.1.3, 2.2.1
>Reporter: Felix Cheung
>
> with error
> Insufficient package version (submitted: 2.0.3, existing: 2.1.2)
> We have seen this in branch-1.6, branch-2.0, and this would be a problem for 
> branch-2.1 after we ship 2.2.1






[jira] [Created] (SPARK-22327) R CRAN check fails on non-latest branches

2017-10-21 Thread Felix Cheung (JIRA)
Felix Cheung created SPARK-22327:


 Summary: R CRAN check fails on non-latest branches
 Key: SPARK-22327
 URL: https://issues.apache.org/jira/browse/SPARK-22327
 Project: Spark
  Issue Type: Bug
  Components: SparkR
Affects Versions: 1.6.4, 2.0.3, 2.1.3
Reporter: Felix Cheung


with error
Insufficient package version (submitted: 2.0.3, existing: 2.1.2)

We have seen this in branch-1.6, branch-2.0, and this would be a problem for 
branch-2.1 after we ship 2.2.1






[jira] [Commented] (SPARK-17608) Long type has incorrect serialization/deserialization

2017-10-15 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16205379#comment-16205379
 ] 

Felix Cheung commented on SPARK-17608:
--

Any takers on this?

> Long type has incorrect serialization/deserialization
> -
>
> Key: SPARK-17608
> URL: https://issues.apache.org/jira/browse/SPARK-17608
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.0.0
>Reporter: Thomas Powell
>
> Am hitting issues when using {{dapply}} on a data frame that contains a 
> {{bigint}} in its schema. When this is converted to a SparkR data frame a 
> "bigint" gets converted to a R {{numeric}} type: 
> https://github.com/apache/spark/blob/master/R/pkg/R/types.R#L25.
> However, the R {{numeric}} type gets converted to 
> {{org.apache.spark.sql.types.DoubleType}}: 
> https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/api/r/SQLUtils.scala#L97.
> The two directions therefore aren't compatible. If I use the same schema when 
> using dapply (and just an identity function) I will get type collisions 
> because the output type is a double but the schema expects a bigint. 
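
A minimal SparkR sketch of the round trip described above (the query and column 
name are made up for illustration; this is not code from the report):

{code}
library(SparkR)
sparkR.session()

df <- sql("SELECT CAST(1 AS BIGINT) AS id")   # `id` is a bigint on the JVM side
printSchema(df)                               # id: long

# dapply with an identity function and the original schema: the R worker sees
# `id` as numeric (double), and writing it back against the bigint column in
# the schema is where the reported type collision shows up.
out <- dapply(df, function(x) { x }, schema(df))
collect(out)
{code}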






[jira] [Updated] (SPARK-22281) Handle R method breaking signature changes

2017-10-15 Thread Felix Cheung (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung updated SPARK-22281:
-
Description: 
As discussed here
http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Spark-2-1-2-RC2-tt22540.html#a22555

this WARNING on R-devel

* checking for code/documentation mismatches ... WARNING
Codoc mismatches from documentation object 'attach':
attach
  Code: function(what, pos = 2L, name = deparse(substitute(what),
 backtick = FALSE), warn.conflicts = TRUE)
  Docs: function(what, pos = 2L, name = deparse(substitute(what)),
 warn.conflicts = TRUE)
  Mismatches in argument default values:
Name: 'name' Code: deparse(substitute(what), backtick = FALSE) Docs: 
deparse(substitute(what))

Checked the latest release R 3.4.1 and the signature change wasn't there. This 
likely indicated an upcoming change in the next R release that could incur this 
new warning when we attempt to publish the package.

Not sure what we can do now since we work with multiple versions of R and they 
will have different signatures then.


  was:
As discussed here
http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Spark-2-1-2-RC2-tt22540.html#a22555

this WARNING on R-devel

* checking for code/documentation mismatches ... WARNING
Codoc mismatches from documentation object 'attach':
attach
  Code: function(what, pos = 2L, name = deparse(substitute(what),
 backtick = FALSE), warn.conflicts = TRUE)
  Docs: function(what, pos = 2L, name = deparse(substitute(what)),
 warn.conflicts = TRUE)
  Mismatches in argument default values:
Name: 'name' Code: deparse(substitute(what), backtick = FALSE) Docs: 
deparse(substitute(what))

Checked the latest release R 3.4.1 and the signature change wasn't there. This 
likely indicated an upcoming change in the next R release that could insur this 
new warning when we attempt to publish the package.

Not sure what we can do now since we work with multiple versions of R and they 
will have different signatures then.



> Handle R method breaking signature changes
> --
>
> Key: SPARK-22281
> URL: https://issues.apache.org/jira/browse/SPARK-22281
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.2.1, 2.3.0
>Reporter: Felix Cheung
>
> As discussed here
> http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Spark-2-1-2-RC2-tt22540.html#a22555
> this WARNING on R-devel
> * checking for code/documentation mismatches ... WARNING
> Codoc mismatches from documentation object 'attach':
> attach
>   Code: function(what, pos = 2L, name = deparse(substitute(what),
>  backtick = FALSE), warn.conflicts = TRUE)
>   Docs: function(what, pos = 2L, name = deparse(substitute(what)),
>  warn.conflicts = TRUE)
>   Mismatches in argument default values:
> Name: 'name' Code: deparse(substitute(what), backtick = FALSE) Docs: 
> deparse(substitute(what))
> Checked the latest release R 3.4.1 and the signature change wasn't there. 
> This likely indicated an upcoming change in the next R release that could 
> incur this new warning when we attempt to publish the package.
> Not sure what we can do now since we work with multiple versions of R and 
> they will have different signatures then.






[jira] [Commented] (SPARK-22281) Handle R method breaking signature changes

2017-10-15 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16205313#comment-16205313
 ] 

Felix Cheung commented on SPARK-22281:
--

And here for all r-devel WARN
https://cran.r-project.org/web/checks/check_results_SparkR.html


> Handle R method breaking signature changes
> --
>
> Key: SPARK-22281
> URL: https://issues.apache.org/jira/browse/SPARK-22281
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.2.1, 2.3.0
>Reporter: Felix Cheung
>
> As discussed here
> http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Spark-2-1-2-RC2-tt22540.html#a22555
> this WARNING on R-devel
> * checking for code/documentation mismatches ... WARNING
> Codoc mismatches from documentation object 'attach':
> attach
>   Code: function(what, pos = 2L, name = deparse(substitute(what),
>  backtick = FALSE), warn.conflicts = TRUE)
>   Docs: function(what, pos = 2L, name = deparse(substitute(what)),
>  warn.conflicts = TRUE)
>   Mismatches in argument default values:
> Name: 'name' Code: deparse(substitute(what), backtick = FALSE) Docs: 
> deparse(substitute(what))
> Checked the latest release R 3.4.1 and the signature change wasn't there. 
> This likely indicated an upcoming change in the next R release that could 
> incur this new warning when we attempt to publish the package. 
> Not sure what we can do now since we work with multiple versions of R and 
> they will have different signatures then.






[jira] [Created] (SPARK-22281) Handle R method breaking signature changes

2017-10-15 Thread Felix Cheung (JIRA)
Felix Cheung created SPARK-22281:


 Summary: Handle R method breaking signature changes
 Key: SPARK-22281
 URL: https://issues.apache.org/jira/browse/SPARK-22281
 Project: Spark
  Issue Type: Bug
  Components: SparkR
Affects Versions: 2.2.1, 2.3.0
Reporter: Felix Cheung


As discussed here
http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Spark-2-1-2-RC2-tt22540.html#a22555

this WARNING on R-devel

* checking for code/documentation mismatches ... WARNING
Codoc mismatches from documentation object 'attach':
attach
  Code: function(what, pos = 2L, name = deparse(substitute(what),
 backtick = FALSE), warn.conflicts = TRUE)
  Docs: function(what, pos = 2L, name = deparse(substitute(what)),
 warn.conflicts = TRUE)
  Mismatches in argument default values:
Name: 'name' Code: deparse(substitute(what), backtick = FALSE) Docs: 
deparse(substitute(what))

Checked the latest release R 3.4.1 and the signature change wasn't there. This 
likely indicated an upcoming change in the next R release that could incur this 
new warning when we attempt to publish the package.

Not sure what we can do now since we work with multiple versions of R and they 
will have different signatures then.







[jira] [Commented] (SPARK-19700) Design an API for pluggable scheduler implementations

2017-10-11 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16200659#comment-16200659
 ] 

Felix Cheung commented on SPARK-19700:
--

Not that I'm aware of - I agree it is very important to take feedback from all 
these different efforts into consideration when we come up with the plan.

I was thinking about starting to draft a plan based on the k8s effort. Does 
anyone have a better suggestion on how we should start on this?


> Design an API for pluggable scheduler implementations
> -
>
> Key: SPARK-19700
> URL: https://issues.apache.org/jira/browse/SPARK-19700
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0
>Reporter: Matt Cheah
>
> One point that was brought up in discussing SPARK-18278 was that schedulers 
> cannot easily be added to Spark without forking the whole project. The main 
> reason is that much of the scheduler's behavior fundamentally depends on the 
> CoarseGrainedSchedulerBackend class, which is not part of the public API of 
> Spark and is in fact quite a complex module. As resource management and 
> allocation continues evolves, Spark will need to be integrated with more 
> cluster managers, but maintaining support for all possible allocators in the 
> Spark project would be untenable. Furthermore, it would be impossible for 
> Spark to support proprietary frameworks that are developed by specific users 
> for their other particular use cases.
> Therefore, this ticket proposes making scheduler implementations fully 
> pluggable. The idea is that Spark will provide a Java/Scala interface that is 
> to be implemented by a scheduler that is backed by the cluster manager of 
> interest. The user can compile their scheduler's code into a JAR that is 
> placed on the driver's classpath. Finally, as is the case in the current 
> world, the scheduler implementation is selected and dynamically loaded 
> depending on the user's provided master URL.
> Determining the correct API is the most challenging problem. The current 
> CoarseGrainedSchedulerBackend handles many responsibilities, some of which 
> will be common across all cluster managers, and some which will be specific 
> to a particular cluster manager. For example, the particular mechanism for 
> creating the executor processes will differ between YARN and Mesos, but, once 
> these executors have started running, the means to submit tasks to them over 
> the Netty RPC is identical across the board.
> We must also consider a plugin model and interface for submitting the 
> application as well, because different cluster managers support different 
> configuration options, and thus the driver must be bootstrapped accordingly. 
> For example, in YARN mode the application and Hadoop configuration must be 
> packaged and shipped to the distributed cache prior to launching the job. A 
> prototype of a Kubernetes implementation starts a Kubernetes pod that runs 
> the driver in cluster mode.






[jira] [Commented] (SPARK-17275) Flaky test: org.apache.spark.deploy.RPackageUtilsSuite.jars that don't exist are skipped and print warning

2017-10-08 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16196380#comment-16196380
 ] 

Felix Cheung commented on SPARK-17275:
--

Perhaps we should close this? It's been a year...

> Flaky test: org.apache.spark.deploy.RPackageUtilsSuite.jars that don't exist 
> are skipped and print warning
> --
>
> Key: SPARK-17275
> URL: https://issues.apache.org/jira/browse/SPARK-17275
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Reporter: Yin Huai
>
> https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.4/1623/testReport/junit/org.apache.spark.deploy/RPackageUtilsSuite/jars_that_don_t_exist_are_skipped_and_print_warning/
> {code}
> Error Message
> java.io.IOException: Unable to delete directory 
> /home/jenkins/.ivy2/cache/a/mylib.
> Stacktrace
> sbt.ForkMain$ForkError: java.io.IOException: Unable to delete directory 
> /home/jenkins/.ivy2/cache/a/mylib.
>   at org.apache.commons.io.FileUtils.deleteDirectory(FileUtils.java:1541)
>   at org.apache.commons.io.FileUtils.forceDelete(FileUtils.java:2270)
>   at org.apache.commons.io.FileUtils.cleanDirectory(FileUtils.java:1653)
>   at org.apache.commons.io.FileUtils.deleteDirectory(FileUtils.java:1535)
>   at 
> org.apache.spark.deploy.IvyTestUtils$.purgeLocalIvyCache(IvyTestUtils.scala:394)
>   at 
> org.apache.spark.deploy.IvyTestUtils$.withRepository(IvyTestUtils.scala:384)
>   at 
> org.apache.spark.deploy.RPackageUtilsSuite$$anonfun$3.apply$mcV$sp(RPackageUtilsSuite.scala:103)
>   at 
> org.apache.spark.deploy.RPackageUtilsSuite$$anonfun$3.apply(RPackageUtilsSuite.scala:100)
>   at 
> org.apache.spark.deploy.RPackageUtilsSuite$$anonfun$3.apply(RPackageUtilsSuite.scala:100)
>   at 
> org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
>   at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
>   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
>   at org.scalatest.Transformer.apply(Transformer.scala:22)
>   at org.scalatest.Transformer.apply(Transformer.scala:20)
>   at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166)
>   at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:57)
>   at 
> org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
>   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
>   at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175)
>   at 
> org.apache.spark.deploy.RPackageUtilsSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(RPackageUtilsSuite.scala:38)
>   at 
> org.scalatest.BeforeAndAfterEach$class.runTest(BeforeAndAfterEach.scala:255)
>   at 
> org.apache.spark.deploy.RPackageUtilsSuite.runTest(RPackageUtilsSuite.scala:38)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
>   at 
> org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413)
>   at 
> org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
>   at 
> org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396)
>   at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483)
>   at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208)
>   at org.scalatest.FunSuite.runTests(FunSuite.scala:1555)
>   at org.scalatest.Suite$class.run(Suite.scala:1424)
>   at 
> org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
>   at org.scalatest.SuperEngine.runImpl(Engine.scala:545)
>   at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212)
>   at 
> org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:29)
>   at 
> org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:257)
>   at 
> org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:256)
>   at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:29)
>   at 
> org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:357)
>  

[jira] [Commented] (SPARK-22202) Release tgz content differences for python and R

2017-10-05 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16194141#comment-16194141
 ] 

Felix Cheung commented on SPARK-22202:
--

[~holden.ka...@gmail.com] actually, I think for R we would go the other way: we 
would want to include what is currently only in the hadoop2.6 build in all the 
other release profiles (i.e. run *this*, then create the tgz), so I think the 
approaches are potentially opposite for R and Python.

> Release tgz content differences for python and R
> 
>
> Key: SPARK-22202
> URL: https://issues.apache.org/jira/browse/SPARK-22202
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SparkR
>Affects Versions: 2.1.2, 2.2.1, 2.3.0
>Reporter: Felix Cheung
>Priority: Minor
>
> As a follow-up to SPARK-22167, we are currently running different 
> profiles/steps in make-release.sh for hadoop2.7 vs hadoop2.6 (and others); we 
> should consider whether these differences are significant and whether they 
> should be addressed.
> A couple of things:
> - R.../doc directory is not in any release jar except hadoop 2.6
> - python/dist, python.egg-info are not in any release jar except hadoop 2.7
> - R DESCRIPTION has a few additions
> I've checked to confirm these are the same in 2.1.1 release so this isn't a 
> regression.
> {code}
> spark-2.1.2-bin-hadoop2.6/R/lib/SparkR/doc:
> sparkr-vignettes.Rmd
> sparkr-vignettes.R
> sparkr-vignettes.html
> index.html
> Only in spark-2.1.2-bin-hadoop2.7/python: dist
> Only in spark-2.1.2-bin-hadoop2.7/python/pyspark: python
> Only in spark-2.1.2-bin-hadoop2.7/python: pyspark.egg-info
> diff -r spark-2.1.2-bin-hadoop2.7/R/lib/SparkR/DESCRIPTION 
> spark-2.1.2-bin-hadoop2.6/R/lib/SparkR/DESCRIPTION
> 25a26,27
> > NeedsCompilation: no
> > Packaged: 2017-10-03 00:42:30 UTC; holden
> 31c33
> < Built: R 3.4.1; ; 2017-10-02 23:18:21 UTC; unix
> ---
> > Built: R 3.4.1; ; 2017-10-03 00:45:27 UTC; unix
> Only in spark-2.1.2-bin-hadoop2.6/R/lib/SparkR: doc
> diff -r spark-2.1.2-bin-hadoop2.7/R/lib/SparkR/html/00Index.html 
> spark-2.1.2-bin-hadoop2.6/R/lib/SparkR/html/00Index.html
> 16a17
> > User guides, package vignettes and other 
> > documentation.
> Only in spark-2.1.2-bin-hadoop2.6/R/lib/SparkR/Meta: vignette.rds
> {code}






[jira] [Updated] (SPARK-22202) Release tgz content differences for python and R

2017-10-05 Thread Felix Cheung (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung updated SPARK-22202:
-
Priority: Minor  (was: Major)

> Release tgz content differences for python and R
> 
>
> Key: SPARK-22202
> URL: https://issues.apache.org/jira/browse/SPARK-22202
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SparkR
>Affects Versions: 2.1.2, 2.2.1, 2.3.0
>Reporter: Felix Cheung
>Priority: Minor
>
> As a follow-up to SPARK-22167, we are currently running different 
> profiles/steps in make-release.sh for hadoop2.7 vs hadoop2.6 (and others); we 
> should consider whether these differences are significant and whether they 
> should be addressed.
> A couple of things:
> - R.../doc directory is not in any release jar except hadoop 2.6
> - python/dist, python.egg-info are not in any release jar except hadoop 2.7
> - R DESCRIPTION has a few additions
> I've checked to confirm these are the same in 2.1.1 release so this isn't a 
> regression.
> {code}
> spark-2.1.2-bin-hadoop2.6/R/lib/SparkR/doc:
> sparkr-vignettes.Rmd
> sparkr-vignettes.R
> sparkr-vignettes.html
> index.html
> Only in spark-2.1.2-bin-hadoop2.7/python: dist
> Only in spark-2.1.2-bin-hadoop2.7/python/pyspark: python
> Only in spark-2.1.2-bin-hadoop2.7/python: pyspark.egg-info
> diff -r spark-2.1.2-bin-hadoop2.7/R/lib/SparkR/DESCRIPTION 
> spark-2.1.2-bin-hadoop2.6/R/lib/SparkR/DESCRIPTION
> 25a26,27
> > NeedsCompilation: no
> > Packaged: 2017-10-03 00:42:30 UTC; holden
> 31c33
> < Built: R 3.4.1; ; 2017-10-02 23:18:21 UTC; unix
> ---
> > Built: R 3.4.1; ; 2017-10-03 00:45:27 UTC; unix
> Only in spark-2.1.2-bin-hadoop2.6/R/lib/SparkR: doc
> diff -r spark-2.1.2-bin-hadoop2.7/R/lib/SparkR/html/00Index.html 
> spark-2.1.2-bin-hadoop2.6/R/lib/SparkR/html/00Index.html
> 16a17
> > User guides, package vignettes and other 
> > documentation.
> Only in spark-2.1.2-bin-hadoop2.6/R/lib/SparkR/Meta: vignette.rds
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-22202) Release tgz content differences for python and R

2017-10-05 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16193239#comment-16193239
 ] 

Felix Cheung commented on SPARK-22202:
--

[~holden.ka...@gmail.com] would you be concerned about the python differences?
If not, I'll scope this to just R.

> Release tgz content differences for python and R
> 
>
> Key: SPARK-22202
> URL: https://issues.apache.org/jira/browse/SPARK-22202
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SparkR
>Affects Versions: 2.1.2, 2.2.1, 2.3.0
>Reporter: Felix Cheung
>
> As a follow-up to SPARK-22167: we are currently running different
> profiles/steps in make-release.sh for hadoop2.7 vs hadoop2.6 (and others); we
> should consider whether these differences are significant and whether they
> should be addressed.
> A couple of things:
> - R.../doc directory is not in any release jar except hadoop 2.6
> - python/dist, python.egg-info are not in any release jar except hadoop 2.7
> - R DESCRIPTION has a few additions
> I've checked and confirmed these are the same in the 2.1.1 release, so this
> isn't a regression.
> {code}
> spark-2.1.2-bin-hadoop2.6/R/lib/SparkR/doc:
> sparkr-vignettes.Rmd
> sparkr-vignettes.R
> sparkr-vignettes.html
> index.html
> Only in spark-2.1.2-bin-hadoop2.7/python: dist
> Only in spark-2.1.2-bin-hadoop2.7/python/pyspark: python
> Only in spark-2.1.2-bin-hadoop2.7/python: pyspark.egg-info
> diff -r spark-2.1.2-bin-hadoop2.7/R/lib/SparkR/DESCRIPTION 
> spark-2.1.2-bin-hadoop2.6/R/lib/SparkR/DESCRIPTION
> 25a26,27
> > NeedsCompilation: no
> > Packaged: 2017-10-03 00:42:30 UTC; holden
> 31c33
> < Built: R 3.4.1; ; 2017-10-02 23:18:21 UTC; unix
> ---
> > Built: R 3.4.1; ; 2017-10-03 00:45:27 UTC; unix
> Only in spark-2.1.2-bin-hadoop2.6/R/lib/SparkR: doc
> diff -r spark-2.1.2-bin-hadoop2.7/R/lib/SparkR/html/00Index.html 
> spark-2.1.2-bin-hadoop2.6/R/lib/SparkR/html/00Index.html
> 16a17
> > User guides, package vignettes and other 
> > documentation.
> Only in spark-2.1.2-bin-hadoop2.6/R/lib/SparkR/Meta: vignette.rds
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-22202) Release tgz content differences for python and R

2017-10-05 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16193238#comment-16193238
 ] 

Felix Cheung commented on SPARK-22202:
--

Yes, exactly.

> Release tgz content differences for python and R
> 
>
> Key: SPARK-22202
> URL: https://issues.apache.org/jira/browse/SPARK-22202
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SparkR
>Affects Versions: 2.1.2, 2.2.1, 2.3.0
>Reporter: Felix Cheung
>
> As a follow-up to SPARK-22167: we are currently running different
> profiles/steps in make-release.sh for hadoop2.7 vs hadoop2.6 (and others); we
> should consider whether these differences are significant and whether they
> should be addressed.
> A couple of things:
> - R.../doc directory is not in any release jar except hadoop 2.6
> - python/dist, python.egg-info are not in any release jar except hadoop 2.7
> - R DESCRIPTION has a few additions
> I've checked and confirmed these are the same in the 2.1.1 release, so this
> isn't a regression.
> {code}
> spark-2.1.2-bin-hadoop2.6/R/lib/SparkR/doc:
> sparkr-vignettes.Rmd
> sparkr-vignettes.R
> sparkr-vignettes.html
> index.html
> Only in spark-2.1.2-bin-hadoop2.7/python: dist
> Only in spark-2.1.2-bin-hadoop2.7/python/pyspark: python
> Only in spark-2.1.2-bin-hadoop2.7/python: pyspark.egg-info
> diff -r spark-2.1.2-bin-hadoop2.7/R/lib/SparkR/DESCRIPTION 
> spark-2.1.2-bin-hadoop2.6/R/lib/SparkR/DESCRIPTION
> 25a26,27
> > NeedsCompilation: no
> > Packaged: 2017-10-03 00:42:30 UTC; holden
> 31c33
> < Built: R 3.4.1; ; 2017-10-02 23:18:21 UTC; unix
> ---
> > Built: R 3.4.1; ; 2017-10-03 00:45:27 UTC; unix
> Only in spark-2.1.2-bin-hadoop2.6/R/lib/SparkR: doc
> diff -r spark-2.1.2-bin-hadoop2.7/R/lib/SparkR/html/00Index.html 
> spark-2.1.2-bin-hadoop2.6/R/lib/SparkR/html/00Index.html
> 16a17
> > User guides, package vignettes and other 
> > documentation.
> Only in spark-2.1.2-bin-hadoop2.6/R/lib/SparkR/Meta: vignette.rds
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-22202) Release tgz content differences for python and R

2017-10-04 Thread Felix Cheung (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung updated SPARK-22202:
-
Description: 
As a follow-up to SPARK-22167: we are currently running different
profiles/steps in make-release.sh for hadoop2.7 vs hadoop2.6 (and others); we
should consider whether these differences are significant and whether they
should be addressed.

A couple of things:
- R.../doc directory is not in any release jar except hadoop 2.6
- python/dist, python.egg-info are not in any release jar except hadoop 2.7
- R DESCRIPTION has a few additions

I've checked and confirmed these are the same in the 2.1.1 release, so this
isn't a regression.

{code}
spark-2.1.2-bin-hadoop2.6/R/lib/SparkR/doc:
sparkr-vignettes.Rmd
sparkr-vignettes.R
sparkr-vignettes.html
index.html

Only in spark-2.1.2-bin-hadoop2.7/python: dist
Only in spark-2.1.2-bin-hadoop2.7/python/pyspark: python
Only in spark-2.1.2-bin-hadoop2.7/python: pyspark.egg-info

diff -r spark-2.1.2-bin-hadoop2.7/R/lib/SparkR/DESCRIPTION 
spark-2.1.2-bin-hadoop2.6/R/lib/SparkR/DESCRIPTION
25a26,27
> NeedsCompilation: no
> Packaged: 2017-10-03 00:42:30 UTC; holden
31c33
< Built: R 3.4.1; ; 2017-10-02 23:18:21 UTC; unix
---
> Built: R 3.4.1; ; 2017-10-03 00:45:27 UTC; unix
Only in spark-2.1.2-bin-hadoop2.6/R/lib/SparkR: doc
diff -r spark-2.1.2-bin-hadoop2.7/R/lib/SparkR/html/00Index.html 
spark-2.1.2-bin-hadoop2.6/R/lib/SparkR/html/00Index.html
16a17
> User guides, package vignettes and other 
> documentation.
Only in spark-2.1.2-bin-hadoop2.6/R/lib/SparkR/Meta: vignette.rds
{code}

  was:
As a follow-up to SPARK-22167: we are currently running different
profiles/steps in make-release.sh for hadoop2.7 vs hadoop2.6 (and others); we
should consider whether these differences are significant and whether they
should be addressed.

[will add more info on this soon]


> Release tgz content differences for python and R
> 
>
> Key: SPARK-22202
> URL: https://issues.apache.org/jira/browse/SPARK-22202
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SparkR
>Affects Versions: 2.1.2, 2.2.1, 2.3.0
>Reporter: Felix Cheung
>
> As a follow-up to SPARK-22167: we are currently running different
> profiles/steps in make-release.sh for hadoop2.7 vs hadoop2.6 (and others); we
> should consider whether these differences are significant and whether they
> should be addressed.
> A couple of things:
> - R.../doc directory is not in any release jar except hadoop 2.6
> - python/dist, python.egg-info are not in any release jar except hadoop 2.7
> - R DESCRIPTION has a few additions
> I've checked and confirmed these are the same in the 2.1.1 release, so this
> isn't a regression.
> {code}
> spark-2.1.2-bin-hadoop2.6/R/lib/SparkR/doc:
> sparkr-vignettes.Rmd
> sparkr-vignettes.R
> sparkr-vignettes.html
> index.html
> Only in spark-2.1.2-bin-hadoop2.7/python: dist
> Only in spark-2.1.2-bin-hadoop2.7/python/pyspark: python
> Only in spark-2.1.2-bin-hadoop2.7/python: pyspark.egg-info
> diff -r spark-2.1.2-bin-hadoop2.7/R/lib/SparkR/DESCRIPTION 
> spark-2.1.2-bin-hadoop2.6/R/lib/SparkR/DESCRIPTION
> 25a26,27
> > NeedsCompilation: no
> > Packaged: 2017-10-03 00:42:30 UTC; holden
> 31c33
> < Built: R 3.4.1; ; 2017-10-02 23:18:21 UTC; unix
> ---
> > Built: R 3.4.1; ; 2017-10-03 00:45:27 UTC; unix
> Only in spark-2.1.2-bin-hadoop2.6/R/lib/SparkR: doc
> diff -r spark-2.1.2-bin-hadoop2.7/R/lib/SparkR/html/00Index.html 
> spark-2.1.2-bin-hadoop2.6/R/lib/SparkR/html/00Index.html
> 16a17
> > User guides, package vignettes and other 
> > documentation.
> Only in spark-2.1.2-bin-hadoop2.6/R/lib/SparkR/Meta: vignette.rds
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-22202) Release tgz content differences for python and R

2017-10-04 Thread Felix Cheung (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung updated SPARK-22202:
-
Description: 
As a follow-up to SPARK-22167: we are currently running different
profiles/steps in make-release.sh for hadoop2.7 vs hadoop2.6 (and others); we
should consider whether these differences are significant and whether they
should be addressed.

[will add more info on this soon]

> Release tgz content differences for python and R
> 
>
> Key: SPARK-22202
> URL: https://issues.apache.org/jira/browse/SPARK-22202
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SparkR
>Affects Versions: 2.1.2, 2.2.1, 2.3.0
>Reporter: Felix Cheung
>
> As a follow-up to SPARK-22167: we are currently running different
> profiles/steps in make-release.sh for hadoop2.7 vs hadoop2.6 (and others); we
> should consider whether these differences are significant and whether they
> should be addressed.
> [will add more info on this soon]



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-22202) Release tgz content differences for python and R

2017-10-04 Thread Felix Cheung (JIRA)
Felix Cheung created SPARK-22202:


 Summary: Release tgz content differences for python and R
 Key: SPARK-22202
 URL: https://issues.apache.org/jira/browse/SPARK-22202
 Project: Spark
  Issue Type: Bug
  Components: PySpark, SparkR
Affects Versions: 2.1.2, 2.2.1, 2.3.0
Reporter: Felix Cheung






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-22167) Spark Packaging w/R distro issues

2017-10-03 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16189954#comment-16189954
 ] 

Felix Cheung commented on SPARK-22167:
--

There are likely two stages to this.
More pressing might be the fact that the hadoop-2.6 and hadoop-2.7 release tgz
files have fairly different content because of how the make-release script is
structured.
I will open a new JIRA on this.


> Spark Packaging w/R distro issues
> -
>
> Key: SPARK-22167
> URL: https://issues.apache.org/jira/browse/SPARK-22167
> Project: Spark
>  Issue Type: Bug
>  Components: Build, SparkR
>Affects Versions: 2.1.2
>Reporter: holdenk
>Assignee: holdenk
>Priority: Blocker
> Fix For: 2.1.2, 2.2.1, 2.3.0
>
>
> The Spark packaging for Spark R in 2.1.2 did not work as expected, namely the 
> R directory was missing from the hadoop-2.7 bin distro. This is the version 
> we build the PySpark package for so it's possible this is related.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-22063) Upgrade lintr to latest commit sha1 ID

2017-10-02 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16187574#comment-16187574
 ] 

Felix Cheung edited comment on SPARK-22063 at 10/2/17 9:16 AM:
---

Surely - I think we could even start with something simple:
install.packages(..., lib =) (or install_github(..., lib =)) and then
library(..., lib.loc =)
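
A minimal sketch of that idea (the library path below is just an example; the
exact lintr source - CRAN vs a pinned GitHub commit - is whichever we settle on):
{code}
# install lintr into a private library and load it from there,
# so the system library stays untouched
lintrLib <- file.path(tempdir(), "lintr-lib")
dir.create(lintrLib, showWarnings = FALSE)
install.packages("lintr", lib = lintrLib)  # or devtools::install_github("jimhester/lintr", lib = lintrLib)
library(lintr, lib.loc = lintrLib)
{code}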



was (Author: felixcheung):
surely, I think we could even start with something simple with 
install.package(..., lib =) (or install_github(..., lib=)) and then library(... 
lib.loc)


> Upgrade lintr to latest commit sha1 ID
> --
>
> Key: SPARK-22063
> URL: https://issues.apache.org/jira/browse/SPARK-22063
> Project: Spark
>  Issue Type: Improvement
>  Components: SparkR
>Affects Versions: 2.3.0
>Reporter: Hyukjin Kwon
>Priority: Minor
>
> Currently, we set lintr to {{jimhester/lintr@a769c0b}} (see [this 
> pr|https://github.com/apache/spark/commit/7d1175011c976756efcd4e4e4f70a8fd6f287026])
>  and SPARK-14074.
> Today, I tried to upgrade to the latest,
> https://github.com/jimhester/lintr/commit/5431140ffea65071f1327625d4a8de9688fa7e72
> This fixes many bugs and now finds many instances that I have observed and
> thought should be caught from time to time:
> {code}
> inst/worker/worker.R:71:10: style: Remove spaces before the left parenthesis 
> in a function call.
>   return (output)
>  ^
> R/column.R:241:1: style: Lines should not be more than 100 characters.
> #'
> \href{https://spark.apache.org/docs/latest/sparkr.html#data-type-mapping-between-r-and-spark}{
> ^~~~
> R/context.R:332:1: style: Variable and function names should not be longer 
> than 30 characters.
> spark.getSparkFilesRootDirectory <- function() {
> ^~~~
> R/DataFrame.R:1912:1: style: Lines should not be more than 100 characters.
> #' @param j,select expression for the single Column or a list of columns to 
> select from the SparkDataFrame.
> ^~~
> R/DataFrame.R:1918:1: style: Lines should not be more than 100 characters.
> #' @return A new SparkDataFrame containing only the rows that meet the 
> condition with selected columns.
> ^~~
> R/DataFrame.R:2597:22: style: Remove spaces before the left parenthesis in a 
> function call.
>   return (joinRes)
>  ^
> R/DataFrame.R:2652:1: style: Variable and function names should not be longer 
> than 30 characters.
> generateAliasesForIntersectedCols <- function (x, intersectedColNames, 
> suffix) {
> ^
> R/DataFrame.R:2652:47: style: Remove spaces before the left parenthesis in a 
> function call.
> generateAliasesForIntersectedCols <- function (x, intersectedColNames, 
> suffix) {
>   ^
> R/DataFrame.R:2660:14: style: Remove spaces before the left parenthesis in a 
> function call.
> stop ("The following column name: ", newJoin, " occurs more than once 
> in the 'DataFrame'.",
>  ^
> R/DataFrame.R:3047:1: style: Lines should not be more than 100 characters.
> #' @note The statistics provided by \code{summary} were change in 2.3.0 use 
> \link{describe} for previous defaults.
> ^~
> R/DataFrame.R:3754:1: style: Lines should not be more than 100 characters.
> #' If grouping expression is missing \code{cube} creates a single global 
> aggregate and is equivalent to
> ^~~
> R/DataFrame.R:3789:1: style: Lines should not be more than 100 characters.
> #' If grouping expression is missing \code{rollup} creates a single global 
> aggregate and is equivalent to
> ^
> R/deserialize.R:46:10: style: Remove spaces before the left parenthesis in a 
> function call.
>   switch (type,
>  ^
> R/functions.R:41:1: style: Lines should not be more than 100 characters.
> #' @param x Column to compute on. In \code{window}, it must be a time Column 
> of \code{TimestampType}.
> ^
> R/functions.R:93:1: style: Lines should not be more than 100 characters.
> #' @param x Column to compute on. In \code{shiftLeft}, \code{shiftRight} and 
> \code{shiftRightUnsigned},
> 

[jira] [Commented] (SPARK-22063) Upgrade lintr to latest commit sha1 ID

2017-10-01 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16187574#comment-16187574
 ] 

Felix Cheung commented on SPARK-22063:
--

Surely - I think we could even start with something simple:
install.packages(..., lib =) (or install_github(..., lib =)) and then
library(..., lib.loc)


> Upgrade lintr to latest commit sha1 ID
> --
>
> Key: SPARK-22063
> URL: https://issues.apache.org/jira/browse/SPARK-22063
> Project: Spark
>  Issue Type: Improvement
>  Components: SparkR
>Affects Versions: 2.3.0
>Reporter: Hyukjin Kwon
>Priority: Minor
>
> Currently, we set lintr to {{jimhester/lintr@a769c0b}} (see [this 
> pr|https://github.com/apache/spark/commit/7d1175011c976756efcd4e4e4f70a8fd6f287026])
>  and SPARK-14074.
> Today, I tried to upgrade to the latest,
> https://github.com/jimhester/lintr/commit/5431140ffea65071f1327625d4a8de9688fa7e72
> This fixes many bugs and now finds many instances that I have observed and
> thought should be caught from time to time:
> {code}
> inst/worker/worker.R:71:10: style: Remove spaces before the left parenthesis 
> in a function call.
>   return (output)
>  ^
> R/column.R:241:1: style: Lines should not be more than 100 characters.
> #'
> \href{https://spark.apache.org/docs/latest/sparkr.html#data-type-mapping-between-r-and-spark}{
> ^~~~
> R/context.R:332:1: style: Variable and function names should not be longer 
> than 30 characters.
> spark.getSparkFilesRootDirectory <- function() {
> ^~~~
> R/DataFrame.R:1912:1: style: Lines should not be more than 100 characters.
> #' @param j,select expression for the single Column or a list of columns to 
> select from the SparkDataFrame.
> ^~~
> R/DataFrame.R:1918:1: style: Lines should not be more than 100 characters.
> #' @return A new SparkDataFrame containing only the rows that meet the 
> condition with selected columns.
> ^~~
> R/DataFrame.R:2597:22: style: Remove spaces before the left parenthesis in a 
> function call.
>   return (joinRes)
>  ^
> R/DataFrame.R:2652:1: style: Variable and function names should not be longer 
> than 30 characters.
> generateAliasesForIntersectedCols <- function (x, intersectedColNames, 
> suffix) {
> ^
> R/DataFrame.R:2652:47: style: Remove spaces before the left parenthesis in a 
> function call.
> generateAliasesForIntersectedCols <- function (x, intersectedColNames, 
> suffix) {
>   ^
> R/DataFrame.R:2660:14: style: Remove spaces before the left parenthesis in a 
> function call.
> stop ("The following column name: ", newJoin, " occurs more than once 
> in the 'DataFrame'.",
>  ^
> R/DataFrame.R:3047:1: style: Lines should not be more than 100 characters.
> #' @note The statistics provided by \code{summary} were change in 2.3.0 use 
> \link{describe} for previous defaults.
> ^~
> R/DataFrame.R:3754:1: style: Lines should not be more than 100 characters.
> #' If grouping expression is missing \code{cube} creates a single global 
> aggregate and is equivalent to
> ^~~
> R/DataFrame.R:3789:1: style: Lines should not be more than 100 characters.
> #' If grouping expression is missing \code{rollup} creates a single global 
> aggregate and is equivalent to
> ^
> R/deserialize.R:46:10: style: Remove spaces before the left parenthesis in a 
> function call.
>   switch (type,
>  ^
> R/functions.R:41:1: style: Lines should not be more than 100 characters.
> #' @param x Column to compute on. In \code{window}, it must be a time Column 
> of \code{TimestampType}.
> ^
> R/functions.R:93:1: style: Lines should not be more than 100 characters.
> #' @param x Column to compute on. In \code{shiftLeft}, \code{shiftRight} and 
> \code{shiftRightUnsigned},
> ^~~
> R/functions.R:483:52: style: Remove spaces before the left parenthesis in a 
> function call.
> jcols <- lapply(list(x, ...), function 

[jira] [Commented] (SPARK-22167) Spark Packaging w/R distro issues

2017-09-30 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16187156#comment-16187156
 ] 

Felix Cheung commented on SPARK-22167:
--

I think I'd propose changing this part of the release build to depend on (a
subset of) the check-cran output instead, but thinking more about this, the
-Psparkr profile is more for developers and should be left as-is.

The issue is that the output of install-dev is not really a release format, and
I guess this has been the de facto release format we have had for a very long
time.

But this could be a separate follow-up for 2.2.1/2.3.



> Spark Packaging w/R distro issues
> -
>
> Key: SPARK-22167
> URL: https://issues.apache.org/jira/browse/SPARK-22167
> Project: Spark
>  Issue Type: Bug
>  Components: Build, SparkR
>Affects Versions: 2.1.2
>Reporter: holdenk
>Assignee: holdenk
>Priority: Blocker
>
> The Spark packaging for Spark R in 2.1.2 did not work as expected, namely the 
> R directory was missing from the hadoop-2.7 bin distro. This is the version 
> we build the PySpark package for so it's possible this is related.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15799) Release SparkR on CRAN

2017-09-25 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16180261#comment-16180261
 ] 

Felix Cheung commented on SPARK-15799:
--

I commented on the PR.
I don't think there are any code changes pending; we are just waiting for the
next RC for the 2.1.2 release at this point.

> Release SparkR on CRAN
> --
>
> Key: SPARK-15799
> URL: https://issues.apache.org/jira/browse/SPARK-15799
> Project: Spark
>  Issue Type: New Feature
>  Components: SparkR
>Reporter: Xiangrui Meng
>
> Story: "As an R user, I would like to see SparkR released on CRAN, so I can 
> use SparkR easily in an existing R environment and have other packages built 
> on top of SparkR."
> I made this JIRA with the following questions in mind:
> * Are there known issues that prevent us releasing SparkR on CRAN?
> * Do we want to package Spark jars in the SparkR release?
> * Are there license issues?
> * How does it fit into Spark's release process?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18131) Support returning Vector/Dense Vector from backend

2017-09-17 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16169579#comment-16169579
 ] 

Felix Cheung commented on SPARK-18131:
--

bump. I think this is a really big problem - results from MLlib are basically
unusable for R users:
{code}
> head(predict(model, test))$probability
[[1]]
Java ref type org.apache.spark.ml.linalg.DenseVector id 130

[[2]]
Java ref type org.apache.spark.ml.linalg.DenseVector id 131

[[3]]
Java ref type org.apache.spark.ml.linalg.DenseVector id 132

[[4]]
Java ref type org.apache.spark.ml.linalg.DenseVector id 133

[[5]]
Java ref type org.apache.spark.ml.linalg.DenseVector id 134

[[6]]
Java ref type org.apache.spark.ml.linalg.DenseVector id 135

> head(predict(model, test))$feature
[[1]]
Java ref type org.apache.spark.ml.linalg.SparseVector id 161

[[2]]
Java ref type org.apache.spark.ml.linalg.SparseVector id 162

[[3]]
Java ref type org.apache.spark.ml.linalg.SparseVector id 163

[[4]]
Java ref type org.apache.spark.ml.linalg.SparseVector id 164

[[5]]
Java ref type org.apache.spark.ml.linalg.SparseVector id 165

[[6]]
Java ref type org.apache.spark.ml.linalg.SparseVector id 166

> head(predict(model, test))$rawPrediction
[[1]]
Java ref type org.apache.spark.ml.linalg.DenseVector id 210

[[2]]
Java ref type org.apache.spark.ml.linalg.DenseVector id 211

[[3]]
Java ref type org.apache.spark.ml.linalg.DenseVector id 212

[[4]]
Java ref type org.apache.spark.ml.linalg.DenseVector id 213
...

{code}
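
In the meantime, an unverified workaround sketch using SparkR internals (not a
public API) would be to unwrap each returned reference via its toArray method:
{code}
# each element of the collected column is a jobj pointing at an ml DenseVector;
# calling toArray() on the JVM side should bring it back as a plain numeric vector
probs <- head(predict(model, test))$probability
probList <- lapply(probs, function(v) SparkR:::callJMethod(v, "toArray"))
{code}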

> Support returning Vector/Dense Vector from backend
> --
>
> Key: SPARK-18131
> URL: https://issues.apache.org/jira/browse/SPARK-18131
> Project: Spark
>  Issue Type: New Feature
>  Components: SparkR
>Reporter: Miao Wang
>
> For `spark.logit`, there is a `probabilityCol`, which is a vector in the 
> backend (scala side). When we do collect(select(df, "probabilityCol")), 
> backend returns the java object handle (memory address). We need to implement 
> a method to convert a Vector/Dense Vector column as R vector, which can be 
> read in SparkR. It is a followup JIRA of adding `spark.logit`.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21802) Make sparkR MLP summary() expose probability column

2017-09-17 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16169575#comment-16169575
 ] 

Felix Cheung commented on SPARK-21802:
--

Yes, if this is from the prediction (with rawPrediction etc.), it should come
from predict(), not summary(). Sorry, I misspoke.

> Make sparkR MLP summary() expose probability column
> ---
>
> Key: SPARK-21802
> URL: https://issues.apache.org/jira/browse/SPARK-21802
> Project: Spark
>  Issue Type: New Feature
>  Components: SparkR
>Affects Versions: 2.2.0
>Reporter: Weichen Xu
>Priority: Minor
>
> Make sparkR MLP summary() expose probability column



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21802) Make sparkR MLP summary() expose probability column

2017-09-17 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16169487#comment-16169487
 ] 

Felix Cheung commented on SPARK-21802:
--

Can you clarify where you see it? I just ran against the latest from the master
branch with R's spark.mlp and don't see any probability.
{code}
   summary <- summary(model)
>
> summar
Error: object 'summar' not found
> summary
$numOfInputs
[1] 4

$numOfOutputs
[1] 3

$layers
[1] 4 5 4 3

$weights
$weights[[1]]
[1] -0.878743

$weights[[2]]
[1] 0.2154151

$weights[[3]]
[1] -1.16304

$weights[[4]]
[1] -0.6583214

$weights[[5]]
[1] 1.009825

$weights[[6]]
[1] 0.2934758

$weights[[7]]
[1] -0.9528391

$weights[[8]]
[1] 0.4029029

$weights[[9]]
[1] -1.038043

$weights[[10]]
[1] 0.05164362

$weights[[11]]
[1] 0.9349549

$weights[[12]]
[1] -0.4283766

$weights[[13]]
[1] -0.5082246

$weights[[14]]
[1] -0.09600512

$weights[[15]]
[1] -0.7843158

$weights[[16]]
[1] -1.199724

$weights[[17]]
[1] 0.6001083

$weights[[18]]
[1] 0.1102863

$weights[[19]]
[1] 0.8259955

$weights[[20]]
[1] -0.4428631

$weights[[21]]
[1] 0.9691921

$weights[[22]]
[1] -0.8472953

$weights[[23]]
[1] -0.8521915

$weights[[24]]
[1] -0.770886

$weights[[25]]
[1] 0.7276595

$weights[[26]]
[1] -0.7675585

$weights[[27]]
[1] 0.1299603

$weights[[28]]
[1] -1.056605

$weights[[29]]
[1] 0.4421284

$weights[[30]]
[1] -0.3245397

$weights[[31]]
[1] -0.904001

$weights[[32]]
[1] 0.2793773

$weights[[33]]
[1] 1.045579

$weights[[34]]
[1] -0.5379433

$weights[[35]]
[1] -1.006988

$weights[[36]]
[1] -0.9652683

$weights[[37]]
[1] 0.8719215

$weights[[38]]
[1] -0.917228

$weights[[39]]
[1] 1.020896

$weights[[40]]
[1] 0.4951883

$weights[[41]]
[1] 0.7487854

$weights[[42]]
[1] -0.7130144

$weights[[43]]
[1] 0.598029

$weights[[44]]
[1] 0.8097242

$weights[[45]]
[1] -1.056401

$weights[[46]]
[1] -0.2041643

$weights[[47]]
[1] -0.9605507

$weights[[48]]
[1] -0.2151837

$weights[[49]]
[1] 0.9075675

$weights[[50]]
[1] 0.004306968

$weights[[51]]
[1] -0.4778498

$weights[[52]]
[1] 0.3312689

$weights[[53]]
[1] 0.6160091

$weights[[54]]
[1] 0.431806

$weights[[55]]
[1] -0.6039096

$weights[[56]]
[1] -0.008508999

$weights[[57]]
[1] 0.7539017

$weights[[58]]
[1] -1.186487

$weights[[59]]
[1] -0.8660557

$weights[[60]]
[1] 0.4443504

$weights[[61]]
[1] 0.5170843

$weights[[62]]
[1] 0.08373222

$weights[[63]]
[1] -1.039143

$weights[[64]]
[1] -0.4787311
{code}

this isn't the summary() output, right? It's the prediction, I think.

> Make sparkR MLP summary() expose probability column
> ---
>
> Key: SPARK-21802
> URL: https://issues.apache.org/jira/browse/SPARK-21802
> Project: Spark
>  Issue Type: New Feature
>  Components: SparkR
>Affects Versions: 2.2.0
>Reporter: Weichen Xu
>Priority: Minor
>
> Make sparkR MLP summary() expose probability column



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-20684) expose createOrReplaceGlobalTempView/createGlobalTempView and dropGlobalTempView in SparkR

2017-09-10 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16160420#comment-16160420
 ] 

Felix Cheung commented on SPARK-20684:
--

I"m making this primary JIRA for tracking this issue and keeping this open.
Please see the discussion in the PR.


> expose createOrReplaceGlobalTempView/createGlobalTempView and 
> dropGlobalTempView in SparkR
> --
>
> Key: SPARK-20684
> URL: https://issues.apache.org/jira/browse/SPARK-20684
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.2.0
>Reporter: Hossein Falaki
>
> This is a useful API that is not exposed in SparkR. It will help with moving
> data between languages within a single Spark application.
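
A sketch of the intended usage once exposed (assuming the SparkR function keeps
the same name as the Scala/Python API; it is not available yet, which is what
this issue tracks):
{code}
df <- createDataFrame(faithful)
createOrReplaceGlobalTempView(df, "faithful_gv")
# global temp views live in the global_temp database and are visible
# across sessions within the same Spark application
head(sql("SELECT count(*) FROM global_temp.faithful_gv"))
{code}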



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-20684) expose createOrReplaceGlobalTempView/createGlobalTempView and dropGlobalTempView in SparkR

2017-09-10 Thread Felix Cheung (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung updated SPARK-20684:
-
Summary: expose createOrReplaceGlobalTempView/createGlobalTempView and 
dropGlobalTempView in SparkR  (was: expose createGlobalTempView and 
dropGlobalTempView in SparkR)

> expose createOrReplaceGlobalTempView/createGlobalTempView and 
> dropGlobalTempView in SparkR
> --
>
> Key: SPARK-20684
> URL: https://issues.apache.org/jira/browse/SPARK-20684
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.2.0
>Reporter: Hossein Falaki
>
> This is a useful API that is not exposed in SparkR. It will help with moving
> data between languages within a single Spark application.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-20684) expose createGlobalTempView and dropGlobalTempView in SparkR

2017-09-10 Thread Felix Cheung (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung reopened SPARK-20684:
--

> expose createGlobalTempView and dropGlobalTempView in SparkR
> 
>
> Key: SPARK-20684
> URL: https://issues.apache.org/jira/browse/SPARK-20684
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.2.0
>Reporter: Hossein Falaki
>
> This is a useful API that is not exposed in SparkR. It will help with moving
> data between languages within a single Spark application.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-21128) Running R tests multiple times failed due to pre-existing "spark-warehouse" / "metastore_db"

2017-09-08 Thread Felix Cheung (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung updated SPARK-21128:
-
Target Version/s: 2.2.1, 2.3.0  (was: 2.3.0)
   Fix Version/s: 2.2.1

> Running R tests multiple times failed due to pre-existing "spark-warehouse" /
> "metastore_db"
> ---
>
> Key: SPARK-21128
> URL: https://issues.apache.org/jira/browse/SPARK-21128
> Project: Spark
>  Issue Type: Test
>  Components: SparkR
>Affects Versions: 2.2.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Minor
> Fix For: 2.2.1, 2.3.0
>
>
> Currently, running R tests multiple times fails due to pre-existing
> "spark-warehouse" / "metastore_db" as below:
> {code}
> SparkSQL functions: Spark package found in SPARK_HOME: .../spark
> ...1234...
> {code}
> {code}
> Failed 
> -
> 1. Failure: No extra files are created in SPARK_HOME by starting session and 
> making calls (@test_sparkSQL.R#3384)
> length(list1) not equal to length(list2).
> 1/1 mismatches
> [1] 25 - 23 == 2
> 2. Failure: No extra files are created in SPARK_HOME by starting session and 
> making calls (@test_sparkSQL.R#3384)
> sort(list1, na.last = TRUE) not equal to sort(list2, na.last = TRUE).
> 10/25 mismatches
> x[16]: "metastore_db"
> y[16]: "pkg"
> x[17]: "pkg"
> y[17]: "R"
> x[18]: "R"
> y[18]: "README.md"
> x[19]: "README.md"
> y[19]: "run-tests.sh"
> x[20]: "run-tests.sh"
> y[20]: "SparkR_2.2.0.tar.gz"
> x[21]: "metastore_db"
> y[21]: "pkg"
> x[22]: "pkg"
> y[22]: "R"
> x[23]: "R"
> y[23]: "README.md"
> x[24]: "README.md"
> y[24]: "run-tests.sh"
> x[25]: "run-tests.sh"
> y[25]: "SparkR_2.2.0.tar.gz"
> 3. Failure: No extra files are created in SPARK_HOME by starting session and 
> making calls (@test_sparkSQL.R#3388)
> length(list1) not equal to length(list2).
> 1/1 mismatches
> [1] 25 - 23 == 2
> 4. Failure: No extra files are created in SPARK_HOME by starting session and 
> making calls (@test_sparkSQL.R#3388)
> sort(list1, na.last = TRUE) not equal to sort(list2, na.last = TRUE).
> 10/25 mismatches
> x[16]: "metastore_db"
> y[16]: "pkg"
> x[17]: "pkg"
> y[17]: "R"
> x[18]: "R"
> y[18]: "README.md"
> x[19]: "README.md"
> y[19]: "run-tests.sh"
> x[20]: "run-tests.sh"
> y[20]: "SparkR_2.2.0.tar.gz"
> x[21]: "metastore_db"
> y[21]: "pkg"
> x[22]: "pkg"
> y[22]: "R"
> x[23]: "R"
> y[23]: "README.md"
> x[24]: "README.md"
> y[24]: "run-tests.sh"
> x[25]: "run-tests.sh"
> y[25]: "SparkR_2.2.0.tar.gz"
> DONE 
> ===
> {code}
> It looks like we should remove both "spark-warehouse" and "metastore_db" _before_
> listing files into {{sparkRFilesBefore}}.
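
A minimal sketch of that fix (sparkRDir here stands for the SPARK_HOME path the
test runner uses):
{code}
# drop leftovers from earlier runs before snapshotting the SPARK_HOME contents
unlink(file.path(sparkRDir, "spark-warehouse"), recursive = TRUE)
unlink(file.path(sparkRDir, "metastore_db"), recursive = TRUE)
sparkRFilesBefore <- list.files(path = sparkRDir, all.files = TRUE)
{code}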



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-21727) Operating on an ArrayType in a SparkR DataFrame throws error

2017-09-04 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16152973#comment-16152973
 ] 

Felix Cheung edited comment on SPARK-21727 at 9/4/17 11:08 PM:
---

precisely.
as far as I can tell, everything should "just work" if we return "array" from 
`getSerdeType()` for this case when length > 1.
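
Roughly, the idea would be something like this (a hypothetical sketch, not the
actual serialize.R code):
{code}
# hypothetical sketch: atomic vectors longer than 1 are reported as "array"
# so they serialize like a list of same-typed elements
getSerdeTypeSketch <- function(object) {
  type <- class(object)[[1]]
  if (type != "list" && is.atomic(object) && length(object) > 1) {
    "array"
  } else {
    type
  }
}
{code}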



was (Author: felixcheung):
precisely.
as far as I can tell, everything should "just work" if we return `array` from 
`getSerdeType()` for this case when length > 1.


> Operating on an ArrayType in a SparkR DataFrame throws error
> 
>
> Key: SPARK-21727
> URL: https://issues.apache.org/jira/browse/SPARK-21727
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.2.0
>Reporter: Neil McQuarrie
>
> Previously 
> [posted|https://stackoverflow.com/questions/45056973/sparkr-dataframe-with-r-lists-as-elements]
>  this as a stack overflow question but it seems to be a bug.
> If I have an R data.frame where one of the column data types is an integer 
> *list* -- i.e., each of the elements in the column embeds an entire R list of 
> integers -- then it seems I can convert this data.frame to a SparkR DataFrame 
> just fine... SparkR treats the column as ArrayType(Double). 
> However, any subsequent operation on this SparkR DataFrame appears to throw 
> an error.
> Create an example R data.frame:
> {code}
> indices <- 1:4
> myDf <- data.frame(indices)
> myDf$data <- list(rep(0, 20))
> {code}
> Examine it to make sure it looks okay:
> {code}
> > str(myDf) 
> 'data.frame':   4 obs. of  2 variables:  
>  $ indices: int  1 2 3 4  
>  $ data   :List of 4
>..$ : num  0 0 0 0 0 0 0 0 0 0 ...
>..$ : num  0 0 0 0 0 0 0 0 0 0 ...
>..$ : num  0 0 0 0 0 0 0 0 0 0 ...
>..$ : num  0 0 0 0 0 0 0 0 0 0 ...
> > head(myDf)   
>   indices   data 
> 1   1 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 
> 2   2 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 
> 3   3 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 
> 4   4 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
> {code}
> Convert it to a SparkR DataFrame:
> {code}
> library(SparkR, lib.loc=paste0(Sys.getenv("SPARK_HOME"),"/R/lib"))
> sparkR.session(master = "local[*]")
> mySparkDf <- as.DataFrame(myDf)
> {code}
> Examine the SparkR DataFrame schema; notice that the list column was 
> successfully converted to ArrayType:
> {code}
> > schema(mySparkDf)
> StructType
> |-name = "indices", type = "IntegerType", nullable = TRUE
> |-name = "data", type = "ArrayType(DoubleType,true)", nullable = TRUE
> {code}
> However, operating on the SparkR DataFrame throws an error:
> {code}
> > collect(mySparkDf)
> 17/07/13 17:23:00 ERROR executor.Executor: Exception in task 0.0 in stage 1.0 
> (TID 1)
> java.lang.RuntimeException: Error while encoding: java.lang.RuntimeException: 
> java.lang.Double is not a valid external type for schema of array
> if (assertnotnull(input[0, org.apache.spark.sql.Row, true]).isNullAt) null 
> else validateexternaltype(getexternalrowfield(assertnotnull(input[0, 
> org.apache.spark.sql.Row, true]), 0, indices), IntegerType) AS indices#0
> ... long stack trace ...
> {code}
> Using Spark 2.2.0, R 3.4.0, Java 1.8.0_131, Windows 10.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21727) Operating on an ArrayType in a SparkR DataFrame throws error

2017-09-04 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16152973#comment-16152973
 ] 

Felix Cheung commented on SPARK-21727:
--

precisely.
as far as I can tell, everything should "just work" if we return `array` from 
`getSerdeType()` for this case when length > 1.


> Operating on an ArrayType in a SparkR DataFrame throws error
> 
>
> Key: SPARK-21727
> URL: https://issues.apache.org/jira/browse/SPARK-21727
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.2.0
>Reporter: Neil McQuarrie
>
> Previously 
> [posted|https://stackoverflow.com/questions/45056973/sparkr-dataframe-with-r-lists-as-elements]
>  this as a stack overflow question but it seems to be a bug.
> If I have an R data.frame where one of the column data types is an integer 
> *list* -- i.e., each of the elements in the column embeds an entire R list of 
> integers -- then it seems I can convert this data.frame to a SparkR DataFrame 
> just fine... SparkR treats the column as ArrayType(Double). 
> However, any subsequent operation on this SparkR DataFrame appears to throw 
> an error.
> Create an example R data.frame:
> {code}
> indices <- 1:4
> myDf <- data.frame(indices)
> myDf$data <- list(rep(0, 20))
> {code}
> Examine it to make sure it looks okay:
> {code}
> > str(myDf) 
> 'data.frame':   4 obs. of  2 variables:  
>  $ indices: int  1 2 3 4  
>  $ data   :List of 4
>..$ : num  0 0 0 0 0 0 0 0 0 0 ...
>..$ : num  0 0 0 0 0 0 0 0 0 0 ...
>..$ : num  0 0 0 0 0 0 0 0 0 0 ...
>..$ : num  0 0 0 0 0 0 0 0 0 0 ...
> > head(myDf)   
>   indices   data 
> 1   1 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 
> 2   2 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 
> 3   3 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 
> 4   4 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
> {code}
> Convert it to a SparkR DataFrame:
> {code}
> library(SparkR, lib.loc=paste0(Sys.getenv("SPARK_HOME"),"/R/lib"))
> sparkR.session(master = "local[*]")
> mySparkDf <- as.DataFrame(myDf)
> {code}
> Examine the SparkR DataFrame schema; notice that the list column was 
> successfully converted to ArrayType:
> {code}
> > schema(mySparkDf)
> StructType
> |-name = "indices", type = "IntegerType", nullable = TRUE
> |-name = "data", type = "ArrayType(DoubleType,true)", nullable = TRUE
> {code}
> However, operating on the SparkR DataFrame throws an error:
> {code}
> > collect(mySparkDf)
> 17/07/13 17:23:00 ERROR executor.Executor: Exception in task 0.0 in stage 1.0 
> (TID 1)
> java.lang.RuntimeException: Error while encoding: java.lang.RuntimeException: 
> java.lang.Double is not a valid external type for schema of array
> if (assertnotnull(input[0, org.apache.spark.sql.Row, true]).isNullAt) null 
> else validateexternaltype(getexternalrowfield(assertnotnull(input[0, 
> org.apache.spark.sql.Row, true]), 0, indices), IntegerType) AS indices#0
> ... long stack trace ...
> {code}
> Using Spark 2.2.0, R 3.4.0, Java 1.8.0_131, Windows 10.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21727) Operating on an ArrayType in a SparkR DataFrame throws error

2017-09-02 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16151589#comment-16151589
 ] 

Felix Cheung commented on SPARK-21727:
--

any taker of this change?

> Operating on an ArrayType in a SparkR DataFrame throws error
> 
>
> Key: SPARK-21727
> URL: https://issues.apache.org/jira/browse/SPARK-21727
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.2.0
>Reporter: Neil McQuarrie
>
> Previously 
> [posted|https://stackoverflow.com/questions/45056973/sparkr-dataframe-with-r-lists-as-elements]
>  this as a stack overflow question but it seems to be a bug.
> If I have an R data.frame where one of the column data types is an integer 
> *list* -- i.e., each of the elements in the column embeds an entire R list of 
> integers -- then it seems I can convert this data.frame to a SparkR DataFrame 
> just fine... SparkR treats the column as ArrayType(Double). 
> However, any subsequent operation on this SparkR DataFrame appears to throw 
> an error.
> Create an example R data.frame:
> {code}
> indices <- 1:4
> myDf <- data.frame(indices)
> myDf$data <- list(rep(0, 20))
> {code}
> Examine it to make sure it looks okay:
> {code}
> > str(myDf) 
> 'data.frame':   4 obs. of  2 variables:  
>  $ indices: int  1 2 3 4  
>  $ data   :List of 4
>..$ : num  0 0 0 0 0 0 0 0 0 0 ...
>..$ : num  0 0 0 0 0 0 0 0 0 0 ...
>..$ : num  0 0 0 0 0 0 0 0 0 0 ...
>..$ : num  0 0 0 0 0 0 0 0 0 0 ...
> > head(myDf)   
>   indices   data 
> 1   1 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 
> 2   2 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 
> 3   3 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 
> 4   4 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
> {code}
> Convert it to a SparkR DataFrame:
> {code}
> library(SparkR, lib.loc=paste0(Sys.getenv("SPARK_HOME"),"/R/lib"))
> sparkR.session(master = "local[*]")
> mySparkDf <- as.DataFrame(myDf)
> {code}
> Examine the SparkR DataFrame schema; notice that the list column was 
> successfully converted to ArrayType:
> {code}
> > schema(mySparkDf)
> StructType
> |-name = "indices", type = "IntegerType", nullable = TRUE
> |-name = "data", type = "ArrayType(DoubleType,true)", nullable = TRUE
> {code}
> However, operating on the SparkR DataFrame throws an error:
> {code}
> > collect(mySparkDf)
> 17/07/13 17:23:00 ERROR executor.Executor: Exception in task 0.0 in stage 1.0 
> (TID 1)
> java.lang.RuntimeException: Error while encoding: java.lang.RuntimeException: 
> java.lang.Double is not a valid external type for schema of array
> if (assertnotnull(input[0, org.apache.spark.sql.Row, true]).isNullAt) null 
> else validateexternaltype(getexternalrowfield(assertnotnull(input[0, 
> org.apache.spark.sql.Row, true]), 0, indices), IntegerType) AS indices#0
> ... long stack trace ...
> {code}
> Using Spark 2.2.0, R 3.4.0, Java 1.8.0_131, Windows 10.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21727) Operating on an ArrayType in a SparkR DataFrame throws error

2017-09-02 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16151588#comment-16151588
 ] 

Felix Cheung commented on SPARK-21727:
--

That is true. I think the documentation is unclear in this case - it should say
the vector should be converted to a list or similar.

In fact, the code explicitly does not support converting a column with atomic
vector values into an array column type:
https://github.com/apache/spark/blob/master/R/pkg/R/serialize.R#L54

But with that said, I think we could and should make a minor change to support 
that implicitly
https://github.com/apache/spark/blob/master/R/pkg/R/serialize.R#L39
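
Until then, the workaround the first point alludes to is to make each cell an R
list rather than an atomic vector before conversion; a sketch (not verified):
{code}
indices <- 1:4
myDf <- data.frame(indices)
# wrap each row's values as a list of scalars instead of an atomic vector
myDf$data <- lapply(indices, function(i) as.list(rep(0, 20)))
mySparkDf <- as.DataFrame(myDf)
head(collect(mySparkDf))
{code}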


> Operating on an ArrayType in a SparkR DataFrame throws error
> 
>
> Key: SPARK-21727
> URL: https://issues.apache.org/jira/browse/SPARK-21727
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.2.0
>Reporter: Neil McQuarrie
>
> Previously 
> [posted|https://stackoverflow.com/questions/45056973/sparkr-dataframe-with-r-lists-as-elements]
>  this as a stack overflow question but it seems to be a bug.
> If I have an R data.frame where one of the column data types is an integer 
> *list* -- i.e., each of the elements in the column embeds an entire R list of 
> integers -- then it seems I can convert this data.frame to a SparkR DataFrame 
> just fine... SparkR treats the column as ArrayType(Double). 
> However, any subsequent operation on this SparkR DataFrame appears to throw 
> an error.
> Create an example R data.frame:
> {code}
> indices <- 1:4
> myDf <- data.frame(indices)
> myDf$data <- list(rep(0, 20))
> {code}
> Examine it to make sure it looks okay:
> {code}
> > str(myDf) 
> 'data.frame':   4 obs. of  2 variables:  
>  $ indices: int  1 2 3 4  
>  $ data   :List of 4
>..$ : num  0 0 0 0 0 0 0 0 0 0 ...
>..$ : num  0 0 0 0 0 0 0 0 0 0 ...
>..$ : num  0 0 0 0 0 0 0 0 0 0 ...
>..$ : num  0 0 0 0 0 0 0 0 0 0 ...
> > head(myDf)   
>   indices   data 
> 1   1 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 
> 2   2 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 
> 3   3 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 
> 4   4 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
> {code}
> Convert it to a SparkR DataFrame:
> {code}
> library(SparkR, lib.loc=paste0(Sys.getenv("SPARK_HOME"),"/R/lib"))
> sparkR.session(master = "local[*]")
> mySparkDf <- as.DataFrame(myDf)
> {code}
> Examine the SparkR DataFrame schema; notice that the list column was 
> successfully converted to ArrayType:
> {code}
> > schema(mySparkDf)
> StructType
> |-name = "indices", type = "IntegerType", nullable = TRUE
> |-name = "data", type = "ArrayType(DoubleType,true)", nullable = TRUE
> {code}
> However, operating on the SparkR DataFrame throws an error:
> {code}
> > collect(mySparkDf)
> 17/07/13 17:23:00 ERROR executor.Executor: Exception in task 0.0 in stage 1.0 
> (TID 1)
> java.lang.RuntimeException: Error while encoding: java.lang.RuntimeException: 
> java.lang.Double is not a valid external type for schema of array
> if (assertnotnull(input[0, org.apache.spark.sql.Row, true]).isNullAt) null 
> else validateexternaltype(getexternalrowfield(assertnotnull(input[0, 
> org.apache.spark.sql.Row, true]), 0, indices), IntegerType) AS indices#0
> ... long stack trace ...
> {code}
> Using Spark 2.2.0, R 3.4.0, Java 1.8.0_131, Windows 10.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21727) Operating on an ArrayType in a SparkR DataFrame throws error

2017-09-02 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16151437#comment-16151437
 ] 

Felix Cheung commented on SPARK-21727:
--

hmm.. I think that's what the error message is saying
{code}
java.lang.Double is not a valid external type for schema of array
{code}

it's finding a double and not an array of double

> Operating on an ArrayType in a SparkR DataFrame throws error
> 
>
> Key: SPARK-21727
> URL: https://issues.apache.org/jira/browse/SPARK-21727
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.2.0
>Reporter: Neil McQuarrie
>
> Previously 
> [posted|https://stackoverflow.com/questions/45056973/sparkr-dataframe-with-r-lists-as-elements]
>  this as a stack overflow question but it seems to be a bug.
> If I have an R data.frame where one of the column data types is an integer 
> *list* -- i.e., each of the elements in the column embeds an entire R list of 
> integers -- then it seems I can convert this data.frame to a SparkR DataFrame 
> just fine... SparkR treats the column as ArrayType(Double). 
> However, any subsequent operation on this SparkR DataFrame appears to throw 
> an error.
> Create an example R data.frame:
> {code}
> indices <- 1:4
> myDf <- data.frame(indices)
> myDf$data <- list(rep(0, 20))
> {code}
> Examine it to make sure it looks okay:
> {code}
> > str(myDf) 
> 'data.frame':   4 obs. of  2 variables:  
>  $ indices: int  1 2 3 4  
>  $ data   :List of 4
>..$ : num  0 0 0 0 0 0 0 0 0 0 ...
>..$ : num  0 0 0 0 0 0 0 0 0 0 ...
>..$ : num  0 0 0 0 0 0 0 0 0 0 ...
>..$ : num  0 0 0 0 0 0 0 0 0 0 ...
> > head(myDf)   
>   indices   data 
> 1   1 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 
> 2   2 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 
> 3   3 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 
> 4   4 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
> {code}
> Convert it to a SparkR DataFrame:
> {code}
> library(SparkR, lib.loc=paste0(Sys.getenv("SPARK_HOME"),"/R/lib"))
> sparkR.session(master = "local[*]")
> mySparkDf <- as.DataFrame(myDf)
> {code}
> Examine the SparkR DataFrame schema; notice that the list column was 
> successfully converted to ArrayType:
> {code}
> > schema(mySparkDf)
> StructType
> |-name = "indices", type = "IntegerType", nullable = TRUE
> |-name = "data", type = "ArrayType(DoubleType,true)", nullable = TRUE
> {code}
> However, operating on the SparkR DataFrame throws an error:
> {code}
> > collect(mySparkDf)
> 17/07/13 17:23:00 ERROR executor.Executor: Exception in task 0.0 in stage 1.0 
> (TID 1)
> java.lang.RuntimeException: Error while encoding: java.lang.RuntimeException: 
> java.lang.Double is not a valid external type for schema of array
> if (assertnotnull(input[0, org.apache.spark.sql.Row, true]).isNullAt) null 
> else validateexternaltype(getexternalrowfield(assertnotnull(input[0, 
> org.apache.spark.sql.Row, true]), 0, indices), IntegerType) AS indices#0
> ... long stack trace ...
> {code}
> Using Spark 2.2.0, R 3.4.0, Java 1.8.0_131, Windows 10.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12157) Support numpy types as return values of Python UDFs

2017-09-01 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16151336#comment-16151336
 ] 

Felix Cheung commented on SPARK-12157:
--

Any more thoughts on this?
I think we should at least document this if it's a won't-fix.

> Support numpy types as return values of Python UDFs
> ---
>
> Key: SPARK-12157
> URL: https://issues.apache.org/jira/browse/SPARK-12157
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, SQL
>Affects Versions: 1.5.2
>Reporter: Justin Uang
>
> Currently, if I have a python UDF
> {code}
> import pyspark.sql.types as T
> import pyspark.sql.functions as F
> from pyspark.sql import Row
> import numpy as np
> argmax = F.udf(lambda x: np.argmax(x), T.IntegerType())
> df = sqlContext.createDataFrame([Row(array=[1,2,3])])
> df.select(argmax("array")).count()
> {code}
> I get an exception that is fairly opaque:
> {code}
> Caused by: net.razorvine.pickle.PickleException: expected zero arguments for 
> construction of ClassDict (for numpy.dtype)
> at 
> net.razorvine.pickle.objects.ClassDictConstructor.construct(ClassDictConstructor.java:23)
> at net.razorvine.pickle.Unpickler.load_reduce(Unpickler.java:701)
> at net.razorvine.pickle.Unpickler.dispatch(Unpickler.java:171)
> at net.razorvine.pickle.Unpickler.load(Unpickler.java:85)
> at net.razorvine.pickle.Unpickler.loads(Unpickler.java:98)
> at 
> org.apache.spark.sql.execution.BatchPythonEvaluation$$anonfun$doExecute$1$$anonfun$apply$3.apply(python.scala:404)
> at 
> org.apache.spark.sql.execution.BatchPythonEvaluation$$anonfun$doExecute$1$$anonfun$apply$3.apply(python.scala:403)
> {code}
> Numpy types like np.int and np.float64 should automatically be cast to the 
> proper dtypes.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21801) SparkR unit test randomly fail on trees

2017-08-29 Thread Felix Cheung (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung reassigned SPARK-21801:


Assignee: Felix Cheung

> SparkR unit test randomly fail on trees
> ---
>
> Key: SPARK-21801
> URL: https://issues.apache.org/jira/browse/SPARK-21801
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR, Tests
>Affects Versions: 2.2.0
>Reporter: Weichen Xu
>Assignee: Felix Cheung
>Priority: Critical
> Fix For: 2.3.0
>
>
> SparkR unit tests sometimes randomly fail with an error such as:
> ```
> 1. Error: spark.randomForest (@test_mllib_tree.R#236) 
> --
> java.lang.IllegalArgumentException: requirement failed: The input column 
> stridx_87ea3065aeb2 should have at least two distinct values.
> ```
> or
> ```
> 1. Error: spark.decisionTree (@test_mllib_tree.R#353) 
> --
> java.lang.IllegalArgumentException: requirement failed: The input column 
> stridx_d6a0b492cfa1 should have at least two distinct values.
> ```



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-21801) SparkR unit test randomly fail on trees

2017-08-29 Thread Felix Cheung (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung resolved SPARK-21801.
--
   Resolution: Fixed
Fix Version/s: 2.3.0

> SparkR unit test randomly fail on trees
> ---
>
> Key: SPARK-21801
> URL: https://issues.apache.org/jira/browse/SPARK-21801
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR, Tests
>Affects Versions: 2.2.0
>Reporter: Weichen Xu
>Assignee: Felix Cheung
>Priority: Critical
> Fix For: 2.3.0
>
>
> SparkR unit tests sometimes randomly fail with an error such as:
> ```
> 1. Error: spark.randomForest (@test_mllib_tree.R#236) 
> --
> java.lang.IllegalArgumentException: requirement failed: The input column 
> stridx_87ea3065aeb2 should have at least two distinct values.
> ```
> or
> ```
> 1. Error: spark.decisionTree (@test_mllib_tree.R#353) 
> --
> java.lang.IllegalArgumentException: requirement failed: The input column 
> stridx_d6a0b492cfa1 should have at least two distinct values.
> ```



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-21805) disable R vignettes code on Windows

2017-08-23 Thread Felix Cheung (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung resolved SPARK-21805.
--
  Resolution: Fixed
Assignee: Felix Cheung
   Fix Version/s: 2.3.0
  2.2.1
Target Version/s: 2.2.1, 2.3.0

> disable R vignettes code on Windows
> ---
>
> Key: SPARK-21805
> URL: https://issues.apache.org/jira/browse/SPARK-21805
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.2.0, 2.3.0
>Reporter: Felix Cheung
>Assignee: Felix Cheung
> Fix For: 2.2.1, 2.3.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15799) Release SparkR on CRAN

2017-08-23 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16138099#comment-16138099
 ] 

Felix Cheung commented on SPARK-15799:
--

[~shivaram] might have more updates. The CRAN submission process is tied to the 
Maintainer email.

There were a few comments from the submission. SPARK-21805 should fix the main 
part of them, and I now recall there is another one about the description text - 
I'll fix that as well.

> Release SparkR on CRAN
> --
>
> Key: SPARK-15799
> URL: https://issues.apache.org/jira/browse/SPARK-15799
> Project: Spark
>  Issue Type: New Feature
>  Components: SparkR
>Reporter: Xiangrui Meng
>
> Story: "As an R user, I would like to see SparkR released on CRAN, so I can 
> use SparkR easily in an existing R environment and have other packages built 
> on top of SparkR."
> I made this JIRA with the following questions in mind:
> * Are there known issues that prevent us releasing SparkR on CRAN?
> * Do we want to package Spark jars in the SparkR release?
> * Are there license issues?
> * How does it fit into Spark's release process?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21805) disable R vignettes code on Windows

2017-08-23 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16138097#comment-16138097
 ] 

Felix Cheung commented on SPARK-21805:
--

https://github.com/apache/spark/pull/19016

> disable R vignettes code on Windows
> ---
>
> Key: SPARK-21805
> URL: https://issues.apache.org/jira/browse/SPARK-21805
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.2.0, 2.3.0
>Reporter: Felix Cheung
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12157) Support numpy types as return values of Python UDFs

2017-08-22 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16137028#comment-16137028
 ] 

Felix Cheung commented on SPARK-12157:
--

seems like we have a couple of issues here.
I ran into this recently with scalar types - where are we on this?
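
In the meantime, a workaround sketch (my own illustration, not from the ticket) 
is to cast the numpy scalar back to a built-in Python type inside the UDF, so 
the unpickler on the JVM side never sees a numpy.dtype:

{code}
import numpy as np
import pyspark.sql.functions as F
import pyspark.sql.types as T
from pyspark.sql import Row

# int() turns the numpy.int64 result into a plain Python int,
# which avoids the ClassDict/numpy.dtype pickle error quoted below.
argmax = F.udf(lambda x: int(np.argmax(x)), T.IntegerType())

df = sqlContext.createDataFrame([Row(array=[1, 2, 3])])
df.select(argmax("array")).show()
{code}

(sqlContext here is the shell's SQLContext, as in the original snippet.)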


> Support numpy types as return values of Python UDFs
> ---
>
> Key: SPARK-12157
> URL: https://issues.apache.org/jira/browse/SPARK-12157
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, SQL
>Affects Versions: 1.5.2
>Reporter: Justin Uang
>
> Currently, if I have a python UDF
> {code}
> import pyspark.sql.types as T
> import pyspark.sql.functions as F
> from pyspark.sql import Row
> import numpy as np
> argmax = F.udf(lambda x: np.argmax(x), T.IntegerType())
> df = sqlContext.createDataFrame([Row(array=[1,2,3])])
> df.select(argmax("array")).count()
> {code}
> I get an exception that is fairly opaque:
> {code}
> Caused by: net.razorvine.pickle.PickleException: expected zero arguments for 
> construction of ClassDict (for numpy.dtype)
> at 
> net.razorvine.pickle.objects.ClassDictConstructor.construct(ClassDictConstructor.java:23)
> at net.razorvine.pickle.Unpickler.load_reduce(Unpickler.java:701)
> at net.razorvine.pickle.Unpickler.dispatch(Unpickler.java:171)
> at net.razorvine.pickle.Unpickler.load(Unpickler.java:85)
> at net.razorvine.pickle.Unpickler.loads(Unpickler.java:98)
> at 
> org.apache.spark.sql.execution.BatchPythonEvaluation$$anonfun$doExecute$1$$anonfun$apply$3.apply(python.scala:404)
> at 
> org.apache.spark.sql.execution.BatchPythonEvaluation$$anonfun$doExecute$1$$anonfun$apply$3.apply(python.scala:403)
> {code}
> Numpy types like np.int and np.float64 should automatically be cast to the 
> proper dtypes.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21801) SparkR unit test randomly fail on trees

2017-08-22 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16136445#comment-16136445
 ] 

Felix Cheung commented on SPARK-21801:
--

https://github.com/apache/spark/pull/19018

> SparkR unit test randomly fail on trees
> ---
>
> Key: SPARK-21801
> URL: https://issues.apache.org/jira/browse/SPARK-21801
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR, Tests
>Affects Versions: 2.2.0
>Reporter: Weichen Xu
>Priority: Critical
>
> SparkR unit tests sometimes fail randomly with an error such as:
> ```
> 1. Error: spark.randomForest (@test_mllib_tree.R#236) 
> --
> java.lang.IllegalArgumentException: requirement failed: The input column 
> stridx_87ea3065aeb2 should have at least two distinct values.
> ```
> or
> ```
> 1. Error: spark.decisionTree (@test_mllib_tree.R#353) 
> --
> java.lang.IllegalArgumentException: requirement failed: The input column 
> stridx_d6a0b492cfa1 should have at least two distinct values.
> ```



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-21805) disable R vignettes code on Windows

2017-08-22 Thread Felix Cheung (JIRA)
Felix Cheung created SPARK-21805:


 Summary: disable R vignettes code on Windows
 Key: SPARK-21805
 URL: https://issues.apache.org/jira/browse/SPARK-21805
 Project: Spark
  Issue Type: Bug
  Components: SparkR
Affects Versions: 2.2.0, 2.3.0
Reporter: Felix Cheung






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-21584) Update R method for summary to call new implementation

2017-08-22 Thread Felix Cheung (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung resolved SPARK-21584.
--
  Resolution: Fixed
   Fix Version/s: 2.3.0
Target Version/s: 2.3.0

> Update R method for summary to call new implementation
> --
>
> Key: SPARK-21584
> URL: https://issues.apache.org/jira/browse/SPARK-21584
> Project: Spark
>  Issue Type: Improvement
>  Components: SparkR, SQL
>Affects Versions: 2.3.0
>Reporter: Andrew Ray
>Assignee: Andrew Ray
> Fix For: 2.3.0
>
>
> Follow up to SPARK-21100



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21801) SparkR unit test randomly fail on trees

2017-08-22 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16136370#comment-16136370
 ] 

Felix Cheung commented on SPARK-21801:
--

I've seen it a couple of times. Looking at the test now, I think this is because 
the tests are not run with a fixed random seed.

Let me submit a PR to see if it addresses the failure.
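
The idea, roughly (a hypothetical sketch, not the actual test code or PR): with 
a small, randomly generated label column and no fixed seed, the string-indexed 
column can occasionally end up with a single level, which is exactly the 
requirement failure above. Pinning the seed makes the data deterministic:

{code}
# Hypothetical sketch: fix the RNG seed so the generated labels are
# deterministic and always contain at least two distinct values.
set.seed(123)
labels <- sample(c("yes", "no"), 10, replace = TRUE)
df <- createDataFrame(data.frame(label = labels, feature = rnorm(10)))
model <- spark.randomForest(df, label ~ feature, "classification", numTrees = 5)
summary(model)
{code}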

> SparkR unit test randomly fail on trees
> ---
>
> Key: SPARK-21801
> URL: https://issues.apache.org/jira/browse/SPARK-21801
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR, Tests
>Affects Versions: 2.2.0
>Reporter: Weichen Xu
>Priority: Critical
>
> SparkR unit tests sometimes fail randomly with an error such as:
> ```
> 1. Error: spark.randomForest (@test_mllib_tree.R#236) 
> --
> java.lang.IllegalArgumentException: requirement failed: The input column 
> stridx_87ea3065aeb2 should have at least two distinct values.
> ```
> or
> ```
> 1. Error: spark.decisionTree (@test_mllib_tree.R#353) 
> --
> java.lang.IllegalArgumentException: requirement failed: The input column 
> stridx_d6a0b492cfa1 should have at least two distinct values.
> ```



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21693) AppVeyor tests reach the time limit, 1.5 hours, sometimes in SparkR tests

2017-08-10 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16121866#comment-16121866
 ] 

Felix Cheung commented on SPARK-21693:
--

Splitting the test matrix is also possible. My worry, though, is that since 
caching is disabled, wouldn't the Spark jar get built multiple times? My main 
concerns are how long the tests will run and whether that will lengthen the 
queuing of test runs (the queue can already get quite long, and people sometimes 
ignore pending AppVeyor runs).

> AppVeyor tests reach the time limit, 1.5 hours, sometimes in SparkR tests
> -
>
> Key: SPARK-21693
> URL: https://issues.apache.org/jira/browse/SPARK-21693
> Project: Spark
>  Issue Type: Test
>  Components: Build, SparkR
>Affects Versions: 2.3.0
>Reporter: Hyukjin Kwon
>
> We finally sometimes reach the time limit, 1.5 hours, 
> https://ci.appveyor.com/project/ApacheSoftwareFoundation/spark/build/1676-master
> I requested to increase this from an hour to 1.5 hours before, but it looks 
> like we should fix this in AppVeyor. I have asked this for my own account a few 
> times before, but it looks like we can't increase this time limit again and again.
> I could identify two things that look like they take quite a bit of time:
> 1. Disabled cache feature in pull request builder, which ends up downloading 
> Maven dependencies (10-20ish mins)
> https://www.appveyor.com/docs/build-cache/
> {quote}
> Note: Saving cache is disabled in Pull Request builds.
> {quote}
> and also see 
> http://help.appveyor.com/discussions/problems/4159-cache-doesnt-seem-to-be-working
> This seems difficult to fix within Spark.
> 2. "MLlib classification algorithms" tests (30-35ish mins)
> This test below looks like it takes 30-35ish mins.
> {code}
> MLlib classification algorithms, except for tree-based algorithms: Spark 
> package found in SPARK_HOME: C:\projects\spark\bin\..
> ..
> {code}
> As a (I think) last resort, we could make a matrix for this test alone, so 
> that we run the other tests after one build and then run this test after 
> another build. For example, I run the Scala tests via this workaround - 
> https://ci.appveyor.com/project/spark-test/spark/build/757-20170716 (a matrix 
> with 7 entries that each build and test).
> I am also checking and testing other ways.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21693) AppVeyor tests reach the time limit, 1.5 hours, sometimes in SparkR tests

2017-08-10 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16121860#comment-16121860
 ] 

Felix Cheung commented on SPARK-21693:
--

We could certainly simplify the classification set - but there's a fair number 
of APIs being tested in there; perhaps we could time them to see which ones are 
taking the most time.
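
Something along these lines could give per-file numbers (illustrative only; the 
file names and path are assumptions, not what the AppVeyor script does today):

{code}
# Illustrative sketch: time each SparkR test file to find the slow ones.
library(testthat)
test_dir <- "R/pkg/inst/tests/testthat"   # assumed location of the test files
for (f in c("test_mllib_classification.R", "test_mllib_tree.R")) {
  elapsed <- system.time(test_file(file.path(test_dir, f)))[["elapsed"]]
  cat(sprintf("%s: %.1f seconds\n", f, elapsed))
}
{code}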

> AppVeyor tests reach the time limit, 1.5 hours, sometimes in SparkR tests
> -
>
> Key: SPARK-21693
> URL: https://issues.apache.org/jira/browse/SPARK-21693
> Project: Spark
>  Issue Type: Test
>  Components: Build, SparkR
>Affects Versions: 2.3.0
>Reporter: Hyukjin Kwon
>
> We finally sometimes reach the time limit, 1.5 hours, 
> https://ci.appveyor.com/project/ApacheSoftwareFoundation/spark/build/1676-master
> I requested to increase this from an hour to 1.5 hours before, but it looks 
> like we should fix this in AppVeyor. I have asked this for my own account a few 
> times before, but it looks like we can't increase this time limit again and again.
> I could identify two things that look like they take quite a bit of time:
> 1. Disabled cache feature in pull request builder, which ends up downloading 
> Maven dependencies (10-20ish mins)
> https://www.appveyor.com/docs/build-cache/
> {quote}
> Note: Saving cache is disabled in Pull Request builds.
> {quote}
> and also see 
> http://help.appveyor.com/discussions/problems/4159-cache-doesnt-seem-to-be-working
> This seems difficult to fix within Spark.
> 2. "MLlib classification algorithms" tests (30-35ish mins)
> This test below looks like it takes 30-35ish mins.
> {code}
> MLlib classification algorithms, except for tree-based algorithms: Spark 
> package found in SPARK_HOME: C:\projects\spark\bin\..
> ..
> {code}
> As a (I think) last resort, we could make a matrix for this test alone, so 
> that we run the other tests after one build and then run this test after 
> another build. For example, I run the Scala tests via this workaround - 
> https://ci.appveyor.com/project/spark-test/spark/build/757-20170716 (a matrix 
> with 7 entries that each build and test).
> I am also checking and testing other ways.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-21622) Support Offset in SparkR

2017-08-06 Thread Felix Cheung (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung resolved SPARK-21622.
--
  Resolution: Fixed
Assignee: Wayne Zhang
   Fix Version/s: 2.3.0
Target Version/s: 2.3.0

> Support Offset in SparkR
> 
>
> Key: SPARK-21622
> URL: https://issues.apache.org/jira/browse/SPARK-21622
> Project: Spark
>  Issue Type: Improvement
>  Components: SparkR
>Affects Versions: 2.2.0
>Reporter: Wayne Zhang
>Assignee: Wayne Zhang
> Fix For: 2.3.0
>
>
> Support offset in GLM in SparkR.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21622) Support Offset in SparkR

2017-08-06 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16115951#comment-16115951
 ] 

Felix Cheung commented on SPARK-21622:
--

https://github.com/apache/spark/pull/18831

> Support Offset in SparkR
> 
>
> Key: SPARK-21622
> URL: https://issues.apache.org/jira/browse/SPARK-21622
> Project: Spark
>  Issue Type: Improvement
>  Components: SparkR
>Affects Versions: 2.2.0
>Reporter: Wayne Zhang
>
> Support offset in GLM in SparkR.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15799) Release SparkR on CRAN

2017-08-03 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113974#comment-16113974
 ] 

Felix Cheung commented on SPARK-15799:
--

We submitted the 2.2.0 release to CRAN and got some comments that we hope to 
resolve (or get an exception for, if we can).

> Release SparkR on CRAN
> --
>
> Key: SPARK-15799
> URL: https://issues.apache.org/jira/browse/SPARK-15799
> Project: Spark
>  Issue Type: New Feature
>  Components: SparkR
>Reporter: Xiangrui Meng
>
> Story: "As an R user, I would like to see SparkR released on CRAN, so I can 
> use SparkR easily in an existing R environment and have other packages built 
> on top of SparkR."
> I made this JIRA with the following questions in mind:
> * Are there known issues that prevent us releasing SparkR on CRAN?
> * Do we want to package Spark jars in the SparkR release?
> * Are there license issues?
> * How does it fit into Spark's release process?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21367) R older version of Roxygen2 on Jenkins

2017-08-03 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113145#comment-16113145
 ] 

Felix Cheung commented on SPARK-21367:
--

still seeing it

Warning messages:
1: In check_dep_version(pkg, version, compare) :
  Need roxygen2 >= 5.0.0 but loaded version is 4.1.1
2: In check_dep_version(pkg, version, compare) :
  Need roxygen2 >= 5.0.0 but loaded version is 4.1.1
* installing *source* package 'SparkR' ...

https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80213/console

> R older version of Roxygen2 on Jenkins
> --
>
> Key: SPARK-21367
> URL: https://issues.apache.org/jira/browse/SPARK-21367
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.3.0
>Reporter: Felix Cheung
>Assignee: shane knapp
> Attachments: R.paks
>
>
> Getting this message from a recent build.
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79461/console
> Warning messages:
> 1: In check_dep_version(pkg, version, compare) :
>   Need roxygen2 >= 5.0.0 but loaded version is 4.1.1
> 2: In check_dep_version(pkg, version, compare) :
>   Need roxygen2 >= 5.0.0 but loaded version is 4.1.1
> * installing *source* package 'SparkR' ...
> ** R
> We have been running with 5.0.1 and haven't changed for a year.
> NOTE: Roxygen 6.x has some big changes and IMO we should not move to that yet.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21584) Update R method for summary to call new implementation

2017-08-02 Thread Felix Cheung (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung reassigned SPARK-21584:


Assignee: Andrew Ray

> Update R method for summary to call new implementation
> --
>
> Key: SPARK-21584
> URL: https://issues.apache.org/jira/browse/SPARK-21584
> Project: Spark
>  Issue Type: Improvement
>  Components: SparkR, SQL
>Affects Versions: 2.3.0
>Reporter: Andrew Ray
>Assignee: Andrew Ray
>
> Follow up to SPARK-21100



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21616) SparkR 2.3.0 migration guide, release note

2017-08-02 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16111483#comment-16111483
 ] 

Felix Cheung commented on SPARK-21616:
--

SPARK-21584

> SparkR 2.3.0 migration guide, release note
> --
>
> Key: SPARK-21616
> URL: https://issues.apache.org/jira/browse/SPARK-21616
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, SparkR
>Affects Versions: 2.3.0
>Reporter: Felix Cheung
>Assignee: Felix Cheung
> Fix For: 2.3.0
>
>
> From looking at changes since 2.2.0, this/these should be documented in the 
> migration guide / release note for the 2.3.0 release, as it is behavior 
> changes



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21584) Update R method for summary to call new implementation

2017-08-02 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16111485#comment-16111485
 ] 

Felix Cheung commented on SPARK-21584:
--

https://github.com/apache/spark/pull/18786

> Update R method for summary to call new implementation
> --
>
> Key: SPARK-21584
> URL: https://issues.apache.org/jira/browse/SPARK-21584
> Project: Spark
>  Issue Type: Improvement
>  Components: SparkR, SQL
>Affects Versions: 2.3.0
>Reporter: Andrew Ray
>
> Follow up to SPARK-21100



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-21616) SparkR 2.3.0 migration guide, release note

2017-08-02 Thread Felix Cheung (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung updated SPARK-21616:
-
Fix Version/s: (was: 2.2.0)

> SparkR 2.3.0 migration guide, release note
> --
>
> Key: SPARK-21616
> URL: https://issues.apache.org/jira/browse/SPARK-21616
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, SparkR
>Affects Versions: 2.3.0
>Reporter: Felix Cheung
>Assignee: Felix Cheung
> Fix For: 2.3.0
>
>
> From looking at changes since 2.1.0, this/these should be documented in the 
> migration guide / release note for the 2.2.0 release, as it is behavior 
> changes
> https://github.com/apache/spark/commit/422aa67d1bb84f913b06e6d94615adb6557e2870
> https://github.com/apache/spark/pull/17483 (createExternalTable)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-21616) SparkR 2.3.0 migration guide, release note

2017-08-02 Thread Felix Cheung (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung updated SPARK-21616:
-
Description: 
>From looking at changes since 2.2.0, this/these should be documented in the 
>migration guide / release note for the 2.3.0 release, as it is behavior changes


  was:
>From looking at changes since 2.1.0, this/these should be documented in the 
>migration guide / release note for the 2.2.0 release, as it is behavior changes

https://github.com/apache/spark/commit/422aa67d1bb84f913b06e6d94615adb6557e2870
https://github.com/apache/spark/pull/17483 (createExternalTable)


> SparkR 2.3.0 migration guide, release note
> --
>
> Key: SPARK-21616
> URL: https://issues.apache.org/jira/browse/SPARK-21616
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, SparkR
>Affects Versions: 2.3.0
>Reporter: Felix Cheung
>Assignee: Felix Cheung
> Fix For: 2.3.0
>
>
> From looking at changes since 2.2.0, this/these should be documented in the 
> migration guide / release note for the 2.3.0 release, as it is behavior 
> changes



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-21616) SparkR 2.3.0 migration guide, release note

2017-08-02 Thread Felix Cheung (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung updated SPARK-21616:
-
Affects Version/s: (was: 2.2.0)
   2.3.0

> SparkR 2.3.0 migration guide, release note
> --
>
> Key: SPARK-21616
> URL: https://issues.apache.org/jira/browse/SPARK-21616
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, SparkR
>Affects Versions: 2.3.0
>Reporter: Felix Cheung
>Assignee: Felix Cheung
> Fix For: 2.3.0
>
>
> From looking at changes since 2.1.0, this/these should be documented in the 
> migration guide / release note for the 2.2.0 release, as it is behavior 
> changes
> https://github.com/apache/spark/commit/422aa67d1bb84f913b06e6d94615adb6557e2870
> https://github.com/apache/spark/pull/17483 (createExternalTable)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-21616) SparkR 2.3.0 migration guide, release note

2017-08-02 Thread Felix Cheung (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung updated SPARK-21616:
-
Summary: SparkR 2.3.0 migration guide, release note  (was: CLONE - SparkR 
2.2.0 migration guide, release note)

> SparkR 2.3.0 migration guide, release note
> --
>
> Key: SPARK-21616
> URL: https://issues.apache.org/jira/browse/SPARK-21616
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, SparkR
>Affects Versions: 2.2.0
>Reporter: Felix Cheung
>Assignee: Felix Cheung
> Fix For: 2.2.0, 2.3.0
>
>
> From looking at changes since 2.1.0, this/these should be documented in the 
> migration guide / release note for the 2.2.0 release, as it is behavior 
> changes
> https://github.com/apache/spark/commit/422aa67d1bb84f913b06e6d94615adb6557e2870
> https://github.com/apache/spark/pull/17483 (createExternalTable)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-21616) SparkR 2.3.0 migration guide, release note

2017-08-02 Thread Felix Cheung (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung updated SPARK-21616:
-
Target Version/s: 2.3.0  (was: 2.2.0, 2.3.0)

> SparkR 2.3.0 migration guide, release note
> --
>
> Key: SPARK-21616
> URL: https://issues.apache.org/jira/browse/SPARK-21616
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, SparkR
>Affects Versions: 2.2.0
>Reporter: Felix Cheung
>Assignee: Felix Cheung
> Fix For: 2.2.0, 2.3.0
>
>
> From looking at changes since 2.1.0, this/these should be documented in the 
> migration guide / release note for the 2.2.0 release, as it is behavior 
> changes
> https://github.com/apache/spark/commit/422aa67d1bb84f913b06e6d94615adb6557e2870
> https://github.com/apache/spark/pull/17483 (createExternalTable)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-21616) CLONE - SparkR 2.2.0 migration guide, release note

2017-08-02 Thread Felix Cheung (JIRA)
Felix Cheung created SPARK-21616:


 Summary: CLONE - SparkR 2.2.0 migration guide, release note
 Key: SPARK-21616
 URL: https://issues.apache.org/jira/browse/SPARK-21616
 Project: Spark
  Issue Type: Documentation
  Components: Documentation, SparkR
Affects Versions: 2.2.0
Reporter: Felix Cheung
Assignee: Felix Cheung
 Fix For: 2.2.0, 2.3.0


>From looking at changes since 2.1.0, this/these should be documented in the 
>migration guide / release note for the 2.2.0 release, as it is behavior changes

https://github.com/apache/spark/commit/422aa67d1bb84f913b06e6d94615adb6557e2870
https://github.com/apache/spark/pull/17483 (createExternalTable)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-21381) SparkR: pass on setHandleInvalid for classification algorithms

2017-07-31 Thread Felix Cheung (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung resolved SPARK-21381.
--
  Resolution: Fixed
Assignee: Miao Wang
   Fix Version/s: 2.3.0
Target Version/s: 2.3.0

> SparkR: pass on setHandleInvalid for classification algorithms
> --
>
> Key: SPARK-21381
> URL: https://issues.apache.org/jira/browse/SPARK-21381
> Project: Spark
>  Issue Type: Improvement
>  Components: SparkR
>Affects Versions: 2.1.1
>Reporter: Miao Wang
>Assignee: Miao Wang
> Fix For: 2.3.0
>
>
> SPARK-20307 Added handleInvalid option to RFormula for tree-based 
> classification algorithms. We should add this parameter for other 
> classification algorithms in SparkR.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18226) SparkR displaying vector columns in incorrect way

2017-07-20 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16094277#comment-16094277
 ] 

Felix Cheung commented on SPARK-18226:
--

I agree, usability is poor, though I think if you subset the collected 
data.frame, you should be able to operate on the environment in the specific row 
and column individually...
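
For example, something like this (an illustrative sketch, continuing the naive 
Bayes snippet quoted below) should reach the contents of one cell:

{code}
# Illustrative sketch: collect locally, then inspect one row's rawPrediction
# cell, which comes back as an environment rather than a printable value.
local <- collect(nbPredictions)
cell <- local$rawPrediction[[1]]
ls(cell)                      # field names stored for that row
mget(ls(cell), envir = cell)  # pull the fields out into a plain list
{code}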

> SparkR displaying vector columns in incorrect way
> -
>
> Key: SPARK-18226
> URL: https://issues.apache.org/jira/browse/SPARK-18226
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.0.0
>Reporter: Grzegorz Chilkiewicz
>Priority: Trivial
>
> I have encountered a problem with SparkR presenting Spark vectors from 
> org.apache.spark.mllib.linalg package
> * `head(df)` shows in vector column: ""
> * cast to string does not work as expected, it shows: 
> "[1,null,null,org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@79f50a91]"
> * `showDF(df)` works correctly
> to reproduce, start SparkR and paste following code (example taken from 
> https://spark.apache.org/docs/latest/sparkr.html#naive-bayes-model)
> {code}
> # Fit a Bernoulli naive Bayes model with spark.naiveBayes
> titanic <- as.data.frame(Titanic)
> titanicDF <- createDataFrame(titanic[titanic$Freq > 0, -5])
> nbDF <- titanicDF
> nbTestDF <- titanicDF
> nbModel <- spark.naiveBayes(nbDF, Survived ~ Class + Sex + Age)
> # Model summary
> summary(nbModel)
> # Prediction
> nbPredictions <- predict(nbModel, nbTestDF)
> #
> # My modification to expose the problem #
> nbPredictions$rawPrediction_str <- cast(nbPredictions$rawPrediction, "string")
> head(nbPredictions)
> showDF(nbPredictions)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18226) SparkR displaying vector columns in incorrect way

2017-07-19 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16093466#comment-16093466
 ] 

Felix Cheung commented on SPARK-18226:
--

If you collect what's returned by predict(), you should be able to 
manipulate it in native R?

> SparkR displaying vector columns in incorrect way
> -
>
> Key: SPARK-18226
> URL: https://issues.apache.org/jira/browse/SPARK-18226
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.0.0
>Reporter: Grzegorz Chilkiewicz
>Priority: Trivial
>
> I have encountered a problem with SparkR presenting Spark vectors from 
> org.apache.spark.mllib.linalg package
> * `head(df)` shows in vector column: ""
> * cast to string does not work as expected, it shows: 
> "[1,null,null,org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@79f50a91]"
> * `showDF(df)` works correctly
> to reproduce, start SparkR and paste following code (example taken from 
> https://spark.apache.org/docs/latest/sparkr.html#naive-bayes-model)
> {code}
> # Fit a Bernoulli naive Bayes model with spark.naiveBayes
> titanic <- as.data.frame(Titanic)
> titanicDF <- createDataFrame(titanic[titanic$Freq > 0, -5])
> nbDF <- titanicDF
> nbTestDF <- titanicDF
> nbModel <- spark.naiveBayes(nbDF, Survived ~ Class + Sex + Age)
> # Model summary
> summary(nbModel)
> # Prediction
> nbPredictions <- predict(nbModel, nbTestDF)
> #
> # My modification to expose the problem #
> nbPredictions$rawPrediction_str <- cast(nbPredictions$rawPrediction, "string")
> head(nbPredictions)
> showDF(nbPredictions)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21450) List of NA is flattened inside a SparkR struct type

2017-07-17 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16091132#comment-16091132
 ] 

Felix Cheung commented on SPARK-21450:
--

[~hyukjin.kwon] 

If you follow the code in test_sparkSQL.R, 
{code}
df <- as.DataFrame(list(list("col" = "{\"date\":\"21/10/2014\"}")))
schema2 <- structType(structField("date", "date"))
s <- collect(select(df, from_json(df$col, schema2)))
expect_equal(s[[1]][[1]], NA)
s <- collect(select(df, from_json(df$col, schema2, dateFormat = "dd/MM/")))
{code}

Both lines should be using schema2 - not schema. schema is actually defined as 
{code}
schema <- structType(structField("age", "integer"),
   structField("height", "double"))
{code}
 which doesn't match the json blob.

Is this a copy/paste error in this JIRA? Could you check?

In any case, I wonder - I didn't get to test it in Scala - whether the different 
result is caused by an unparseable json blob due to the schema/format passed in. 
The logi NA would then be a null in Scala.

> List of NA is flattened inside a SparkR struct type
> ---
>
> Key: SPARK-21450
> URL: https://issues.apache.org/jira/browse/SPARK-21450
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.2.0
>Reporter: Hossein Falaki
>
> Consider the following two cases copied from {{test_sparkSQL.R}}:
> {code}
> df <- as.DataFrame(list(list("col" = "{\"date\":\"21/10/2014\"}")))
> schema <- structType(structField("date", "date"))
> s1 <- collect(select(df, from_json(df$col, schema)))
> s2 <- collect(select(df, from_json(df$col, schema2, dateFormat = 
> "dd/MM/")))
> {code}
> If you inspect s1 using {{str(s1)}} you will find:
> {code}
> 'data.frame': 2 obs. of  1 variable:
>  $ jsontostructs(col):List of 2
>   ..$ : logi NA
> {code}
> But for s2, running {{str(s2)}} results in:
> {code}
> 'data.frame': 2 obs. of  1 variable:
>  $ jsontostructs(col):List of 2
>   ..$ :List of 1
>   .. ..$ date: Date, format: "2014-10-21"
>   .. ..- attr(*, "class")= chr "struct"
> {code}
> I assume this is not intentional and is just a subtle bug. Do you think 
> otherwise? [~shivaram] and [~felixcheung]



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21367) R older version of Roxygen2 on Jenkins

2017-07-11 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16082692#comment-16082692
 ] 

Felix Cheung commented on SPARK-21367:
--

*SOMETIMES*? :)

> R older version of Roxygen2 on Jenkins
> --
>
> Key: SPARK-21367
> URL: https://issues.apache.org/jira/browse/SPARK-21367
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.3.0
>Reporter: Felix Cheung
>Assignee: shane knapp
>
> Getting this message from a recent build.
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79461/console
> Warning messages:
> 1: In check_dep_version(pkg, version, compare) :
>   Need roxygen2 >= 5.0.0 but loaded version is 4.1.1
> 2: In check_dep_version(pkg, version, compare) :
>   Need roxygen2 >= 5.0.0 but loaded version is 4.1.1
> * installing *source* package 'SparkR' ...
> ** R
> We have been running with 5.0.1 and haven't changed for a year.
> NOTE: Roxygen 6.x has some big changes and IMO we should not move to that yet.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-21367) R older version of Roxygen2 on Jenkins

2017-07-11 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16081720#comment-16081720
 ] 

Felix Cheung edited comment on SPARK-21367 at 7/11/17 6:28 AM:
---

I'm not sure exactly why yet, but comparing the working and non-working build

working:
{code}
First time using roxygen2 4.0. Upgrading automatically...
Writing SparkDataFrame.Rd
Writing printSchema.Rd
Writing schema.Rd
Writing explain.Rd
...
Warning messages:
1: In check_dep_version(pkg, version, compare) :
  Need roxygen2 >= 5.0.0 but loaded version is 4.1.1
2: In check_dep_version(pkg, version, compare) :
  Need roxygen2 >= 5.0.0 but loaded version is 4.1.1
* installing *source* package 'SparkR' ...
{code}

not working:
{code}
First time using roxygen2 4.0. Upgrading automatically...
There were 50 or more warnings (use warnings() to see the first 50)
* installing *source* package 'SparkR' ...
{code}

Basically, the .Rd files are not getting created (because of warnings that are 
not captured).
That causes the CRAN check to fail with 
"checking for missing documentation entries ... WARNING
Undocumented code objects:
  '%<=>%' 'add_months' 'agg' 'approxCountDistinc"

(which would be expected)




was (Author: felixcheung):
I'm not sure exactly why yet, but comparing the working and non-working build

working:
{code}
First time using roxygen2 4.0. Upgrading automatically...
Writing SparkDataFrame.Rd
Writing printSchema.Rd
Writing schema.Rd
Writing explain.Rd
...
Warning messages:
1: In check_dep_version(pkg, version, compare) :
  Need roxygen2 >= 5.0.0 but loaded version is 4.1.1
2: In check_dep_version(pkg, version, compare) :
  Need roxygen2 >= 5.0.0 but loaded version is 4.1.1
* installing *source* package 'SparkR' ...
{code}

not working:
{code}
First time using roxygen2 4.0. Upgrading automatically...
There were 50 or more warnings (use warnings() to see the first 50)
* installing *source* package 'SparkR' ...
{code}

Basically, the .Rd files are not getting created (because of warnings that are 
not captured).
That causes the CRAN check to fail with 
"checking for missing documentation entries ... WARNING
Undocumented code objects:
  '%<=>%' 'add_months' 'agg' 'approxCountDistinc"
(which would be expected)

As explained in the description above, I'm pretty sure these were not in the 
build a while ago
"
Warning messages:
1: In check_dep_version(pkg, version, compare) :
  Need roxygen2 >= 5.0.0 but loaded version is 4.1.1
"

> R older version of Roxygen2 on Jenkins
> --
>
> Key: SPARK-21367
> URL: https://issues.apache.org/jira/browse/SPARK-21367
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.3.0
>Reporter: Felix Cheung
>Assignee: shane knapp
>
> Getting this message from a recent build.
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79461/console
> Warning messages:
> 1: In check_dep_version(pkg, version, compare) :
>   Need roxygen2 >= 5.0.0 but loaded version is 4.1.1
> 2: In check_dep_version(pkg, version, compare) :
>   Need roxygen2 >= 5.0.0 but loaded version is 4.1.1
> * installing *source* package 'SparkR' ...
> ** R
> We have been running with 5.0.1 and haven't changed for a year.
> NOTE: Roxygen 6.x has some big changes and IMO we should not move to that yet.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21367) R older version of Roxygen2 on Jenkins

2017-07-11 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16081744#comment-16081744
 ] 

Felix Cheung commented on SPARK-21367:
--

I think I found the first error, it's one build before the build failures 
listed above, 79470

https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79470/console

{code}
Updating roxygen version in  
/home/jenkins/workspace/SparkPullRequestBuilder/R/pkg/DESCRIPTION 
Deleting AFTSurvivalRegressionModel-class.Rd
Deleting ALSModel-class.Rd
...
There were 50 or more warnings (use warnings() to see the first 50)
{code}

> R older version of Roxygen2 on Jenkins
> --
>
> Key: SPARK-21367
> URL: https://issues.apache.org/jira/browse/SPARK-21367
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.3.0
>Reporter: Felix Cheung
>Assignee: shane knapp
>
> Getting this message from a recent build.
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79461/console
> Warning messages:
> 1: In check_dep_version(pkg, version, compare) :
>   Need roxygen2 >= 5.0.0 but loaded version is 4.1.1
> 2: In check_dep_version(pkg, version, compare) :
>   Need roxygen2 >= 5.0.0 but loaded version is 4.1.1
> * installing *source* package 'SparkR' ...
> ** R
> We have been running with 5.0.1 and haven't changed for a year.
> NOTE: Roxygen 6.x has some big changes and IMO we should not move to that yet.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-21367) R older version of Roxygen2 on Jenkins

2017-07-11 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16081720#comment-16081720
 ] 

Felix Cheung edited comment on SPARK-21367 at 7/11/17 6:33 AM:
---

I'm not sure exactly why yet, but comparing the working and non-working build

working:
{code}
First time using roxygen2 4.0. Upgrading automatically...
Writing SparkDataFrame.Rd
Writing printSchema.Rd
Writing schema.Rd
Writing explain.Rd
...
Warning messages:
1: In check_dep_version(pkg, version, compare) :
  Need roxygen2 >= 5.0.0 but loaded version is 4.1.1
2: In check_dep_version(pkg, version, compare) :
  Need roxygen2 >= 5.0.0 but loaded version is 4.1.1
* installing *source* package 'SparkR' ...
{code}

not working:
{code}
First time using roxygen2 4.0. Upgrading automatically...
There were 50 or more warnings (use warnings() to see the first 50)
* installing *source* package 'SparkR' ...
{code}

Basically, the .Rd files are not getting created (because of warnings that are 
not captured).
That causes the CRAN check to fail with 
"checking for missing documentation entries ... WARNING
Undocumented code objects:
  '%<=>%' 'add_months' 'agg' 'approxCountDistinc"

(which would be completely expected - without the Rd files it will not have the 
documentation, hence the check will fail)




was (Author: felixcheung):
I'm not sure exactly why yet, but comparing the working and non-working build

working:
{code}
First time using roxygen2 4.0. Upgrading automatically...
Writing SparkDataFrame.Rd
Writing printSchema.Rd
Writing schema.Rd
Writing explain.Rd
...
Warning messages:
1: In check_dep_version(pkg, version, compare) :
  Need roxygen2 >= 5.0.0 but loaded version is 4.1.1
2: In check_dep_version(pkg, version, compare) :
  Need roxygen2 >= 5.0.0 but loaded version is 4.1.1
* installing *source* package 'SparkR' ...
{code}

not working:
{code}
First time using roxygen2 4.0. Upgrading automatically...
There were 50 or more warnings (use warnings() to see the first 50)
* installing *source* package 'SparkR' ...
{code}

Basically, the .Rd files are not getting created (because of warnings that are 
not captured).
That causes the CRAN check to fail with 
"checking for missing documentation entries ... WARNING
Undocumented code objects:
  '%<=>%' 'add_months' 'agg' 'approxCountDistinc"

(which would be expected)



> R older version of Roxygen2 on Jenkins
> --
>
> Key: SPARK-21367
> URL: https://issues.apache.org/jira/browse/SPARK-21367
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.3.0
>Reporter: Felix Cheung
>Assignee: shane knapp
>
> Getting this message from a recent build.
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79461/console
> Warning messages:
> 1: In check_dep_version(pkg, version, compare) :
>   Need roxygen2 >= 5.0.0 but loaded version is 4.1.1
> 2: In check_dep_version(pkg, version, compare) :
>   Need roxygen2 >= 5.0.0 but loaded version is 4.1.1
> * installing *source* package 'SparkR' ...
> ** R
> We have been running with 5.0.1 and haven't changed for a year.
> NOTE: Roxygen 6.x has some big changes and IMO we should not move to that yet.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-21367) R older version of Roxygen2 on Jenkins

2017-07-11 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16081744#comment-16081744
 ] 

Felix Cheung edited comment on SPARK-21367 at 7/11/17 6:26 AM:
---

I think I found the first error, it's one build before the build failures 
listed above, 79470

https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79470/console

{code}
Updating roxygen version in  
/home/jenkins/workspace/SparkPullRequestBuilder/R/pkg/DESCRIPTION 
Deleting AFTSurvivalRegressionModel-class.Rd
Deleting ALSModel-class.Rd
...
There were 50 or more warnings (use warnings() to see the first 50)
{code}

Whereas this build from mid June
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/SparkPullRequestBuilder/78020/console



was (Author: felixcheung):
I think I found the first error, it's one build before the build failures 
listed above, 79470

https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79470/console

{code}
Updating roxygen version in  
/home/jenkins/workspace/SparkPullRequestBuilder/R/pkg/DESCRIPTION 
Deleting AFTSurvivalRegressionModel-class.Rd
Deleting ALSModel-class.Rd
...
There were 50 or more warnings (use warnings() to see the first 50)
{code}

> R older version of Roxygen2 on Jenkins
> --
>
> Key: SPARK-21367
> URL: https://issues.apache.org/jira/browse/SPARK-21367
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.3.0
>Reporter: Felix Cheung
>Assignee: shane knapp
>
> Getting this message from a recent build.
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79461/console
> Warning messages:
> 1: In check_dep_version(pkg, version, compare) :
>   Need roxygen2 >= 5.0.0 but loaded version is 4.1.1
> 2: In check_dep_version(pkg, version, compare) :
>   Need roxygen2 >= 5.0.0 but loaded version is 4.1.1
> * installing *source* package 'SparkR' ...
> ** R
> We have been running with 5.0.1 and haven't changed for a year.
> NOTE: Roxygen 6.x has some big changes and IMO we should not move to that yet.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-21367) R older version of Roxygen2 on Jenkins

2017-07-11 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16081744#comment-16081744
 ] 

Felix Cheung edited comment on SPARK-21367 at 7/11/17 6:26 AM:
---

I think I found the first error, it's one build before the build failures 
listed above, 79470

https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79470/console

{code}
Updating roxygen version in  
/home/jenkins/workspace/SparkPullRequestBuilder/R/pkg/DESCRIPTION 
Deleting AFTSurvivalRegressionModel-class.Rd
Deleting ALSModel-class.Rd
...
There were 50 or more warnings (use warnings() to see the first 50)
{code}

Whereas this build from mid June
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/SparkPullRequestBuilder/78020/console

Does NOT have this "Need roxygen2 >= 5.0.0 but loaded version is 4.1.1" message 
in the console output


was (Author: felixcheung):
I think I found the first error, it's one build before the build failures 
listed above, 79470

https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79470/console

{code}
Updating roxygen version in  
/home/jenkins/workspace/SparkPullRequestBuilder/R/pkg/DESCRIPTION 
Deleting AFTSurvivalRegressionModel-class.Rd
Deleting ALSModel-class.Rd
...
There were 50 or more warnings (use warnings() to see the first 50)
{code}

Whereas this build from mid June
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/SparkPullRequestBuilder/78020/console


> R older version of Roxygen2 on Jenkins
> --
>
> Key: SPARK-21367
> URL: https://issues.apache.org/jira/browse/SPARK-21367
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.3.0
>Reporter: Felix Cheung
>Assignee: shane knapp
>
> Getting this message from a recent build.
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79461/console
> Warning messages:
> 1: In check_dep_version(pkg, version, compare) :
>   Need roxygen2 >= 5.0.0 but loaded version is 4.1.1
> 2: In check_dep_version(pkg, version, compare) :
>   Need roxygen2 >= 5.0.0 but loaded version is 4.1.1
> * installing *source* package 'SparkR' ...
> ** R
> We have been running with 5.0.1 and haven't changed for a year.
> NOTE: Roxygen 6.x has some big changes and IMO we should not move to that yet.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Issue Comment Deleted] (SPARK-21367) R older version of Roxygen2 on Jenkins

2017-07-11 Thread Felix Cheung (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung updated SPARK-21367:
-
Comment: was deleted

(was: it looks like instead of 5.x, the older 4.0 is being loaded?

First time using roxygen2 4.0. Upgrading automatically...

)

> R older version of Roxygen2 on Jenkins
> --
>
> Key: SPARK-21367
> URL: https://issues.apache.org/jira/browse/SPARK-21367
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.3.0
>Reporter: Felix Cheung
>Assignee: shane knapp
>
> Getting this message from a recent build.
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79461/console
> Warning messages:
> 1: In check_dep_version(pkg, version, compare) :
>   Need roxygen2 >= 5.0.0 but loaded version is 4.1.1
> 2: In check_dep_version(pkg, version, compare) :
>   Need roxygen2 >= 5.0.0 but loaded version is 4.1.1
> * installing *source* package 'SparkR' ...
> ** R
> We have been running with 5.0.1 and haven't changed for a year.
> NOTE: Roxygen 6.x has some big changes and IMO we should not move to that yet.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-21367) R older version of Roxygen2 on Jenkins

2017-07-11 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16081720#comment-16081720
 ] 

Felix Cheung edited comment on SPARK-21367 at 7/11/17 6:06 AM:
---

I'm not sure exactly why yet, but comparing the working and non-working build

working:
{code}
First time using roxygen2 4.0. Upgrading automatically...
Writing SparkDataFrame.Rd
Writing printSchema.Rd
Writing schema.Rd
Writing explain.Rd
...
Warning messages:
1: In check_dep_version(pkg, version, compare) :
  Need roxygen2 >= 5.0.0 but loaded version is 4.1.1
2: In check_dep_version(pkg, version, compare) :
  Need roxygen2 >= 5.0.0 but loaded version is 4.1.1
* installing *source* package 'SparkR' ...
{code}

not working:
{code}
First time using roxygen2 4.0. Upgrading automatically...
There were 50 or more warnings (use warnings() to see the first 50)
* installing *source* package 'SparkR' ...
{code}

Basically, the .Rd files are not getting created (because of warnings that are 
not captured).
That causes the CRAN check to fail with 
"checking for missing documentation entries ... WARNING
Undocumented code objects:
  '%<=>%' 'add_months' 'agg' 'approxCountDistinc"
(which would be expected)

As explained in the description above, I'm pretty sure these were not in the 
build a while ago
"
Warning messages:
1: In check_dep_version(pkg, version, compare) :
  Need roxygen2 >= 5.0.0 but loaded version is 4.1.1
"


was (Author: felixcheung):
I'm not sure exactly why yet, but comparing the working and non-working build

working:
{code}
First time using roxygen2 4.0. Upgrading automatically...
Writing SparkDataFrame.Rd
Writing printSchema.Rd
Writing schema.Rd
Writing explain.Rd
...
Warning messages:
1: In check_dep_version(pkg, version, compare) :
  Need roxygen2 >= 5.0.0 but loaded version is 4.1.1
2: In check_dep_version(pkg, version, compare) :
  Need roxygen2 >= 5.0.0 but loaded version is 4.1.1
* installing *source* package 'SparkR' ...
{code}

not working:
{code}
First time using roxygen2 4.0. Upgrading automatically...
There were 50 or more warnings (use warnings() to see the first 50)
* installing *source* package 'SparkR' ...
{code}

Basically, the .Rd files are not getting created (because of warnings that are 
not captured).
That causes the CRAN check to fail with 
"checking for missing documentation entries ... WARNING
Undocumented code objects:
  '%<=>%' 'add_months' 'agg' 'approxCountDistinc"
(which would be expected)

As explained in the description above, I'm pretty sure these were not in the 
build a while ago
"
Warning messages:
1: In check_dep_version(pkg, version, compare) :
  Need roxygen2 >= 5.0.0 but loaded version is 4.1.1
"

> R older version of Roxygen2 on Jenkins
> --
>
> Key: SPARK-21367
> URL: https://issues.apache.org/jira/browse/SPARK-21367
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.3.0
>Reporter: Felix Cheung
>Assignee: shane knapp
>
> Getting this message from a recent build.
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79461/console
> Warning messages:
> 1: In check_dep_version(pkg, version, compare) :
>   Need roxygen2 >= 5.0.0 but loaded version is 4.1.1
> 2: In check_dep_version(pkg, version, compare) :
>   Need roxygen2 >= 5.0.0 but loaded version is 4.1.1
> * installing *source* package 'SparkR' ...
> ** R
> We have been running with 5.0.1 and haven't changed for a year.
> NOTE: Roxygen 6.x has some big changes and IMO we should not move to that yet.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21367) R older version of Roxygen2 on Jenkins

2017-07-11 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16081723#comment-16081723
 ] 

Felix Cheung commented on SPARK-21367:
--

And I'm pretty sure we should build with Roxygen2 5.0.1

https://github.com/apache/spark/blob/master/R/pkg/DESCRIPTION#L60
RoxygenNote: 5.0.1
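
A small sanity check, as an illustrative sketch (not an existing Jenkins step), comparing the installed roxygen2 against the RoxygenNote recorded in DESCRIPTION:
{code}
# Assumes this is run from the Spark source root; the path and messages are illustrative
desc_note <- as.character(read.dcf("R/pkg/DESCRIPTION", fields = "RoxygenNote"))
installed <- as.character(packageVersion("roxygen2"))
if (!identical(desc_note, installed)) {
  warning(sprintf("roxygen2 %s is installed, but the package was documented with %s",
                  installed, desc_note))
}
{code}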

> R older version of Roxygen2 on Jenkins
> --
>
> Key: SPARK-21367
> URL: https://issues.apache.org/jira/browse/SPARK-21367
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.3.0
>Reporter: Felix Cheung
>Assignee: shane knapp
>
> Getting this message from a recent build.
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79461/console
> Warning messages:
> 1: In check_dep_version(pkg, version, compare) :
>   Need roxygen2 >= 5.0.0 but loaded version is 4.1.1
> 2: In check_dep_version(pkg, version, compare) :
>   Need roxygen2 >= 5.0.0 but loaded version is 4.1.1
> * installing *source* package 'SparkR' ...
> ** R
> We have been running with 5.0.1 and haven't changed for a year.
> NOTE: Roxygen 6.x has some big changes and IMO we should not move to that yet.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-21367) R older version of Roxygen2 on Jenkins

2017-07-11 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16081720#comment-16081720
 ] 

Felix Cheung edited comment on SPARK-21367 at 7/11/17 6:02 AM:
---

I'm not sure exactly why yet, but comparing the working and non-working builds:

working:
{code}
First time using roxygen2 4.0. Upgrading automatically...
Writing SparkDataFrame.Rd
Writing printSchema.Rd
Writing schema.Rd
Writing explain.Rd
...
Warning messages:
1: In check_dep_version(pkg, version, compare) :
  Need roxygen2 >= 5.0.0 but loaded version is 4.1.1
2: In check_dep_version(pkg, version, compare) :
  Need roxygen2 >= 5.0.0 but loaded version is 4.1.1
* installing *source* package 'SparkR' ...
{code}

not working:
{code}
First time using roxygen2 4.0. Upgrading automatically...
There were 50 or more warnings (use warnings() to see the first 50)
* installing *source* package 'SparkR' ...
{code}

Basically, the .Rd files are not getting created (because of warnings that are
not captured).
That causes the CRAN check to fail with
"checking for missing documentation entries ... WARNING
Undocumented code objects:
  '%<=>%' 'add_months' 'agg' 'approxCountDistinc"
(which is to be expected)

As explained in the description above, I'm pretty sure these warnings were not in the
build a while ago:
"
Warning messages:
1: In check_dep_version(pkg, version, compare) :
  Need roxygen2 >= 5.0.0 but loaded version is 4.1.1
"


was (Author: felixcheung):
I'm not sure exactly why yet, but comparing the working and non-working builds:

working:
First time using roxygen2 4.0. Upgrading automatically...
Writing SparkDataFrame.Rd
Writing printSchema.Rd
Writing schema.Rd
Writing explain.Rd
...
Warning messages:
1: In check_dep_version(pkg, version, compare) :
  Need roxygen2 >= 5.0.0 but loaded version is 4.1.1
2: In check_dep_version(pkg, version, compare) :
  Need roxygen2 >= 5.0.0 but loaded version is 4.1.1
* installing *source* package 'SparkR' ...

not working:
First time using roxygen2 4.0. Upgrading automatically...
There were 50 or more warnings (use warnings() to see the first 50)
* installing *source* package 'SparkR' ...

Basically, the .Rd files are not getting created (because of warnings that are
not captured).
That causes the CRAN check to fail with
"checking for missing documentation entries ... WARNING
Undocumented code objects:
  '%<=>%' 'add_months' 'agg' 'approxCountDistinc"
(which is to be expected)

As explained in the description above, I'm pretty sure these warnings were not in the
build a while ago:
"
Warning messages:
1: In check_dep_version(pkg, version, compare) :
  Need roxygen2 >= 5.0.0 but loaded version is 4.1.1
"

> R older version of Roxygen2 on Jenkins
> --
>
> Key: SPARK-21367
> URL: https://issues.apache.org/jira/browse/SPARK-21367
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.3.0
>Reporter: Felix Cheung
>Assignee: shane knapp
>
> Getting this message from a recent build.
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79461/console
> Warning messages:
> 1: In check_dep_version(pkg, version, compare) :
>   Need roxygen2 >= 5.0.0 but loaded version is 4.1.1
> 2: In check_dep_version(pkg, version, compare) :
>   Need roxygen2 >= 5.0.0 but loaded version is 4.1.1
> * installing *source* package 'SparkR' ...
> ** R
> We have been running with 5.0.1 and haven't changed for a year.
> NOTE: Roxygen 6.x has some big changes and IMO we should not move to that yet.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21367) R older version of Roxygen2 on Jenkins

2017-07-11 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16081720#comment-16081720
 ] 

Felix Cheung commented on SPARK-21367:
--

I'm not sure exactly why yet, but comparing the working and non-working builds:

working:
First time using roxygen2 4.0. Upgrading automatically...
Writing SparkDataFrame.Rd
Writing printSchema.Rd
Writing schema.Rd
Writing explain.Rd
...
Warning messages:
1: In check_dep_version(pkg, version, compare) :
  Need roxygen2 >= 5.0.0 but loaded version is 4.1.1
2: In check_dep_version(pkg, version, compare) :
  Need roxygen2 >= 5.0.0 but loaded version is 4.1.1
* installing *source* package 'SparkR' ...

not working:
First time using roxygen2 4.0. Upgrading automatically...
There were 50 or more warnings (use warnings() to see the first 50)
* installing *source* package 'SparkR' ...

Basically, the .Rd files are not getting created (because of warnings that are
not captured).
That causes the CRAN check to fail with
"checking for missing documentation entries ... WARNING
Undocumented code objects:
  '%<=>%' 'add_months' 'agg' 'approxCountDistinc"
(which is to be expected)

As explained in the description above, I'm pretty sure these warnings were not in the
build a while ago:
"
Warning messages:
1: In check_dep_version(pkg, version, compare) :
  Need roxygen2 >= 5.0.0 but loaded version is 4.1.1
"

> R older version of Roxygen2 on Jenkins
> --
>
> Key: SPARK-21367
> URL: https://issues.apache.org/jira/browse/SPARK-21367
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.3.0
>Reporter: Felix Cheung
>Assignee: shane knapp
>
> Getting this message from a recent build.
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79461/console
> Warning messages:
> 1: In check_dep_version(pkg, version, compare) :
>   Need roxygen2 >= 5.0.0 but loaded version is 4.1.1
> 2: In check_dep_version(pkg, version, compare) :
>   Need roxygen2 >= 5.0.0 but loaded version is 4.1.1
> * installing *source* package 'SparkR' ...
> ** R
> We have been running with 5.0.1 and haven't changed for a year.
> NOTE: Roxygen 6.x has some big changes and IMO we should not move to that yet.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21367) R older version of Roxygen2 on Jenkins

2017-07-10 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16081712#comment-16081712
 ] 

Felix Cheung commented on SPARK-21367:
--

it looks like instead of 5.x, the older 4.0 is being loaded?

First time using roxygen2 4.0. Upgrading automatically...
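
A quick, illustrative way to confirm which roxygen2 copy the machine is actually picking up, and to pin the expected version (the library-path checks and install step below are assumptions, not the exact Jenkins setup):
{code}
packageVersion("roxygen2")   # e.g. '4.1.1' here would explain the warnings above
find.package("roxygen2")     # which library the loaded copy lives in
.libPaths()                  # search order; an older site library can shadow a newer install

# Pin the version the package was documented with (RoxygenNote: 5.0.1)
devtools::install_version("roxygen2", version = "5.0.1",
                          repos = "https://cloud.r-project.org")
{code}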



> R older version of Roxygen2 on Jenkins
> --
>
> Key: SPARK-21367
> URL: https://issues.apache.org/jira/browse/SPARK-21367
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.3.0
>Reporter: Felix Cheung
>Assignee: shane knapp
>
> Getting this message from a recent build.
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79461/console
> Warning messages:
> 1: In check_dep_version(pkg, version, compare) :
>   Need roxygen2 >= 5.0.0 but loaded version is 4.1.1
> 2: In check_dep_version(pkg, version, compare) :
>   Need roxygen2 >= 5.0.0 but loaded version is 4.1.1
> * installing *source* package 'SparkR' ...
> ** R
> We have been running with 5.0.1 and haven't changed for a year.
> NOTE: Roxygen 6.x has some big changes and IMO we should not move to that yet.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-21367) R older version of Roxygen2 on Jenkins

2017-07-10 Thread Felix Cheung (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung updated SPARK-21367:
-
Description: 
Getting this message from a recent build.

https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79461/console
Warning messages:
1: In check_dep_version(pkg, version, compare) :
  Need roxygen2 >= 5.0.0 but loaded version is 4.1.1
2: In check_dep_version(pkg, version, compare) :
  Need roxygen2 >= 5.0.0 but loaded version is 4.1.1
* installing *source* package 'SparkR' ...
** R

We have been running with 5.0.1 and haven't changed for a year.
NOTE: Roxygen 6.x has some big changes and IMO we should not move to that yet.

> R older version of Roxygen2 on Jenkins
> --
>
> Key: SPARK-21367
> URL: https://issues.apache.org/jira/browse/SPARK-21367
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.3.0
>Reporter: Felix Cheung
>
> Getting this message from a recent build.
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79461/console
> Warning messages:
> 1: In check_dep_version(pkg, version, compare) :
>   Need roxygen2 >= 5.0.0 but loaded version is 4.1.1
> 2: In check_dep_version(pkg, version, compare) :
>   Need roxygen2 >= 5.0.0 but loaded version is 4.1.1
> * installing *source* package 'SparkR' ...
> ** R
> We have been running with 5.0.1 and haven't changed for a year.
> NOTE: Roxygen 6.x has some big changes and IMO we should not move to that yet.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-21367) R older version of Roxygen2 on Jenkins

2017-07-10 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16080728#comment-16080728
 ] 

Felix Cheung edited comment on SPARK-21367 at 7/10/17 5:45 PM:
---

[~shaneknapp]
could you check? thanks!


was (Author: felixcheung):
~shane knapp
could you check? thanks!

> R older version of Roxygen2 on Jenkins
> --
>
> Key: SPARK-21367
> URL: https://issues.apache.org/jira/browse/SPARK-21367
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.3.0
>Reporter: Felix Cheung
>
> Getting this message from a recent build.
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79461/console
> Warning messages:
> 1: In check_dep_version(pkg, version, compare) :
>   Need roxygen2 >= 5.0.0 but loaded version is 4.1.1
> 2: In check_dep_version(pkg, version, compare) :
>   Need roxygen2 >= 5.0.0 but loaded version is 4.1.1
> * installing *source* package 'SparkR' ...
> ** R
> We have been running with 5.0.1 and haven't changed for a year.
> NOTE: Roxygen 6.x has some big changes and IMO we should not move to that yet.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21367) R older version of Roxygen2 on Jenkins

2017-07-10 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16080728#comment-16080728
 ] 

Felix Cheung commented on SPARK-21367:
--

@shane knapp
could you check? thanks!

> R older version of Roxygen2 on Jenkins
> --
>
> Key: SPARK-21367
> URL: https://issues.apache.org/jira/browse/SPARK-21367
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.3.0
>Reporter: Felix Cheung
>
> Getting this message from a recent build.
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79461/console
> Warning messages:
> 1: In check_dep_version(pkg, version, compare) :
>   Need roxygen2 >= 5.0.0 but loaded version is 4.1.1
> 2: In check_dep_version(pkg, version, compare) :
>   Need roxygen2 >= 5.0.0 but loaded version is 4.1.1
> * installing *source* package 'SparkR' ...
> ** R
> We have been running with 5.0.1 and haven't changed for a year.
> NOTE: Roxygen 6.x has some big changes and IMO we should not move to that yet.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-21367) R older version of Roxygen2 on Jenkins

2017-07-10 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16080728#comment-16080728
 ] 

Felix Cheung edited comment on SPARK-21367 at 7/10/17 5:44 PM:
---

~shane knapp
could you check? thanks!


was (Author: felixcheung):
@shane knapp
could you check? thanks!

> R older version of Roxygen2 on Jenkins
> --
>
> Key: SPARK-21367
> URL: https://issues.apache.org/jira/browse/SPARK-21367
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.3.0
>Reporter: Felix Cheung
>
> Getting this message from a recent build.
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79461/console
> Warning messages:
> 1: In check_dep_version(pkg, version, compare) :
>   Need roxygen2 >= 5.0.0 but loaded version is 4.1.1
> 2: In check_dep_version(pkg, version, compare) :
>   Need roxygen2 >= 5.0.0 but loaded version is 4.1.1
> * installing *source* package 'SparkR' ...
> ** R
> We have been running with 5.0.1 and haven't changed for a year.
> NOTE: Roxygen 6.x has some big changes and IMO we should not move to that yet.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-21367) R older version of Roxygen2 on Jenkins

2017-07-10 Thread Felix Cheung (JIRA)
Felix Cheung created SPARK-21367:


 Summary: R older version of Roxygen2 on Jenkins
 Key: SPARK-21367
 URL: https://issues.apache.org/jira/browse/SPARK-21367
 Project: Spark
  Issue Type: Bug
  Components: SparkR
Affects Versions: 2.3.0
Reporter: Felix Cheung






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-21266) Support schema a DDL-formatted string in dapply/gapply/from_json

2017-07-10 Thread Felix Cheung (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung resolved SPARK-21266.
--
  Resolution: Fixed
Assignee: Hyukjin Kwon
   Fix Version/s: 2.3.0
Target Version/s: 2.3.0

> Support schema a DDL-formatted string in dapply/gapply/from_json
> 
>
> Key: SPARK-21266
> URL: https://issues.apache.org/jira/browse/SPARK-21266
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, SparkR
>Affects Versions: 2.2.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Minor
> Fix For: 2.3.0
>
>
> A DDL-formatted string is now supported in the schema API of the DataFrame 
> reader/writer across the other language APIs. 
> {{from_json}} in R/Python does not appear to support this.
> Also, it could be done in other commonly used APIs in R specifically - 
> {{dapply}}/{{gapply}}.
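
A minimal sketch of what the requested SparkR support could look like; the DDL-string forms below are the proposed behavior (illustrative), not the current 2.2.0 API:
{code}
# Assumes a running SparkSession; the schema strings are illustrative
df <- createDataFrame(data.frame(a = 1:3, b = c(0.1, 0.2, 0.3)))

# dapply with a DDL-formatted schema string instead of structType(structField(...))
collect(dapply(df, function(x) { x$b <- x$b * 2; x }, "a INT, b DOUBLE"))

# gapply grouped by "a", with the same DDL-formatted result schema
collect(gapply(df, "a", function(key, x) { x }, "a INT, b DOUBLE"))
{code}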



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-19568) Must include class/method documentation for CRAN check

2017-07-08 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16079325#comment-16079325
 ] 

Felix Cheung commented on SPARK-19568:
--

separate from CRAN task

> Must include class/method documentation for CRAN check
> --
>
> Key: SPARK-19568
> URL: https://issues.apache.org/jira/browse/SPARK-19568
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.1.0
>Reporter: Felix Cheung
>Assignee: Felix Cheung
>
> While tests are running, R CMD check --as-cran is still complaining
> {code}
> * checking for missing documentation entries ... WARNING
> Undocumented code objects:
>   ‘add_months’ ‘agg’ ‘approxCountDistinct’ ‘approxQuantile’ ‘arrange’
>   ‘array_contains’ ‘as.DataFrame’ ‘as.data.frame’ ‘asc’ ‘ascii’ ‘avg’
>   ‘base64’ ‘between’ ‘bin’ ‘bitwiseNOT’ ‘bround’ ‘cache’ ‘cacheTable’
>   ‘cancelJobGroup’ ‘cast’ ‘cbrt’ ‘ceil’ ‘clearCache’ ‘clearJobGroup’
>   ‘collect’ ‘colnames’ ‘colnames<-’ ‘coltypes’ ‘coltypes<-’ ‘column’
>   ‘columns’ ‘concat’ ‘concat_ws’ ‘contains’ ‘conv’ ‘corr’ ‘count’
>   ‘countDistinct’ ‘cov’ ‘covar_pop’ ‘covar_samp’ ‘crc32’
>   ‘createDataFrame’ ‘createExternalTable’ ‘createOrReplaceTempView’
>   ‘crossJoin’ ‘crosstab’ ‘cume_dist’ ‘dapply’ ‘dapplyCollect’
>   ‘date_add’ ‘date_format’ ‘date_sub’ ‘datediff’ ‘dayofmonth’
>   ‘dayofyear’ ‘decode’ ‘dense_rank’ ‘desc’ ‘describe’ ‘distinct’ ‘drop’
> ...
> {code}
> This is because of the lack of .Rd files in a clean environment when running 
> against the content of the R source package.
> I think we need to generate the .Rd files under man/ when building the 
> release and then package the release with them.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-19568) Must include class/method documentation for CRAN check

2017-07-08 Thread Felix Cheung (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung updated SPARK-19568:
-
Issue Type: Bug  (was: Sub-task)
Parent: (was: SPARK-15799)

> Must include class/method documentation for CRAN check
> --
>
> Key: SPARK-19568
> URL: https://issues.apache.org/jira/browse/SPARK-19568
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.1.0
>Reporter: Felix Cheung
>Assignee: Felix Cheung
>
> While tests are running, R CMD check --as-cran is still complaining
> {code}
> * checking for missing documentation entries ... WARNING
> Undocumented code objects:
>   ‘add_months’ ‘agg’ ‘approxCountDistinct’ ‘approxQuantile’ ‘arrange’
>   ‘array_contains’ ‘as.DataFrame’ ‘as.data.frame’ ‘asc’ ‘ascii’ ‘avg’
>   ‘base64’ ‘between’ ‘bin’ ‘bitwiseNOT’ ‘bround’ ‘cache’ ‘cacheTable’
>   ‘cancelJobGroup’ ‘cast’ ‘cbrt’ ‘ceil’ ‘clearCache’ ‘clearJobGroup’
>   ‘collect’ ‘colnames’ ‘colnames<-’ ‘coltypes’ ‘coltypes<-’ ‘column’
>   ‘columns’ ‘concat’ ‘concat_ws’ ‘contains’ ‘conv’ ‘corr’ ‘count’
>   ‘countDistinct’ ‘cov’ ‘covar_pop’ ‘covar_samp’ ‘crc32’
>   ‘createDataFrame’ ‘createExternalTable’ ‘createOrReplaceTempView’
>   ‘crossJoin’ ‘crosstab’ ‘cume_dist’ ‘dapply’ ‘dapplyCollect’
>   ‘date_add’ ‘date_format’ ‘date_sub’ ‘datediff’ ‘dayofmonth’
>   ‘dayofyear’ ‘decode’ ‘dense_rank’ ‘desc’ ‘describe’ ‘distinct’ ‘drop’
> ...
> {code}
> This is because of the lack of .Rd files in a clean environment when running 
> against the content of the R source package.
> I think we need to generate the .Rd files under man/ when building the 
> release and then package the release with them.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-21290) R document Programmatically Specifying the Schema in SQL guide

2017-07-08 Thread Felix Cheung (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung updated SPARK-21290:
-
Target Version/s: 2.3.0

> R document Programmatically Specifying the Schema in SQL guide
> --
>
> Key: SPARK-21290
> URL: https://issues.apache.org/jira/browse/SPARK-21290
> Project: Spark
>  Issue Type: Documentation
>  Components: SparkR, SQL
>Affects Versions: 2.2.0
>Reporter: Felix Cheung
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21290) R document Programmatically Specifying the Schema in SQL guide

2017-07-08 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16079323#comment-16079323
 ] 

Felix Cheung commented on SPARK-21290:
--

with some changes to schema string support on the way, let's target 2.3

> R document Programmatically Specifying the Schema in SQL guide
> --
>
> Key: SPARK-21290
> URL: https://issues.apache.org/jira/browse/SPARK-21290
> Project: Spark
>  Issue Type: Documentation
>  Components: SparkR, SQL
>Affects Versions: 2.2.0
>Reporter: Felix Cheung
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-21093) Multiple gapply execution occasionally failed in SparkR

2017-07-08 Thread Felix Cheung (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung resolved SPARK-21093.
--
   Resolution: Fixed
Fix Version/s: 2.3.0

> Multiple gapply execution occasionally failed in SparkR 
> 
>
> Key: SPARK-21093
> URL: https://issues.apache.org/jira/browse/SPARK-21093
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.1.1, 2.2.0
> Environment: CentOS 7.2.1511 / R 3.4.0, CentOS 7.2.1511 / R 3.3.3
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Critical
> Fix For: 2.3.0
>
>
> On CentOS 7.2.1511 with R 3.4.0/3.3.0, multiple executions of {{gapply}} occasionally 
> fail as below:
> {code}
>  Welcome to
>   __
>/ __/__  ___ _/ /__
>   _\ \/ _ \/ _ `/ __/  '_/
>  /___/ .__/\_,_/_/ /_/\_\   version  2.3.0-SNAPSHOT
> /_/
>  SparkSession available as 'spark'.
> > df <- createDataFrame(list(list(1L, 1, "1", 0.1)), c("a", "b", "c", "d"))
> > collect(gapply(df, "a", function(key, x) { x }, schema(df)))
> 17/06/14 18:21:01 WARN Utils: Truncated the string representation of a plan 
> since it was too large. This behavior can be adjusted by setting 
> 'spark.debug.maxToStringFields' in SparkEnv.conf.
>   a b c   d
> 1 1 1 1 0.1
> > collect(gapply(df, "a", function(key, x) { x }, schema(df)))
>   a b c   d
> 1 1 1 1 0.1
> > collect(gapply(df, "a", function(key, x) { x }, schema(df)))
>   a b c   d
> 1 1 1 1 0.1
> > collect(gapply(df, "a", function(key, x) { x }, schema(df)))
>   a b c   d
> 1 1 1 1 0.1
> > collect(gapply(df, "a", function(key, x) { x }, schema(df)))
>   a b c   d
> 1 1 1 1 0.1
> > collect(gapply(df, "a", function(key, x) { x }, schema(df)))
>   a b c   d
> 1 1 1 1 0.1
> > collect(gapply(df, "a", function(key, x) { x }, schema(df)))
> Error in handleErrors(returnStatus, conn) :
>   org.apache.spark.SparkException: Job aborted due to stage failure: Task 98 
> in stage 14.0 failed 1 times, most recent failure: Lost task 98.0 in stage 
> 14.0 (TID 1305, localhost, executor driver): org.apache.spark.SparkException: 
> R computation failed with
> at org.apache.spark.api.r.RRunner.compute(RRunner.scala:108)
> at 
> org.apache.spark.sql.execution.FlatMapGroupsInRExec$$anonfun$13.apply(objects.scala:432)
> at 
> org.apache.spark.sql.execution.FlatMapGroupsInRExec$$anonfun$13.apply(objects.scala:414)
> at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
> at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.a
> ...
> *** buffer overflow detected ***: /usr/lib64/R/bin/exec/R terminated
> === Backtrace: =
> /lib64/libc.so.6(__fortify_fail+0x37)[0x7fe699b3f597]
> /lib64/libc.so.6(+0x10c750)[0x7fe699b3d750]
> /lib64/libc.so.6(+0x10e507)[0x7fe699b3f507]
> /usr/lib64/R/modules//internet.so(+0x6015)[0x7fe689bb7015]
> /usr/lib64/R/modules//internet.so(+0xe81e)[0x7fe689bbf81e]
> /usr/lib64/R/lib/libR.so(+0xbd1b6)[0x7fe69c54a1b6]
> /usr/lib64/R/lib/libR.so(+0x1104d0)[0x7fe69c59d4d0]
> /usr/lib64/R/lib/libR.so(Rf_eval+0x198)[0x7fe69c5ad138]
> /usr/lib64/R/lib/libR.so(+0x1221af)[0x7fe69c5af1af]
> /usr/lib64/R/lib/libR.so(Rf_eval+0x354)[0x7fe69c5ad2f4]
> /usr/lib64/R/lib/libR.so(+0x123f8e)[0x7fe69c5b0f8e]
> /usr/lib64/R/lib/libR.so(Rf_eval+0x589)[0x7fe69c5ad529]
> /usr/lib64/R/lib/libR.so(+0x1254ce)[0x7fe69c5b24ce]
> /usr/lib64/R/lib/libR.so(+0x1104d0)[0x7fe69c59d4d0]
> /usr/lib64/R/lib/libR.so(Rf_eval+0x198)[0x7fe69c5ad138]
> /usr/lib64/R/lib/libR.so(+0x1221af)[0x7fe69c5af1af]
> /usr/lib64/R/lib/libR.so(+0x119101)[0x7fe69c5a6101]
> /usr/lib64/R/lib/libR.so(Rf_eval+0x198)[0x7fe69c5ad138]
> /usr/lib64/R/lib/libR.so(+0x120a7e)[0x7fe69c5ada7e]
> /usr/lib64/R/lib/libR.so(Rf_eval+0x817)[0x7fe69c5ad7b7]
> /usr/lib64/R/lib/libR.so(+0x1256d1)[0x7fe69c5b26d1]
> /usr/lib64/R/lib/libR.so(+0x1552e9)[0x7fe69c5e22e9]
> /usr/lib64/R/lib/libR.so(+0x11062a)[0x7fe69c59d62a]
> /usr/lib64/R/lib/libR.so(Rf_eval+0x198)[0x7fe69c5ad138]
> /usr/lib64/R/lib/libR.so(+0x1221af)[0x7fe69c5af1af]
> /usr/lib64/R/lib/libR.so(+0x119101)[0x7fe69c5a6101]
> /usr/lib64/R/lib/libR.so(Rf_eval+0x198)[0x7fe69c5ad138]
> /usr/lib64/R/lib/libR.so(+0x1221af)[0x7fe69c5af1af]
> /usr/lib64/R/lib/libR.so(+0x119101)[0x7fe69c5a6101]
> /usr/lib64/R/lib/libR.so(Rf_eval+0x198)[0x7fe69c5ad138]
> /usr/lib64/R/lib/libR.so(+0x120a7e)[0x7fe69c5ada7e]
> 

[jira] [Assigned] (SPARK-20456) Add examples for functions collection for pyspark

2017-07-08 Thread Felix Cheung (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung reassigned SPARK-20456:


Assignee: Michael Patterson

> Add examples for functions collection for pyspark
> -
>
> Key: SPARK-20456
> URL: https://issues.apache.org/jira/browse/SPARK-20456
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, PySpark
>Affects Versions: 2.1.0
>Reporter: Michael Patterson
>Assignee: Michael Patterson
>Priority: Minor
> Fix For: 2.3.0
>
>
> Document sql.functions.py:
> 1. Add examples for the common string functions (upper, lower, and reverse)
> 2. Rename columns in datetime examples to be more informative (e.g. from 'd' 
> to 'date')
> 3. Add examples for unix_timestamp, from_unixtime, rand, randn, collect_list, 
> collect_set, lit, 
> 4. Add note to all trigonometry functions that units are radians.
> 5. Add links between functions (e.g. add a link to radians from toRadians)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-20456) Add examples for functions collection for pyspark

2017-07-08 Thread Felix Cheung (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung resolved SPARK-20456.
--
  Resolution: Fixed
   Fix Version/s: 2.3.0
Target Version/s: 2.3.0

> Add examples for functions collection for pyspark
> -
>
> Key: SPARK-20456
> URL: https://issues.apache.org/jira/browse/SPARK-20456
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, PySpark
>Affects Versions: 2.1.0
>Reporter: Michael Patterson
>Assignee: Michael Patterson
>Priority: Minor
> Fix For: 2.3.0
>
>
> Document sql.functions.py:
> 1. Add examples for the common string functions (upper, lower, and reverse)
> 2. Rename columns in datetime examples to be more informative (e.g. from 'd' 
> to 'date')
> 3. Add examples for unix_timestamp, from_unixtime, rand, randn, collect_list, 
> collect_set, lit, 
> 4. Add note to all trigonometry functions that units are radians.
> 5. Add links between functions (e.g. add a link to radians from toRadians)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


