Re: [Rd] Function 'factor' issues

2018-03-23 Thread Suharto Anggono Suharto Anggono via R-devel
I am trying once again.

By just changing
f <- match(xlevs[f], nlevs)
to
f <- match(xlevs, nlevs)[f]
, function 'factor' in R devel could be made more consistent and 
back-compatible. Why not adopt it?
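
A minimal sketch of how the two remapping forms differ, using the values from Example 1 quoted later in this thread. The names xlevs, nlevs and f follow the snippet above; the way f is built here is a simplified assumption, not the actual code of 'factor'. The two forms disagree only where f is NA and nlevs contains NA.

```r
# Labels as character, and their unique values (the new levels).
xlevs <- as.character(c(2, NA))      # "2" NA
nlevs <- unique(xlevs)               # "2" NA

# Integer codes of c(NA, 2, 3) against the levels; 3 is unmatched, so its code is NA.
f <- match(c(NA, "2", "3"), xlevs)   # 2 1 NA

# Form in R devel: the NA code is looked up as a label and hits the NA level.
old_form <- match(xlevs[f], nlevs)

# Proposed form: the NA code stays NA.
new_form <- match(xlevs, nlevs)[f]

print(old_form)  # 2 1 2
print(new_form)  # 2 1 NA
```

With the proposed form, an unmatched entry is never absorbed into an NA level, which is what happens when 'labels' is not specified.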


On Sat, 25/11/17, Suharto Anggono Suharto Anggono  
wrote:

 Subject: Re: [Rd] Function 'factor' issues
 To: r-devel@r-project.org
 Date: Saturday, 25 November, 2017, 6:03 PM

From commits to R devel, I saw attempts to speed up subsetting and 'match', 
and to cache results of conversion of small nonnegative integers to character 
strings. That's good.

I am sorry for pushing, still.

Is the partial new behavior of function 'factor' with respect to NA really 
worthwhile?

match(xlevs, nlevs)[f]  looks nice, too.

- Using
f <- match(xlevs, nlevs)[f]
instead of
f <- match(xlevs[f], nlevs)
for remapping
- Remapping only if length(nlevs) differs from length(xlevs)

Applying changes similar to the above to function 'levels<-.factor' will not change 
its result at all. So, the corresponding parts of functions 
'factor' and 'levels<-.factor' can be kept in sync.


On Sun, 22/10/17, Suharto Anggono Suharto Anggono  
wrote:

Subject: Re: [Rd] Function 'factor' issues
To: r-devel@r-project.org
Date: Sunday, 22 October, 2017, 6:43 AM

My idea (like in https://bugs.r-project.org/bugzilla/attachment.cgi?id=1540 ):
- For remapping, use
f <- match(xlevs, nlevs)[f]
instead of
f <- match(xlevs[f], nlevs)
(I have mentioned it).
- Remap only if length(nlevs) differs from length(xlevs) .


[snip]


On Wed, 18/10/17, Martin Maechler  wrote:

Subject: Re: [Rd] Function 'factor' issues

Cc: r-devel@r-project.org
Date: Wednesday, 18 October, 2017, 11:54 PM

> Suharto Anggono Suharto Anggono via R-devel 
>    on Sun, 15 Oct 2017 16:03:48 + writes:


    > In R devel, function 'factor' has been changed, allowing and merging
    > duplicated 'labels'.

Indeed.  That had been asked for and discussed a bit on this
list from June 14 to June 23, starting at
  https://stat.ethz.ch/pipermail/r-devel/2017-June/074451.html

    > Issue 1: Handling of specified 'labels' without duplicates is slower than
    > before.
    > Example:
    > x <- rep(1:26, 4)
    > system.time(factor(x, levels=1:26, labels=letters))

    > Function 'factor' is already rather slow because of conversion to
    > character. Please don't add slowdown.
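
A single call on 104 elements is likely too quick for system.time() to resolve, so a comparison between R versions may be easier with repeated calls. The sketch below is only one way to do that; the repetition count n_rep is an arbitrary assumption and no timings are claimed here.

```r
x <- rep(1:26, 4)

# Repeat the call so that the elapsed time is large enough to compare.
n_rep <- 2000L   # arbitrary choice for illustration
timing <- system.time(
    for (i in seq_len(n_rep)) fx <- factor(x, levels = 1:26, labels = letters)
)

# Average elapsed seconds per call.
print(timing[["elapsed"]] / n_rep)
```

Running the same script under the release and devel builds gives two numbers that can be compared directly.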

Indeed, I do see a ~ 20%  performance loss for the example
above, and I may get to look into this.
However, in R-devel there have been important internal
changes (ALTREP additions) some of which are currently giving
some performance losses in some cases (but they have the
potential to give big performance _gains_ e.g. for simple
indexing into large vectors which may apply here !).
For factor(), these C level "ALTREP" changes may not be the reason at
all for the slow down;
I may find time to investigate further.

{{ For the ALTREP-change slowdowns I've noticed in some
  indexing/subset operations, we'll definitely have time to look into
  before R-devel is going to be released next spring... and as mentioned,
  these operations may even become considerably faster *thanks*
  to ALTREP ... }}

    > Issue 2: While default 'labels' is 'levels', not specifying 'labels' may
    > be different from specifying 'labels' to be the same as 'levels'.

    > Example 1:
    > as.integer(factor(c(NA,2,3), levels = c(2, NA), exclude = NULL))
    > is different from
    > as.integer(factor(c(NA,2,3), levels = c(2, NA), labels = c(2, NA),
    >                   exclude = NULL))

You are right.  But this is not so exceptional and part of the new feature of
'labels' allowing to "fix up" things in such cases.  While it
would be nice if this was not the case the same phenomenon
happens in other functions as well because of lazy evaluation.
I think I had noticed that already and at the time found
"not easy" to work around.
(There are many aspects about changing such important base functions:
1. not breaking back compatibility ((unless in rare
    border cases, where we are sure it's worth))
2. Keeping code relatively transparent
3. Keep the semantics "simple" to document and as intuitive as possible
)

    > File reg-tests-1d.R indicates that 'factor' behavior with NA is slightly
    > changed, for the better. NA entry (because it is unmatched to 'levels' argument
    > or is in 'exclude') is absorbed into NA in "levels" attribute (comes from
    > 'labels' argument), if any. The issue is that it happens only when 'labels' is
    > specified.

I'm not sure anymore, but I think I had noticed that also in
June, considered to change it and found that such a changed
factor() would be too different from what it has "always been".
So, yes, IIRC, this current behavior is on purpose, if only for back 
compatibility.


    > Function 'factor' could use 

Re: [Rd] aggregate() naming -- bug or feature

2018-03-23 Thread Randall Pruim

Thanks.

I’m aware of the other syntax.  My example was just to illustrate the issue 
minimally, not to indicate how I am using aggregate().  In my application, 
aggregate() will be called within another function, and the information passed 
to aggregate() is columns of a matrix returned by model.frame().

For now, I’ve written my own local version of aggregate() with a few tweaks to 
retain the names I want.

But my question remains: Is this a bug or a feature?

—rjp


On Mar 23, 2018, at 6:57 PM, Ista Zahn wrote:

On Fri, Mar 23, 2018 at 6:43 PM, Rui Barradas wrote:
Hello,

Not exactly an answer but here it goes.
If you use the formula interface the names will be retained.

Also if you pass named arguments:

aggregate(iris["Sepal.Length"], by = iris["Species"], FUN = foo)
#      Species Sepal.Length
# 1     setosa        5.006
# 2 versicolor        5.936
# 3  virginica        6.588

In fact, this
is even better than those names assigned by bar.


aggregate(Sepal.Length ~ Species, data = iris, FUN = foo)
#      Species Sepal.Length
#1     setosa        5.006
#2 versicolor        5.936
#3  virginica        6.588


Hope this helps,

Rui Barradas


On 3/23/2018 1:29 PM, Randall Pruim wrote:

In the examples below, the first loses the name attached by foo(), the
second retains names attached by bar().  Is this an intentional difference?
I’d prefer that the names be retained in both cases.

foo <- function(x) { c(mean = base::mean(x)) }
bar <- function(x) { c(mean = base::mean(x), sd = stats::sd(x))}
aggregate(iris$Sepal.Length, by = list(iris$Species), FUN = foo)
#>      Group.1     x
#> 1     setosa 5.006
#> 2 versicolor 5.936
#> 3  virginica 6.588
aggregate(iris$Sepal.Length, by = list(iris$Species), FUN = bar)
#>      Group.1 x.mean      x.sd
#> 1     setosa  5.006 0.3524897
#> 2 versicolor  5.936 0.5161711
#> 3  virginica  6.588 0.6358796

—rjp


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel



Re: [Rd] aggregate() naming -- bug or feature

2018-03-23 Thread Ista Zahn
On Fri, Mar 23, 2018 at 6:43 PM, Rui Barradas  wrote:
> Hello,
>
> Not exactly an answer but here it goes.
> If you use the formula interface the names will be retained.

Also if you pass named arguments:

aggregate(iris["Sepal.Length"], by = iris["Species"], FUN = foo)
#      Species Sepal.Length
# 1     setosa        5.006
# 2 versicolor        5.936
# 3  virginica        6.588

> In fact, this
> is even better than those names assigned by bar.
>
>
> aggregate(Sepal.Length ~ Species, data = iris, FUN = foo)
> #      Species Sepal.Length
> #1     setosa        5.006
> #2 versicolor        5.936
> #3  virginica        6.588
>
>
> Hope this helps,
>
> Rui Barradas
>
>
> On 3/23/2018 1:29 PM, Randall Pruim wrote:
>>
>> In the examples below, the first loses the name attached by foo(), the
>> second retains names attached by bar().  Is this an intentional difference?
>> I’d prefer that the names be retained in both cases.
>>
>> foo <- function(x) { c(mean = base::mean(x)) }
>> bar <- function(x) { c(mean = base::mean(x), sd = stats::sd(x))}
>> aggregate(iris$Sepal.Length, by = list(iris$Species), FUN = foo)
>> #>      Group.1     x
>> #> 1     setosa 5.006
>> #> 2 versicolor 5.936
>> #> 3  virginica 6.588
>> aggregate(iris$Sepal.Length, by = list(iris$Species), FUN = bar)
>> #>      Group.1 x.mean      x.sd
>> #> 1     setosa  5.006 0.3524897
>> #> 2 versicolor  5.936 0.5161711
>> #> 3  virginica  6.588 0.6358796
>>
>> —rjp
>>
>>


[Rd] Integrate errors on certain functions

2018-03-23 Thread John Muschelli
In the help for ?integrate:

> When integrating over infinite intervals do so explicitly, rather than
> just using a large number as the endpoint. This increases the chance of a
> correct answer – any function whose integral over an infinite interval is
> finite must be near zero for most of that interval.

I understand that and there are examples such as:

## a slowly-convergent integral
integrand <- function(x) {1/((x+1)*sqrt(x))}
integrate(integrand, lower = 0, upper = Inf)

## don't do this if you really want the integral from 0 to Inf
integrate(integrand, lower = 0, upper = 100, stop.on.error = FALSE)
#> failed with message ‘the integral is probably divergent’

which gives an error message if stop.on.error = FALSE. But what happens on
something like the function below:
integrate(function(x) exp(-x), lower = 0, upper = Inf)
#> 1 with absolute error < 5.7e-05
integrate(function(x) exp(-x), lower = 0, upper = 13000)
#> 2.819306e-05 with absolute error < 5.6e-05
integrate(function(x) exp(-x), lower = 0, upper = 13000, stop.on.error = FALSE)
#> 2.819306e-05 with absolute error < 5.6e-05

I'm not sure whether this is a bug or a misuse of the function, but I would expect
the last integrate() call to give an error if stop.on.error = FALSE.
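
One possible reading, consistent with the help text quoted above: on [0, 13000] the integrand is essentially zero over almost all of the interval, so both the estimate and its error bound come out small and no failure is detected. The sketch below compares integrate() with the exact value 1 - exp(-upper); the helper name exact and the upper bound 50 are choices made here for illustration only.

```r
f <- function(x) exp(-x)

# Exact value of the integral of exp(-x) over [0, upper], for comparison.
exact <- function(upper) 1 - exp(-upper)

# Infinite range given explicitly, as the help page recommends.
res_inf <- integrate(f, lower = 0, upper = Inf)

# A finite range short enough that the mass near 0 is a visible part of it.
res_50 <- integrate(f, lower = 0, upper = 50)

cat("0..Inf:", res_inf$value, " exact:", exact(Inf), "\n")
cat("0..50 :", res_50$value,  " exact:", exact(50),  "\n")
```

When the exact value is not known, comparing the result for a large finite bound with the result for Inf is a cheap sanity check.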



Re: [Rd] aggregate() naming -- bug or feature

2018-03-23 Thread Rui Barradas

Hello,

Not exactly an answer but here it goes.
If you use the formula interface the names will be retained. In fact, 
this is even better than those names assigned by bar.



aggregate(Sepal.Length ~ Species, data = iris, FUN = foo)
#      Species Sepal.Length
#1     setosa        5.006
#2 versicolor        5.936
#3  virginica        6.588


Hope this helps,

Rui Barradas

On 3/23/2018 1:29 PM, Randall Pruim wrote:

In the examples below, the first loses the name attached by foo(), the second 
retains names attached by bar().  Is this an intentional difference?  I’d 
prefer that the names be retained in both cases.

foo <- function(x) { c(mean = base::mean(x)) }
bar <- function(x) { c(mean = base::mean(x), sd = stats::sd(x))}
aggregate(iris$Sepal.Length, by = list(iris$Species), FUN = foo)
#>      Group.1     x
#> 1     setosa 5.006
#> 2 versicolor 5.936
#> 3  virginica 6.588
aggregate(iris$Sepal.Length, by = list(iris$Species), FUN = bar)
#>      Group.1 x.mean      x.sd
#> 1     setosa  5.006 0.3524897
#> 2 versicolor  5.936 0.5161711
#> 3  virginica  6.588 0.6358796

—rjp




[R-pkg-devel] win-builder 3.3.3 libcurl error + NOTE

2018-03-23 Thread Tyler
I am getting a NOTE only on R-oldrelease when checking my package on
win-builder:

* checking CRAN incoming feasibility ... NOTE
Maintainer: 'Tyler Morgan-Wall '

Found the following (possibly) invalid URLs:
  URL: http://github.com/tylermorganwall/skpr
    From: DESCRIPTION
    Status: Error
    Message: libcurl error code 35
      error:1407742E:SSL routines:SSL23_GET_SERVER_HELLO:tlsv1 alert protocol version
  URL: http://github.com/tylermorganwall/skpr/issues
    From: DESCRIPTION
    Status: Error
    Message: libcurl error code 35
      error:1407742E:SSL routines:SSL23_GET_SERVER_HELLO:tlsv1 alert protocol version

I checked the version of the package currently on CRAN, which passed
without any libcurl errors or NOTEs back in January, and it too displayed
this NOTE, which again occurred only on R-oldrelease. Is there a way to
prevent this, or should I just mention this NOTE in my CRAN submission
comment?
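
The message "tlsv1 alert protocol version" says that the server refused the TLS protocol version offered by the client. Whether the libcurl used by R-oldrelease on win-builder is limited in this way is an assumption, not something confirmed here. As a rough sketch, the libcurl and SSL library versions that a given R build is linked against can be inspected like this:

```r
# Version of libcurl that this R build uses (an empty string if none).
v <- libcurlVersion()
print(v)

# SSL/TLS library reported by libcurl, stored as an attribute.
print(attr(v, "ssl_version"))
```

Comparing this output between the R-oldrelease and R-release installations would show whether the two builds differ in their TLS support.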

Tyler


__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


[Rd] aggregate() naming -- bug or feature

2018-03-23 Thread Randall Pruim
In the examples below, the first loses the name attached by foo(), the second 
retains names attached by bar().  Is this an intentional difference?  I’d 
prefer that the names be retained in both cases.

foo <- function(x) { c(mean = base::mean(x)) }
bar <- function(x) { c(mean = base::mean(x), sd = stats::sd(x))}
aggregate(iris$Sepal.Length, by = list(iris$Species), FUN = foo)
#>      Group.1     x
#> 1     setosa 5.006
#> 2 versicolor 5.936
#> 3  virginica 6.588
aggregate(iris$Sepal.Length, by = list(iris$Species), FUN = bar)
#>      Group.1 x.mean      x.sd
#> 1     setosa  5.006 0.3524897
#> 2 versicolor  5.936 0.5161711
#> 3  virginica  6.588 0.6358796
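
A short sketch of what seems to be behind the difference, offered as an observation rather than a statement of intent: in both cases the result has a single column x besides the grouping column. With bar() that column is a matrix whose column names come from FUN and are printed as x.mean and x.sd. With foo() it is a plain vector, so the name has nowhere to appear. The object names res_foo and res_bar are introduced here for illustration.

```r
foo <- function(x) { c(mean = base::mean(x)) }
bar <- function(x) { c(mean = base::mean(x), sd = stats::sd(x)) }

res_foo <- aggregate(iris$Sepal.Length, by = list(iris$Species), FUN = foo)
res_bar <- aggregate(iris$Sepal.Length, by = list(iris$Species), FUN = bar)

# Both results have two columns: the grouping column and 'x'.
print(names(res_foo))
print(names(res_bar))

# With bar(), 'x' is a matrix column carrying the names returned by FUN.
print(is.matrix(res_bar$x))
print(colnames(res_bar$x))

# With foo(), 'x' is an ordinary vector.
print(is.matrix(res_foo$x))
```

So "x.mean" and "x.sd" are not separate columns of the data frame; str(res_bar) makes this visible.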

—rjp




Re: [Bioc-devel] Update data in data package, pcxnData

2018-03-23 Thread Sokratis Kariotis
Hey Lori,

While on the RELEASE_3_6 branch, using git fetch upstream or git push
upstream/RELEASE_3_6 gives:

Warning: Permanently added 'git.bioconductor.org,34.192.48.227' (ECDSA) to the list of known hosts.
Permission denied (publickey).
fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists.


git remote -v gives:

origin  https://github.com/hidelab/pcxn.git (fetch)
origin  https://github.com/hidelab/pcxn.git (push)
upstream  g...@git.bioconductor.org:packages/pcxn.git (fetch)
upstream  g...@git.bioconductor.org:packages/pcxn.git (push)

The ssh command also gives:

Warning: Permanently added 'git.bioconductor.org,34.192.48.227' (ECDSA) to the list of known hosts.
Permission denied (publickey).

Cheers,

Sokratis


On 23 March 2018 at 11:35, Shepherd, Lori wrote:

> Were you able to update master?  Or are you having access trouble
> for both branches?  Please use reply all so that other members on the team
> can answer when appropriate and the responses go to the mailing list.
>
> Your package is in the git repository and has access with the ID s.kariotis.
> Can you please copy the commands with the output you are receiving that
> make you think that you do not have access?
>
> Please also include the output of
>
> git remote -v
>
> and
>
> ssh -T g...@git.bioconductor.org
> (you should see RW access next to your package)
>
>
> Please
>
>
>
> Lori Shepherd
>
> Bioconductor Core Team
>
> Roswell Park Cancer Institute
>
> Department of Biostatistics & Bioinformatics
>
> Elm & Carlton Streets
>
> Buffalo, New York 14263
>
>
> --
> From: Bioc-devel on behalf of Shepherd, Lori
> Sent: Friday, November 24, 2017 10:13 AM
> To: Sokratis Kariotis; bioc-devel
> Subject: Re: [Bioc-devel] Update data in data package, pcxnData
>
> You should make changes and push to the master branch of your package at
> g...@git.bioconductor.org:packages/pcxnData.git  .  This will update the
> devel version of your package.
>
> As long as the package builds correctly, it will be available for download
> immediately with Bioc devel 3.7.
>
>
> We don't encourage updating release unless it is a bug correction or
> justifiable updates, but if you push changes to the RELEASE_3_6 branch they
> will also be available in Bioc 3.6
>
>
> Instructions for pushing data to our git repository can be found here:
>
> http://bioconductor.org/developers/how-to/git/
>
>
> Remember to do a pull from our git repositories if you have not done so
> since the release as we update the versions when rolling out the release.
>
> http://bioconductor.org/developers/how-to/version-numbering/
>
>
>
> Lori Shepherd
>
> Bioconductor Core Team
>
> Roswell Park Cancer Institute
>
> Department of Biostatistics & Bioinformatics
>
> Elm & Carlton Streets
>
> Buffalo, New York 14263
>
> 
> From: Bioc-devel  on behalf of Sokratis
> Kariotis 
> Sent: Friday, November 24, 2017 10:02:13 AM
> To: bioc-devel
> Subject: [Bioc-devel] Update data in data package, pcxnData
>
> Hi all,
>
> I have a data package (pcxnData) that got released in 3.6 and I am
> wondering how long will it take, if I changed some of the data in that
> package for them to be publicly available in bioconductor instead of the
> old data. Can they be available sooner than the next release? Thanks in
> advance.
>
> Cheers,
> Sokratis Kariotis
>
>
>
> --
> Sokratis Kariotis
> Scientific Programmer
> Hidelab
> Sheffield Institute for Translational Neuroscience
> 385a Glossop Rd, Sheffield, S10 2HQ
> 
>
>
> ___
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
>
> This email message may contain legally privileged and/or confidential
> information.  If you are not the intended recipient(s), or the employee or
> agent responsible for the delivery of this message to the intended
> recipient(s), you are hereby notified that any disclosure, copying,
> distribution, or use of this email message is prohibited.  If you have
> received this message in error, please notify the sender immediately by
> e-mail and delete this email message from your computer. Thank you.

Re: [Bioc-devel] Update data in data package, pcxnData

2018-03-23 Thread Shepherd, Lori
Were you able to update master?  Or are you having access trouble for both 
branches?  Please use reply all so that other members on the team can answer 
when appropriate and the responses go to the mailing list.

Your package is in the git repository and has access with the ID s.kariotis.  
Can you please copy the commands with the output you are receiving that make 
you think that you do not have access?

Please also include the output of

git remote -v

and

ssh -T g...@git.bioconductor.org
(you should see RW access next to your package)


Please




Lori Shepherd

Bioconductor Core Team

Roswell Park Cancer Institute

Department of Biostatistics & Bioinformatics

Elm & Carlton Streets

Buffalo, New York 14263



From: Bioc-devel  on behalf of Shepherd, Lori 

Sent: Friday, November 24, 2017 10:13 AM
To: Sokratis Kariotis; bioc-devel
Subject: Re: [Bioc-devel] Update data in data package, pcxnData

You should make changes and push to the master branch of your package at  
g...@git.bioconductor.org:packages/pcxnData.git  .  This will update the devel 
version of your package.

As long as the package builds correctly, it will be available for download 
immediately with Bioc devel 3.7.


We don't encourage updating release unless it is a bug correction or 
justifiable updates, but if you push changes to the RELEASE_3_6 branch they 
will also be available in Bioc 3.6


Instructions for pushing data to our git repository can be found here:

http://bioconductor.org/developers/how-to/git/


Remember to do a pull from our git repositories if you have not done so since 
the release as we update the versions when rolling out the release.

http://bioconductor.org/developers/how-to/version-numbering/



Lori Shepherd

Bioconductor Core Team

Roswell Park Cancer Institute

Department of Biostatistics & Bioinformatics

Elm & Carlton Streets

Buffalo, New York 14263


From: Bioc-devel  on behalf of Sokratis 
Kariotis 
Sent: Friday, November 24, 2017 10:02:13 AM
To: bioc-devel
Subject: [Bioc-devel] Update data in data package, pcxnData

Hi all,

I have a data package (pcxnData) that got released in 3.6 and I am
wondering how long will it take, if I changed some of the data in that
package for them to be publicly available in bioconductor instead of the
old data. Can they be available sooner than the next release? Thanks in
advance.

Cheers,
Sokratis Kariotis



--
Sokratis Kariotis
Scientific Programmer
Hidelab
Sheffield Institute for Translational Neuroscience
385a Glossop Rd, Sheffield, S10 2HQ
