Re: [Rd] Function 'factor' issues

2018-03-24 Thread Martin Maechler
all change for both `factor` and `levels<-.factor` Martin > > On Sat, 25/11/17, Suharto Anggono Suharto Anggono <suharto_angg...@yahoo.com> wrote: > Subject: Re: [Rd] Function 'factor' issues > To: r-devel@r-project.org > Date: Saturday, 25 November, 2017, 6:03 P

Re: [Rd] Function 'factor' issues

2018-03-23 Thread Suharto Anggono Suharto Anggono via R-devel
uharto Anggono <suharto_angg...@yahoo.com> wrote: Subject: Re: [Rd] Function 'factor' issues To: r-devel@r-project.org Date: Saturday, 25 November, 2017, 6:03 PM >From commits to R devel, I saw attempts to speed up subsetting and 'match', >and to cache results of conversion of small no

Re: [Rd] Function 'factor' issues

2017-11-25 Thread Suharto Anggono Suharto Anggono via R-devel
s<-.factor' result at all. So, the corresponding part of functions 'factor' and 'levels<-.factor' can be kept in sync. ---- Subject: Re: [Rd] Function 'factor' issues To: r-devel@r-project.org Date: Sunday, 22 October, 2017, 6:43 AM My idea

Re: [Rd] Function 'factor' issues

2017-10-21 Thread Suharto Anggono Suharto Anggono via R-devel
corresponding to '['. Take data frame and "Surv" object (package survival) as examples. On Wed, 18/10/17, Martin Maechler <maech...@stat.math.ethz.ch> wrote: Subject: Re: [Rd] Function 'factor' issues Cc: r-devel@r-project.org D

Re: [Rd] Function 'factor' issues

2017-10-18 Thread Gabriel Becker
Martin, Suharto, et al., On Wed, Oct 18, 2017 at 9:54 AM, Martin Maechler wrote: > ** > > > Note: In theory, if function 'factor' merged duplicated 'labels' in > all cases, at least in > > factor(c(sqrt(2)^2, 2)) , > > function 'factor' could do

Re: [Rd] Function 'factor' issues

2017-10-18 Thread Martin Maechler
> Suharto Anggono Suharto Anggono via R-devel > on Sun, 15 Oct 2017 16:03:48 + writes: > In R devel, function 'factor' has been changed, allowing and merging duplicated 'labels'. Indeed. That had been asked for and discussed a bit on this list from

[Rd] Function 'factor' issues

2017-10-15 Thread Suharto Anggono Suharto Anggono via R-devel
In R devel, function 'factor' has been changed, allowing and merging duplicated 'labels'. Issue 1: Handling of specified 'labels' without duplicates is slower than before. Example: x <- rep(1:26, 4) system.time(factor(x, levels=1:26, labels=letters)) Function 'factor' is already rather
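The timing comparison in this message can be reproduced roughly as follows. This is only a sketch: the repetition count `n_rep` is an assumption added here so that the elapsed time becomes measurable, and no particular timing is implied.

```r
# Input from the message: the integer codes 1..26, repeated 4 times.
x <- rep(1:26, 4)

# Hypothetical repetition count (not in the original message); a single
# call is usually too fast to time reliably.
n_rep <- 1000L

timing <- system.time(
  for (i in seq_len(n_rep)) {
    f <- factor(x, levels = 1:26, labels = letters)
  }
)
print(timing)

# The labels contain no duplicates, so all 26 levels are kept.
print(nlevels(f))
```

Running the same script under two R builds gives a rough before/after comparison for the case of 'labels' without duplicates.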

Re: [Rd] duplicated factor labels.

2017-06-23 Thread Paul Johnson
On Fri, Jun 23, 2017 at 7:20 AM, Uwe Ligges wrote: > > > On 23.06.2017 11:51, peter dalgaard wrote: >> >> Hmm, the danger in this is that duplicated factor levels _used_ to be >> allowed (i.e. multiple codes with the same level). Disallowing it is what >> broke

Re: [Rd] duplicated factor labels.

2017-06-23 Thread Joris Meys
On Fri, Jun 23, 2017 at 2:20 PM, Uwe Ligges wrote: > > > > I had the chance to look at > 1300 SPSS files our consulting center > collected during the last 20 year, and in several hundred cases we found > such a problem that was copy & paste error and simply

Re: [Rd] duplicated factor labels.

2017-06-23 Thread Martin Maechler
> peter dalgaard > on Fri, 23 Jun 2017 11:51:05 +0200 writes: > Hmm, the danger in this is that duplicated factor levels _used_ to be allowed (i.e. multiple codes with the same level). Disallowing it is what broke read.spss() on some files, because SPSS's

Re: [Rd] duplicated factor labels.

2017-06-23 Thread Uwe Ligges
On 23.06.2017 11:51, peter dalgaard wrote: Hmm, the danger in this is that duplicated factor levels _used_ to be allowed (i.e. multiple codes with the same level). Disallowing it is what broke read.spss() on some files, because SPSS's concept of value labels is not 1-to-1 with factors.

Re: [Rd] duplicated factor labels.

2017-06-23 Thread peter dalgaard
Hmm, the danger in this is that duplicated factor levels _used_ to be allowed (i.e. multiple codes with the same level). Disallowing it is what broke read.spss() on some files, because SPSS's concept of value labels is not 1-to-1 with factors. Reallowing it with different semantics could be

Re: [Rd] duplicated factor labels.

2017-06-23 Thread Martin Maechler
> Martin Maechler > on Thu, 22 Jun 2017 11:43:59 +0200 writes: > Paul Johnson > on Fri, 16 Jun 2017 11:02:34 -0500 writes: >> On Fri, Jun 16, 2017 at 2:35 AM, Joris Meys wrote: >>> To extend

Re: [Rd] duplicated factor labels.

2017-06-22 Thread Martin Maechler
> Paul Johnson > on Fri, 16 Jun 2017 11:02:34 -0500 writes: > On Fri, Jun 16, 2017 at 2:35 AM, Joris Meys wrote: >> To extend on Martin's explanation: >> >> In factor(), levels are the unique input values and labels the

Re: [Rd] duplicated factor labels.

2017-06-16 Thread Joris Meys
Hi Paul, Now I see what you're getting at. I misread your original mail completely. So we definitely agree, and wholeheartedly even. The use case you just gave, is definitely in my top 5 of frustrations about R. I would like to be able to assign the same label to multiple levels without having

Re: [Rd] duplicated factor labels.

2017-06-16 Thread Paul Johnson
On Fri, Jun 16, 2017 at 2:35 AM, Joris Meys wrote: > To extend on Martin's explanation: > > In factor(), levels are the unique input values and labels the unique output > values. So the function levels() actually displays the labels. > Dear Joris I think we agree.

Re: [Rd] duplicated factor labels.

2017-06-16 Thread Joris Meys
To extend on Martin's explanation: In factor(), levels are the unique input values and labels the unique output values. So the function levels() actually displays the labels. Cheers Joris On 15 Jun 2017 17:15, "Martin Maechler" wrote: > Paul Johnson

Re: [Rd] duplicated factor labels.

2017-06-15 Thread Martin Maechler
> Paul Johnson > on Wed, 14 Jun 2017 19:00:11 -0500 writes: > Dear R devel > I've been wondering about this for a while. I am sorry to ask for your > time, but can one of you help me understand this? > This concerns duplicated labels, not

[Rd] duplicated factor labels.

2017-06-14 Thread Paul Johnson
Dear R devel I've been wondering about this for a while. I am sorry to ask for your time, but can one of you help me understand this? This concerns duplicated labels, not levels, in the factor function. I think it is hard to understand that factor() fails, but levels() after does not > x <-
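The contrast between 'levels' and 'labels' raised here can be illustrated with a small sketch. The vector and label names below are hypothetical, not the ones from the original message. Assigning duplicated values through `levels<-` merges the corresponding levels. The direct `factor(..., labels = )` call with duplicated labels is left out, because the 2017-10 messages in this listing report that its behavior changed in R devel, so the result depends on the R version.

```r
# Hypothetical data: three distinct input values.
x <- c("low", "mid", "high", "mid")

# Step 1: build the factor with one level per input value.
f <- factor(x, levels = c("low", "mid", "high"))

# Step 2: assign duplicated level names; "low" and "mid" are merged.
levels(f) <- c("not high", "not high", "high")

print(levels(f))      # the merged set of levels
print(as.integer(f))  # integer codes now refer to the merged levels
```

After the assignment, the factor has two levels, and every element that was "low" or "mid" shares the same code.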

[Rd] split(factor, shortGroupVector) gives incorrect results in R 2.12.2

2011-03-21 Thread William Dunlap
When split's x argument has a class attribute and the grouping vector, f, is shorter than x then split gives the wrong result. It appears to not extend f to the length of x before doing the split. E.g., split(factor(letters[1:3]), "Group one") # expect all 3 elements in the single group

Re: [Rd] split(factor, shortGroupVector) gives incorrect results in R 2.12.2

2011-03-21 Thread peter dalgaard
On Mar 21, 2011, at 17:16 , William Dunlap wrote: split(factor(letters[1:3]), c("Group one", "Group two")) Yes, that's a bug (at the very least, it is against documented behavior) The strong suspicion is that ind <- .Internal(split(seq_along(f), f)) should have seq_along(x) , not f. But
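The documented recycling of the grouping vector can be checked with a short sketch. It assumes an R version in which the problem reported in this thread has been fixed; according to the report, R 2.12.2 would give a different result.

```r
# A classed object (a factor) of length 3 ...
x <- factor(letters[1:3])

# ... split by a grouping vector of length 1, which should be recycled
# to the length of x.
res <- split(x, "Group one")
print(res)

# All three elements are expected to land in the single group.
print(length(res[["Group one"]]))
```

The single-group case is used because it recycles evenly; a grouping vector whose length does not divide the length of `x` may additionally produce a warning.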

Re: [Rd] No [[<-.factor()

2010-08-30 Thread Martin Maechler
Prof Brian Ripley rip...@stats.ox.ac.uk on Mon, 30 Aug 2010 08:28:24 +0100 (BST) writes: On Thu, 26 Aug 2010, Martin Maechler wrote: WD == William Dunlap wdun...@tibco.com on Wed, 25 Aug 2010 17:31:27 -0700 writes: WD Should there be a [[<-.factor() that either

Re: [Rd] No [[<-.factor()

2010-08-26 Thread Martin Maechler
WD == William Dunlap wdun...@tibco.com on Wed, 25 Aug 2010 17:31:27 -0700 writes: WD Should there be a [[<-.factor() that either throws WD an error or acts like [<-.factor() to avoid making WD an illegal object of class factor? Yes, one or the other. Note that both `[<-` and `[[<-`

[Rd] No [[<-.factor()

2010-08-25 Thread William Dunlap
Should there be a [[<-.factor() that either throws an error or acts like [<-.factor() to avoid making an illegal object of class factor? z <- factor(c("Two","Two","Three"), levels=c("One","Two","Three")) z [1] Two Two Three Levels: One Two Three str(z) Factor w/ 3 levels "One","Two","Three": 2 2 3
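The question can be made concrete with a sketch of a `[[<-` assignment on the factor from the message. It assumes an R version that provides a `[[<-` method for factors, so that the assignment behaves like `[<-` instead of producing an invalid object; the validity check at the end is an illustration added here, not part of the original message.

```r
z <- factor(c("Two", "Two", "Three"), levels = c("One", "Two", "Three"))

# Replace the second element with an existing level via `[[<-`.
z[[2]] <- "One"
str(z)

# A valid factor keeps integer codes that index into its levels.
codes <- unclass(z)
valid <- is.integer(codes) && all(codes %in% seq_along(levels(z)))
print(valid)
```

If `valid` were `FALSE`, the assignment would have produced exactly the kind of illegal factor object the message warns about.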

Re: [Rd] table(factor(x), exclude=NULL) (PR#11494)

2008-05-28 Thread Peter Dalgaard
[EMAIL PROTECTED] wrote: Hi. I don't know if this a bug or just annoying to me: x <- c(1,2,3,NA) table(x, exclude=NULL) x 1 2 3 <NA> 1 1 1 1 table(factor(x), exclude=NULL) 1 2 3 1 1 1 I don't think many people use factor(x, exclude=NULL): it

[Rd] table(factor(x), exclude=NULL) (PR#11494)

2008-05-21 Thread David . Duffy
Hi. I don't know if this a bug or just annoying to me: x <- c(1,2,3,NA) table(x, exclude=NULL) x 1 2 3 <NA> 1 1 1 1 table(factor(x), exclude=NULL) 1 2 3 1 1 1 I don't think many people use factor(x, exclude=NULL): it is not the default handling of character data by
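Two ways of getting the NA count that do not depend on how `table()` treats `exclude` for an existing factor are sketched below. Both are suggestions added here, not taken from the thread: the `useNA` argument of `table()`, and making NA an explicit level with `factor(x, exclude = NULL)` before tabulating.

```r
x <- c(1, 2, 3, NA)

# Option 1: ask table() to count NA when any is present.
t_plain <- table(x, useNA = "ifany")
print(t_plain)

# Option 2: make NA a level of the factor itself, then tabulate.
f <- factor(x, exclude = NULL)
t_factor <- table(f)
print(t_factor)
```

In both cases the table has four cells, one of which counts the missing value.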

[Rd] Wishlist: Factor correlations in factanal (PR#10931)

2008-03-11 Thread uhkeller
Full_Name: Ulrich Keller Version: 2.6.2 OS: Ubuntu 7.10 Submission from: (NULL) (158.64.77.190) Most statistical packages report factor correlations for oblique factor rotations. R's factanal() does not. John Fox posted some modifications to R-devel back in 2005 that implement this, but

Re: [Rd] Aggregate factor names

2007-10-01 Thread Martin Elff
On Thursday 27 September 2007 (17:57:55), Mike Lawrence wrote: ex. it is annoying to type with( my.data ,aggregate( my.dv ,list( one.iv = one.iv ,another.iv = another.iv

[Rd] Aggregate factor names

2007-09-27 Thread Mike Lawrence
Hi all, A suggestion derived from discussions amongst a number of R users in my research group: set the default column names produced by aggregate () equal to the names of the objects in the list passed to the 'by' object. ex. it is annoying to type with( my.data

Re: [Rd] Aggregate factor names

2007-09-27 Thread Gabor Grothendieck
You can do this: aggregate(iris[-5], iris[5], mean) On 9/27/07, Mike Lawrence [EMAIL PROTECTED] wrote: Hi all, A suggestion derived from discussions amongst a number of R users in my research group: set the default column names produced by aggregate () equal to the names of the objects in

Re: [Rd] Aggregate factor names

2007-09-27 Thread Mike Lawrence
Understood, but my point is that the naming I suggest should be the default. One should not be 'punished' for being explicit in calling aggregate. On 27-Sep-07, at 1:06 PM, Gabor Grothendieck wrote: You can do this: aggregate(iris[-5], iris[5], mean) On 9/27/07, Mike Lawrence [EMAIL

Re: [Rd] Aggregate factor names

2007-09-27 Thread Gabor Grothendieck
You can do this too: aggregate(iris[-5], iris["Species"], mean) or this: with(iris, aggregate(iris[-5], data.frame(Species), mean)) or this: attach(iris) aggregate(iris[-5], data.frame(Species), mean) The point is that you already don't have to write x = x. The only reason you are writing it
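The naming behavior under discussion can be seen by comparing two calls on the built-in `iris` data, as a sketch. Passing a one-column data frame as `by` carries the column name into the result, while the explicit list form needs the `name = value` spelling to get the same column name.

```r
# 'by' given as a one-column data frame: the name "Species" is kept.
res_df <- aggregate(iris[-5], iris["Species"], mean)

# 'by' given as an explicitly named list: same result, more typing.
res_list <- aggregate(iris[-5], list(Species = iris$Species), mean)

print(names(res_df))
print(identical(names(res_df), names(res_list)))
```

An unnamed list, `list(iris$Species)`, would leave the grouping column with a default name instead, which is the inconvenience the original suggestion is about.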