[Rd] Bug in as.character? (PR#14206)

2010-02-05 Thread Havard . Rue

A long formula that is converted using as.character loses its last
part: ``diagonal = 1e-12)''

Shorter formula is ok though.

Best,
Håvard



Browse[2]> formula.str
y ~ -1 + b1 + b2 + b3 + b4 + b5 + b6 + b7 + b8 + b9 + b10 + b11 + 
b12 + b13 + b14 + b15 + b16 + b17 + b18 + b19 + b20 + b21 + 
b22 + b23 + b24 + b25 + b26 + b27 + b28 + b29 + b30 + b31 + 
b32 + b33 + b34 + b35 + b36 + b37 + b38 + b39 + b40 + b41 + 
b42 + b43 + b44 + b45 + b46 + b47 + b48 + b49 + elevation + 
f(idx, model = "sphere", sphere.dir = "global_temperature_80s", 
T.order = 2, K.order = 2, T.model = "rotsym", K.model = "rotsym", 
initial = c(-4, 1, 0), param = c(-4, 0.01, 3, 0.01, 0, 
1), replicate = replicate, diagonal = 1e-12)

Browse[2]> formula.str[3]
-1 + b1 + b2 + b3 + b4 + b5 + b6 + b7 + b8 + b9 + b10 + b11 + 
b12 + b13 + b14 + b15 + b16 + b17 + b18 + b19 + b20 + b21 + 
b22 + b23 + b24 + b25 + b26 + b27 + b28 + b29 + b30 + b31 + 
b32 + b33 + b34 + b35 + b36 + b37 + b38 + b39 + b40 + b41 + 
b42 + b43 + b44 + b45 + b46 + b47 + b48 + b49 + elevation + 
f(idx, model = "sphere", sphere.dir = "global_temperature_80s", 
T.order = 2, K.order = 2, T.model = "rotsym", K.model = "rotsym", 
initial = c(-4, 1, 0), param = c(-4, 0.01, 3, 0.01, 0, 
1), replicate = replicate, diagonal = 1e-12)()

Browse[2]> as.character(formula.str[3])
[1] "-1 + b1 + b2 + b3 + b4 + b5 + b6 + b7 + b8 + b9 + b10 + b11 + b12 +
b13 + b14 + b15 + b16 + b17 + b18 + b19 + b20 + b21 + b22 + b23 + b24 +
b25 + b26 + b27 + b28 + b29 + b30 + b31 + b32 + b33 + b34 + b35 + b36 +
b37 + b38 + b39 + b40 + b41 + b42 + b43 + b44 + b45 + b46 + b47 + b48 +
b49 + elevation + f(idx, model = \"sphere\", sphere.dir =
\"global_temperature_80s\", T.order = 2, K.order = 2, T.model =
\"rotsym\", K.model = \"rotsym\", initial = c(-4, 1, 0), param = c(-4,
0.01, 3, 0.01, 0, 1), replicate = replicate, "



--please do not edit the information below--

Version:
 platform = x86_64-redhat-linux-gnu
 arch = x86_64
 os = linux-gnu
 system = x86_64, linux-gnu
 status = 
 major = 2
 minor = 10.1
 year = 2009
 month = 12
 day = 14
 svn rev = 50720
 language = R
 version.string = R version 2.10.1 (2009-12-14)

Locale:
LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_DK.utf8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C

Search Path:
 .GlobalEnv, package:stats, package:graphics, package:grDevices,
package:datasets, package:INLA, package:R.utils, package:R.oo,
package:utils, package:R.methodsS3, package:methods, Autoloads,
package:base

-- 
 Håvard Rue
 Department of Mathematical Sciences
 Norwegian University of Science and Technology
 N-7491 Trondheim, Norway
 Voice: +47-7359-3533    URL  : http://www.math.ntnu.no/~hrue
 Fax  : +47-7359-3524    Email: havard@math.ntnu.no

 This message was created in a Microsoft-free computing environment.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Bug in as.character? (PR#14206)

2010-02-05 Thread Peter Dalgaard
havard@math.ntnu.no wrote:
> A long formula that is converted using as.character loses its last
> part: ``diagonal = 1e-12)''
>
> Shorter formula is ok though.

(If you have to put a ? in a bug report, ask instead!)

This is entirely consistent with  help(as.character):

Note:

 ‘as.character’ truncates components of language objects to 500
 characters (was about 70 before 1.3.1).


If you insist on working with very long formulas in their character
representation, you need to use deparse() and deal with the resulting
multi-line character vectors. (I can't tell what you're trying to do,
but update.formula() may provide a cleaner way of modifying formulas.)
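
As a rough illustration of the deparse() route (a sketch only: the formula is made up to resemble the one in the report, and the names fm, rhs_lines, rhs_string and fm2 are illustrative), the multi-line result can be collapsed into one string, while update() changes the formula without any string handling:

```r
# Hypothetical long formula, built only for this illustration.
rhs_text <- paste(c("-1", paste0("b", 1:120),
                    'f(idx, model = "sphere", diagonal = 1e-12)'),
                  collapse = " + ")
fm <- as.formula(paste("y ~", rhs_text))

# deparse() returns a character vector with one element per output line;
# pasting the pieces together gives the whole right-hand side as one string.
rhs_lines  <- deparse(fm[[3]], width.cutoff = 500L)
rhs_string <- paste(rhs_lines, collapse = " ")

# update() drops a term without going through a character representation.
fm2 <- update(fm, . ~ . - b1)
```

Note that fm[[3]] (double brackets) extracts the right-hand side itself, whereas fm[3] as used in the report yields a call wrapping it, which is why that printout ends in "()".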

-- 
   O__   Peter Dalgaard Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark  Ph:  (+45) 35327918
~~ - (p.dalga...@biostat.ku.dk)  FAX: (+45) 35327907



Re: [Rd] Why is there no c.factor?

2010-02-05 Thread Hadley Wickham
On Thu, Feb 4, 2010 at 12:03 PM, Hadley Wickham had...@rice.edu wrote:
>> I'd propose the following: If the sets of levels of all arguments are the
>> same, then c.factor() would return a factor with the common set of levels;
>> if the sets of levels differ, then, as Hadley suggests, the level-set of the
>> result would be the union of sets of levels of the arguments, but a warning
>> would be issued.
>
> I like this compromise (as long as there was an argument to suppress
> the warning)

If I provided code to do this, along with the warnings for ordered
factors and using the optimisation suggested by Matthew, is there any
member of R core who would be interested in sponsoring it?
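
For concreteness, the proposed behaviour might look roughly like the sketch below. This is not the code offered above: the function name combine_factors and its warn argument are hypothetical, ordered factors are not treated specially, and remapping the integer codes rather than converting every element to character is only an assumption about the kind of optimisation meant.

```r
# Hypothetical helper: concatenate factors, taking the union of their levels.
combine_factors <- function(..., warn = TRUE) {
  fs <- list(...)
  if (length(fs) == 0L) return(factor())
  stopifnot(all(vapply(fs, is.factor, logical(1))))

  lev_sets <- lapply(fs, levels)
  same <- all(vapply(lev_sets,
                     function(l) setequal(l, lev_sets[[1]]),
                     logical(1)))
  if (!same && warn)
    warning("factors have different sets of levels; using their union")

  # Union of levels, in order of first appearance.
  lev <- unique(unlist(lev_sets))

  # Remap each factor's integer codes to positions in the combined level set.
  codes <- unlist(lapply(fs, function(f) match(levels(f), lev)[as.integer(f)]))
  structure(codes, levels = lev, class = "factor")
}
```

Calling combine_factors(a, b, warn = FALSE) gives the argument to suppress the warning that was asked for.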

Hadley

-- 
http://had.co.nz/



Re: [Rd] Why is there no c.factor?

2010-02-05 Thread Peter Dalgaard
Hadley Wickham wrote:
> On Thu, Feb 4, 2010 at 12:03 PM, Hadley Wickham had...@rice.edu wrote:
>>> I'd propose the following: If the sets of levels of all arguments are the
>>> same, then c.factor() would return a factor with the common set of levels;
>>> if the sets of levels differ, then, as Hadley suggests, the level-set of the
>>> result would be the union of sets of levels of the arguments, but a warning
>>> would be issued.
>> I like this compromise (as long as there was an argument to suppress
>> the warning)
>
> If I provided code to do this, along with the warnings for ordered
> factors and using the optimisation suggested by Matthew, is there any
> member of R core who would be interested in sponsoring it?
>
> Hadley

Messing with c() is a bit unattractive (I'm not too happy with the other
c methods either; normally c() strips attributes and reduces to the base
class, and those obviously do not), but a more general concat() function
has been suggested a number of times. With a suitable range of methods,
this could also be used to reimplement rbind.data.frame (which,
incidentally, already contains a method for concatenating factors, with
several ugly warts!)
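
A small illustration of that contrast (the object names x and d are made up for the example): the default c() drops attributes other than names, while the Date method keeps the class.

```r
# A plain integer vector carrying a custom attribute.
x <- structure(1:3, units = "cm")
plain <- c(x, x)            # the default method drops the "units" attribute

# A classed object for which a c() method exists.
d <- as.Date("2010-02-05")
dates <- c(d, d + 1)        # c.Date keeps the "Date" class
```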

-- 
   O__   Peter Dalgaard Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark  Ph:  (+45) 35327918
~~ - (p.dalga...@biostat.ku.dk)  FAX: (+45) 35327907



Re: [Rd] Why is there no c.factor?

2010-02-05 Thread William Dunlap
> -----Original Message-----
> From: r-devel-boun...@r-project.org
> [mailto:r-devel-boun...@r-project.org] On Behalf Of Peter Dalgaard
> Sent: Friday, February 05, 2010 7:41 AM
> To: Hadley Wickham
> Cc: John Fox; r-devel@r-project.org; Thomas Lumley
> Subject: Re: [Rd] Why is there no c.factor?
>
> Messing with c() is a bit unattractive (I'm not too happy with the other
> c methods either; normally c() strips attributes and reduces to the base
> class, and those obviously do not), but a more general concat() function
> has been suggested a number of times. With a suitable range of methods,
> this could also be used to reimplement rbind.data.frame (which,
> incidentally, already contains a method for concatenating factors, with
> several ugly warts!)

Yes, c() should have been put on the deprecated list a couple
of decades ago, since people expect it to do too many
incompatible things.  And factor should have been a virtual
class, with subclasses FixedLevels (e.g., Sex) or AdHocLevels
(e.g., FamilyName), so c() and [()<- could do the appropriate
thing in either case.

Back to reality, S+ has a concat(...) function, whose comments say
# This function works like c() except that names of arguments are
# ignored.  That is, it concatenates its arguments into a single
# S vector object, without considering the names of the arguments, 
# in the order that the arguments are given.
#
# To make this function work for new classes, it is only necessary
# to make methods for the concat.two function, which concatenates
# two vectors; recursion will take care of the rest.
concat() is not generic but it repeatedly calls concat.two(x,y), an
SV4-generic that dispatches on the classes of x and y.  Thus you
can easily predict the class of concat(x,y,z), although it may not
be the same as the class of concat(z,y,x), given suitably bizarre
methods for concat.two().

concat() doesn't get a lot of use but I think the idea is sound.
Perhaps that model would work well for a concatenation function in R.
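
To make the model concrete, here is a minimal sketch of how it could be set up in R. This is not the S+ implementation: it assumes an S4 generic named concat.two, a default method that simply calls c(), and a factor method that takes the union of levels.

```r
library(methods)

# Two-argument generic; S4 dispatch looks at the classes of both x and y.
setGeneric("concat.two", function(x, y) standardGeneric("concat.two"))

# Default method: fall back to c().
setMethod("concat.two", signature("ANY", "ANY"),
          function(x, y) c(x, y))

# Factor method: the result's levels are the union of both level sets.
setMethod("concat.two", signature("factor", "factor"),
          function(x, y) {
            lev <- union(levels(x), levels(y))
            factor(c(as.character(x), as.character(y)), levels = lev)
          })

# concat() itself is not generic; it folds concat.two over its arguments,
# left to right, and ignores any argument names.
concat <- function(...) Reduce(concat.two, list(...))
```

With this setup only concat.two needs a method for a new class, at the price of one dispatch per argument.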

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com 

 


Re: [Rd] Why is there no c.factor?

2010-02-05 Thread S Ellison


> c() should have been put on the deprecated list a couple
> of decades ago

Don't you dare!

> Back to reality

Phew! Had me worried there.

c() is no problem at all for lists, Dates and most simple vector types;
why deprecate something solely because it doesn't behave for something
it doesn't claim to work on?


Steve E



Re: [Rd] Why is there no c.factor?

2010-02-05 Thread Matthew Dowle

> concat() doesn't get a lot of use
How do you know?  Maybe it's used a lot but the users had no need to tell you
what they were using. The exact opposite might in fact be the case, i.e.
because concat is so good in S-PLUS, you just never hear of problems with it
from the users. That might be a very good sign.

> perhaps that model would work well for a concatenation function in R
I'd be happy to test it. I'm a bit concerned about performance though, given
what you said about repeated recursive calls and dispatch. Could you run
the following test in S-PLUS please and post back the timing?  If this small
100MB example was fine, then we could proceed to a 64-bit 10GB test. This is
quite nippy at the moment in R (1.1 sec). I'd be happy with a better way as
long as speed wasn't compromised.

set.seed(1)
# union set of 676 levels
L = as.vector(outer(LETTERS, LETTERS, paste, sep=""))
# create 100 factors
F = lapply(1:100, function(i) {
   # each factor 1MB large (262144 integers), plus small amount for the levels
   f = sample(1:100, 1*1024^2 / 4, replace=TRUE)
   # pick 100 levels from the union set
   levels(f) = sample(L, 100)
   class(f) = "factor"
   f
})

> head(F[[1]])
[1] RT DM CO JV BG KU
100 Levels: YC FO PN IL CB CY HQ ...
> head(F[[2]])
[1] RK PD FE SG SJ CQ
100 Levels: JV FV DX NL XB ND CY QQ ...


With c.factor from data.table, as posted, placed in .GlobalEnv

> system.time(G <- do.call(c,F))
   user  system elapsed
   0.81    0.32    1.12
> head(G)
[1] RT DM CO JV BG KU   # looks right, comparing to F[[1]] above
676 Levels: AA AB AC AD AE AF AG AH AI AJ AK AL AM AN AO AP AQ AR AS AT AU
AV AW AX AY AZ BA BB BC BD BE BF ... ZZ
> G[262145:262150]
[1] RK PD FE SG SJ CQ   # looks right, comparing to F[[2]] above
676 Levels: AA AB AC AD AE AF AG AH AI AJ AK AL AM AN AO AP AQ AR AS AT AU
AV AW AX AY AZ BA BB BC BD BE BF ... ZZ
> identical(as.character(G),as.character(unlist(F)))
[1] TRUE

So I guess this would be compared to the following in S-PLUS?

system.time(G <- do.call(concat, F))

or maybe it's just the following:

system.time(G <- concat(F))

I don't have S-PLUS so I can't test that myself.



Re: [Rd] Why is there no c.factor?

2010-02-05 Thread William Dunlap
> From: r-devel-boun...@r-project.org
> [mailto:r-devel-boun...@r-project.org] On Behalf Of Matthew Dowle
> Sent: Friday, February 05, 2010 11:17 AM
> To: r-de...@stat.math.ethz.ch
> Subject: Re: [Rd] Why is there no c.factor?
>
> > concat() doesn't get a lot of use
> How do you know?  Maybe its used a lot but the users had no need to tell you
> what they were using. The exact opposite might in fact be the case i.e.
> because concat is so good in splus,  you just never hear of problems with it
> from the users. That might be a very good sign.

We don't use concat in many of our functions.
It tends to be used only where c fails.  It
is slower than c(), in part because it is an SV4
generic while c is a .Internal (the fastest S+
interface to C code).  concat() is also written
entirely in S code, with calls to heavyweights like
sapply.  Writing it in C would speed it up a lot.

  > sys.time(for(i in 1:1)c(1,2))
  [1] 0.27 0.27
  > sys.time(for(i in 1:1)concat(1,2))
  [1] 20.29 20.29
  > sys.time(for(i in 1:1)concat.two(1,2))
  [1] 0.52 0.52

The last just calls the default method of concat.two,
which is a call to c().

 