[Rd] buglet in terms calculations

2007-04-08 Thread Robert Gentleman
Hi,
   Vince and I have noticed a problem with non-syntactic names in data 
frames and some modeling code (but not all modeling code).

   The following, while almost surely as documented, could be a bit more 
helpful:

  m = matrix(rnorm(100), nc=10)
  colnames(m) = paste(1:10, letters[1:10], sep="_")

  d = data.frame(m, check.names=FALSE)

  f = formula(`1_a` ~ ., data=d)

  tm = terms(f, data=d)

  ## failure here, as somehow the back-ticks have become part of the name,
  ## not a quoting mechanism
  d[attr(tm, "term.labels")]

   The variables attribute, in the terms object, keeps the names as 
symbols, where the back-ticks are only a quoting mechanism, so modeling 
code that uses that attribute seems fine, but code that uses term.labels 
fails. In particular, of those tested, glm, lda and randomForest work 
fine, while nnet and rpart can't handle non-syntactic names in formulae 
as such.

   In particular, rpart contains this code:

  lapply(m[attr(Terms, "term.labels")], tfun)

   which fails for the reasons given.
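
A minimal sketch of the difference described above, assuming a current R version. The object names `labs`, `vars` and `predictors` are illustrative only, and reading the names off the variables attribute is one possible approach, not what rpart or nnet actually do:

```r
# Illustrative data: column names that are not syntactic R names.
set.seed(1)
m <- matrix(rnorm(100), ncol = 10)
colnames(m) <- paste(1:10, letters[1:10], sep = "_")
d <- data.frame(m, check.names = FALSE)

tm <- terms(`1_a` ~ ., data = d)

# term.labels are deparsed strings, so non-syntactic names carry back-ticks.
labs <- attr(tm, "term.labels")

# The variables attribute is a call, list(`1_a`, `2_b`, ...); converting
# each symbol with as.character() gives the bare names.
vars <- vapply(as.list(attr(tm, "variables"))[-1], as.character, character(1))

labs_ok <- all(labs %in% names(d))  # do the labels match column names?
vars_ok <- all(vars %in% names(d))  # do the bare variable names match?

# Indexing with the bare names works; drop the response to get predictors.
predictors <- d[setdiff(vars, "1_a")]
```

This only covers main-effect terms; interaction labels such as a:b are not column names in either form.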


  One way to get around this might be to modify the do_termsform code;
right now we have:
  PROTECT(varnames = allocVector(STRSXP, nvar));
  for (v = CDR(varlist), i = 0; v != R_NilValue; v = CDR(v))
      SET_STRING_ELT(varnames, i++,
                     STRING_ELT(deparse1line(CAR(v), 0), 0));

  and then, for term.labels, we copy over the varnames (joined with :, as 
needed). Perhaps we need to save the unquoted names somewhere?

  Or is there some other approach that will get us there? Certainly 
cleaning up the names via
   cleanTick = function(x) gsub("`", "", x)

  works, but it seems a bit ugly, and it might be better if the modeling 
code was modified.
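
For illustration, a small sketch of how the cleanTick workaround behaves on a few made-up labels. The input vector is hypothetical, and `fixed = TRUE` is an addition here so the back-tick is matched literally:

```r
# Strip every back-tick from a character vector of term labels.
cleanTick <- function(x) gsub("`", "", x, fixed = TRUE)

# Hypothetical labels: a quoted name, a syntactic name, an interaction.
labs <- c("`2_b`", "x", "`2_b`:`3_c`")
cleaned <- cleanTick(labs)
print(cleaned)
```

Note that this removes all back-ticks indiscriminately, so a column whose name itself contains a back-tick would be damaged, which is part of why it feels ugly.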

   best wishes



-- 

Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
[EMAIL PROTECTED]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] buglet in terms calculations

2007-04-08 Thread Peter Dalgaard
Robert Gentleman wrote:
 Hi,
Vince and I have noticed a problem with non-syntactic names in data 
 frames and some modeling code (but not all modeling code).

The following, while almost surely as documented, could be a bit more 
 helpful:

   m = matrix(rnorm(100), nc=10)
   colnames(m) = paste(1:10, letters[1:10], sep="_")

   d = data.frame(m, check.names=FALSE)

   f = formula(`1_a` ~ ., data=d)

   tm = terms(f, data=d)

   ## failure here, as somehow the back-ticks have become part of the name,
   ## not a quoting mechanism
   d[attr(tm, "term.labels")]

The variables attribute, in the terms object, keeps the names as 
 symbols, where the back-ticks are only a quoting mechanism, so modeling 
 code that uses that attribute seems fine, but code that uses term.labels 
 fails. In particular, of those tested, glm, lda and randomForest work 
 fine, while nnet and rpart can't handle non-syntactic names in formulae 
 as such.

In particular, rpart contains this code:

   lapply(m[attr(Terms, "term.labels")], tfun)

which fails for the reasons given.


   One way to get around this might be to modify the do_termsform code;
 right now we have:
   PROTECT(varnames = allocVector(STRSXP, nvar));
   for (v = CDR(varlist), i = 0; v != R_NilValue; v = CDR(v))
       SET_STRING_ELT(varnames, i++,
                      STRING_ELT(deparse1line(CAR(v), 0), 0));

   and then, for term.labels, we copy over the varnames (joined with :, as 
 needed). Perhaps we need to save the unquoted names somewhere?

   Or is there some other approach that will get us there? Certainly 
 cleaning up the names via
    cleanTick = function(x) gsub("`", "", x)

   works, but it seems a bit ugly, and it might be better if the modeling 
 code was modified.

   
Hmm, .Internal(deparse()) has a backtick option (for related 
reasons, IIRC). Could this be used instead of deparse1line?
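
As a rough illustration at the R level only: in current R versions the user-level deparse() exposes a `backtick` argument, and the sketch below shows its effect on a non-syntactic symbol. Whether the C code in do_termsform could use the same switch is not shown here; the variable names are illustrative.

```r
# A symbol whose name is not a syntactic R name.
sym <- as.name("1_a")

# With backtick = TRUE the deparsed string is quoted with back-ticks;
# with backtick = FALSE the bare name comes back.
with_ticks    <- deparse(sym, backtick = TRUE)
without_ticks <- deparse(sym, backtick = FALSE)

print(c(with_ticks, without_ticks))
```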

(There's an inbuilt contradiction in having special terms like 
(Intercept) and at the same time allowing arbitrary non-syntactic 
names, but I suppose that people who actually name their variables 
`(Intercept)` deserve whatever they get.)
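
For reference, a tiny sketch of where the special name shows up: the design matrix for a model with an intercept has a column literally named "(Intercept)". The data frame `x_df` is made up for the example.

```r
# A one-predictor design matrix; the intercept column gets the
# reserved-looking name "(Intercept)".
x_df <- data.frame(x = 1:3)
cols <- colnames(model.matrix(~ x, data = x_df))
print(cols)
```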
